Search results - Inner Mongolia University Library

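Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016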
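
The rules above can be sketched as a small query builder. This is only an illustration of the syntax described in the help text (field code, "=", value; uppercase operators with one space on each side). The helper names term, group and combine are hypothetical and are not part of the library's system.

```python
# Field codes listed in the help text above.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(code, value):
    """Build one 'CODE=value' search term (hypothetical helper)."""
    if code not in FIELD_CODES:
        raise ValueError(f"unknown field code: {code}")
    return f"{code}={value}"


def group(expression):
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


def combine(operator, *parts):
    """Join parts with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


# Rebuilds Example 1 from the help text.
query = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Building the string from parts keeps the required spacing and capitalization in one place, which is the part of the syntax the help text warns about.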


Search condition: "Any field = 2003 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"

560 records in total; showing records 171-180

Tiles: A new language mechanism for heterogeneous parallelism (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Chen, Yifeng; Cui, Xiang; Mei, Hong (HCST Key Lab., School of EECS, Peking University, Beijing 100871, China)

ISBN (print): 9781450332057

This paper studies the essence of heterogeneity from the perspective of language mechanism design. The proposed mechanism, called tiles, is a program construct that bridges two relative levels of computation: an outer level of source data in larger, slower or more distributed memory and an inner level of data blocks in smaller, faster or more localized memory.

Keywords: parallel programming

An OpenACC-based unified programming model for multi-accelerator systems (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S. (Oak Ridge National Laboratory, United States; Georgia Institute of Technology, United States)

ISBN (print): 9781450332057

This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.

Keywords: Supercomputers

The lock-free k-LSM relaxed priority queue (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Wimmer, Martin; Gruber, Jakob; Träff, Jesper Larsson; Tsigas, Philippas (Faculty of Informatics, Parallel Computing, Vienna University of Technology, Vienna/Wien 1040, Austria; Computer Science and Engineering, Chalmers University of Technology, Göteborg 412 96, Sweden)

ISBN (print): 9781450332057

We present a new, concurrent, lock-free priority queue that relaxes the delete-min operation to allow deletion of any of the ρ+1 smallest keys instead of only a minimal one, where ρ is a parameter that can be configured at runtime. It is built from a logarithmic number of sorted arrays, similar to log-structured merge-trees (LSM). For keys added and removed by the same thread the behavior is identical to a non-relaxed priority queue. We compare to state-of-the-art lock-free priority queues with both relaxed and non-relaxed semantics, showing high performance and good scalability of our approach.

Keywords: parallel programming
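
The relaxed delete-min semantics described in this abstract can be illustrated with a small sequential model. This sketch is not the k-LSM data structure: it is neither concurrent nor lock-free, it does not use a logarithmic number of sorted arrays, and it does not model the per-thread guarantee mentioned in the abstract. The class name RelaxedPriorityQueue and the random choice among candidates are assumptions made for illustration only.

```python
import bisect
import random


class RelaxedPriorityQueue:
    """Sequential model: delete_min may return any of the rho+1 smallest keys."""

    def __init__(self, rho, seed=None):
        if rho < 0:
            raise ValueError("rho must be non-negative")
        self.rho = rho
        self._keys = []                   # kept in ascending order
        self._rng = random.Random(seed)   # picks among the allowed candidates

    def insert(self, key):
        bisect.insort(self._keys, key)

    def delete_min(self):
        if not self._keys:
            raise IndexError("delete_min from an empty queue")
        # Any of the rho+1 smallest keys is an acceptable result.
        window = min(self.rho + 1, len(self._keys))
        return self._keys.pop(self._rng.randrange(window))


queue = RelaxedPriorityQueue(rho=2, seed=0)
for key in range(10):
    queue.insert(key)
first = queue.delete_min()   # one of 0, 1 or 2
```

With rho set to 0 the window contains only the minimum, so the model behaves like an ordinary priority queue.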

A library for portable and composable data locality optimizations for NUMA systems (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Majo, Zoltan; Gross, Thomas R. (Department of Computer Science, ETH Zurich, Switzerland)

ISBN (print): 9781450332057

Many recent multiprocessor systems are realized with a nonuniform memory architecture (NUMA) and accesses to remote memory locations take more time than local memory accesses. Optimizing NUMA memory system performance is difficult and costly for three principal reasons: (1) today's programming languages/libraries have no explicit support for NUMA systems, (2) NUMA optimizations are not portable, and (3) optimizations are not composable (i.e., they can become ineffective or worsen performance in environments that support composable parallel software). This paper presents TBB-NUMA, a parallel programming library based on Intel Threading Building Blocks (TBB) that supports portable and composable NUMA-aware programming. TBB-NUMA provides a model of task affinity that captures a programmer's insights on mapping tasks to resources. NUMA-awareness affects all layers of the library (i.e., resource management, task scheduling, and high-level parallel algorithm templates) and requires close coupling between all these layers. Optimizations implemented with TBB-NUMA (for a set of standard benchmark programs) result in up to 44% performance improvement over standard TBB, but more important, optimized programs are portable across different NUMA architectures and preserve data locality also when composed with other parallel computations. Copyright 2015 ACM.

Keywords: parallel programming

A collection-oriented programming model for performance portability (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Muralidharan, Saurav; Garland, Michael; Catanzaro, Bryan; Sidelnik, Albert; Hall, Mary (University of Utah, Salt Lake City, UT, United States; NVIDIA Corporation, Santa Clara, CA, United States; Baidu Inc., Sunnyvale, CA, United States)

ISBN (print): 9781450332057

This paper describes Surge, a collection-oriented programming model that enables programmers to compose parallel computations using nested high-level data collections and operators. Surge exposes a code generation interface, decoupled from the core computation, that enables programmers and autotuners to easily generate multiple implementations of the same computation on various parallel architectures such as multi-core CPUs and GPUs. By decoupling computations from architecture-specific implementation, programmers can target multiple architectures more easily, and generate a search space that facilitates optimization and customization for specific architectures. We express in Surge four real-world benchmarks from domains such as sparse linear-algebra and machine learning and from the same performance-portable specification, generate OpenMP and CUDA C++ implementations. Surge generates efficient, scalable code which achieves up to 1.32x speedup over handcrafted, well-optimized CUDA code.

Keywords: parallel architectures

A framework for practical parallel fast matrix multiplication (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Benson, Austin R.; Ballard, Grey (Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, United States; Sandia National Laboratories, Livermore, CA, United States)

ISBN (print): 9781450332057

Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and Strassen's fast algorithm on modest problem sizes and shapes. Furthermore, we show that the best choice of fast algorithm depends not only on the size of the matrices but also the shape. We develop a code generation tool to automatically implement multiple sequential and shared-memory parallel variants of each fast algorithm, including our novel parallelization scheme. This allows us to rapidly benchmark over 20 fast algorithms on several problem sizes. Furthermore, we discuss a number of practical implementation issues for these algorithms on shared-memory machines that can direct further research on making fast algorithms practical. Copyright 2015 ACM.

Keywords: Matrix algebra
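
For orientation, Strassen's fast algorithm, one of the baselines named in this abstract, can be sketched as follows. This is a plain sequential textbook version, not the paper's code generator or its parallelization scheme. The restriction to square matrices whose size is a power of two and the function names are assumptions chosen to keep the sketch short.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def classical(A, B):
    """Classical triple-loop product, used as the base case."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(len(B[0]))] for i in range(len(A))]


def split(M):
    """Split a square matrix into four equally sized quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=1):
    """Strassen's recursion: seven half-size products instead of eight."""
    n = len(A)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("matrix size must be a power of two")
    if any(len(row) != n for row in A) or len(B) != n or any(len(row) != n for row in B):
        raise ValueError("both matrices must be n x n")
    if n <= cutoff:
        return classical(A, B)
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

The cutoff parameter switches to the classical product for small blocks, a common practical choice because the extra additions dominate at small sizes.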

A programming model and runtime system for significance-aware energy-efficient computing (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Vassiliadis, Vassilis; Parasyris, Konstantinos; Chalios, Charalambos; Antonopoulos, Christos D.; Lalis, Spyros; Bellas, Nikolaos; Vandierendonck, Hans; Nikolopoulos, Dimitrios S. (Electrical and Computer Eng. Dept., University of Thessaly, Greece; Queen's University Belfast, United Kingdom)

ISBN (print): 9781450332057

We introduce a task-based programming model and runtime system that exploit the observation that not all parts of a program are equally significant for the accuracy of the end-result, in order to trade off the quality of program outputs for increased energy-efficiency. This is done in a structured and flexible way, allowing for easy exploitation of different points in the quality/energy space, without adversely affecting application performance. The runtime system can apply a number of different policies to decide whether it will execute less-significant tasks accurately or approximately. The experimental evaluation indicates that our system can achieve an energy reduction of up to 83% compared with a fully accurate execution and up to 35% compared with an approximate version employing loop perforation. At the same time, our approach always results in graceful quality degradation.

Keywords: Energy efficiency
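
Loop perforation, the approximate baseline this abstract compares against, skips a fraction of loop iterations to save work at the cost of accuracy. The sketch below shows only that baseline idea on a toy mean computation. It does not model the paper's task-based programming model or runtime policies, and the function names and the fixed stride are assumptions for illustration.

```python
def mean_exact(values):
    """Reference computation: visits every element."""
    if not values:
        raise ValueError("values must not be empty")
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


def mean_perforated(values, stride):
    """Perforated loop: visits only every stride-th element."""
    if not values:
        raise ValueError("values must not be empty")
    if stride < 1:
        raise ValueError("stride must be at least 1")
    total = 0.0
    count = 0
    for index in range(0, len(values), stride):
        total += values[index]
        count += 1
    return total / count


samples = [float(i) for i in range(100)]
exact = mean_exact(samples)
approximate = mean_perforated(samples, stride=2)   # half the iterations
error = abs(exact - approximate)
```

A stride of 1 reproduces the exact result; larger strides do proportionally less work and generally give a less accurate answer.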

Combining phase identification and statistic modeling for automated parallel benchmark generation (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Jin, Ye; Liu, Mingliang; Ma, Xiaosong; Liu, Qing; Logan, Jeremy; Podhorszki, Norbert; Choi, Jong Youl; Klasky, Scott (NCSU, United States; QCRI, Qatar; Oak Ridge National Lab., United States)

ISBN (print): 9781450332057

Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.

Keywords: Benchmarking

Barrier elision for production parallel programs (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Chabbi, Milind; Lavrijsen, Wim; De Jong, Wibe; Sen, Koushik; Mellor-Crummey, John; Iancu, Costin (Rice University, United States; Lawrence Berkeley National Laboratory, United States; UC Berkeley, United States)

ISBN (print): 9781450332057

Large scientific code bases are often composed of several layers of runtime libraries, implemented in multiple programming languages. In such situations, programmers often choose conservative synchronization patterns leading to suboptimal performance. In this paper, we present context-sensitive dynamic optimizations that elide barriers redundant during the program execution. In our technique, we perform data race detection alongside the program to identify redundant barriers in their calling contexts; after an initial learning, we start eliding all future instances of barriers occurring in the same calling context. We present an automatic on-the-fly optimization and a multi-pass guided optimization. We apply our techniques to NWChem - a 6 million line computational chemistry code written in C/C++/Fortran that uses several runtime libraries such as Global Arrays, ComEx, DMAPP, and MPI. Our technique elides a surprisingly high fraction of barriers (as many as 63%) in production runs. This redundancy elimination translates to application speedups as high as 14% on 2048 cores. Our techniques also provided valuable insight about the application behavior, later used by NWChem developers. Overall, we demonstrate the value of holistic context-sensitive analyses that consider the domain science in conjunction with the associated runtime software stack. Copyright 2015 ACM.

Keywords: Computational chemistry

A parallel algorithm for global states enumeration in concurrent systems (2015)

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Chang, Yen-Jung; Garg, Vijay K. (Department of Electrical and Computer Engineering, University of Texas, Austin, United States)

ISBN (print): 9781450332057

Verifying the correctness of the executions of a concurrent program is difficult because of its nondeterministic behavior. One of the verification methods is predicate detection, which predicts whether the user specified condition (predicate) could become true in any global state of the program. The method is predictive because it generates inferred execution paths from the observed execution path and then checks the predicate on the global states of inferred paths. One important part of predicate detection is global states enumeration, which generates the global states on inferred paths. Cooper and Marzullo gave the first enumeration algorithm based on a breadth first strategy (BFS). Later, many algorithms have been proposed to improve space and time complexity. None of them, however, takes parallelism into consideration. In this paper, we present the first parallel and online algorithm, named ParaMount, for global state enumeration. Our experimental results show that ParaMount speeds up the existing sequential algorithms by a factor of 6 with 8 threads. We have implemented an online predicate detector using ParaMount. For predicate detection, our detector based on ParaMount is 10 to 50 times faster than RV runtime (a verification tool that uses Cooper and Marzullo's BFS enumeration algorithm). Copyright 2015 ACM.

Keywords: parallel algorithms
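
To make the notion of global states enumeration concrete, here is a minimal sequential breadth-first sketch. It is not ParaMount, and it is not claimed to match Cooper and Marzullo's algorithm as published. It assumes each event carries a vector clock, represents a global state as a tuple of per-process event counts, and treats a state as consistent when no included event depends on an excluded one. The function names and the two-process example trace are hypothetical.

```python
from collections import deque


def is_consistent(cut, clocks):
    """A cut is consistent if every included event's dependencies are included."""
    for process, count in enumerate(cut):
        if count == 0:
            continue
        last_clock = clocks[process][count - 1]
        if any(last_clock[other] > cut[other] for other in range(len(cut))):
            return False
    return True


def enumerate_global_states(clocks):
    """Breadth-first enumeration of all consistent global states."""
    processes = len(clocks)
    start = (0,) * processes
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        cut = queue.popleft()
        for p in range(processes):
            if cut[p] == len(clocks[p]):
                continue  # process p has no further events
            successor = cut[:p] + (cut[p] + 1,) + cut[p + 1:]
            if successor not in seen and is_consistent(successor, clocks):
                seen.add(successor)
                order.append(successor)
                queue.append(successor)
    return order


# Hypothetical trace: process 0 sends in its first event, and process 1
# receives that message in its first event.
example_clocks = [
    [(1, 0), (2, 0)],   # events of process 0
    [(1, 1), (1, 2)],   # events of process 1
]
states = enumerate_global_states(example_clocks)
```

In this trace the state (0, 1) is never produced, because it would include the receive without the matching send.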
