ISBN (print): 9781450301190
The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. It includes a collection of distributed data structures called pContainers that are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. In this work, we present the STAPL Parallel Container Framework (PCF), which is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers, without requiring the programmer to deal with concurrency or data distribution issues. The PCF provides a large number of basic parallel data structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). It also provides a class hierarchy and a composition mechanism that allow users to extend and customize the current container base for improved application expressivity and performance. We evaluate STAPL pContainer performance on a CRAY XT4 massively parallel system and show that pContainer methods, generic pAlgorithms, and different applications provide good scalability on more than 16,000 processors.
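The composition idea described above, assembling a distributed container from existing sequential containers while hiding concurrency and distribution from the user, can be sketched roughly as follows. This is a minimal shared-memory illustration of the concept, not STAPL's actual pContainer API; the names DistributedArray, Partition, and apply_at are hypothetical.

```cpp
#include <vector>
#include <mutex>
#include <functional>
#include <cstddef>

// Hypothetical sketch: a "pContainer"-like array assembled from
// per-partition sequential std::vectors (a shared-memory stand-in for
// distributed base containers).
template <typename T>
class DistributedArray {
  struct Partition {
    std::vector<T> data;   // existing sequential container
    std::mutex     lock;   // makes concurrent method calls safe
  };
  std::vector<Partition> parts_;
  std::size_t block_;

public:
  DistributedArray(std::size_t n, std::size_t nparts)
      : parts_(nparts), block_((n + nparts - 1) / nparts) {
    for (auto& p : parts_) p.data.resize(block_);
  }

  // A "parallel method": safe to invoke concurrently from many threads;
  // the caller never deals with distribution or locking directly.
  void apply_at(std::size_t i, const std::function<void(T&)>& f) {
    Partition& p = parts_[i / block_];
    std::lock_guard<std::mutex> g(p.lock);
    f(p.data[i % block_]);
  }
};

int main() {
  DistributedArray<int> a(/*n=*/100, /*nparts=*/4);
  a.apply_at(42, [](int& x) { x += 1; });  // concurrency-safe element update
  return 0;
}
```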
ISBN (print): 9781450301190
This paper presents the open-source COREMU, a scalable and portable parallel emulation framework that decouples the complexity of parallelizing full-system emulators from building a mature sequential one. The key observation is that CPU cores and devices in current (and likely future) multiprocessors are loosely coupled and communicate through well-defined interfaces. Based on this observation, COREMU emulates multiple cores by creating multiple instances of existing sequential emulators, and uses a thin library layer to handle inter-core and device communication and synchronization and to maintain a consistent view of system resources. COREMU also incorporates lightweight memory transactions, feedback-directed scheduling, lazy code invalidation and adaptive signal control to provide scalable performance. To make COREMU useful in practice, we also provide some preliminary tools and APIs that can help programmers diagnose performance problems and (concurrency) bugs. A working prototype, which reuses the widely used QEMU as the sequential emulator, requires only about 2,500 lines of code (LOC) in changes to QEMU. It currently supports x64 and ARM platforms, and can emulate up to 255 cores running commodity OSes with practical performance, while QEMU cannot scale above 32 cores. Performance evaluation against QEMU indicates that COREMU has negligible uniprocessor emulation overhead and performs and scales significantly better than QEMU. We also show how COREMU can be used to diagnose performance problems and concurrency bugs in both OS kernels and parallel applications.
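The core structural idea, one sequential emulator instance per emulated core plus a thin layer for inter-core communication, might look roughly like the sketch below. It is not COREMU or QEMU code; Mailbox, core_loop, and the event encoding are invented for illustration.

```cpp
#include <thread>
#include <vector>
#include <queue>
#include <mutex>
#include <atomic>

// Rough sketch of the decoupling idea: each emulated core runs a sequential
// emulation loop in its own host thread, and a thin layer of per-core
// mailboxes carries inter-core events (e.g., inter-processor interrupts)
// between instances.
struct Mailbox {
  std::mutex m;
  std::queue<int> events;              // hypothetical event encoding
  void post(int e) { std::lock_guard<std::mutex> g(m); events.push(e); }
  bool poll(int& e) {
    std::lock_guard<std::mutex> g(m);
    if (events.empty()) return false;
    e = events.front(); events.pop(); return true;
  }
};

std::atomic<bool> running{true};

void core_loop(int core_id, std::vector<Mailbox>& boxes) {
  while (running.load()) {
    // ... run one chunk of the sequential emulator for this core ...
    int e;
    if (boxes[core_id].poll(e)) {
      // deliver the event, e.g., raise an emulated interrupt
    }
  }
}

int main() {
  const int ncores = 4;                // stand-in for a larger core count
  std::vector<Mailbox> boxes(ncores);
  std::vector<std::thread> cores;
  for (int i = 0; i < ncores; ++i)
    cores.emplace_back(core_loop, i, std::ref(boxes));
  boxes[1].post(42);                   // one core signaling another this way
  running = false;
  for (auto& t : cores) t.join();
  return 0;
}
```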
ISBN (print): 9781450301190
On shared-memory systems, Cilk-style work-stealing [5] has been used to effectively parallelize irregular task-graph based applications such as Unbalanced Tree Search (UTS) [24, 28]. There are two main difficulties in extending this approach to distributed memory. First, in the shared-memory approach, thieves (nodes without work) constantly attempt to asynchronously steal work from randomly chosen victims until they find work. In distributed memory, thieves cannot autonomously steal work from a victim without disrupting its execution. When work is sparse, this results in performance degradation. In essence, a direct extension of traditional work-stealing to distributed memory violates the work-first principle underlying work-stealing. Further, thieves spend useless CPU cycles attacking victims that have no work, resulting in system inefficiencies in multi-programmed contexts. Second, it is non-trivial to detect active distributed termination (detect that programs at all nodes are looking for work, and hence there is no work). This problem is well studied and requires careful design for good performance. Unfortunately, in most existing languages/frameworks, application developers are forced to implement their own distributed termination detection. In this paper, we develop a simple set of ideas that allow work-stealing to be efficiently extended to distributed memory. First, we introduce lifeline graphs: low-degree, low-diameter, fully-connected directed graphs. Such graphs can be constructed from k-dimensional hypercubes. When a node is unable to find work after w unsuccessful steals, it quiesces after informing the outgoing edges in its lifeline graph. Quiescent nodes do not disturb other nodes. A quiesced node is reactivated when work arrives from a lifeline, and itself shares this work with those of its incoming lifelines that are activated. Termination occurs precisely when computation at all nodes has quiesced. In a language such as X10, such passive distributed termination ...
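A rough sketch of the per-node control flow implied by the lifeline scheme is given below. It is not the authors' X10 implementation; the transport stubs try_steal_from and notify_lifeline, and the hypercube-based lifeline_out helper, are hypothetical stand-ins.

```cpp
#include <vector>
#include <cstdlib>

// Conceptual sketch of lifeline-based work-stealing. The message transport
// is reduced to stubs; lifeline_out() returns the outgoing neighbors of a
// hypercube-shaped lifeline graph. All names are hypothetical.
struct Task { int payload; };

static bool try_steal_from(int /*victim*/, std::vector<Task>& /*out*/) {
  return false;                       // stub: remote steal request
}
static void notify_lifeline(int /*neighbor*/, int /*idle_node*/) {}

static std::vector<int> lifeline_out(int node, int nnodes) {
  std::vector<int> out;               // flip one bit at a time: hypercube edges
  for (int bit = 1; bit < nnodes; bit <<= 1) out.push_back(node ^ bit);
  return out;
}

void worker(int me, int nnodes, int w, std::vector<Task> deque) {
  for (;;) {
    while (!deque.empty()) deque.pop_back();   // execute local tasks
    bool found = false;
    for (int i = 0; i < w && !found; ++i)      // at most w random steals
      found = try_steal_from(std::rand() % nnodes, deque);
    if (found) continue;
    for (int nb : lifeline_out(me, nnodes))    // tell lifelines we are idle
      notify_lifeline(nb, me);
    return;  // quiesce: reactivated only when a lifeline delivers work
  }
}

int main() { worker(0, 8, 2, {}); return 0; }
```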
Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, general-purpose programming models available today can yield high performance but are too low-level to be accessible to the average programmer. We propose leveraging domain-specific languages (DSLs) to map high-level application code to heterogeneous devices. To demonstrate the potential of this approach we present OptiML, a DSL for machine learning. OptiML programs are implicitly parallel and can achieve high performance on heterogeneous hardware with no modification required to the source code. For such a DSL-based approach to be tractable at large scales, better tools are required for DSL authors to simplify language creation and parallelization. To address this concern, we introduce Delite, a system designed specifically for DSLs that is both a framework for creating an implicitly parallel DSL and a dynamic runtime providing automated targeting to heterogeneous parallel hardware. We show that OptiML running on Delite achieves single-threaded, parallel, and GPU performance superior to explicitly parallelized MATLAB code in nearly all cases.
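As a language-agnostic illustration of "automated targeting" (OptiML and Delite are Scala-based, so this is not their API), one can picture each DSL operation carrying device-specific variants that a small runtime selects at execution time; the Op and run names below are invented.

```cpp
#include <vector>
#include <functional>
#include <cstdio>

// Each operation registers device-specific variants; a tiny runtime picks
// one, so the application source never changes when the hardware changes.
struct Op {
  std::function<void()> cpu_variant;
  std::function<void()> gpu_variant;   // hypothetical: would launch a kernel
};

void run(const Op& op, bool gpu_available) {
  if (gpu_available && op.gpu_variant) op.gpu_variant();
  else op.cpu_variant();
}

int main() {
  std::vector<double> v(1000, 1.0);
  Op sum{
    [&] { double s = 0; for (double x : v) s += x; std::printf("cpu sum=%f\n", s); },
    nullptr  // a real runtime would register a GPU kernel here
  };
  run(sum, /*gpu_available=*/true);    // no GPU variant, falls back to CPU
  return 0;
}
```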
ISBN (print): 9781605587080
This poster is a case study on the application of a novel programming model, called Concurrent Collections (CnC), to the implementation of an asynchronous-parallel algorithm for computing the Cholesky factorization of dense matrices. In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints. We demonstrate the performance potential of CnC by showing that our Cholesky implementation nearly matches or exceeds competing vendor-tuned codes and alternative programming models. We conclude that the CnC model is well suited for expressing asynchronous-parallel algorithms on emerging multicore systems.
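For reference, the tile-level structure of the Cholesky algorithm the poster parallelizes can be sketched as a sequential loop nest; in CnC, each kernel call below becomes a step and the reuse of tiles across iterations becomes the semantic ordering constraints, while everything not so ordered may run asynchronously. The kernels are stubs here; this is a structural sketch, not the poster's implementation, and a real code would call BLAS/LAPACK (dpotrf, dtrsm, dsyrk, dgemm) per tile.

```cpp
#include <vector>
#include <cstdio>

using Tile = std::vector<double>;

static void potrf(Tile&) {}                       // factor diagonal tile
static void trsm(const Tile&, Tile&) {}           // triangular solve of a panel tile
static void syrk(const Tile&, Tile&) {}           // symmetric rank-k update of a diagonal tile
static void gemm(const Tile&, const Tile&, Tile&) {}  // update of an off-diagonal tile

void tiled_cholesky(std::vector<std::vector<Tile>>& A, int T) {
  for (int k = 0; k < T; ++k) {
    potrf(A[k][k]);                               // depends on updates from iteration k-1
    for (int i = k + 1; i < T; ++i)
      trsm(A[k][k], A[i][k]);                     // depends only on potrf at step k
    for (int i = k + 1; i < T; ++i)
      for (int j = k + 1; j <= i; ++j)
        if (i == j) syrk(A[i][k], A[i][i]);
        else        gemm(A[i][k], A[j][k], A[i][j]);
  }
}

int main() {
  const int T = 4, B = 32;                        // 4x4 grid of 32x32 tiles
  std::vector<std::vector<Tile>> A(T, std::vector<Tile>(T, Tile(B * B, 0.0)));
  tiled_cholesky(A, T);
  std::printf("factored %dx%d tile grid\n", T, T);
  return 0;
}
```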
ISBN (print): 9781605587080
The Pilot library is a new method for programming MPI-enabled clusters in C, targeted at novice parallel programmers. Formal elements from Communicating Sequential Processes (CSP) are used to realize a process/channel model of parallel computation that reduces opportunities for deadlock and other communication errors. This simple model, plus an application programming interface (API) styled after C's formatted I/O, is designed to make the library easy to learn. The Pilot library exists as a thin layer on top of any standard Message Passing Interface (MPI) implementation, preserving MPI's portability and efficiency, with little performance overhead arising as a result of Pilot's additional features.
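The process/channel style Pilot promotes can be illustrated, in spirit only, with a small shared-memory channel; this is not Pilot's C-on-MPI API, just a sketch of how confining all interaction to point-to-point channels removes shared mutable state and many deadlock patterns.

```cpp
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <cstdio>

// A blocking, typed point-to-point channel: the only way the two
// "processes" below communicate.
template <typename T>
class Channel {
  std::mutex m;
  std::condition_variable cv;
  std::queue<T> q;
public:
  void write(T v) {                    // analogous to a formatted "write" call
    { std::lock_guard<std::mutex> g(m); q.push(std::move(v)); }
    cv.notify_one();
  }
  T read() {                           // blocks until the writer delivers
    std::unique_lock<std::mutex> g(m);
    cv.wait(g, [&] { return !q.empty(); });
    T v = std::move(q.front()); q.pop();
    return v;
  }
};

int main() {
  Channel<int> ch;
  std::thread producer([&] { for (int i = 0; i < 3; ++i) ch.write(i); });
  std::thread consumer([&] { for (int i = 0; i < 3; ++i) std::printf("got %d\n", ch.read()); });
  producer.join(); consumer.join();
  return 0;
}
```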
ISBN (print): 9781605587080
Irregular algorithms are organized around pointer-based data structures such as graphs and trees, and they are ubiquitous in applications. Recent work by the Galois project has provided a systematic approach for parallelizing irregular applications based on the idea of optimistic or speculative execution of programs. However, the overhead of optimistic parallel execution can be substantial. In this paper, we show that many irregular algorithms have structure that can be exploited and present three key optimizations that take advantage of algorithmic structure to reduce speculative overheads. We describe the implementation of these optimizations in the Galois system and present experimental results to demonstrate their benefits. To the best of our knowledge, this is the first system to exploit algorithmic structure to optimize the execution of irregular programs.
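The optimistic execution pattern the paper builds on can be sketched as follows; this is a conceptual shared-memory illustration, not the Galois system itself, and Node, process, and the try-lock-the-neighborhood discipline are simplified stand-ins for Galois's speculative iteration and rollback machinery.

```cpp
#include <vector>
#include <mutex>

// Speculative execution of one worklist iteration over an irregular graph:
// the iteration must acquire every node in its neighborhood; if any
// acquisition fails, it aborts (no partial update is made) and the work
// item is retried later.
struct Node { std::mutex lock; int value = 0; };

bool process(std::vector<Node>& g, const std::vector<int>& neighborhood) {
  std::vector<std::unique_lock<std::mutex>> held;
  for (int n : neighborhood) {
    std::unique_lock<std::mutex> l(g[n].lock, std::try_to_lock);
    if (!l.owns_lock()) return false;        // conflict detected: abort
    held.push_back(std::move(l));
  }
  for (int n : neighborhood) g[n].value += 1; // the "real" irregular update
  return true;                                // locks released on return
}

int main() {
  std::vector<Node> g(8);
  return process(g, {0, 3, 5}) ? 0 : 1;
}
```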
ISBN (print): 9781605587080
To fully exploit multicore processors, applications are expected to provide a large degree of thread-level parallelism. While adequate for low core counts and their typical workloads, the current load balancing support in operating systems may not be able to achieve efficient hardware utilization for parallel workloads. Balancing run queue length globally ignores the needs of parallel applications, where threads are required to make equal progress. In this paper we present a load balancing technique designed specifically for parallel applications running on multicore systems. Instead of balancing run queue length, our algorithm balances the time a thread has executed on "faster" and "slower" cores. We provide a user-level implementation of speed balancing on UMA and NUMA multi-socket architectures running Linux and discuss its behavior across a variety of workloads, usage scenarios and programming models. Our results indicate that speed balancing, when compared to the native Linux load balancing, improves performance and provides good performance isolation in all cases considered. Speed balancing is also able to provide comparable or better performance than DWRR, a fair multi-processor scheduling implementation inside the Linux kernel. Furthermore, parallel application performance is often determined by the implementation of synchronization operations, and speed balancing alleviates the need for tuning the implementations of such primitives.
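A user-level version of the idea, periodically sampling per-thread CPU time and migrating the thread that has made the least progress, could be sketched as below. This is an assumption-laden Linux illustration, not the paper's implementation; error handling is omitted, and balance_step, pin, and the round-robin rotation policy are invented.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1        // for pthread_setaffinity_np (glibc)
#endif
#include <pthread.h>
#include <sched.h>
#include <time.h>
#include <vector>

static double cpu_seconds(pthread_t t) {
  clockid_t cid;
  timespec ts{};
  pthread_getcpuclockid(t, &cid);          // per-thread CPU-time clock
  clock_gettime(cid, &ts);
  return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static void pin(pthread_t t, int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  pthread_setaffinity_np(t, sizeof(set), &set);
}

// Called periodically by a monitor thread: find the thread with the least
// accumulated CPU time (the one stuck on a "slow" core) and move it.
void balance_step(const std::vector<pthread_t>& workers, int ncpus) {
  int slowest = 0;
  for (size_t i = 1; i < workers.size(); ++i)
    if (cpu_seconds(workers[i]) < cpu_seconds(workers[slowest]))
      slowest = static_cast<int>(i);
  static int next_cpu = 0;                 // rotate the laggard onto a new core
  pin(workers[slowest], next_cpu);
  next_cpu = (next_cpu + 1) % ncpus;
}

static void* spin(void*) { for (volatile long i = 0; i < 100000000L; ++i) {} return nullptr; }

int main() {
  std::vector<pthread_t> workers(2);
  for (auto& w : workers) pthread_create(&w, nullptr, spin, nullptr);
  balance_step(workers, 4);                // one balancing round
  for (auto& w : workers) pthread_join(w, nullptr);
  return 0;
}
```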
ISBN (print): 9781605587080
Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data-parallel architectures. We present a mathematical model that captures loop-based memory access patterns and computes the most appropriate data transformations in order to enable vectorization. Our experimental results show that the proposed data transformations can significantly increase the number of loops that can be vectorized and enhance the data-level parallelism of applications. Our results also show that the overhead associated with our data transformations can be easily amortized as the size of the input data set increases. For the set of high-performance benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4X) by applying vectorization using our data transformation approach.
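A concrete instance of the kind of layout transformation the paper automates is the classic array-of-structures (AoS) to structure-of-arrays (SoA) conversion, which turns strided per-field accesses into unit-stride ones that vectorize cleanly. The particle fields below are illustrative; the paper's transformations are derived from its mathematical model rather than hand-written like this.

```cpp
#include <vector>
#include <cstddef>

struct ParticleAoS { float x, y, z, mass; };  // per-field access is strided

struct ParticlesSoA {                         // per-field access is unit-stride
  std::vector<float> x, y, z, mass;
};

ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
  ParticlesSoA out;
  out.x.reserve(in.size()); out.y.reserve(in.size());
  out.z.reserve(in.size()); out.mass.reserve(in.size());
  for (const auto& p : in) {                  // one-time layout transformation
    out.x.push_back(p.x);   out.y.push_back(p.y);
    out.z.push_back(p.z);   out.mass.push_back(p.mass);
  }
  return out;
}

// Each iteration touches consecutive memory, so the loop vectorizes cleanly
// on SIMD hardware (or maps well to a data-parallel GPU backend).
void scale_masses(ParticlesSoA& p, float s) {
  for (std::size_t i = 0; i < p.mass.size(); ++i)
    p.mass[i] *= s;
}

int main() {
  std::vector<ParticleAoS> aos(1000, {1.f, 2.f, 3.f, 4.f});
  ParticlesSoA soa = to_soa(aos);
  scale_masses(soa, 0.5f);
  return 0;
}
```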