Search Results - Inner Mongolia University Library

Search query: any field = "Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"

330 records in total; showing records 201-210

COREMU: A Scalable and Portable Parallel Full-system Emulator

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Wang, Zhaoguo; Liu, Ran; Chen, Yufei; Wu, Xi; Chen, Haibo; Zhang, Weihua; Zang, Binyu

Affiliation: Fudan Univ Parallel Proc Inst Shanghai Peoples R China

ISBN: (print) 9781450301190

This paper presents the open-source COREMU, a scalable and portable parallel emulation framework that decouples the complexity of parallelizing full-system emulators from building a mature sequential one. The key observation is that CPU cores and devices in current (and likely future) multiprocessors are loosely coupled and communicate through well-defined interfaces. Based on this observation, COREMU emulates multiple cores by creating multiple instances of existing sequential emulators, and uses a thin library layer to handle inter-core and device communication and synchronization, to maintain a consistent view of system resources. COREMU also incorporates lightweight memory transactions, feedback-directed scheduling, lazy code invalidation and adaptive signal control to provide scalable performance. To make COREMU useful in practice, we also provide some preliminary tools and APIs that can help programmers diagnose performance problems and (concurrency) bugs. A working prototype, which reuses the widely used QEMU as the sequential emulator, requires only 2500 lines of code (LOC) of changes to QEMU. It currently supports x64 and ARM platforms, and can emulate up to 255 cores running commodity OSes with practical performance, while QEMU cannot scale above 32 cores. A set of performance evaluations against QEMU indicates that COREMU has negligible uniprocessor emulation overhead and performs and scales significantly better than QEMU. We also show how COREMU could be used to diagnose performance problems and concurrency bugs in both OS kernels and parallel applications.

Keywords: Full-system Emulator; Parallel Emulator; Multicore; Design; Experimentation; Performance

Lifeline-based Global Load Balancing

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Saraswat, Vijay; Kambadur, Prabhanjan; Kodali, Sreedhar; Grove, David; Krishnamoorthy, Sriram

Affiliation: Pacific NW Natl Lab Richland WA 99352 USA

ISBN: (print) 9781450301190

On shared-memory systems, Cilk-style work-stealing [5] has been used to effectively parallelize irregular task-graph based applications such as Unbalanced Tree Search (UTS) [24, 28]. There are two main difficulties in extending this approach to distributed memory. In the shared memory approach, thieves (nodes without work) constantly attempt to asynchronously steal work from randomly chosen victims until they find work. In distributed memory, thieves cannot autonomously steal work from a victim without disrupting its execution. When work is sparse, this results in performance degradation. In essence, a direct extension of traditional work-stealing to distributed memory violates the work-first principle underlying work-stealing. Further, thieves spend useless CPU cycles attacking victims that have no work, resulting in system inefficiencies in multi-programmed contexts. Second, it is non-trivial to detect active distributed termination (detect that programs at all nodes are looking for work, hence there is no work). This problem is well-studied and requires careful design for good performance. Unfortunately, in most existing languages/frameworks, application developers are forced to implement their own distributed termination detection. In this paper, we develop a simple set of ideas that allow work-stealing to be efficiently extended to distributed memory. First, we introduce lifeline graphs: low-degree, low-diameter, fully-connected directed graphs. Such graphs can be constructed from k-dimensional hypercubes. When a node is unable to find work after w unsuccessful steals, it quiesces after informing the outgoing edges in its lifeline graph. Quiescent nodes do not disturb other nodes. A quiesced node is reactivated when work arrives from a lifeline, and itself shares this work with those of its incoming lifelines that are activated. Termination occurs precisely when computation at all nodes has quiesced. In a language such as X10, such passive distributed terminati

Keywords: UTS; global load balancing; distributed computing; X10; work-stealing; parallel programming; Algorithms; Design
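
Illustrative note (not part of the catalog record): the abstract above says lifeline graphs can be built from k-dimensional hypercubes and should be low-degree and low-diameter. The sketch below assumes the simplest case, a binary hypercube with 2**k nodes in which each node's outgoing lifelines are the nodes whose index differs in exactly one bit. The function names `lifelines` and `diameter` are hypothetical, and the paper's actual construction may differ.

```python
from collections import deque


def lifelines(node: int, k: int) -> list[int]:
    """Outgoing lifelines of `node` in a binary k-dimensional hypercube.

    Assumption: nodes are numbered 0 .. 2**k - 1 and a lifeline links two
    nodes whose indices differ in exactly one bit, so each node has degree k.
    """
    return [node ^ (1 << d) for d in range(k)]


def diameter(k: int) -> int:
    """Longest shortest-path length in the graph, via BFS from every node."""
    n = 1 << k
    worst = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in lifelines(u, k):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Every node must be reachable for work to propagate everywhere.
        assert len(dist) == n
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    k = 3
    print(lifelines(0, k))  # [1, 2, 4]
    print(diameter(k))      # 3
```

With 2**k nodes, each node keeps only k lifelines and any node is at most k hops from any other, which is the low-degree, low-diameter property the abstract asks for.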

Intra-Application Shared Cache Partitioning For Multithreaded Applications

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Muralidhara, Sai Prashanth; Kandemir, Mahmut; Raghavan, Padma

Affiliation: Penn State Univ University Pk PA 16802 USA

ISBN: (print) 9781605587080

In this paper, we address the problem of partitioning a shared cache when the executing threads belong to the same application.

Keywords: Cache; Multicore; Parallel Applications

Applying the Concurrent Collections Programming Model to Asynchronous Parallel Dense Linear Algebra

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Chandramowlishwaran, Aparna; Knobe, Kathleen; Vuduc, Richard

Affiliations: Georgia Inst Technol Atlanta GA 30332 USA; Intel Corp Santa Clara CA 95051 USA

ISBN: (print) 9781605587080

This poster is a case study on the application of a novel programming model, called Concurrent Collections (CnC), to the implementation of an asynchronous-parallel algorithm for computing the Cholesky factorization of dense matrices. In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints. We demonstrate the performance potential of CnC in this poster by showing that our Cholesky implementation nearly matches or exceeds competing vendor-tuned codes and alternative programming models. We conclude that the CnC model is well-suited for expressing asynchronous-parallel algorithms on emerging multicore systems.

Keywords: Algorithms; Performance

The Pilot Library for Novice MPI Programmers

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Carter, John D.; Gardner, William B.; Grewal, Gary

Affiliation: Univ Guelph Sch Comp Sci Guelph ON N1G 2W1 Canada

ISBN: (print) 9781605587080

The Pilot library is a new method for programming MPI-enabled clusters in C, targeted at novice parallel programmers. Formal elements from Communicating Sequential Processes (CSP) are used to realize a process/channel model of parallel computation that reduces opportunities for deadlock and other communication errors. This simple model, plus an application programming interface (API) styled after C's formatted I/O, are designed to make the library easy to learn. The Pilot library exists as a thin layer on top of any standard Message Passing Interface (MPI) implementation, preserving MPI's portability and efficiency, with little performance overhead arising as a result of Pilot's additional features.

Keywords: Design; Languages; high-performance computing; cluster programming; C; MPI; collective operations; deadlock detection

Structure-driven Optimizations for Amorphous Data-parallel Programs

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Mendez-Lojo, Mario; Nguyen, Donald; Prountzos, Dimitrios; Sui, Xin; Hassaan, M. Amber; Kulkarni, Milind; Burtscher, Martin; Pingali, Keshav

Affiliation: Univ Texas Austin Inst Computat Engn & Sci Austin TX 78712 USA

ISBN: (print) 9781605587080

Irregular algorithms are organized around pointer-based data structures such as graphs and trees, and they are ubiquitous in applications. Recent work by the Galois project has provided a systematic approach for parallelizing irregular applications based on the idea of optimistic or speculative execution of programs. However, the overhead of optimistic parallel execution can be substantial. In this paper, we show that many irregular algorithms have structure that can be exploited and present three key optimizations that take advantage of algorithmic structure to reduce speculative overheads. We describe the implementation of these optimizations in the Galois system and present experimental results to demonstrate their benefits. To the best of our knowledge, this is the first system to exploit algorithmic structure to optimize the execution of irregular programs.

Keywords: Amorphous Data-parallelism; Irregular Programs; Optimistic Parallelization; Synchronization Overheads; Cautious Operator Implementations; One-shot Optimization; Iteration Coalescing

Load Balancing on Speed

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Hofmeyr, Steven; Iancu, Costin; Blagojevic, Filip

Affiliation: Lawrence Berkeley Natl Lab Berkeley CA USA

ISBN: (print) 9781605587080

To fully exploit multicore processors, applications are expected to provide a large degree of thread-level parallelism. While adequate for low core counts and their typical workloads, the current load balancing support in operating systems may not be able to achieve efficient hardware utilization for parallel workloads. Balancing run queue length globally ignores the needs of parallel applications where threads are required to make equal progress. In this paper we present a load balancing technique designed specifically for parallel applications running on multicore systems. Instead of balancing run queue length, our algorithm balances the time a thread has executed on "faster" and "slower" cores. We provide a user-level implementation of speed balancing on UMA and NUMA multi-socket architectures running Linux and discuss behavior across a variety of workloads, usage scenarios and programming models. Our results indicate that speed balancing, when compared to the native Linux load balancing, improves performance and provides good performance isolation in all cases considered. Speed balancing is also able to provide comparable or better performance than DWRR, a fair multi-processor scheduling implementation inside the Linux kernel. Furthermore, parallel application performance is often determined by the implementation of synchronization operations, and speed balancing alleviates the need for tuning the implementations of such primitives.

Keywords: Experimentation; Theory; Performance; Measurement; Languages; Design; Parallel Programming; Operating System; Load Balancing; Speed Balancing; Multicore; Multisocket

Data Transformations Enabling Loop Vectorization on Multithreaded Data Parallel Architectures

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Jang, Byunghyun; Mistry, Perhaad; Schaa, Dana; Dominguez, Rodrigo; Kaeli, David

Affiliation: Northeastern Univ Dept ECE Boston MA 02115 USA

ISBN: (print) 9781605587080

Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in the data stream. This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures. We present a mathematical model that captures loop-based memory access patterns and computes the most appropriate data transformations in order to enable vectorization. Our experimental results show that the proposed data transformations can significantly increase the number of loops that can be vectorized and enhance the data-level parallelism of applications. Our results also show that the overhead associated with our data transformations can be easily amortized as the size of the input data set increases. For the set of high performance benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4X) by applying vectorization using our data transformation approach.

Keywords: Loop Vectorization; Data Transformation; GPGPU

Effective Communication and Computation Overlap with Hybrid MPI/SMPSs

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Marjanovic, Vladimir; Labarta, Jesus; Ayguade, Eduard; Valero, Mateo

Affiliations: Barcelona Supercomp Ctr BSC CNS Dept Comp Sci Barcelona Spain; Tech Univ Catalunya UPC Comp Architecture Dept Barcelona Spain

ISBN: (print) 9781605587080

Communication overhead is one of the dominant factors affecting performance in high-performance computing systems. To reduce the negative impact of communication, programmers overlap communication and computation by using asynchronous communication primitives. This increases code complexity, requiring more development effort and making programs less readable. This paper presents the hybrid use of MPI and SMPSs (SMP superscalar, a task-based shared-memory programming model) that allows the programmer to easily introduce the asynchrony necessary to overlap communication and computation. We demonstrate the hybrid use of MPI/SMPSs with the high-performance LINPACK benchmark (HPL), and compare it to the pure MPI implementation, which uses the look-ahead technique to overlap communication and computation. The hybrid MPI/SMPSs version significantly improves the performance of the pure MPI version, getting close to the asymptotic performance at medium problem sizes and still getting significant benefits at small/large problem sizes.

Keywords: Algorithms; Performance; Languages; parallel programming model; MPI; hybrid MPI/SMPSs; LINPACK

Is Transactional Programming Actually Easier?

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Rossbach, Christopher J.; Hofmann, Owen S.; Witchel, Emmett

Affiliation: Univ Texas Austin Austin TX 78712 USA

ISBN: (print) 9781605587080

Chip multi-processors (CMPs) have become ubiquitous, while tools that ease concurrent programming have not. The promise of increased performance for all applications through ever more parallel hardware requires good tools for concurrent programming, especially for average programmers. Transactional memory (TM) has enjoyed recent interest as a tool that can help programmers program concurrently. The transactional memory (TM) research community is heavily invested in the claim that programming with transactional memory is easier than alternatives (like locks), but evidence for or against the veracity of this claim is scant. In this paper, we describe a user study in which 237 undergraduate students in an operating systems course implement the same programs using coarse and fine-grain locks, monitors, and transactions. We surveyed the students after the assignment, and examined their code to determine the types and frequency of programming errors for each synchronization technique. Inexperienced programmers found baroque syntax a barrier to entry for transactional programming. On average, subjective evaluation showed that students found transactions harder to use than coarse-grain locks, but slightly easier to use than fine-grained locks. Detailed examination of synchronization errors in the students' code tells a rather different story. Overwhelmingly, the number and types of programming errors the students made were much lower for transactions than for locks. On a similar programming problem, over 70% of students made errors with fine-grained locking, while less than 10% made errors with transactions.

Keywords: Design; Performance; Transactional Memory; Optimistic Concurrency; Synchronization
