ISBN (print): 0769523129
Breakthrough-quality scientific discoveries in the new millennium (such as those expected in computational biology and elsewhere), along with optimal engineering designs, have created a demand for High-End Computing (HEC) systems with sustained performance requirements at the petaflop scale and beyond. Despite the very pessimistic (if not outright negative) views of parallel computing systems that prevailed in the 1990s, there appears to be no viable alternative for such HEC systems. In this talk, we present a fresh look at the problems facing the design of petascale parallel computing systems. We review several fundamental issues that such HEC parallel computing systems must resolve, including execution models that support dynamic and adaptive multithreading, fine-grain synchronization, and a global name space with memory consistency. Related issues in parallel programming, dynamic compilation models, and system software design will also be discussed. Present solutions and future directions will be discussed based on (1) application demand (e.g. computational biology and others), (2) the recent trend demonstrated by the HTMT, HPCS, and Blue-Gene Cyclops (e.g. Cyclops-64) architectures, and (3) a historical perspective on influential models such as dataflow, along with concepts learned from these models.
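To make the fine-grain synchronization issue concrete, the following is a minimal, illustrative C11 sketch of a full/empty flag attached to a single data word, in the spirit of dataflow-inspired designs such as HTMT and Cyclops. It is not any specific machine's primitive; the type and function names are assumptions for illustration.

/* Fine-grain synchronization on a single word: a consumer spins
 * until the producer marks the word full. Illustrative sketch only. */
#include <stdatomic.h>

typedef struct {
    double     value;
    atomic_int full;   /* 0 = empty, 1 = full */
} sync_word;

/* Producer: write the datum, then publish it with release ordering. */
void sync_write(sync_word *w, double v) {
    w->value = v;
    atomic_store_explicit(&w->full, 1, memory_order_release);
}

/* Consumer: spin until the word is full, then read with acquire
 * ordering so the datum is guaranteed visible. */
double sync_read(sync_word *w) {
    while (!atomic_load_explicit(&w->full, memory_order_acquire))
        ;  /* fine-grain wait on this single word */
    return w->value;
}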
ISBN (print): 9781467309745
The recent evolution of many-core architectures has produced chips where the number of processor elements (PEs) is in the hundreds and continues to grow. In addition, many-core processors are increasingly characterized by the diversity of their resources and by the way sharing of those resources is arbitrated. On such machines, task scheduling is of paramount importance for orchestrating a satisfactory distribution of tasks with efficient utilization of resources, especially when fine-grain parallelism is desired or required. In the past, the primary focus of scheduling techniques has been on achieving load balance and reducing overhead with the aim of increasing total performance. This focus has resulted in a scheduling paradigm where Static Scheduling (SS) is preferred to Dynamic Scheduling (DS) for highly regular and embarrassingly parallel applications running on homogeneous architectures. We have revisited the task scheduling problem for these types of applications under the scenario imposed by many-core architectures to investigate whether there exist situations where DS is better than SS. Our main contribution is the observation that, for highly regular and embarrassingly parallel applications, DS is preferable to SS in some situations commonly found on many-core architectures. We present experimental evidence showing how the performance of SS degrades in the new environment of many-core chips. We analyze three reasons that contribute to the superiority of DS over SS on many-core architectures under these conditions: 1) a uniform mapping of work to processors that ignores task granularity is not necessarily scalable under limited amounts of work; 2) shared resources (e.g. the crossbar switch) produce unexpected and stochastic variations in task duration that SS is unable to manage properly; and 3) hardware features, such as in-memory atomic operations, greatly contribute to decreasing the overhead of dynamic scheduling.
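As an illustration of the contrast the abstract draws, here is a minimal sketch (not the authors' code) of the two scheduling styles for a regular, embarrassingly parallel loop, using POSIX threads and C11 atomics; the names, thread count, and chunk size are illustrative assumptions. Under dynamic scheduling, a thread slowed by resource contention simply claims fewer chunks, whereas a static slice forces every thread to finish its fixed share regardless of stochastic task-duration variations.

#include <pthread.h>
#include <stdatomic.h>

#define N       (1 << 20)
#define THREADS 8
#define CHUNK   1024            /* iterations claimed per atomic operation */

static double data[N];
static atomic_long next_chunk;  /* shared work pointer for the dynamic case */

static void process(long i) { data[i] = data[i] * 2.0 + 1.0; }

/* Static Scheduling (SS): each thread gets one fixed, uniform slice. */
static void *run_static(void *arg) {
    long id = (long)arg, per = N / THREADS;
    for (long i = id * per; i < (id + 1) * per; i++)
        process(i);
    return NULL;
}

/* Dynamic Scheduling (DS): threads claim chunks with an atomic
 * fetch-and-add until the work runs out. */
static void *run_dynamic(void *arg) {
    (void)arg;
    for (;;) {
        long start = atomic_fetch_add(&next_chunk, CHUNK);
        if (start >= N) break;
        long end = start + CHUNK < N ? start + CHUNK : N;
        for (long i = start; i < end; i++)
            process(i);
    }
    return NULL;
}

int main(void) {
    pthread_t t[THREADS];
    for (long i = 0; i < THREADS; i++)   /* SS run */
        pthread_create(&t[i], NULL, run_static, (void *)i);
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);

    atomic_store(&next_chunk, 0);        /* DS run over the same work */
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, run_dynamic, NULL);
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    return 0;
}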
We present an automatic approach for prefetching data for linked-list data structures. The main idea is based on the observation that linked-list elements are frequently allocated at a constant distance from one another...
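Although only this preview of the abstract is available, the idea it describes can be sketched: if consecutive nodes tend to sit at a constant distance in memory, a traversal can prefetch the address that the observed stride predicts several nodes ahead, hiding memory latency behind the pointer chase. The sketch below is a hypothetical C rendering using the GCC/Clang __builtin_prefetch intrinsic; it is not the paper's implementation, and PREFETCH_AHEAD and the structure names are illustrative assumptions.

#include <stddef.h>

struct node { int payload; struct node *next; };

#define PREFETCH_AHEAD 4   /* run this many predicted nodes ahead */

void traverse_with_prefetch(struct node *head) {
    ptrdiff_t stride = 0;
    for (struct node *n = head; n != NULL; n = n->next) {
        if (n->next)   /* track the most recently observed allocation stride */
            stride = (char *)n->next - (char *)n;
        if (stride)    /* prefetch where the stride predicts a future node */
            __builtin_prefetch((char *)n + stride * PREFETCH_AHEAD, 0, 1);
        /* ... consume n->payload here ... */
    }
}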
Optimization of parallel applications on new many-core architectures is challenging even for regular applications. Successful strategies inherited from previous generations of parallel or serial architectures yield only incremental performance gains, and further optimization and tuning are required. We argue that conservative static optimizations are not the best fit for modern many-core architectures. The limited advantage of static techniques stems from the new scenario present in many-cores: plenty of thread units sharing several resources under different coordination mechanisms. We point out that scheduling and data movement across the memory hierarchy are extremely important to application performance. In particular, we found that the scheduling of data-movement operations significantly impacts performance. To overcome these difficulties, we took advantage of the fine-grain synchronization primitives of many-cores to define percolation operations that schedule data movement properly. In addition, we fused percolation operations with dynamic scheduling into a dynamic percolation approach. We used dense matrix multiplication on a modern many-core to illustrate how the proposed techniques increase performance in these new environments. In our study on the IBM Cyclops-64, we raised performance from 44 GFLOPS (out of a possible 80 GFLOPS) to 70.0 GFLOPS with operands in on-chip memory and 65.6 GFLOPS with operands in off-chip memory. The success of our approach also resulted in excellent power efficiency: 1.09 GFLOPS/Watt and 993 MFLOPS/Watt when the input data resided in on-chip and off-chip memory, respectively.
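A minimal sketch of the dynamic percolation idea follows, assuming an ordinary shared-memory machine with POSIX threads as a stand-in for Cyclops-64: each thread dynamically claims a result tile via an atomic fetch-and-add, percolates the operand tiles into a fast local buffer (standing in for on-chip memory), computes entirely out of that buffer, and writes the tile back. The tile size, thread count, and all names are illustrative assumptions, not the authors' code.

#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

#define N        256   /* matrix dimension */
#define T        32    /* tile edge; N must be divisible by T */
#define TILES    ((N / T) * (N / T))
#define NTHREADS 4

static double A[N][N], B[N][N], C[N][N];
static atomic_int next_tile;   /* dynamically scheduled tile counter */

static void *worker(void *arg) {
    (void)arg;
    /* per-thread staging buffers: the "on-chip" scratch area */
    static _Thread_local double a[T][T], b[T][T], c[T][T];
    for (;;) {
        int t = atomic_fetch_add(&next_tile, 1);   /* claim a tile of C */
        if (t >= TILES) break;
        int ti = (t / (N / T)) * T, tj = (t % (N / T)) * T;
        memset(c, 0, sizeof c);
        for (int k = 0; k < N; k += T) {
            /* percolation: stage both operand tiles before computing */
            for (int i = 0; i < T; i++) {
                memcpy(a[i], &A[ti + i][k], sizeof a[i]);
                memcpy(b[i], &B[k + i][tj], sizeof b[i]);
            }
            /* compute entirely out of the staged buffers */
            for (int i = 0; i < T; i++)
                for (int kk = 0; kk < T; kk++)
                    for (int j = 0; j < T; j++)
                        c[i][j] += a[i][kk] * b[kk][j];
        }
        for (int i = 0; i < T; i++)   /* write the finished tile back */
            memcpy(&C[ti + i][tj], c[i], sizeof c[i]);
    }
    return NULL;
}

int main(void) {
    pthread_t th[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);
    return 0;
}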