We present a classification scheme for array language primitives that quantifies the variation in parallelism and data locality that results from the fusion of any two primitives. We also present an algorithm based on this scheme that efficiently determines when it is beneficial to fuse any two primitives. Experimental results show that five LINPACK routines achieve a 50% performance improvement from the fusion of array operators.
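As a hedged illustration (not taken from the paper), the following C sketch shows what fusing two array primitives means at the loop level: the unfused version makes two passes over memory through a temporary array, while the fused version makes a single pass with better data locality and the same degree of data parallelism.

    #include <stddef.h>

    /* Unfused: two array primitives, two passes over memory,
     * with a temporary array between them. */
    void scale_then_add_unfused(const double *a, const double *b,
                                double *tmp, double *out, size_t n) {
        for (size_t i = 0; i < n; i++)      /* primitive 1: scale */
            tmp[i] = 2.0 * a[i];
        for (size_t i = 0; i < n; i++)      /* primitive 2: elementwise add */
            out[i] = tmp[i] + b[i];
    }

    /* Fused: one pass, no temporary, improved locality, unchanged parallelism. */
    void scale_then_add_fused(const double *a, const double *b,
                              double *out, size_t n) {
        for (size_t i = 0; i < n; i++)
            out[i] = 2.0 * a[i] + b[i];
    }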
This paper presents the design and implementation of a parallelization framework and OpenMP runtime support in Intel (R) C++ & Fortran compilers for exploiting nested parallelism in applications using OpenMP pragmas or directives. We conduct the performance evaluation of two multimedia applications parallelized with OpenMP pragmas and compiled with the Intel C++ compiler on Hyper-Threading Technology (HT) enabled multiprocessor systems. The performance results show that the multithreaded code generated by the Intel compiler achieved a speedup of up to 4.69 on 4 processors with HT enabled for five different input video sequences for the H.264 encoder workload, and a 1.28 speedup on an HT-enabled single-CPU system and a 1.99 speedup on an HT-enabled dual-CPU system for the audio visual speech recognition workload. The performance gain due to exploiting nested parallelism for leveraging Hyper-Threading Technology is up to 70% for the two multimedia workloads under different multiprocessor system configurations. These results demonstrate that hyper-threading benefits can be achieved by exploiting nested parallelism through Intel compiler and runtime system support for OpenMP programs. (c) 2005 Elsevier B.V. All rights reserved.
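A minimal, generic C/OpenMP sketch of nested parallelism is given below; it is not the Intel framework itself, and the thread counts and printed labels are illustrative only (e.g., an outer team per video slice and an inner team per macroblock row).

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        omp_set_nested(1);              /* enable nested parallel regions */
        omp_set_max_active_levels(2);   /* allow two levels of parallelism */

        #pragma omp parallel num_threads(2)        /* outer level */
        {
            int outer = omp_get_thread_num();
            #pragma omp parallel num_threads(2)    /* inner level */
            {
                int inner = omp_get_thread_num();
                printf("outer %d, inner %d\n", outer, inner);
            }
        }
        return 0;
    }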
Barrier MIMDs are asynchronous multiple instruction stream, multiple data stream architectures capable of parallel execution of variable execution time instructions and arbitrary control flow (e.g., while loops and calls); however, they differ from conventional MIMDs in that the need for run-time synchronization is significantly reduced. This work considers the problem of scheduling nested loop structures on a barrier MIMD. The basic approach employs loop coalescing, a technique for transforming a multiply-nested loop into a single loop. Loop coalescing is extended to nested triangular loops, in which inner loop bounds are functions of outer loop indices. In addition, a more efficient scheme to generate the original loop indices from the coalesced index is proposed for the case of constant loop bounds. These results are general, and can be applied to extend previous work using loop coalescing techniques. We concentrate on using loop coalescing for scheduling barrier MIMDs, and show how previous work in loop transformations and linear scheduling theory can be applied to this problem.
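The following C sketch (an illustration of the constant-bound case, not code from the paper) shows the basic coalescing transformation and how the original loop indices are recovered from the coalesced index by division and remainder.

    #define N 4
    #define M 3

    /* Original doubly nested loop over constant bounds N x M. */
    void nested(int a[N][M]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < M; j++)
                a[i][j] = i + j;
    }

    /* Coalesced form: a single loop of N*M iterations; the original
     * indices (i, j) are recovered from the coalesced index k. */
    void coalesced(int a[N][M]) {
        for (int k = 0; k < N * M; k++) {
            int i = k / M;
            int j = k % M;
            a[i][j] = i + j;
        }
    }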
ISBN (Print): 9781538674666
Programming language constructs generally operate on data words, and so does most compiler analysis and transformation. However, individual word-level operations often harbor pointless, yet resource and power hungry, lower-level operations. By transforming complete programs into gate-level operations on individual bits, and optimizing operations at that level, it is possible to dramatically reduce the total amount of work needed to execute the program's algorithm. This gate-level representation can be in terms of any complete set of logic gate types; earlier work targeted conventional multiplexor gates, but the work reported here centers on targeting CSWAP (Fredkin) gates without fanout - a form that can be implemented on a quantum computer. This paper will overview the approach, describe the current state of the prototype compiler, and suggest some ways in which compiler automatic parallelization technology might be extended to allow ordinary programs to take advantage of the unique properties of quantum computers.
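To make the gate concrete, here is a small classical C model of the Fredkin (CSWAP) gate, included as an illustrative assumption rather than anything from the prototype compiler: when the control bit is set, the two target bits are swapped; otherwise they pass through unchanged. The gate is reversible and conserves the number of 1 bits, which is what makes it a suitable primitive for the gate-level form described above.

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { bool c, x, y; } cswap_t;   /* control and two targets */

    /* Controlled swap: if c is 1, exchange x and y; otherwise pass through. */
    static cswap_t cswap(cswap_t in) {
        cswap_t out = in;
        if (in.c) { out.x = in.y; out.y = in.x; }
        return out;
    }

    int main(void) {
        cswap_t r = cswap((cswap_t){ .c = true, .x = false, .y = true });
        printf("c=%d x=%d y=%d\n", r.c, r.x, r.y);   /* prints c=1 x=1 y=0 */
        return 0;
    }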
ISBN (Print): 9780769504254
Automatic parallelization of general-purpose programs is still not possible in general in the presence of irregular data structures and complex control flow. One promising strategy is thread-level data speculation (TLDS). Although TLDS removes the need to prove computations independent statically, studies show that applying TLDS blindly to programs with limited speculative parallelism may lead to performance degradation. Therefore, a promising approach is to combine TLDS with strong compiler analyses. The compiler can provide a guideline for where to speculate by "lazily" detecting some dependences and leaving the more dynamic dependences to be detected at runtime. Furthermore, transformations can be applied to eliminate some of the dependences detected by the compiler to enhance speculative parallelism in the program. This paper proposes compiler techniques to implement this approach. In particular, we focus on general-purpose Java programs that make extensive use of containers, i.e., general-purpose aggregate data structures.
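As a hedged C illustration of the kind of loop TLDS targets (the paper itself considers Java programs with containers), the sketch below shows a loop whose cross-iteration dependences exist only for some runtime contents of an index array, so the compiler cannot prove them absent statically; under TLDS each iteration would run speculatively and the runtime would squash and re-execute iterations whose accesses actually conflict.

    #include <stddef.h>

    /* Whether iterations conflict depends on the runtime contents of idx[]:
     * a[idx[i]] may or may not alias a[idx[j]] for i != j, so independence
     * cannot be proven statically. This is the pattern speculation exploits. */
    void update(double *a, const size_t *idx, const double *delta, size_t n) {
        for (size_t i = 0; i < n; i++) {
            a[idx[i]] += delta[i];
        }
    }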