The current ubiquity of multi-core processors has brought renewed interest in program parallelization. Logic programs allow studying the parallelization of programs with complex, dynamic data structures with (declarative) pointers in a comparatively simple semantic setting. In this context, automatic parallelizers which exploit and-parallelism rely on notions of independence in order to ensure certain efficiency properties. "Non-strict" independence is a more relaxed notion than the traditional notion of "strict" independence which still ensures the relevant efficiency properties and can allow considerably more parallelism. Non-strict independence cannot be determined solely at run-time ("a priori") and thus global analysis is a requirement. However, extracting non-strict independence information from available analyses and domains is non-trivial. This paper provides, on one hand, an extended presentation of our classic techniques for compile-time detection of non-strict independence based on extracting information from (abstract interpretation-based) analyses using the now well-understood and popular Sharing + Freeness domain. This includes algorithms for combined compile-time/run-time detection which involve special run-time checks for this type of parallelism. In addition, we propose herein novel annotation (parallelization) algorithms, URLP and CRLP, which are specially suited to non-strict independence. We also propose new ways of using the Sharing + Freeness information to optimize how the run-time environments of goals are kept apart during parallel execution. Finally, we also describe the implementation of these techniques in our parallelizing compiler and recall some early performance results. We provide as well an extended description of our pictorial representation of sharing and freeness information. (C) 2009 Elsevier B.V. All rights reserved.
This paper describes a program auto-parallelizer that is based on the component approach to constructing optimizing compilers; the parallelizer is included in the technological chain of gcc. Details of using analytical and optimization components for constructing an auto-parallelizer, together with a parallelization algorithm using the OpenMP library, are considered. Finally, we discuss the auto-parallelizer's performance on a subset of problems from the Spec2006 and NAS Parallel Benchmarks suites.
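The core transformation such a parallelizer performs can be illustrated with a small, hand-written sketch (ours, not the tool's actual output): a loop whose iterations are proved free of cross-iteration dependences is annotated with an OpenMP pragma, which gcc lowers to threaded code when compiled with -fopenmp. The array sizes and loop body are illustrative.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

    /* Each iteration writes a distinct a[i] and reads only b[i], c[i],
       so the iterations are independent and safe to run in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf("a[42] = %.1f, max threads = %d\n", a[42], omp_get_max_threads());
    return 0;
}
```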
ISBN (print): 9780769534718
Multicore processors have been adopted for consumer electronics like portable electronics, mobile phones, car navigation systems, digital TVs and games to obtain high performance with low power consumption. The OSCAR automatic parallelizing compiler has been developed to utilize these multicores easily. Also, a new Consumer Electronics Multicore Application Program Interface (API) for using the OSCAR compiler with native sequential compilers for various kinds of multicores from different vendors has been developed in the NEDO (New Energy and Industrial Technology Development Organization) "Multicore Technology for Realtime Consumer Electronics" project with six Japanese IT companies. This paper evaluates the parallel processing performance of multimedia applications using this API by the OSCAR compiler on the FR1000 multicore processor with four VLIW cores, developed by Fujitsu Ltd., and the RP1 multicore processor with four SH-4A cores, jointly developed by Renesas Technology Corp., Hitachi Ltd. and Waseda University. As a result, the parallel code generated by the OSCAR compiler using the API achieves an average speedup of 3.27 on 4 cores against 1 core on the FR1000 multicore, and an average speedup of 3.31 on 4 cores against 1 core on the RP1 multicore.
Discovering the optimum number of processors and the distribution of data on distributed memory parallel computers for a given algorithm is a demanding task. A memetic algorithm (MA) is proposed here to find the best number of processors and the best data distribution method to be used for each stage of a parallel program. A steady-state memetic algorithm is compared with a transgenerational memetic algorithm using different crossover operators and hill-climbing methods. A self-adaptive MA is also implemented, based on a multimeme strategy. All the experiments are carried out on computationally intensive, communication intensive, and mixed problem instances. The MA performs successfully for the illustrative problem instances.
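A compact sketch of the steady-state memetic scheme described above, under heavy assumptions: the cost() model, the stage count, and all constants are made up for illustration (a real objective would come from profiling or a machine model), and the paper's actual operators may differ.

```c
#include <stdio.h>
#include <stdlib.h>

#define STAGES 4
#define POP 20
#define GENS 500
#define MAX_PROCS 64
#define DISTS 3            /* e.g. block, cyclic, block-cyclic */

typedef struct { int procs[STAGES]; int dist[STAGES]; double cost; } Ind;

/* Hypothetical cost model: compute time shrinks with processors,
   communication overhead grows with them. */
static double cost(const Ind *x) {
    double c = 0;
    for (int s = 0; s < STAGES; s++)
        c += 1000.0 / x->procs[s] + 5.0 * x->procs[s] * (x->dist[s] + 1);
    return c;
}

static void randomize(Ind *x) {
    for (int s = 0; s < STAGES; s++) {
        x->procs[s] = 1 + rand() % MAX_PROCS;
        x->dist[s]  = rand() % DISTS;
    }
    x->cost = cost(x);
}

/* Local search (the "memetic" step): hill-climb on processor counts. */
static void hill_climb(Ind *x) {
    for (int s = 0; s < STAGES; s++)
        for (int d = -1; d <= 1; d += 2) {
            Ind t = *x;
            t.procs[s] += d;
            if (t.procs[s] < 1 || t.procs[s] > MAX_PROCS) continue;
            t.cost = cost(&t);
            if (t.cost < x->cost) *x = t;
        }
}

int main(void) {
    Ind pop[POP];
    for (int i = 0; i < POP; i++) randomize(&pop[i]);
    for (int g = 0; g < GENS; g++) {
        /* steady state: pick two parents, one-point crossover,
           refine the child locally, replace the worst individual */
        const Ind *a = &pop[rand() % POP], *b = &pop[rand() % POP];
        Ind child; int cut = rand() % STAGES;
        for (int s = 0; s < STAGES; s++) {
            child.procs[s] = (s < cut ? a : b)->procs[s];
            child.dist[s]  = (s < cut ? a : b)->dist[s];
        }
        child.cost = cost(&child);
        hill_climb(&child);
        int worst = 0;
        for (int i = 1; i < POP; i++)
            if (pop[i].cost > pop[worst].cost) worst = i;
        if (child.cost < pop[worst].cost) pop[worst] = child;
    }
    int best = 0;
    for (int i = 1; i < POP; i++) if (pop[i].cost < pop[best].cost) best = i;
    printf("best cost %.1f\n", pop[best].cost);
    return 0;
}
```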
Optimizing compilers rely on program analysis techniques to detect data dependences between program statements. Data dependence testing is a basic step in detecting loop-level parallelism in numerical programs. Most studies indicate that data dependence tests cannot handle nonlinear-expression array subscripts. This study presents an exact dependence test that can handle quadratic-expression array subscripts precisely. The proposed method detects whether a quadratic equation is monotonically increasing or decreasing, and then reduces the integer solution interval of each variable by repeated projection. When the effective solution interval for any variable shrinks to empty, no integer solutions exist for this quadratic equation; otherwise, all integer solutions can be found, implying that the parallelism of the loop can be exploited. (C) 2007 Elsevier Inc. All rights reserved.
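The paper's own algorithm works by repeated interval projection; as a simplified illustration of the underlying monotonicity idea only, the sketch below (ours, with illustrative coefficients) decides whether a one-variable quadratic subscript equation has an integer solution within a loop's index range by splitting at the vertex and binary-searching each monotone half. No solution in range means the accesses cannot collide and the loop may be parallelized.

```c
#include <stdio.h>

static long f(long a, long b, long c, long x) { return a*x*x + b*x + c; }

/* Binary search for a zero on a range where f is non-decreasing. */
static int zero_inc(long a, long b, long c, long lo, long hi, long *out) {
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2, v = f(a, b, c, mid);
        if (v == 0) { *out = mid; return 1; }
        if (v < 0) lo = mid + 1; else hi = mid - 1;
    }
    return 0;
}

/* Binary search for a zero on a range where f is non-increasing. */
static int zero_dec(long a, long b, long c, long lo, long hi, long *out) {
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2, v = f(a, b, c, mid);
        if (v == 0) { *out = mid; return 1; }
        if (v > 0) lo = mid + 1; else hi = mid - 1;
    }
    return 0;
}

/* Does a*x^2 + b*x + c = 0 have an integer root with lo <= x <= hi? */
int has_integer_root(long a, long b, long c, long lo, long hi, long *out) {
    if (a == 0) {                               /* degenerate, linear case */
        if (b == 0) return c == 0 ? (*out = lo, 1) : 0;
        if (-c % b != 0) return 0;
        *out = -c / b;
        return lo <= *out && *out <= hi;
    }
    if (a < 0) { a = -a; b = -b; c = -c; }      /* open the parabola upward */
    long vx = -b / (2 * a);                     /* near the vertex: f is
                                                   decreasing before it,
                                                   increasing after it */
    if (lo <= vx && zero_dec(a, b, c, lo, vx < hi ? vx : hi, out)) return 1;
    long start = vx + 1 > lo ? vx + 1 : lo;
    return start <= hi && zero_inc(a, b, c, start, hi, out);
}

int main(void) {
    long x;
    /* subscript equation i^2 - 5i + 6 = 0 for 0 <= i <= 10: roots 2, 3 */
    printf("root? %d (x=%ld)\n", has_integer_root(1, -5, 6, 0, 10, &x), x);
    /* i^2 + 1 = 0 has no integer root: the loop can be parallelized */
    printf("root? %d\n", has_integer_root(1, 0, 1, 0, 10, &x));
    return 0;
}
```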
ISBN (print): 1424407281
Matlab is one of the most popular computer languages for technical and scientific programming. However, until recently, it has been limited to running on uniprocessors. One strategy for overcoming this limitation is to introduce global distributed arrays, with those arrays distributed across the processors of a parallel machine. In this paper, we describe the compilation technology we have designed for Matlab D, a distributed-array extension of Matlab. Our approach is distinguished by a two-phase compilation technology with support for a rich collection of data distributions. By precompiling array operations and communication steps into Fortran plus MPI, the time to compile an application using those operations is significantly reduced. This paper includes preliminary results that demonstrate that this approach can dramatically improve performance, scaling well to at least 32 processors.
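One concrete piece of machinery any distributed-array compiler must generate is ownership and index-translation arithmetic. The sketch below is our own illustration of a plain block distribution (one of many distributions such a system supports), not Matlab D's actual runtime:

```c
#include <stdio.h>

/* Owned global range [lo, hi) of an n-element array on rank p of np;
   the first n % np ranks each hold one extra element. A global index g
   owned by rank p maps to local index g - lo. */
static void block_range(int n, int np, int p, int *lo, int *hi) {
    int base = n / np, rem = n % np;
    *lo = p * base + (p < rem ? p : rem);
    *hi = *lo + base + (p < rem ? 1 : 0);
}

int main(void) {
    int n = 10, np = 4;
    for (int p = 0; p < np; p++) {
        int lo, hi;
        block_range(n, np, p, &lo, &hi);
        printf("rank %d owns global [%d,%d) -> local 0..%d\n",
               p, lo, hi, hi - lo - 1);
    }
    return 0;
}
```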
ISBN (print): 9780769530352
SIMD (Single Instruction Multiple Data) is a processor-architecture classification from Flynn's taxonomy. The concept is that a single instruction operates on multiple units of data simultaneously. Computers that use this processor architecture are known as array processors or vector processors. Most computers in use today are SISD (Single Instruction Single Data), though allowing a single instruction to operate on multiple data can also be applied to a virtual machine that is capable of parallel execution through the use of multi-threading/multi-core processors, or distributed parallel execution on a multi-computer grid. This paper proposes a language structure that applies the SIMD concept to the Java virtual machine. The motive is to reduce the complexity of the code and ease the implementation of parallelization by running a single set of instructions concurrently on an entire collection of objects.
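For comparison only (the paper's construct is a Java language structure, not shown here), the SIMD execution model being lifted to the JVM is the one a vectorizing C compiler already applies to element-wise loops: one operation spread uniformly across a whole collection.

```c
#include <stdio.h>

#define N 8

/* One operation applied uniformly to every element; a vectorizing
   compiler (e.g. gcc at -O3) executes several iterations at once
   using hardware SIMD registers. */
void scale_add(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    float x[N] = {1, 2, 3, 4, 5, 6, 7, 8}, y[N] = {0};
    scale_add(N, 2.0f, x, y);
    for (int i = 0; i < N; i++) printf("%.0f ", y[i]);
    printf("\n");
    return 0;
}
```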
Developing parallel software is far more complex than developing traditional sequential software. An effective approach to dealing with the complexity of parallel software is domain-specific programming at a higher level of abstraction than general-purpose programming languages. In this paper, we focus on the domain of applications based on partial differential equations (PDEs) and provide a formal framework and methods for PDE compilers to generate parallel iterative codes for the domain. We also provide a PDE compiler optimization that minimizes the number of messages between parallel processors. Our framework and methods can be used to build PDE compilers that automatically generate efficient parallel software for PDE-based applications.
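A hand-written sketch (not code generated by the framework) of the kind of target such a PDE compiler emits, and of the message-count concern it optimizes: a 1-D Jacobi iteration where each rank exchanges exactly one batched message per neighbor per sweep instead of one message per boundary value. Sizes, the stencil, and the boundary condition are illustrative.

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define NLOC 1000   /* interior points per rank (illustrative) */
#define ITERS 100

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    int left  = rank > 0      ? rank - 1 : MPI_PROC_NULL;
    int right = rank < np - 1 ? rank + 1 : MPI_PROC_NULL;

    static double u[NLOC + 2], v[NLOC + 2];   /* +2 ghost cells */
    if (rank == 0) u[0] = 1.0;                /* fixed boundary value */

    for (int it = 0; it < ITERS; it++) {
        /* halo exchange: exactly one message to each neighbor per sweep */
        MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  0,
                     &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOC],     1, MPI_DOUBLE, right, 1,
                     &u[0],        1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 1; i <= NLOC; i++)
            v[i] = 0.5 * (u[i - 1] + u[i + 1]);   /* Jacobi update */
        memcpy(&u[1], &v[1], NLOC * sizeof(double));  /* interior only */
    }
    if (rank == 0) printf("u[1] after %d sweeps: %g\n", ITERS, u[1]);
    MPI_Finalize();
    return 0;
}
```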
Optimizing compilers rely upon program analysis techniques to detect data dependences between program statements. Data dependence information captures the essential ordering constraints of the statements in a program that need to be preserved in order to produce valid optimized and parallel code. Data dependence testing is very important for automatic parallelization, vectorization, and any other code transformation. In this paper, we examine the impact of data dependence analysis in practice. A number of data dependence tests have been proposed in the literature. In each test, there are different trade-offs between accuracy and efficiency. We present an experimental evaluation of several data dependence tests, including the Banerjee test, the I-Test, and the Omega test. We compare these tests in terms of data dependence accuracy, compilation efficiency, effectiveness in parallelization, and program execution performance. We analyze the reasons why a data dependence test can be inexact and we explain how the examined tests handle such cases. We run various experiments using the Perfect Club Benchmarks and the scientific library Lapack. We present the measured accuracy of each test and the reasons for any approximation. We compare these tests in terms of efficiency and we analyze the trade-offs between accuracy and efficiency. We also determine the impact of each data dependence test on the total compilation time. Finally, we measure the number of loops parallelized by each test and we compare the execution performance of each benchmark on a multiprocessor. Our results indicate that the Omega test is more accurate, but also very inefficient in the cases where the other two tests are inaccurate. In general, the cost of the Omega test is high, consuming a significant percentage of the total compilation time. Furthermore, the additional accuracy of the Omega test over the Banerjee test and the I-Test does not improve parallelization or program execution performance.
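To make the dependence-equation setting concrete, here is a sketch of the simplest member of this family, the GCD test (a building block which the I-Test combines with Banerjee-style bounds; it is not one of the three tests the paper evaluates): the equation a*i - b*j = c2 - c1 arising from accesses A[a*i + c1] and A[b*j + c2] has an integer solution iff gcd(a, b) divides c2 - c1.

```c
#include <stdio.h>
#include <stdlib.h>

static long gcd(long x, long y) {
    while (y) { long t = x % y; x = y; y = t; }
    return labs(x);
}

/* May the write A[a*i + c1] and the read A[b*j + c2] touch the same cell?
   (Conservative: ignores loop bounds, which Banerjee-style tests add.) */
int gcd_test_may_depend(long a, long b, long c1, long c2) {
    long g = gcd(a, b);
    if (g == 0) return c1 == c2;          /* both subscripts constant */
    return (c2 - c1) % g == 0;            /* solvable => dependence possible */
}

int main(void) {
    /* for (i) { A[2*i] = ...; ... = A[2*i + 1]; }  -- even vs. odd cells */
    printf("%d\n", gcd_test_may_depend(2, 2, 0, 1));   /* 0: independent */
    printf("%d\n", gcd_test_may_depend(2, 2, 0, 4));   /* 1: may depend  */
    return 0;
}
```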
We describe and evaluate a novel approach for the automatic parallelization of programs that use pointer-based dynamic data structures, written in Java. The approach exploits parallelism among methods by creating an asynchronous thread of execution for each method invocation in a program. At compile time, methods are analyzed to determine the data they access, parameterized by their context. A description of these data accesses is transmitted to a run-time system during program execution. The run-time system utilizes this description to determine when a thread may execute, and to enforce dependences among threads. This run-time system is the main focus of this paper. More specifically, the paper details the representation of data accesses in a method and the framework used by the run-time system to detect and enforce dependences among threads. Experimental evaluation of an implementation of the run-time system on a four-processor Sun multiprocessor indicates that close to ideal speedup can be obtained for a number of benchmarks. This validates our approach.
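A sketch (our simplification in C, not the paper's Java runtime) of the core check such a run-time system performs: each method invocation carries a descriptor of the objects it reads and writes, and two invocations may execute concurrently only if their descriptors show no write/write or read/write overlap. The object ids and descriptor layout are assumptions for illustration.

```c
#include <stdio.h>

#define MAX_ACC 8

typedef struct {
    int reads[MAX_ACC];  int nreads;    /* ids of objects read    */
    int writes[MAX_ACC]; int nwrites;   /* ids of objects written */
} Access;

static int overlaps(const int *a, int na, const int *b, int nb) {
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (a[i] == b[j]) return 1;
    return 0;
}

/* Two invocations conflict on write/write or read/write overlap;
   if they do not conflict, their threads may run in parallel. */
int conflicts(const Access *x, const Access *y) {
    return overlaps(x->writes, x->nwrites, y->writes, y->nwrites)
        || overlaps(x->writes, x->nwrites, y->reads,  y->nreads)
        || overlaps(x->reads,  x->nreads,  y->writes, y->nwrites);
}

int main(void) {
    Access m1 = { .reads = {1}, .nreads = 1, .writes = {2}, .nwrites = 1 };
    Access m2 = { .reads = {3}, .nreads = 1, .writes = {4}, .nwrites = 1 };
    Access m3 = { .reads = {2}, .nreads = 1, .writes = {5}, .nwrites = 1 };
    printf("m1,m2 conflict? %d\n", conflicts(&m1, &m2));  /* 0: parallel ok */
    printf("m1,m3 conflict? %d\n", conflicts(&m1, &m3));  /* 1: must order */
    return 0;
}
```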