Search results - Inner Mongolia University Library

Efficient run-time parallelization for DO loops

JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 1998, vol. 14, no. 1, pp. 237-253

Authors: Yang, CT; Tseng, SS; Hsieh, MH; Kao, SH. Natl Space Program Off, ROCSAT Ground Segment, Hsinchu 300, Taiwan; Natl Tsing Hua Univ, Dept Informat & Comp Sci, Hsinchu 300, Taiwan

A run-time technique based on the inspector-executor scheme is proposed in this paper to find available parallelism in loops. Our inspector can determine the wavefronts by building a DEF-USE table for each loop of a program. Additionally, the process the inspector uses to find the wavefronts can be parallelized fully without any synchronization. Our executor executes loop iterations concurrently. For each wavefront, the auto-adapted function is used to get a tailored thread number instead of using a fixed number of threads for execution. Experimental results show that our new parallel inspector can handle complex data dependency patterns and significantly reduce the execution time.

Keywords: run-time loop parallelization inspector executor parallelizing compiler multiprocessor systems
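The wavefront idea in the abstract above can be illustrated with a minimal sketch. It assumes a loop of the form `A[writes[i]] = f(A[reads[i]])` and uses a simple sequential inspector; it is not the paper's parallel inspector, DEF-USE table layout, or auto-adapted thread function. The names `inspect_wavefronts` and `group_by_wavefront` are hypothetical.

```python
from collections import defaultdict


def inspect_wavefronts(reads, writes):
    """Assign a wavefront level to every iteration of a loop shaped like
    A[writes[i]] = f(A[reads[i]]). Sequential sketch, not the paper's method."""
    last_write = defaultdict(lambda: -1)  # level of the latest write per element
    last_read = defaultdict(lambda: -1)   # highest level that read each element
    levels = []
    for r, w in zip(reads, writes):
        # Flow dependence: the element read was written earlier.
        # Output dependence: the element written was written earlier.
        # Anti dependence: the element written was read earlier.
        level = 1 + max(last_write[r], last_write[w], last_read[w])
        levels.append(level)
        last_read[r] = max(last_read[r], level)
        last_write[w] = level
    return levels


def group_by_wavefront(levels):
    """Group iteration indices by level; each group can run concurrently."""
    fronts = defaultdict(list)
    for i, level in enumerate(levels):
        fronts[level].append(i)
    return [fronts[k] for k in sorted(fronts)]


if __name__ == "__main__":
    levels = inspect_wavefronts([0, 1, 2, 3], [1, 2, 5, 6])
    print(group_by_wavefront(levels))
```

An executor would then process the groups in order, running the iterations inside one group concurrently and waiting for the group to finish before starting the next.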

High performance Fortran compilation techniques for parallelizing scientific codes 98

Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Authors: Vikram Adve; Guohua Jin; John Mellor-Crummey; Qing Yi. Rice University, Houston, TX

ISBN: (print) 9780897919845

With current compilers for High Performance Fortran (HPF), substantial restructuring and hand-optimization may be required to obtain acceptable performance from an HPF port of an existing Fortran application. A key goal of the Rice dHPF compiler project is to develop optimization techniques that can provide consistently high performance for a broad spectrum of scientific applications with minimal restructuring of existing Fortran 77 or Fortran 90 applications. This paper presents four new optimization techniques we developed to support efficient parallelization of codes with minimal restructuring. These optimizations include computation partition selection for loop nests that use privatizable arrays, along with partial replication of boundary computations to reduce communication overhead; communication-sensitive loop distribution to eliminate inner-loop communications; interprocedural selection of computation partitions; and data availability analysis to eliminate redundant communications. We studied the effectiveness of the dHPF compiler, which incorporates these optimizations, in parallelizing serial versions of the NAS SP and BT application benchmarks. We present experimental results comparing the performance of hand-written MPI code for the benchmarks against code generated from HPF using the dHPF compiler and the Portland Group's pghpf compiler. Using the compilation techniques described in this paper we achieve performance within 15% of hand-written MPI code on 25 processors for BT and within 33% for SP. Furthermore, these results are obtained with HPF versions of the benchmarks that were created with minimal restructuring of the serial code (modifying only approximately 5% of the code).

Keywords: HPF parallelizing compiler NAS benchmarks

Using knowledge-based techniques on loop parallelization for parallelizing compilers

PARALLEL COMPUTING, 1997, vol. 23, no. 3, pp. 291-309

Authors: Yang, CT; Tseng, SS; Chuang, CD; Shih, WC. NATL CHIAO TUNG UNIV, DEPT COMP & INFORMAT SCI, HSINCHU 300, TAIWAN

In this paper we propose a knowledge-based approach for solving data dependence testing and loop scheduling problems. A rule-based system, called the K-Test, is developed using a repertory grid and an attribute ordering table to construct the knowledge base. The K-Test chooses an appropriate testing algorithm according to some features of the input program by using knowledge-based techniques, and then applies the resulting test to detect data dependences for loop parallelization. Another rule-based system, called the KPLS, is also proposed; it chooses an appropriate scheduling method by inferring some features of loops and assigns parallel loops to multiprocessors to achieve high speedup. The experimental results show that the speedup obtained by our compiler is evident.

Keywords: parallelizing compiler data dependence testing loop parallelization parallel loop scheduling knowledge-based repertory grid analysis speedup

A note on compiling FORTRAN loop kernels onto a dataflow architecture

PARALLEL COMPUTING, 1997, vol. 22, no. 11, pp. 1545-1557

Authors: Walker, E; Morgan, G; Cass, B; Ulanowski, Z. UNIV YORK, DEPT COMP SCI, YORK YO1 5DD, N YORKSHIRE, ENGLAND

Currently, dataflow architectures are programmed using applicative languages to ease the task of deriving the dataflow graph during compilation. We summarise our experience gained in prototyping a FORTRAN nested loop kernel compiler for a pipeline-ring dataflow architecture. We present the status of the current implementation and future directions which the development of the compiler will take. Current evidence suggests that it is possible to efficiently compile FORTRAN nested loop kernels directly onto dataflow architectures without the need for additional run-time support mechanisms. We present a scheme for deriving the dataflow graph from the analysis of "carried" array variable subscript expressions, and a scheme to map the actors in the dataflow graph onto a pipeline-ring of Field Programmable Gate Array (FPGA) devices.

Keywords: FORTRAN loop kernels parallelizing compiler dataflow architecture high performance computing FPGA devices

Efficient algorithms for data distribution on distributed memory parallel computers

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1997, vol. 8, no. 8, pp. 825-839

Author: Lee, PZ. Institute of Information Science, Taipei, Taiwan

Data distribution has been one of the most important research topics in parallelizing compilers for distributed memory parallel computers. Good data distribution schema should consider both the computation load balance and the communication overhead. In this paper, we show that data redistribution is necessary for executing a sequence of Do-loops if the communication cost due to performing this sequence of Do-loops is larger than a threshold value. Based on this observation, we can prune the searching space and derive efficient dynamic programming algorithms for determining effective data distribution schema to execute a sequence of Do-loops with a general structure. Experimental studies on a 32-node nCUBE-2 computer are also presented.

Keywords: component alignment data distribution distributed memory computer Do-loops dynamic programming algorithm for data distribution parallelizing compiler

Automatic data mapping of signal processing applications

IEEE International Conference on Application-Specific Systems, Architectures and Processors

Authors: Ancourt, C; Barthou, D; Guettier, C; Irigoin, F; Jeannet, B; Jourdan, J; Mattioli, J. Ecole des Mines de Paris, Fontainebleau, France

ISBN: (print) 0818679581

This paper presents a technique to automatically map a complete digital signal processing (DSP) application onto a parallel machine with distributed memory. Unlike other applications, where coarse- or medium-grain scheduling techniques can be used, DSP applications integrate several thousand tasks and hence necessitate fine-grain considerations. Moreover, finding an effective mapping requires taking into account both architectural resource constraints and real-time constraints. The main contribution of this paper is to show how it is possible to handle and solve data partitioning and fine-grain scheduling under the above operational constraints using Concurrent Constraint Logic Programming (CCLP) languages. Our concurrent resolution technique, which handles linear and nonlinear constraints, takes advantage of the special features of signal processing applications and provides a solution equivalent to a manual solution for the representative Panoramic Analysis (PA) application.

Keywords: parallelizing compiler scheduling constraint logic programming

Compiling for scalable multiprocessors with Polaris

Parallel Processing Letters, 1997, vol. 7, no. 4, pp. 425-436

Author: Paek, Yunheung. Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 West Springfield Avenue, Urbana, IL 61801, United States

Due to the complexity of programming scalable multiprocessors with physically distributed memories, it is onerous to manually generate parallel code for these machines. As a consequence, there has been much research on the development of compiler techniques to simplify programming, to increase reliability, and to reduce development costs. For code generation, a compiler applies a number of transformations in areas such as data privatization, data copying and replication, synchronization, and data and work distribution. In this paper, we discuss our recent work on the development and implementation of a few compiler techniques for some of these transformations. We use Polaris, a parallelizing Fortran restructurer developed at Illinois, as the infrastructure to implement our algorithms. The paper includes experimental results obtained by applying our techniques to several benchmark codes. © World Scientific Publishing Company.

Keywords: Communication Multiprocessors parallelizing compiler

PPD: A practical parallel loop detector for parallelizing compilers on multiprocessor systems

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1996, vol. E79D, no. 11, pp. 1545-1560

Authors: Yang, CT; Wu, CT; Tseng, SS. Department of Computer and Information Science, National Chiao Tung University, Hsinchu 300, Taiwan

It is well known that extracting parallel loops plays a significant role in designing parallelizing compilers. The execution efficiency of a loop is enhanced when the loop can be executed in parallel or partially in parallel, like a DOALL or DOACROSS loop. This paper reports on the practical parallelism detector (PPD) that is implemented in PFPC (a portable FORTRAN parallelizing compiler running on OSF/1) at NCTU to concentrate on finding the parallelism available in loops. The PPD can extract the potential DOALL and DOACROSS loops in a program by invoking a combination of the ZIV test and the I test for verifying array subscripts. Furthermore, if DOACROSS loops are available, the synchronization statements are optimized. Experimental results show that PPD is more reliable and accurate than previous approaches.

Keywords: parallelizing compiler data dependence analysis dependent tests DOALL loops DOACROSS loops loop parallelization I test
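The ZIV (zero index variable) test named in the abstract above applies when a subscript pair contains no loop index variables: the two references can touch the same element only if the constants are equal. A minimal sketch follows, assuming each array reference is given as a tuple of integer constants, one per dimension. The function name `ziv_may_depend` is hypothetical, this is not PPD's implementation, and the I test is not covered.

```python
def ziv_may_depend(ref_a, ref_b):
    """ZIV test over two array references whose subscripts are all
    loop-invariant integer constants (an assumption of this sketch).

    Returns False when some dimension has differing constants, which proves
    independence. Returns True when every dimension matches, meaning both
    references name the same element and a dependence exists.
    """
    if len(ref_a) != len(ref_b):
        raise ValueError("references must have the same number of dimensions")
    return all(a == b for a, b in zip(ref_a, ref_b))


if __name__ == "__main__":
    # A(4) = ... ; ... = A(4): same element in every iteration.
    print(ziv_may_depend((4,), (4,)))
    # A(3) = ... ; ... = A(5): the references can never overlap.
    print(ziv_may_depend((3,), (5,)))
```

Subscripts that do contain index variables fall outside this check and need a stronger test.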

A compiling technique for dataflow machines - New algorithm for optimum translation from control flow graph into dataflow graph

SYSTEMS AND COMPUTERS IN JAPAN, 1996, vol. 27, no. 4, pp. 12-24

Authors: Yasue, T; Muraoka, Y. Member, School of Science and Engineering, Waseda University, Tokyo, Japan 169

In this paper, an optimum algorithm for translating control flow graphs into dataflow graphs is proposed for dataflow execution of sequential programs. Some existing analysis methods restrict the form of the programs they can process, while others incur a very high analysis cost. The algorithm proposed in this paper (the CD translation algorithm) can, (1) at a very low cost and (2) for any control structure that can be described by a control flow graph, (3) generate dataflow programs that give an optimum dataflow execution. Furthermore, the proposed analysis algorithm is designed to handle task-level control flow graphs as well as the instruction-level control flow graphs accepted by existing methods, so that optimum control is possible for task-level dataflow execution.

Keywords: dataflow execution parallelizing compiler control dependency data dependency dataflow analysis

DPART: AN AUTOMATIC DATA PARTITIONING SYSTEM FOR DISTRIBUTED MEMORY PARALLEL MACHINES

Parallel Algorithms and Applications, 1996, vol. 9, no. 3-4, pp. 205-212

Authors: Zhaohui Duan [a]; Zhaoqing Zhang [a]. [a] National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Beijing, PR China

One of the most intellectual steps in compiling for distributed memory parallel machines is to determine a suitable data partitioning scheme for a particular program. Most of the parallelizing compilers for these machines provide little or no support to the user in this difficult task. We have developed DPART, an automatic data partitioning system for Fortran 77 procedures. This paper describes the partitioning strategies of alignment, distribution, and processor layout in DPART. Finally, we present experimental results for the TRED2, DGEFA, and JACOBI procedures to demonstrate the effectiveness of this system.

Keywords: Data partitioning distributed memory parallel machines parallelizing compiler
