Search Results - Inner Mongolia University Library

International Symposium on Parallel Processing

Authors: M. Kandemir, A. Choudhary, J. Ramanujam, P. Banerjee. CPDC, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA; Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, USA

In order to extract high levels of performance from modern parallel architectures, the effective management of deep memory hierarchies is very important. While architectural advances in caches help in better utilization of the memory hierarchy, compiler-directed locality enhancement techniques are also important. In this paper we propose a locality improvement technique that uses data space (array layout) transformations in contrast to most of the previous work based on iteration space (loop) transformations. In other words, rather than changing the order of loop iterations, our technique modifies the memory layouts of multi-dimensional arrays. In comparison with previous work on data transformations it brings two novelties. First, we formulate the problem on a special graph structure called the layout graph (LG) and use integer linear programming (ILP) methods to determine optimal layouts. Second, in addition to static layout detection, our approach also enables the compiler to determine optimal dynamic layouts; that is, the layouts that can be changed across loop nest boundaries. We believe that this is the first attempt to determine optimal dynamic memory layouts. We also present preliminary experimental results on the SGI Origin 2000 distributed shared memory multiprocessor. Our results so far are encouraging and indicate that the additional compilation time taken by the solver is tolerable.

Keywords: Random access memory; Data structures; Data mining; Memory management; Integer linear programming; Parallel machines; Cache memory; Optimizing compilers; Program processors; Law


OpenMP for networks of SMPs


International Symposium on Parallel Processing

Authors: Y.C. Hu, Honghui Lu, A.L. Cox, W. Zwaenepoel. Department of Computer Science, Rice University, Houston, TX, USA; Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA

In this paper we present the first system that implements OpenMP on a network of shared-memory multiprocessors. This system enables the programmer to rely on a single, standard, shared-memory API for parallelization within a multiprocessor and between multiprocessors. It is implemented via a translator that converts OpenMP directives to appropriate calls to a modified version of the TreadMarks software distributed shared memory system (SDSM). In contrast to previous SDSM systems for SMPs, the modified TreadMarks uses POSIX threads for parallelism within an SMP node. This approach greatly simplifies the changes required to the SDSM in order to exploit the intra-node hardware shared memory. We present performance results for six applications (SPLASH-2 Barnes-Hut and Water; NAS 3D-FFT, SOR, TSP and MGS) running on an SP2 with four four-processor SMP nodes. A comparison between the threaded implementation and the original implementation of TreadMarks shows that using the hardware shared memory within an SMP node significantly reduces the amount of data and the number of messages transmitted between nodes, and consequently achieves speedups up to 30% better than the original versions. We also compare SDSM against message passing. Overall, the speedups of multithreaded TreadMarks programs are within 7-30% of the MPI versions.

Keywords: Switched-mode power supply; Programming profession; Message passing; Parallel programming; Computer science; Automatic logic units; Runtime library; Yarn; Algorithms; Open source software


Design and optimization of a parallel architecture dedicated to image matching


IEEE International Conference on Image Processing

Authors: E.E. Pissaloux, F. Le Coat, P. Bonnin, A. Tissot, F. Durbin. Université de Rouen, Velizy, France; Laboratoire de Robotique de Paris, Velizy, France; Institut d'Electronique Fondamentale, Université Paris 1, Orsay, France; CEA DRIF DCRE ISEIM, France

The design and optimised (in time and space) implementation of a systolic circuit dedicated to aerial image matching is proposed. The final run time data adaptive architecture evaluation with Xilinx XC 4010 XL offers ...

Keywords: Design optimization; Parallel architectures; Image matching; Dynamic programming; Circuits; Biomedical optical imaging; Image motion analysis; Heuristic algorithms; Optical arrays; Field programmable gate arrays


Data management for large-scale scientific computations in high performance distributed systems


International Symposium on High Performance Distributed Computing

Authors: A. Choudhary, M. Kandemir, H. Nagesh, J. No, X. Shen, V. Taylor, S. More, R. Thakur. Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA

With the increasing number of scientific applications manipulating huge amounts of data, effective data management is an increasingly important problem. Unfortunately, so far the solutions to this data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance file systems) or produce unsatisfactory I/O performance in exchange for ease-of-use and portability (as in relational DBMSs). In this paper we present a new environment which is built around an active meta-data management system (MDMS). The key components of our three-tiered architecture are user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease-of-use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques to the MDMS. We discuss the importance of an active MDMS and show how the three components, namely application, the MDMS, and the HSS, fit together. We also report performance numbers from our initial implementation and illustrate that significant improvements are made possible without undue programming effort.

Keywords: Large-scale systems; Distributed computing; High performance computing; Data visualization; Engineering management; Concurrent computing; Relational databases; Art; Data analysis; Image analysis


Parallel coprocessor architectures for Molecular Dynamics simulation: A case study in design space exploration


IEEE International Symposium on Circuits and Systems (ISCAS 98)

Authors: Gerber, M; Gossi, T. ETH Zurich, Comp Engn & Networks Lab, CH-8092 Zurich, Switzerland

ISBN: (print) 0780344553

The purpose of the paper is to describe a new semi-automated design space exploration method based on genetic programming. A new control/dataflow specification method is proposed as well as appropriate models for hardware parts and algorithms. With this method we are able to test many different hardware architectures and algorithms against cost, speed, computation time and other constraints within a very short time. The remaining manual work is to exploit the model parameters of the components of the architecture and the algorithm. In contrast to other approaches, our method is suited for embedded and distributed systems. The method, models and application are explained in detail by means of a comprehensive case study.

Keywords: Parallel processing systems


Parallel tree building on a range of shared address space multiprocessors: algorithms and application performance


1st Merged International Parallel Processing Symposium/Symposium on Parallel and Distributed Processing (IPPS/SPDP 1998)

Authors: Shan, HZ; Singh, JP. Princeton Univ, Dept Comp Sci, Princeton, NJ 08544, USA

ISBN: (print) 0818684038

Irregular particle-based applications that use trees, for example hierarchical N-body applications, are important consumers of multiprocessor cycles, and are argued to benefit greatly in programming ease from a coherent shared address space programming model. As more and more supercomputing platforms that can support different programming models become available to users, from tightly-coupled hardware-coherent machines to clusters of workstations or SMPs, to truly deliver on its ease of programming advantages to application users it is important that the shared address space model not only perform and scale well in the tightly-coupled case but also port well in performance across the range of platforms (as the message passing model can). For tree-based N-body applications, this is currently not true: while the actual computation of interactions ports well, the parallel tree building phase can become a severe bottleneck on coherent shared address space platforms, in particular on platforms with less aggressive, commodity-oriented communication architectures (even though it takes less than 3 percent of the time in most sequential executions). We therefore investigate the performance of five parallel tree building methods in the context of a complete galaxy simulation on four very different platforms that support this programming model: an SGI Origin2000 (an aggressive hardware cache-coherent machine with physically distributed memory), an SGI Challenge bus-based shared memory multiprocessor, an Intel Paragon running a shared virtual memory protocol in software at page granularity, and a Wisconsin Typhoon-zero in which the granularity of coherence can be varied using hardware support but the protocol runs in software (in the last case using both a page-based and a fine-grained protocol). We find that the algorithms used successfully and widely distributed so far for the first two platforms cause overall application performance to be very poor on the latter two commodit

Keywords: Parallel processing systems


Programming with divide-and-conquer skeletons: A case study of FFT


Journal of Supercomputing, 1998, vol. 12, no. 1-2, pp. 85-97

Authors: Gorlatch, S. Univ Passau, D-94030 Passau, Germany

We demonstrate an approach to parallel programming, based on skeletons - parameterized program schemas with efficient implementations over diverse architectures. The contribution of the paper is two-fold: (1) we classify divide-and-conquer (DC) algorithms and provide a family of provably correct parallel implementations for a particular DC skeleton, called DH (distributable homomorphism); (2) we adjust the mathematical specification of the Fast Fourier Transform (FFT) to the DH skeleton and, thereby, obtain a generic SPMD program, well suited for implementation under MPI. The generic program includes the efficient FFT solutions used in practice - the binary-exchange and the 2D- and 3D-transpose implementations - as special cases.

Keywords: Parallel programming; Skeletons; Divide-and-conquer; Bird-Meertens formalism (BMF); Fast Fourier Transform (FFT)


Linear programming models for scheduling systems of affine recurrence equations - a comparative study


Proceedings of the 1998 10th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA

Authors: Balev, S.; Quinton, P.; Rajopadhye, S.; Risset, T. Inst of Mathematics and Informatics, Sofia, Bulgaria

ISBN: (print) 9780897919890

We study the problem of scheduling systems of affine recurrence equations (SAREs), a convenient formalism for modeling massively parallel computations. We unify in a single framework the two most important methods for solving the problem: the Farkas method and the vertex method, both using linear programming. Then we compare the efficiency of the methods, in terms of number of variables, number of constraints and execution time of the resolution, on real-world examples arising from parallelization problems. Our conclusions show that the Farkas method is significantly better than the vertex method.

Keywords: Linear programming


Vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines


1st Merged International Parallel Processing Symposium/Symposium on Parallel and Distributed Processing (IPPS/SPDP 1998)

Authors: Bae, S; Kim, D; Ranka, S. ETRI, Parallel Programming Sect, Taejon, South Korea

ISBN: (print) 0818684038

Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split algorithm, for vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines. Our algorithms are relatively architecture independent and can be used effectively in many applications such as Pack/Unpack, Array Prefix/Reduction Functions, and Array Combining Scatter Functions, which are defined in Fortran 90 and in High Performance Fortran. Experimental results on the CM-5 are presented.

Keywords: Parallel processing systems


PACE: Processor architectures for circuit emulation


10 IPPS/SPDP 98 Workshops Held in Conjunction with the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing

Authors: Kolla, R; Springauf, O. Univ Wurzburg, Lehrstuhl Tech Informat, D-97070 Wurzburg, Germany

ISBN: (print) 3540643591

We describe a family of reconfigurable parallel architectures for logic emulation. They are supposed to be applicable like conventional FPGAs, while covering a larger range of circuit sizes and clock frequencies. In order to evaluate the performance of such programmable designs, we also need software methods for code generation from circuit descriptions. We propose a combination of scheduling and routing algorithms for embedding calculations into the target architecture.

Keywords: Parallel architectures
