Search results - Inner Mongolia University Library

International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)

Authors: V. Chaudhary, Chengzhong Xu, S. Roy, Jialin Ju, V. Sinha, Laiwu Luo (Parallel and Distributed Computing Laboratory, Department of Electrical and Computer Engineering, Wayne State University, MI, USA)

In this paper, we present the design and evaluation of a compiler system, called APE, for automatic parallelization of scientific and engineering applications on distributed memory computers. APE is built on top of the SUIF compiler. It extends SUIF with capabilities for parallelizing loops with non-uniform cross-iteration dependencies and for handling loops that have indirect access patterns. We have evaluated the effectiveness of SUIF with several CFD test codes and found that SUIF handles uniform loops over dense and regular data structures very well. For non-uniform loops, an innovative and efficient parallelization approach based on convex theory has been proposed and is being implemented. We have also presented a class of scalable algorithms for parallel distribution and redistribution of unstructured data structures when parallelizing irregular loops.

Keywords: Program processors; Distributed computing; Concurrent computing; Computational fluid dynamics; Data structures; Quantum computing; Programming profession; Parallel processing; Design engineering; Optimizing compilers

An MPI library which uses polling, interrupts and remote copying for the Fujitsu AP1000+

International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)

Authors: D. Sitsky, E. Hayashi (Department of Computer Science, CAP Research Program, Australian National University, Canberra, ACT, Australia; High Performance Computing Group, Fujitsu Laboratories Limited, Kawasaki, Japan)

A complete implementation of MPI for the Fujitsu AP1000+ is presented. The library can employ a number of different mechanisms in implementing the send and receive message passing operations. The method of detecting the arrival of new messages can be realized through interrupt-driven and polling techniques. Transferring message data is achieved by either sending the message data directly to the receiver "in-place", or using a rendezvous method which allows the use of a fast noncopying nonblocking remote-fetching operation. The MPI library exhibits good performance compared to the native message passing library, and allows the user to decide at runtime which mechanisms will be used in order to achieve the best performance on a per-application basis.

Keywords: Message passing; Broadcasting; Routing; Computer science; Runtime library; Parallel programming; Standards organizations; Workstations; National electric code; High performance computing

The PASM project: a study of reconfigurable parallel computing

International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)

Authors: H.J. Siegel, T.D. Braun, H.G. Dietz, M.B. Kulaczewski, M. Maheswaran, P. Pero, J.M. Siegel, J.J.E. So, Min Tan, M.D. Theys, Lee Wang (School of Electrical and Computer Engineering, Parallel Processing Laboratory, Purdue University, West Lafayette, IN, USA)

PASM is a concept for a parallel processing system that allows experimentation with different architectural design alternatives. PASM is dynamically reconfigurable along three dimensions: partitionability into independent or communicating submachines, variable interprocessor connections, and mixed-mode SIMD/MIMD parallelism. With mixed-mode parallelism, a program can switch between SIMD (synchronous) and MIMD (asynchronous) parallelism at instruction-level granularity, allowing the use of both modes in a single machine. The PASM concept is presented, showing the ways in which reconfiguration can be accomplished. Trade-offs among SIMD, MIMD, and mixed-mode parallelism are explored. The small-scale PASM prototype with 16 processing elements is described. The ELP mixed-mode programming language used on the prototype is discussed. An example of a prototype-based study that demonstrates the potential of mixed-mode parallelism is given.

Keywords: Parallel processing; Switches; Prototypes; Concurrent computing; Costs; Communication switching; Broadcasting; Decoding; Yarn; Hardware

Adsmith: an efficient object-based distributed shared memory system on PVM

International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)

Authors: Wen-Yew Liang, Chun-Ta King, Feipei Lai (Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan; Department of Electrical Engineering & Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan)

ISBN: (print) 0818674601

In this paper, we describe an object-based distributed shared memory (DSM) system called Adsmith. In an object-based DSM, the shared memory consists of many shared objects, through which the shared memory is accessed. Adsmith is built on top of PVM at the library layer using C++. PVM is used as the communication subsystem because it is a de facto standard and encapsulates many system-related details. Several mechanisms are used to improve the performance of Adsmith, including release memory consistency, load/store-like memory accesses, nonblocking accesses, and atomic operations. Performance results show that even though Adsmith is implemented on top of PVM, programs running on Adsmith can achieve performance comparable to that of programs running directly on PVM.

Keywords: Programming profession; Computer science; Distributed computing; Libraries; Concurrent computing; High performance computing; Memory management; Hardware; Operating systems; Communication standards

An element-based concurrent partitioner for unstructured finite element meshes

International Symposium on Parallel Processing

Authors: H.Q. Ding, R.D. Ferraro (Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA)

A concurrent partitioner for partitioning unstructured finite element meshes on distributed memory architectures is developed. The partitioner uses an element-based partitioning strategy. Its main advantage over the more conventional node-based partitioning strategy is its modular programming approach to the development of parallel applications. The partitioner first partitions element centroids using a recursive inertial bisection algorithm. Elements and nodes then migrate according to the partitioned centroids, using a data request communication template for unpredictable incoming messages. Our scalable implementation is contrasted to a non-scalable implementation which is a straightforward parallelization of a sequential partitioner. The algorithms adopted in the partitioner scale logarithmically, as confirmed by actual timing measurements on the Intel Delta on up to 512 processors for scaled size problems.

Keywords: Partitioning algorithms; Finite element methods; Memory architecture; Parallel programming; Timing; Size measurement
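
For readers unfamiliar with the recursive inertial bisection step mentioned in the abstract above, the following is a minimal sequential sketch in Python. It is not the authors' concurrent partitioner: it assumes 2D element centroids, splits each part at the median along its principal inertial axis, and uses hypothetical function names (`inertial_bisect`, `recursive_inertial_bisection`) chosen only for illustration.

```python
import math


def inertial_bisect(points, ids):
    """Split ids into two halves along the principal inertial axis of the points."""
    n = len(ids)
    if n == 0:
        return [], []
    # Center of mass of the selected centroids.
    cx = sum(points[i][0] for i in ids) / n
    cy = sum(points[i][1] for i in ids) / n
    # Entries of the 2x2 scatter matrix about the center of mass.
    sxx = sum((points[i][0] - cx) ** 2 for i in ids)
    syy = sum((points[i][1] - cy) ** 2 for i in ids)
    sxy = sum((points[i][0] - cx) * (points[i][1] - cy) for i in ids)
    # Angle of the direction of largest spread (principal axis).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Order the centroids by their projection onto that axis, then cut at the median.
    ordered = sorted(
        ids, key=lambda i: (points[i][0] - cx) * ux + (points[i][1] - cy) * uy
    )
    half = n // 2
    return ordered[:half], ordered[half:]


def recursive_inertial_bisection(points, levels):
    """Return 2**levels parts, each a list of indices into points."""
    parts = [list(range(len(points)))]
    for _ in range(levels):
        next_parts = []
        for part in parts:
            left, right = inertial_bisect(points, part)
            next_parts.extend([left, right])
        parts = next_parts
    return parts


# Illustrative input: centroids of an 8 x 2 strip of elements, split into 4 parts.
centroids = [(float(x), float(y)) for x in range(8) for y in range(2)]
parts = recursive_inertial_bisection(centroids, 2)
```

In this example the strip is longest along x, so each bisection cuts perpendicular to the x axis and every part receives four centroids.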

High-Performance Fortran and possible extensions to support conjugate gradient algorithms

International Symposium on High Performance Distributed Computing

Authors: K. Dincer, G.C. Fox, K. Hawick (School of Computer and Information Sciences, Northeast Parallel Architectures Center, Syracuse University, NY, USA)

ISBN: (print) 9780818675829

Evaluates the High Performance Fortran (HPF) language for the compact expression and efficient implementation of conjugate-gradient iterative matrix-solvers on high-performance computing and communications (HPCC) platforms. We discuss the use of intrinsic functions, data distribution directives and explicitly parallel constructs to optimize performance by minimizing communications requirements in a portable manner. We focus on implementations using the existing HPF definitions but also discuss issues arising that may influence a revised definition for HPF-2. Some of the codes discussed are available on the World Wide Web at http://***/hpfa/, along with other educational and discussion material related to applications in HPF.

Keywords: Character generation; Sparse matrices; Concurrent computing; Programming profession; Equations; Iterative algorithms; Physics computing; Distributed computing; Program processors; Fluid dynamics
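
As background for the abstract above, the following is a minimal sketch of the textbook conjugate gradient iteration in plain Python, not the HPF codes the authors discuss. It assumes a small dense symmetric positive definite matrix stored as nested lists; the helper names (`dot`, `matvec`, `conjugate_gradient`) are illustrative. The dot products and the matrix-vector product are the steps where, on a distributed machine, reductions and data distribution would come into play.

```python
def dot(u, v):
    """Inner product of two vectors (a global reduction in a parallel setting)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product, one dot product per row."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                            # direction update weight
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Illustrative 2 x 2 system whose exact solution is (1/11, 7/11).
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```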

Scalable parallel computational geometry for coarse grained multicomputers

International Journal of Computational Geometry & Applications, 1996, vol. 6, no. 3, pp. 379-400

Authors: Dehne, F; Fabri, A; Rau-Chaplin, A (Carleton Univ, Sch Comp Sci, Ottawa, ON K1S 5B6, Canada; INRIA, F-06902 Sophia Antipolis, France)

We study scalable parallel computational geometry algorithms for the coarse grained multicomputer model: $p$ processors solving a problem on $n$ data items, where each processor has $O(n/p) \gg O(1)$ local memory and all processors are connected via some arbitrary interconnection network (e.g. mesh, hypercube, fat tree). We present $O(T_{\mathrm{sequential}}/p + T_s(n,p))$ time scalable parallel algorithms for several computational geometry problems, where $T_s(n,p)$ refers to the time of a global sort operation. Our results are independent of the multicomputer's interconnection network. Their time complexities become optimal when $T_{\mathrm{sequential}}/p$ dominates $T_s(n,p)$ or when $T_s(n,p)$ is optimal. This is the case for several standard architectures, including meshes and hypercubes, and for a wide range of ratios $n/p$ that includes many of the currently available machine configurations. Our methods also have some important practical advantages. For interprocessor communication, they use only a small, fixed number of calls to a single global routing operation, global sort, and all other programming is in the sequential domain. Furthermore, our algorithms use only a small number of very large messages, which greatly reduces the overhead of the communication protocol between processors. (Note, however, that our time complexities account for the lengths of messages.) Experiments show that our methods are easy to implement and give good timing results.

Keywords: computational geometry; parallel algorithms; scalability

Program transformations and skeletons: Formal derivation of parallel programs

1st Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, AISPAS 1995

Authors: Geerling, A. Max (Computing Science Institute, University of Nijmegen, Toernooiveld 1, Nijmegen NL-6525 ED, Netherlands)

ISBN: (print) 081867038X

The paper describes, from a software engineering perspective, a framework for the formal development of parallel algorithms on arbitrary architectures. The algorithms are synthesised in a transformational way, i.e. by applying correctness-preserving rewrite rules to a formal specification. The architectures are modelled by skeletons: higher-order functions that represent elementary computations on a certain architecture. It is shown that the combination of transformational programming and skeletons stimulates the reuse of program derivations. Furthermore, interskeleton transformations will provide the means for architecture-independent program development. © 1995 IEEE.

Keywords: Computer software reusability
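
To illustrate the idea of skeletons as higher-order functions and of correctness-preserving rewrites described in the abstract above, here is a small Python sketch. It is not taken from the paper: the skeleton names (`map_skel`, `reduce_skel`) and the chosen rewrite (the well-known map fusion law, map f after map g equals map of f composed with g) are assumptions made purely for illustration.

```python
from functools import reduce


def map_skel(f):
    """Skeleton: apply f independently to every element (data parallel)."""
    return lambda xs: [f(x) for x in xs]


def reduce_skel(op, unit):
    """Skeleton: combine all elements with an associative operator op."""
    return lambda xs: reduce(op, xs, unit)


def compose(*fs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    def run(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return run


def square(x):
    return x * x


def inc(x):
    return x + 1


def add(a, b):
    return a + b


# Specification: two separate map passes followed by a sum.
spec = compose(reduce_skel(add, 0), map_skel(square), map_skel(inc))

# Derived program after map fusion: a single pass over the data.
derived = compose(reduce_skel(add, 0), map_skel(compose(square, inc)))

result_spec = spec([1, 2, 3])
result_derived = derived([1, 2, 3])
```

Both versions compute the same value; the derived one traverses the input once, which is the kind of saving a rewrite rule can expose before a skeleton is mapped onto a particular machine.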

Proceedings - 1st Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, AISPAS 1995

1st Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, AISPAS 1995

ISBN: (print) 081867038X

The proceedings contain 42 papers. The topics discussed include: improvement of duplication scheduling heuristic algorithm with nonstrict triggering of program graph nodes; cohesion: an efficient distributed shared memory system supporting multiple memory consistency models; supercompilers for massively parallel architectures; investigation of some hardware accelerators for relational algebra operations; implementing higher-order gamma on MasPar: a case study; a framework for visual parallel programming; parallelizing a PDE solver: experiences with PISCES-MP; efficient scalable mesh algorithms for merging, sorting and selection; and constructing parallel implementations with algebraic programming tools.

Methods and tools for the efficient use of parallel computer architectures

1st Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, AISPAS 1995

Authors: Bode, Arndt (Institut für Informatik, Lehrstuhl für Rechnertechnik und Rechnerorganisation, Technische Universität München, München D-80290, Germany)

ISBN: (print) 081867038X

This article covers research at Technische Universität München on distributed and parallel architectures and applications. First, an overview of the parallel processing research organization is given. The second main topic is TOPSYS, an integrated hierarchical programming environment for parallel and distributed systems developed as part of the research grant. © 1995 IEEE.

Keywords: Hierarchical systems
