Authors:
Benkner, S.; Brandes, T.
Univ Vienna, Inst Software Sci, A-1090 Vienna, Austria
Fraunhofer Inst Algorithms & Sci Comp (SCAI), Schloss Birlinghoven, D-53754 St Augustin, Germany
Clusters of shared-memory multiprocessors (SMPs) have become the most promising parallel computing platforms for scientific computing. However, SMP clusters significantly increase the complexity of user application development when using the low-level application programming interfaces MPI and OpenMP, forcing users to deal with both distributed-memory and shared-memory parallelization details. In this paper we present extensions of High Performance Fortran (HPF) for SMP clusters which enable the compiler to adopt a hybrid parallelization strategy, efficiently combining distributed-memory with shared-memory parallelism. By means of a small set of new language features, the hierarchical structure of SMP clusters may be specified. This information is utilized by the compiler to derive inter-node data mappings for controlling distributed-memory parallelization across the nodes of a cluster and intra-node data mappings for extracting shared-memory parallelism within nodes. Additional mechanisms are proposed for specifying inter- and intra-node data mappings explicitly, for controlling specific shared-memory parallelization issues, and for integrating OpenMP routines in HPF applications. The proposed features have been realized within the ADAPTOR and VFC compilers. The parallelization strategy for clusters of SMPs adopted by these compilers is discussed, as well as a hybrid-parallel execution model based on a combination of MPI and OpenMP. Experimental results indicate the effectiveness of the proposed features. Copyright (C) 2004 John Wiley & Sons, Ltd.
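The paper targets compiler-generated code, which the abstract does not show; the fragment below is only a hand-written sketch of the hybrid execution model it describes, with MPI ranks at the inter-node (distributed-memory) level and OpenMP threads at the intra-node (shared-memory) level. The array, block distribution, and reduction are illustrative assumptions, not taken from the paper.

    /* Minimal hand-written sketch of the hybrid MPI+OpenMP execution model:
     * MPI ranks handle the inter-node level, OpenMP threads the intra-node
     * level.  N, the block distribution, and the reduction are illustrative. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Inter-node mapping: each node owns one contiguous block
           (remainder ignored for brevity). */
        int chunk = N / nprocs;
        double local_sum = 0.0, global_sum = 0.0;

        /* Intra-node mapping: the node's block is shared among OpenMP threads. */
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < chunk; i++) {
            double x = (double)(rank * chunk + i);
            local_sum += x * x;          /* stand-in for real node-local work */
        }

        /* Distributed-memory step: combine per-node results with MPI. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum of squares = %e\n", global_sum);

        MPI_Finalize();
        return 0;
    }

Such a program is typically launched with one MPI rank per SMP node and OMP_NUM_THREADS set to the number of cores per node, which matches the two-level data mapping the abstract describes.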
Microscopy is becoming increasingly digital and dependent on computation. Some of the computational tasks in microscopy are computationally intense, such as image restoration (deconvolution), some optical calculations, image segmentation, and image analysis. Several modern microscope technologies enable the acquisition of very large data sets. 3D imaging of live cells over time, multispectral imaging, very large tiled 3D images of thick samples, or images from high throughput biology all can produce extremely large images. These large data sets place a heavy burden on laboratory computer resources. This combination of computationally intensive tasks and larger data sizes can easily exceed the capability of single personal computers. The large multiprocessor computers that are the traditional technology for larger tasks are too expensive for most laboratories. An alternative approach is to use a number of inexpensive personal computers as a cluster; that is, to use multiple networked computers programmed to run the problem in parallel on all the computers in the cluster. By using relatively inexpensive over-the-counter hardware and open source software, this approach can be much more cost effective for many tasks. We discuss the different computer architectures available, and their advantages and disadvantages. (C) 2004 Wiley-Liss, Inc.
Application of pattern-based approaches to parallel programming is an active area of research today. The main objective of pattern-based approaches to parallel programming is to facilitate the reuse of frequently occu...
The Monte Carlo (MC) method is a simple but effective way to perform simulations involving complicated or multivariate functions. The Quasi-Monte Carlo (QMC) method is similar but replaces independent and identically ...
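The abstract (truncated above) contrasts plain Monte Carlo with quasi-Monte Carlo sampling; the sketch below illustrates that contrast on a simple 2-D integral. The integrand, the sample count, and the choice of a Halton low-discrepancy sequence are illustrative assumptions, not details from the paper.

    /* Plain Monte Carlo vs. quasi-Monte Carlo for a 2-D integral: the same
     * estimator, but QMC replaces i.i.d. pseudo-random points with
     * low-discrepancy (Halton) points. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    static double f(double x, double y) { return exp(-(x * x + y * y)); }

    /* Radical-inverse function: the building block of Halton/van der Corput points. */
    static double radical_inverse(unsigned n, unsigned base)
    {
        double inv = 1.0 / base, r = 0.0, scale = inv;
        while (n > 0) {
            r += (double)(n % base) * scale;
            scale *= inv;
            n /= base;
        }
        return r;
    }

    int main(void)
    {
        const unsigned N = 100000;
        double mc = 0.0, qmc = 0.0;

        for (unsigned i = 0; i < N; i++) {
            /* MC: independent, identically distributed pseudo-random points. */
            double ux = rand() / (RAND_MAX + 1.0);
            double uy = rand() / (RAND_MAX + 1.0);
            mc += f(ux, uy);

            /* QMC: Halton points in bases 2 and 3 instead of random points. */
            qmc += f(radical_inverse(i + 1, 2), radical_inverse(i + 1, 3));
        }
        printf("MC  estimate: %.6f\nQMC estimate: %.6f\n", mc / N, qmc / N);
        return 0;
    }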
ISBN (Digital): 9780191712869
ISBN (Print): 9780198529392
This book explains the use of the bulk synchronous parallel (BSP) model and the BSPlib communication library in parallel algorithm design and parallel programming. The main topics treated in the book are central to the area of scientific computation: solving dense linear systems by Gaussian elimination, computing fast Fourier transforms, and solving sparse linear systems by iterative methods based on sparse matrix-vector multiplication. Each topic is treated in depth, starting from the problem formulation and a sequential algorithm, through a parallel algorithm and its cost analysis, to a complete parallel program written in C and BSPlib, and experimental results obtained using this program on a parallel computer. Throughout the book, emphasis is placed on analyzing the cost of the parallel algorithms developed, expressed in three terms: computation cost, communication cost, and synchronization cost. The book contains five example programs written in BSPlib, which illustrate the methods taught. These programs are freely available as the package BSPedupack. An appendix on the message-passing interface (MPI) discusses how to program in a structured, bulk synchronous parallel style using the MPI communication library, and presents MPI equivalents of all the programs in the book.
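As a flavour of the BSPlib style the book teaches, here is a minimal sketch of one superstep: each process computes a partial result, bsp_put()s it into every other process's array, and sums the parts after bsp_sync(). It is not one of the book's five BSPedupack programs, and exact argument types (int vs. size_t) vary between BSPlib implementations.

    /* One BSPlib superstep: all-to-all exchange of partial sums, followed by
     * a local reduction after the synchronization barrier. */
    #include <stdio.h>
    #include "bsp.h"

    #define P 4                               /* requested number of processes */

    void spmd_part(void)
    {
        bsp_begin(P);
        int p = bsp_nprocs();                 /* processes actually granted */
        int s = bsp_pid();

        double part = (double)(s + 1) * (s + 1);   /* stand-in for local work */
        double parts[P];
        bsp_push_reg(parts, P * sizeof(double));   /* make 'parts' remotely writable */
        bsp_sync();                                /* registration takes effect */

        /* Superstep: put my partial result into slot s of everyone's 'parts'. */
        for (int t = 0; t < p; t++)
            bsp_put(t, &part, parts, s * sizeof(double), sizeof(double));
        bsp_sync();                                /* all puts are now visible */

        double total = 0.0;
        for (int t = 0; t < p; t++)
            total += parts[t];
        printf("proc %d: total = %f\n", s, total);

        bsp_pop_reg(parts);
        bsp_end();
    }

    int main(int argc, char **argv)
    {
        bsp_init(spmd_part, argc, argv);
        spmd_part();
        return 0;
    }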
ISBN (Print): 0769522254
DataGrids are becoming increasingly important for sharing large data collections, research achievements, and resources. The BSP model is a widely used parallel programming model, and the idea of the superstep in the BSP model can help sequence DataGrid access and storage in a regular fashion. When services are not isolated from each other in multi-user environments, this should make it possible to avoid four kinds of phenomena during data access and storage: lost updates, dirty reads, non-repeatable reads, and phantom reads.
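The abstract gives no code; the fragment below only illustrates the superstep discipline it appeals to, reusing the bsp_begin()/bsp_end() scaffolding of the previous sketch. Because updates issued with bsp_put() become visible only at the next bsp_sync(), every process reads the same committed snapshot within a superstep. The record type and single-writer pattern are hypothetical.

    /* Within a superstep every process sees the state committed at the last
     * bsp_sync(); writes issued here take effect only at the next one, so a
     * reader never observes a half-finished update. */
    #include "bsp.h"

    typedef struct { int version; double balance; } record_t;

    static record_t rec;                        /* one replica per process */

    void update_superstep(double delta)
    {
        bsp_push_reg(&rec, sizeof rec);
        bsp_sync();

        record_t snap = rec;                    /* same snapshot on every process */

        if (bsp_pid() == 0) {                   /* a single writer this superstep */
            record_t upd = snap;
            upd.balance += delta;
            upd.version += 1;
            for (int t = 0; t < bsp_nprocs(); t++)
                bsp_put(t, &upd, &rec, 0, sizeof upd);   /* deferred until sync */
        }

        bsp_sync();            /* superstep boundary: all replicas now agree */
        bsp_pop_reg(&rec);
    }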
Summary form only given. This paper describes the development of a fine-grained meta-heuristic for solving large strip packing problems with guillotine layouts. An architecture-adaptive environment, aCe, and the aCe C parallel programming language are used to implement a massively parallel genetic simulated annealing (GSA) algorithm. The parallel GSA combines the temperature schedule of simulated annealing with the crossover and mutation operators that are applied to chromosome populations in genetic algorithms. For our problem, chromosomes are normalized postfix expressions that represent guillotine strip packings. Preliminary results for some benchmark data sets are reported and indicate that the parallel GSA method holds promise as a technique for solving the strip packing problem.
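The abstract does not list the operators; the sketch below only shows the kind of machinery it describes: chromosomes as normalized postfix expressions over rectangle ids and the guillotine-cut operators 'H' and 'V', with a simulated-annealing Metropolis test deciding whether a mutated offspring survives. The single-character encoding, the swap mutation, and decode_height() are illustrative assumptions, not the paper's exact operators.

    /* Chromosome = normalized postfix expression, e.g. "12H3V":
     * (rect 1 stacked on rect 2), placed beside rect 3.  Assumes single-digit
     * rectangle ids and at least two operands per expression. */
    #include <ctype.h>
    #include <math.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char   expr[64];   /* postfix encoding of a guillotine packing */
        double height;     /* strip height of the decoded packing (fitness) */
    } chromosome_t;

    /* Placeholder decoder: a real one would evaluate the slicing tree encoded
     * by the postfix expression and return the packed strip height. */
    static double decode_height(const char *expr) { (void)expr; return 0.0; }

    /* Mutation: swap two rectangle symbols, which always preserves a valid
     * normalized postfix expression. */
    void mutate(chromosome_t *c)
    {
        int n = (int)strlen(c->expr), i, j;
        do { i = rand() % n; } while (!isdigit((unsigned char)c->expr[i]));
        do { j = rand() % n; } while (!isdigit((unsigned char)c->expr[j]) || j == i);
        char tmp = c->expr[i]; c->expr[i] = c->expr[j]; c->expr[j] = tmp;
        c->height = decode_height(c->expr);
    }

    /* Metropolis acceptance at temperature T: better packings always survive,
     * worse ones with probability exp(-delta / T), as in simulated annealing. */
    int accept(const chromosome_t *parent, const chromosome_t *child, double T)
    {
        double delta = child->height - parent->height;
        if (delta <= 0.0)
            return 1;
        return (rand() / (RAND_MAX + 1.0)) < exp(-delta / T);
    }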
OmniRPC is a Grid RPC system for parallel programming in a grid environment. In order to understand the performance characteristics of OmniRPC, we executed a synthetic benchmark program that varies the execution time on remote nodes and the amount of communication, on several configurations of our grid environment. The results show that application performance improves when RPC data transmissions are smaller than 10 KB, the job time on remote nodes exceeds 4 seconds, and RPCs are issued more than 256 times. Our results also show a small performance degradation when the communication multiplexing feature is used. We also measured the performance of the EP application from the NAS parallel benchmark suite. For EP, performance is almost the same whether SSH or the Globus GRAM is used for agent invocation. As a practical application, we parallelized the CONFLEX molecular conformation search program using OmniRPC. Compared with the MPI version of CONFLEX, CONFLEX-G achieves comparable efficiency and gains additional speed by using two or more clusters.
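The sketch below shows the asynchronous master/worker pattern an OmniRPC client of this kind uses. The function names (OmniRpcInit, OmniRpcCallAsync, OmniRpcWaitAll, OmniRpcFinalize) follow the OmniRPC client API described in its papers, but the header name, the remote entry "conf_search", and its argument list are assumptions that would have to match a real IDL definition and the installed distribution.

    /* Hedged sketch of an OmniRPC client: issue many asynchronous RPCs and
     * wait for all of them; the runtime schedules them over the remote nodes
     * listed in the host file. */
    #include <stdio.h>
    #include "OmniRpcC.h"       /* assumed client header name */

    #define NTASKS 256          /* roughly the call count at which, per the
                                   measurements above, RPC overhead amortizes */

    int main(int argc, char **argv)
    {
        OmniRpcRequest reqs[NTASKS];
        double results[NTASKS];

        OmniRpcInit(&argc, &argv);

        /* Each remote job should run for seconds and move little data, the
           regime the benchmark above identifies as profitable. */
        for (int i = 0; i < NTASKS; i++)
            reqs[i] = OmniRpcCallAsync("conf_search", i, &results[i]);

        OmniRpcWaitAll(NTASKS, reqs);

        for (int i = 0; i < NTASKS; i++)
            printf("task %d -> %f\n", i, results[i]);

        OmniRpcFinalize();
        return 0;
    }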
Summary form only given. We compare the performance of three programming paradigms for the parallelization of nested loop algorithms on SMP clusters. More specifically, we propose three alternative models for tiled nested loop algorithms, namely a pure message passing paradigm as well as two hybrid ones that implement communication both through message passing and through shared memory access. The hybrid models adopt an advanced hyperplane scheduling scheme that allows both minimal thread synchronization and pipelined execution with overlapping of computation and communication phases. We focus on the experimental evaluation of all three models and test their performance against several iteration spaces and parallelization grains with the aid of a typical micro-kernel benchmark. We conclude that the hybrid models can in some cases be more beneficial than the monolithic pure message passing model, as they better exploit the configuration characteristics of a hierarchical parallel platform such as an SMP cluster.
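The paper's micro-kernel is not reproduced here; the sketch below only shows the general shape of such a hybrid model: MPI ranks exchange boundary rows with non-blocking calls while OpenMP threads update interior tiles, overlapping communication with computation. The 4-point stencil, sizes, and array names are illustrative assumptions.

    /* Hybrid skeleton: non-blocking MPI halo exchange overlapped with an
     * OpenMP update of the interior rows; boundary rows are finished after
     * the halos arrive. */
    #include <mpi.h>
    #include <omp.h>
    #include <string.h>

    #define NX 1024            /* local rows per rank (plus 2 halo rows) */
    #define NY 1024
    #define STEPS 100

    static double a[NX + 2][NY], b[NX + 2][NY];

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int t = 0; t < STEPS; t++) {
            MPI_Request req[4];

            /* Post the halo exchange for the boundary rows (inter-node level). */
            MPI_Irecv(a[0],      NY, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[0]);
            MPI_Irecv(a[NX + 1], NY, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &req[1]);
            MPI_Isend(a[1],      NY, MPI_DOUBLE, up,   1, MPI_COMM_WORLD, &req[2]);
            MPI_Isend(a[NX],     NY, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[3]);

            /* Meanwhile, OpenMP threads update the interior rows (intra-node level). */
            #pragma omp parallel for
            for (int i = 2; i < NX; i++)
                for (int j = 1; j < NY - 1; j++)
                    b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);

            /* Wait for the halos, then finish the rows that depend on them. */
            MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
            for (int j = 1; j < NY - 1; j++) {
                b[1][j]  = 0.25 * (a[0][j]    + a[2][j]    + a[1][j-1]  + a[1][j+1]);
                b[NX][j] = 0.25 * (a[NX-1][j] + a[NX+1][j] + a[NX][j-1] + a[NX][j+1]);
            }
            memcpy(a, b, sizeof(a));
        }

        MPI_Finalize();
        return 0;
    }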
ISBN (Print): 9780769520858
In conventional multiprocessor SoC (MPSoC) design methods, we find two problems: lack of SW code portability and lack of early SW validation. These problems cause a long design cycle. To resolve them, we present a concept of two-layer hardware-dependent software (HdS). The presented HdS consists of a hardware abstraction layer that abstracts the sub-system architecture and an SoC abstraction layer that abstracts the global MPSoC architecture. During the exploration of global and sub-system architectures, the application programming interfaces of the presented two-layer HdS keep the SW independent of architectural changes. The simulation models of the two-layer HdS enable validation of the entire system, including the SW and HW designs, early in the design flow. We show the effectiveness of the presented methodology in the MPSoC architecture exploration of an OpenDiVX encoder system design.
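The abstract does not name the APIs; the sketch below is a hypothetical illustration of the two-layer structure it describes, with application SW calling only an SoC abstraction layer, which in turn calls a per-sub-system hardware abstraction layer, so that exploring a different architecture rebinds the HAL without touching the SW. All identifiers are invented for illustration.

    /* Two-layer structure: application SW -> SoC abstraction layer -> HAL. */
    #include <stdio.h>
    #include <string.h>
    #include <stddef.h>

    /* Hardware abstraction layer: hides one sub-system (local bus, DMA, ...). */
    typedef struct {
        int (*read)(unsigned port, void *buf, size_t len);
        int (*write)(unsigned port, const void *buf, size_t len);
    } hal_ops_t;

    /* SoC abstraction layer: hides the global interconnect; application SW
       talks only to logical channels, never to the HAL directly. */
    typedef struct {
        const hal_ops_t *hal;   /* bound to one candidate sub-system's HAL */
        unsigned port;          /* logical port, remapped per architecture */
    } soc_channel_t;

    static int soc_send(soc_channel_t *ch, const void *data, size_t len)
    {
        /* Swapping bus for NoC or shared memory only rebinds ch->hal;
           the application task (e.g. an encoder stage) is unchanged. */
        return ch->hal->write(ch->port, data, len);
    }

    /* Dummy HAL standing in for a simulation model of one sub-system. */
    static int dummy_write(unsigned port, const void *buf, size_t len)
    {
        (void)buf;
        printf("write of %zu bytes to port %u\n", len, port);
        return 0;
    }
    static int dummy_read(unsigned port, void *buf, size_t len)
    {
        memset(buf, 0, len);
        printf("read of %zu bytes from port %u\n", len, port);
        return 0;
    }

    int main(void)
    {
        hal_ops_t hal = { dummy_read, dummy_write };
        soc_channel_t ch = { &hal, 3 };
        char frame[16] = "macroblock data";
        return soc_send(&ch, frame, sizeof frame);
    }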