The finite element method is widely applied in many domains, such as engineering, atmology, oceanography, and biology. Its major drawback is that execution takes a lot of time and memory. Because the method is computation-intensive and exhibits computational locality, parallel processing can improve its performance on distributed-memory computing environments. However, it is quite difficult to program the finite element method on a distributed-memory computing environment. Therefore, the development of a front-end parallel partial differential equation solver generation system is important. In this paper, we aim to develop a front-end parallel partial differential equation solver generation system based on the World Wide Web for a distributed-memory computing environment, such as a PC cluster or a workstation cluster. With the system, users who want to use parallel computers to solve partial differential equations can use a web browser to input data and parameters. The system automatically generates the corresponding parallel code and executes it on the distributed-memory computing environment. The execution result is shown in the web browser and can also be downloaded by the user.
ISBN: (print) 0769505892
The authors describe a reference implementation and evaluation of the Scalable I/O Low-Level API (SIO-LLAPI) on a scalable file system (COSMOS) of the Dawning2000 cluster system. This prototype provides a scalable I/O API, keeps the COSMOS-compatible file structure and striping algorithm, and runs as a user-level library. We present the initial experiences and evaluation results. We observe that SIO-LLAPI offers opportunities for significant I/O performance improvement on a cluster system.
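The abstract does not spell out the COSMOS striping algorithm, so the following is only a minimal sketch of generic round-robin file striping, the general technique a striped file system relies on. The function names `locate` and `split_request` and the parameters `stripe_size` and `num_servers` are hypothetical and are not taken from COSMOS or SIO-LLAPI.

```python
def locate(offset, stripe_size, num_servers):
    """Map a logical file offset to (server index, offset in that server's local file).

    Assumes plain round-robin striping; the real COSMOS layout may differ.
    """
    stripe_index = offset // stripe_size          # which stripe unit holds the byte
    server = stripe_index % num_servers           # stripe units rotate over the servers
    local_stripe = stripe_index // num_servers    # how many units this server stored before
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


def split_request(offset, length, stripe_size, num_servers):
    """Split one contiguous request into (server, local_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        # Never cross a stripe-unit boundary within a single piece.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        server, local = locate(offset, stripe_size, num_servers)
        pieces.append((server, local, chunk))
        offset += chunk
    return pieces
```

For example, `split_request(2, 8, 4, 3)` breaks an 8-byte request starting at offset 2 into three pieces, one per server, which is what lets a client issue the pieces concurrently.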
With the development of clusters based on high-performance networks, it is now possible to design efficient Distributed Shared Memory (DSM) systems. In this paper we present the approach we chose to implement a high-performance DSM system on top of a cluster by combining low-latency communication protocols (MPI-BIP on Myrinet networks) with a multithreading approach (PM2). We present our approach, called the Distributed Objects Shared MemOry System (DOSMOS), its design, and experiments performed with various communication libraries (PVM, MPI) and on various networks (Ethernet, Myrinet).
Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks. A major problem in programming parallel applications for such platforms is their hierarchical network structure: the latency and bandwidth of WANs are often orders of magnitude worse than those of local networks. Our goal is to optimize MPI's collective operations for such platforms. In this paper we focus on optimized utilization of the (scarce) wide-area bandwidth. We use two techniques: selecting suitable communication graph shapes, and splitting messages into multiple segments that are sent in parallel over different WAN links. To determine the best graph shape and segment size, we introduce a performance model called parameterized LogP (P-LogP), a hierarchical extension of the LogP model that covers messages of arbitrary length. With P-LogP, the optimal segment size and the best broadcast tree shape can be determined at runtime. (For conciseness, we restrict our discussion to the broadcast operation.) An experimental performance evaluation shows that the new broadcast has significantly improved performance for large messages and that there is a close match between the theoretical model and the measured completion times.
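To illustrate why a segment size has to be chosen rather than simply made as small or as large as possible, here is a minimal sketch based on a toy cost model. It is not the P-LogP model from the paper: it assumes a store-and-forward chain of `hops` links, a fixed per-segment overhead, and a constant bandwidth, and all names and parameters are hypothetical.

```python
import math


def completion_time(message_bytes, segment_bytes, hops, latency,
                    per_segment_overhead, bandwidth):
    """Modeled time to push a segmented message through a chain of links.

    Toy assumption: every segment costs `gap` seconds per link, segments are
    pipelined, and the last (possibly shorter) segment is charged as a full one.
    """
    k = math.ceil(message_bytes / segment_bytes)            # number of segments
    gap = per_segment_overhead + segment_bytes / bandwidth  # cost of one segment on one link
    return hops * latency + (hops + k - 1) * gap


def best_segment_size(message_bytes, candidates, **link):
    """Pick the candidate segment size with the smallest modeled completion time."""
    return min(candidates,
               key=lambda s: completion_time(message_bytes, s, **link))
```

Under this model, small segments pay the per-segment overhead many times, while large segments lose the overlap between links, so the minimum lies in between. A runtime system could evaluate such a model over a handful of candidate sizes before each large broadcast.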
Networks of workstations are fast becoming the standard environment for parallel applications. However, the use of "found" resources as a platform for tightly-coupled runtime environments faces at least three obstacles: contention for resources, differing processor speeds, and processor heterogeneity. All three obstacles result in load imbalance, leading to poor performance for scientific applications. This paper describes the use of thread migration to address this load imbalance transparently in the context of the CVM software distributed shared memory system. We describe the implementation and performance of mechanisms and policies that accommodate both resource contention and heterogeneity in clock speed and processor type. Our results show that these cycles can indeed be effectively exploited, and that the runtime cost of processor heterogeneity can be quite manageable. Along the way, however, we identify a number of problems that need to be addressed before such systems can enjoy widespread use.
ISBN: (print) 0769505007
Heterogeneous computing is a special form of parallel and distributed computing in which computations are performed using a single autonomous computer operating in both SIMD and MIMD modes, or using a number of connected autonomous computers. In multimode-system heterogeneous computing, tasks can be executed in both SIMD and MIMD modes simultaneously. In this paper, we present PQE HPF, a High Performance Fortran (HPF) based programming library that allows one to exploit the MIMD and SIMD capabilities offered by PQE-1, a multimode parallel architecture. Two different implementations of a well-known application, using HPF and PQE HPF respectively, were used to evaluate the overheads introduced over the machine's runtime system. Preliminary tests, conducted by running the case study application on the first PQE-1 prototype, show good results and encourage us to dedicate more effort to implementing real production parallel codes on a similar architecture.
Network-based distributed computing has been gaining popularity over the past decade. Many parallel programming languages and related parallel programming modes are becoming widely accepted. However, the execution of parallel applications on distributed systems has been hampered by high communication overhead. To reduce the communication overhead and the completion time of a parallel application, we propose a key message model for parallel computing on networks of workstations (NOWs). A key message path in a task graph is defined as the path that is optimized by the key message algorithm, and all messages generated in a key message path are prioritized. The key message algorithm finds the key message paths automatically. In this paper, we first describe the algorithm that identifies the key messages to be prioritized in a parallel application, then analyze the cost of the algorithm, and finally evaluate the performance of the algorithm in a simulation. Our preliminary analysis of the algorithm shows an improvement over a system that does not use a prioritization scheme.
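The abstract does not define how key message paths are selected. As a stand-in, the sketch below treats the longest (critical) path of a task graph as the path whose messages deserve priority. This is an assumption made for illustration, not the authors' key message algorithm, and the function name and data layout are hypothetical.

```python
def critical_path_messages(compute, messages):
    """Return (path length, list of message edges on the longest path).

    compute:  {task: computation cost}
    messages: {(src_task, dst_task): communication cost}, forming a DAG
    """
    succs = {t: [] for t in compute}
    indeg = {t: 0 for t in compute}
    for (u, v) in messages:
        succs[u].append(v)
        indeg[v] += 1

    # Topological order (Kahn's algorithm).
    order, ready = [], [t for t in compute if indeg[t] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(compute):
        raise ValueError("task graph contains a cycle")

    # Earliest start of each task if every message is delivered without delay.
    start = {t: 0 for t in compute}
    finish, pred = {}, {}
    for u in order:
        finish[u] = start[u] + compute[u]
        for v in succs[u]:
            arrival = finish[u] + messages[(u, v)]
            if arrival > start[v]:
                start[v], pred[v] = arrival, u

    # Walk back from the task that finishes last.
    last = max(finish, key=finish.get)
    length, edges = finish[last], []
    while last in pred:
        edges.append((pred[last], last))
        last = pred[last]
    return length, edges[::-1]
```

The returned edges are the messages whose delay directly lengthens the completion time under this model, which makes them natural candidates for a higher sending priority.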
High-level programming support is an essential component for the practical development of computational science applications using the cellular automata model. This paper, after introducing the CARPET language, shows its practical use for programming cellular automata simulations on parallel computers. CARPET is a high-level language designed to support rapid prototyping and full implementation of a large number of science and engineering applications on high-performance computers. The language provides the user with a programming layer that offers constructs for the direct definition of cellular automata features such as lattice dimension, cell state, neighborhood, and transition function. The CARPET parallel run-time system maps CA programs onto a parallel computer, hiding architectural issues from the user, and provides advanced visualization of program output. The paper describes how practical cellular automata algorithms for lattice gas, gas diffusion simulation, and traffic flow modeling can be designed using the CARPET programming language. (C) 1999 Elsevier Science B.V. All rights reserved.
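The four features named above (lattice dimension, cell state, neighborhood, transition function) can be illustrated with a minimal sequential sketch. The code is plain Python, not CARPET syntax, and the names `step` and `traffic_rule` are hypothetical. The transition function used is elementary cellular automaton Rule 184, a standard one-dimensional traffic flow model, chosen here only as an example.

```python
def step(lattice, neighborhood, transition):
    """Apply one synchronous update to a 1-D lattice with periodic boundaries.

    lattice:      list of cell states (here 0 = empty road, 1 = car)
    neighborhood: relative offsets each cell reads, e.g. (-1, 0, 1)
    transition:   function mapping the neighborhood states to the new state
    """
    n = len(lattice)
    return [transition([lattice[(i + d) % n] for d in neighborhood])
            for i in range(n)]


def traffic_rule(cells):
    """Rule 184: a car advances to the right when the next cell is empty."""
    left, center, right = cells
    if center == 1:
        return 1 if right == 1 else 0   # blocked cars stay, free cars leave
    return 1 if left == 1 else 0        # an empty cell receives the car behind it


road = [1, 1, 0, 0, 1, 0]
road = step(road, (-1, 0, 1), traffic_rule)
```

Because every cell reads only its neighborhood from the previous generation, the lattice can be partitioned across processors with only boundary cells exchanged, which is the kind of mapping a parallel run-time system would perform on the user's behalf.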
We describe Actors, a flexible, scalable and efficient model of computation, and develop a framework for analyzing the parallel complexity of programs written in it. Actors are asynchronous, autonomous objects which i...
In this paper we present the Orchid system, a new portable and scalable platform for parallel programming, suitable for any type of distributed memory architecture. It includes C libraries that facilitate dynamic process allocation, asynchronous process communication, and global process synchronization. It also integrates a set of flexible mechanisms for the implementation of a wide variety of Distributed Shared Memory (DSM) paradigms. As an example, two different DSM paradigms are proposed. Moreover, a new polyparametric model is suggested, which can be used in the performance evaluation of any DSM paradigm. Orchid has been successfully used for the development of a large scale application, i.e. an environment for parallel logic programming, based on attribute grammars.