ISBN: 0769507689 (print)
Parallelization strategies based on domain partitioning techniques have been widely adopted for parallel finite element computations because of their suitability for distributed memory platforms. In most cases, this parallelization is based on non-overlapping partitions, especially for Computational Structural Mechanics applications. However, finite volume (or mixed finite element/finite volume) discretization methods, which are frequently implemented in Computational Fluid Dynamics applications, generally require the use of overlapping mesh partitions to keep the parallelization work simple. Unfortunately, many tools on which the partitioning step relies give poor results when asked for overlapping partitions. In this paper, we describe an efficient method to transform a non-overlapping partition of a domain into an overlapping one. We also propose an optimization strategy for overlapping partitions that mainly aims at reducing the computational load imbalance as well as the size of the interfaces. The new algorithms demonstrate significant improvements when applied to generate overlapping partitions in the context of a parallel mixed finite element/finite volume three-dimensional flow solver.
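As a rough illustration of turning a non-overlapping partition into an overlapping one, the sketch below grows each part by one layer of neighbouring elements. The element adjacency map, the `part_of` map, and the function name `add_overlap_layer` are hypothetical, and this is a generic one-layer halo construction, not the method or the optimization strategy described in the paper.

```python
from collections import defaultdict


def add_overlap_layer(adjacency, part_of):
    """Grow each part of a non-overlapping partition by one layer of neighbours.

    adjacency: dict mapping element id -> iterable of adjacent element ids
    part_of:   dict mapping element id -> id of the part that owns it
    Returns a dict mapping part id -> set of element ids (owned plus overlap).
    """
    overlapping = defaultdict(set)
    # Start from the owned elements of every part.
    for elem, part in part_of.items():
        overlapping[part].add(elem)
    # Add every neighbour that is owned by a different part.
    for elem, part in part_of.items():
        for neighbour in adjacency[elem]:
            if part_of[neighbour] != part:
                overlapping[part].add(neighbour)
    return dict(overlapping)


# Hypothetical 1D chain of elements 0-1-2-3 split into two parts.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
part_of = {0: 0, 1: 0, 2: 1, 3: 1}
overlap = add_overlap_layer(adjacency, part_of)
```

In this toy chain each part gains the single element across the interface, so the interface size grows with the number of cut adjacencies, which is the quantity an optimization step would try to keep small.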
ISBN: 1892512416 (print)
We investigate the management of flocking mobile objects using a parallel message-passing computer cluster. An octree, a data structure well known for its use in managing a 3D space, is adapted to "span" the cluster. Objects are distributed in the tree, and partitions of the tree are distributed among the processors in such a way that a minimum of global information needs to be shared by the processors. When objects move, the tree is modified accordingly; this in turn may cause partitions to migrate between processors. Two constraints drive the distribution algorithm: (1) minimizing message traffic by clustering nearby objects on the same processor, and (2) processor load-balancing. Boids, flocking artificial life forms, embody the objects in this study. The performance of the system is measured in terms of the inter-processor message traffic as a function of the number, interactivity, and mobility of objects. An application of the scheme allows external clients to view objects in specified spatial loci.
ISBN: 9780769535449 (print)
Java is a valuable and emerging alternative for the development of parallel applications, thanks to the availability of several Java message-passing libraries and its full multithreading support. The combination of both shared and distributed memory programming is an interesting option for parallel programming of multi-core systems. However, concerns about Java performance are hindering its adoption in this field, although it is difficult to evaluate its performance accurately due to the lack of standard benchmarks in Java. This paper presents NPB-MPJ, the first extensive implementation of the NAS Parallel Benchmarks (NPB), the standard parallel benchmark suite, for Message-Passing in Java (MPJ) libraries. Together with the design and implementation details of NPB-MPJ, this paper gathers several optimization techniques that can serve as a guide for the development of more efficient Java applications for High Performance Computing (HPC). NPB-MPJ has been used in the performance evaluation of Java against C/Fortran parallel libraries on two representative multi-core clusters. Thus, NPB-MPJ provides an up-to-date snapshot of MPJ performance, whose comparative analysis of current Java and native parallel solutions confirms that MPJ is an alternative for parallel programming of multi-core systems.
ISBN: 1932415262 (print)
Despite their crucial impact on the performance of P2P systems, very little is known about the exchanges actually processed in such networks. We propose here a first step towards a better understanding of these exchanges by measuring them in a running environment and performing statistical analyses of the obtained data. In particular, we use tools from complex network analysis.
ISBN: 9781932415582 (print)
Many recent large-scale distributed computing applications utilize spare processor cycles of personal computers. The resulting distributed computing platforms provide computational power that previously was available only through the use of expensive supercomputers. However, distributed computations running in untrusted or unstable environments raise a number of concerns, including the potential for disrupting computations and many security issues. It is shown that the standard techniques for managing these issues, i.e., replication and/or redundancy, still do not always resolve situations where computational integrity is threatened. This paper presents a generalized strategy for applying redundancy in a manner that is tunable and provides several advantages. In addition, the improvement is achieved without an increase in the amount of computation required by participants and with only a slight increase in task tracking overhead.
ISBN: 1892512459 (print)
The paper presents performance results of parallel matrix LU decomposition algorithms with row-wise cyclic striping and partial pivoting implemented within several message-passing distributed environments. Two environments have been chosen: Message Passing Interface (MPI, with point-to-point communication routines; the environment is portable but not interoperable) and Parallel Virtual Machine (PVM, a dynamic collection of potentially heterogeneous computing resources; the environment is portable and interoperable). The following performance measures are studied and results are presented: standard speedup, scaled speedup, and efficiency. The conclusions are supported by experiments conducted for matrix sizes up to 2000 x 2000 with 2, 4, 6, and 8 processors. From the experiments it is apparent that if an application is going to be developed on a massively parallel processor or on a homogeneous network, then MPI may be the choice because of its good communication performance. If an application is going to be developed on a heterogeneous network, then PVM would appear to be the preferred choice. The interoperability of PVM, compared with MPI, comes at a price. PVM parallel programs level off with sequential programs for matrix sizes of 600 x 600 and higher, compared with MPI parallel programs, which achieve the same performance result for matrices of 400 x 400.
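The terms used above can be made concrete with a small serial sketch. It assumes the common meaning of row-wise cyclic striping (row i belongs to process i mod p) and the usual definitions of standard speedup and efficiency; the function names are hypothetical, the LU routine runs on one process only, and nothing here reproduces the paper's MPI or PVM implementations. Scaled speedup is left out because its definition depends on how the problem size is grown.

```python
def rows_owned(rank, n, nprocs):
    """Rows held by process `rank` under row-wise cyclic striping (assumed i % p)."""
    return list(range(rank, n, nprocs))


def lu_partial_pivot(a):
    """Serial in-place LU decomposition with partial pivoting.

    `a` is a square list-of-lists matrix. On return its strict lower triangle
    holds the multipliers of L and its upper triangle holds U. The returned
    list gives, for each position, the index of the original row now stored there.
    """
    n = len(a)
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0:
            raise ValueError("matrix is singular")
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        # In a striped parallel version, each process would update only
        # the rows i > k that it owns.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return perm


def speedup(t_seq, t_par):
    """Standard speedup: sequential time divided by parallel time."""
    return t_seq / t_par


def efficiency(t_seq, t_par, nprocs):
    """Efficiency: speedup divided by the number of processors."""
    return speedup(t_seq, t_par) / nprocs


matrix = [[2.0, 1.0], [4.0, 3.0]]
perm = lu_partial_pivot(matrix)
```

Cyclic striping spreads the shrinking active submatrix across all processes, which is why it is a common choice for LU: a block distribution would leave the processes holding the first rows idle early.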
ISBN: 1892512416 (print)
This paper presents the design of a new bit-serial floating-point unit (FPU). It has been developed for the processors of the Instruction Systolic Array parallel computer model. In contrast to conventional bit-parallel FPUs, the bit-serial approach requires different data formats. Our FPU uses an IEEE-compliant internal floating-point format that allows fast least significant bit (LSB)-first arithmetic and can be efficiently implemented in hardware.
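To show what LSB-first bit-serial arithmetic means in general, the sketch below adds two unsigned integers one bit per step while keeping a single carry bit, as a serial adder would. It is only an integer illustration with hypothetical helper names; it does not model the paper's floating-point format or FPU design.

```python
def to_lsb_first(value, width):
    """Return the `width` low bits of a non-negative integer, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_lsb_first(bits):
    """Rebuild a non-negative integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits, b_bits):
    """Add two equal-length LSB-first bit streams, one bit position per step.

    Returns the sum LSB first, with the final carry appended as the top bit.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("bit streams must have the same length")
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)            # sum bit of a full adder
        carry = (a & b) | (carry & (a ^ b))  # carry into the next step
    out.append(carry)
    return out


total = from_lsb_first(bit_serial_add(to_lsb_first(13, 4), to_lsb_first(11, 4)))
```

Processing the least significant bit first lets the carry travel in the same direction as the data stream, so each step needs only the current bits and one stored carry.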
ISBN: 1932415262 (print)
It is well known that vector clocks perfectly capture the causality relationship among events in a distributed system. However, there are some interesting properties of vector clocks that are still to be explored. In particular, we are interested in discovering whether there is an efficient procedure for deciding if a given set of vector clocks is contained in some distributed history. We call this the possible vector clock set problem.
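For readers unfamiliar with how vector clocks encode causality, the sketch below applies the standard comparison rule: one event happened before another when its clock is less than or equal in every component and differs in at least one. The function names are hypothetical, and the code does not address the possible vector clock set problem itself.

```python
def happened_before(u, v):
    """True if the event with vector clock u causally precedes the one with v."""
    u, v = tuple(u), tuple(v)
    if len(u) != len(v):
        raise ValueError("vector clocks must have the same number of entries")
    # Componentwise <= with at least one strict inequality.
    return all(a <= b for a, b in zip(u, v)) and u != v


def concurrent(u, v):
    """True if neither event causally precedes the other and the clocks differ."""
    u, v = tuple(u), tuple(v)
    return u != v and not happened_before(u, v) and not happened_before(v, u)


# Hypothetical clocks from a three-process system.
send_event = (1, 0, 0)
receive_event = (2, 1, 0)
unrelated_event = (0, 0, 1)
```

Here `happened_before(send_event, receive_event)` holds, while `unrelated_event` is concurrent with both of the other events because neither clock dominates the other.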
ISBN: 1932415610 (print)
Real-time systems are becoming increasingly important in our society. Their applications can be found almost everywhere, ranging from nuclear reactors and automotive controllers to games and graphics animation. Real-Time Linux (RTLinux) is one such real-time system: a small hard real-time kernel that can run Linux as its lowest-priority thread. In this paper, we focus on understanding the efficiency of a working RTLinux kernel in handling real-time tasks. Beginning with a brief review of the system and the methods of performance analysis, we performed and carefully documented a group of experiments to test the performance of the system. The experimental results are then presented with simple statistical analysis. Based on the case study, we conclude that RTLinux is capable and feasible as a real-time operating system. We also remark on our further study of the RTLinux system.
An effective resolution multiprocessor can be built from distributed processing, logic programming, and interface elements. Widely used, portable components can be modularly composed into a portable parallel system that displays good resistance to premature obsolescence by software evolution. A virtual multiprocessor offering common message passing and configuration services integrates a distributed mesh of sequential resolution engines. Users configure and control the resolution engines and virtual multiprocessor through a GUI, using an embedded command language to drive its facilities. Prolog programs either explicitly control parallel execution through message passing or rely on program transformation techniques to extract parallelism implicitly. Copyright (C) 1996 Elsevier Science Ltd