ISBN: 0818678763 (print)
This paper presents a new structured parallel programming model, "SEQ OF PAR", based on the Communication Closed Layer (CCL) principle of causal composition for parallel programs and the Bird-Meertens formalism (BMF) of locality-based parallel computation. The model is intended to support more general, architecture-independent parallel programming. It provides a structured approach to integrating task (or process) parallelism and data parallelism in one framework. The well-founded algebra of CCL and BMF also makes it possible to derive, optimize, and verify parallel programs through algebraic transformations. Experimental results suggest that this programming model is a promising route to efficient, portable parallel code.
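As a rough illustration of the locality-based style of computation that BMF describes (this is not the paper's SEQ OF PAR notation), the sketch below evaluates a list homomorphism in two layers: a local map-and-reduce over each part of the data, followed by a combine step. The names `chunks`, `homomorphism`, and `sum_of_squares` are hypothetical. The parts are processed one after another here, where a parallel runtime would process them concurrently, and `op` is assumed to be associative with identity `unit`.

```python
from functools import reduce
from operator import add


def chunks(xs, p):
    """Split xs into p contiguous, nearly equal parts (some may be empty)."""
    n = len(xs)
    bounds = [i * n // p for i in range(p + 1)]
    return [xs[bounds[i]:bounds[i + 1]] for i in range(p)]


def homomorphism(f, op, unit, xs, p=4):
    """Evaluate reduce(op) . map(f) as a local layer plus a combine layer.

    Assumption: op is associative and unit is its identity element.
    """
    # Layer 1 (conceptually parallel): map and reduce each part locally.
    partials = [reduce(op, map(f, part), unit) for part in chunks(xs, p)]
    # Layer 2: combine the p partial results.
    return reduce(op, partials, unit)


def sum_of_squares(xs, p=4):
    """Hypothetical example instance: f squares, op adds."""
    return homomorphism(lambda x: x * x, add, 0, xs, p)
```

Because `op` is associative, the result does not depend on how many parts the list is split into, which is the kind of algebraic property that makes such programs amenable to transformation.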
ISBN: 0818678763 (print)
In this paper we present the results of a parallel implementation of a heart field simulation algorithm. The application of biomagnetic fields offers wide scope for parallel algorithms. Pathological changes in the human body, especially in the heart muscle, can be diagnosed and localised by means of biomagnetic field parameters. The aim of this diagnostic method is to fit an individual reference model of a patient's heart field. Based on differences between the reference model and the measured biomagnetic field parameters, the type and position of defects in the heart can be determined. The most time-consuming components of the whole algorithm are the matrix computations, especially the matrix inversion. The matrix inversion can be implemented on a parallel distributed-memory system. In this paper we discuss the routing, the parallel matrix inversion, and the speedup for different network topologies as a function of the number of processors and the problem size.
ISBN: 0818678763 (print)
In this paper we study the parallel aspects of PCGLS, a basic iterative method whose main idea is to apply the preconditioned conjugate gradient method to the normal equations, and of the Incomplete Modified Gram-Schmidt (IMGS) preconditioner, for solving sparse least squares problems on massively parallel distributed-memory computers. The performance of these methods on this kind of architecture is always limited by the global communication required for the inner products. We describe the parallelization of PCGLS and the IMGS preconditioner with two improvements. One is to assemble the results of a number of inner products collectively, and the other is to create situations where communication can be overlapped with computation. A theoretical model of the computation and communication phases is presented, which allows us to determine the number of processors that minimizes the runtime. Several numerical experiments on the Parsytec GC/PowerPlus are presented.
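To make the role of the inner products concrete, here is a minimal sketch of plain, unpreconditioned CGLS in pure Python. It is a textbook formulation, not the paper's PCGLS or its IMGS preconditioner, and it does not implement the paper's grouping or overlapping of communication. The function names are hypothetical, `A` is assumed to be a dense list of rows with full column rank, and the comments mark the inner products that would each become a global reduction on a distributed-memory machine.

```python
def dot(u, v):
    # On a distributed-memory machine this would be a global reduction.
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Compute A x for A given as a list of rows."""
    return [dot(row, x) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def cgls(A, b, iters=50, tol=1e-12):
    """Minimise ||b - A x||_2 by conjugate gradients on the normal equations."""
    x = [0.0] * len(A[0])
    r = list(b)                      # residual b - A x for x = 0
    s = rmatvec(A, r)                # gradient direction A^T r
    p = list(s)
    gamma = dot(s, s)                # inner product (global reduction)
    for _ in range(iters):
        if gamma <= tol:
            break
        q = matvec(A, p)
        alpha = gamma / dot(q, q)    # inner product (global reduction)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)        # inner product (global reduction)
        beta = gamma_new / gamma
        p = [si + beta * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x
```

In this form each iteration contains two inner products separated by dependent vector updates, which is why the synchronisation cost of the reductions dominates as the processor count grows.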
ISBN: 0818678763 (print)
In the development of reactive systems, real-time constraints often have to be met. In such time-critical applications, heterogeneous multi-processor systems are used in order to fulfill these time constraints. This paper presents a hybrid partitioning method that uses a stochastic algorithm together with mixed integer linear programming. This method supports the development of time-critical systems. We assume that the algorithm to be analyzed is given in the form of a so-called task graph. The goal of the overall method is to fix, for every task, the processor that will execute it and the starting time of this execution. The main issue is a high-level-synthesis-like method for constructing a problem-specific multi-processor board. The presented methods have been fully implemented and tested.
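The output described above, a processor and a starting time for every task, can be pictured with a small sketch. It only checks a given schedule against a task graph and does not reproduce the paper's stochastic or MILP partitioning. All names are hypothetical, and two simplifications are assumed: each task has one duration regardless of processor (the paper targets heterogeneous processors), and communication delays between processors are ignored.

```python
def is_feasible(durations, deps, schedule):
    """Check a schedule against a task graph.

    durations: task -> execution time (assumed processor-independent)
    deps:      task -> list of predecessor tasks
    schedule:  task -> (processor, start_time)
    """
    # Precedence: a task may start only after all predecessors have finished.
    for task, preds in deps.items():
        for pred in preds:
            if schedule[pred][1] + durations[pred] > schedule[task][1]:
                return False
    # Exclusivity: tasks on the same processor must not overlap in time.
    by_proc = {}
    for task, (proc, start) in schedule.items():
        by_proc.setdefault(proc, []).append((start, start + durations[task]))
    for intervals in by_proc.values():
        intervals.sort()
        for (_, end_a), (start_b, _) in zip(intervals, intervals[1:]):
            if start_b < end_a:
                return False
    return True


def makespan(durations, schedule):
    """Finish time of the last task in the schedule."""
    return max(start + durations[task]
               for task, (_, start) in schedule.items())
```

A partitioning method of the kind the abstract describes would search over such schedules; a checker like this one only verifies that a candidate respects the dependencies and meets a given deadline.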
ISBN: 0818678763 (print)
Although highly parallel distributed-memory computers have existed for several years, the operating systems used on them have not fit the requirements very well. Most of them are designed for sequential, shared-memory parallel, or distributed computers. Examples are Unix on the IBM SP/2 [17] and Mach on the Intel Paragon. This results in poor scalability, caused by inefficient communication primitives designed for wide area networks or by wasted resources due to huge kernels (e.g. 8 MB per node is reported for Mach on the Paragon [16]), which is especially harmful in highly parallel systems with hundreds or thousands of nodes. With Cosy (Concurrent Operating System) we have shown that a well-structured and carefully designed system can be small (70 KB for the kernel, 372 total memory usage per node), efficient (33 µs for communication), and scalable (applications run efficiently on up to 1024 processors).
ISBN: 0769518753 (print)
Computing potential is wasted and left idle not only when applications are executed, but also when a user browses the Internet. To take advantage of this, an architecture named Parasite has been designed to use distributed and networked resources without disturbing local computation. This is the working principle of certain global computing projects, but our development introduces new ideas with respect to user intervention, the distributed programming paradigm, and the resident software on each user's computer. The project is based on developing software technologies and infrastructures that facilitate Web-based distributed computing. This paper outlines the most recent advances in the project and discusses the architecture developed and an experimental framework intended to validate this infrastructure.
ISBN: 9783642227141 (digital)
ISBN: 9783642227134 (print)
This volume is the second part of a four-volume set (CCIS 190, CCIS 191, CCIS 192, CCIS 193), which constitutes the refereed proceedings of the First International Conference on Computing and Communications, ACC 2011, held in Kochi, India, in July 2011. The 72 revised full papers presented in this volume were carefully reviewed and selected from a large number of submissions. The papers are organized in topical sections on database and information systems; distributed software development; human computer interaction and interface; ICT; internet and Web computing; mobile computing; multi agent systems; multimedia and video systems; parallel and distributed algorithms; security, trust and privacy.
Graph partitioning requires the division of a graph's vertex set into k equally sized subsets such that some objective function is optimized. High-quality partitions are important for many applications, whose objective functions are often NP-hard to optimize. Most state-of-the-art graph partitioning libraries use a variant of the Kernighan-Lin (KL) heuristic within a multilevel framework. While these libraries are very fast, their solutions do not always meet all user requirements. Moreover, due to its sequential nature, KL is not easy to parallelize. Its use as a load balancer in parallel numerical applications therefore requires complicated adaptations. We therefore previously developed an inherently parallel algorithm, called BUBBLE-FOS/C [H. Meyerhenke, B. Monien, S. Schamberger, Accelerating shape optimizing load balancing for parallel FEM simulations by algebraic multigrid, in: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, IPDPS'06, IEEE Computer Society, 2006, p. 57 (CD)], which optimizes partition shapes by a diffusive mechanism. However, it is too slow for practical use, despite its high solution quality. In this paper, besides proving that BUBBLE-FOS/C converges towards a local optimum of a potential function, we develop a much faster method for the improvement of partitionings. This faster method, called TRUNCCONS, is based on a different diffusive process, which is restricted to local areas of the graph and also contains a high degree of parallelism. By coupling TRUNCCONS with BUBBLE-FOS/C in a multilevel framework based on two different hierarchy construction methods, we obtain our new graph partitioning heuristic DIBAP. Compared to BUBBLE-FOS/C, DIBAP shows a considerable acceleration, while retaining the positive properties of the slower algorithm. Experiments with popular benchmark graphs show that DIBAP computes consistently better results than the state-of-the-art libraries METIS and JOSTLE. Moreover, with our
ISBN: 0769521320 (print)
This article gives a brief overview of theoretical advances, computing trends, applications, and future perspectives in parallel genetic algorithms. In all chapters, the information is divided into two periods, before and after the year 2000. The second period is more interesting and of higher importance, because it highlights recent research efforts and gives some hints about possible future trends. That is why we devote much space to the second period. As no such overview of the recent period of parallel genetic algorithms exists, we find our investigation to be important in many aspects.
ISBN: 0818681187 (print)
We exploited the recent advances in Internet connectivity and Web technologies to build Web-based parallel programming environments (WPPEs) that facilitate the development and execution of parallel programs on remote high-performance computers. A Web browser running on the user's machine provides a user-friendly interface to server-site user accounts and allows the use of parallel computing platforms and software in a convenient manner. The user may create, edit, and execute files through this Web browser interface. This new Web-based client-server architecture has the potential to be used as a future front-end to high-performance computer systems. We discuss the design and implementation of several prototype WPPEs that are currently in use at the Northeast Parallel Architectures Center and the Cornell Theory Center. These initial prototypes support high-level parallel programming with Fortran 90 and High Performance Fortran (HPF), as well as explicit low-level programming with the Message Passing Interface (MPI). We detail the lessons learned during the development process and outline the tradeoffs of various design choices in the realization of the design. We especially concentrate on providing server-site user accounts, mechanisms to access those accounts through the Web, and the Web-related system security issues.