Experiments on large data sets are computationally expensive. Signal processing analysis on a single CPU leads to unacceptably long execution times. The paper presents initial experiments on calculating the time-frequency power spectrum using the coarse-grained parallel programming technique. Experimental speedup factors are given and discussed. The measured speedup factor of the parallel time-frequency power spectrum calculation is sublinear, which indicates that the time-frequency power spectrum is a suitable application for parallel programming. The parallel efficiency is acceptable, with the lowest value of 75.1% occurring at N = 10. The maximum speedup factor of 9.1 is obtained at N = 12, with an efficiency of 75.3%.
ISBN: 0769511538 (print)
Presents a distributed implementation of the Structured Gamma programming language, a language based on the Gamma multi-set rewriting paradigm. In addition to the advantages introduced by Gamma, Structured Gamma offers implicit concurrent behavior and a type system that not only defines types but also verifies user-defined types automatically at compilation time. The problems and mechanisms involved in an MPI-based implementation of Structured Gamma, using a type-checking engine based on the most general unifier (MGU), are investigated.
We consider program modules (e.g., procedures, functions, and methods) as the basic unit for exploiting speculative parallelism in existing codes. We analyze how much inherent and exploitable parallelism exists in a set of C and Java programs on a set of chip-multiprocessor (CMP) architecture models, and identify the inherent program features, as well as the architectural deficiencies, that limit the speedup. Our data complement previous limit studies by indicating that the programming style (object-oriented versus imperative) does not seem to have any noticeable impact on the achievable speedup. Further, we show that as few as eight processors are enough to exploit all of the inherent parallelism. However, the memory-level data dependence resolution and thread management mechanisms of recent CMP proposals may impose overheads that severely limit the speedup obtained.
We develop a new software layer called the Automatic Parallel Detection Layer (APDL) for the automatic transformation of sequential code into parallel code. The main interest of this research is parallelism at the loop level, because significant parallelism in programs almost invariably occurs in loops. The proposed APDL has five processes for code transformation: parsing the sequential source code, analyzing its data dependences, partitioning, scheduling both tasks and data, and generating parallel source code. Many cases have been studied to evaluate the performance of the developed layer. For each case study, performance is evaluated by comparing the execution times of the sequential code, the code written by a parallel programmer, and the code output by APDL. Performance results show that APDL greatly improves the execution time with respect to sequential execution, and saves the high cost of a parallel programmer.
ISBN: 0769509878 (print)
Cellular automata are a nature-inspired parallel processing model. The model was proposed several years ago by J. von Neumann to simulate complex dynamical processes. In the past two decades, several models of cellular automata that differ from the original one proposed by von Neumann have been defined for modeling real-world systems and phenomena. This paper describes the design and implementation of standard and nonstandard parallel cellular automata in the CARPET language. CARPET is a cellular automata based language that has been implemented on MIMD parallel computers. The language is specifically designed for programming cellular computations, supporting concise and efficient coding of parallel cellular algorithms. The paper analyzes the main features of the language and describes how they can be exploited to implement different cellular automata on parallel computers, starting from the standard model and moving to its modifications and generalizations. Inhomogeneous, partitioned, asynchronous, and probabilistic cellular automata programmed in CARPET are presented.
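For readers unfamiliar with the standard model mentioned above, the following is a minimal Python sketch of one synchronous update of a one-dimensional binary cellular automaton. It is not CARPET code and does not reflect CARPET syntax; the function name `step`, the default rule number, and the periodic boundary are assumptions chosen for illustration.

```python
def step(cells, rule=110):
    """Apply one synchronous update of a 1D binary cellular automaton.

    `cells` is a list of 0/1 states. `rule` is an elementary rule number
    (0-255) in the usual Wolfram encoding. Periodic boundaries are an
    assumption made for this sketch.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left = cells[(i - 1) % n]
        centre = cells[i]
        right = cells[(i + 1) % n]
        # The three neighbourhood bits select one bit of the rule number.
        index = (left << 2) | (centre << 1) | right
        new_cells.append((rule >> index) & 1)
    return new_cells


if __name__ == "__main__":
    state = [0, 0, 0, 1, 0]
    for _ in range(3):
        print(state)
        state = step(state)
```

Every cell reads only the previous generation, so all cells of a generation can in principle be updated at the same time, which is the property that makes the model a natural fit for parallel machines.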
Authors: B. Krysztop, H. Krawczyk (Faculty of Electronics, Telecommunication and Informatics, Computer Architecture Department, Technical University of Gdańsk, Gdańsk, Poland)
A new approach for developing efficient and flexible component-based distributed applications is proposed. It is based on a new programming platform, TL (Transformation Language), which allows both the abstract sequential code and the parallel processing model of an application to be expressed. To minimize execution cost and maximize flexibility, a Distributed Partial Executor (DPE) tool and an optimization algorithm are introduced. An example distributed image processing application is considered and its optimization in TL is analyzed. The obtained results confirm the usability of the proposed methodology.
Incremental stack-copying is a technique which has been successfully used to support efficient parallel execution of a variety of search-based AI systems, e.g., logic-based and constraint-based systems. The idea of incremental stack-copying is to copy only the difference between the data areas of two agents, instead of copying them entirely, when distributing parallel work. In order to further reduce the communication during stack-copying and make its implementation efficient on message-passing platforms, a new technique, called stack-splitting, has recently been proposed. In this paper, we describe a scheme to effectively combine stack-splitting with incremental stack-copying, to achieve superior parallel performance in a non-shared memory environment. We also describe a scheduling scheme for this incremental stack-splitting strategy. These techniques are currently being implemented in the PALS system, a parallel constraint logic programming system.
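The idea of copying only the difference between two agents' data areas can be illustrated with a small Python sketch. This is a toy model under loud assumptions: each agent's stack is a plain list, the shared part is taken to be the common prefix of the two lists, and the names `common_prefix_len` and `incremental_copy` are hypothetical. It does not describe how PALS or any real system represents its stacks.

```python
def common_prefix_len(a, b):
    """Return the number of leading entries that two stacks share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def incremental_copy(source, target):
    """Make `target` equal to `source`, transferring only the difference.

    The target first drops everything above the shared part (a stand-in
    for backtracking to the common point), then receives only the entries
    of `source` above that point. Returns the number of entries copied.
    """
    shared = common_prefix_len(source, target)
    del target[shared:]
    delta = source[shared:]
    target.extend(delta)
    return len(delta)


if __name__ == "__main__":
    busy = ["cp0", "cp1", "cp2", "cp3"]
    idle = ["cp0", "cp1", "other"]
    copied = incremental_copy(busy, idle)
    print(idle, "entries copied:", copied)
```

In this toy example only two of the four entries are transferred, whereas a full copy would send all four.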
ISBN: 0780372239 (print)
Presents a visualization technique based on particle tracking. The technique consists of defining a set of points distributed on a closed surface and following the deformations of the surface as the velocity field changes in time. Deformations of the surface contain information about the dynamics of the flow; in particular, it is possible to identify zones where stretching and folding of the flow occur. Because the points on the surface are independent of each other, the trajectory of each point can be calculated concurrently. Two parallel algorithms are studied: the first for a shared-memory Origin 2000 supercomputer and the second for a distributed-memory PC cluster. The technique is applied to a fluid moving by natural convection inside a cubic container.
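The independence of the surface points can be sketched in Python as follows. Everything here is an assumption made for illustration: the velocity field is a simple rigid rotation rather than a natural convection flow, the integrator is explicit Euler, and the names `velocity`, `trace`, and `trace_all` are hypothetical. A thread pool is used only to show that each trajectory can be computed without reference to the others; it is not one of the two algorithms studied in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def velocity(point, t):
    """Hypothetical velocity field: rigid rotation about the z axis."""
    x, y, z = point
    return (-y, x, 0.0)


def trace(point, dt=0.01, steps=100):
    """Follow one point through the field with explicit Euler steps."""
    x, y, z = point
    for k in range(steps):
        vx, vy, vz = velocity((x, y, z), k * dt)
        x, y, z = x + dt * vx, y + dt * vy, z + dt * vz
    return (x, y, z)


def trace_all(points, workers=4):
    """Trace every point independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(trace, points))


if __name__ == "__main__":
    surface = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5), (-1.0, 0.0, 1.0)]
    for start, end in zip(surface, trace_all(surface)):
        print(start, "->", end)
```

Because no trajectory depends on another, the same decomposition applies whether the points are divided among threads on a shared-memory machine or among processes on a cluster.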