Interest has recently grown in adding reconfigurable buses to existing parallel architectures. Among these, the Reconfigurable Mesh (RM) has drawn much attention because of its simplicity. This paper presents two O(1)-time algorithms to compute the contour of the maximal elements of N planar points on the RM. The first algorithm employs an RM of size N × N, while the second uses a 3-D RM of size √N × √N × √N.
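To make the problem statement concrete, here is a minimal sequential sketch of finding the maximal elements of a planar point set. It is not one of the paper's O(1) RM algorithms. It assumes the usual definition that a point is maximal when no other point has both x and y at least as large, and the function name `maximal_elements` is hypothetical.

```python
def maximal_elements(points):
    """Return the maximal points of a planar set, sorted by increasing x.

    Assumption: a point is maximal if no other distinct point has
    x >= its x and y >= its y. Sequential sweep, O(N log N) time.
    """
    best_y = float("-inf")
    maxima = []
    # Sweep from right to left; among equal x, visit the highest y first.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        # A point survives only if it is strictly higher than everything to its right.
        if y > best_y:
            maxima.append((x, y))
            best_y = y
    maxima.reverse()
    return maxima


if __name__ == "__main__":
    print(maximal_elements([(1, 4), (2, 3), (3, 1), (1, 1)]))
```

Joining consecutive points of the returned list with axis-parallel segments gives the staircase-shaped contour.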
ISBN: 0818680679 (print)
The performance of a distributed system depends on the efficiency of job distribution among processing nodes, as well as on that of its system architecture and operating system. The paper presents ParaC, an extended C language that supports efficient parallel programming on distributed systems. ParaC is designed to reduce the effort of job distribution in distributed programming environments. Our design covers the design goals for the parallel language, the definition of a programming model, and the ParaC constructs. The paper also addresses detailed design issues related to translation and presents our prototype.
Model updating has now become a well-known field of application in engineering. The following is restricted to spatially discretized mathematical models. First, the classical sensitivity (including sensitivities of th...
ISBN: 0818682043 (print)
The paper concentrates on the problem of how to integrate genetic programming, neural networks, and autonomous agents with some symbolic AI techniques. For this purpose, it introduces and employs the $-calculus, a general model of computation with a quantitative aspect (cost) that naturally allows optimization and modification to be expressed in dynamic parallel AI systems. The paper presents the basic operators of the calculus and a basic inference engine, the so-called modifying algorithm, used for problem solving. The approach is then illustrated through a series of examples from various domains, including symbolic and subsymbolic systems.
We present DIMMA (Distribution-Independent Matrix Multiplication Algorithm), a new parallel matrix multiplication algorithm for distributed-memory concurrent computers that is fast and scalable and whose performance is independent of the data distribution across processors. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine on each processor, whether the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
ISBN: 0818678763 (print)
A synchronous checkpointing algorithm coordinates a set of processes in taking checkpoints in such a way that the set of local checkpoints always forms part of a consistent global system state. Whenever a process p requests to take a checkpoint, a set of processes, called the cohorts set of p, must be checked, and some of them may also have to take checkpoints in order to preserve system consistency. Although several synchronous checkpointing algorithms have been proposed in the literature, most of them do not address performance. In this paper we propose an efficient distributed algorithm for synchronous checkpointing. A proof of correctness and an analysis of the algorithm's efficiency are presented. It is shown that the algorithm has better message and time complexity than the existing algorithms. The proposed method can also be applied to improve the performance of the rollback operation, which always requires synchronization of the inter-dependent processes.
ISBN: 0818682272 (print)
An optimal parallel algorithm for computing all-pairs shortest paths on doubly convex bipartite graphs is presented here. Our parallel algorithm runs in O(log n) time with O(n^2/log n) processors on an EREW PRAM and is time-and-work-optimal. As a by-product, we show that the problem can be solved optimally by a sequential algorithm in O(n^2) time on any adjacency list or matrix representing a doubly convex bipartite graph. This result improves on recent work on the problem for bipartite permutation graphs, which are properly contained in doubly convex bipartite graphs.
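For comparison, a generic baseline for all-pairs shortest paths on any unweighted graph runs one breadth-first search per vertex, costing O(n(n + m)) time for n vertices and m edges. The sketch below is that baseline only, not the paper's algorithm, and it does not exploit the doubly convex structure. The function name and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def all_pairs_shortest_paths(adj):
    """Baseline all-pairs shortest paths by repeated BFS.

    adj: dict mapping each vertex to an iterable of its neighbours
         (unweighted, undirected graph; every vertex appears as a key).
    Returns dist with dist[u][v] = number of edges on a shortest u-v path;
    unreachable vertices are simply absent from dist[u].
    """
    dist = {}
    for source in adj:
        d = {source: 0}
        frontier = deque([source])
        while frontier:
            u = frontier.popleft()
            for v in adj[u]:
                if v not in d:  # first visit gives the shortest distance
                    d[v] = d[u] + 1
                    frontier.append(v)
        dist[source] = d
    return dist


if __name__ == "__main__":
    # A small bipartite path a1 - b1 - a2 - b2 (hypothetical example data).
    graph = {"a1": ["b1"], "b1": ["a1", "a2"], "a2": ["b1", "b2"], "b2": ["a2"]}
    print(all_pairs_shortest_paths(graph)["a1"])
```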
PVM-based parallel/distributed computation tools have been designed, implemented, and applied to two important mathematical algorithms. The tools make PVM easier to use and applicable to a wider class of computations. The application shows how advanced algebraic algorithms can take advantage of modern parallel/distributed computing with the aid of such tools. The tools and interfaces described include PVM-ET (a set of enhancement tools for PVM), PvmJobs (a general bag-of-jobs library that works with any user-created job structure in a master/slave paradigm), and SaclibPvm (a simple software package interfacing SACLIB to PVM). The ability to interface symbolic computing to PVM allows us to tackle the parallelization of the Gröbner bases algorithm and the characteristic sets method, two very compute-intensive algorithms important in algebraic computations. These algorithms, their parallelization, and experimental results are presented.
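The bag-of-jobs paradigm mentioned above can be illustrated with a small sketch: a master fills a shared bag with jobs, and idle workers repeatedly take the next job until the bag is empty. This is not the PvmJobs API and does not use PVM. It is a single-machine illustration built on Python's standard `queue` and `threading` modules, and the name `run_bag_of_jobs` is hypothetical.

```python
import queue
import threading


def run_bag_of_jobs(jobs, worker, num_workers=4):
    """Run worker(job) for every job in the bag; return results in job order.

    Illustrative only: threads stand in for the remote worker processes
    that a PVM-based library would spawn.
    """
    bag = queue.Queue()
    for index, job in enumerate(jobs):
        bag.put((index, job))  # the master fills the bag
    results = [None] * bag.qsize()

    def worker_loop():
        while True:
            try:
                index, job = bag.get_nowait()  # grab the next available job
            except queue.Empty:
                return  # bag is empty, this worker is done
            results[index] = worker(job)

    threads = [threading.Thread(target=worker_loop) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_bag_of_jobs([1, 2, 3, 4, 5], lambda n: n * n))
```

Because workers pull jobs on demand, uneven job sizes balance themselves: a worker that finishes early simply takes another job.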
Collective communications such as broadcast and reduction are commonly used in data parallel programs. Understanding the performance of these primitive communications is important both for characterizing parallel systems and for analyzing the performance of parallel applications running on specific systems. We measured the performance of collective communication operations on several multiprocessor systems. In this paper, we report experimental results for collective communication performance on distributed memory systems. We also describe how the performance of data parallel programs can be predicted from the performance of the primitives.
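As a reminder of what these primitives do, the sketch below simulates a broadcast and a reduction over P processes using a binomial tree, a common textbook scheme that finishes in ceil(log2 P) communication rounds. It is a single-process simulation, not the implementation measured in the paper, and the function names are hypothetical.

```python
import operator


def tree_broadcast(value, num_procs):
    """Simulate a binomial-tree broadcast from rank 0.

    Returns (buffers, rounds): the value held by each rank afterwards and
    the number of communication rounds used. Assumes num_procs >= 1.
    """
    buffers = [None] * num_procs
    buffers[0] = value
    have = 1  # ranks 0 .. have-1 currently hold the value
    rounds = 0
    while have < num_procs:
        for r in range(have):
            dst = r + have  # each holder forwards to one new rank
            if dst < num_procs:
                buffers[dst] = buffers[r]
        have *= 2
        rounds += 1
    return buffers, rounds


def tree_reduce(values, op):
    """Simulate a binomial-tree reduction of one value per rank onto rank 0.

    Returns (result, rounds). Assumes values is non-empty and op is associative.
    """
    vals = list(values)
    step = 1
    rounds = 0
    while step < len(vals):
        for r in range(0, len(vals), 2 * step):
            src = r + step  # partner whose partial result is folded in
            if src < len(vals):
                vals[r] = op(vals[r], vals[src])
        step *= 2
        rounds += 1
    return vals[0], rounds


if __name__ == "__main__":
    print(tree_broadcast("x", 8))
    print(tree_reduce(range(1, 9), operator.add))
```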
This paper investigates the execution behavior of parallel sorting algorithms on an experimental multiprocessor (KuPP) and compares it with the performance predicted by the LogP and BSP (Bulk Synchronous Parallel) models. Since communication overhead is considered a primary candidate for improvement, a few schemes are devised and tested on KuPP to reduce the time spent in communication and thereby enhance overall performance. The authors believe the ideas can be adopted in other high-performance parallel computers.
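For readers unfamiliar with the two models, the sketch below evaluates their standard textbook cost formulas: a BSP superstep costs w + h·g + l (maximum local work w, maximum number of messages h sent or received by any processor, per-message gap g, barrier latency l), and under LogP a single small message takes L + 2o end to end (network latency L, send and receive overhead o each). The parameter values in the example are made up for illustration and are not measurements from KuPP.

```python
def bsp_cost(supersteps, g, l):
    """Predicted BSP running time.

    supersteps: iterable of (w, h) pairs, where w is the maximum local work
                and h the maximum messages sent or received by any processor.
    g: communication cost per message; l: barrier synchronization cost.
    All quantities are in the same (arbitrary) time unit.
    """
    return sum(w + h * g + l for w, h in supersteps)


def logp_message_time(L, o):
    """LogP end-to-end time for one small message: send overhead,
    network latency, then receive overhead."""
    return o + L + o


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the formulas.
    print(bsp_cost([(100, 10), (50, 4)], g=2, l=5))
    print(logp_message_time(L=6, o=2))
```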