Author: SCHEININE, AL
Parallel Computing Group, Center for Advanced Studies, Research and Development in Sardinia, Via Nazario Sauro 10, I-09123 Cagliari, Italy
An overview is given of parallel computing work being done at CRS4 (Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna). Parallel computation projects include: parallelization of a simulation of the interaction of high energy particles with matter (GEANT), domain decomposition for numerical solution of partial differential equations, seismic migration for oil prospecting, finite-element structural analysis, parallel molecular dynamics, a C++ library for distributed processing of specific functions, and real-time visualization of a computer simulation that runs as distributed processes.
A software package that allows one to carry out multiple alignment of protein and nucleic acid sequences of almost unlimited length and number of sequences is developed on C-DAC parallel computer-a transputer-based ma...
The parallel implementation of the revised simplex algorithm (RSA) using eta-factorization holds the promise of significant improvement in execution time, because the computation within an iteration of the algorithm exhibits a high degree of parallelism. However, the scheme employed to partition key data structures on a distributed memory parallel processor has a great impact on the achievable performance. The paper explores the trade-offs between block-row and block-column partitioning schemes for the matrix of constraint coefficients with respect to communication overheads and the granularity of parallel computations. The results of an approximate analysis of the compute-communication balance are compared with measurements from a practical implementation of the partitioning schemes on C-DAC's PARAM 8000 distributed memory parallel processor.
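To make the block-row versus block-column distinction concrete, the following is a minimal sketch, not the paper's implementation. It simulates the "processors" serially in plain Python for a dense matrix-vector product; the function names `block_ranges`, `matvec_block_row`, and `matvec_block_col` are hypothetical and chosen only for illustration.

```python
def block_ranges(n, p):
    """Split range(n) into p contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def matvec_block_row(A, x, p):
    """y = A x with rows distributed over p simulated ranks.

    Each rank owns whole rows, needs all of x, and produces its own
    slice of y, so the slices are simply concatenated.
    """
    y = []
    for lo, hi in block_ranges(len(A), p):
        y.extend(sum(a * b for a, b in zip(A[i], x)) for i in range(lo, hi))
    return y


def matvec_block_col(A, x, p):
    """y = A x with columns distributed over p simulated ranks.

    Each rank needs only its slice of x, but produces a full-length
    partial result; the partial results must be summed (a reduction).
    """
    m = len(A)
    y = [0] * m
    for lo, hi in block_ranges(len(A[0]), p):
        partial = [sum(A[i][j] * x[j] for j in range(lo, hi)) for i in range(m)]
        y = [u + v for u, v in zip(y, partial)]
    return y
```

In this toy setting the row scheme requires every rank to hold the full input vector, while the column scheme requires a sum over full-length partial vectors. Which cost dominates on a real machine depends on the matrix shape and the operations performed in each iteration.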
The notion of an elimination tree plays a very important role in parallel algorithms for sparse Cholesky decomposition and symbolic factorization, and in determining the mapping of columns of the matrix to processors. In this paper, we present a parallel algorithm to compute the elimination tree and simultaneously carry out symbolic factorization on a local memory multiprocessor. An existing parallel algorithm for symbolic factorization [5] requires the elimination tree to be computed separately. In our algorithm, we use a tree defined on the given matrix, called the false elimination tree, and convert it into the actual elimination tree. In the process, we also compute the structure of the columns of the factor matrix. Using the new parallel algorithm on grid problems, we found that it runs 2 to 3 times faster than the combined time for sequential computation of the elimination tree followed by parallel symbolic factorization using [5]. Also, our algorithm is the first parallel algorithm for elimination tree computation that gives a speed-up.
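For readers unfamiliar with the elimination tree, the sketch below computes it with the standard sequential method (path compression over ancestor links). It is not the parallel algorithm described in the abstract and does not use the false elimination tree. The input layout `upper_rows` is an assumption made for this example.

```python
def elimination_tree(n, upper_rows):
    """Return the parent array of the elimination tree of a symmetric matrix.

    upper_rows[k] lists the rows i < k with A[i][k] != 0 (assumed input
    layout). parent[j] == -1 marks a root.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut toward the current root
    for k in range(n):
        for i in upper_rows[k]:
            # Walk from i up to the root of its current subtree,
            # redirecting every visited node's shortcut to k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k  # i was a root; k becomes its parent
                i = nxt
    return parent
```

For a tridiagonal matrix the tree is a chain. For a matrix whose only off-diagonal entries lie in the last row and column, every other node is a direct child of the last node.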
The authors propose the design of a library environment, called PARUL (PARallel User Library), for distributed memory multiprocessor systems. An important feature of the environment is that the data distributed for a library function, as well as the results the function generates, can be retained in the network of processors for use by subsequent library functions. The user of the library is given full control over the set of variables that are retained in the network. The authors describe the implementation details of PARUL on a multi-transputer system (PARAM) and discuss its performance.
We present a computationally efficient method for deriving the most appropriate transformation and mapping of a nested loop for a given hierarchical parallel machine. This method is in the context of our systematic an...
The Logistics Management System (LMS) is a real‐time transaction‐based system combining decision technologies from AI, MS/OR, and decision support system that serves very successfully as a dispatcher or short‐inter...
We present a unimodular transformation called rotation to partition the iteration space of a perfectly nested loop. The transformation captures the individual transformations like loop interchange, reversal, and skewi...
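As background on the individual transformations named above, the sketch below writes loop interchange, reversal, and skewing for a doubly nested loop as 2x2 integer matrices with determinant ±1 and applies them to an iteration vector (i, j). It does not define or reproduce the paper's rotation transformation; all names are illustrative.

```python
def det2(m):
    """Determinant of a 2x2 integer matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul2(a, b):
    """Product a @ b of two 2x2 matrices (b is applied first)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, v):
    """Map the iteration vector v = (i, j) through the matrix m."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


INTERCHANGE = [[0, 1], [1, 0]]   # (i, j) -> (j, i)
REVERSAL = [[-1, 0], [0, 1]]     # (i, j) -> (-i, j): reverse the outer loop


def skew(f):
    """Skew the inner loop by factor f: (i, j) -> (i, j + f*i)."""
    return [[1, 0], [f, 1]]
```

Because the determinant of a product is the product of the determinants, any composition of these matrices is again unimodular. For example, `matmul2(INTERCHANGE, skew(1))` skews first and then interchanges.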
In this paper, we present a unimodular loop transformation called rotation as a simple, systematic and uniform method for partitioning the iteration spaces of doubly nested loops for execution on distributed memory mu...