This paper makes an efficient improvement of processor complexity while solving some connectivity problems in processor arrays with reconfigurable bus systems. We first derive two constant time algorithms in the propo...
详细信息
This paper makes an efficient improvement of processor complexity while solving some connectivity problems in processor arrays with reconfigurable bus systems. We first derive two constant time algorithms in the proposed parallelprocessing system for computing the dominators and the dominator tree of an undirected graph either using a 3-D n×n×n processors or a 2-D n2×n2 processors, where n is the number of vertices of the graph. Then based on the dominator tree algorithm, we also solve many other graph connectivity problems in constant time. They are the articulation points, bridges, biconnected components, and bridge-connected components problems in undirected graphs.
Whole array operations and array section operations are important features of many data-parallel languages. Efficient implementation of these operations on distributed-memory multicomputers is critical to the scalabil...
详细信息
ISBN:
(纸本)0818674601
Whole array operations and array section operations are important features of many data-parallel languages. Efficient implementation of these operations on distributed-memory multicomputers is critical to the scalability and high-performance of data-parallel programs. We present an approach for analyzing communication patterns induced by array operations and for scheduling message flow based on the information. Our scheduling algorithm guarantees contention-free data transfer and utilizes network resources optimally. It incurs little overhead and is suitable to be used in compilers and in runtime libraries. We also present simulation results that demonstrate the algorithm's superiority to the asynchronous transfer mode that is commonly used for this type of communication.
Consider a family C of continuous functions stored, in discretized form, one function per row in a mesh with multiple broadcasting of size √n215;√n. In a number of Computer Aided Geometric Design applications it i...
详细信息
ISBN:
(纸本)0818674601
Consider a family C of continuous functions stored, in discretized form, one function per row in a mesh with multiple broadcasting of size √n×√n. In a number of Computer Aided Geometric Design applications it is necessary to answer point location queries with respect to the given functions: e.g. cubic B-splines, control polygons of Bezier curves, etc. The main contribution of this work is to show that in such a domain an arbitrary collection of m, (1≤m≤n), point location queries can be answered in Θ(m/√n) time. We show that this is best possible on this platform. Our algorithms do not ignore the I/O time and see it as an important component of the overall processing time. In this regard, our algorithms feature a very attractive property, namely that the I/O time and the processing time are essentially the same.
The proceedings contain 110 papers. The special focus in this conference is on Programming Environment and Tools. The topics include: High-performance distributed computing;design and implementation of a parallel arch...
ISBN:
(纸本)9783540616269
The proceedings contain 110 papers. The special focus in this conference is on Programming Environment and Tools. The topics include: High-performance distributed computing;design and implementation of a parallel architecture for biological sequence comparison;dynamic load balancing in parallel database systems;distributed array query and visualization for high performance fortran;annai scalable run-time support for interactive debugging and performance analysis of large-scale parallel programs;on the implementation of a replay mechanism;concepts and functionalities of the DOSMOS-trace monitoring tool;an open monitoring system for parallel anddistributed programs;an adaptive cost system for parallel program instrumentation;a performance tuning tool for DSM-based parallel computers;cautious, machine-independent performance tuning for shared-memory multiprocessors;an environment for parallel programming on networks of heterogeneous workstations;an integrated environment to design parallel object-oriented applications;extending the message-passing interface;a refinement methodology for developing data-parallelapplications;efficient block cyclic data redistribution;optimal grain size computation for pipelined algorithms;dynamic redistribution on heterogeneous parallel computers;supporting distributed sparse matrix objects;low-latency communication over fast ethernet;a comparison of input and output driven routers;a pattern-associative router for interconnection network adaptive algorithms and on stack-graph OPS-based lightwave networks.
The author parallelized a tertiary structure search algorithm on a distributed memory parallel computer, Cenju-3, utilizing a standard message passing interface (MPI) parallel library. For parallelization scheme, a ma...
详细信息
ISBN:
(纸本)0818674601
The author parallelized a tertiary structure search algorithm on a distributed memory parallel computer, Cenju-3, utilizing a standard message passing interface (MPI) parallel library. For parallelization scheme, a master-workers model is used. I analyzed the total performance by utilizing a M/M/1 queueing model and Jackson's model, and clearly explained the actual turn around times. Tertiary structure search is a computationally intensive task. When test sequences are distributed to all processors, a single key sequence can be tested independently. Thus high parallelization results are anticipated. Since database is allocated on a master processor, worker processors should acquire test sequences from the master processor. Sometimes a worker should wait until other workers obtain test sequences from the master processor. By this waiting, the total performance will saturate when the number of processors proceeds certain level. I analyzed the total performance by utilizing a M/M/1 queueing model and Jackson's model. By using these models, the actual turn around times are explained clearly.
The growing interest in using clusters of workstations as the target platform for high-performance applications, has again emphasized the need for support tools that can be used during application design. In this pape...
详细信息
The growing interest in using clusters of workstations as the target platform for high-performance applications, has again emphasized the need for support tools that can be used during application design. In this paper we present a graphical technique, called ADL-D, that allows a developer to construct an application in terms of communicating processes. The technique distinguishes itself from others by its use of highly orthogonal concepts, and the support for automated code generation. Developers are encouraged to concentrate on designing components in isolation, making the complex design space more manageable than would otherwise be the case. ADL-D can be used from the early phases of application design through phases that concentrate on algorithmic design, and final implementation on some target platform. Rather than presenting details of ADL-D, we use it here as a vehicle for a more general discussion on design level support for parallel anddistributedapplications. In this discussion, an emphasis is put on the design of dynamic communication structure, i.e. structures that can change during runtime.
We present a simple and general parallel sorting scheme, ZZ-sort, which can be used to derive a class of efficient in-place sorting algorithms on realistic parallel machine models. We prove a tight bound for the worst...
详细信息
We present a simple and general parallel sorting scheme, ZZ-sort, which can be used to derive a class of efficient in-place sorting algorithms on realistic parallel machine models. We prove a tight bound for the worst case performance of ZZ-sort. We also demonstrate the average performance of ZZ-sort by experimental results obtained on a MasPar parallel computer. Our experiments indicate that ZZ-sort can be incorporated into a distributed memory parallel computer system as a standard routine, and this routine is useful for space critical situations. Finally, we show that ZZ-sort can be used to convert a non-adaptive parallel sorting algorithm into an in-place and adaptive one by considering the problem of sorting an arbitrarily large input on fixed-size reconfigurable meshes.
There are many features that facilitate high performance digital signal processing (DSP) design built into the field programmable gate array (FPGA) architecture. The FPGA can provide all the benefits and has all the f...
详细信息
ISBN:
(纸本)1864352094
There are many features that facilitate high performance digital signal processing (DSP) design built into the field programmable gate array (FPGA) architecture. The FPGA can provide all the benefits and has all the features of a fixed point programmable DSP. The performance advantage of FPGA-based DSP design comes from the ability to build parallel data flow structures internal to the FPGA. FPGA-based DSP offers a scaleable methodology for DSP design implementation based technique uses distributed arithmetic. The scalability of bit-serial and bit-paralleldistributed arithmetic optimizes the design by implementing the DSP function in an optimized FPGA-based DSP solution based on performance requirements.
We consider the problems of distributed ranking and sorting on a Coterie, a communication structure which has proven to be a good candidate as underlying interconnection network for distributedprocessing. Ranking and...
详细信息
We consider the problems of distributed ranking and sorting on a Coterie, a communication structure which has proven to be a good candidate as underlying interconnection network for distributedprocessing. Ranking and sorting problems are harder than a consensus one, a vital and well studied problem in distributedprocessing, in that the later one computes for only one function (e.g. summation), while the former one actually performs n functions, as ranking is to rank the key in each of n sites. The currently best known decentralized consensus protocols on a coterie uses O(n√n) messages, and requires two rounds of message exchange. In this paper we show that both ranking and sorting can be done on a coterie with the same message complexity although the problems we investigate are much harder. We first present a two-round ranking algorithm which requires only O(n√n) messages. Then using this ranking algorithm, we obtain a sorting algorithm which also uses only O(n√n) messages, but requires two more rounds of message exchange. Our schemes are optimal in the sense that the lower bound of messages needed for achieving a consensus is Ω(n√n) (a trivial lower bound shown in [7]).
This paper describes the management of heterogeneity in Stardust, an environment for parallel programming above networks of heterogeneous machines, which can include distributed memory multicomputers and networks of w...
详细信息
暂无评论