the proceedings contains 86 papers. Topics discussed include algorithms, parallelprocessing systems, program processors, computer software, genetic algorithms, optimization, computer architecture, data structures, di...
详细信息
the proceedings contains 86 papers. Topics discussed include algorithms, parallelprocessing systems, program processors, computer software, genetic algorithms, optimization, computer architecture, data structures, distributed computer systems, network protocols, state estimation, program debugging, large scale systems and data communication systems.
In multicomputers, an appropriate data distribution is crucial for reducing communication overhead and therefore the overall performance. For this reason, data parallel languages provide programmers with primitives, s...
详细信息
In multicomputers, an appropriate data distribution is crucial for reducing communication overhead and therefore the overall performance. For this reason, data parallel languages provide programmers with primitives, such as BLOCK and CYCLIC that can be used to distribute data across the distributed memory. However, the languages do not aid the programmer as to how the distribution should be performed to maximize the performance. therefore, this paper presents an analysis of data distribution methods for overlapping computation and communication in the Gaussian elimination algorithm. the analysis indicates that both BLOCK and CYCLIC distributions have their own merit;however, BLOCK_CYCLIC with its hybrid characteristic consistently out performs its counterparts.
We propose a new family of trivalent network graphs with constant node degree 3 for design of massively parallel systems. these graphs are shown to be regular, to have logarithmic diameter in the number of nodes, and ...
详细信息
We propose a new family of trivalent network graphs with constant node degree 3 for design of massively parallel systems. these graphs are shown to be regular, to have logarithmic diameter in the number of nodes, and to be maximally fault tolerant. We investigate different algebraic properties of these networks (including fault tolerance) and propose simple and optimal routing algorithms. We also show that the proposed graph belongs to the well known family of Cayley graphs.
We discuss a style of designing parallel algorithms withthe following characteristics for a problem of the best known sequential time T(n): C1. Each processor spends O(T(n)÷P) time in computing. C2. Each process...
详细信息
We discuss a style of designing parallel algorithms withthe following characteristics for a problem of the best known sequential time T(n): C1. Each processor spends O(T(n)÷P) time in computing. C2. Each processor sends and/or receives O(n÷P) messages of one-word-size. C3. the number of communication phases is constant, independent of the input size n. We show this is possible to achieve for several fundamental computational problems.
Most studies of processor scheduling in multiprogrammed parallel systems have ignored the I/O performed by applications. Recent studies have demonstrated that significant I/O operations are performed by a number of di...
详细信息
Most studies of processor scheduling in multiprogrammed parallel systems have ignored the I/O performed by applications. Recent studies have demonstrated that significant I/O operations are performed by a number of different classes of parallel applications. this paper focuses on some basic issues that underlie scheduling in multiprogrammed parallel environments running applications with I/O. Characterization of the I/O behavior of parallel applications is discussed first. Based on simulation models this research investigates the influence of these I/O characteristics on processor scheduling.
there are several models that were proposed in recent years for message passing parallel systems. Examples are the postal model and its generalization the LogP model. In the postal model, a parameter λ is used to mod...
详细信息
there are several models that were proposed in recent years for message passing parallel systems. Examples are the postal model and its generalization the LogP model. In the postal model, a parameter λ is used to model the communication latency of the message passing system. In this paper, the gap between the theoritical modeling and the practical implementation is bridged. In particular, a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global marine operation is investigated. Results of the investigation are presented.
A renegable priority queue has been designed on two different types of network. the first design uses hypercube networks, and has a response time and a pipeline cycle time O(log p), where p is the maximum number of pr...
详细信息
A renegable priority queue has been designed on two different types of network. the first design uses hypercube networks, and has a response time and a pipeline cycle time O(log p), where p is the maximum number of processors that may access the design simultaneously. the second design uses reconfigurable meshes with both response time and pipeline cycle time being constants. Each of these designs uses O(p2m) processing elements, and have a capacity of pm, where m is a positive integer.
A new algorithm for dynamic programming on linear arrays is presented which uses a single data path and runs twice as fast using less than half the memory locations of the best previously known algorithm. the algorith...
详细信息
A new algorithm for dynamic programming on linear arrays is presented which uses a single data path and runs twice as fast using less than half the memory locations of the best previously known algorithm. the algorithm employs a redundant data technique in which a permuted copy of the dynamic programming table is maintained to reduce communication costs. the simplified control structure and exclusive use of shift registers for storage (i.e. no RAM is required) make the algorithm particularly suitable for VLSI implementation.
We present a straightforward and simple method of visualizing the execution of a distributed system as it is recorded in a collection of per-process traces. the technique is based directly on Lamport's space-time ...
详细信息
We present a straightforward and simple method of visualizing the execution of a distributed system as it is recorded in a collection of per-process traces. the technique is based directly on Lamport's space-time diagrams and the notion of causal precedence as embodied in vector time. Timestamping trace events with vector time provides the leverage required for the rapid display of the space-time diagram. It also allows us to define and display concurrent regions of the trace and to implement a fast algorithm for the evaluation of global state predicates over those regions. We describe an implementation of DTVS for distributed systems in which synchronous communication is used.
暂无评论