ISBN: 0780342291 (print)
Several interesting superscalar and VLIW (very long instruction word) processors are currently coming onto the market. These processors exploit so-called instruction-level parallelism (ILP): multiple operations are executed each cycle. This paper analyzes the data path complexity of ILP processors, in particular of VLIWs. It demonstrates that their complexity gets out of control when scaling to very high performance. Several methods for reducing this complexity are investigated. Essentially, these methods trade hardware complexity for software complexity, i.e., they perform as much work as possible at compile time. This results in a new architectural approach called transport triggering. Its concept and characteristics are outlined. Applying this concept yields a number of hardware advantages and introduces several new scheduling optimizations.
ISBN: 0780342291 (print)
Active Expressions (Ae) is a language-based model for the instantiation of type-safe concurrent applications. Using facilities included in modern object-oriented languages, Ae allows the definition of communication and synchronization patterns that, when combined with user-provided functionality through well-defined interfaces, instantiate complete concurrent applications. The approach has two unique characteristics. First, it shows that common patterns of concurrency can be expressed using language-provided facilities. Second, the model can be implemented without requiring any complex user interfaces, preprocessing stages or language extensions. It also shows that the pattern-based approach has the potential to reduce the complexity of developing concurrent applications.
ISBN: 0780342291 (print)
This paper presents an efficient parallel algorithm for computing the mutual range-join of N sets of numbers on shared-nothing hypercube computers. The algorithm iteratively joins each set to the mutual range-join of the preceding sets. Each join is performed on all processors of the hypercube in parallel. The algorithm uses a global sorting method to distribute the elements of the first set evenly across all processors in increasing order, a new data balancing technique to distribute the elements of subsequent sets to match the intermediate set at each processor and to compensate for join skew, and a new efficient local range-join procedure. We analyse the performance of this algorithm and demonstrate that it improves on the best previously published algorithm for this problem when the join selectivity factor is small. The method can also be applied to similar problems such as band-join and equi-join.
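To make the join operation concrete, here is a minimal single-processor sketch of a range-join between two sets of numbers. It is not the paper's local range-join procedure, which the abstract does not describe. The sketch assumes that a range-join with bounds `lo` and `hi` pairs every `r` with every `s` such that `lo <= s - r <= hi`; this definition, the function name `local_range_join` and its parameters are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right


def local_range_join(r_values, s_values, lo, hi):
    """Return all pairs (r, s) with lo <= s - r <= hi.

    Assumed definition of a range-join, for illustration only.
    """
    # Sort one side once so each probe is a pair of binary searches.
    s_sorted = sorted(s_values)
    pairs = []
    for r in r_values:
        # Matching s values lie in the closed interval [r + lo, r + hi].
        start = bisect_left(s_sorted, r + lo)
        end = bisect_right(s_sorted, r + hi)
        pairs.extend((r, s) for s in s_sorted[start:end])
    return pairs


if __name__ == "__main__":
    print(local_range_join([1, 5], [2, 3, 7, 10], 1, 2))
```

Under this assumed definition, setting `lo == hi == 0` degenerates to an equi-join, which matches the abstract's remark that the method extends to related join types.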
ISBN: 0780342291 (print)
Multiple processors are employed to improve the performance of database systems, and parallelism can be exploited at three levels in query processing: intra-operation, inter-operation, and inter-query parallelism. Intra-operation and inter-operation parallelism are together called intra-query parallelism, which has been studied extensively in recent years. In contrast, inter-query parallelism has received little attention, particularly for multiple dependent queries. In this paper, we develop a decompression algorithm, CPS, for coping with multiple dependent queries that are represented by a directed graph. The algorithm makes use of the activity analysis of critical path analysis, and of the resource scheduling and levelling techniques of project management. A simulation study has been conducted, and the results show that the proposed algorithm outperforms other existing methods and is able to provide a globally optimal solution when a sufficient number of processors is available.
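The activity analysis mentioned in the abstract can be illustrated with a textbook critical path computation over a dependency graph of queries. This sketch is not the CPS algorithm, whose details the abstract does not give. It assumes an acyclic graph and unlimited processors; the function name `critical_path`, the query names and the durations are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Classic critical path analysis (illustrative, not the paper's CPS).

    durations: {query: execution time}
    deps:      {query: set of queries that must finish first}
    Returns (makespan, slack per query, critical queries in dependency order).
    """
    graph = {q: deps.get(q, ()) for q in durations}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: earliest finish time of each query.
    earliest_finish = {}
    for q in order:
        start = max((earliest_finish[p] for p in graph[q]), default=0)
        earliest_finish[q] = start + durations[q]
    makespan = max(earliest_finish.values())

    # Backward pass: latest finish time that does not delay the makespan.
    latest_finish = {q: makespan for q in durations}
    for q in reversed(order):
        latest_start = latest_finish[q] - durations[q]
        for p in graph[q]:
            latest_finish[p] = min(latest_finish[p], latest_start)

    slack = {q: latest_finish[q] - earliest_finish[q] for q in durations}
    critical = [q for q in order if slack[q] == 0]
    return makespan, slack, critical


if __name__ == "__main__":
    # Hypothetical workload: q3 waits for q1; q4 waits for q2 and q3.
    durations = {"q1": 3, "q2": 2, "q3": 4, "q4": 1}
    deps = {"q3": {"q1"}, "q4": {"q2", "q3"}}
    print(critical_path(durations, deps))
```

Queries with zero slack form the critical path. A scheduler with limited processors would typically prioritise them and use the slack of the remaining queries for levelling, which is the general idea the abstract borrows from project management.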
Generating local memory access sequences and communication sets efficiently is an important issue when compiling a data-parallel language into SPMD (Single Program Multiple Data) code. Recently, several approaches ...
ISBN: 0780342291 (print)
In this paper, we devise a new method for transparent fault tolerance of distributed programs running on a cluster of networked workstations. We use the concept of alternative schedules for this purpose. Such schedules are generated from static task graphs at compile time. At run time, a distributed program can use these alternatives to switch from one schedule to another if one or more machines become faulty. We have devised fast and efficient mechanisms for switching among schedules at run time. This enables recovery from any number of simultaneous machine faults, any number of times. The correctness of the resulting algorithm is ensured by preventing direct data sharing among local tasks on a machine. Such a transparent fault-tolerance strategy is easy to implement on a network of workstations running PVM-like software.
ISBN: 0780342291 (print)
Multi-Spert is a scalable parallel system built from multiple Spert-II nodes, which we have constructed to speed up error backpropagation neural network training for speech recognition research. We present the Multi-Spert hardware and software architecture, and describe our implementation of two alternative parallelization strategies for the backprop algorithm. We have developed detailed analytic models of the two strategies, which allow us to predict performance over a range of network and machine parameters. The models' predictions are validated by measurements on a prototype five-node Multi-Spert system. This prototype achieves a neural network training performance of over 530 million connection updates per second (MCUPS) while training a realistic speech application neural network. The model predicts that performance will scale to over 800 MCUPS for eight nodes.
We show a high throughput implementation of SAR on high performance computing (HPC) platforms. In our implementation, the processors are divided into two groups of size M and N. The first group consisting of M process...
ISBN: 9780780342293 (print)
From the Publisher: The icapp-97 Proceedings comprises a well-defined set of papers in the area of parallel processing. Specific topics covered include: basic issues of algorithms and architectures for parallel processing; parallel processing prospects; routing in parallel computer systems; special-purpose parallel architectures; operating environments; scheduling; parallelisation and parallelising computers; computing on clusters of workstations; parallel algorithms; parallel applications; parallel algorithms and architectures for neural program; databases and parallel processing. Held in Melbourne, Australia, this important conference brought together developers and researchers from universities, industry and government to advance the level of knowledge for parallel and distributed systems and processing.