Synchronization techniques are proposed for algorithms which spawn processes remotely on loosely coupled processors based on run-time characteristics. the performance of the proposed synchronization schemes are measur...
详细信息
ISBN:
(纸本)0818656026
Synchronization techniques are proposed for algorithms which spawn processes remotely on loosely coupled processors based on run-time characteristics. the performance of the proposed synchronization schemes are measured on the iPSC/2 and SNAP-1 multiprocessors and their implementation cost is discussed. Results show that processes created dynamically throughout a distributed system can be synchronized at comparable overhead and cost to that required for fixed location process creation.
Reconfiguring a faulty hypercube into a maximal incomplete cube tends to lower potential performance degradation, because a hypercube so reconfigured often results in a much larger system that what is attained by any ...
详细信息
ISBN:
(纸本)0818656026
Reconfiguring a faulty hypercube into a maximal incomplete cube tends to lower potential performance degradation, because a hypercube so reconfigured often results in a much larger system that what is attained by any conventional reconfiguration scheme which identifies only complete subcubes. this paper proposes an efficient strategy for identifying all the maximal incomplete subcubes present in a faulty hypercube. the proposed strategy is distributed in that every healthy node executes the same identification algorithm independently at the same time.
We present an original approach to automatic array alignment, the step in the hierarchical transformation system aimed at the efficient execution of shared memory programs on distributed memory machines. Our array ali...
详细信息
ISBN:
(纸本)0818656026
We present an original approach to automatic array alignment, the step in the hierarchical transformation system aimed at the efficient execution of shared memory programs on distributed memory machines. Our array alignment algorithm deals with a broad set of intra-dimension and inter-dimension alignment preferences, including offsets, strides, permutations, embeddings, and their combinations. We discuss the algorithm and the tests performed on the Connection Machine CM-2.
Various massively parallel computers have emerged in recent years. Each of then have some distinct architecture properties that challenge the computer scientist to develop algorithms and software appropriate to that s...
详细信息
ISBN:
(纸本)0818656026
Various massively parallel computers have emerged in recent years. Each of then have some distinct architecture properties that challenge the computer scientist to develop algorithms and software appropriate to that specified architecture. this is at least as difficult a problem as software reuse in the sequential computer case. One approach to addressing this problem is to design parallel computation models which abstract the architecture details into several generic parameters, which we call resource metrics. Typical resource metrics include the number of processors, communication latency, synchronization, bandwidth, block transfer capability, memory access method, and network topology hierarchy. We review the various parallel and distributed computation models and compare the different resource metrics chosen by different computation models.
In this paper, we investigate the parallelism of simulated annealing for graph partitioning. A random sampling technique, in which disjoint partitions of the graph are distributed across the processors so that moves c...
详细信息
ISBN:
(纸本)0818656026
In this paper, we investigate the parallelism of simulated annealing for graph partitioning. A random sampling technique, in which disjoint partitions of the graph are distributed across the processors so that moves can be proposed and evaluated asynchronously in distinct processors, is proposed. Synchronizations among processors are not necessary until the equilibrium is reached at each processor. Furthermore, since no interacting moves can occur in our strategy, it is obvious that our scheme is free of errors in cost evaluation.
this paper describes the design of ScaLAPACK, a scalable software library for performing dense and banded linear algebra computations on distributed memory concurrent computers. the specification of the data distribut...
详细信息
ISBN:
(纸本)0818656026
this paper describes the design of ScaLAPACK, a scalable software library for performing dense and banded linear algebra computations on distributed memory concurrent computers. the specification of the data distribution has important consequences for interprocessor communication and load balance, and hence is a major factor in determining performance and scalability of the library routines. the block cyclic data distribution is adopted as a simple, yet general-purpose, way of decomposing block-partitioned matrices. distributed memory versions of the Level 3 BLAS provide an easy and convenient way of implementing the ScaLAPACK routines.
three Adjoining Grammar (TAG) is a powerful grammatical formalism for large-scale natural language processing. However, the computational complexity of parsing algorithms for TAG is high. We introduce a new parallel T...
详细信息
ISBN:
(纸本)0818656026
three Adjoining Grammar (TAG) is a powerful grammatical formalism for large-scale natural language processing. However, the computational complexity of parsing algorithms for TAG is high. We introduce a new parallel TAG parsing algorithm for MIMD hypercube multicomputers, using large granularity grammar partitioning, asynchronous communication, and distributed termination detection. We describe our implementation on the nCUBE/2 parallel computer, and provide experimental results on both random and English grammars. Our algorithm delivers the best performance of any TAG parsing algorithm to date, yielding an almost two order-of-magnitude speedup and good efficiency on up to 256 processors.
Non-wave shaping parallel bidirectional heuristic search algorithms have been reported to suffer of the bidirectional search anomaly. Although wave-shaping is considered as the most natural approach, parallel bidirect...
详细信息
ISBN:
(纸本)0818656026
Non-wave shaping parallel bidirectional heuristic search algorithms have been reported to suffer of the bidirectional search anomaly. Although wave-shaping is considered as the most natural approach, parallel bidirectional wave-shaping algorithms are extremely scare. We introduce a wave-shaping algorithm for parallel bidirectional heuristic search in distributed memory environments. the method is inspired by the celebrated uniprocessor bidirectional wave-shaping algorithm of DeChampeaux-Sint. Our performance evaluation shows that the proposed method scales well, maintains good performance for increasing problem sizes, and attains close to linear speedup over the sequential DeChampeaux-Sint algorithm.
In this paper a procedure is presented for locating single faults in N × N cube-type interconnection networks with global and withdistributed control. Networks withdistributed control are assumed to be based on...
详细信息
ISBN:
(纸本)0818656026
In this paper a procedure is presented for locating single faults in N × N cube-type interconnection networks with global and withdistributed control. Networks withdistributed control are assumed to be based on comparator cells with bit-serial architecture. It is shown that fault location requires eight tests for interconnection networks with global control under a functional fault model and four sets of test vectors for interconnection networks withdistributed control under the stuck-at fault model, given a specific implementation of the comparator cell. the outstanding feature of this procedure is that the input vectors are independent not only of the network size, but also of the actual fault location. thus, the fault location procedure can easily be implemented and has low time consumption.
Barrier algorithms are central to the performance of numerous algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been suggested and studied for Non-Uniform Memory Access (NUMA) ar...
详细信息
ISBN:
(纸本)0818656026
Barrier algorithms are central to the performance of numerous algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been suggested and studied for Non-Uniform Memory Access (NUMA) architectures, but less work has been done for Cache Only Memory Access (COMA) or attraction memory [1] architectures such as the KSR-1. In this paper, we presented two new barrier algorithms that offer the best performance we have recorded on the KSR-1 distributed cache multiprocessor. We discuss the trade-offs and the performance of seven algorithms on two architectures. the new barrier algorithms adapt well to a hierarchical caching memory model and take advantage of parallel communication offered by most multiprocessor interconnection networks,. Performance results are shown for a 256-processor KSR-1 and a 20-processor Sequent Symmetry.
暂无评论