ISBN: 9781450300797 (print)
In this paper, we present a dynamic load-balancing algorithm for parallel digital logic simulation that makes use of reinforcement learning. We first introduce two dynamic load-balancing algorithms oriented towards balancing the computational and communication load, respectively, and then utilize reinforcement learning to create an algorithm that combines the first two. In addition, the algorithm determines the value of two important parameters: the number of processors which participate in the algorithm and the load which is exchanged during its execution. We investigate the algorithms on gate-level simulations of several open-source VLSI circuits.
This paper presents an engineering design for a low-latency, high-bandwidth interconnection network which will form the switching substrate for a multi-model parallel processing system. The performance is enhanced with...
This paper introduces the Asynchronous PRAM model of computation, a variant of the PRAM in which the processors run asynchronously and there is an explicit charge for synchronization. A family of asynchronous PRAM'...
In this paper we describe a technique for finding efficient parallel algorithms for problems on directed graphs that involve checking the existence of certain kinds of paths in the graph. This technique provides effic...
A new approach to parallel sorting called Parallel Sorting by OverPartitioning (PSOP) is presented. The approach limits the communication cost by moving each element between processors at most once, and leads to good ...
We implemented and measured several methods to perform BMMC permutations on the MasPar MP-2. Our results indicate that, except for certain types of permutations or very high virtual-processor ratios, the best method overall is the naive method but with virtual-processor numbers computed in Gray-code order. For some permutations, however, the naive method performs very poorly; the best method in these cases is an adaptation of the block BMMC algorithm for parallel disk systems in which the processor elements are treated as independent devices.
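The two ingredients named in this abstract can be illustrated with a minimal sketch. It assumes the usual definition of a BMMC (bit-matrix-multiply/complement) permutation, in which an n-bit source address x maps to A·x XOR c over GF(2) for a nonsingular bit matrix A and complement vector c, and it uses the binary-reflected Gray code, in which consecutive values differ in exactly one bit. The function names and the 3-bit example matrix are hypothetical and are not taken from the paper; how the MasPar implementation actually uses the Gray-code order is not shown here.

```python
def gray_code(i):
    """Return the i-th value of the binary-reflected Gray code."""
    return i ^ (i >> 1)


def bmmc_target(x, rows, c):
    """Map source address x to A*x XOR c over GF(2).

    rows[k] is row k of the bit matrix A packed into an int (bit j = A[k][j]).
    Bit k of the product is the parity of (rows[k] AND x).
    """
    y = 0
    for k, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << k
    return y ^ c


# Hypothetical 3-bit example: A reverses the bit order and c = 0,
# which gives the bit-reversal permutation.
rows = [0b100, 0b010, 0b001]
order = [gray_code(i) for i in range(8)]             # visiting order of indices
perm = [bmmc_target(x, rows, 0) for x in range(8)]   # target address per source
```

Because A is nonsingular in this example, `perm` is a true permutation of the eight addresses.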
ISBN: 9781450307437 (print)
We present a parallel solution to the Maximum-Flow (Max-Flow) problem, suitable for a modern many-core architecture. We show that by starting from a PRAM algorithm, following an established "programmer's workflow" and targeting XMT, a PRAM-inspired many-core architecture, we achieve significantly higher speed-ups than previous approaches. Comparison with the fastest known serial max-flow implementation on a modern CPU demonstrates for the first time potential for orders-of-magnitude performance improvement for Max-Flow. Using XMT, the PRAM Max-Flow algorithm is also much easier to program than for other parallel platforms, contributing a powerful example toward dual validation of both PRAM algorithmics and XMT.
Methods for mitigating the degradation in performance caused by high latencies in parallel and distributed networks were described. Most of the analysis centered on the simulation of unit-delay rings on networks of workstations (NOWs) with arbitrary delays on the links. Emulations were also derived for a wide variety of other unit-delay network architectures on a NOW with high-latency links. Lower bounds were proven that establish limits on the degree to which the high-latency links can be mitigated. These bounds demonstrate that overcoming latencies in dataflow types of computations that require access to large local databases is easier.
Recently there has been increasing interest in models of parallel computation that account for the bandwidth limitations in communication networks. Some models (e.g., BSP and LogP) account for bandwidth limitations using a per-processor parameter g > 1, such that each processor can send/receive at most h messages in g·h time. Other models (e.g., PRAM(m)) account for bandwidth limitations as an aggregate parameter m [...] Ω(√lg p) separation known previously.
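The per-processor bandwidth parameter described above can be made concrete with a small sketch. It assumes the usual reading in which a communication step is charged g·h, where h is the largest number of messages that any single processor sends or receives in that step. The function name and the sample message counts are hypothetical, and the aggregate model is not illustrated.

```python
def h_relation_time(sent, received, g):
    """Time g*h charged to one communication step under a per-processor gap g.

    sent[i] and received[i] are the message counts of processor i; h is the
    largest number of messages any single processor sends or receives.
    """
    h = max(max(sent), max(received))
    return g * h


# Hypothetical step on 4 processors with gap g = 2:
# processor 2 receives 4 messages, so h = 4 and the step costs 2 * 4.
t = h_relation_time(sent=[3, 1, 0, 2], received=[1, 1, 4, 0], g=2)
```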
Emerging applications in multi-media and the Human Genome Project require storage and searching of large databases of strings - a task for which parallelism seems the only hope. In this paper, we consider the parallelism in some of the fundamental problems in compressing strings and in matching large dictionaries of patterns against texts. We present the first work-optimal algorithms for these well-studied problems including the classical dictionary matching problem, optimal compression with a static dictionary and the universal data compression with dynamic dictionary of Lempel and Ziv. All our algorithms are randomized and they are of the Las Vegas type. Furthermore, they are fast, working in time logarithmic in the input size. Additionally, our algorithms seem suitable for a distributed implementation.