This paper deals with data management for parallel and distributed systems in which the computing nodes are connected by a relatively sparse network. We present the DIVA (Distributed Variables) library, which provides fully transparent access to global variables, i.e., shared data objects, from the individual nodes in the network. The current implementations are based on mesh-connected massively parallel computers. The data management strategies implemented in the library use a non-standard approach based on a randomized but locality-preserving embedding of `access trees' into the physical network. The access tree strategy was previously analyzed only in a theoretical model using competitive analysis, where it was shown to produce minimal network congestion up to small factors. In this paper, the access tree strategy is evaluated experimentally. We test several variations of this strategy on three applications of parallel computing: matrix multiplication, bitonic sorting, and Barnes-Hut N-body simulation. We compare the congestion and execution time of the access tree strategy and its variations with a standard caching strategy that uses a fixed home for each data object. Additionally, we compare against hand-optimized message passing strategies producing minimal communication overhead. First, we show that the execution time of the applications depends heavily on the congestion produced by the different data management strategies. Second, we show that the access tree strategy clearly outperforms the fixed home strategy and comes reasonably close to the performance of the hand-optimized message passing strategies. In particular, the larger the network, the greater the advantage of the access tree strategy over the fixed home strategy.
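The fixed home baseline mentioned in the abstract can be sketched as follows. This is an illustrative toy only: the hash-based placement, the mesh dimensions, and the function names are assumptions for the sketch, not DIVA's actual implementation. Every access pays the full mesh routing distance to the object's home, which is the locality cost the access trees are designed to avoid.

```python
# Toy sketch of a fixed-home placement on a mesh (an assumption; the
# paper's actual placement function is not given in the abstract).

def fixed_home(obj_id: int, mesh_width: int, mesh_height: int) -> tuple:
    """Map a shared object to one fixed home node on the mesh."""
    node = hash(obj_id) % (mesh_width * mesh_height)
    return (node % mesh_width, node // mesh_width)

def manhattan_hops(a: tuple, b: tuple) -> int:
    """Mesh routing distance an access pays to reach the object's home."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Every access from node (0, 0) travels to the same home node,
# no matter where the object was most recently used.
home = fixed_home(42, 8, 8)
cost = manhattan_hops((0, 0), home)
```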
ISBN (print): 9781581131857
This paper presents mathematical foundations for the design of a memory controller subcomponent that helps to bridge the processor/memory performance gap for applications with strided access patterns. The Parallel Vector Access (PVA) unit exploits the regularity of vectors or streams to access them efficiently in parallel on a multi-bank SDRAM memory system. The PVA unit performs scatter/gather operations so that only the elements accessed by the application are transmitted across the system bus. Vector operations are broadcast in parallel to all memory banks, each of which implements an efficient algorithm to determine which vector elements it holds. Earlier performance evaluations have demonstrated that our PVA implementation loads elements up to 32.8 times faster than a conventional memory system and 3.3 times faster than a pipelined vector unit, without hurting the performance of normal cache-line fills. Here we present the underlying PVA algorithms for both word-interleaved and cache-line-interleaved memory systems.
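The per-bank element-determination step can be illustrated for the word-interleaved case. The sketch below is an assumption-laden simplification (the function name and interface are hypothetical, and the paper's hardware algorithm is certainly not a Python loop): under word interleaving, address mod `num_banks` selects the bank, so bank `b` holds exactly the indices `i` solving the linear congruence `i * stride ≡ b - base (mod num_banks)`, which each bank can solve without scanning the whole vector.

```python
from math import gcd

def bank_elements(base, stride, length, num_banks, bank):
    """Indices i in [0, length) whose word address base + i*stride maps to
    `bank` under word interleaving (address mod num_banks). Solves the
    congruence i*stride = bank - base (mod num_banks) in closed form."""
    g = gcd(stride, num_banks)
    r = (bank - base) % num_banks
    if r % g != 0:
        return []  # this bank holds no element of the vector
    # One solution of (stride/g)*i = r/g (mod num_banks/g); the rest
    # repeat every num_banks/g indices.
    m = num_banks // g
    i0 = (r // g) * pow(stride // g, -1, m) % m
    return list(range(i0, length, m))

# Stride-3 vector of 8 words starting at address 7, over 4 banks:
# addresses 7,10,13,16,19,22,25,28 -> banks 3,2,1,0,3,2,1,0,
# so bank 1 holds indices 2 and 6.
indices = bank_elements(7, 3, 8, 4, 1)
```

Because each bank computes only its own index set, the work is naturally parallel across banks, matching the broadcast-then-filter structure the abstract describes.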
ISBN (print): 9780898714906
We consider parallel machine scheduling problems where the jobs are subject to precedence constraints, and the processing times of jobs are governed by independent probability distributions. The objective is to minimize the expected weighted sum of job completion times, ∑_j w_j C_j, where w_j ≥ 0. Building upon an LP-relaxation from [3] and an idle time charging scheme from [1], we derive the first approximation algorithms for this model.
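The objective ∑_j w_j C_j can be made concrete with a toy list scheduler on identical machines. This sketch only illustrates how completion times C_j and the weighted sum are computed; it ignores precedence constraints and stochastic processing times, and it is not the paper's LP-based algorithm.

```python
import heapq

def weighted_completion(jobs, m):
    """List-schedule jobs (weight, processing_time) in the given order on
    m identical machines; return sum of w_j * C_j. Each job goes to the
    machine that becomes free earliest."""
    machines = [0.0] * m          # next free time of each machine
    heapq.heapify(machines)
    total = 0.0
    for w, p in jobs:
        t = heapq.heappop(machines)   # earliest available machine
        c = t + p                     # completion time C_j of this job
        total += w * c
        heapq.heappush(machines, c)
    return total

# Two machines, three unit-weight jobs of length 2: completion times are
# 2, 2, and 4, so the weighted sum is 8.
obj = weighted_completion([(1, 2), (1, 2), (1, 2)], 2)
```

With random processing times, the expectation of this quantity is what the approximation algorithms bound; the deterministic toy above shows only the underlying objective.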