The use of efficient Galois Field Arithmetic on SIMD architecture was presented. SIMD architectures were used for obtaining high speed implementation in the fields where data parallelism was encountered. In regard wit...
详细信息
ISBN:
(纸本)9781581136616
The use of efficient Galois Field Arithmetic on SIMD architecture was presented. SIMD architectures were used for obtaining high speed implementation in the fields where data parallelism was encountered. In regard with it, the role played by the bit-slicing procedure in the computations was also discussed.
We address optimal makespan-minimizing identical parallel machine scheduling (P vertical bar vertical bar C-max) by introducing new pruning rules for branch-and-bound (BnB) and integrating them into a prior BnB algori...
详细信息
ISBN:
(纸本)9798400704161
We address optimal makespan-minimizing identical parallel machine scheduling (P vertical bar vertical bar C-max) by introducing new pruning rules for branch-and-bound (BnB) and integrating them into a prior BnB algorithm. Experimental results indicate that the presented rules are inexpensive to evaluate, applicable frequently, and extremely beneficial to the BnB algorithm's overall performance.
We present the first (randomized) parallel dynamic algorithm for maximal matching, which can process an arbitrary number of updates simultaneously. Given a batch of edge deletion or insertion updates to the graph, our...
详细信息
ISBN:
(纸本)9798400704161
We present the first (randomized) parallel dynamic algorithm for maximal matching, which can process an arbitrary number of updates simultaneously. Given a batch of edge deletion or insertion updates to the graph, our parallel algorithm adjusts the maximal matching to these updates in poly(log n) depth and using poly(log n) amortized work per update. That is, the amortized work for processing a batch of k updates is k poly(log n), while all this work is done in poly(log n) depth, with high probability. This can be seen as a parallel counterpart of the sequential dynamic algorithms for constant-approximate and maximal matching [Onak and Rubinfeld STOC'10;Baswana, Gupta, and Sen FOCS'11;and Solomon FOCS'16]. Our algorithm readily generalizes to maximal matching in hypergraphs of rank r-where each hyperedge has at most r endpoints-with a poly(r) increase in work, while retaining the poly(log n) depth.
We introduce Autonomous SIMD (ASIMD) massively parallel architecture, then look at the flexibility, cost, and effectiveness of MIMD and ASIMD parallel systems. We show that ASIMD systems such as the MasPar MP-1 and MP...
详细信息
We introduce PASGAL (parallel And Scalable Graph Algorithm Library), a parallel graph library that scales to a variety of graph types, many processors, and large graphs. One special focus of PASGAL is the efficiency o...
详细信息
ISBN:
(纸本)9798400704161
We introduce PASGAL (parallel And Scalable Graph Algorithm Library), a parallel graph library that scales to a variety of graph types, many processors, and large graphs. One special focus of PASGAL is the efficiency on large-diameter graphs, which is a common challenge for many existing parallel graph processing systems due to the high overhead in synchronizing threads when traversing the graph in the breadth-first order. The core idea in PASGAL is a technique called vertical granularity control (VGC) to hide synchronization overhead by careful algorithm redesign and new data structures. We compare PASGAL with existing parallel implementations on several fundamental graph problems. PASGAL is always competitive on small-diameter graphs, and is significantly faster on large-diameter graphs.
Energy consumption by computer systems has emerged as an important concern However, the energy consumed in executing an algorithm cannot be inferred from its performance alone it must be modeled explicitly This paper ...
详细信息
ISBN:
(纸本)9781450300797
Energy consumption by computer systems has emerged as an important concern However, the energy consumed in executing an algorithm cannot be inferred from its performance alone it must be modeled explicitly This paper analyzes energy consumption of parallelalgorithms executed on shared memory multicore processors Specifically, we develop a methodology to evaluate how energy consumption of a given parallel algorithm changes as the number of cores and their frequency is varied We use this analysis to establish the optimal number of cores to minimize the energy consumed by the execution of a parallel algorithm for a specific problem size while satisfying a given performance requirement We study the sensitivity of our analysis to changes in parameters such as the ratio of the power consumed by a computation step versus the power consumed in accessing memory The results show that the relation between the problem size and the optimal number of cores is relatively unaffected for a wide range of these parameters.
Over the last five years, major microprocessor manufacturers have released plans for a rapidly increasing number of cores per microprossesor, with upwards of 64 cores by 2015. In this setting, a sequential RAM compute...
详细信息
ISBN:
(纸本)9781595939739
Over the last five years, major microprocessor manufacturers have released plans for a rapidly increasing number of cores per microprossesor, with upwards of 64 cores by 2015. In this setting, a sequential RAM computer will no longer accurately reflect the architecture on which algorithms are being executed. In this paper we propose a model of low degree parallelism (LoPRAM) which builds upon the RAM and PRAM models yet better reflects recent advances in parallel (multi-core) architectures. This model supports a high level of abstraction that simplifies the design and analysis of parallel programs. More importantly we show that in many instances it naturally leads to work-optimal parallelalgorithms via simple modifications to sequential algorithms.
Two novel variations on sample sort, one using only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead and another using regular sam...
详细信息
Two novel variations on sample sort, one using only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead and another using regular sampling for choosing splitters, were studied. The two were coded in Split-C and were run on a variety of platforms. Results were consistent with theoretical analysis and illustrated the scalability and efficiency of the algorithms.
暂无评论