This special issue contains 6 selected papers whose preliminary versions appeared in the proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), held in June 2011 in San Jose, California, USA. The special issue co-editors selected these papers from the 35 papers presented at the conference. The authors were invited to submit full versions of their papers, which were then fully refereed according to the usual standards of Theory of Computing Systems. The selected papers are representative of the breadth and depth of the research on parallelism in algorithms and architectures that was presented at SPAA 2011.
The proceedings contain 40 papers from SPAA 2004, the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures. The topics discussed include: on delivery times in packet networks under adversarial traffic; balanced graph partitioning; online hierarchical cooperative caching; scheduling against an adversarial network; effectively sharing a cache among threads; online algorithms for network design; and dynamic analysis of the arrow distributed protocol.
The proceedings contain 46 papers from SPAA 2003, the Fifteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures. The topics discussed include: optimal sharing of bags of tasks in heterogeneous clusters; minimizing total flow time and total completion time with immediate dispatching; a practical algorithm for constructing oblivious routing schemes; a polynomial-time tree decomposition to minimize congestion; and online oblivious routing.
The proceedings contain 7 papers from SPAA 2003, the 15th Annual Symposium on Parallelism in Algorithms and Architectures. The topics discussed include: a practical algorithm for constructing oblivious routing schemes; novel architectures for P2P applications: the continuous-discrete approach; quantifying instruction criticality for shared memory multiprocessors; relaxing the problem-size bound for out-of-core columnsort; the complexity of verifying memory coherence; a near optimal scheduler for switch-memory-switch routers; and on local algorithms for topology control and routing in ad hoc networks.
ISBN: 9781450307437 (print)
As the gap between the cost of communication (i.e., data movement) and computation continues to grow, the importance of pursuing algorithms which minimize communication also increases. Toward this end, we seek asymptotic communication lower bounds for general memory models and classes of algorithms. Recent work [2] has established lower bounds for a wide set of linear algebra algorithms on a sequential machine and on a parallel machine with identical processors. This work extends these previous bounds to a heterogeneous model in which processors access data and perform floating point operations at differing speeds. We also present an algorithm for dense matrix multiplication which attains the lower bound.
ISBN: 9781450300797 (print)
Energy consumption by computer systems has emerged as an important concern. However, the energy consumed in executing an algorithm cannot be inferred from its performance alone: it must be modeled explicitly. This paper analyzes the energy consumption of parallel algorithms executed on shared memory multicore processors. Specifically, we develop a methodology to evaluate how the energy consumption of a given parallel algorithm changes as the number of cores and their frequency are varied. We use this analysis to establish the optimal number of cores to minimize the energy consumed by the execution of a parallel algorithm for a specific problem size while satisfying a given performance requirement. We study the sensitivity of our analysis to changes in parameters such as the ratio of the power consumed by a computation step versus the power consumed in accessing memory. The results show that the relation between the problem size and the optimal number of cores is relatively unaffected for a wide range of these parameters.
ISBN: 9781450345934 (print)
The proceedings contain 45 papers. The topics discussed include: randomized composable coresets for matching and vertex cover; almost optimal streaming algorithms for coverage problems; bicriteria distributed submodular maximization in a few rounds; on energy conservation in data centers; asymptotically optimal approximation algorithms for coflow scheduling; online flexible job scheduling for minimum span; minimizing total weighted flow time with calibrations; brief announcement: scheduling parallelizable jobs online to maximize throughput; brief announcement: a new improved bound for coflow scheduling; and a communication-avoiding parallel algorithm for the symmetric eigenvalue problem.
ISBN: 9781450300797 (print)
We present a scheduling algorithm for stream programs on multi-core architectures called team scheduling. Compared to previous multi-core stream scheduling algorithms, team scheduling achieves 1) similar synchronization overhead, 2) coverage of a larger class of applications, 3) better control over buffer space, 4) deadlock-free feedback loops, and 5) lower latency. We compare team scheduling to the latest stream scheduling algorithm, SGMS, by evaluating 14 applications on a multi-core architecture with 16 cores. Team scheduling successfully targets applications that cannot be validly scheduled by SGMS due to excessive buffer requirements or deadlocks in feedback loops (e.g., GSM and W-CDMA). For applications that can be validly scheduled by SGMS, team scheduling shows on average 37% higher throughput within the same buffer space constraints.
ISBN: 9781450300797 (print)
The proceedings contain 45 papers. The topics discussed include: buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures; scheduling to minimize power consumption using submodular functions; collaborative scoring with dishonest participants; securing every bit: authenticated broadcast in radio networks; brief announcement: on speculative replication of transactional systems; data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory; basic network creation games; on the bit communication complexity of randomized rumor spreading; algorithms and application for grids and clouds; towards optimizing energy costs of algorithms for shared memory architectures; brief announcement: on regenerator placement problems in optical networks; best-effort group service in dynamic networks; and implementing and evaluating nested parallel transactions in software transactional memory.
ISBN: 9781595936677 (print)
We discuss the high-performance parallel implementation and execution of dense linear algebra matrix operations on SMP architectures, with an eye towards multi-core processors with many cores. We argue that traditional implementations, such as those incorporated in LAPACK, cannot be easily modified to deliver both high performance and scalability on these architectures. The solution we propose is to arrange the data structures and algorithms so that matrix blocks become the fundamental units of data, and operations on these blocks become the fundamental units of computation, resulting in algorithms-by-blocks as opposed to the more traditional blocked algorithms. We show that this facilitates the adoption of techniques akin to the dynamic scheduling and out-of-order execution usual in superscalar processors, which we name SuperMatrix Out-of-Order scheduling. Performance results on a 16-CPU Itanium2-based server are used to highlight opportunities and issues related to this new approach.
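The algorithm-by-blocks idea in the abstract above can be illustrated with a minimal sketch of matrix multiplication. This is not the SuperMatrix implementation: the function names (`split_into_blocks`, `block_multiply_add`, `matmul_by_blocks`, `join_blocks`) are hypothetical, the matrices are assumed square with a block size that divides the dimension, and the tasks are run sequentially, whereas a runtime scheduler could dispatch independent tasks out of order.

```python
def split_into_blocks(m, bs):
    """Split a square matrix (list of rows) into a grid of bs x bs blocks."""
    n = len(m)
    assert n % bs == 0, "sketch assumes the block size divides n"
    nb = n // bs
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(nb)]
            for i in range(nb)]


def block_multiply_add(c, a, b):
    """Compute c += a @ b on single blocks: one unit of computation."""
    bs = len(a)
    for i in range(bs):
        for j in range(bs):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(bs))


def join_blocks(blocks):
    """Reassemble a grid of blocks into a flat list-of-rows matrix."""
    rows = []
    for block_row in blocks:
        bs = len(block_row[0])
        for r in range(bs):
            rows.append([x for blk in block_row for x in blk[r]])
    return rows


def matmul_by_blocks(a, b, bs):
    """Multiply two square matrices with blocks as the unit of data."""
    A = split_into_blocks(a, bs)
    B = split_into_blocks(b, bs)
    nb = len(A)
    C = [[[[0] * bs for _ in range(bs)] for _ in range(nb)]
         for _ in range(nb)]
    # Each (i, j, k) triple is one task on whole blocks. Tasks that write
    # to different C[i][j] blocks are independent of one another.
    tasks = [(i, j, k)
             for i in range(nb) for j in range(nb) for k in range(nb)]
    for i, j, k in tasks:  # hypothetical: a scheduler could reorder these
        block_multiply_add(C[i][j], A[i][k], B[k][j])
    return join_blocks(C)
```

For example, `matmul_by_blocks(a, b, 2)` on two 4 x 4 matrices generates eight block-level tasks, and the task list, rather than the loop nest over scalar entries, is what a scheduler would reason about.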