The authors recently introduced the technique of sparse mesh refinement to produce the first near-optimal sequential time bounds of O(n lg L/s+m) for inputs in any fixed dimension with piecewise-linear constraining (P...
详细信息
ISBN:
(纸本)9781595936677
The authors recently introduced the technique of sparse mesh refinement to produce the first near-optimal sequential time bounds of O(n lg L/s+m) for inputs in any fixed dimension with piecewise-linear constraining (PLC) features. This paper extends that work to the parallel case, refining the same inputs in time O(lg(L/s)lg m) on an EREW PRAM while maintaining the work bound;in practice, this means we expect linear speedup for any practical number of processors. This is faster than the best previously known parallel Delaunay mesh refinement algorithms in two dimensions. It is the first technique with work bounds equal to the sequential case. In higher dimension, it is the first provably fast parallel technique for any kind of quality mesh refinement with PLC inputs. Furthermore, the algorithm's implementation is straightforward enough that it is likely to be extremely fast in practice.
In this paper, we study the tradeoffs between the time and the number of communication rounds of the best arm identification problem in the heterogeneous collaborative learning model, where multiple agents interact wi...
详细信息
ISBN:
(纸本)9798400704161
In this paper, we study the tradeoffs between the time and the number of communication rounds of the best arm identification problem in the heterogeneous collaborative learning model, where multiple agents interact with possibly different environments and they want to learn in parallel an objective function in the aggregated environment. By proving almost tight upper and lower bounds, we show that collaborative learning in the heterogeneous setting is inherently more difficult than that in the homogeneous setting in terms of the time-round tradeoff.
We discuss the high-performance parallel implementation and execution of dense linear algebra matrix operations on SMP architectures;with an eye towards multi-core processors with many cores. We argue that traditional...
详细信息
ISBN:
(纸本)9781595936677
We discuss the high-performance parallel implementation and execution of dense linear algebra matrix operations on SMP architectures;with an eye towards multi-core processors with many cores. We argue that traditional implementations, as those incorporated in LAPACK, cannot be easily modified to render high performance as well as scalability on these architectures. The solution we propose is to arrange the data structures and algorithms so that matrix blocks become the fundamental units of data;and operations on these blocks become the fundamental units of computation, resulting in algorithms-by-blocks as opposed to the snore traditional blocked algorithms. We show that this facilitates the adoption of techniques akin to dynamic scheduling and out-of-order execution usual in superscalar processors;which we name SuperMatrix Out-of-Order scheduling. Performance results on a 16 CPU Itanium2-based server are used to highlight opportunities and issues related to this new approach.
The proceedings contain 37 papers. The topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under unc...
详细信息
ISBN:
(纸本)159593667X
The proceedings contain 37 papers. The topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under uncertainty;scheduling DAGs on asynchronous processors;scheduling to minimize gaps and power consumption;cache-oblivious streaming B-trees;an experimental comparison of cache-oblivious and cache-conscious programs;scheduling threads for constructive cache sharing on CMPs;proximity-aware directory-based coherence for multi-core processor architectures;a parallel dynamic programming algorithm on a multi-core architecture;tight bounds for distributed selection;local MST computation with short advice;distributed approximation of capacitated dominating sets;packing to angles and sectors;and the notion of a timed register and its application to indulgent synchronization.
We introduce Autonomous SIMD (ASIMD) massively parallel architecture, then look at the flexibility, cost, and effectiveness of MIMD and ASIMD parallel systems. We show that ASIMD systems such as the MasPar MP-1 and MP...
详细信息
The proceedings contains 48 papers from Thirteen annualacmsymposium on parallelalgorithms and architectures. Topics discussed include: compact routing schemes;simple on-line algorithms for the maximum disjoint path...
详细信息
The proceedings contains 48 papers from Thirteen annualacmsymposium on parallelalgorithms and architectures. Topics discussed include: compact routing schemes;simple on-line algorithms for the maximum disjoint paths problem;competitive buffer management for shared-memory switches;attack propagation in networks;computational power of pipelined memory hierarchies;and a data tracking scheme for general networks.
暂无评论