This paper provides preliminary statements by the panelists ahead of a panel discussion at the ACM SPAA 2021 conference on the topic: algorithm-friendly architecture versus architecture-friendly algorithms. ...
ISBN: 9781450300797 (print)
We present a scheduling algorithm for stream programs on multi-core architectures called team scheduling. Compared to previous multi-core stream scheduling algorithms, team scheduling achieves 1) similar synchronization overhead, 2) coverage of a larger class of applications, 3) better control over buffer space, 4) deadlock-free feedback loops, and 5) lower latency. We compare team scheduling to the latest stream scheduling algorithm, SGMS, by evaluating 14 applications on a multi-core architecture with 16 cores. Team scheduling successfully targets applications that cannot be validly scheduled by SGMS due to excessive buffer requirements or deadlocks in feedback loops (e.g., GSM and W-CDMA). For applications that can be validly scheduled by SGMS, team scheduling shows on average 37% higher throughput within the same buffer space constraints.
ISBN: 9781595936677 (print)
We discuss the high-performance parallel implementation and execution of dense linear algebra matrix operations on SMP architectures, with an eye towards multi-core processors with many cores. We argue that traditional implementations, such as those incorporated in LAPACK, cannot be easily modified to deliver both high performance and scalability on these architectures. The solution we propose is to arrange the data structures and algorithms so that matrix blocks become the fundamental units of data, and operations on these blocks become the fundamental units of computation, resulting in algorithms-by-blocks as opposed to the more traditional blocked algorithms. We show that this facilitates the adoption of techniques akin to the dynamic scheduling and out-of-order execution usual in superscalar processors, which we name SuperMatrix Out-of-Order scheduling. Performance results on a 16 CPU Itanium2-based server are used to highlight opportunities and issues related to this new approach.
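The idea of treating blocks as units of data and block operations as units of computation can be illustrated with a small sketch. This is not the SuperMatrix implementation; it is a minimal illustration, assuming a right-looking Cholesky factorization on an nb x nb grid of blocks, with hypothetical helper names (`cholesky_tasks`, `build_dependencies`, `schedule_waves`). It lists the block operations in program order, derives dependencies from which blocks each operation reads and writes, and then groups the operations into waves that could run concurrently, regardless of their original order.

```python
from collections import defaultdict


def cholesky_tasks(nb):
    """List block operations of a right-looking Cholesky factorization.

    Each task is (name, blocks_read, block_written). The written block is
    also an input, since every operation updates it in place.
    """
    tasks = []
    for k in range(nb):
        tasks.append(("CHOL", [], (k, k)))             # factor diagonal block
        for i in range(k + 1, nb):
            tasks.append(("TRSM", [(k, k)], (i, k)))   # solve for panel block
        for i in range(k + 1, nb):
            for j in range(k + 1, i + 1):              # update trailing blocks
                if i == j:
                    tasks.append(("SYRK", [(i, k)], (i, j)))
                else:
                    tasks.append(("GEMM", [(i, k), (j, k)], (i, j)))
    return tasks


def build_dependencies(tasks):
    """For each task, collect the earlier tasks it must wait for."""
    last_writer = {}               # block -> index of the last task writing it
    readers = defaultdict(list)    # block -> tasks that read it since that write
    deps = []
    for t, (_, reads, write) in enumerate(tasks):
        d = set()
        for block in reads + [write]:
            if block in last_writer:
                d.add(last_writer[block])   # read-after-write, write-after-write
        d.update(readers[write])            # write-after-read
        deps.append(d)
        for block in reads:
            readers[block].append(t)
        last_writer[write] = t
        readers[write] = []
    return deps


def schedule_waves(tasks, deps):
    """Group tasks into waves; all tasks in a wave have their inputs ready."""
    done, waves = set(), []
    while len(done) < len(tasks):
        ready = [t for t in range(len(tasks))
                 if t not in done and deps[t] <= done]
        waves.append(ready)
        done.update(ready)
    return waves


tasks = cholesky_tasks(3)
deps = build_dependencies(tasks)
for wave in schedule_waves(tasks, deps):
    print([(tasks[t][0], tasks[t][2]) for t in wave])
```

A real runtime would dispatch each ready task to an idle core as soon as its dependencies complete rather than proceed in synchronized waves; the wave grouping is used here only because it makes the available parallelism easy to see.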
ISBN: 9781581136616 (print)
The use of efficient Galois field arithmetic on SIMD architectures was presented. SIMD architectures were used to obtain high-speed implementations in fields where data parallelism is encountered. In this regard, the role played by the bit-slicing procedure in the computations was also discussed.
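Bit-slicing can be sketched as follows. The abstract does not say which field or reduction polynomial the paper uses, so this example assumes GF(2^8) with the polynomial x^8 + x^4 + x^3 + x + 1 purely for illustration, and all function names are hypothetical. Each "slice" is one word holding the same bit position of many field elements, so a single AND or XOR on a word acts on every element at once, which is the role a SIMD register would play. Python integers stand in for the wide registers.

```python
M = 8          # field degree: GF(2^8), an illustrative assumption
POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1, also an illustrative assumption


def gf_mul_scalar(a, b):
    """Reference shift-and-add multiplication of two field elements."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached M: reduce modulo POLY
            a ^= POLY
    return r


def to_slices(values):
    """Transpose: slice i holds bit i of every value, one lane per value."""
    return [sum(((v >> i) & 1) << lane for lane, v in enumerate(values))
            for i in range(M)]


def from_slices(slices, count):
    """Inverse transpose: rebuild each lane's value from the M slices."""
    return [sum(((slices[i] >> lane) & 1) << i for i in range(M))
            for lane in range(count)]


def gf_mul_bitsliced(a, b):
    """Multiply all lanes at once using only AND and XOR on whole words."""
    a = list(a)
    r = [0] * M
    for j in range(M):
        # Add a * x^j in every lane where bit j of b is set.
        for i in range(M):
            r[i] ^= a[i] & b[j]
        # Multiply a by x in every lane, then reduce modulo POLY.
        top = a[M - 1]
        a = [0] + a[:M - 1]
        for i in range(M):
            if (POLY >> i) & 1:
                a[i] ^= top
    return r


xs = [0x57, 0x83, 0x01, 0xFF]
ys = [0x83, 0x57, 0xAB, 0xFF]
products = from_slices(gf_mul_bitsliced(to_slices(xs), to_slices(ys)), len(xs))
print([hex(p) for p in products])
```

The bit-sliced routine contains no data-dependent branches: the conditional in the scalar version becomes a masked XOR, which is what lets every lane proceed in lockstep.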
ISBN: 9781450300797 (print)
The proceedings contain 45 papers. The topics discussed include: buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures; scheduling to minimize power consumption using submodular functions; collaborative scoring with dishonest participants; securing every bit: authenticated broadcast in radio networks; brief announcement: on speculative replication of transactional systems; data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory; basic network creation games; on the bit communication complexity of randomized rumor spreading; algorithms and application for grids and clouds; towards optimizing energy costs of algorithms for shared memory architectures; brief announcement: on regenerator placement problems in optical networks; best-effort group service in dynamic networks; and implementing and evaluating nested parallel transactions in software transactional memory.
ISBN: 9781450307437 (print)
The proceedings contain 50 papers. The topics discussed include: graph expansion and communication costs of fast matrix multiplication; near linear-work parallel SDD solvers, low-diameter decomposition, and low-stretch subgraphs; linear-work greedy parallel approximate set cover and variants; optimizing hybrid transactional memory: the importance of nonspeculative operations; parallelism and data movement characterization of contemporary application classes; work-stealing for mixed-mode parallelism by deterministic team-building; full reversal routing as a linear dynamical system; reclaiming the energy of a schedule, models and algorithms; a tight runtime bound for synchronous gathering of autonomous robots with limited visibility; convergence of local communication chain strategies via linear transformations: or how to trade locality for speed; and convergence to equilibrium of logit dynamics for strategic games.
ISBN: 159593667X (print)
The proceedings contain 37 papers. The topics discussed include: on triangulation of simple networks; strong-diameter decompositions of minor free graphs; approximation algorithms for multiprocessor scheduling under uncertainty; scheduling DAGs on asynchronous processors; scheduling to minimize gaps and power consumption; cache-oblivious streaming B-trees; an experimental comparison of cache-oblivious and cache-conscious programs; scheduling threads for constructive cache sharing on CMPs; proximity-aware directory-based coherence for multi-core processor architectures; a parallel dynamic programming algorithm on a multi-core architecture; tight bounds for distributed selection; local MST computation with short advice; distributed approximation of capacitated dominating sets; packing to angles and sectors; and the notion of a timed register and its application to indulgent synchronization.
ISBN: 9781450380706 (print)
The proceedings contain 49 papers. The topics discussed include: fast stencil computations using fast Fourier transforms; low-span parallel algorithms for the binary-forking model; provable advantages for graph algorithms in spiking neural networks; algorithms for right-sizing heterogeneous data centers; efficient parallel self-adjusting computation; speed scaling with explorable uncertainty; efficient online weighted multi-level paging; paging and the address-translation problem; massively parallel algorithms for distance approximation and spanners; efficient load-balancing through distributed token dropping; finding subgraphs in highly dynamic networks; near-optimal time-energy trade-offs for deterministic leader election; and efficient stepping algorithms and implementations for parallel shortest paths.
ISBN: 9781595939739 (print)
The proceedings contain 53 papers. The topics discussed include: a first insight into object-aware hardware transactional memory; safe open-nested transactions through ownership; leveraging non-blocking collective communication in high-performance applications; fractal communication in software data dependency graphs; many random walks are faster than one; improved distributed approximate matching; graph partitioning into isolated, high conductance clusters: theory, computation and applications to preconditioning; automatic data partitioning in software transactional memories; checkpoints and continuations instead of nested transactions; adaptive transaction scheduling for transactional memory systems; operational analysis of processor speed scaling; and kicking the tires of software transactional memory: why the going gets tough.
ISBN: 9781450391467 (print)
The proceedings contain 44 papers. The topics discussed include: deterministic distributed sparse and ultra-sparse spanners and connectivity certificates; fully polynomial-time distributed computation in low-treewidth graphs; adaptive massively parallel algorithms for cut problems; preparing for disaster: leveraging precomputation to efficiently repair graph structures upon failures; the energy complexity of Las Vegas leader election; a fully-distributed peer-to-peer protocol for byzantine-resilient distributed hash tables; brief announcement: the (limited) power of multiple identities: asynchronous byzantine reliable broadcast with improved resilience through collusion; brief announcement: composable dynamic secure emulation; and robust and optimal contention resolution without collision detection.