The proceedings contain 33 papers. The topics discussed include: a comparison of sorting algorithms for the connection machine CM-2;randomized sorting and selection on mesh-connected processor arrays;large-scale sorti...
ISBN:
(纸本)0897914384
The proceedings contain 33 papers. The topics discussed include: a comparison of sorting algorithms for the connection machine CM-2;randomized sorting and selection on mesh-connected processor arrays;large-scale sorting in parallel memories;optimal speedup for backtrack search on a butterfly network;fast and reliable parallel hashing;tight bounds for the chaining problem;parallel construction of trees with optimal weighted path length;more time-work tradeoffs for parallel graph algorithms;an overview of supertoroidal networks;and architectural primitives for a scalable shared memory multiprocessor.
The proceedings contain 47 papers. The topics discussed include: Fault-Tolerant Meshes with Small Degree;the verification of cache coherence protocols;fault diagnosis in a small constant number of parallel testing rou...
ISBN:
(纸本)0897915992
The proceedings contain 47 papers. The topics discussed include: Fault-Tolerant Meshes with Small Degree;the verification of cache coherence protocols;fault diagnosis in a small constant number of parallel testing rounds;tight bounds on the size of fault-tolerant merging and sorting networks with destructive faults;on Gazit and Miller�s parallel algorithm for planar separators: achieving greater efficiency through random sampling;parallel and output sensitive algorithms for combinatorial and linear algebra problems;efficient parallel shortest-paths in digraphs with a separator decomposition;components for computing and communications;highly efficient dictionary matching in parallel;optimal parallel two dimensional pattern matching;and parallel construction and query of suffix trees for two-dimensional matrices.
The proceedings contain 45 papers. The topics discussed include: fast parallelalgorithms for the unit cost editing distance between trees;towards understanding exclusive read;on the parallel complexity of integer pro...
ISBN:
(纸本)089791323X
The proceedings contain 45 papers. The topics discussed include: fast parallelalgorithms for the unit cost editing distance between trees;towards understanding exclusive read;on the parallel complexity of integer programming;parallel RAMs with bounded memory wordsize;load balancing, selection and sorting on the hypercube;the power of parallel pointer manipulation;the communication complexity of several problems in matrix compntation;embedding of d-dimensional grids into optimal hypercubes;deterministic P-RAM simulation with constant redundancy;processor networks and interconnection networks without long wires;a lower bound on the size of shellsort sorting networks;cost-bandwidth tradeoffs for communication networks;a more practical PRAM model;the APRAM: incorporating asynchrony into the PRAM model;fault tolerance in hypercube-derivative networks;square meshes are not always optimal;parallel graph contraction;and the virtual time machine.
The proceedings contain 38 papers. The topics discussed include: efficient compilation of high-level data parallelalgorithms;improved abstract parity-declustered layouts for disk arrays;experiences with parallel n-bo...
ISBN:
(纸本)0897916719
The proceedings contain 38 papers. The topics discussed include: efficient compilation of high-level data parallelalgorithms;improved abstract parity-declustered layouts for disk arrays;experiences with parallel n-body simulation;a comparison of parallelalgorithms for connected components;studying overheads in massively parallel min/max-tree evaluation;scheduling trees using FIFO queues: a control-memory tradeoff;SIMD instruction cache;dynamic parallel tree contraction;improved bounds for routing and sorting on multi-dimensional meshes;parallel sorting by overpartitioning;an optimal randomized logarithmic time connectivity algorithm for the EREW PRAM;list ranking and list scan on the CRAY C-90;modeling communication in parallelalgorithms: a fruitful interaction between theory and systems?;on testing cache-coherent shared memories;diffracting trees;programming abstract DEC-alpha based multiprocessors the easy way;scheduling parallelizable tasks to minimize average response time;and an analysis of diffusive load-balancing.
Speculative data-parallelalgorithms for language recognition have been widely experimented for various types of finite-state automata (FA), deterministic (DFA) and nondeterministic (NFA), often derived from regular e...
详细信息
Tridiagonalization, which is a key step in symmetric eigenvalue decomposition (EVD), aims to convert a symmetric matrix to a tridiagonal form. In Nvidia's cuSOLVER library, the FP64 precision tridiagonalization pr...
详细信息
ISBN:
(纸本)9798400714436
Tridiagonalization, which is a key step in symmetric eigenvalue decomposition (EVD), aims to convert a symmetric matrix to a tridiagonal form. In Nvidia's cuSOLVER library, the FP64 precision tridiagonalization process only reach 2.1 TFLOPs out of 67 TFLOPs on H100 GPU, and it consumes a significant portion of the elapsed time in the entire EVD process, accounting for over 97%. Thus, improving the tridiagonalization performance is crucial on accelerating EVD. In this paper, we analyze the reasons behind the suboptimal performance of tridiagonalization on GPU architectures, and we propose a new double blocking band reduction algorithm along with an implementation of GPU-based bulge chasing to improve the tridiagonalization performance. Through experimental evaluation, the proposed FP64 precision tridiagonalization method yields up to 19.6 TFLOPs which is 9.3x and 5.2x faster compared cuSOVLER and MAGMA, respectively.
Group testing is a widely used binary classification method that efficiently distinguishes between samples with and without a binary-classifiable attribute by pooling and testing subsets of a group. Bayesian Group Tes...
详细信息
ISBN:
(纸本)9798400714436
Group testing is a widely used binary classification method that efficiently distinguishes between samples with and without a binary-classifiable attribute by pooling and testing subsets of a group. Bayesian Group Testing (BGT) is the state-of-the-art approach, which integrates prior risk information into a Bayesian Boolean Lattice framework to minimize test counts and reduce false classifications. However, BGT, like other existing group testing techniques, struggles with multinomial group testing, where samples have multiple binary-classifiable attributes that can be individually distinguished simultaneously. We address this need by proposing Bayesian Multinomial Group Testing (BMGT), which includes a new Bayesian-based model and supporting theorems for an efficient and precise multinomial pooling strategy. We further design and develop SBMGT, a high-performance and scalable framework to tackle BMGT's computational challenges by proposing three key innovations: 1) a parallel binaryencoded product lattice model with up to 99.8% efficiency;2) the Bayesian Balanced Partitioning Algorithm (BBPA), a multinomial pooling strategy optimized for parallel computation with up to 97.7% scaling efficiency on 4096 cores;and 3) a scalable multinomial group testing analytics framework, demonstrated in a real-world disease surveillance case study using AIDS and STDs datasets from Uganda, where SBMGT reduced tests by up to 54% and lowered false classification rates by 92% compared to BGT.
This paper presents Snapgram, an innovative assignment for teaching matrix multiplications in the context of a parallel Computing course. Designed for fourth-year undergraduate at the University of California, Davis, ...
详细信息
The proceedings contains 40 papers from the conference on SPAA 2004 - Sixteenth annualacmsymposium on parallelism in algorithms and architectures. The topics discussed include: On delivery times in packet networksun...
详细信息
The proceedings contains 40 papers from the conference on SPAA 2004 - Sixteenth annualacmsymposium on parallelism in algorithms and architectures. The topics discussed include: On delivery times in packet networksunder adversarial traffic;balanced graph partitioning;online hierarchical cooperative caching;scheduling against an adversarial network;effectively sharing a cache among threads;online algorithms for network design and dynamic analysis of the arrow distributed protocol.
暂无评论