ISBN: 9781605587080 (print)
The proceedings contain 46 papers. The topics discussed include: exascale computing: the challenges and opportunities in the next decade;structure-driven optimizations for amorphous data-parallel programs;compiler aided selective lock assignment for improving the performance of software transactional memory;debugging programs that use atomic blocks and transactional memory;scheduling support for transactional memory contention management;leveraging parallel nesting in transactional memory;extreme scale computing: challenges and opportunities;scalable communication protocols for dynamic sparse data exchange;thread to strand binding of parallel network applications in massive multi-threaded systems;improving parallelism and locality with asynchronous algorithms;using data structure knowledge for efficient lock generation and strong atomicity;modeling advanced collective communication algorithms on cell-based systems;and towards scalable and transparent parallelization of multiplayer games using transactional memory support.
ISBN: 1595936025 (print)
The proceedings contain 33 papers. The topics discussed include: toward terabyte pattern mining: an architecture-conscious solution;expressing and exploiting concurrency in networked applications with aspen;disens: scalable distributed sensor network simulation;optimizing communication overlap for high-speed networks;open nesting in software transactional memory;implicit parallelism with ordered transactions;dynamic multigrain parallelization on the cell broadband engine;adaptive work stealing with parallelism feedback;self-adaptive applications on the grid;latency hiding through multithreading on a network processor;efficient nonblocking software transactional memory;adaptive structured parallelism for computational grids;promised messages: recovering from inconsistent global states;supporting fault-tolerance in streaming grid applications;and pervasive parallel computing: an historic opportunity for innovation in programming and architecture.
ISBN: 9781450382946 (print)
The proceedings contain 48 papers. The topics discussed include: efficient algorithms for persistent transactional memory;investigating the semantics of futures in transactional memory systems;constant-time snapshots with applications to concurrent data structures;reasoning about recursive tree traversals;synthesizing optimal collective algorithms;scaling implicit parallelism via dynamic control replication;efficiently reclaiming memory in concurrent search data structures while bounding wasted memory;are dynamic memory managers on GPUs slow? a survey and benchmarks;improving communication by optimizing on-node data movement with data layout;and Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory.
ISBN: 9781450319225 (print)
The proceedings contain 45 papers. The topics discussed include: a peta-scalable CPU-GPU algorithm for global atmospheric simulations;adoption protocols for fanout-optimal fault-tolerant termination detection;betweenness centrality: algorithms and implementations;complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU;fast concurrent queues for x86 processors;FASTLANE: improving performance of software transactional memory for low thread counts;Ligra: a lightweight graph processing framework for shared memory;ownership passing: efficient distributed memory programming on multi-core systems;parallel suffix array and least common prefix for the GPU;Streamscan: fast scan algorithms for GPUs without global barrier synchronization;using hardware transactional memory to correct and simplify a readers-writer lock algorithm;and exploring different automata representations for efficient regular expression matching on GPUs.
ISBN: 9798400700156 (print)
The proceedings contain 43 papers. The topics discussed include: provably good randomized strategies for data placement in distributed key-value stores;provably fast and space-efficient parallel biconnectivity;practically and theoretically efficient garbage collection for multiversioning;fast and scalable channels in Kotlin coroutines;high-performance GPU-to-CPU transpilation and optimization via high-level parallel constructs;lifetime-based optimization for simulating quantum circuits on a new Sunway supercomputer;merchandiser: data placement on heterogeneous memory for task-parallel HPC applications with load-balance awareness;visibility algorithms for dynamic dependence analysis and distributed coherence;Block-STM: scaling blockchain execution by turning ordering curse to a performance blessing;TDC: towards extremely efficient CNNs on GPUs via hardware-aware tucker decomposition;and improving energy saving of one-sided matrix decompositions on CPU-GPU heterogeneous systems.
The proceedings contain 46 papers. The topics discussed include: kite: efficient and available release consistency for the datacenter;Oak: a scalable off-heap allocated key-value map;optimizing batched Winograd convolution on GPUs;taming unbalanced training workloads in deep learning with partial collective operations;scalable top-K retrieval with Sparta;waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data;scaling concurrent queues by using HTM to profit from failed atomic operations;a wait-free universal construction for large objects;using sample-based time series data for automated diagnosis of scalability losses in parallel programs;scaling out speculative execution of finite-state machines with parallel merge;and detecting and reproducing error-code propagation bugs in MPI implementations.
ISBN: 9781450301190 (print)
The proceedings contain 39 papers. The topics discussed include: ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms;programming the memory hierarchy revisited: supporting irregular parallelism in sequoia;compact data structure and scalable algorithms for the sparse grid technique;a domain-specific approach to heterogeneous parallelism;Copperhead: compiling an embedded data parallel language;OoOJava: software out-of-order execution;SpiceC: scalable parallelism via implicit copying and explicit commit;inferring ownership transfer for efficient message passing;all-window profiling and composable models of cache sharing;ULCC: a user-level facility for optimizing shared cache performance on multicores;ScalaExtrap: trace-based communication extrapolation for SPMD programs;and GRace: a low-overhead mechanism for detecting data races in GPU programs.
ISBN: 9798400704352 (print)
The proceedings contain 44 papers. The topics discussed include: FastFold: optimizing AlphaFold training and inference on GPU clusters;liger: interleaving intra- and inter-operator parallelism for distributed large model inference;optimizing collective communications with error-bounded lossy compression for GPU clusters;OsirisBFT: say no to task replication for scalable byzantine fault tolerant analytics;RELAX: durable data structures with swift recovery;a row decomposition-based approach for sparse matrix multiplication on GPUs;Tetris: accelerating sparse convolution by exploiting memory reuse on GPU;scaling up transactions with slower clocks;towards scalable unstructured mesh computations on shared memory many-cores;AGAThA: fast and efficient GPU acceleration of guided sequence alignment for long read mapping;and shared memory-contention-aware concurrent DNN execution for diversely heterogeneous system-on-chips.
ISBN: 9781450311601 (print)
The proceedings contain 57 papers. The topics discussed include: scalable framework for mapping streaming applications onto multi-GPU systems;efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors;extending a C-like language for portable SIMD programming;DOJ: dynamically parallelizing object-oriented programs;GPU-based NFA implementation for memory efficient high speed regular expression matching;concurrent tries with efficient non-blocking snapshots;deterministic parallel random-number generation for dynamic-multithreading platforms;algorithm-based fault tolerance for dense matrix factorizations;revisiting the combining synchronization technique;FlexBFS: a parallelism-aware implementation of breadth-first search on GPU;optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA;and the boat hull model: adapting the roofline model to enable performance prediction for parallel computing.
ISBN: 0897912764 (print)
The proceedings contain 21 papers. The topics discussed include: non-intrusive and interactive profiling in parasight;using data partitioning to implement a parallel assembler;efficient interprocedural analysis for program parallelization and restructuring;restructuring lisp programs for concurrent execution;compiling Fortran 8x array features for the connection machine computer system;compiling C* programs for a hypercube multicomputer;automatic discovery of parallelism: a tool and an experiment;an open environment for building parallel programming systems;parallel discrete-event simulation of FCFS stochastic queueing networks;Qlisp: experience and new directions;program development for a systolic array;experiences with Poker;Soar/PSM-E: investigating match parallelism in a learning production system;applications experience with Linda;on the implementation of applicative languages on shared-memory, MIMD multiprocessors;large-scale parallel programming: experience with the BBN butterfly parallel processor;the parallel decomposition and implementation of an integrated circuit global router;and characterizing the synchronization behavior of parallel programs.