ISBN (Print): 9780769552071
The proceedings contain 114 papers. The topics discussed include: cost-optimal execution of Boolean query trees with shared streams; exploiting geometric partitioning in task mapping for parallel computers; communication-efficient distributed variance monitoring and outlier detection for multivariate time series; pythia: faster big data in motion through predictive software-defined network optimization at runtime; power and performance characterization and modeling of GPU-accelerated systems; scibox: online sharing of scientific data via the cloud; active measurement of the impact of network switch utilization on application performance; multi-resource real-time reader/writer locks for multiprocessors; remote invalidation: optimizing the critical path of memory transactions; and revisiting asynchronous linear solvers: provable convergence rate through randomization.
ISBN (Print): 9783540241287
The proceedings contain 119 papers. The special focus in this conference is on parallel and distributed processing and applications. The topics include: present and future supercomputer architectures; challenges in P2P computing; multihop wireless ad hoc networking: current challenges and future opportunities; an inspector-executor algorithm for irregular assignment; multi-grain parallel processing of data-clustering on programmable graphics hardware; a parallel Reed-Solomon decoder on the Imagine stream processor; asynchronous document dissemination in dynamic ad hoc networks; location-dependent query results retrieval in a multi-cell wireless; an efficient mobile data mining model; towards correct distributed simulation of high-level Petri nets with fine-grained partitioning; m-guard: a new distributed deadlock detection algorithm based on mobile agent technology; meta-based distributed computing framework; locality optimizations for Jacobi iteration on distributed parallel; fault-tolerant cycle embedding in the WK-recursive network; RAIDb: redundant array of inexpensive databases; a fault-tolerant multi-agent development framework; a fault tolerance protocol for uploads: design and evaluation; topological adaptability for the distributed token circulation paradigm in faulty environment; adaptive data dissemination in wireless sensor networks; design and analysis of a k-connected topology control algorithm for ad hoc networks; on using temporal consistency for parallel execution of real-time queries in wireless sensor systems; cluster-based parallel simulation for large scale molecular dynamics in microscale thermophysics; parallel checkpoint/recovery on cluster of IA-64 computers; an enhanced message exchange mechanism in cluster-based mobile; a scalable low discrepancy point generator for parallel computing; generalized trellis stereo matching with systolic array.
The proceedings contain 108 papers. The topics discussed include: adaptive incremental checkpointing via delta compression for networked multicore systems; towards scalable checkpoint restart: a collective inline memory contents deduplication proposal; on closed nesting and checkpointing in fault-tolerant distributed transactional memory; reliable service allocation in clouds; scaling and scheduling to maximize application performance within budget constraints in cloud workflows; optimizing resource allocation while handling SLA violations in cloud computing platforms; high-throughput analysis of large microscopy image datasets on CPU-GPU cluster platforms; self-adaptive OmpSs tasks in heterogeneous environments; an analytical performance model for partitioning off-chip memory bandwidth; a case for handshake in nanophotonic interconnects; and optimizations and analysis of BSP graph processing models on public clouds.
ISBN (Digital): 9798331520526
ISBN (Print): 9798331520533
Minimizing the Gaussian curvature of triangular meshes can have important applications in 3D computer vision and graphics. However, traditional explicit methods require solving high-order partial differential equations, which makes them computationally demanding and impractical in many applications. This paper presents a very fast and efficient adaptive filtering technique, termed Gaussian Curvature Filtering (GCF), which optimizes the Gaussian curvature of triangular meshes by exploiting the properties of developable surfaces. By moving a vertex along its normal direction so that one of its 1-ring neighbors falls onto the vertex's tangent plane, GCF minimizes Gaussian curvature without explicitly computing it. A novel multi-tangent-plane projection strategy is developed to adaptively determine a vertex's moving distance, which enables GCF to achieve Gaussian curvature minimization while preserving important geometric features. We present extensive experiments to demonstrate that GCF outperforms state-of-the-art methods in Gaussian curvature minimization and shape-preserving model smoothing, and that it is $7\sim 50$ times faster than previous explicit optimization methods.
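The vertex update described in the abstract lends itself to a compact implementation. The following is a minimal, hypothetical sketch (not the authors' code), assuming the mesh is given as NumPy arrays of vertex positions and unit vertex normals plus precomputed 1-ring adjacency; picking the smallest-magnitude candidate displacement is only one plausible way to realize the adaptive multi-tangent-plane selection.

```python
import numpy as np

def gcf_step(vertices, normals, one_ring, step=1.0):
    """One illustrative GCF-style vertex update (sketch, not the paper's code).

    vertices : (N, 3) array of vertex positions
    normals  : (N, 3) array of unit vertex normals
    one_ring : list of index lists; one_ring[i] holds the 1-ring neighbors of vertex i
    """
    new_vertices = vertices.copy()
    for i, nbrs in enumerate(one_ring):
        if not nbrs:
            continue
        n = normals[i]
        # Signed distance of each 1-ring neighbor to the tangent plane at vertex i.
        d = (vertices[nbrs] - vertices[i]) @ n
        # Moving vertex i along its normal by d[j] places neighbor j on the
        # translated tangent plane; choosing the smallest-magnitude candidate is
        # one conservative, feature-preserving heuristic (an assumption here).
        move = d[np.argmin(np.abs(d))]
        new_vertices[i] = vertices[i] + step * move * n
    return new_vertices
```

In practice such a filter would be iterated over the mesh until the curvature energy stops decreasing; no Gaussian curvature is ever evaluated explicitly, which is the source of the reported speedup.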
ISBN (Print): 9798350364606
The proceedings contain 165 papers. The topics discussed include: understanding multi-dimensional efficiency of fine-tuning large language models using SpeedUp, MemoryUp, and EnergyUp; shared-memory parallel Edmonds blossom algorithm for maximum cardinality matching in general graphs; a reconfigurable architecture of a scalable, ultrafast, ultrasound, delay-and-sum beamformer; scheduling and allocation of disaggregated memory resources in HPC systems; GIM (ghost in the machine): a coarse-grained reconfigurable compute-in-memory platform for exploring machine-learning architectures; further optimizations and analysis of Smith-Waterman with vector extensions; measurement-based quantum approximate optimization; optimizing forward wavefield storage leveraging high-speed storage media; teaching performance metrics in parallel computing courses; and compiler-driven SWAR parallelism for high-performance bitboard algorithms.
ISBN (Print): 9780769534718
The proceedings contain 132 papers. The topics discussed include: an innovative replica consistency model in data grids; an effective approach for consistency management of replicas in data grid; a novel QoS-enable real-time publish-subscribe service; towards practical virtual server-based load balancing for distributed hash tables; self-stabilizing construction of bounded size clusters; authorization using the publish-subscribe model; trust management for ubiquitous healthcare; integrating security solutions to support nanoCMOS electronics research; misusing Kademlia protocol to perform DDoS attacks; analyzing the efficiency and bottleneck of scientific programs on Imagine stream processor by simulation; parallelism without pain: orchestrating computational algebra components into a high-performance parallel system; and parallelisation of a Valgrind dynamic binary instrumentation framework.
ISBN (Digital): 9798331506476
ISBN (Print): 9798331506483
Communication is a key bottleneck for distributed graph neural network (GNN) training. Existing GNN training systems fail to scale to deep GNNs because of the tremendous amount of inter-GPU communication. This paper proposes Mithril, a new approach that significantly scales distributed full-graph deep GNN training. Being the first to use layer-level model parallelism for GNN training, Mithril partitions GNN layers among GPUs so that each device performs the computation for a disjoint subset of consecutive GNN layers on the whole graph. Compared to graph parallelism, in which each GPU handles a graph partition, Mithril reduces the communication volume by a factor of the number of GNN layers and thus scales to deep models. Mithril overcomes the unique challenges of pipelined layer-level model parallelism on the whole graph by partitioning it into dependent chunks, breaking the dependencies with embedding speculation, and applying specific training techniques to ensure convergence. We also propose a hybrid approach that combines Mithril with graph parallelism to handle large graphs, achieve better compute resource utilization, and ensure model convergence. We build a general GNN training system supporting all three parallelism settings. Extensive experiments show that Mithril reduces the per-epoch communication volume by up to $22.89 \times$ (on average $6.78 \times$). It achieves a maximum training time speedup of $2.34 \times$ (on average $1.49 \times$) on a GPU cluster with a high-performance InfiniBand network. On another cluster with commodity Ethernet, Mithril outperforms the baseline by up to $10.21 \times$ (on average $7.16 \times$). Mithril also achieves a comparable level of model accuracy and convergence speed compared to graph parallelism.
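To make the layer-to-GPU partitioning concrete, below is a minimal, hypothetical sketch (names such as partition_layers and pipelined_forward are illustrative, not Mithril's API), assuming PyTorch modules, one GPU per pipeline stage, and layers that map a feature tensor to a feature tensor; it shows only the assignment of consecutive layers to devices and the chunked data flow, not Mithril's embedding speculation or convergence-preserving training techniques.

```python
import torch.nn as nn

def partition_layers(layers, num_gpus):
    """Assign consecutive GNN layers to GPUs (pipeline stages) as evenly as possible."""
    per_stage = (len(layers) + num_gpus - 1) // num_gpus
    stages = []
    for g in range(num_gpus):
        chunk = layers[g * per_stage:(g + 1) * per_stage]
        stages.append(nn.Sequential(*chunk).to(f"cuda:{g}"))
    return stages

def pipelined_forward(stages, chunks):
    """Naive pipeline over graph chunks: each chunk flows through every stage in order.
    A real system overlaps chunks across stages and speculates on boundary embeddings
    to break inter-chunk dependencies; this sketch only illustrates the data flow."""
    outputs = []
    for x in chunks:
        h = x
        for g, stage in enumerate(stages):
            h = stage(h.to(f"cuda:{g}"))
        outputs.append(h)
    return outputs
```

The key point the sketch illustrates is that only activations at stage boundaries cross GPUs, so communication volume shrinks roughly with the number of layers per stage, instead of scaling with the number of cut edges as in graph parallelism.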
ISBN (Print): 9781665481069
The proceedings contain 123 papers. The topics discussed include: challenges and opportunities in designing high-performance and scalable middleware for HPC and AI: past, present, and future; HTS: a threaded multilevel sparse hybrid solver; a scalable adaptive-matrix SPMV for heterogeneous architectures; direct solution of larger coupled sparse/dense linear systems using low-rank compression on single-node multi-core machines in an industrial context; distributed-memory sparse kernels for machine learning; fam-graph: graph analytics on disaggregated memory; scalable multi-versioning ordered key-value stores with persistent memory support; in-memory indexed caching for distributed data processing; Landau collision operator in the CUDA programming model applied to thermal quench plasmas; exploiting reduced precision for GPU-based time series mining; and MICCO: an enhanced multi-GPU scheduling framework for many-body correlation functions.
ISBN (Print): 9781665440660
The proceedings contain 105 papers. The topics discussed include: a tale of two C's: convergence and composability; DSXplore: optimizing convolutional neural networks via sliding-channel convolutions; an in-depth analysis of distributed training of deep neural networks; scalable epidemiological workflows to support COVID-19 planning and response; AlphaR: learning-powered resource management for irregular, dynamic microservice graph; distributed-memory multi-GPU block-sparse tensor contraction for electronic structure; correlation-wise smoothing: lightweight knowledge extraction for HPC monitoring data; designing high-performance MPI libraries with on-the-fly compression for modern GPU clusters; Nowa: a wait-free continuation-stealing concurrency platform; noise-resilient empirical performance modeling with deep neural networks; and communication-avoiding and memory-constrained sparse matrix-matrix multiplication at extreme scale.
ISBN (Print): 9781728174457
The proceedings contain 145 papers. The topics discussed include: towards stability in the Chapel language; the GraphIt universal graph framework: achieving high-performance across algorithms, graph types, and architectures; analyzing deep learning model inferences for image classification using OpenVINO; an automated machine learning approach for data locality optimizations in Chapel; teaching modern multithreading in CS2 with actors; PHRYCTORIA: a messaging system for transprecision OpenCAPI-attached FPGA accelerators; machine learning-based prefetching for SCM main memory system; a microcode-based control unit for deep learning processors; and silent data access protocol for NVRAM+RDMA distributed storage.