ISBN: 9798400714436 (print)
Molecular dynamics simulation has emerged as an important area in which HPC+AI helps investigate physical properties, using machine-learning interatomic potentials (MLIPs). General-purpose machine-learning (ML) tools have been leveraged in MLIPs, but the two are not perfectly matched, since ML tools miss many optimization opportunities in MLIPs. This inefficiency arises because HPC+AI applications involve far more computational complexity than pure AI scenarios. This paper develops an MLIP, named TensorMD, independently of any ML tool. TensorMD has been evaluated on two supercomputers and scaled to 51.8 billion atoms, i.e., approximately 3x compared with the state of the art.
The proceedings contain 21 papers from the Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP). Topics discussed include data parallel programs; data libraries; data caches; data access; distributed and shared memory multiprocessors; dataflow analysis; scheduling; optimization; and synchronization.
The proceedings contain 25 papers. Topics discussed include data and task parallelism, irregular applications, coherence protocols, shared memory, compilers, and performance issues.
ISBN: 0897915895 (print)
The symposium materials contain 26 papers covering the spectrum from models of parallel computing to implementation techniques, and from compilation algorithms to application development tools and case studies, thus satisfying the goal of broadly covering the active areas of parallel programming research.
The proceedings contain 14 papers from the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP). Topics discussed include: reference idempotency analysis: a framework for optimizing speculative execution; pointer and escape analysis for multithreaded programs; language support for Morton-order matrices; efficient load balancing for wide-area divide-and-conquer applications; scalable queue-based spin locks with timeout; contention elimination by replication of sequential sections in distributed shared memory programs; and accurate data redistribution cost estimation in software distributed shared memory systems.
ISBN: 0897915895 (print)
The proceedings contain 26 papers. The topics discussed include: LogP: towards a realistic model of parallel computation; exploiting task and data parallelism on a multicomputer; ActorSpace: an open distributed programming paradigm; experiences using the ParaScope editor: an interactive parallel programming tool; perturbation analysis of high level instrumentation for SPMD programs; integrating message-passing and shared-memory: early experience; using scheduler information to achieve optimal barrier synchronization performance; and a concurrent copying garbage collector for languages that distinguish (im)mutable data.
ISBN: 9781450392044 (print)
The proceedings contain 46 papers. The topics discussed include: stream processing with dependency-guided synchronization; mashup: making serverless computing useful for HPC workflows via hybrid execution; parallel block-delayed sequences; near-optimal sparse Allreduce for distributed deep learning; Vapro: performance variance detection and diagnosis for production-run parallel applications; interference relation-guided SMT solving for multi-threaded program verification; extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms; scaling graph traversal to 281 trillion edges with 40 million cores; asymmetry-aware scalable locking; the performance power of software combining in persistence; and multi-queues can be state-of-the-art priority schedulers.
ISBN: 9781450326568 (print)
The proceedings contain 43 papers. The topics discussed include: predator: predictive false sharing detection; concurrency testing using schedule bounding: an empirical study; trace driven dynamic deadlock detection and reproduction; efficient search for inputs causing high floating-point errors; portable, MPI-interoperable coarray Fortran; eliminating global interpreter locks in Ruby through hardware transactional memory; leveraging hardware message passing for efficient thread synchronization; well-structured futures and cache locality; time-warp: lightweight abort minimization in transactional memory; beyond parallel programming with domain specific languages; a decomposition for in-place matrix transposition; in-place transposition of rectangular matrices on accelerators; and parallelizing dynamic programming through rank convergence.
ISBN: 9781450332057 (print)
The proceedings contain 44 papers. The topics discussed include: predicate RCU: an RCU for scalable concurrent updates; automatic scalable atomicity via semantic locking; a framework for practical parallel fast matrix multiplication; PLUTO+: near-complete modeling of affine transformations for parallelism and locality; distributed memory code generation for mixed irregular/regular computations; performance implications of dynamic memory allocators on transactional memory systems; low-overhead software transactional memory with progress guarantees and strong semantics; barrier elision for production parallel programs; scalable and efficient implementation of 3D unstructured meshes computation: a case study on matrix assembly; and diagnosing the causes and severity of one-sided message contention.
ISBN: 9781450382946 (print)
The proceedings contain 48 papers. The topics discussed include: efficient algorithms for persistent transactional memory; investigating the semantics of futures in transactional memory systems; constant-time snapshots with applications to concurrent data structures; reasoning about recursive tree traversals; synthesizing optimal collective algorithms; scaling implicit parallelism via dynamic control replication; efficiently reclaiming memory in concurrent search data structures while bounding wasted memory; are dynamic memory managers on GPUs slow? a survey and benchmarks; improving communication by optimizing on-node data movement with data layout; and Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory.