ISBN: (print) 9798400714436
Molecular dynamics simulation has emerged as an important area in which HPC+AI helps to investigate physical properties, using machine-learning interatomic potentials (MLIPs). General-purpose machine-learning (ML) tools have been leveraged in MLIPs, but the two are not well matched, since ML tools miss many optimization opportunities in MLIPs. This inefficiency arises because HPC+AI applications involve far more computational complexity than pure AI scenarios. This paper develops an MLIP, named TensorMD, independently of any ML tool. TensorMD has been evaluated on two supercomputers and scaled to 51.8 billion atoms, approximately 3x the state of the art.
The proceedings contain 14 papers from the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP. Topics discussed include: reference idempotency analysis: a framework for optimizing speculative execution; pointer and escape analysis for multithreaded programs; language support for Morton-order matrices; efficient load balancing for wide-area divide-and-conquer applications; scalable queue-based spin locks with timeout; contention elimination by replication of sequential sections in distributed shared memory programs; and accurate data redistribution cost estimation in software distributed shared memory systems.
ISBN: (print) 0897915895
The proceedings contain 26 papers. The topics discussed include: LogP: towards a realistic model of parallel computation; exploiting task and data parallelism on a multicomputer; ActorSpace: an open distributed programming paradigm; experiences using the ParaScope editor: an interactive parallel programming tool; perturbation analysis of high level instrumentation for SPMD programs; integrating message-passing and shared-memory: early experience; using scheduler information to achieve optimal barrier synchronization performance; and a concurrent copying garbage collector for languages that distinguish (im)mutable data.
ISBN: (print) 9781450362252
The proceedings contain 58 papers. The topics discussed include: beyond human-level accuracy: computational challenges in deep learning; throughput-oriented GPU memory allocation; SEP-graph: finding shortest execution paths for graph processing under a hybrid framework on GPU; incremental flattening for nested data parallelism; modular transactions: bounding mixed races in space and time; processing transactions in a predefined order; data-flow/dependence profiling for structured transformations; lightweight hardware transactional memory profiling; provably and practically efficient granularity control; semantics-aware scheduling policies for synchronization determinism; and a round-efficient distributed betweenness centrality algorithm.
ISBN: (print) 9781450326568
The proceedings contain 43 papers. The topics discussed include: predator: predictive false sharing detection; concurrency testing using schedule bounding: an empirical study; trace driven dynamic deadlock detection and reproduction; efficient search for inputs causing high floating-point errors; portable, MPI-interoperable coarray Fortran; eliminating global interpreter locks in Ruby through hardware transactional memory; leveraging hardware message passing for efficient thread synchronization; well-structured futures and cache locality; time-warp: lightweight abort minimization in transactional memory; beyond parallel programming with domain specific languages; a decomposition for in-place matrix transposition; in-place transposition of rectangular matrices on accelerators; and parallelizing dynamic programming through rank convergence.
ISBN: (print) 9781450332057
The proceedings contain 44 papers. The topics discussed include: predicate RCU: an RCU for scalable concurrent updates; automatic scalable atomicity via semantic locking; a framework for practical parallel fast matrix multiplication; PLUTO+: near-complete modeling of affine transformations for parallelism and locality; distributed memory code generation for mixed irregular/regular computations; performance implications of dynamic memory allocators on transactional memory systems; low-overhead software transactional memory with progress guarantees and strong semantics; barrier elision for production parallel programs; scalable and efficient implementation of 3D unstructured meshes computation: a case study on matrix assembly; and diagnosing the causes and severity of one-sided message contention.
ISBN: (print) 9781595939609
The proceedings contain 42 papers. The topics discussed include: automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories; type inference for locality analysis of distributed data structures; quasi-static scheduling for safe futures; scalable packet classification using interpreting: a cross-platform multi-core solution; FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue; matrix product on heterogeneous master-worker platforms; high performance dense linear algebra on a spatially distributed processor; optimization principles and application performance evaluation of a multithreaded GPU using CUDA; a case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding; programming with tiles; design and implementation of a high-performance MPI for C# and the common language infrastructure; and a portable runtime interface for multi-level memory hierarchies.
ISBN: (print) 9781450301190
The proceedings contain 39 papers. The topics discussed include: ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms; programming the memory hierarchy revisited: supporting irregular parallelism in Sequoia; compact data structure and scalable algorithms for the sparse grid technique; a domain-specific approach to heterogeneous parallelism; Copperhead: compiling an embedded data parallel language; OoOJava: software out-of-order execution; SpiceC: scalable parallelism via implicit copying and explicit commit; inferring ownership transfer for efficient message passing; all-window profiling and composable models of cache sharing; ULCC: a user-level facility for optimizing shared cache performance on multicores; ScalaExtrap: trace-based communication extrapolation for SPMD programs; and GRace: a low-overhead mechanism for detecting data races in GPU programs.
ISBN: (print) 9781450311601
The proceedings contain 57 papers. The topics discussed include: scalable framework for mapping streaming applications onto multi-GPU systems; efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors; extending a C-like language for portable SIMD programming; DOJ: dynamically parallelizing object-oriented programs; GPU-based NFA implementation for memory efficient high speed regular expression matching; concurrent tries with efficient non-blocking snapshots; deterministic parallel random-number generation for dynamic-multithreading platforms; algorithm-based fault tolerance for dense matrix factorizations; revisiting the combining synchronization technique; FlexBFS: a parallelism-aware implementation of breadth-first search on GPU; optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA; and the boat hull model: adapting the roofline model to enable performance prediction for parallel computing.