ISBN: 9781665422871 (print)
Rapidly changing computer architectures, such as those found at high-performance computing (HPC) facilities, present the need for mini-applications (miniapps) that capture essential algorithms used in large applications to test program performance and portability, aiding transitions to new systems. The COVID-19 pandemic has fueled a flurry of activity in computational drug discovery, including the use of supercomputers and GPU acceleration for massive virtual screens for therapeutics. Recent work targeting COVID-19 at the Oak Ridge Leadership Computing Facility (OLCF) used the GPU-accelerated program AutoDock-GPU to screen billions of compounds on the Summit supercomputer. In this paper we present the development of a new miniapp, miniAutoDock-GPU, that can be used to evaluate the performance and portability of GPU-accelerated protein-ligand docking programs on different computer architectures. These tests are especially relevant as facilities transition from petascale systems and prepare for upcoming exascale systems that will use GPUs from a variety of vendors. The key calculations, namely the Lamarckian genetic algorithm combined with a local search using a Solis-Wets-based random optimization algorithm, are implemented. We developed versions of the miniapp using several different programming models for GPU acceleration, including one using the CUDA runtime API for NVIDIA GPUs and one using the Kokkos middleware API, which is facilitated by C++ template libraries. A third version, currently in progress, uses the HIP programming model. These efforts will help facilitate the transition to exascale systems for this important emerging HPC application, as well as its use on a wide range of heterogeneous platforms.
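The abstract above names a Solis-Wets-based random optimization local search. A minimal Python sketch of the generic Solis-Wets scheme is given below for orientation. It is not the miniAutoDock-GPU implementation: the function name `solis_wets`, the parameter defaults, and the `sphere` test objective are illustrative assumptions, and the bias-update constants (0.2, 0.4, 0.5) follow the commonly cited formulation of the method.

```python
import random


def solis_wets(f, x0, rho=1.0, max_iters=300, max_succ=4, max_fail=4,
               rho_min=1e-6, seed=0):
    """Generic Solis-Wets random local search that minimises f (illustrative sketch)."""
    rng = random.Random(seed)      # fixed seed so runs are reproducible
    x = list(x0)
    fx = f(x)
    bias = [0.0] * len(x)          # running bias toward directions that helped
    succ = fail = 0                # consecutive successes / failures
    for _ in range(max_iters):
        if rho < rho_min:          # step size has collapsed: stop searching
            break
        # Draw a random deviation centred on the current bias.
        dev = [rng.gauss(b, rho) for b in bias]
        cand = [xi + di for xi, di in zip(x, dev)]
        fc = f(cand)
        if fc < fx:
            # Forward step improved the objective.
            x, fx = cand, fc
            bias = [0.2 * b + 0.4 * d for b, d in zip(bias, dev)]
            succ, fail = succ + 1, 0
        else:
            # Try the opposite direction before counting a failure.
            cand = [xi - di for xi, di in zip(x, dev)]
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                bias = [b - 0.4 * d for b, d in zip(bias, dev)]
                succ, fail = succ + 1, 0
            else:
                bias = [0.5 * b for b in bias]
                succ, fail = 0, fail + 1
        # Adapt the step size after runs of successes or failures.
        if succ >= max_succ:
            rho, succ = rho * 2.0, 0
        elif fail >= max_fail:
            rho, fail = rho * 0.5, 0
    return x, fx


def sphere(v):
    """Hypothetical test objective: sum of squares, minimised at the origin."""
    return sum(c * c for c in v)


start = [3.0, -2.0]
best, value = solis_wets(sphere, start)
```

In a Lamarckian genetic algorithm, the coordinates refined by such a local search are written back into the individual's genome, so later generations inherit the improvement. In a docking setting the objective would be a scoring function over ligand pose parameters rather than `sphere`.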
ISBN: 9783030491895 (print)
The proceedings contain 21 papers. The special focus in this conference is on Mining Humanistic Data. The topics include: Threat Landscape of Next Generation IoT-Enabled Smart Grids; Towards a Smart Port: The Role of the Telecom Industry; A Graph-Based Extension for the Set-Based Model Implementing Algorithms Based on Important Nodes; A Sentiment-Based Hotel Review Summarization Using Machine Learning Techniques; An Advanced Deep Learning Model for Short-Term Forecasting U.S. Natural Gas Price and Movement; Fake News Detection Regarding the Hong Kong Events from Tweets; Improving Movie Recommendation Systems Filtering by Exploiting User-Based Reviews and Movie Synopses; The Converging Triangle of Cultural Content, Cognitive Science, and Behavioral Economics; Application and Algorithm: Maximal Motif Discovery for Biological Data in a Sliding Window; A New Approach to 5G and MEC Integration; Fingerprints Recognition System-Based on Mobile Device Identification Using Circular String Pattern Matching Techniques; Mining and Analysis of Air Quality Data to Aid Climate Change; Business Aspects of the Neutral Host Model: The Immersive Video Services Case; Combined 5G-Based Video Production and Distribution in a Crowded Stadium Event; Dynamic Network Slicing: Challenges and Opportunities; Dynamic Resource Allocation and Computation Offloading for Edge Computing System; Intelligent Orchestration of End-to-End Network Slices for the Allocation of Mission Critical Services over NFV Architectures; On the Prediction of Future User Connections Based on Historical Records in Wireless Networks; Programmable Edge-to-Cloud Virtualization for 5G Media Industry: The 5G-MEDIA Approach.
GrAPL 2020: Workshop on Graphs, Architectures, Programming, and Learning, brings together two closely related topics - how the synthesis (representation) and analysis of graphs is supported in hardware and software, a...
ISBN: 9781728101866 (print)
The proceedings contain 8 papers. The topics discussed include: a block-oriented, parallel and collective approach to sparse indefinite preconditioning on GPUs; software prefetching for unstructured mesh applications; there are trillions of little forks in the road. choose wisely! - estimating the cost and likelihood of success of constrained walks to optimize a graph pruning pipeline; scale-free graph processing on a NUMA machine; a fast and simple approach to merge and merge sort using wide vector instructions; impact of traditional sparse optimizations on a migratory thread architecture; mix-and-match: a model-driven runtime optimization strategy for BFS on GPUs; and high-performance GPU implementation of PageRank with reduced precision based on mantissa segmentation.
Heterogeneous parallel architectures present many challenges to application developers. One of the most important ones is the decision where to execute a specific task. As today's systems are often dynamic in natu...
ISBN: 9781728101767 (print)
The proceedings contain 11 papers. The topics discussed include: on advanced Monte Carlo methods for linear algebra on advanced accelerator architectures; event-triggered communication in parallel computing; non-collective scalable global network based on local communications; shift-collapse acceleration of generalized polarizable reactive molecular dynamics for machine learning-assisted computational synthesis of layered materials; communication avoiding multigrid preconditioned conjugate gradient method for extreme scale multiphase CFD simulations; dynamic load balancing of plasma and flow simulations; low thread-count Gustavson: a multithreaded algorithm for sparse matrix-matrix multiplication using perfect hashing; and a general-purpose hierarchical mesh partitioning method with node balancing strategies for large-scale numerical simulations.
Summarising distributed data is a central routine for parallel programming, lying at the core of widely used frameworks such as the map/reduce paradigm. In the IoT context it is even more crucial, being a privileged means of allowing long-range interactions: in fact, summarising is needed to avoid data explosion in each computational unit. We introduce a new algorithm for dynamic summarising of distributed data, weighted multi-path, which improves on the state-of-the-art multi-path algorithm. We validate the new algorithm in an archetypal scenario, taking into account many sources of volatility and comparing it to other existing implementations. We thus show that weighted multi-path retains adequate accuracy even in high-variability scenarios where the other algorithms diverge significantly from the correct values.
ISBN: 9781450351362 (print)
The proceedings contain 11 papers. The topics discussed include: overcoming load imbalance for irregular sparse matrices; optimizing Word2Vec performance on multicore systems; parallel depth-first search for directed acyclic graphs; progressive load balancing of asynchronous algorithms; a case for migrating execution for irregular applications; pressure-driven hardware managed thread concurrency for irregular applications; an efficient data layout transformation algorithm for locality-aware parallel sparse FFT; spherical region queries on multicore architectures; evaluation of Knights Landing high bandwidth memory for HPC workloads; enabling work-efficiency for high performance vertex-centric graph analytics on GPUs; and accelerating energy games solvers on modern architectures.
ISBN: 9783319561103 (print)
The proceedings contain 9 papers. The special focus in this conference is on Accelerating Data Analysis and Data Management Systems Using Modern Processor and Storage Architectures. The topics include: Efficient range queries on modern CPUs; vectorized time series algorithms on modern commodity CPUs; compression-aware in-memory query processing; overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization; making in-memory databases fast on modern NICs; an analysis on modern hardware; locality-adaptive parallel hash joins using hardware transactional memory; an embedded in-memory DBMS enabling instant snapshot sharing and runtime fragility in main memory.
ISBN: 9783319527086 (print)
The proceedings contain 24 papers. The special focus in this conference is on Large Scale Parallelism, Resilience, Persistence, Compiler Analysis, Optimization, Dynamic Computation, Languages, Run-time and Performance Analysis. The topics include: An array programming approach; a new theory for memory wall; parallel and compositional analysis of message passing programs; fast approximate distance queries in unweighted graphs using bounded asynchrony; energy avoiding matrix multiply; language support for reliable memory regions; harnessing parallelism in multicore systems to expedite and improve function approximation; adaptive software caching for efficient NVRAM data persistence; an extended polyhedral model for SPMD programs and its use in static data race detection; polygonal iteration space partitioning; automatically optimizing stencil computations on many-core NUMA architectures; formalizing structured control flow graphs; automatic vectorization for MATLAB; analyzing parallel programming models for magnetic resonance imaging; the importance of efficient fine-grain synchronization for many-core systems; optimizing LOBPCG; sparse matrix loop and data transformations in action; an automatic code generator for graph algorithms on GPUs; locality-aware task-parallel execution on GPUs; automatic copying of pointer-based data structures; adaptive parallelism mapping with varying optimization goals; the contention avoiding concurrent priority queue; and evaluating performance of task and data coarsening in concurrent collections.