ISBN (print): 9781665411240
The proceedings contain 9 papers. The topics discussed include: semantic-aware lossless data compression for deep learning recommendation model (DLRM); Colmena: scalable machine-learning-based steering of ensemble simulations for high-performance computing; production deployment of machine-learned rotorcraft surrogate models on HPC; high-performance deep learning toolbox for genome-scale prediction of protein structure and function; HPCFAIR: enabling FAIR AI for HPC applications; HPC ontology: towards a unified ontology for managing training datasets and AI models for high-performance computing; HYPPO: a surrogate-based multi-level parallelism tool for hyperparameter optimization; and is disaggregation possible for HPC cognitive simulation?
ISBN (print): 9780738110783
The proceedings contain 11 papers. The topics discussed include: accelerate distributed stochastic descent for nonconvex optimization with momentum; accelerating GPU-based machine learning in Python using MPI library: a case study with MVAPICH2-GDR; deep learning-based low-dose tomography reconstruction with hybrid-dose measurements; EventGraD: event-triggered communication in parallel stochastic gradient descent; a Benders decomposition approach to correlation clustering; high-bypass learning: automated detection of tumor cells that significantly impact drug response; deep generative models that solve PDEs: distributed computing for training large data-free models; automatic particle trajectory classification in plasma simulations; reinforcement learning-based solution to power grid planning and operation under uncertainties; and predictions of steady and unsteady flows using machine-learned surrogate models.
ISBN (print): 9781728159850
The proceedings contain 8 papers. The topics discussed include: scalable hyperparameter optimization with lazy Gaussian processes; understanding scalability and fine-grain parallelism of synchronous data parallel training; DisCo: physics-based unsupervised discovery of coherent structures in spatiotemporal systems; GradVis: visualization and second order analysis of optimization surfaces during the training of deep neural networks; metaoptimization on a distributed system for deep reinforcement learning; scheduling optimization of parallel linear algebra algorithms using supervised learning; parallel data-local training for optimizing Word2Vec embeddings for word and graph embeddings; and fine-grained exploitation of mixed precision for faster CNN training.
The synergy of edge computing and machine learning (ML) holds immense potential for revolutionizing Internet of Things (IoT) applications, particularly in scenarios characterized by high-speed, continuous data generation. Offline ML algorithms struggle with streaming data because they rely on static datasets for model construction. In contrast, online machine learning (OML) adapts to changing environments by training the model on each new observation in real time. However, developing OML algorithms introduces complexities such as bias and variance considerations, making the selection of suitable estimators challenging. In this landscape, ensemble learning emerges as a promising approach, offering a strategic framework to navigate the bias-variance tradeoff and enhance prediction accuracy by combining the outputs of diverse ML models. This paper introduces a novel ensemble method tailored for edge computing environments, designed to operate efficiently on resource-constrained devices while accommodating various online learning scenarios. The primary objective is to enhance predictive accuracy at the edge, thereby empowering IoT applications with robust decision-making capabilities. Our study addresses the critical challenges of ML in resource-constrained edge computing environments, offering practical insights for enhancing predictive accuracy and scalability in IoT applications. To validate the ensemble's efficacy, we conducted comprehensive experimental evaluations on both synthetic and real-world datasets. The results indicate that our ensemble surpassed state-of-the-art data stream algorithms and ensemble regressors across a range of regression metrics, underlining its superior predictive performance. Furthermore, we examined the ensemble's performance in auto-scaling for Virtual Network Function (VNF)-based applications at the network edge, demonstrating its applicability and scalability in real-world scenarios.
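To make the online-ensemble idea concrete, the sketch below is a minimal illustration (not the paper's implementation): a few hand-rolled online regressors are combined, and each member is weighted by the inverse of an exponentially decayed error estimate. The class names (ErrorWeightedEnsemble, OnlineSGDRegressor) and the weighting rule are assumptions made for illustration only.

```python
"""Illustrative sketch of an error-weighted online ensemble for streams."""
import numpy as np

class OnlineMean:
    """Trivial online learner: predicts the running mean of the target."""
    def __init__(self):
        self.n, self.mean = 0, 0.0
    def predict_one(self, x):
        return self.mean
    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n

class OnlineSGDRegressor:
    """Linear model updated with per-sample stochastic gradient descent."""
    def __init__(self, n_features, lr=0.01):
        self.w, self.b, self.lr = np.zeros(n_features), 0.0, lr
    def predict_one(self, x):
        return float(self.w @ x + self.b)
    def learn_one(self, x, y):
        err = self.predict_one(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err

class ErrorWeightedEnsemble:
    """Combine member predictions, weighting each member by the inverse of
    its exponentially decayed absolute error (decay controls adaptivity)."""
    def __init__(self, members, decay=0.95):
        self.members, self.decay = members, decay
        self.errors = np.ones(len(members))  # optimistic start
    def predict_one(self, x):
        weights = 1.0 / (self.errors + 1e-8)
        weights /= weights.sum()
        preds = np.array([m.predict_one(x) for m in self.members])
        return float(weights @ preds)
    def learn_one(self, x, y):
        for i, m in enumerate(self.members):
            err = abs(m.predict_one(x) - y)
            self.errors[i] = self.decay * self.errors[i] + (1 - self.decay) * err
            m.learn_one(x, y)

# Prequential (predict-then-learn) evaluation on a synthetic stream.
rng = np.random.default_rng(0)
ens = ErrorWeightedEnsemble([OnlineMean(), OnlineSGDRegressor(2)])
abs_err = 0.0
for _ in range(5000):
    x = rng.normal(size=2)
    y = 3 * x[0] - 2 * x[1] + rng.normal(scale=0.1)
    abs_err += abs(ens.predict_one(x) - y)  # predict before learning
    ens.learn_one(x, y)
print(f"mean absolute error over the stream: {abs_err / 5000:.4f}")
```

The decayed-error weighting is one simple way to balance stable (high-bias) and adaptive (high-variance) members on a non-stationary stream; the paper's actual combination rule may differ.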
ISBN (print): 9781665411240
Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performancecomputing (HPC). In recent years, the field of machinelearning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to envelop, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
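The abstract does not include code; the hedged sketch below only illustrates the generic fan-out/collate pattern that a genome-scale annotation pipeline relies on. The function annotate_protein is a hypothetical stand-in for the SAdLSA- and AlphaFold2-based stages, and the local process pool stands in for an HPC batch-scheduler backend.

```python
"""Sketch of dispatching per-protein annotation tasks and collating results."""
from concurrent.futures import ProcessPoolExecutor, as_completed

def annotate_protein(record):
    """Placeholder: run structure prediction + functional annotation."""
    seq_id, sequence = record
    return seq_id, {"length": len(sequence)}  # stand-in for real annotations

def annotate_genome(records, max_workers=8):
    """Fan protein-level tasks out across workers and collate the results.
    On an HPC system the executor would typically be a scheduler-aware
    backend (e.g., one task per GPU node) rather than local processes."""
    results = {}
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(annotate_protein, r) for r in records]
        for fut in as_completed(futures):
            seq_id, annotation = fut.result()
            results[seq_id] = annotation
    return results

if __name__ == "__main__":
    toy_genome = [("protA", "MKTAYIAK"), ("protB", "MVLSPADKT")]
    print(annotate_genome(toy_genome, max_workers=2))
```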
ISBN (print): 9781665411240
Cognitive simulation (CogSim) is an important and emerging workflow for HPC scientific exploration and scientific machine learning (SciML). One challenging workload for CogSim is the replacement of one component of a complex physical simulation with a fast, learned surrogate model that sits "inside" the computational loop. Executing this in-the-loop inference is particularly challenging because it requires frequent inference across multiple possible target models, can be on the simulation's critical path (latency bound), is subject to requests from multiple MPI ranks, and typically involves a small number of samples per request. In this paper, we explore the use of large, dedicated deep learning / AI accelerators that are disaggregated from compute nodes for this CogSim workload. We compare the trade-offs of using these accelerators versus the node-local GPU accelerators on leadership-class HPC systems.
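To make the in-the-loop inference pattern concrete, here is a hedged sketch (not the paper's code) in which each MPI rank sends a small batch of samples to a remote surrogate service at every simulation step. The HTTP endpoint, payload format, model name, and helper functions are assumptions for illustration; the real system may use a very different transport to reach the disaggregated accelerator.

```python
"""Sketch of latency-sensitive, small-batch in-the-loop surrogate inference."""
import json
import urllib.request
from mpi4py import MPI

SURROGATE_URL = "http://inference-host:8080/predict"  # hypothetical service

def query_surrogate(samples, model_name):
    """Send one small-batch inference request and return the outputs."""
    payload = json.dumps({"model": model_name, "inputs": samples}).encode()
    req = urllib.request.Request(SURROGATE_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=1.0) as resp:
        return json.loads(resp.read())["outputs"]

def simulation_step(rank, step):
    """Stand-in for the physics update that produces a few samples per rank."""
    return [[rank + 0.01 * step, 0.1 * step]]  # tiny batch, as in CogSim

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
for step in range(100):
    samples = simulation_step(rank, step)
    outputs = query_surrogate(samples, model_name="eos_surrogate")
    # `outputs` would feed the next physics update, so this call sits on the
    # simulation's critical path and its latency directly slows every step.
    comm.Barrier()  # the simulation proceeds in lock-step across ranks
```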
ISBN (print): 9781665411240
As the architectures and capabilities of deep neural networks evolve, they become more demanding to train and use. The Deep Learning Recommendation Model (DLRM), a new neural network for recommendation systems, introduces challenging requirements for deep neural network training and inference. The DLRM model is typically large and cannot fit in a single GPU's memory. Unlike other deep neural networks, DLRM requires both model parallelism and data parallelism, for the bottom and top parts of the model respectively, when running on multiple GPUs. Because of this hybrid-parallel model, all-to-all communication is used to stitch the top and bottom parts together. We have observed that this all-to-all communication is costly and is a bottleneck in DLRM training and inference. In this paper, we propose a novel approach that reduces the communication volume by using DLRM's properties to compress the transferred data without information loss. We demonstrate the benefits of our method by training DLRM MLPerf on eight AMD Instinct MI100 accelerators. The experimental results show 59% and 38% improvements in the time-to-solution of DLRM MLPerf training for FP32 and mixed precision, respectively.
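One simple lossless scheme in the same spirit, shown purely for illustration and not necessarily the authors' exact method, is to deduplicate repeated rows before the all-to-all exchange and rebuild them exactly on the receiving side. The function names and toy sizes below are assumptions.

```python
"""Illustrative lossless row deduplication for an all-to-all payload."""
import numpy as np

def compress_rows(payload):
    """Return (unique_rows, inverse_index) such that
    unique_rows[inverse_index] == payload. No information is lost."""
    unique_rows, inverse = np.unique(payload, axis=0, return_inverse=True)
    return unique_rows, inverse.astype(np.int32)

def decompress_rows(unique_rows, inverse):
    """Reconstruct the original payload exactly."""
    return unique_rows[inverse]

# Example: a batch of embedding rows with heavy repetition (common when the
# same categorical value appears many times in one mini-batch).
rng = np.random.default_rng(0)
table = rng.normal(size=(16, 8)).astype(np.float32)   # tiny embedding table
lookups = rng.integers(0, 16, size=1024)              # repeated indices
payload = table[lookups]                              # rows to be exchanged

unique_rows, inverse = compress_rows(payload)
restored = decompress_rows(unique_rows, inverse)
assert np.array_equal(restored, payload)              # bit-exact, lossless
print(f"sent {unique_rows.nbytes + inverse.nbytes} B "
      f"instead of {payload.nbytes} B")
```

The compression ratio of such a scheme depends entirely on how much semantic redundancy the batch contains, which is why it exploits DLRM-specific properties rather than generic byte-level compression.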
ISBN (print): 9781665411240
Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when. Colmena handles task dispatch, results collation, ML model invocation, and ML model (re)training, using Parsl to execute tasks on HPC systems. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65,536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.
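The sketch below illustrates the steering pattern this describes, deliberately using a plain process pool rather than Colmena's own API so as not to misrepresent it; run_simulation, train_surrogate, and pick_next_candidates are hypothetical stand-ins for the user-supplied task implementations and steering logic.

```python
"""Generic simulate / retrain / steer loop, sketched with a process pool."""
from concurrent.futures import ProcessPoolExecutor, as_completed
import random

def run_simulation(candidate):
    """Placeholder for an expensive simulation task."""
    return candidate, candidate ** 2 + random.gauss(0, 0.1)

def train_surrogate(history):
    """Placeholder: fit a proxy model on (candidate, result) pairs."""
    return lambda c: c ** 2  # pretend the surrogate learned the trend

def pick_next_candidates(surrogate, pool_size, batch):
    """Placeholder steering policy: rank a random pool with the surrogate."""
    pool = [random.uniform(0, 10) for _ in range(pool_size)]
    return sorted(pool, key=surrogate)[:batch]

if __name__ == "__main__":
    history = []
    candidates = [random.uniform(0, 10) for _ in range(4)]  # initial batch
    with ProcessPoolExecutor(max_workers=4) as executor:
        for _ in range(5):  # steering rounds
            futures = [executor.submit(run_simulation, c) for c in candidates]
            history += [f.result() for f in as_completed(futures)]
            surrogate = train_surrogate(history)          # (re)train proxy
            candidates = pick_next_candidates(surrogate, 100, 4)
    print(f"best result: {min(r for _, r in history):.3f}")
```

In a framework such as Colmena, the executor, result collation, and model (re)training are handled by the framework on HPC resources (via Parsl), and the user supplies only the task bodies and the steering policy.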