ISBN (print): 9781665469586
The proceedings contain 42 papers. The topics discussed include: an efficient compilation of coarse-grained reconfigurable architectures utilizing pre-optimized sub-graph mappings; evaluating micro-batch and data frequency for stream processing applications on multi-cores; a parallel approximation algorithm for the Steiner forest problem; exploiting vector extensions to accelerate time series analysis; a neural network to estimate isolated performance from multi-program execution; a heuristic for constructing minimum average stretch spanning tree using betweenness centrality; accelerating distributed deep reinforcement learning by in-network experience sampling; parallel integer multiplication; advancing database system operators with near-data processing; and clustering datasets in cloud computing environment for user identification.
ISBN (print): 9781665414555
The proceedings contain 43 papers. The topics discussed include: building representative and balanced datasets of OpenMP parallel regions; oneAPI for GPUs and FPGAs: portability, yes!, performance portability, not quite; attack surface assessment for cybersecurity engineering in the automotive domain; a compact encoding of security logs for high performance activity detection; a federated content distribution system to build health data synchronization services; nonblocking data structures for distributed-memory machines: stacks as an example; bucket MapReduce: relieving the disk I/O intensity of data-intensive applications in MapReduce frameworks; an efficient practical non-blocking PageRank algorithm for large scale graphs; parallel asynchronous stochastic dual coordinate descent algorithms for high efficiency and stable convergence; and a synchronized and dynamic distributed graph structure to allow the native distribution of multi-agent system simulations.
ISBN (print): 9781728165820
The proceedings contain 67 papers. The topics discussed include: windsurfing with APPA: automating computational fluid dynamics simulations of wind flow using cloud computing; parallel comparison of huge DNA sequences in multiple GPUs with block pruning; accelerating deep learning using multiple GPUs and FPGA-based 10GbE switch; adaptive load balancing based on machine learning for iterative parallel applications; switching at flit level: a congestion efficient flow control strategy for network-on-chip; robustness and energy-elasticity of crown schedules for sets of parallelizable tasks on many-core systems with DVFS; and scalable parallel genetic algorithm for solving large integer linear programming models derived from behavioral synthesis.
ISBN (print): 9781728116440
The proceedings contain 62 papers. The topics discussed include: evaluating built-in ECC of FPGA on-chip memories for the mitigation of undervolting faults; parallel computing in deep learning: bioinformatics case studies; Bloom filter cascade application to SQL query implementation on Spark; pragma-oriented parallelization of the Direct Sparse Odometry SLAM algorithm; analyzing the impact of operating system activity of different Linux distributions in a distributed environment; optimizing the Ceph distributed file system for high performance computing; and tuning genetic algorithms for resource provisioning and scheduling in uncertain cloud environments: challenges and findings.
ISBN (print): 9781538649756
The proceedings contain 114 papers. The topics discussed include: a generic learning multi-agent-system approach for spatio-temporal-, thermal- and energy-aware scheduling; a unified programming model for time- and data-driven embedded applications; an improved one-to-all broadcasting in higher dimensional Eisenstein-Jacobi networks; collective I/O performance on the Santos Dumont supercomputer; context-aware optimization for energy-efficient and QoS wireless body area networks with human dynamics; developing and using a geometric multigrid, unstructured grid mini-application to assess many-core architectures; and divisible load scheduling of image processing applications on the heterogeneous star network using a new genetic algorithm.
ISBN (print): 9781665469586
A computing cluster that interconnects multiple compute nodes is used to accelerate distributed reinforcement learning based on DQN (Deep Q-network). In distributed reinforcement learning, Actor nodes acquire experiences by interacting with a given environment, and a Learner node optimizes their DQN model. Since the volume of data transferred between Actor and Learner nodes grows with the number of Actor nodes and the size of their experiences, the communication overhead between them is one of the major performance bottlenecks. In this paper, their communication performance is optimized using DPDK (Data Plane Development Kit). Specifically, a DPDK-based low-latency experience replay memory server is deployed between Actor and Learner nodes, which are interconnected with a 40GbE (40 Gbit Ethernet) network. Evaluation results show that, as one network optimization technique, kernel bypassing by DPDK reduces network access latencies to a shared memory server by 32.7% to 58.9%. As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1%, and communication latencies for prioritized experience sampling by 21.9% to 29.1%.
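The abstract does not specify which prioritization scheme the replay memory server uses. As a hedged illustration of what "prioritized experience sampling" typically involves, the sketch below implements the proportional variant of prioritized experience replay, where experience i is drawn with probability proportional to p_i^alpha, on top of a sum tree. It runs in a single Python process and models neither DPDK nor the network. The class names, the `alpha`/`eps` defaults, and the push/sample/update split between Actor and Learner are assumptions for illustration only.

```python
import random


class SumTree:
    """Binary tree whose leaves store priorities and whose internal nodes store
    the sum of their children, so proportional sampling takes O(log n) steps.
    A power-of-two capacity keeps the tree balanced; other sizes still sample
    proportionally."""

    def __init__(self, capacity):
        self.capacity = capacity
        # 1-based heap layout: node i has children 2i and 2i+1;
        # leaves occupy indices [capacity, 2 * capacity).
        self.tree = [0.0] * (2 * capacity)
        self.data = [None] * capacity
        self.next_slot = 0  # ring-buffer write position

    def total(self):
        return self.tree[1]

    def add(self, priority, item):
        """Store item in the next slot, overwriting the oldest entry when full."""
        slot = self.next_slot
        self.data[slot] = item
        self.update(slot, priority)
        self.next_slot = (slot + 1) % self.capacity

    def update(self, slot, priority):
        """Set a leaf's priority and refresh the sums on the path to the root."""
        pos = slot + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def find(self, value):
        """Return the slot whose cumulative-priority interval contains value."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            # Go right only if value lies past the left subtree's mass and the
            # right subtree is non-empty (guards against float round-off).
            if value < self.tree[left] or self.tree[left + 1] == 0.0:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        return pos - self.capacity


class PrioritizedReplayMemory:
    """Hypothetical replay memory: Actors push experiences, and the Learner
    samples a batch proportionally to priority and feeds back new TD errors."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.tree = SumTree(capacity)
        self.alpha = alpha  # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps      # keeps every priority strictly positive
        self.max_priority = 1.0

    def push(self, experience):
        """Actor side: new experiences get the largest priority seen so far."""
        self.tree.add(self.max_priority ** self.alpha, experience)

    def sample(self, batch_size, rng=random):
        """Learner side: stratified sampling, one draw per equal-mass segment."""
        total = self.tree.total()
        if total <= 0.0:
            raise ValueError("replay memory is empty")
        segment = total / batch_size
        slots = [self.tree.find(rng.uniform(k * segment, (k + 1) * segment))
                 for k in range(batch_size)]
        return slots, [self.tree.data[s] for s in slots]

    def update_priorities(self, slots, td_errors):
        """Learner side: priority = (|TD error| + eps) ** alpha."""
        for slot, err in zip(slots, td_errors):
            p = abs(err) + self.eps
            self.max_priority = max(self.max_priority, p)
            self.tree.update(slot, p ** self.alpha)


if __name__ == "__main__":
    memory = PrioritizedReplayMemory(capacity=4)
    for step in range(6):  # the last two pushes overwrite slots 0 and 1
        memory.push(("state", step))
    slots, batch = memory.sample(batch_size=2, rng=random.Random(0))
    memory.update_priorities(slots, td_errors=[0.5, 2.0])
```

The sum tree is one common design choice: both sampling and priority updates cost O(log n), which matters when many Actors push and the Learner resamples continuously. In a distributed setting like the one described, `push` would run on behalf of Actor nodes and `sample`/`update_priorities` on behalf of the Learner, with the memory itself hosted on the server between them.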
ISBN (print): 9781665469586
MPI is the de facto standard communication library for parallel applications on distributed-memory architectures. The performance of collective operations is critical in HPC applications, as collectives can become the bottleneck of their execution. The advent of larger nodes in multicore clusters has motivated the exploration of hierarchical collective algorithms that are aware of process placement in the cluster and of the memory hierarchy. This work analyzes and compares several hierarchical collective algorithms from the literature that are not part of the current MPI standard. We implement the algorithms on top of Open MPI, using the shared-memory facility provided by MPI-3 at the intra-node level, and evaluate them on ARM-based multicore clusters. Our results reveal aspects of the algorithms that affect their performance and applicability. Finally, we propose a model to analyze the scalability of the algorithms.
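The abstract does not name the hierarchical algorithms it evaluates. As a hedged illustration of the general idea, the sketch below simulates in plain Python, without MPI, a two-level allreduce. Each node first reduces locally to a leader rank. Only the leaders then exchange data over the network. Finally, each leader shares the result with the ranks on its node. The function names are hypothetical. Both variants count inter-node messages for a simple linear (root-based) reduce followed by a broadcast. Production MPI libraries use tree, ring, or recursive-doubling schemes, so the counts only illustrate the trend. On a real cluster, the intra-node phases would typically use a communicator obtained with `MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ...)` and a buffer from `MPI_Win_allocate_shared`.

```python
from collections import defaultdict


def group_by_node(node_of):
    """Map each node id to the ranks placed on it (node_of[rank] -> node id)."""
    ranks_on = defaultdict(list)
    for rank, node in enumerate(node_of):
        ranks_on[node].append(rank)
    return dict(ranks_on)


def flat_allreduce(values, node_of, root=0):
    """Placement-unaware baseline: linear reduce to `root`, then linear broadcast.
    Returns (value every rank ends up with, number of inter-node messages)."""
    total = sum(values)
    # Each rank on a different node than the root sends one message in
    # (reduce) and receives one message out (broadcast).
    remote = sum(1 for node in node_of if node != node_of[root])
    return [total] * len(values), 2 * remote


def hierarchical_allreduce(values, node_of):
    """Two-level, placement-aware allreduce (simulated, no real communication).
    Returns (value every rank ends up with, number of inter-node messages)."""
    ranks_on = group_by_node(node_of)
    # Phase 1: reduce within each node to its lowest rank (the node leader).
    # With MPI-3 shared memory this step generates no network traffic.
    partial = {min(ranks): sum(values[r] for r in ranks)
               for ranks in ranks_on.values()}
    # Phase 2: only the leaders talk over the network (linear reduce + broadcast).
    total = sum(partial.values())
    inter_node_msgs = 2 * (len(partial) - 1)
    # Phase 3: each leader makes the total visible to the ranks on its node.
    result = [None] * len(values)
    for ranks in ranks_on.values():
        for rank in ranks:
            result[rank] = total
    return result, inter_node_msgs


if __name__ == "__main__":
    node_of = [rank // 8 for rank in range(32)]  # 4 nodes x 8 ranks, block placement
    values = list(range(32))
    print(flat_allreduce(values, node_of)[1])          # 48 inter-node messages
    print(hierarchical_allreduce(values, node_of)[1])  # 6 inter-node messages
```

Under these simplifying assumptions, the network traffic of the hierarchical version depends on the number of nodes rather than the number of ranks. This is why larger multicore nodes make placement-aware collectives attractive.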