ISBN: (print) 9781450307710
The growing need for power-efficient extreme-scale high-performance computing (HPC), coupled with plateauing clock speeds, is driving the emergence of massively parallel compute architectures. Tens to many hundreds of cores are increasingly made available as compute units, either as an integral part of the main processor or as coprocessors designed for handling massively parallel workloads. In the case of many-core graphics processing units (GPUs), hundreds of SIMD cores primarily designed for image and video rendering are used for high-performance scientific computations. The new architectures typically offer ANSI standard programming models such as CUDA (NVIDIA) and OpenCL. However, wide-ranging adoption of these parallel architectures is hampered by a steep learning curve and requires reengineering of existing applications, which mostly leads to expensive and error-prone code rewrites without prior guarantee or knowledge of any speedups. A broad range of complex scientific applications across many domains use common algorithms and techniques, such as adaptive mesh refinement (AMR), advanced hydrodynamics partial differential equation (PDE) solvers, Poisson-gravity solvers, etc., that have demonstrably performed highly efficiently on GPU-based systems. Taking advantage of these commonalities, we use the GPU-aware AMR code GAMER [1] to examine the unique approach of solving multi-science problems in astrophysics, hydrodynamics and particle physics with a single codebase. We demonstrate significant speedups in disparate classes of scientific applications on three separate clusters, viz., Dirac, Laohu and Mole 8.5. By extensively reusing the extendable single codebase, we mitigate the impediments of significant code rewrites. We also collect performance and energy consumption benchmark metrics on a 50-node configuration of NVIDIA C2050 GPUs and Intel 8-core Nehalem CPUs on the Dirac cluster at the National Energy Research Scientific Computing Center (NERSC). In addition, we propose a strategy and framework for
We present the results of an architectural comparison of SIMD massive parallelism, as implemented in the Thinking Machines Corp. CM-2, and vector or concurrent-vector processing, as implemented in the Cray Research In...
ISBN: (print) 0818607211
A pyramid machine is a massively parallel computer architecture configured to allow certain common image processing operations to be executed at a cost which is logarithmic in the size of the image. In the absence of a working prototype, a flexible simulator is required to study complex algorithms for such a machine effectively. Developing a simulator involves deciding whether to implement basic operations directly or to simulate macros involving multiple operations, and if so, verifying that the simulation correctly implements the macro. The authors briefly describe one implementation of such a simulator and discuss these issues and how they were addressed.
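The logarithmic cost mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' simulator: the function name `pyramid_sum` and the choice of a 2x2 sum as the reduction are assumptions made for illustration. Each pass merges 2x2 blocks into one cell of the next pyramid level, so an n x n image (n a power of two) is reduced in log2(n) passes, and on a pyramid machine every cell of a level would be computed at the same time.

```python
def pyramid_sum(image):
    """Sum a square image whose side is a power of two, one pyramid level per pass.

    Hypothetical helper for illustration only; returns (total, number_of_levels).
    """
    level = [row[:] for row in image]  # copy so the caller's image is untouched
    steps = 0
    while len(level) > 1:
        half = len(level) // 2
        # Every cell of the next level depends only on its own 2x2 block,
        # so a pyramid machine could compute the whole level in one parallel step.
        level = [
            [
                level[2 * r][2 * c] + level[2 * r][2 * c + 1]
                + level[2 * r + 1][2 * c] + level[2 * r + 1][2 * c + 1]
                for c in range(half)
            ]
            for r in range(half)
        ]
        steps += 1
    return level[0][0], steps


if __name__ == "__main__":
    image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]  # values 1..16
    print(pyramid_sum(image))
```

A sequential simulator still visits every pixel, so the logarithmic bound refers to the number of parallel steps, not to the total work.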
ISBN: (print) 9783540572978
The proceedings contain 26 papers. The special focus in this conference is on Computer Performance Modeling, Measurement and Evaluation. The topics include: parallel simulation; properties and analysis of queueing network models with finite capacities; performance analysis and optimization with the power-series algorithm; multiprocessor and distributed system design; response time distributions in queueing network models; fast simulation of rare events in queueing and reliability models; an introduction to modeling dynamic behavior with time series analysis; issues in trace-driven simulation; maximum entropy analysis of queueing network models; performance modeling using DSPNexpress; relaxation for massively parallel discrete event simulation; an overview of TES processes and modeling methodology; performance engineering of client-server systems; queueing networks with finite capacities; performance instrumentation techniques for parallel systems; a survey of bottleneck analysis in closed networks of queues; software performance engineering; performance measurement using system monitors; providing quality of service in packet switched networks; dependability and performability analysis; architectures and algorithms for digital multimedia on-demand servers; analysis and control of polling systems; modeling and analysis of transaction processing systems.
ISBN: (print) 9783030964986; 9783030964979
Parallel processing is increasingly important in scientific software, not only for supercomputer simulations, but also for edge computing applications and intelligently processing experimental data in real time. Due to this expansion in the diversity of hardware being used for next-generation and fused experimental/HPC facilities, productively writing code that performs well across these diverse environments is an increasing concern. To meet this challenge, NVIDIA is working in the Standard C++ Committee to develop a roadmap for C++ Standard Parallelism, a parallel programming model that is portable to all platforms, from massively parallel HPC to many-core embedded systems, while preserving the defining goals of C++: performance and efficiency for most use cases. Our vision of C++ Standard Parallelism consists of three key components: (1) common parallel algorithms that dispatch to vendor-optimized parallel libraries; (2) tools to write your own parallel algorithms that run anywhere; (3) mechanisms for composing parallel invocations of algorithms into task graphs. In this paper, we'll dive into this roadmap and how it fits into NVIDIA's broader strategy that also includes parallelism in ISO Standard Fortran and highly optimized library APIs to enable full-platform scientific productivity. We'll discuss what we already have that you can use today across a wide variety of deployed systems, what's coming down the line, and where the future may lead us.
ISBN: (print) 9783540512851
The proceedings contain 56 papers. The special focus in this conference is on Parallel Architectures and Languages Europe. The topics include: performance analysis of a parallel Prolog: a correlated approach; visual concurrent object-based programming in GARP; Parle: a parallel target language for integrating symbolic and numeric processing; a method for refining atomicity in parallel algorithms; comparing two fully abstract dataflow models; learning by back-propagation: computing in a systolic way; towards systolizing compilation: an overview; strategies for a massively parallel implementation of simulated annealing; the compaction of acyclic terms; multiple tuple spaces in Linda; a single-assignment language in a distributed memory multiprocessor; single-assignment semantics for imperative programs; a compiling approach for exploiting and-parallelism in parallel logic programming systems; data structures for parallel execution of functional languages; the typed λ-calculus with first-class processes; ASPEN: a stream processing environment; the expressive power of simple parallelism; compositionality in the temporal logic of concurrent systems; a temporal-logic based compositional proof system for real-time message passing; experiments in MIMD parallelism; GTS: extracting full parallelism out of DO loops; dataflow analysis of term graph rewriting systems; towards a theory of simulation for verification of concurrent systems; eliminating redundant interleavings during concurrent program verification; dataflow programs for parallel computations of logic programs and their semantics; RAPiD: a data flow model for implementing parallelism and intelligent backtracking in logic programs.
ISBN: (print) 9781450392662
ProbZelus is a synchronous probabilistic language for the design of reactive probabilistic models in interaction with an environment. Reactive inference methods continuously learn distributions over the unobserved parameters of the model from statistical observations. Unfortunately, this inference problem is in general intractable. Monte Carlo inference techniques thus rely on many independent executions to compute accurate approximations. These methods are expensive but can be parallelized. We propose to use JAX to parallelize the ProbZelus reactive inference engine. JAX is a recent library to compile Python code which can then be executed on massively parallel architectures such as GPUs or TPUs. In this paper, we describe a new reactive inference engine implemented in JAX and the new associated JAX backend for ProbZelus. We show on existing benchmarks that our new parallel implementation outperforms the original sequential implementation for a high number of particles.
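Why Monte Carlo inference parallelizes well can be seen in a small importance-sampling sketch. This is plain Python using only the standard library, not the ProbZelus engine or its JAX backend; the model (a standard normal prior on `theta` and one observation `y` with unit Gaussian noise) and the names `particle_weight` and `posterior_mean` are assumptions chosen for illustration. Each particle is drawn and weighted independently of the others, and that independence is the property a vectorizing backend would exploit.

```python
import math
import random


def particle_weight(theta, y):
    """Unnormalized likelihood of observation y given theta (unit Gaussian noise)."""
    return math.exp(-0.5 * (y - theta) ** 2)


def posterior_mean(y, num_particles, seed=0):
    """Estimate E[theta | y] by importance sampling from the N(0, 1) prior."""
    rng = random.Random(seed)
    # Each particle is an independent execution of the prior model.
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    # Weighting touches one particle at a time, so it maps directly onto
    # parallel hardware; only the final sums combine results across particles.
    weights = [particle_weight(theta, y) for theta in particles]
    total = sum(weights)
    return sum(w * theta for w, theta in zip(weights, particles)) / total


if __name__ == "__main__":
    # For this conjugate model the exact posterior mean is y / 2.
    print(posterior_mean(2.0, 20000))
```

The estimate approaches the exact value as the number of particles grows, which is the regime where the abstract reports the parallel implementation paying off.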
We propose to model integrated reflective and reactive reasoning by massively parallel nonmonotonic model generation. To this end, a finite representation of models of normal logic programs is given, which is adapted ...
Blue Gene is a massively parallel system being developed at the IBM T. J. Watson Research Center. With its 4 million-way parallelism and 1 Petaflop peak performance, Blue Gene is a unique environment for research in p...
ISBN: (print) 9783642330773
The proceedings contain 73 papers. The topics discussed include: accelerating the dynamic programming for the optimal polygon triangulation on the GPU; security computing for the resiliency of protecting from internal attacks in distributed wireless sensor networks; optimization of a short-range proximity effect correction algorithm in e-beam lithography using GPGPUs; vectorized algorithms for Quadtree construction and descent; an optimal parallel prefix-sums algorithm on the memory machine models for GPUs; enhancing the performance of a distributed mobile computing environment by topology construction; maintaining consistency in software transactional memory through dynamic versioning tuning; a new low latency parallel turbo decoder employing parallel phase decoding method; high-performance matrix multiply on a massively multithreaded Fiteng1000 processor; and on construction of Cloud IaaS for VM live migration using KVM and OpenNebula.