We present dispel4py, a novel data-intensive and high-performance computing middleware provided as a standard Python library for describing stream-based workflows. It allows its users to develop their scientific applica...
We introduce Opcodes, a Python package which presents x86 and x86-64 instruction sets as a set of high-level objects. Opcodes provides information about instruction names, implicit and explicit operands, and instructi...
The programming language Python is widely used to rapidly create compact software. However, compared to low-level programming languages like C or Fortran, its low performance prevents its use for HPC applications. Eff...
ISBN: 9781450340373 (print)
The proceedings contain 5 papers. The topics discussed include: managing scientific data with named data networking (NDN); a multi-domain SDN for dynamic layer-2 path service; approximate causal consistency for partially replicated geo-replicated cloud storage; design and implementation of control sequence generator for SDN-enhanced MPI; and hysteresis-based optimization of data transfer throughput.
ISBN: 9783319172477 (print)
The proceedings contain 14 papers. The special focus in this conference is on High Performance Computing Systems. The topics include: algebraic multigrid on a dragonfly network; performance evaluation of scientific applications on POWER8; a standard application suite for measuring hardware accelerator performance; a CUDA implementation of the high performance conjugate gradient benchmark; performance analysis of a high-level abstractions-based hydrocode on future computing systems; insight into application performance using application-dependent characteristics; a practical tool for architectural and program analysis; modeling stencil computations on modern HPC architectures; performance modeling of the HPCG benchmark; on the performance prediction of BLAS-based tensor contractions; assessing general-purpose algorithms to cope with fail-stop and silent errors; a case for epidemic fault detection and group membership in HPC storage systems; analysis of the tradeoffs between energy and run time for multilevel checkpointing; and on the energy proportionality of distributed NoSQL data stores.
ISBN: 9783319172484; 9783319172477 (print)
Tensor operations are surging as the computational building blocks for a variety of scientific simulations, and the development of high-performance kernels for such operations is known to be a challenging task. While for operations on one- and two-dimensional tensors there exist standardized interfaces and highly optimized libraries (BLAS), for higher-dimensional tensors neither standards nor highly tuned implementations exist yet. In this paper, we consider contractions between two tensors of arbitrary dimensionality and take on the challenge of generating high-performance implementations by resorting to sequences of BLAS kernels. The approach consists of breaking the contraction down into operations that only involve matrices or vectors. Since in general there are many alternative ways of decomposing a contraction, we are able to methodically derive a large family of algorithms. The main contribution of this paper is a systematic methodology to accurately identify the fastest algorithms in the bunch without executing them. The goal is instead accomplished with the help of a set of cache-aware micro-benchmarks for the underlying BLAS kernels. The predictions we construct from such benchmarks allow us to reliably single out the best-performing algorithms in a tiny fraction of the time taken by the direct execution of the algorithms.
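To make the idea of decomposing a contraction into matrix operations concrete, here is a minimal sketch. It is illustrative only and is not taken from the paper: the contraction C[a][b][c] = sum_k A[a][k] * B[k][b][c], the function names, and the pure-Python `matmul` (a stand-in for a BLAS GEMM call) are all assumptions chosen for the example. It shows two of the alternative decompositions, one slicing B along c and one along b, which produce the same result through different sequences of matrix products.

```python
def matmul(X, Y):
    """Hypothetical stand-in for a BLAS GEMM call: (m x k) times (k x n)."""
    n = len(Y[0])
    return [[sum(x * Y[p][j] for p, x in enumerate(row)) for j in range(n)]
            for row in X]


def contract_by_c_slices(A, B):
    """C[a][b][c] = sum_k A[a][k] * B[k][b][c], one matrix product per c."""
    na, nk = len(A), len(B)
    nb, nc = len(B[0]), len(B[0][0])
    C = [[[0] * nc for _ in range(nb)] for _ in range(na)]
    for c in range(nc):
        # Slice B[:, :, c] is an (nk x nb) matrix.
        B_slice = [[B[k][b][c] for b in range(nb)] for k in range(nk)]
        M = matmul(A, B_slice)  # (na x nb)
        for a in range(na):
            for b in range(nb):
                C[a][b][c] = M[a][b]
    return C


def contract_by_b_slices(A, B):
    """Same contraction, decomposed as one matrix product per b."""
    na, nk = len(A), len(B)
    nb, nc = len(B[0]), len(B[0][0])
    C = [[None] * nb for _ in range(na)]
    for b in range(nb):
        # Slice B[:, b, :] is an (nk x nc) matrix.
        B_slice = [B[k][b] for k in range(nk)]
        M = matmul(A, B_slice)  # (na x nc)
        for a in range(na):
            C[a][b] = M[a]
    return C


A = [[1, 2], [3, 4]]
B = [[[1, 0], [0, 1]], [[2, 1], [1, 2]]]
print(contract_by_c_slices(A, B) == contract_by_b_slices(A, B))
```

Both variants compute the same tensor but access memory differently, which is the kind of difference a performance model would have to distinguish when ranking algorithms without running them.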
Current work on parallel programming models is trending towards the dataflow paradigm. Previous works on that topic have shown that dataflow programming is indeed a natural way to exploit parallelism in programs. How...