ISBN: (print) 9781479956166
Programming high-performance computing systems has become more complex over time. Several layers of parallelism must be exploited to use the available resources efficiently. To support application developers and performance analysts, we propose a technique for identifying the most performance-critical optimization targets in distributed heterogeneous applications. We have developed CASITA, a tool that uses an execution trace and knowledge of the programming models MPI, OpenMP and CUDA, as well as their hierarchy, to build a distributed event dependency graph. After locating wait states in this graph, we detect their root causes and compute the critical path, an important property for performance optimization. Compared to existing analysis approaches, we incorporate the hierarchy of multiple programming models and derive a metric from both the time an activity spends on the critical path and the waiting time it causes. For visualization, CASITA enriches the input trace with additional counter information so that results can be inspected in the Vampir trace viewer.
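The critical path mentioned above is, in its simplest form, the longest path through a dependency graph of timed activities. The sketch below illustrates only that idea on a small single-process graph. It is not CASITA's algorithm, which works on distributed traces and also handles wait states. The function name `critical_path` and the activity names in the usage note are hypothetical.

```python
from collections import defaultdict, deque


def critical_path(durations, edges):
    """Return (length, path) of the longest path through a DAG of activities.

    durations: dict mapping activity name -> duration (hypothetical input format)
    edges: list of (before, after) dependency pairs
    """
    successors = defaultdict(list)
    indegree = {node: 0 for node in durations}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    # Kahn's algorithm: produce a topological order of the activities.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(durations):
        raise ValueError("dependency graph contains a cycle")

    # Longest finish time per activity, relaxed in topological order.
    finish = dict(durations)
    best_pred = {node: None for node in durations}
    for node in order:
        for nxt in successors[node]:
            candidate = finish[node] + durations[nxt]
            if candidate > finish[nxt]:
                finish[nxt] = candidate
                best_pred[nxt] = node

    # Walk back from the activity that finishes last.
    end = max(finish, key=finish.get)
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = best_pred[node]
    path.reverse()
    return finish[end], path
```

For example, with activities `mpi_send` (2.0), `cuda_kernel` (5.0), `omp_region` (3.0) and `mpi_recv` (1.0), where both the kernel and the OpenMP region depend on the send and the receive depends on both, the function reports the path through `cuda_kernel`. That is the activity worth optimizing first in this toy case.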
The use of graphics processing units (GPUs) for computational analyses in the life sciences has grown steadily in recent years. In this work we present a CUDA-powered computational tool, named coagSODA, that was purposely developed and applied for the analysis of a large model of the blood coagulation cascade, defined as a system of ordinary differential equations based on both mass-action kinetics and Hill functions. We discuss the biological results of the parameter sweep analyses of this model and show that GPUs can speed up the computation by up to 177x.
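To make the terms concrete, the following minimal sketch integrates a toy two-species system that combines a mass-action term with a Hill function, and repeats the integration over a list of rate constants. It is a sequential illustration under stated assumptions: the model, the parameter names (`k_conv`, `k_deg`, `v_max`, `k_half`, `n`) and the fixed-step RK4 integrator are all hypothetical. They are not the coagulation model or the solver used by coagSODA.

```python
def hill(s, v_max, k_half, n):
    """Hill function: saturating, sigmoidal response to concentration s."""
    return v_max * s**n / (k_half**n + s**n)


def rk4_step(f, y, dt):
    """Advance state y by one classical fourth-order Runge-Kutta step."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [
        yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def simulate(k_conv, k_deg=0.5, v_max=1.0, k_half=0.5, n=2,
             steps=100, dt=0.01):
    """Integrate the toy model and return the final state [a, b].

    da/dt = -k_conv * a               (mass-action consumption of A)
    db/dt = hill(a) - k_deg * b       (Hill-type production, linear decay)
    """
    def rhs(state):
        a, b = state
        return [-k_conv * a, hill(a, v_max, k_half, n) - k_deg * b]

    state = [1.0, 0.0]  # assumed initial concentrations
    for _ in range(steps):
        state = rk4_step(rhs, state, dt)
    return state


def sweep(k_values):
    """Run one independent simulation per value of k_conv."""
    return [(k, simulate(k)) for k in k_values]
```

Each simulation in `sweep` is independent of the others, which is the property that makes parameter sweeps a natural fit for GPUs: in a CUDA setting, one would typically assign each parameter set to its own thread or block instead of looping.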
ISBN: (print) 9781479938254
The goal of this paper is to present a new massively parallel virtual machine model, designed for parallel and distributed high-performance computing on a distributed system. The proposed model allows us to build a polymorphic computing grid intended to solve fine-grained parallel problems over different machine structures. The model is built from distributed Virtual Processor Units (VPUs). Each VPU corresponds to a mobile agent deployed on a physical processing unit, and each physical processing unit is associated with a node of the distributed system. The VPUs communicate with each other asynchronously by exchanging, locally or remotely, ACL (Agent Communication Language) messages containing data, instructions or any task to be performed. A special agent represents the host of the parallel virtual machine. This agent manages the life cycle of the VPUs, the load balancing system, and the parallel applications to run. VPUs can also use a virtual shared memory represented by hierarchical mobile agents. All the properties of the proposed model are easily designed thanks to the flexibility and mobility of multi-agent systems.
ISBN: (print) 9781479976164
The Barnes-Hut algorithm is a widely used approximation method for the N-body simulation problem. The irregular nature of this tree-walking code presents interesting challenges for its computation on parallel systems. Additional problems arise in effectively exploiting the processing capacity of GPU architectures. We propose and investigate the applicability of software Simulated Wide-Warps (SWW) in this context. To this end, we explicitly deal with dynamic irregular patterns in data accesses through data remapping and data transformation, and by controlling the execution flow divergence of threads. We present a new compact data structure for the tree layout, as well as GPU parallel algorithms for tree transformation and for parallel walking using SWW. The benefit of our techniques lies in transposing the tree algorithm into regular execution patterns that match the GPU model. Our experiments show significant performance improvement over the best known GPU solutions to this algorithm.
Caches are universally used in computing systems to hide long off-chip memory access latencies. Unlike CPUs, massive threads running simultaneously on GPUs bring a tremendous pressure on memory hierarchy. As a result,...
Whispered speech is a natural mode of speech in which voicing is absent - its acoustics differ significantly from normally spoken speech or so-called neutral speech, such that it is challenging to use only neutral spe...
Before scientific analyses run on shared infrastructure, such as the Open Science Grid or XSEDE, scientists must often transfer or stage key data sets to those resources. Often these data sets consist of many files that may be transferred by multiple clients in parallel. We study two techniques that improve the use of available resources for these large, long-running, multi-file transfers. First, we adapt transfer parameters for multi-file transfers based on recent transfer performance. Second, we use VO and site policies to influence the allocation of system resources for transfers, such as available transfer streams. We describe our system design and summarize its implementation and performance.
Accelerators offer the potential to significantly improve the performance of scientific applications when offloading compute intensive portions of programs to the accelerators. However, effectively tapping their full ...
Lattice-based cryptography is attractive for its resistance to quantum computing and its efficient encryption/decryption process. However, when handling large volumes of data, lattice-based cryptographic systems suffer from slow processing speed. This paper analyzes one of the major lattice-based cryptographic systems, Nth-degree truncated polynomial ring (NTRU), and accelerates its execution with graphics processing units (GPUs) to reach acceptable processing performance. Three strategies are proposed: single GPU with zero copy, single GPU with data transfer, and multi-GPU. GPU computing techniques such as streams and zero copy are applied to overlap computation and communication for possible speedup. Experimental results demonstrate the effectiveness of GPU acceleration of NTRU, and performance improves as the number of devices involved increases.
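The arithmetic at the heart of NTRU is multiplication in the truncated polynomial ring, that is, polynomials of degree below N multiplied modulo x^N - 1 with coefficients reduced modulo q. The sketch below shows that operation as a plain cyclic convolution. It is a sequential reference for illustration only, not the GPU implementation described in the abstract, and the function name `ring_multiply` is hypothetical.

```python
def ring_multiply(a, b, q):
    """Multiply two polynomials in Z_q[x] / (x^N - 1).

    a, b: coefficient lists of equal length N, lowest degree first
    q: coefficient modulus
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("both polynomials must have N coefficients")

    result = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue  # skip zero terms; NTRU polynomials are often sparse
        for j, bj in enumerate(b):
            # x^i * x^j wraps around because x^N = 1 in this ring.
            k = (i + j) % n
            result[k] = (result[k] + ai * bj) % q
    return result
```

For N = 3 and q = 32, multiplying 1 + x by x + x^2 gives 1 + x + 2x^2, because the x^3 term wraps around to the constant term. Each output coefficient depends only on the inputs, so the coefficients can be computed independently of one another, which is what makes this operation a candidate for GPU parallelization.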
This paper presents an advanced iterative MapReduce solution that employs Hadoop and MPI technologies. First, we present an overview of working implementations that use the same technologies. Then we define an academic example of a numerical problem, with an emphasis on its computational features. This definition is used to justify the proposed solution design.