Multi-core compute nodes with non-uniform memory access (NUMA) are now a common architecture in the assembly of large-scale parallel machines. On these machines, in addition to the network communication costs, the mem...
Heterogeneous systems with CPUs and computational accelerators such as GPUs, FPGAs or the upcoming Intel MIC are becoming mainstream. In these systems, peak performance includes the performance of not just the CPUs but also all available accelerators. In spite of this fact, the majority of programming models for heterogeneous computing focus on only one of these. With the development of Accelerated OpenMP for GPUs, both from PGI and Cray, we have a clear path to extend traditional OpenMP applications incrementally to use GPUs. The extensions are geared toward switching from CPU parallelism to GPU parallelism. However, they do not preserve the former while adding the latter. Thus, computational potential is wasted, since either the CPU cores or the GPU cores are left idle. Our goal is to create a runtime system that can intelligently divide an accelerated OpenMP region across all available resources automatically. This paper presents our proof-of-concept runtime system for dynamic task scheduling across CPUs and GPUs. Further, we motivate the addition of this system into the proposed OpenMP for Accelerators standard. Finally, we show that this option can produce as much as a two-fold performance improvement over using either the CPU or GPU alone.
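The idea of dividing one parallel region across both device types can be illustrated with a minimal sketch. It is not the runtime described in the abstract: the function names `split_iterations` and `estimated_time` are hypothetical, and the policy shown (splitting loop iterations in proportion to each device's measured throughput) is only an assumed, simple scheduling rule. A dynamic scheduler could re-measure the rates after each pass and call the split again.

```python
def split_iterations(n_iters, cpu_rate, gpu_rate):
    """Split n_iters between CPU and GPU in proportion to their rates.

    Rates are measured throughputs in iterations per second (assumed to
    come from timing a previous pass). Returns (cpu_count, gpu_count).
    """
    total = cpu_rate + gpu_rate
    if n_iters < 0 or cpu_rate < 0 or gpu_rate < 0 or total <= 0:
        raise ValueError("need non-negative inputs and one positive rate")
    gpu_count = round(n_iters * gpu_rate / total)
    return n_iters - gpu_count, gpu_count


def estimated_time(cpu_count, gpu_count, cpu_rate, gpu_rate):
    """Estimated wall time when both devices work concurrently.

    The region finishes when the slower of the two shares finishes.
    """
    cpu_time = cpu_count / cpu_rate if cpu_count else 0.0
    gpu_time = gpu_count / gpu_rate if gpu_count else 0.0
    return max(cpu_time, gpu_time)


if __name__ == "__main__":
    cpu_n, gpu_n = split_iterations(1000, cpu_rate=100.0, gpu_rate=300.0)
    print(cpu_n, gpu_n)
    print(estimated_time(cpu_n, gpu_n, 100.0, 300.0))
```

Under this simple model, two devices with equal throughput each take half of the iterations, so the estimated time is half that of either device alone. This matches the upper bound of a two-fold improvement mentioned in the abstract, although the sketch ignores data transfer and scheduling overheads.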
Biologists and biotechnologists need to draw information from numerous distributed and heterogeneous resources, such as online biomedical databases, nomenclatures and specialised bioinformatics tools. These tasks can benefit significantly from semantic data federation with SADI Semantic Web services, where multiple resources exposed through SADI services are accessed as a single virtual SPARQL-queryable database. We provide evidence in support of this premise by creating and testing a kit of public SADI services for a number of bioinformatics databases and programs, and by using these services to answer model queries that reflect real information needs of ecotoxicology researchers.
ISBN: 9781467362184 (print)
We present DI-MMAP, a high-performance runtime that memory-maps large external data sets into an application's address space and shows significantly better performance than the Linux mmap system call. Our implementation is particularly effective when used with high-performance, locally attached Flash arrays on highly concurrent, latency-tolerant, data-intensive HPC applications. We describe the kernel module and show performance results on a benchmark test suite and on a new bioinformatics metagenomic classification application. For the complex metagenomic classification application, DI-MMAP performs up to 4.88× better than standard Linux mmap.
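For readers unfamiliar with memory mapping, the following minimal sketch shows the standard mmap baseline that the abstract compares against, using Python's built-in `mmap` module. It does not show DI-MMAP itself, whose interface the abstract does not describe. The helper names `read_record` and `demo` and the temporary test file are assumptions made for illustration.

```python
import mmap
import os
import tempfile


def read_record(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory mapping.

    The file is mapped into the process address space; the operating
    system pages data in on demand when the slice is accessed.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file (the file must not be empty).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return bytes(mapped[offset:offset + length])


def demo():
    """Create a small temporary data file and read a record from it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 16)  # 4 KiB of repeating 0..255
        path = tmp.name
    try:
        return read_record(path, 256, 4)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The application code only indexes into the mapping; which pages are fetched and evicted is decided by the kernel, which is the layer where a replacement such as the kernel module described above would act.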
The growth in size and complexity of scaling applications and the systems on which they run poses challenges in analyzing and improving their overall performance. With metrics coming from thousands or millions of processes, visualization techniques are necessary to make sense of the increasing amount of data. To aid the process of exploration and understanding, we announce the initial release of Boxfish, an extensible tool for manipulating and visualizing data pertaining to application behavior. Combining and visually presenting data and knowledge from multiple domains, such as the application's communication patterns and the hardware's network configuration and routing policies, can yield the insight necessary to discover the underlying causes of observed behavior. Boxfish allows users to query, filter and project data across these domains to create interactive, linked visualizations.
The rise of multicore cluster architectures has led to intense interest in using a combination of MPI and OpenMP to program these machines more effectively. We present a performance model for a hybrid implementation of the solve cycle of algebraic multigrid (AMG), a popular iterative solver for large sparse linear systems and a key component of many scientific simulations. We validate the model on two leading parallel platforms, and discuss implications for applications programmed in a hybrid model on future machines.
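To give a sense of what a performance model of this kind looks like, here is a minimal sketch of the generic latency-bandwidth (alpha-beta) cost model applied to one sparse matrix-vector product. This is not the model from the paper, which the abstract does not specify; the function names and all parameter values are assumptions chosen for illustration.

```python
def message_time(n_bytes, alpha, beta):
    """Time to send one message: latency alpha plus beta seconds per byte."""
    return alpha + beta * n_bytes


def spmv_time(nnz, n_msgs, bytes_per_msg, t_flop, alpha, beta):
    """Modeled time of one sparse matrix-vector product on one process.

    Computation: two floating-point operations per stored nonzero.
    Communication: n_msgs messages of equal size, sent one after another.
    """
    compute = 2 * nnz * t_flop
    communicate = n_msgs * message_time(bytes_per_msg, alpha, beta)
    return compute + communicate


if __name__ == "__main__":
    # Hypothetical machine parameters: 1 ns per flop, 1 us latency,
    # 1 ns per byte of transferred data.
    many_small = spmv_time(1000, 16, 100, 1e-9, 1e-6, 1e-9)
    few_large = spmv_time(1000, 4, 400, 1e-9, 1e-6, 1e-9)
    print(many_small, few_large)
```

In this sketch the total data volume is the same in both cases, so the difference comes entirely from the latency term, which grows with the number of messages. A real model would also need terms for memory bandwidth and thread overheads.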
ISBN: 9781467362184 (print)
The IBM Blue Gene/Q represents a large step in the evolution of massively parallel machines. It features 16-core compute nodes, with additional parallelism in the form of four simultaneous hardware threads per core, connected together by a five-dimensional torus network. Machines are being built with core counts in the hundreds of thousands, with the largest, Sequoia, featuring over 1.5 million cores. In this paper, we develop a performance model for the solve cycle of algebraic multigrid on Blue Gene/Q to help us understand the issues this popular linear solver for large, sparse linear systems faces on this architecture. We validate the model on a Blue Gene/Q at IBM, and conclude with a discussion of the implications of our results.
The aim of the ERAMIS project is to create a Master's degree, “Computer as a Second Competence”, in 9 beneficiary universities of Kazakhstan, Kyrgyzstan and Russia. This contribution presents how faculty development is organized within the project.
ISBN: 9781467324250 (print)
The aim of the ERAMIS project is to set up a network of Master's degrees, “Informatics as a Second Competence”, among 9 beneficiary universities of Kazakhstan, Kyrgyzstan and Russia, and 5 European universities. Training the academic teachers of the beneficiary universities is one of the important tasks in this project. This contribution presents how the training was implemented and the lessons learned from those activities, especially from the trainers' point of view. This experience can be useful for similar kinds of training.