In recent years, CPU manufacturers have not been able to substantially increase the instructions per cycle (IPC) of CPU cores. To overcome this situation, manufacturers have increased the raw performance of HPC systems by simultaneously increasing the number of processors, multiplying the number of cores in each processor, and integrating specialized accelerators such as GPGPUs, FPGAs, and other ASICs with specialized instruction sets. To exploit the new hardware capabilities, applications have to be written with explicit parallelism, to deal with the increasing number of cores available, and also need parts of their source code written in specialized languages to make use of the integrated accelerators. This creates a major paradigm shift from compute-centric to communication-centric execution, to which most programming models are not yet properly aligned: classical models are geared towards optimizing computational operations, assuming data access is almost free. Languages like C, for example, assume that variables are immediately available and are accessed synchronously. The new situation implies that data is distributed across the system, and communication latency has a large impact. A poor data distribution results in continual data exchange while processing units idle, waiting for the data they need to compute. Most programming models do not convey the necessary dependency information to the compiler, which must therefore be careful not to make wrong assumptions. There are successful attempts, such as OpenMP, to exploit parallelism by introducing structural information about the application. Research projects such as POLCA have developed means to introduce functional-like semantics, in the form of directives, to procedural code. These directives describe the structural behavior of the application, with the aim of allowing compilers to perform aggressive code transformations that increase performance and enable portability across different architectures.
Efficient, scalable, and productive parallel programming is a major challenge for exploiting future multiprocessor SoC platforms. This article presents the MultiFlex programming environment, which has been developed to address this challenge. It is targeted for use on Platform 2012, a scalable multiprocessor fabric. The MultiFlex environment supports high-level simulation and iterative platform mapping, and includes tools for programming-model-aware debug, trace, visualization, and analysis. This article focuses on the two classes of programming abstractions supported in MultiFlex. The first is a set of Parallel Programming Patterns (PPP), which offer a rich set of programming abstractions for implementing efficient data- and task-level parallel applications. The second is a Reactive Task Management (RTM) abstraction, which offers a lightweight C-based API to support dynamic dispatching of small-grain tasks on tightly coupled parallel processing resources. The use of the MultiFlex native programming model is illustrated through the capture and mapping of two representative video applications. The first is a high-quality rescaling (HQR) application on a multiprocessor platform. We present the details of the optimization process required for mapping the HQR application, whose reference code requires 350 GIPS (giga instructions per second), onto a 16-processor cluster. Our results show that the parallel implementation using the PPP model offers almost linear acceleration with respect to the number of processing elements. The second application is a high-definition VC-1 decoder. For this application, we illustrate two different parallel programming model variants, one using PPPs, the other based on RTM. These two versions are mapped onto two variants of a homogeneous version of the Platform 2012 multi-core fabric.
ISBN (print): 9781605584980
Massive computing systems will be needed to maintain competitiveness in all areas of science, engineering, and business, providing both management efficiency and computing capability. From a systems-management perspective, massive installations offer an efficient platform for resource sharing and service-oriented cloud computing; from a capability perspective, they allow unprecedented performance for supercomputing applications. With top supercomputing systems reaching the PetaFlop barrier, the next challenge is to devise technology to reach, and applications to take advantage of, ExaFlop performance. Multicore chips are already here but will grow over the next decade to several hundred cores. Although these chips will be used for general-purpose computing, they will be the tera-device components of future exascale systems.

Europe is aware of the importance of having a well-structured supercomputing infrastructure, as well as the need to exchange experiences and know-how across the Union. Infrastructure projects such as the current DEISA (Distributed European Infrastructure for Supercomputing Applications) or the future PRACE (Partnership for Advanced Computing in Europe) aim at setting up such coordinated resources. PRACE, in particular, will create a world-class pan-European high-performance computing service and infrastructure, managed as a single European entity. The service will include five superior supercomputing centers, strengthened by regional and national centers, working in collaboration through grid technologies. The BSC-CNS is the Spanish representative, one of five principal partners in the project (the others being Germany, France, the UK, and the Netherlands). The principal partner countries have agreed to contribute more to the PRACE budget and to host the tier-0 machines which will form part of the distributed infrastructure.

Having hit the power wall, the computing market is now undergoing a shaky era of dispersion, where many kinds of multicore alternatives...
ABSTRACT: This study analyzes planning under deterministic and stochastic inflows for the Mayurakshi project in India. Models are developed to indicate the optimal storage of reservoir water, the transfer of water to ...
We describe here the design and performance of OdinMP/CCp, a portable compiler for C programs using the OpenMP directives for parallel processing with shared memory. OdinMP/CCp was written in Java for portabi...