ISBN: 9781479940936 (print)
In high-performance computing systems running on native hardware or cloud computing resources, parallel applications can reserve a large number of resources for long time periods. A resource failure triggers the failure of every application using that resource. Our investigation of large-scale systems in the field has revealed a difference in the operational reliability of nodes. By adding awareness of this difference to the scheduler, along with the predicted reliability needs, we match reliable resources with the most demanding applications to reduce the probability of application failure. In this paper, we describe a new approach we developed to enhance reliability and reduce failure costs. Our approach partitions resources based on expected reliability and sizes each partition to bound the probability of blocking requests. It can be used to size systems for peak loads with a bounded probability of blocking requests, and would be useful for operators seeking to improve the reliability and efficiency of systems.
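The abstract does not say which blocking model the authors use. As a minimal sketch of what "sizing a partition to bound the probability of blocking requests" can look like, the Python below assumes the classical Erlang B loss model, which is an assumption made here and not taken from the paper. The names `erlang_b` and `size_partition`, and the idea of measuring demand as an offered load in erlangs, are hypothetical.

```python
def erlang_b(nodes: int, offered_load: float) -> float:
    """Blocking probability under the Erlang B loss model (an assumed model).

    nodes        -- number of nodes in the partition
    offered_load -- arrival rate times mean holding time, in erlangs
    """
    blocking = 1.0  # with zero nodes every request is blocked
    for k in range(1, nodes + 1):
        # Standard numerically stable recurrence for Erlang B.
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


def size_partition(offered_load: float, max_blocking: float) -> int:
    """Smallest partition size whose blocking probability is within the bound."""
    if not 0.0 < max_blocking <= 1.0:
        raise ValueError("max_blocking must be in (0, 1]")
    nodes = 0
    while erlang_b(nodes, offered_load) > max_blocking:
        nodes += 1
    return nodes


if __name__ == "__main__":
    # Hypothetical peak load of 40 erlangs on the high-reliability partition.
    print(size_partition(offered_load=40.0, max_blocking=0.01))
```

Under this assumed model, an operator would run the sizing once per reliability class, using that class's expected peak load and acceptable blocking probability.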
High-performance computing (HPC) systems tend to be complex to debug and analyze due to the large number of processes they involve and the way these processes communicate with each other to perform specific tasks. Recently, there has been an increase in the number of tools that help software engineers analyze the behavior of HPC applications. These tools provide several features that facilitate the understanding and analysis of the information contained in inter-process communication traces generated from running an HPC application. They use different formats to represent traces, however, which hinders interoperability and sharing of data. In this paper, we address this by proposing an exchange format called MTF (MPI Trace Format) for representing and exchanging traces generated from HPC applications based on the MPI (Message Passing Interface) standard, which is a de facto standard for inter-process communication in high-performance computing systems. The design of MTF is validated against well-known requirements for a standard exchange format, with the objective of leading the work towards standardizing the way MPI traces are represented in order to allow better synergy among tools. We have also developed an MTF toolkit that supports the generation of MTF traces and is equipped with a query engine to facilitate the retrieval of data from MTF traces. Finally, we show how MTF can carry a large trace generated using a commercial off-the-shelf MPI trace analysis tool. Crown Copyright (C) 2010 Published by Elsevier B.V. All rights reserved.
ISBN: 9781424464708 (print)
High-performance computing systems have shown impressive growth so far, with a performance increase of 10x every 3.6 years. Performance predictions seem to confirm this trend for the future: Roadrunner achieved 1 petaFLOPS in 2008, and 10 petaFLOPS systems are expected to be operational in the next few years. If these predictions are correct, exascale performance will be achieved by 2018. Power consumption is increasing at a faster rate than performance, which is making HPC unsustainable with current technologies. A DARPA [1] study states that exascale systems will be limited to a total power budget of 20-25 MW: in order to achieve 1 exaFLOPS, HPC systems have to provide 40 GFLOPS/W, two orders of magnitude higher than the current fastest supercomputer. It seems clear that current technologies will not provide such efficiency and that there are new research challenges and opportunities on the way to the exascale era. This paper shows how a closer hardware/software interaction can improve the efficiency of HPC systems. New technologies will provide better power management but will also slow down applications. System software, on the other hand, can detect application behavior and trigger the most effective hardware mechanisms without introducing excessive performance degradation.
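The figures quoted in this abstract can be checked with a few lines of arithmetic. The Python sketch below uses only numbers stated above (1 petaFLOPS in 2008, 10x every 3.6 years, a 20-25 MW budget, a 1 exaFLOPS target); the function names are hypothetical, and the extrapolation assumes the growth rate stays constant.

```python
import math

PETAFLOPS = 1e15
EXAFLOPS = 1e18


def required_gflops_per_watt(target_flops: float, power_budget_watts: float) -> float:
    """Energy efficiency needed to hit target_flops within the power budget."""
    return target_flops / power_budget_watts / 1e9


def projected_year(start_year: float, start_flops: float,
                   target_flops: float, years_per_10x: float) -> float:
    """Year the target is reached, assuming constant exponential growth."""
    factors_of_ten = math.log10(target_flops / start_flops)
    return start_year + factors_of_ten * years_per_10x


if __name__ == "__main__":
    for megawatts in (20, 25):
        needed = required_gflops_per_watt(EXAFLOPS, megawatts * 1e6)
        print(f"{megawatts} MW budget -> {needed:.0f} GFLOPS/W")
    print(f"exascale around {projected_year(2008, PETAFLOPS, EXAFLOPS, 3.6):.1f}")
```

The 40 GFLOPS/W figure corresponds to the upper end of the budget (25 MW); a 20 MW budget would require 50 GFLOPS/W. Three factors of ten at 3.6 years each add 10.8 years to 2008, which is roughly in line with the 2018 estimate.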
ISBN: 9781424413478 (print)
This proposal describes a hardware-software interface that can put into effect a practically unlimited number of processing resources and that allows the inherent parallelism of the application problems to be completely described and exploited. Appropriate hardware consists of processing resources, platform resources, and storage means. An individual processing resource is not a complete processor but a comparatively simple operation unit. The proposed architectural principles may lead to instruction set architectures that can cope with a trans-finite number of hardware resources, and to processing circuits containing resources of intermediate granularity and appropriately optimized interconnects.
The use of high-performance computing for rapid visualization of design alternatives and the subsequent use of such visualization for design steering during the multidisciplinary optimization (MDO) process are investigated. Surrogate models based on polynomial response surfaces and message-passing-interface-based parallel programming models are used for rapid visualization of the physical model behavior responses corresponding to changes in the design variables. Application of the proposed procedure to vehicle structure impact design optimization is investigated, involving both sizing and shape design variables. Mesh morphing is used in conjunction with the shape design changes. Rapid visualization of physical model behavior for changes in design variables during the MDO process facilitates collaboration among discipline experts, which in turn facilitates steering of the design and enhances the efficiency of the MDO process.
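The abstract does not give the polynomial order or the fitting procedure. As a rough illustration of a polynomial response surface surrogate, the sketch below fits a full quadratic model by ordinary least squares with NumPy. The quadratic order, the function names, and the sample data are all assumptions made here for illustration, and the MPI-based parallel evaluation mentioned above is not shown.

```python
import numpy as np


def quadratic_features(X):
    """Build columns [1, x_i, x_i * x_j for i <= j] for each sample row."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n_samples, n_vars = X.shape
    columns = [np.ones(n_samples)]
    columns += [X[:, i] for i in range(n_vars)]
    columns += [X[:, i] * X[:, j]
                for i in range(n_vars) for j in range(i, n_vars)]
    return np.column_stack(columns)


def fit_response_surface(X, y):
    """Least-squares coefficients of the quadratic surrogate."""
    coeffs, *_ = np.linalg.lstsq(quadratic_features(X),
                                 np.asarray(y, dtype=float), rcond=None)
    return coeffs


def predict(coeffs, X):
    """Evaluate the fitted surrogate at new design points."""
    return quadratic_features(X) @ coeffs


if __name__ == "__main__":
    # Hypothetical design samples: two design variables on a 4 x 4 grid.
    grid = [[a, b] for a in (-1.0, 0.0, 1.0, 2.0) for b in (-1.0, 0.0, 1.0, 2.0)]
    # Stand-in for expensive simulation responses (made-up function).
    responses = [1 + 2 * a - b + 0.5 * a * b + b * b for a, b in grid]
    surface = fit_response_surface(grid, responses)
    print(predict(surface, [[0.5, 1.5]]))
```

Once fitted, the surrogate is cheap to evaluate, which is what makes it suitable for interactive visualization while design variables are being changed.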