ISBN (digital): 9781665415613
ISBN (print): 9781665415620
Achieving fault tolerance is one of the significant challenges of exascale computing due to projected increases in soft/transient failures. While past work on software-based resilience techniques typically focused on traditional bulk-synchronous parallel programming models, we believe that Asynchronous Many-Task (AMT) programming models are better suited to enabling resiliency, since they provide explicit abstractions of data and tasks which contribute to increased asynchrony and latency tolerance. In this paper, we extend our past work on enabling application-level resilience in single-node AMT programs by integrating the capability to perform asynchronous MPI communication, thereby enabling resiliency across multiple nodes. We also enable resilience against fail-stop errors, where our runtime manages all re-execution of tasks and communication without user intervention. Our results show that we are able to add communication operations to resilient programs with low overhead by offloading communication to dedicated communication workers, and to recover from fail-stop errors transparently, thereby enhancing productivity.
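The transparent re-execution idea from this abstract can be sketched in a few lines: the runtime retries a failed task without user intervention. This is a minimal illustration with hypothetical names (`run_resilient`, `flaky_task`), not the paper's actual AMT runtime:

```python
def run_resilient(task, max_retries=3):
    """Re-execute a task transparently until it succeeds or retries run out."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except RuntimeError:
            # Treat the exception as a transient (soft) failure and retry,
            # mimicking a runtime that recovers without user intervention.
            if attempt == max_retries:
                raise

# Simulated task that fails once with a transient error, then succeeds.
calls = {"n": 0}
def flaky_task():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return 42

print(run_resilient(flaky_task))  # prints 42
```

A real AMT runtime would also replay or re-issue any communication the failed task had started; this sketch only shows the retry loop.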
An approach is presented which extends the MOM fault-tolerant implementation of the Linda model of parallel programming. The original MOM system provided persistence of tuples and tuple states across both tuple-space and worker-node halt failures. Unfortunately, the requirement that system tuple space reside in a central location restricted the scalability of the MOM model. In this work, an approach is presented for distributing system tuple space and tuple states using a hashing function on tuple labels. This approach compares favourably with other tuple-space distribution methods in terms of message costs during fault-free operation, and preserves the fault-tolerant mechanisms of the MOM model.
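The label-hashing distribution can be illustrated as follows: every node applies the same deterministic hash to a tuple's label, so all nodes agree on which node owns the tuple without consulting a central server. This is a toy sketch with assumed names, not MOM's actual implementation:

```python
def owner_node(tuple_label: str, num_nodes: int) -> int:
    """Map a tuple label to the node that stores it, via a stable hash."""
    # A simple deterministic polynomial hash, so placement is reproducible
    # across runs (Python's built-in hash() is salted per process).
    h = 0
    for ch in tuple_label:
        h = (h * 31 + ord(ch)) % (2**32)
    return h % num_nodes

# Every node computes the same owner for a given label, so 'out' and 'in'
# operations on that tuple are routed to one place without a central server.
labels = ["matrix_row", "result", "worker_state"]
placement = {lbl: owner_node(lbl, 4) for lbl in labels}
print(placement)
```

Note that hashing on the label alone means all tuples sharing a label land on one node; distribution methods that hash on more of the tuple trade this hot-spot risk against higher message costs for associative matching.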
This paper reports on the design, implementation and performance evaluation of a suite of GridRPC programming middleware called Ninf-G Version 2 (Ninf-G2). Ninf-G2 is a reference implementation of the GridRPC API, a proposed GGF standard. Ninf-G2 has been designed to provide 1) high performance in a large-scale computational Grid, 2) the rich functionality required to compensate for the heterogeneity and unreliability of a Grid environment, and 3) an API which supports easy development and execution of Grid applications. Ninf-G2 is implemented to work with basic Grid services, such as GSI, GRAM, and MDS in the Globus Toolkit version 2. The performance of Ninf-G2 was evaluated using a weather forecasting system developed with Ninf-G2. The experimental results indicate that high performance can be attained even in relatively fine-grained task-parallel applications on hundreds of processors in a Grid environment.
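GridRPC's programming model centers on handle-based asynchronous calls: a client issues a call that returns immediately with a handle, then waits on handles to collect results. The sketch below mimics that shape with futures; the class and function names are illustrative stand-ins, not Ninf-G2's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

class GridRPCStub:
    """Toy stand-in for a GridRPC client: async calls return a handle."""
    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def call_async(self, fn, *args):
        # Analogue of an asynchronous GridRPC call: returns a handle at once.
        return self._pool.submit(fn, *args)

def remote_task(cell):
    # Placeholder for a fine-grained remote task (e.g., one forecast cell).
    return cell * 2

stub = GridRPCStub()
handles = [stub.call_async(remote_task, c) for c in range(8)]
# Wait-all analogue: block until every outstanding call completes.
results = [h.result() for h in handles]
print(results)  # prints [0, 2, 4, 6, 8, 10, 12, 14]
```

In a real GridRPC deployment the function handle is bound to a remote executable discovered through Grid services (GRAM, MDS); here the "remote" call is just a local thread.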
Authors:
B. Krysztop, H. Krawczyk
Faculty of Electronics, Telecommunications and Informatics, Computer Architecture Department, Technical University of Gdańsk, Gdańsk, Poland
A new approach for developing efficient and flexible component-based distributed applications is proposed. It is based on a new programming platform, TL (Transformation Language), which allows both the abstract sequential code and the parallel processing model of an application to be expressed. To minimize execution cost and maximize flexibility, a Distributed Partial Executor (DPE) tool and an optimization algorithm are introduced. An example distributed image-processing application is considered and its optimization in TL is analyzed. The obtained results confirm the usability of the proposed methodology.
Parallel/distributed application development is a very difficult task for non-expert programmers, and support tools are therefore needed for all phases of this kind of application development cycle. Developing applications using predefined programming structures (frameworks) should be easier than doing it from scratch. We propose to take advantage of the knowledge about the structure of the application in order to develop a dynamic and automatic tuning tool. With this in mind, we have designed POETRIES, a dynamic performance tuning tool based on the idea that a performance model can be associated with the high-level structure of the application; this way, the tool can make better tuning decisions efficiently. Specifically, we focus this work on the definition of the performance model associated with applications developed using the master-worker framework.
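The core idea, a performance model attached to the master-worker structure that drives tuning decisions, can be illustrated with a generic model (this formula is an assumption for illustration, not POETRIES' actual model): predicted time is compute time, which shrinks with more workers, plus the master's per-worker dispatch overhead, which grows with them. The tuner then picks the worker count minimizing predicted time.

```python
def predicted_time(total_work, n_workers, dispatch_cost):
    """Generic master-worker model: compute time shrinks as work is divided,
    while the master's per-worker dispatch/collection overhead grows."""
    return total_work / n_workers + dispatch_cost * n_workers

def best_workers(total_work, dispatch_cost, max_workers):
    # Tuning decision: choose the worker count that minimizes predicted time.
    return min(range(1, max_workers + 1),
               key=lambda n: predicted_time(total_work, n, dispatch_cost))

# With 1000 units of work and dispatch cost 10 per worker, the model's
# optimum is sqrt(1000/10) = 10 workers, not the maximum available.
print(best_workers(1000.0, 10.0, 32))  # prints 10
```

The point of a structural model like this is exactly what the abstract claims: because the tool knows the application is master-worker, it can adjust a parameter (here, the number of workers) at run time instead of searching blindly.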
This article describes the development of a software and hardware environment for integrating individual cluster systems into a single, integrated parallel HPC system. The elaborated applications can be ported to the resources of the integrated HPC system. To provide the necessary theoretical and practical skills for using regional HPC clusters, an interactive educational course is currently being prepared for teaching students parallel programming, HPC clusters, and the use of parallel software.
Moving infrared small-target detection is widely used in areas such as target surveillance and precise guidance. It places high demands on both real-time performance and detection probability. Simultaneous multiband detection can achieve a high detection probability, but it leads to substantial growth in computing time, which cannot meet the real-time requirements of practical applications. To solve this problem, this paper presents a parallel hierarchical approach based on MPI (Message Passing Interface) and OpenMP, which makes full use of the advantages of both the message-passing model and the shared-memory model. It is tested on a cluster system of multiprocessor nodes. Experimental results show that this parallel method achieves a speedup of 7.8 over the serial method at the same detection probability, and its computational performance is also superior to that of pure-OpenMP and pure-MPI parallel methods. The method has strong scalability.
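The hierarchical MPI+OpenMP pattern is two nested levels of parallelism: coarse-grained distribution of work units (frames or bands) across nodes, and fine-grained splitting of each unit across threads within a node. The sketch below approximates both levels with thread pools on one machine, using a made-up bright-pixel filter as the per-tile work; it is an analogy for the structure, not the paper's detection algorithm:

```python
from concurrent.futures import ThreadPoolExecutor

def detect_in_tile(tile):
    # Stand-in for per-tile small-target filtering: count bright pixels.
    return sum(1 for px in tile if px > 200)

def detect_in_frame(frame, n_threads=2):
    # Inner ("OpenMP") level: threads within one node split a frame into tiles.
    tiles = [frame[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(detect_in_tile, tiles))

def detect_all(frames, n_nodes=2):
    # Outer ("MPI") level: frames from different bands go to different nodes.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return list(pool.map(detect_in_frame, frames))

frames = [[0, 250, 90, 255], [10, 20, 201, 30]]
print(detect_all(frames))  # prints [2, 1]
```

In the real system the outer level would be MPI ranks on separate nodes and the inner level OpenMP threads sharing a node's memory, which is what lets the hybrid beat both pure-MPI and pure-OpenMP versions.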
With the advent of high-performance COTS clusters, there is a need for a simple, scalable and fault-tolerant parallel programming and execution paradigm. In this paper, we show that the popular MapReduce programming model can be utilized to solve many interesting scientific simulation problems with much higher performance than regular cluster computers by leveraging GPGPU accelerators in cluster nodes. We use the Massive Unordered Distributed (MUD) formalism and establish a one-to-one correspondence between it and general Monte Carlo simulation methods. Our architecture, MITHRA, leverages NVIDIA CUDA technology along with Apache Hadoop to produce scalable performance gains using the MapReduce programming model. The evaluation of our proposed architecture using the Black-Scholes option pricing model shows that a MITHRA cluster of 4 GPUs can outperform a regular cluster of 62 nodes, achieving a speedup of about 254 times in our testbed, while providing scalable, near-linear performance with additional nodes.
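The MapReduce/Monte Carlo correspondence the abstract describes is: each map invocation draws one independent sample and computes its payoff, and the reduce phase aggregates the payoffs. A minimal single-machine sketch for Black-Scholes call pricing follows (parameter values are illustrative; in MITHRA the map phase runs on GPUs via CUDA, not in Python):

```python
import math
import random
from functools import reduce

def map_sample(rng, S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0):
    # Map: draw one geometric-Brownian-motion terminal price and
    # return the European call payoff for that sample.
    z = rng.gauss(0.0, 1.0)
    ST = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    return max(ST - K, 0.0)

rng = random.Random(42)
n = 100_000
payoffs = (map_sample(rng) for _ in range(n))
# Reduce: sum the payoffs, then discount the mean back to today.
total = reduce(lambda a, b: a + b, payoffs)
price = math.exp(-0.05 * 1.0) * total / n
print(round(price, 2))  # close to the analytic Black-Scholes value (~10.45)
```

Because every sample is independent and the reduction is a commutative sum, the computation fits the MUD formalism exactly, which is why it parallelizes cleanly across mappers (or GPU threads).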
ISBN (digital): 9781728149882
ISBN (print): 9781728149899
Image processing drives many of today's technological advancements. A key concern when performing image-processing operations is the time taken to apply different routines to the images, so execution time is an important measure of system efficiency. The idea is to hand images to the processors: depending on the code, either all cores cooperate on a single image, or the images are distributed so that each core processes its own. This applies parallel programming, i.e. the use of all available computing resources (here, the cores). This paper implements several image-enhancement techniques in a system that executes them on a single core as well as on multiple cores. The operations, implemented both sequentially and in parallel, are Image Blurring, Edge Detection, Contrast Stretching, and Image Negation; the average speedups obtained on multiple cores are 9.94, 9.54, 11.12, and 11.21, respectively.
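Two of the operations named above, image negation and contrast stretching, are per-pixel point operations, which is what makes them easy to split across cores: each core can apply the same function to its share of the pixels. Minimal sequential versions (illustrative implementations on a flat 8-bit pixel list, not the paper's code):

```python
def negate(pixels):
    """Image negation: invert each 8-bit intensity value."""
    return [255 - p for p in pixels]

def contrast_stretch(pixels):
    """Linear contrast stretching: rescale [min, max] to the full [0, 255]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0 for _ in pixels]  # flat image: nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

row = [50, 100, 150, 200]
print(negate(row))            # prints [205, 155, 105, 55]
print(contrast_stretch(row))  # prints [0, 85, 170, 255]
```

Note that negation needs no shared state at all, while contrast stretching needs a global min/max pass before the per-pixel rescale; a parallel version would compute the min/max with a reduction first, then split the rescale loop across cores.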