ISBN (print): 1892512416
We review the authors' HPspmd programming model as a contribution towards programming support for high-performance grid-enabled environments. Future grid computing systems will need to provide suitable programming models. In a proper programming model for grid-enabled environments and applications, high performance on multi-processor systems is a critical issue. We argue, with simple experiments, that we can in fact hope to achieve high performance in a similar ballpark to more traditional HPC languages.
ISBN (print): 9781538608623
Reproducibility of the execution of scientific applications on parallel and distributed systems is of growing interest, as it underpins the trustworthiness of experiments and the conclusions derived from them. Dynamic loop scheduling (DLS) techniques are an effective approach to improving the performance of scientific applications via load balancing. These techniques address algorithmic and systemic sources of load imbalance by dynamically assigning tasks to processing elements, and they have demonstrated their effectiveness when applied in real applications. Complementing native experiments, simulation is a powerful tool for studying the behavior of parallel and distributed applications. This work is a comprehensive reproducibility study of experiments using DLS techniques published in earlier literature, carried out to verify their implementation in SimGrid-MSG. Earlier work showed that having a detailed degree of information regarding the experiments to be reproduced is essential for a successful reproduction. The present work concentrates on reproducing experiments with variable application behaviour and a high degree of parallelism of applications and systems. It is shown that reproducing measurements of applications with high variance in task execution times is challenging, albeit feasible and useful. The success of this reproducibility study indicates that the implementation of the DLS techniques in SimGrid-MSG is verified for the considered applications and systems, and thus enables well-founded future research using the DLS techniques in simulation.
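For context, DLS techniques differ mainly in how they compute the size of the next chunk of loop iterations handed to a requesting processing element. A minimal serial sketch of two classic rules, static chunking and guided self-scheduling, is given below; it is illustrative only and is not the SimGrid-MSG implementation verified in the paper, and the function names and the simple drain loop are our own assumptions.

// Sketch of two classic DLS chunk-size rules (illustrative, not the
// SimGrid-MSG implementation studied in the paper).
#include <algorithm>
#include <cstdio>

// Static chunking: every request receives ceil(N / P) iterations.
std::size_t static_chunk(std::size_t total, std::size_t workers) {
    return (total + workers - 1) / workers;
}

// Guided self-scheduling: each request receives ceil(remaining / P),
// so chunks shrink as the loop drains, smoothing load imbalance.
std::size_t gss_chunk(std::size_t remaining, std::size_t workers) {
    return std::max<std::size_t>(1, (remaining + workers - 1) / workers);
}

int main() {
    const std::size_t total = 1000, workers = 4;
    std::size_t remaining = total;
    while (remaining > 0) {
        std::size_t c = gss_chunk(remaining, workers);
        std::printf("assign chunk of %zu iterations (remaining %zu)\n", c, remaining);
        remaining -= c;
    }
    return 0;
}

In practice the chunk rule is the main knob: static chunking minimizes scheduling overhead, while shrinking-chunk rules such as guided self-scheduling trade a little overhead for better tolerance of the variable task execution times discussed above.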
ISBN (print): 1601320841
This paper describes two parallel algorithms for the 2D knapsack (or cutting-stock) problem: the optimal packing of multiples of n rectangular objects into a knapsack of size L × W, where packings are obtainable only with guillotine-type (side-to-side) cuts. Here, we describe and analyze this problem for parallel computing with GPUs. These algorithms solve this NP-hard problem with a parallel algorithm that runs in O(W(n + L + W)) time using L processors for a 2D knapsack of capacity L × W. The new multiple-IS version, using LW processors and max(L, W) ISs, runs in O(n + L + W) time given practical hardware considerations. Both of these results are cost optimal with respect to the best sequential implementation. Moreover, an efficient GPGPU algorithm for this well-known problem should give insight into how parallel computing using graphics processing units compares with other parallel models such as the PRAM.
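The dynamic program underlying such algorithms evaluates, for every sub-rectangle (l, w), the best single piece that fits and the best value obtainable from any vertical or horizontal guillotine cut; this per-cell work over the L × W table is what the parallel versions distribute across processors. A minimal serial sketch of that recurrence follows; the Piece structure and the example values are illustrative assumptions, not the paper's GPU implementation.

// Serial guillotine-cut knapsack DP sketch (illustrative baseline only).
// f[l][w] = best value packable in an l x w rectangle using edge-to-edge cuts.
#include <vector>
#include <algorithm>
#include <cstdio>

struct Piece { int len, wid, value; };

std::vector<std::vector<int>> guillotine_dp(int L, int W,
                                            const std::vector<Piece>& pieces) {
    std::vector<std::vector<int>> f(L + 1, std::vector<int>(W + 1, 0));
    for (int l = 1; l <= L; ++l) {
        for (int w = 1; w <= W; ++w) {
            int best = 0;
            // Option 1: place a single piece that fits (no further cuts).
            for (const Piece& p : pieces)
                if (p.len <= l && p.wid <= w) best = std::max(best, p.value);
            // Option 2: vertical guillotine cut at position x (symmetry: x <= l/2).
            for (int x = 1; x <= l / 2; ++x)
                best = std::max(best, f[x][w] + f[l - x][w]);
            // Option 3: horizontal guillotine cut at position y.
            for (int y = 1; y <= w / 2; ++y)
                best = std::max(best, f[l][y] + f[l][w - y]);
            f[l][w] = best;
        }
    }
    return f;   // f[L][W] is the optimal guillotine packing value.
}

int main() {
    std::vector<Piece> pieces = {{2, 3, 8}, {4, 1, 5}, {5, 5, 30}};
    auto f = guillotine_dp(10, 10, pieces);
    std::printf("best value for a 10 x 10 knapsack: %d\n", f[10][10]);
    return 0;
}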
ISBN (print): 9780889866379
Graphics processing units(GPUs) are starting to play an increasingly important role in non-graphical applications which are highly parallelisable. With the latest graphics cards boasting a theoretical 165GFlops and 54GB/s memory bandwidth spread across 48 ALUs it is easy to see why. The GPU architecture is particularly suited to the parallel stream processing paradigm of low levels of data dependency, high data to instruction ratio and predictable memory access patterns. One largely ignored, yet key, bottleneck for this type of processing on GPUs is both download and readback transfer performance to and from the graphics card. Existing tools provide great developer assistance in many areas of GPU application development, though provide very limited assistance in gaining the best bi-directional data transfer performance. In this paper, we discuss these limitations and present new investigative tools which allow general purpose processing GPU developers to explore the complex array of configuration states which affect both the download and readback performance.
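The paper targets graphics-API transfer paths of that era, but the underlying question, how fast data can be pushed to and pulled back from the card under different configurations, can be illustrated with a present-day probe. The sketch below is our own illustrative stand-in, not one of the paper's tools: it uses the CUDA runtime from C++ to time host-to-device (download) and device-to-host (readback) copies with pageable versus pinned host memory, one configuration choice that strongly affects transfer rates.

// Minimal host<->device transfer-bandwidth probe (illustrative; the paper's
// tools target graphics-API download/readback paths, not CUDA).
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

static float time_copy_ms(void* dst, const void* src, size_t bytes, cudaMemcpyKind kind) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpy(dst, src, bytes, kind);       // the transfer being measured
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 64u << 20;          // 64 MiB test buffer
    void* d_buf = nullptr;
    void* h_pageable = std::malloc(bytes);
    void* h_pinned = nullptr;
    cudaMalloc(&d_buf, bytes);
    cudaMallocHost(&h_pinned, bytes);        // page-locked (pinned) host memory

    std::printf("pageable download: %.2f ms\n", time_copy_ms(d_buf, h_pageable, bytes, cudaMemcpyHostToDevice));
    std::printf("pageable readback: %.2f ms\n", time_copy_ms(h_pageable, d_buf, bytes, cudaMemcpyDeviceToHost));
    std::printf("pinned   download: %.2f ms\n", time_copy_ms(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice));
    std::printf("pinned   readback: %.2f ms\n", time_copy_ms(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost));

    cudaFree(d_buf);
    cudaFreeHost(h_pinned);
    std::free(h_pageable);
    return 0;
}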
ISBN (print): 0769511651
Testing is a difficult and time-consuming part of the software development cycle, because an error may happen in an unexpected way at an unexpected spot. Testing and debugging parallel and distributed software are much more difficult than testing and debugging sequential software: errors are usually reproducible in sequential programs, while they may not be reproducible in parallel and distributed programs. In addition, parallel and distributed programs introduce new types of errors and anomalies, race conditions and deadlocks, that do not exist in sequential software. In this paper I present a survey and a taxonomy of existing approaches for detecting race conditions and deadlocks in parallel and distributed programs. These approaches can be classified into two main classes: static analysis techniques and dynamic analysis techniques. The static analysis techniques are further subdivided into three subgroups: concurrency analysis methods, data-flow analysis methods, and formal proof methods. A brief discussion highlighting the main problems of the best-known approaches is given, and the paper concludes with tables summarizing the comparison between the surveyed approaches.
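As a concrete illustration of the first error class the survey addresses, the sketch below (our own textbook example, not taken from the surveyed work) shows a data race: two threads perform an unsynchronized read-modify-write on a shared counter, so the final value depends on the interleaving, exactly the non-reproducible behaviour that makes such bugs hard to test for.

// Textbook data race: both threads increment `counter` without synchronization,
// so runs may lose updates and print different totals (illustrative only).
#include <thread>
#include <mutex>
#include <cstdio>

int counter = 0;            // shared, unprotected -> race condition
// std::mutex m;            // uncommenting the mutex and the lock below removes the race

void work() {
    for (int i = 0; i < 100000; ++i) {
        // std::lock_guard<std::mutex> lock(m);
        ++counter;          // non-atomic read-modify-write
    }
}

int main() {
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    std::printf("counter = %d (expected 200000)\n", counter);
    return 0;
}

Dynamic analysis tools observe executions of exactly such unsynchronized accesses, while static techniques try to prove, from the program text alone, that no two conflicting accesses can ever run concurrently.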
ISBN (print): 9780769543284
We propose a framework built around a JavaSpace to ease the development of bag-of-tasks applications. The framework can optionally and automatically tolerate transient crash failures occurring on any of the distributed elements; it relies on checkpointing and underlying middleware mechanisms to do so. To further improve checkpointing efficiency, both in size and in frequency, the programmer can introduce intermediate user-defined checkpoint data and code within the task-processing program. Used without fault tolerance, the framework accelerates application development, introduces no runtime overhead, and yields the expected speedup. When fault tolerance is enabled, our framework allows correct completion of applications despite failures, with limited runtime and data storage overheads. Experiments run with up to 128 workers study the impact of several user-related and implementation-related factors on overall performance, and reveal good performance for classical JavaSpace-based master-worker application profiles.
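To make the bag-of-tasks pattern concrete, the sketch below compresses it into a single process with threads standing in for distributed workers: a master fills a shared bag, workers repeatedly take tasks and deposit results. It is a conceptual analogue of what the JavaSpace-based framework supports across machines, not the framework's API, and all names and the TaskBag type are our own assumptions.

// Conceptual bag-of-tasks sketch (in-process analogue, illustrative only).
#include <thread>
#include <mutex>
#include <vector>
#include <queue>
#include <optional>
#include <cstdio>

struct TaskBag {
    std::queue<int> tasks;
    std::mutex m;
    std::optional<int> take() {                    // analogous to taking a task entry
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return std::nullopt;
        int t = tasks.front();
        tasks.pop();
        return t;
    }
};

int main() {
    TaskBag bag;
    for (int i = 0; i < 32; ++i) bag.tasks.push(i); // master writes the tasks

    std::mutex result_m;
    long long total = 0;                            // master-side result collection
    auto worker = [&] {
        while (auto t = bag.take()) {               // pull tasks until the bag is empty
            int v = *t;
            long long r = static_cast<long long>(v) * v;  // "process" the task
            std::lock_guard<std::mutex> lock(result_m);
            total += r;
        }
    };

    std::vector<std::thread> pool;
    for (int w = 0; w < 4; ++w) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    std::printf("sum of squares 0..31 = %lld\n", total);
    return 0;
}

In the distributed setting the bag lives in the JavaSpace, so a crashed worker's task simply remains available for another worker, which is what makes the pattern a natural fit for the checkpoint-based fault tolerance described above.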
ISBN (print): 1892512416
The evolution of distributed applications to reflect structural changes or to adapt to specific conditions of the run-time environment is a difficult issue, especially if continuous service is required by end-users. This latter constraint implies performing changes with minimal penalty on service provisioning. The set of tools and services that allow such a goal to be achieved is usually designated as dynamic reconfiguration capabilities. A major issue related to dynamic reconfiguration is ensuring application consistency after a reconfiguration. Classical transactional models provide a way to identify application calculations and to ensure the consistency of these calculations despite faults. In a first step, we propose an extended transaction model that ensures such consistency when a reconfiguration occurs. We argue that ensuring consistency can be done easily by using an extended transaction model with a strong isolation property. We then discuss possible solutions for non-transactional applications.
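One simple way to picture the role of strong isolation here, our own illustrative reading rather than the paper's extended model: if every application calculation runs as a transaction, a reconfiguration only needs to be serialized against those transactions, for instance by draining in-flight calculations and blocking new ones while the component is swapped. The sketch below uses a readers-writer lock to express that quiescence idea; the class and its operations are assumptions for illustration.

// Illustrative quiescence-based reconfiguration sketch: calculations run under
// a shared lock, reconfiguration takes the exclusive lock, so a component is
// never replaced in the middle of a calculation.
#include <shared_mutex>
#include <functional>
#include <cstdio>

class ReconfigurableService {
    std::shared_mutex gate;
    std::function<int(int)> impl = [](int x) { return x + 1; };  // current component version
public:
    int calculate(int x) {                                       // an application "calculation"
        std::shared_lock<std::shared_mutex> lock(gate);          // many may run concurrently
        return impl(x);
    }
    void reconfigure(std::function<int(int)> next) {
        std::unique_lock<std::shared_mutex> lock(gate);          // waits for calculations to drain
        impl = std::move(next);                                  // swap atomically w.r.t. calls
    }
};

int main() {
    ReconfigurableService s;
    std::printf("before reconfiguration: %d\n", s.calculate(41));
    s.reconfigure([](int x) { return x * 2; });
    std::printf("after reconfiguration:  %d\n", s.calculate(41));
    return 0;
}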
The mismatch between on-chip high-performance CPU speed and data access times is the basic reason for the increasing gap between peak and sustained performance in distributed-memory parallel computers. We propose the concept ...
ISBN (print): 9781665497473
Coarse-Grained Reconfigurable Architectures (CGRAs) emerged about 30 years ago. The very first CGRAs were programmed manually; fortunately, compilation approaches soon appeared to automate the mapping process. Numerous surveys on these architectures exist, and others also gather the associated tools and methods, but none of them focuses on the mapping process alone. This paper focuses solely on automated methods and techniques for mapping applications onto CGRAs, covering the last two decades of research. It aims to provide the terminology, the problem formulation, and a classification of existing methods, and it ends with research challenges and trends for the future.
ISBN (print): 1932415262
Object-oriented languages and practices have long been regarded by computational scientists as inefficient or difficult to use in the production of high-performance applications. We have shown that the benefits of using object-oriented practices far outweigh a possible small decrease in run-time performance. The Simple Parallel Object-Oriented Computing Environment for the Finite Element Method (SPOOCEFEM), developed at the U.S. Army Research Laboratory, provides an object-oriented framework for the development of multidisciplinary computational tools. Using SPOOCEFEM, we have developed a Virtual Manufacturing Environment (VME) for the resin transfer molding process that brings together three unique parallel applications. This paper discusses the benefits experienced during the parallelization and deployment of this VME using object-oriented design.