Authors: Dwivedula, M.; Hariri, S.; Parashar, M.
Laboratory, Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721, United States
Department of Electrical and Computer Engineering, CAIP Center, Rutgers University, 94 Brett Road, Piscataway, NJ 08854, United States
Overlap of computations and communications is an effective mechanism to improve the performance of parallel/distributed applications significantly. This overlap can be achieved efficiently by using data partitioning a...
ISBN: 0818680431 (print)
Broadcast, Reduction and Scan are popular functional skeletons which are used in distributed algorithms to distribute and gather data. We derive new parallel implementations of combinations of Broadcast, Reduction and Scan via a tabular classification of linearly recursive functions. The trick in the derivation is to not simply combine the individual parallel implementations of Broadcast, Reduction and Scan, but to transform these combinations to skeletons with a better performance. These skeletons are also linearly recursive.
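To make the idea of transforming a combination rather than chaining its parts concrete, here is a minimal sequential sketch in Python. It is not the derivation from the paper: the function names (`broadcast`, `reduction`, `scan`, `scan_then_reduce_fused`) are illustrative assumptions, and processors are only simulated as list positions. It shows that a Scan followed by a Reduction can be rewritten as a single linear recursion carrying a pair of values.

```python
from functools import reduce as fold
from itertools import accumulate
import operator


def broadcast(x, p):
    """Give each of p simulated processors a copy of x."""
    return [x] * p


def reduction(op, xs):
    """Combine all values with an associative operator."""
    return fold(op, xs)


def scan(op, xs):
    """Inclusive prefix combination: [x0, x0 op x1, ...]."""
    return list(accumulate(xs, op))


def scan_then_reduce_naive(op_scan, op_red, xs):
    """Chain the two skeletons: two passes and an intermediate list."""
    return reduction(op_red, scan(op_scan, xs))


def scan_then_reduce_fused(op_scan, op_red, xs):
    """One linear recursion carrying (running prefix, running total)."""
    it = iter(xs)
    prefix = total = next(it)
    for x in it:
        prefix = op_scan(prefix, x)    # next element of the scan
        total = op_red(total, prefix)  # fold it into the result at once
    return total


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(scan(operator.add, data))
    print(scan_then_reduce_fused(operator.add, operator.add, data))
```

The fused version avoids materializing the intermediate scan result, which is the kind of saving a combined skeleton aims for. A parallel implementation would additionally need an associative combination operator on the carried pairs.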
A general methodology based on software engineering principles is proposed for the parallelization of existing sequential code. The utility of the proposed methodology is evaluated through a case study involving a num...
ISBN: 9781509021406 (print)
Memory reliability will be one of the major concerns for future HPC and Exascale systems. This concern is mostly attributed to the expected massive increase in memory capacity and the number of memory devices in Exascale systems. For memory systems, Error Correcting Codes (ECC) are the most commonly used mechanism. However, state-of-the-art hardware ECCs will not be sufficient in terms of error coverage for future computing systems, and stronger hardware ECCs providing more coverage have prohibitive costs in terms of area, power and latency. Software-based solutions are needed to cooperate with hardware. In this work, we propose a Cyclic Redundancy Check (CRC) based software mechanism for task-parallel HPC applications. Our mechanism incurs only 1.7% performance overhead with hardware acceleration while remaining highly scalable. Our mathematical analysis demonstrates the effectiveness of our scheme and its error coverage. Results show that our CRC-based mechanism reduces the memory vulnerability by 87% on average, with up to 32-bit burst (consecutive) and 5-bit arbitrary error correction capability.
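As a rough illustration of software CRC protection for a block of task data, the following Python sketch computes a checksum when a block is produced and verifies it before the block is consumed. It is an assumption-laden simplification and not the mechanism from the paper: it uses the standard CRC-32 from `zlib`, it only detects errors (it does not correct them), and the helper names `protect`, `is_intact`, and `flip_bit` are hypothetical.

```python
import zlib


def protect(block: bytes) -> int:
    """Compute the CRC-32 to store alongside a data block."""
    return zlib.crc32(block)


def is_intact(block: bytes, stored_crc: int) -> bool:
    """Recompute the CRC and compare it with the stored value."""
    return zlib.crc32(block) == stored_crc


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Simulate a memory fault by flipping one bit of the block."""
    data = bytearray(block)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


if __name__ == "__main__":
    task_input = bytes(range(64))          # stand-in for a task's input data
    crc = protect(task_input)              # computed when the data is produced
    corrupted = flip_bit(task_input, 100)  # fault injected before the task runs
    print(is_intact(task_input, crc), is_intact(corrupted, crc))
```

In a task-parallel setting, the natural places for such checks are task boundaries: the producer of a block stores the CRC, and the consuming task verifies it before reading the data.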
Interacting processes in distributed systems save their checkpoints on local disks for efficiency reasons. But, because local checkpoints become unavailable with failing hosts, redundancy schemes similar to RAID-like sto...
ISBN: 9780769532639 (print)
Context-awareness is a key feature of ubiquitous computing, but context-aware systems are difficult to develop because they are complex and because developers are not familiar with a development methodology for them. In this paper, we introduce a method for applying software engineering techniques when developing a context-aware system. Based on our experience in developing a context-aware exhibition guide system, we show how to elicit requirements, how to model context, how to determine system architecture, how to design the system, and how to implement the system. Our study will help developers to efficiently develop context-aware systems according to software engineering principles.
This paper describes an environment for performance-oriented design of portable parallel software. The environment consists of a graphical design tool for building parallel algorithms, a state-of-the-art simulation engine, a CPU characterisation tool, a distributed debugging tool and a visualisation/replay tool. The environment is used to model a virtual machine composed of a cluster of heterogeneous workstations interconnected by a local area network. The simulation model used is modular and its components are interchangeable, which allows easy re-configuration of the platform. The model is validated using experiments on two parallel Givens linear solver algorithms, with average errors of about 8%. (C) 2000 Elsevier Science B.V. All rights reserved.
ISBN: 9780769544298 (print)
The cost of poor or repeat engineering in complex control systems is extremely high, and flexibility in software design and implementation is one of the key factors in staying competitive in the market. Complexity can be managed most effectively if the underlying software systems support structured, standardised, high-level abstraction layers that encapsulate unnecessary details behind well-defined interfaces. Moreover, since the costs of software maintenance are often as high as those of initial development, the ease with which software components in operational systems can be flexibly reconfigured, re-engineered, and replaced is also critical. In this paper, we present a lightweight, component-based approach to engineering embedded real-time control software, which is realized in the form of a middleware system named MIREA. The middleware supports dynamic reconfiguration of components written in C/C++, and addresses variability management in relation to non-functional properties, such as quality-of-service (QoS) and real-time scheduling. Users can easily componentize existing libraries, such as the standard NIST 4D/Real-time Control Systems (RCS) library, which has been successfully used in many U.S. government-driven intelligent control projects, and reuse them as dynamically reconfigurable components. A realistic illustration is provided showing how control systems are structured and reconfigured using our approach. In particular, we discuss our approach to control using a fusion of NIST RCS as a means of architecting a real-time control system and MIREA as a means of realising that architecture. Our progress to date suggests that MIREA is well suited as a middleware facilitating the construction of efficient, lightweight, and scalable real-time embedded control systems.
ISBN: 0818675829 (print)
The rise of explicit parallel programming involves new problems: lack of structure for parallel algorithms and the ad hoc development of parallel algorithms. We use skeletons to characterize and design parallel algorithms and define a process to refine the designs step-by-step into programs. This paper introduces a high-level library on top of MPI which is derived from the skeleton concept to achieve better programmability and obtain portability. We conclude with a CFD application to demonstrate our idea.
ISBN: 9781479941698 (print)
To cope with the rapid growth of DNA sequencing data, it is of great importance to study high-efficiency compression techniques that reduce the cost of storing the massive amount of sequencing data. In this paper, we propose a parallel DNA data compressor/decompressor, PLDSRC, based on the well-known serial DSRC software. We first analyze the compression and decompression algorithm in DSRC and identify three basic operations, namely read, work, and write. Then a single-pipeline parallel algorithm is proposed to accelerate the compression/decompression procedure. To further exploit today's popular multi-core, multi-socket systems based on the non-uniform memory access (NUMA) architecture, we extend the single-pipeline approach to the multi-pipeline case. Experiments on two different platforms show that PLDSRC, in both single- and multiple-pipeline forms, greatly speeds up DNA sequencing data compression/decompression while maintaining the same compression ratio. Examples indicate that the maximum speedup of PLDSRC over the serial DSRC software is around 24.71x for compression and 22.00x for decompression.
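The read, work, and write decomposition can be sketched as a three-stage pipeline connected by bounded queues. The Python example below is only an illustration under stated assumptions, not PLDSRC itself: `zlib.compress` stands in for the DSRC compression step, data chunks live in memory instead of files, a single worker thread is used so that output order is preserved, and all names (`reader`, `worker`, `writer`, `run_pipeline`) are hypothetical.

```python
import queue
import threading
import zlib

_DONE = object()  # sentinel that marks the end of the stream


def reader(chunks, out_q):
    """Read stage: hand numbered chunks to the work stage."""
    for i, chunk in enumerate(chunks):
        out_q.put((i, chunk))
    out_q.put(_DONE)


def worker(in_q, out_q):
    """Work stage: compress each chunk (zlib is a stand-in here)."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            break
        i, chunk = item
        out_q.put((i, zlib.compress(chunk)))


def writer(in_q, sink):
    """Write stage: append compressed chunks to the output sink."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        sink.append(item)


def run_pipeline(chunks, queue_size=4):
    """Run the three stages concurrently and return compressed chunks."""
    q1, q2 = queue.Queue(queue_size), queue.Queue(queue_size)
    sink = []
    stages = [
        threading.Thread(target=reader, args=(chunks, q1)),
        threading.Thread(target=worker, args=(q1, q2)),
        threading.Thread(target=writer, args=(q2, sink)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return [blob for _, blob in sink]


if __name__ == "__main__":
    reads = [b"ACGT" * 256, b"TTGA" * 256, b"GGCC" * 256]
    compressed = run_pipeline(reads)
    print([len(c) for c in compressed])
```

The bounded queues let the read stage run ahead of the work stage without unbounded memory growth. A multi-pipeline variant would run several such pipelines on disjoint parts of the input, for example one per NUMA node.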