With the increasing number of scientific applications manipulating huge amounts of data, effective data management is an increasingly important problem. Unfortunately, so far the solutions to this data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance file systems) or produce unsatisfactory I/O performance in exchange for ease-of-use and portability (as in relational DBMSs). In this paper we present a new environment which is built around an active meta-data management system (MDMS). The key components of our three-tiered architecture are user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease-of-use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques to the MDMS. We discuss the importance of an active MDMS and show how the three components, namely application, the MDMS, and the HSS, fit together. We also report performance numbers from our initial implementation and illustrate that significant improvements are made possible without undue programming effort.
The Hybrid Technology Multi-Threading project is a long-term study of the feasibility of combining several emerging technologies to reach 1 petaFLOPS within ten years. HTMT will combine high-speed superconductor processors, semiconductor memories with built-in processors, high-speed optical interconnects, and high-density holographic storage. While there are major challenges in all aspects of this project, those in processor architecture are the focus of this paper. Fundamental differences between RSFQ circuits and conventional semiconductor circuits, including a radical jump in clock speed, make today's processor design approaches inappropriate for HTMT. Sequential instruction dispatching, even within the lowest programming unit (a strand), will lead to unacceptably high latencies, hence poor performance. We propose alternative processor designs which use fine-grain synchronizations between individual instructions in order to avoid these bottlenecks.
In the recovery of failed processes in a distributed program, causal logging schemes offer several benefits. These benefits include no rollback of unfailed processes and simple approaches to output commit. Unfortunately, previous approaches to the recovery of multiple simultaneous failures require that the distributed execution be blocked or that recovering processes coordinate. The latter requires assumptions which are not satisfactory. In this paper we present a solution that has neither of these drawbacks.
ISBN: (print) 9780818685798
Introduces Strings, a high-performance distributed shared memory system designed for clusters of symmetrical multiprocessors (SMPs). The distinguishing feature of this system is the use of a fully multithreaded runtime system, written using POSIX threads. Strings also allows multiple application threads to be run on each node in a cluster. Since most modern UNIX systems can multiplex these threads on kernel-level lightweight processes, applications written using Strings can use all the processors in an SMP machine. This paper describes some of the architectural details of the system and analyzes the performance improvements with two example programs and a few benchmark programs from the SPLASH-2 suite.
With small device features in submicron technologies, interconnection delays play a dominant part in cycle time. Hence, it is important to consider the impact of physical design during high-level synthesis. In comparison to a traditional approach which separates high-level synthesis from physical design, an algorithm which is able to make these stages interact very closely would result in solutions with lower latency and area. However, such an approach could result in increased runtimes. Parallel processing is an attractive way of reducing the runtimes. In this paper, two parallel algorithms for simultaneous scheduling, binding, and floorplanning are presented. A detailed hardware model is considered, taking into account multiplexor and register areas and delays. Experimental results are reported on an IBM SP-2 multicomputer, with close to linear speedups for a set of benchmark circuits.
Data staging is an important data management problem for a distributed heterogeneous networking environment, where each data storage location and intermediate node may have specific data available, storage limitations, and communication links. Sites in the network request data items and each item is associated with a specific deadline and priority. It is assumed that not all requests can be satisfied by their deadline. The work concentrates on solving a basic version of the data staging problem in which all parameter values for the communication system and the data request information represent the best known information collected so far and stay fixed throughout the scheduling process. A mathematical model for the basic data staging problem is introduced. Then, a heuristic based on a multiple-source shortest-path algorithm is presented for finding a suboptimal schedule of the communication steps for data staging. A simulation study is provided, which evaluates the performance of the proposed heuristic. The results show the advantages of the proposed heuristic over two random-based scheduling techniques. This research, based on the simplified static model, serves as a necessary step toward solving the more realistic and complicated version of the data staging problem involving dynamic scheduling, fault tolerance, and determining where to stage data.
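The abstract does not spell out the heuristic itself, so the following is only a minimal sketch of the building block it names: a multiple-source shortest-path computation, here Dijkstra's algorithm seeded with every node that already holds a requested data item. The graph encoding, the function name `multi_source_shortest_paths`, and the example network are assumptions made for illustration, not the paper's model.

```python
import heapq

def multi_source_shortest_paths(graph, sources):
    """Dijkstra's algorithm started from several sources at once.

    graph:   dict mapping node -> list of (neighbor, link_cost) pairs (hypothetical encoding)
    sources: nodes that already hold a copy of the data item
    Returns (dist, origin): cheapest cost to reach each node, and which
    source that cheapest path starts from.
    """
    dist = {s: 0 for s in sources}
    origin = {s: s for s in sources}
    heap = [(0, s) for s in dist]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                origin[v] = origin[u]
                heapq.heappush(heap, (nd, v))
    return dist, origin

# Hypothetical network: the item is stored at A and B, and site D requests it.
example_graph = {
    "A": [("C", 4)],
    "B": [("C", 1), ("D", 5)],
    "C": [("D", 1)],
}
example_dist, example_origin = multi_source_shortest_paths(example_graph, ["A", "B"])
```

In this example the cheapest way to serve D is from B through C. A scheduler in the spirit of the abstract would additionally have to account for deadlines, priorities, and storage limits, none of which this sketch models.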
The domain of a global function is the set of all global states of an execution of a distributed program. We show how to monitor a program in order to determine if there exists a global state in which the sum x(1) + x(2) + ... + x(N) exceeds some constant K, where x(i) is defined in process i. We examine the cases where x(i) is an integer variable for N = 2 and where x(i) is a boolean variable for general N. For both cases we provide algorithms, prove their correctness, and analyze their complexity. © 1997 Academic Press.
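To make the detection problem concrete, here is a brute-force sketch that enumerates every consistent global state (cut) of a recorded execution and checks the sum. It is not the paper's algorithm, whose details the abstract does not give, and it is exponential in the number of processes. The trace encoding (`values`, `messages`) and the function name `find_state_with_sum_above` are assumptions for illustration.

```python
from itertools import product

def find_state_with_sum_above(values, messages, K):
    """Return a consistent cut whose sum of x(i) exceeds K, or None.

    values[i][c] is the value of x(i) after process i has executed c events.
    messages is a list of (sender, send_event, receiver, recv_event) tuples,
    with events numbered from 1 on each process.
    A cut is consistent if every message received inside it was also sent inside it.
    """
    n = len(values)
    for cut in product(*(range(len(v)) for v in values)):
        consistent = all(
            not (cut[r] >= recv_ev and cut[s] < send_ev)
            for s, send_ev, r, recv_ev in messages
        )
        if consistent and sum(values[i][cut[i]] for i in range(n)) > K:
            return cut
    return None

# Hypothetical two-process trace: x(0) goes 0 -> 5 -> 1, x(1) goes 0 -> 0 -> 4.
trace_values = [[0, 5, 1], [0, 0, 4]]
# Process 0 sends at its 2nd event; process 1 receives at its 2nd event.
trace_messages = [(0, 2, 1, 2)]
```

With the message in place, the state where x(0) = 5 and x(1) = 4 is not reachable, because process 1 only sets x(1) to 4 after receiving a message that process 0 sends once x(0) has dropped to 1. The largest sum over consistent states is therefore 5, which shows why the check must range over consistent global states rather than arbitrary pairs of local values.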
A system for specification and proof of distributed programs is presented. The method is based directly on the partial order of local states (poset) and avoids the notions of time and simultaneity. Programs are specified by documenting the relationship between local states which are adjacent to each other in the poset. Program properties are defined by stating properties of the poset. Many program properties can be expressed succinctly and elegantly using this method because poset properties inherently account for varying processor execution speeds. The system utilizes a proof technique which uses induction on the complement of the causally precedes relation and is shown to be useful in proving poset properties. We demonstrate the system on three example algorithms: vector clocks, mutual exclusion, and direct dependency clocks.
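As a reminder of what the first example algorithm computes, the following is a minimal sketch of the textbook event-based vector clock and the causally-precedes test derived from it. The paper reasons about local states in a poset, so this formulation and the names `Process`, `causally_precedes`, and `concurrent` are illustrative assumptions rather than the paper's own specification.

```python
class Process:
    """One process carrying a vector clock with one entry per process."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n

    def local_event(self):
        self.clock[self.pid] += 1
        return tuple(self.clock)

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        self.clock[self.pid] += 1
        return tuple(self.clock)

    def receive(self, stamp):
        # Merge the sender's knowledge, then count the receive event itself.
        self.clock = [max(a, b) for a, b in zip(self.clock, stamp)]
        self.clock[self.pid] += 1
        return tuple(self.clock)

def causally_precedes(u, v):
    """True if timestamp u is componentwise <= v and the two differ."""
    return all(a <= b for a, b in zip(u, v)) and u != v

def concurrent(u, v):
    return not causally_precedes(u, v) and not causally_precedes(v, u)

p0, p1 = Process(0, 2), Process(1, 2)
a = p0.local_event()   # first event on process 0
m = p0.send()          # process 0 sends a message
b = p1.local_event()   # independent event on process 1
c = p1.receive(m)      # process 1 receives the message
```

Here `a` causally precedes `c` through the message, while `a` and `b` are concurrent: neither timestamp dominates the other, which is the partial-order view the abstract builds on instead of time or simultaneity.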
In this paper we address the problem of partitioning nested loops with non-uniform (irregular) dependence vectors. Parallelizing and partitioning of nested loops requires efficient inter-iteration dependence analysis. Although many methods exist for nested loop partitioning, most of these perform poorly when parallelizing nested loops with irregular dependences. Unlike nested loops with uniform dependences, such loops have a complicated dependence pattern which forms a non-uniform dependence vector set. We apply the results of classical convex theory and principles of linear programming to iteration spaces and show the correspondence between minimum dependence distance computation and iteration space tiling. Cross-iteration dependences are analyzed by forming an Integer Dependence Convex Hull (IDCH). Every integer point in this IDCH corresponds to a dependence vector in the iteration space of the nested loops. A simple way to compute minimum dependence distances from the dependence distance vectors of the extreme points of the IDCH is presented. Using these minimum dependence distances the iteration space can be tiled. Iterations within a tile can be executed in parallel and the different tiles can then be executed with proper synchronization. We demonstrate that our technique gives much better speedup and extracts more parallelism than the existing techniques.
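The link between a minimum dependence distance and tiling can be illustrated with a deliberately simplified, one-dimensional sketch. It assumes a single loop in which iteration i reads the element written dist(i) iterations earlier, with dist(i) varying irregularly but never falling below dmin. The IDCH construction and the nested-loop case are not modeled here, and the function names and the sample distance function are hypothetical.

```python
def run_sequential(n, dist):
    """Reference execution: iteration i reads a[i - dist(i)] and writes a[i]."""
    a = [0] * n
    for i in range(n):
        src = i - dist(i)
        a[i] = a[src] + 1 if src >= 0 else 1
    return a

def run_tiled(n, dist, dmin):
    """Execute in tiles of width dmin, the minimum dependence distance.

    Every dependence spans at least dmin iterations, so an iteration only
    reads values written in earlier tiles. Iterations inside one tile are
    therefore independent; they are run in reverse order here to show that
    their order does not matter.
    """
    a = [0] * n
    for start in range(0, n, dmin):
        tile = range(start, min(start + dmin, n))
        for i in reversed(tile):
            src = i - dist(i)
            a[i] = a[src] + 1 if src >= 0 else 1
    return a

def irregular_dist(i):
    # Hypothetical non-uniform dependence distance, always >= 3.
    return 3 + (i % 4)

sequential_result = run_sequential(20, irregular_dist)
tiled_result = run_tiled(20, irregular_dist, 3)
```

Both executions produce the same array, which is the property that lets the iterations of one tile run in parallel while tiles themselves are processed in order with synchronization between them.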
A heterogeneous environment for hardware/software cosimulation is described. This environment permits a portion of an application's subsystems to be simulated using reconfigurable hardware while the remainder of the subsystems are simulated using software. An Aptix FPCB populated with Xilinx FPGAs serves as the hardware simulation platform while an IBM-compatible PC serves as the software simulation platform. The two platforms are connected using an Altera reconfigurable logic board which allows the development of a high-speed interface for communication. This paper focuses on the difficulties associated with designing and interfacing simulation entities in this heterogeneous environment. Strategies for designing hardware and software simulation entities are introduced. These strategies reduce the impact of size and performance constraints imposed by the cosimulation environment while addressing the issues of time management and synchronization. A simple queueing application is used to illustrate a design methodology which incorporates these design strategies.