In recent years, a lot of research has been invested in parallel processing of numerical applications. However, parallel processing of symbolic and AI applications has received less attention. This paper presents a system for parallel symbolic computing, named ACE, based on the logic programming paradigm. ACE is a computational model for the full Prolog language, capable of exploiting Or-parallelism and Independent And-parallelism. In this paper we focus on the implementation of the and-parallel part of the ACE system (called &ACE) on a shared memory multiprocessor, describing its organization and some optimizations, and presenting performance figures that demonstrate the ability of &ACE to exploit parallelism efficiently.
The increasing gap between the speed of microprocessors and memory subsystems makes it imperative to exploit locality of reference in sequential irregular applications. The parallelization of such applications requires special considerations. Current RTS (Run-Time Support) for irregular computations fails to exploit the fine-grain regularity present in these applications, producing unnecessary time and memory overheads. PILAR (Parallel Irregular Library with Application of Regularity) is a new RTS for irregular computations that provides a variety of internal representations of communication patterns based on their regularity, allowing for the efficient support of a wide spectrum of regularity under a common framework. Experimental results on the IBM SP-1 and Intel Paragon demonstrate the validity of our approach.
The importance of adapting networks of workstations for use as parallel processing platforms is well established. However, current solutions do not always address important issues that exist in real networks. External factors such as the sharing of resources, unpredictable behavior of the network, and failures are present in multiuser networks and must be addressed. Calypso is a prototype software system for writing and executing parallel programs on non-dedicated platforms, based on COTS networked workstations, operating systems, and compilers. Among the notable properties of the system are: (1) a simple programming paradigm incorporating shared memory constructs and separating the programming and the execution parallelism, (2) transparent utilization of unreliable shared resources by providing dynamic load balancing and fault tolerance, and (3) effective performance for large classes of coarse-grained computations. We present the system and report our initial experiments and performance results in settings that closely resemble the dynamic behavior of a 'real' network. Under varying workload conditions, resource availability, and process failures, the efficiency of the test program we present ranged from 84% to 94%, benchmarked against a sequential program.
In the past couple of years, significant progress has been made in the development of message-passing libraries for parallel and distributed computing, and in the area of high-speed networking. These advances in computing technology have also led to a tremendous increase in the amount of data being manipulated and produced by scientific and commercial application programs. Despite their popularity, message-passing libraries provide only part of the support necessary for most high-performance distributed computing applications; support for high-speed parallel I/O is still lacking. In this paper, we provide an overview of the conceptual design of a parallel and distributed I/O file system, the Virtual Parallel File System (VIP-FS), and describe its implementation. VIP-FS makes use of message-passing libraries to provide a parallel and distributed file system that can execute over multi-processor machines or heterogeneous network environments.
Tremendous strides are being made in the development of applications for scalable, parallel, high performance computing systems. One of the factors limiting further applications has been the lack of small, rugged, embeddable systems to support embedded airborne, shipboard, and land-based installations operating in severe environments. Litton Guidance and Control Systems, together with MasPar Computer Corporation, and supported by the Advanced Research Projects Agency (ARPA), Computer Systems Technology Office (CSTO), are addressing this problem. Together, we are repackaging MasPar's highly successful commercial massively parallel processing system to minimize size and maximize survivability in severe environments. The resulting rugged scalable parallel system is software transparent with the commercial systems.
In this paper, we describe a new approach for clustering workstations into a dedicated, medium-sized, shared memory parallel processor. This new approach, called the network shared memory (NSM) approach, is based upon a new way of looking at the role of communication networks in a multi-computer system. We develop an implementation model of the architecture of an NSM-based workstation cluster. This model serves as the basis for simulations that we use to assess the performance of NSM-based workstation clusters. We also use simulations to evaluate the performance of architectures representative of existing approaches for workstation clustering, as well as architectures representative of commercial symmetric multi-processors. The results of the performance assessment show that the NSM approach outperforms existing approaches for clustering workstations.
A parallel program archetype aids in the development of reliable, efficient parallel applications with common computation/communication structures by providing stepwise refinement methods and code libraries specific to the structure. The methods and libraries help in transforming a sequential program into a parallel program via a sequence of refinement steps that help maintain correctness while refining the program to obtain the appropriate level of granularity for a target machine. The specific archetype discussed here deals with the integration of task and data parallelism using group communication. This archetype has been used to develop several applications.
Recent advances in the power of parallel computers have made them attractive for solving large computational problems. Scalable parallel programs are particularly well suited to Massively Parallel Processing (MPP) machines since the number of computations can be increased to match the available number of processors. Performance tuning can be particularly difficult for these applications since it must often be performed with a smaller problem size than that targeted for eventual execution. This research develops a performance prediction methodology that addresses this problem through symbolic analysis of program source code. Algebraic manipulations can then be performed on the resulting analytical model to determine performance for scaled-up applications on different hardware architectures.
Numerous applications in science and engineering have a problem space that can be represented as a 2-dimensional grid. While some of these problems exhibit uniform computational requirements over all regions of the grid, others are non-uniform: that is, some regions of the grid have more data points than others. We introduce a new block decomposition method, Fair Binary Recursive Decomposition (FBRD), which is suitable for a collection of heterogeneous processors, and extend it to accommodate non-uniform problems (NUFBRD). Mathematical comparisons of the NUFBRD method and other common partitioning schemes are presented to show the expected performance level of this new decomposition technique.
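The abstract does not spell out the FBRD algorithm, so the following is only a generic sketch of the idea of binary recursive block decomposition for heterogeneous processors, not the authors' method. It assumes a uniform grid, assumes each processor is described by a single relative speed, and splits the longer side of each rectangle in proportion to the total speed of the two processor groups. The function name `decompose` and its parameters are hypothetical.

```python
def decompose(x0, y0, w, h, speeds):
    """Recursively split the rectangle (x0, y0, w, h) among processors.

    `speeds` holds one relative speed per processor (an assumption of this
    sketch). Returns one (x0, y0, w, h) block per processor, in order.
    """
    if len(speeds) == 1:
        return [(x0, y0, w, h)]

    # Split the processors into two groups and give each group a share
    # of the rectangle proportional to its combined speed.
    mid = len(speeds) // 2
    left, right = speeds[:mid], speeds[mid:]
    frac = sum(left) / sum(speeds)

    if w >= h:
        # Cut the longer (horizontal) side; round to whole grid columns.
        cut = round(w * frac)
        return (decompose(x0, y0, cut, h, left)
                + decompose(x0 + cut, y0, w - cut, h, right))

    # Otherwise cut the vertical side; round to whole grid rows.
    cut = round(h * frac)
    return (decompose(x0, y0, w, cut, left)
            + decompose(x0, y0 + cut, w, h - cut, right))


if __name__ == "__main__":
    # An 8x8 grid shared by three processors, the last twice as fast.
    for block in decompose(0, 0, 8, 8, [1, 1, 2]):
        print(block)
```

Because cuts are rounded to whole grid lines, block areas only approximate the speed ratios. A non-uniform variant in the spirit of NUFBRD would presumably weight each cut by the number of data points on either side instead of by area, but that extension is not shown here.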
The RACE(R) parallel computer system provides a high-performance parallel interconnection network at low cost. This paper describes the architecture and implementation of the RACE system, a parallel computer for embedded applications. The topology of the network, which is constructed with 6-port switches, can be specified by the customer, and is typically a fat-tree, a Clos network, or a mesh. The network employs a preemptable circuit-switched strategy. The network and the processor-network interface work together to provide high performance: 160 megabytes per second transfer rates with about 1 microsecond of latency. Priorities can be used to guarantee tight real-time constraints of a few microseconds through a congested network. A self-regulating circuit adjusts the impedance and output delay of the pin-driver pads.
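To put the quoted figures in perspective, a first-order estimate of transfer time can be sketched as latency plus message size divided by bandwidth. This linear model is an assumption of the sketch, not something the abstract states. It also assumes that a megabyte means 10^6 bytes and ignores contention and preemption. The names below are hypothetical.

```python
# Figures quoted in the abstract; the linear cost model is an assumption.
BANDWIDTH_BYTES_PER_S = 160e6   # 160 megabytes per second (1 MB = 10**6 bytes assumed)
LATENCY_S = 1e-6                # about 1 microsecond


def transfer_time(n_bytes, bandwidth=BANDWIDTH_BYTES_PER_S, latency=LATENCY_S):
    """Estimate seconds to move n_bytes: fixed latency plus streaming time."""
    return latency + n_bytes / bandwidth


if __name__ == "__main__":
    for size in (160, 1600, 16000):
        print(f"{size:6d} bytes: {transfer_time(size) * 1e6:.1f} microseconds")
```

Under this model a 160-byte message spends as long on latency as on streaming, so messages much smaller than that would be latency-bound.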