ISBN (print): 0818674601
Object-oriented logic programming (OOLP) is a hybrid of the object-oriented and logic programming paradigms. In this paper, we present P&P, a new object-oriented logic programming language. P&P supports programming with communicating nondeterministic objects and uses stream parallelism for communication among objects. Intuitively, each object has a Parlog 'shell' with Prolog 'contents'. One concern in our design is integrating Prolog's sequential backtracking search with Parlog's concurrent execution; hence P&P supports committed inter-object message passing via the Parlog 'shell' of the objects. Object-oriented features are also added to provide encapsulation and code reuse.
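As a rough illustration of the "Parlog shell with Prolog contents" idea, the Python sketch below models an object whose contents answer each message by depth-first backtracking search (a generator of candidate solutions), while the shell commits to the first solution and never backtracks into it. The names `solve_subset_sum` and `CommittedObject` are hypothetical, and Python generators merely stand in for Prolog search and Parlog streams; this is not P&P syntax.

```python
from typing import Callable, Iterable, Iterator, List, Optional

def solve_subset_sum(items: List[int], target: int) -> Iterator[List[int]]:
    """Prolog-like 'contents': yield every subset of positive items summing to
    target, in depth-first order, backtracking on failure."""
    def search(i: int, chosen: List[int], total: int) -> Iterator[List[int]]:
        if total == target:
            yield list(chosen)
            return
        if i == len(items) or total > target:
            return  # dead end: backtrack
        chosen.append(items[i])
        yield from search(i + 1, chosen, total + items[i])  # try including items[i]
        chosen.pop()                                        # undo the choice
        yield from search(i + 1, chosen, total)             # try excluding items[i]
    yield from search(0, [], 0)

class CommittedObject:
    """Parlog-like 'shell': consumes a stream of messages and commits to the
    first answer its contents produce for each one (hypothetical model)."""
    def __init__(self, contents: Callable[..., Iterator[object]]) -> None:
        self.contents = contents

    def serve(self, messages: Iterable[tuple]) -> Iterator[Optional[object]]:
        for args in messages:
            # Committed choice: keep the first solution and drop the rest,
            # so backtracking never crosses the object boundary.
            yield next(self.contents(*args), None)

obj = CommittedObject(solve_subset_sum)
replies = list(obj.serve([([3, 5, 2, 8], 10), ([4, 6], 3)]))
print(replies)  # [[3, 5, 2], None]
```

Inside an object the search may backtrack freely, but once the shell has replied, the choice is final, which is the essence of committed-choice message passing.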
A common statistical problem is that of finding the median element in a set of data. This paper presents a fast and portable parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem that requires the determination of the element of rank i, for an arbitrarily given integer i. Practical algorithms needed by our selection algorithm for the dynamic redistribution of data are also discussed. Our general framework is a distributed memory programming model enhanced by a set of communication primitives. We use efficient techniques for distributing, coalescing, and load balancing data as well as efficient combinations of task and data parallelism. The algorithms have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, Intel Paragon, and workstation clusters. Our experimental results illustrate the scalability and efficiency of our algorithms across different platforms and improve upon all the related experimental results known to the authors.
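The selection problem can be illustrated with a sequential simulation of a common distributed strategy: each inner list plays the role of one processor's local data; in each round the pivot is the median of the local medians, every processor counts its elements below and equal to the pivot, the global sums (a reduction) decide which side holds rank i, and the other elements are discarded. This is a generic textbook scheme under our own assumptions, not the paper's Split-C algorithm, and it omits the load-balancing redistribution the paper discusses.

```python
from typing import List

def local_median(xs: List[int]) -> int:
    """Lower median of a non-empty list."""
    s = sorted(xs)
    return s[(len(s) - 1) // 2]

def parallel_select(parts: List[List[int]], i: int) -> int:
    """Return the element of rank i (0-based) among all elements in parts."""
    if not 0 <= i < sum(len(p) for p in parts):
        raise IndexError("rank out of range")
    parts = [list(p) for p in parts]  # work on copies of the local data
    while True:
        # Step 1: each non-empty processor reports its local median.
        pivot = local_median([local_median(p) for p in parts if p])
        # Step 2: local counts, combined by a global sum (a reduction).
        less = sum(sum(1 for x in p if x < pivot) for p in parts)
        equal = sum(sum(1 for x in p if x == pivot) for p in parts)
        # Step 3: keep only the side that contains rank i.
        if i < less:
            parts = [[x for x in p if x < pivot] for p in parts]
        elif i < less + equal:
            return pivot
        else:
            parts = [[x for x in p if x > pivot] for p in parts]
            i -= less + equal

data = [[9, 1, 7], [4, 4, 12, 3], [8, 2]]
print(parallel_select(data, 4))  # 4, the median of the 9 elements
```

Each round discards at least the pivot's occurrences, so the loop terminates; a real implementation would also rebalance the surviving elements across processors between rounds.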
The coordination-style programming language T-Cham extends the chemical abstract machine (Cham) with transactions. The Cham is an interactive computational model based on a chemical reaction metaphor, in which computation proceeds as a succession of chemical reactions. A transaction is a piece of sequentially executed code and can be written in any language, such as C, Pascal, or Fortran, as long as it satisfies its precondition and postcondition. Each transaction begins executing whenever its execution condition is satisfied. A T-Cham program can be executed in a parallel, distributed, or sequential manner, depending on the available computing resources.
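The chemical reaction metaphor can be made concrete with a small multiset-rewriting loop: the "solution" is a multiset of molecules, and a reaction consumes a pair of molecules whenever its condition holds and produces new ones, until no reaction applies. The rule format and the `react` helper are assumptions for illustration; in T-Cham the reaction body would be a transaction written in a conventional language, which the Python lambda merely stands in for.

```python
from itertools import permutations
from typing import Callable, List, Tuple

# A reaction: (condition on two molecules, transaction producing new molecules).
Reaction = Tuple[Callable[[int, int], bool], Callable[[int, int], List[int]]]

def react(solution: List[int], reactions: List[Reaction]) -> List[int]:
    """Fire reactions on pairs of molecules until the solution is inert."""
    solution = list(solution)
    while True:
        fired = False
        for condition, transaction in reactions:
            for a, b in permutations(range(len(solution)), 2):
                x, y = solution[a], solution[b]
                if condition(x, y):
                    # Consume both reactants (larger index first), add products.
                    for idx in sorted((a, b), reverse=True):
                        solution.pop(idx)
                    solution.extend(transaction(x, y))
                    fired = True
                    break
            if fired:
                break
        if not fired:
            return solution

# Maximum: any two molecules x >= y react, leaving only x.
max_rule: Reaction = (lambda x, y: x >= y, lambda x, y: [x])
print(react([3, 9, 4, 1, 7], [max_rule]))  # [9]
```

This loop fires one reaction at a time; in the Cham, reactions on disjoint molecules are independent and could fire in parallel, which is what lets the same program run sequentially or in parallel.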
Grain packing is an important problem in the development of efficient parallel programs. Ideally, grain packing is performed automatically, so that programmers can write parallel programs without being troubled by the details of parallel-programming languages and parallel architectures, and the same parallel program can execute efficiently on different machines. This paper presents 2D Compression (2DC), a grain packing method that determines optimal grain size and inherent parallelism concurrently. This ability stems mainly from 2DC's continual effort to reconcile conflicting objectives. Experimental results show that 2DC produces more effective solutions than state-of-the-art approaches that aim to optimize either speedup or resource utilization. Additionally, because 2DC determines inherent parallelism, users no longer need to specify the number of processors before compilation.
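The abstract does not describe 2DC's internals, so the following is only a toy illustration of the trade-off it targets: packing tasks into fewer grains raises processor utilization, while more grains raise potential speedup at the cost of more overhead. The sketch assumes independent tasks, greedy longest-processing-time packing, a root that sends inputs to the other k - 1 grains one after another at a fixed (hypothetical) `send_cost` each, and a user-chosen efficiency floor. It returns the grain count with the best speedup whose efficiency meets that floor, so the processor count falls out of the analysis instead of being fixed in advance.

```python
import heapq
from typing import List, Tuple

def lpt_pack(tasks: List[float], k: int) -> List[List[float]]:
    """Greedy longest-processing-time packing of tasks into k grains."""
    heap = [(0.0, g) for g in range(k)]  # (current load, grain index)
    grains: List[List[float]] = [[] for _ in range(k)]
    for t in sorted(tasks, reverse=True):
        load, g = heapq.heappop(heap)  # least-loaded grain so far
        grains[g].append(t)
        heapq.heappush(heap, (load + t, g))
    return grains

def choose_grains(tasks: List[float], send_cost: float,
                  min_efficiency: float) -> Tuple[int, float]:
    """Return (grain count, speedup) maximizing speedup subject to
    speedup / grain count >= min_efficiency."""
    serial = sum(tasks)
    best = (1, 1.0)
    for k in range(2, len(tasks) + 1):
        makespan = max(sum(g) for g in lpt_pack(tasks, k)) + send_cost * (k - 1)
        speedup = serial / makespan
        if speedup / k >= min_efficiency and speedup > best[1]:
            best = (k, speedup)
    return best

print(choose_grains([4.0] * 8, send_cost=1.0, min_efficiency=0.5))
```

With eight equal tasks, raising the efficiency floor pushes the answer toward fewer, larger grains, which is the tension between speedup and resource utilization that the abstract refers to.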
The last few years have seen the introduction of a large number of parallel computers, as well as the failure of several manufacturers. As the machines have been released, they have been thoroughly evaluated, and a wide range of benchmark data is now available. Furthermore, in this time our expertise in programming and extracting performance from parallel machines has greatly increased, a theory of parallel and network algorithms has developed, and our understanding of the modeling process has improved. Hence, this seems a propitious time to take a critical look at our maxims. This paper examines three of them: parallel architecture is converging on a design based on commodity microprocessor chips; wormhole routing is decidedly more efficient than store-and-forward routing; and the PRAM is an unrealistically ideal model of computation.
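The second maxim usually rests on a standard first-order latency model. For a message of L bits crossing D hops over links of bandwidth B, store-and-forward routing receives the whole message at every hop (about D·L/B), whereas wormhole routing pipelines it flit by flit, so only the header flit of F bits pays the per-hop cost (about D·F/B + (L − F)/B). The sketch ignores contention and router delays, which are exactly the effects a critical evaluation of the maxim must consider; the parameter values are illustrative.

```python
def store_and_forward_latency(msg_bits: float, hops: int, bw: float) -> float:
    """The whole message is received at each hop before being forwarded."""
    return hops * msg_bits / bw

def wormhole_latency(msg_bits: float, hops: int, bw: float, flit_bits: float) -> float:
    """The header flit pipelines through the hops; the rest streams behind it."""
    return hops * flit_bits / bw + (msg_bits - flit_bits) / bw

# 4096-bit message, 8 hops, 1 bit per cycle, 32-bit flits (times in cycles).
print(store_and_forward_latency(4096, 8, 1.0))  # 32768.0
print(wormhole_latency(4096, 8, 1.0, 32))       # 4320.0
```

With these numbers wormhole routing is roughly 7.6 times faster, but the gap shrinks for short paths (the two formulas coincide at one hop) and says nothing about behaviour under contention.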
Many current performance analysis systems offer little more than basic measurement and analysis facilities for locating the sources of poor performance, such as load imbalance, communication overhead, and synchronization loss. We believe this is only part of the solution, and that a system providing higher-level performance measurement and analysis is highly desirable. In this paper, we describe a new approach to designing performance tuning tools for parallel processing systems. A primary contribution of this work is to explore how the strategies and algorithms used in parallel programs contribute to poor performance. To detect these strategies and algorithms, we use a technique called Automatic Program Analysis. Our goal is to provide users with higher-level performance advice. We present a case study describing how a prototype implementation of our technique identified the performance problem and provided tuning advice.
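To contrast basic measurement with higher-level advice, here is a toy rule-based advisor. It computes two low-level metrics from per-process timings (load imbalance and communication fraction) and combines them with a detected algorithmic pattern to choose advice. The pattern labels, thresholds, and advice strings are hypothetical; the paper's Automatic Program Analysis detects strategies from the program itself, which this sketch does not attempt.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

@dataclass
class ProcessProfile:
    compute: float  # seconds spent computing
    comm: float     # seconds spent communicating or waiting

def advise(profiles: List[ProcessProfile], pattern: str,
           imbalance_limit: float = 0.2, comm_limit: float = 0.3) -> List[str]:
    """Return tuning advice; `pattern` is a (hypothetical) detected strategy."""
    computes = [p.compute for p in profiles]
    imbalance = max(computes) / mean(computes) - 1.0  # 0 means perfect balance
    total = sum(p.compute + p.comm for p in profiles)
    comm_fraction = sum(p.comm for p in profiles) / total
    advice: List[str] = []
    if imbalance > imbalance_limit:
        if pattern == "static-block-decomposition":
            advice.append("Load imbalance: consider cyclic or dynamic work distribution.")
        else:
            advice.append("Load imbalance: redistribute work among processes.")
    if comm_fraction > comm_limit:
        if pattern == "master-worker":
            advice.append("Communication overhead: give workers larger tasks to amortize messages.")
        else:
            advice.append("Communication overhead: aggregate small messages.")
    return advice

profiles = [ProcessProfile(10, 1), ProcessProfile(10, 1), ProcessProfile(16, 1)]
print(advise(profiles, "static-block-decomposition"))
```

The same measurements yield different advice depending on the detected strategy, which is the kind of higher-level guidance the abstract argues for.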
The MULTIPLUS project aims to develop a modular parallel architecture suitable for studying several aspects of parallelism in both true shared memory and virtual shared memory environments. The MULTIPLUS architecture can support up to 1024 Processing Elements based on SPARC microprocessors. The MULPLIX Unix-like operating system offers a suitable parallel programming environment for the MULTIPLUS architecture, providing facilities for creating threads, allocating private and shared memory space, and using synchronization primitives efficiently. After presenting the main features of the MULTIPLUS architecture and the MULPLIX operating system, the paper describes in detail the design and implementation of the three basic hardware modules of the MULTIPLUS architecture: the Processing Element, the Multistage Interconnection Network, and the I/O Processor. In addition, the definition of the MULPLIX parallel programming primitives is discussed, and their use is illustrated through an example. Finally, future directions for the MULTIPLUS research project are outlined.
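The abstract does not say which multistage topology MULTIPLUS uses, so as a generic illustration here is destination-tag routing in an Omega network, a common multistage interconnection network. With N = 2^n ports there are n stages of 2x2 switches; before each stage the lines are perfect-shuffled (the address rotates left by one bit), and each switch then selects its upper or lower output from one bit of the destination address, most significant bit first. The function name and representation are assumptions for illustration.

```python
from typing import List

def omega_route(src: int, dst: int, n_bits: int) -> List[int]:
    """Line index after each stage when routing src -> dst through an Omega
    network with 2**n_bits ports, using destination-tag routing."""
    mask = (1 << n_bits) - 1
    line = src
    path: List[int] = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the line address left by one bit.
        line = ((line << 1) | (line >> (n_bits - 1))) & mask
        # The 2x2 switch output (low address bit) follows this stage's destination bit.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        line = (line & ~1) | bit
        path.append(line)
    return path

print(omega_route(5, 2, 3))  # [2, 5, 2]: 3 stages for 8 ports, ending at port 2
```

After n stages every source bit has been shifted out and replaced by a destination bit, so the message always arrives at `dst`; a 1024-port network of this kind would need 10 stages.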
Software generation in the OORHS (object-oriented reciprocative hypercomputing system) is user-transparent. It addresses ease of use by minimizing the number of steps leading to a programming solution. The OORHS requires from the user only a high-level APPL program, which is, in effect, a specification. For every APPL program, the system automatically performs all the necessary distributed computing steps. The precompiler, based on the object-oriented paradigm, instantiates the encapsulated program objects embedded in an APPL program. These program objects are distributed at the source level, then compiled and executed at the allocated sites. This unique approach, known as local compilation, eliminates the need to store locally the compilers used by other machines, and it enhances compatibility between the compiled program and the host processor. The precompiler also generates a program objects dictionary for every APPL program; the contents of the dictionary facilitate program visualization.
Whole-array operations and array section operations are important features of many data-parallel languages. Efficient implementation of these operations on distributed-memory multicomputers is critical to the scalability and high performance of data-parallel programs. We present an approach for analyzing the communication patterns induced by array operations and for scheduling message flow based on this information. Our scheduling algorithm guarantees contention-free data transfer and utilizes network resources optimally. It incurs little overhead and is suitable for use in compilers and runtime libraries. We also present simulation results demonstrating the algorithm's superiority over the asynchronous transfer mode commonly used for this type of communication.
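A minimal sketch of the scheduling idea, under our own assumptions: given the (sender, receiver) messages induced by an array operation, group them into phases so that within a phase no processor sends or receives more than one message, a simple notion of contention freedom for nodes with one injection and one ejection port. The greedy first-fit assignment is a generic heuristic, not the paper's optimal algorithm, and the example pattern (a cyclic shift of a block-distributed array by 1.5 blocks on 4 processors, so each block splits between processors p+1 and p+2) is illustrative.

```python
from typing import List, Set, Tuple

Message = Tuple[int, int]  # (source processor, destination processor)

def schedule_phases(messages: List[Message]) -> List[List[Message]]:
    """Greedy first-fit: place each message in the first phase in which its
    source is not yet sending and its destination is not yet receiving."""
    phases: List[List[Message]] = []
    busy: List[Tuple[Set[int], Set[int]]] = []  # (senders, receivers) per phase
    for src, dst in messages:
        for phase, (senders, receivers) in zip(phases, busy):
            if src not in senders and dst not in receivers:
                phase.append((src, dst))
                senders.add(src)
                receivers.add(dst)
                break
        else:  # no existing phase fits: open a new one
            phases.append([(src, dst)])
            busy.append(({src}, {dst}))
    return phases

P = 4
messages = [(p, (p + 1) % P) for p in range(P)] + [(p, (p + 2) % P) for p in range(P)]
phases = schedule_phases(messages)
print(phases)
```

Here the eight messages fit into two phases, each a permutation of the processors, so every phase can proceed without two messages competing for the same port.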
Most parallel languages provide means to express parallelism, e.g. a parallel-do construct, but no means to terminate the parallel activities spawned by such constructs. We propose three high-level primitives for this purpose, defined by analogy with primitives that break out of sequential iterative constructs: pcontinue, which terminates the calling activity; pbreak, which terminates all the activities in the construct that spawned the calling activity; and return, which terminates all the activities created in the current function call. These constructs are especially useful in search problems, where an activity that finds a solution can terminate other activities that are investigating inferior approaches. Since parallel constructs can be nested, activities form a tree rooted at the original activity that started the program. The main challenge in implementing pbreak and return is identifying the subtree of activities that should be killed. Three algorithms were designed and implemented, and experiments show that using these constructs can provide significant performance benefits.
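The subtree-identification problem can be sketched with an explicit activity tree. In this hypothetical model each activity records its parent and an id for the construct instance that spawned it; `pcontinue` kills the caller (and anything it spawned), while `pbreak` kills every activity spawned by the caller's construct instance together with their descendants. `return` would additionally require tracking function-call frames and is omitted. The class, field names, and the simple tree walk are illustrative assumptions, not one of the paper's three algorithms.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Activity:
    name: str
    parent: Optional["Activity"] = field(default=None, repr=False)
    construct: Optional[str] = None  # id of the construct instance that spawned it
    children: List["Activity"] = field(default_factory=list, repr=False)
    alive: bool = True

    def spawn(self, name: str, construct: str) -> "Activity":
        child = Activity(name, parent=self, construct=construct)
        self.children.append(child)
        return child

def kill_subtree(root: Activity) -> List[str]:
    """Mark root and all its descendants dead; return names newly killed."""
    killed: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.alive:
            node.alive = False
            killed.append(node.name)
        stack.extend(node.children)
    return killed

def pcontinue(caller: Activity) -> List[str]:
    return kill_subtree(caller)

def pbreak(caller: Activity) -> List[str]:
    if caller.parent is None:
        raise ValueError("the root activity was not spawned by a construct")
    killed: List[str] = []
    for sibling in caller.parent.children:
        if sibling.construct == caller.construct:  # same construct instance
            killed.extend(kill_subtree(sibling))
    return killed

main = Activity("main")
a1, a2, a3 = (main.spawn(n, "search") for n in ("A1", "A2", "A3"))
c1 = main.spawn("C1", "other")
b1, b2 = (a2.spawn(n, "inner") for n in ("B1", "B2"))
print(sorted(pbreak(b1)))  # ['B1', 'B2']
print(sorted(pbreak(a1)))  # ['A1', 'A2', 'A3']
```

A `pbreak` in the nested construct leaves its parent activity running, while a `pbreak` in the outer construct also reaches the nested subtree; activities spawned by an unrelated construct (C1) survive both.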