High Performance Fortran (HPF) is a data-parallel language that provides a high-level interface for programming scientific applications, while delegating to the compiler the task of generating explicitly parallel message-passing programs. This paper provides an overview of HPF compilation and runtime technology for distributed-memory architectures, and deals with a number of topics in some detail. In particular, we discuss distribution and alignment processing, the basic compilation scheme and methods for the optimization of regular computations. A separate section is devoted to the transformation and optimization of independent loops with irregular data accesses. The paper concludes with a discussion of research issues and outlines potential future development paths of the language. (C) 1999 Elsevier Science B.V. All rights reserved.
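The distribution processing mentioned above can be illustrated with a small sketch. The following is our own illustration (the function names are invented, not HPF compiler output) of the owner computation for a one-dimensional BLOCK distribution, the mapping an HPF compiler uses to decide which processor owns each array element:

```python
# Sketch: owner computation for an HPF-style BLOCK distribution.
# For an array of n elements over p processors, processor k owns
# global indices [k*b, min((k+1)*b, n)) where b = ceil(n / p).
# Illustrative only; real HPF compilers also handle alignment,
# strides, and multi-dimensional templates.

def block_size(n: int, p: int) -> int:
    """Block length b = ceil(n / p)."""
    return -(-n // p)

def owner(i: int, n: int, p: int) -> int:
    """Processor that owns global index i under BLOCK distribution."""
    return i // block_size(n, p)

def local_index(i: int, n: int, p: int) -> int:
    """Offset of global index i within its owner's local segment."""
    return i % block_size(n, p)

if __name__ == "__main__":
    n, p = 10, 3   # 10 elements over 3 processors, block size 4
    print([owner(i, n, p) for i in range(n)])
    # indices 0-3 on proc 0, 4-7 on proc 1, 8-9 on proc 2
```

The compiler uses exactly this kind of closed-form mapping to apply the owner-computes rule and to generate the message-passing code for non-local accesses.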
Simulated annealing is an effective method for solving large combinatorial optimisation problems. Because of its iterative nature, the annealing process requires a substantial amount of computation time. A new parallel implementation based on the concurrency control theory of database systems is presented; the parallelised annealing process is serialisable. Concurrent updates to the base solution are allowed provided that they do not have data conflicts. Using the travelling salesman problem as the example application, the parallel simulated annealing algorithm is implemented on a Motorola Delta 3000 shared-memory multiprocessor system with eight processors. With a moderate problem size of 400 cities, a speedup efficiency of over 90% is achieved at high annealing temperatures and close to 100% at low annealing temperatures.
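The concurrency-control idea can be sketched as optimistic validation: worker threads propose moves against a snapshot of the shared solution, then commit only if no conflicting update was committed in the meantime. The following is a hypothetical simplification in Python (the class and function names are ours, the temperature is fixed rather than cooled, and the paper's actual implementation is not reproduced here):

```python
# Optimistic, conflict-checked parallel 2-opt annealing for TSP.
# Each tour position carries a version counter; a proposed move is
# validated against the versions it read before it commits, so only
# conflict-free (serialisable) updates are applied.
import math
import random
import threading

def make_dist(cities):
    """All-pairs Euclidean distance matrix."""
    return [[math.dist(p, q) for q in cities] for p in cities]

class ParallelAnnealer:
    def __init__(self, cities, temp=1.0):
        self.dist = make_dist(cities)
        self.tour = list(range(len(cities)))
        self.version = [0] * len(cities)   # per-position version counters
        self.lock = threading.Lock()       # guards snapshot and commit
        self.temp = temp

    def propose(self, rng):
        """One 2-opt proposal; returns True if it committed."""
        n = len(self.tour)
        i, j = sorted(rng.sample(range(1, n - 1), 2))
        with self.lock:   # cheap snapshot of the positions the move touches
            snap = [(k, self.version[k]) for k in range(i - 1, j + 2)]
            a, b = self.tour[i - 1], self.tour[i]
            c, d = self.tour[j], self.tour[j + 1]
        # the (comparatively) expensive evaluation runs without the lock
        delta = (self.dist[a][c] + self.dist[b][d]
                 - self.dist[a][b] - self.dist[c][d])
        if delta < 0 or rng.random() < math.exp(-delta / self.temp):
            with self.lock:   # validate: no conflicting committed update
                if all(self.version[k] == v for k, v in snap):
                    self.tour[i:j + 1] = self.tour[i:j + 1][::-1]
                    for k in range(i, j + 1):
                        self.version[k] += 1
                    return True
        return False   # rejected by Metropolis test, or lost a data conflict

def worker(ann, moves, seed):
    rng = random.Random(seed)
    for _ in range(moves):
        ann.propose(rng)

if __name__ == "__main__":
    rng = random.Random(0)
    cities = [(rng.random(), rng.random()) for _ in range(12)]
    ann = ParallelAnnealer(cities)
    threads = [threading.Thread(target=worker, args=(ann, 500, s))
               for s in (1, 2, 3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert sorted(ann.tour) == list(range(12))   # still a valid tour
    print("tour remains a valid permutation after concurrent moves")
```

Discarding a conflicting proposal wastes only that proposal's work, which is why efficiency stays high at low temperatures, where most proposals are rejected by the Metropolis test before ever reaching the commit step.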
CAP, a computer-aided parallelization tool, generates highly pipelined applications that run communication and I/O operations in parallel with processing operations. One of CAP's successes is the Visible Human Slice Server (http://visible ***), a 3D tomographic image server that allows clients to choose and view any cross section of the human body.
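CAP itself is a specialized code generator, but the underlying pipelining idea, running I/O in parallel with processing, can be sketched generically (the names and the slice placeholder below are invented, not CAP output):

```python
# Sketch: a reader thread fetches "slices" while the consumer processes
# the previous one, so I/O and computation overlap instead of
# alternating. A bounded queue limits how many slices are buffered.
import queue
import threading

def read_slices(out_q, n_slices):
    """Producer: stands in for disk/network reads of image slices."""
    for i in range(n_slices):
        out_q.put(f"slice-{i}")
    out_q.put(None)                    # end-of-stream marker

def process(in_q, results):
    """Consumer: stands in for per-slice image processing."""
    while (item := in_q.get()) is not None:
        results.append(item.upper())

if __name__ == "__main__":
    q = queue.Queue(maxsize=4)         # bounded buffer between stages
    results = []
    t = threading.Thread(target=read_slices, args=(q, 8))
    t.start()
    process(q, results)
    t.join()
    print(len(results), results[0])
```

With stages of comparable cost, the pipeline's throughput approaches that of its slowest stage rather than the sum of all stages, which is the effect CAP exploits for the slice server.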
The shared-memory concept makes it easier to write parallel programs, but tuning the application to reduce the impact of frequent long-latency memory accesses still requires substantial programmer effort. Researchers have proposed using compilers, operating systems, or architectures to improve performance by allocating data close to the processors that use it. The Cache-Only Memory Architecture (COMA) increases the chances of data being available locally because the hardware transparently replicates the data and migrates it to the memory module of the node that is currently accessing it. Each memory module acts as a huge cache memory in which each block has a tag with the address and the state. The authors explain the functionality, architecture, performance, and complexity of COMA systems. They also outline different COMA designs, compare COMA to traditional nonuniform memory access (NUMA) systems, and describe proposed improvements in NUMA systems that target the same performance obstacles as COMA.
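The attraction-memory behaviour described above can be illustrated with a toy software model (ours, not any real machine's coherence protocol; real COMA hardware uses directories, block states, and replacement policies omitted here):

```python
# Toy model of COMA "attraction memory": each node's memory is a tagged
# cache; on a local miss the block is found at another node and
# replicated locally, so repeated accesses by one node become local hits.

class ComaNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}               # address tag -> block contents

class ComaSystem:
    def __init__(self, n_nodes):
        self.nodes = [ComaNode(i) for i in range(n_nodes)]

    def access(self, node_id, addr):
        """Return (data, was_local_hit); replicate the block on a miss."""
        node = self.nodes[node_id]
        if addr in node.memory:        # tag match: local hit
            return node.memory[addr], True
        for other in self.nodes:       # stands in for a directory lookup
            if addr in other.memory:
                node.memory[addr] = other.memory[addr]   # replicate
                return node.memory[addr], False
        raise KeyError(addr)           # block not resident anywhere

if __name__ == "__main__":
    system = ComaSystem(2)
    system.nodes[0].memory[0x100] = "block"
    _, hit1 = system.access(1, 0x100)  # remote fetch, then replicated
    _, hit2 = system.access(1, 0x100)  # now served locally
    print(hit1, hit2)                  # False True
```

The point of the sketch is the access pattern: the first touch by node 1 pays the remote cost once, and every subsequent touch is local, which is exactly the locality improvement COMA provides transparently in hardware.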
We model a deterministic parallel program by a directed acyclic graph of tasks, where a task can execute as soon as all tasks preceding it have been executed. Each task can allocate or release an arbitrary amount of memory (i.e., heap memory allocation can be modeled). We call a parallel schedule "space efficient" if the amount of memory required is at most equal to the number of processors times the amount of memory required for some depth-first execution of the program by a single processor. We will describe a simple, locally depth-first, scheduling algorithm and show that it is always space efficient. Since the scheduling algorithm is greedy, it will be within a factor of two of being optimal with respect to time. For the special case of a program having a series-parallel structure, we show how to efficiently compute the worst case memory requirements over all possible depth-first executions of a program. Finally, we show how scheduling can be decentralized, making the approach scalable to a large number of processors when there is sufficient parallelism.
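The single-processor baseline in that definition can be made concrete with a small simulation (our own sketch, not the paper's scheduling algorithm): execute a task DAG depth-first on one processor, where each task carries a net memory delta, and record the peak. The paper's space-efficiency bound then says a p-processor schedule needs at most p times this peak.

```python
# Sketch: peak memory of a depth-first execution of a task DAG.
# succs[t] lists t's successors; deltas[t] is the net memory the task
# allocates (positive) or releases (negative). A task becomes ready
# when all its predecessors have executed; a LIFO stack of ready tasks
# yields a (locally) depth-first order.

def depth_first_peak(tasks, succs, deltas, root):
    indeg = {t: 0 for t in tasks}
    for t in tasks:
        for s in succs.get(t, []):
            indeg[s] += 1
    stack, mem, peak = [root], 0, 0
    while stack:
        t = stack.pop()                # LIFO: depth-first
        mem += deltas[t]
        peak = max(peak, mem)
        for s in succs.get(t, []):
            indeg[s] -= 1
            if indeg[s] == 0:          # all predecessors done
                stack.append(s)
    return peak

if __name__ == "__main__":
    # Diamond DAG: a allocates 10, b and c allocate 5 each, d frees 20.
    succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
    deltas = {"a": 10, "b": 5, "c": 5, "d": -20}
    print(depth_first_peak(list(deltas), succs, deltas, "a"))  # 20
```

Different depth-first executions (different tie-breaking among ready tasks) can have different peaks; the paper's series-parallel result computes the worst case over all of them without enumerating them.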
Graphical visualisation plays an important role in parallel program development. Researchers have proposed and developed many visualisation tools that assist the development of parallel programs. A number of graph formalisms or notations have been used to visualise various aspects of parallel programs and their executions. This paper attempts to classify and compare these graph formalisms and notations which provide different information at different stages of parallel program development. (C) 1999 Academic Press.
Once in a while, a great idea makes it across the boundary of one discipline to take root in another. The adoption of Christopher Alexander's patterns by the software community is one such event. Alexander both commands respect and inspires controversy in his own discipline. It is odd that his ideas should have found a home in software, a discipline that deals not with timbers and tiles but with pure thought stuff, and with ephemeral and weightless products called programs. The software community embraced the pattern vision for its relevance to problems that had long plagued software design in general and object-oriented design in particular. Focusing on objects had caused us to lose the system perspective. Preoccupation with design method had caused us to lose the human perspective. The curious parallels between Alexander's world of buildings and our world of software construction helped the ideas to take root and thrive in grassroots programming communities worldwide. The pattern discipline has become one of the most widely applied and important ideas of the past decade in software architecture and design.
A chemical mixture under conditions of constant temperature and pressure may split into different phases. The number of phases and the composition of each may be determined by globally minimizing the Gibbs free energy of the system. This can be done by iterating between an easy local minimization problem with a high number of variables and a difficult global search and verification problem in a small number of variables. The global problem can be solved by a branch and bound method, using bounds from interval analysis. When implemented in parallel, the method has lower communication requirements than other related branch and bound approaches for general global minimization. We present a parallel implementation on a network cluster of workstations that exploits this characteristic. On difficult instances, utilizations of over 90% are obtained using up to 14 processors. The algorithm copes well with varying workstation loads and has low communication overheads. A method of assessing the performance of a parallel algorithm on a shared heterogeneous network of workstations is developed.
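The interval branch-and-bound idea can be sketched in one variable (illustrative only; the paper applies it to the Gibbs free energy in several variables, and the function and names below are ours). An interval extension of the objective gives a rigorous lower bound on each box; boxes whose lower bound exceeds the best value found so far are pruned, and the rest are bisected.

```python
# Interval branch and bound for global minimization of
# f(x) = (x**2 - 2)**2 on an interval, using exact interval
# extensions of the squaring operation for the lower bounds.
import heapq

def isq(lo, hi):
    """Range of x**2 for x in [lo, hi]."""
    if lo <= 0 <= hi:
        return 0.0, max(lo * lo, hi * hi)
    return min(lo * lo, hi * hi), max(lo * lo, hi * hi)

def f_interval(lo, hi):
    """Range of f(x) = (x**2 - 2)**2 for x in [lo, hi]."""
    a, b = isq(lo, hi)          # range of x**2
    return isq(a - 2, b - 2)    # range of (x**2 - 2)**2

def f(x):
    return (x * x - 2) ** 2

def branch_and_bound(lo, hi, tol=1e-6):
    best = f((lo + hi) / 2)                     # incumbent upper bound
    heap = [(f_interval(lo, hi)[0], lo, hi)]    # (lower bound, box)
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb > best or b - a < tol:            # prune, or box is tiny
            continue
        m = (a + b) / 2
        best = min(best, f(m))                  # improve the incumbent
        for sub in ((a, m), (m, b)):            # bisect the box
            slb = f_interval(*sub)[0]
            if slb <= best:                     # keep only viable boxes
                heapq.heappush(heap, (slb, *sub))
    return best

if __name__ == "__main__":
    print(round(branch_and_bound(0.0, 3.0), 6))  # 0.0, minimum at sqrt(2)
```

Parallelization distributes boxes from the work heap across processors; because each box can be bounded independently and only the incumbent needs sharing, communication stays low, which is the property the paper's cluster implementation exploits.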
There are important classes of parallel systems built from components that need to be described in terms of asynchronous concurrent activities. For such systems, the model on which MPI relies proves to be far too rest...