The paper proposes two parallel meta-heuristics. One is a cooperative parallel tabu search which incorporates historical information exchange among processors in addition to its own searching of each processor. The ot...
详细信息
ISBN:
(纸本)0769509363
The paper proposes two parallel meta-heuristics. One is a cooperative parallel tabu search which incorporates historical information exchange among processors in addition to its own searching of each processor. The other is a cooperative parallel search between genetic algorithm and tabu search processes. Through computational experiment, we observe the improvement of solutions by our proposed method.
I/O intensive applications have posed great challenges to computational scientists. A major problem of these ap plications is that users have to sacrifice performance requirement in order to satisfy storage capacity r...
详细信息
ISBN:
(纸本)0769507840
I/O intensive applications have posed great challenges to computational scientists. A major problem of these ap plications is that users have to sacrifice performance requirement in order to satisfy storage capacity requirement bl a conventional computing environment. Further performance improvement is impeded by the physical nature of these storage media evert state-of-the-art I/O optimizations are employed. In this paper we present a distributed multi-storage resource architecture that carr satisfy both performance and capacity requirements by employing multiple storage resources. Compared to traditional single storage resource architecture, our architecture provides a more flexible and reliable computing environment. It can bring new opportunities for high performance computing as well as inherit state-of-the-art I/O optimization approaches that have already been developed. We also develop an Application programming Interface (API) that provides transparent management and access to various storage resources in our computing environment. As I/O usually dominates the performance in I/O intensive applications, we establish an I/O performance prediction mechanism which consists of a performance database and a prediction algorithm to help users better evaluate and schedule their applications. A tool is also developed to help users automatically generate the performance database. The experiments show that our multi-storage resource architecture is a promising platform for high performance distributed computing.
This paper adopts a transformational programming approach for deriving massively parallel algorithms from functional specifications. It gives a brief description of a framework for relating key higher order functions ...
详细信息
ISBN:
(纸本)0780365429
This paper adopts a transformational programming approach for deriving massively parallel algorithms from functional specifications. It gives a brief description of a framework for relating key higher order functions such as map, reduce, and scan with communicating processes with different configurations. The parallelisation of many interesting functional algorithms can then be systematically synthesized by combining "off the shelf" parallel implementations of instances of these higher order functions. Efficiency in the final message-passing algorithms is achieved by exploiting data parallelism, for generating the intermediate results in parallel; and functional parallelism, for processing intermediate results in stages such that the output of one stage is simultaneously input to the next one. This approach is illustrated through a case study for testing whether all the elements of a given list are distinct. Bird-Meertens formalism is used to concisely carry out algebraic transformations.
We have parallelized a Monte Carlo photon transport algorithm. Three different parallel versions of the algorithm were developed. The first version is for the Tera Multi-Threaded Architecture (MTA) and uses Tera speci...
详细信息
We have parallelized a Monte Carlo photon transport algorithm. Three different parallel versions of the algorithm were developed. The first version is for the Tera Multi-Threaded Architecture (MTA) and uses Tera specific directives. The second version, which uses MPI library calls, has been implemented on both the CRAY T3E and the 8-way SMP IBM SP with Power3 processors. The third version is a hybrid MPI-OpenMP implementation and is used on the SMP IBM SP. This version uses MPI to communicate between nodes and OpenMP to perform shared memory operations among processors within a node. We explain the three different parallelization approaches and present parallel performance results of these three parallel implementations on three different machines. We observe near perfect speedup for the three versions on the three architectures. The results on the SMP IBM SP suggest that the hybrid MPI-OpenMP programming is suitable for SMP type machines.
We study the problem of scheduling a set of n independent parallel tasks on m processors, where in addition to the processing time there is a size associated with each task indicating that the task can be processed on...
详细信息
The bulk synchronous task scheduling problem (BSSP) is known as an effective task scheduling problem for distributed-memory machines, but the time complexity of BSSP is unknown. This paper presents a proof of NP-compl...
详细信息
ISBN:
(纸本)0769509363
The bulk synchronous task scheduling problem (BSSP) is known as an effective task scheduling problem for distributed-memory machines, but the time complexity of BSSP is unknown. This paper presents a proof of NP-completeness of BSSP even in the case of unit time tasks and positive integer constant communication delays. This paper also gives an approximation algorithm for BSSP in several restricted cases.
In this work we describe and analyze algorithms for 2-D wavelet packet decomposition for MIMD distributed memory architectures. We discuss two different approaches: On the one hand algorithms generating the entire wav...
详细信息
In this work we describe and analyze algorithms for 2-D wavelet packet decomposition for MIMD distributed memory architectures. We discuss two different approaches: On the one hand algorithms generating the entire wavelet packet subband structure (as required for adaptive applications), on the other hand algorithms generating the lowest subband level only (as required for numerical applications). We investigate several optimizations and generalizations of corresponding message passing algorithms and finally compare the results obtained on a Cray T3D and a Parsytec GCel 1024.
In this paper we describe and analyze algorithms for 2-D wavelet packet decomposition for MIMD distributed memory architectures. We discuss two different approaches: On the one hand algorithms generating the entire wa...
详细信息
In this paper we describe and analyze algorithms for 2-D wavelet packet decomposition for MIMD distributed memory architectures. We discuss two different approaches: On the one hand algorithms generating the entire wavelet packet subband structure (as required for adaptive applications), on the other hand algorithms generating the lowest subband level only (as required for numerical applications). We investigate several optimizations and generalizations of corresponding message passing algorithms and finally compare the results obtained on a Cray T3D and a Parsytec GCel 1024.
We show that within the paradigm of real time computation, some classes of problems have the property that a solution to a problem in the class, when computed in parallel, is far superior in quality than the best one ...
详细信息
ISBN:
(纸本)0769509363
We show that within the paradigm of real time computation, some classes of problems have the property that a solution to a problem in the class, when computed in parallel, is far superior in quality than the best one obtained on a sequential computer. Examples from these classes are presented. In each case, the solution obtained in parallel is significantly, provably, and consistently better than a sequential one. It is important to note that the purpose of the paper is not to demonstrate merely that a parallel computer can obtain a better solution to a computational problem than one derived sequentially. The latter is an interesting (and often surprising) observation in its own right, but we wish to go further. It is shown that the improvement in quality can be arbitrarily high (and certainly superlinear in the number of processors used by the parallel computer). This result is akin to superlinear speedup, a phenomenon itself originally thought to be impossible.
Prefetching brings data into the cache before it is expected by the processor, thereby eliminating a potential cache miss. There are two major prefetching schemes. In a software scheme, the compiler predicts the memor...
详细信息
ISBN:
(纸本)0769509363
Prefetching brings data into the cache before it is expected by the processor, thereby eliminating a potential cache miss. There are two major prefetching schemes. In a software scheme, the compiler predicts the memory access pattern and places prefetch instructions into the code. In a hardware scheme the hardware predicts the memory access pattern and brings data into the cache before required by the processor. This paper proposes a hardware prefetching scheme, where a second processor is used for prefetching data for the primary processor. The scheme does not predict memory access patterns, but rather uses the second processor. To run ahead of the primary processor so as to detect future memory accesses and prefetch these references.
暂无评论