ISBN (print): 0769509363
This paper proposes two parallel meta-heuristics. One is a cooperative parallel tabu search that incorporates historical information exchange among processors in addition to each processor's own search. The other is a cooperative parallel search between genetic algorithm and tabu search processes. Through computational experiments we observe the improvement of solutions obtained by our proposed methods.
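The abstract does not specify the exchange protocol or the target problem; the following minimal Python sketch (with invented names such as cooperative_search and a toy bit-vector objective) only illustrates the general idea of several tabu searchers sharing their best-so-far solutions through a common pool.

    # Minimal sketch of cooperative parallel tabu search (illustrative only).
    import random

    def cost(x):                      # toy objective: number of 1-bits (minimize)
        return sum(x)

    def neighbors(x):                 # single-bit flips
        for i in range(len(x)):
            y = list(x)
            y[i] ^= 1
            yield i, tuple(y)

    def tabu_step(x, tabu, best):
        # forbid recently flipped bits unless the move beats the best (aspiration)
        candidates = [(cost(y), i, y) for i, y in neighbors(x)
                      if i not in tabu or cost(y) < cost(best)]
        c, i, y = min(candidates)
        tabu.append(i)
        if len(tabu) > 5:             # fixed tabu tenure
            tabu.pop(0)
        return y, (y if c < cost(best) else best)

    def cooperative_search(n_searchers=4, n_bits=20, rounds=30):
        random.seed(0)
        states = [tuple(random.randint(0, 1) for _ in range(n_bits))
                  for _ in range(n_searchers)]
        tabus = [[] for _ in range(n_searchers)]
        bests = list(states)
        pool_best = min(bests, key=cost)          # shared historical information
        for _ in range(rounds):
            for k in range(n_searchers):
                states[k], bests[k] = tabu_step(states[k], tabus[k], bests[k])
            pool_best = min(bests + [pool_best], key=cost)
            # cooperation: the worst searcher restarts from the shared best
            worst = max(range(n_searchers), key=lambda k: cost(states[k]))
            states[worst] = pool_best
        return pool_best

    print(cost(cooperative_search()))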
ISBN (print): 0769509363
Prefetching brings data into the cache before it is expected by the processor, thereby eliminating a potential cache miss. There are two major prefetching schemes. In a software scheme, the compiler predicts the memory access pattern and places prefetch instructions into the code. In a hardware scheme, the hardware predicts the memory access pattern and brings data into the cache before it is required by the processor. This paper proposes a hardware prefetching scheme in which a second processor is used for prefetching data for the primary processor. The scheme does not predict memory access patterns, but rather uses the second processor to run ahead of the primary processor so as to detect future memory accesses and prefetch these references.
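The abstract describes a hardware mechanism; the Python sketch below is only a software analogy with an assumed fixed run-ahead distance: a helper loop executes the same address computation a few iterations ahead of the primary loop and pulls those addresses into a model cache instead of predicting them.

    # Illustrative software model of run-ahead prefetching (not the paper's design).
    def run_ahead_prefetch(index_array, data, distance=8):
        cache = set()                       # stand-in for the data cache
        hits = misses = 0
        for i, idx in enumerate(index_array):
            # helper "processor": runs the same address computation 'distance'
            # iterations ahead and brings that line into the cache
            if i + distance < len(index_array):
                cache.add(index_array[i + distance])
            # primary "processor": the real access
            if idx in cache:
                hits += 1
            else:
                misses += 1
                cache.add(idx)
            _ = data[idx]                   # the actual load
        return hits, misses

    data = list(range(1000))
    indices = [(i * 37) % 1000 for i in range(500)]   # irregular access pattern
    print(run_ahead_prefetch(indices, data))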
ISBN (print): 0769507840
This paper describes Uintah, a component-based visual problem solving environment (PSE) that is designed to specifically address the unique problems of massively parallel computation on terascale computing platforms. Uintah supports the entire life cycle of scientific applications by allowing scientific programmers to quickly and easily develop new techniques, debug new implementations, and apply known algorithms to solve novel problems. Uintah is built on three principles: 1) as much as possible, the complexities of parallel execution should be handled for the scientist; 2) software should be reusable at the component level; and 3) scientists should be able to dynamically steer and visualize their simulation results as the simulation executes. To provide this functionality, Uintah builds upon the best features of the SCIRun PSE and the DoE Common Component Architecture (CCA).
ISBN (print): 0769509363
Merging is one of the most fundamental problems in computer science. It is well known that Ω(N/p + log log N) time is required to merge two sorted sequences, each of length N, on a CRCW PRAM with p processors, where p ≤ N log^α N for any constant α. In this paper, we describe two optimal O(1)-time solutions to the problem for p = N on BSR (Broadcasting with Selective Reduction). They are the first constant-time solutions to the problem on any model of computation. We also give an optimal O(N/p)-time solution to the problem for p < N on BSR, which is the first improvement with non-constant time that is still better than the lower bound for the PRAM.
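The BSR algorithm itself is not given in the abstract; the sequential Python sketch below only illustrates the rank-based placement that constant-time merging typically relies on, namely that each element's output position is its index in its own sequence plus its rank in the other sequence (each loop iteration standing in for one conceptual processor).

    # Sketch of rank-based merging; the BSR broadcast/reduction machinery is not modelled.
    from bisect import bisect_left, bisect_right

    def merge_by_ranks(a, b):
        out = [None] * (len(a) + len(b))
        for i, x in enumerate(a):           # one conceptual processor per element
            out[i + bisect_left(b, x)] = x  # rank of x in b (ties: a first)
        for j, y in enumerate(b):
            out[j + bisect_right(a, y)] = y # rank of y in a (ties: after equal a's)
        return out

    print(merge_by_ranks([1, 3, 5, 7], [2, 3, 6, 8]))
    # [1, 2, 3, 3, 5, 6, 7, 8]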
Author: Grosz, L. (Massey Univ., Inst. Informat. & Math. Sci., N. Shore Mail Ctr., Auckland, New Zealand)
We consider the algebraic multilevel iteration (AMLI) for the solution of systems of linear equations as they arise from a finite-difference discretization on a rectangular grid. The key operation is the matrix-vector product, which can be executed efficiently on vector and parallel-vector computer architectures if the nonzero entries of the matrix are concentrated in a few diagonals. In order to maintain this structure for all matrices on all levels, coarsening in alternating directions is used. In some cases it is necessary to introduce additional dummy grid hyperplanes. The data movements in the restriction and prolongation are crucial, as they produce massive memory conflicts on vector architectures. By using a simple performance model, the best of the possible vectorization strategies is automatically selected at runtime. Examples show that on a Fujitsu VPP300 the presented implementation of AMLI reaches about 85% of the useful performance, and scalability with respect to computing time can be achieved.
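As an illustration of the diagonal structure the method depends on, here is a small sketch of a matrix-vector product in diagonal storage; the offsets and the example operator are invented for the sketch and are not taken from the paper.

    # Sketch of a matrix-vector product for a matrix stored by diagonals,
    # which runs as long unit-stride loops that vectorize well.
    import numpy as np

    def dia_matvec(diagonals, offsets, x):
        """y = A @ x, where diagonal k of A (offset offsets[k]) is stored
        densely in diagonals[k]; entries falling outside the matrix are ignored."""
        n = x.size
        y = np.zeros(n)
        for d, off in zip(diagonals, offsets):
            if off >= 0:
                y[:n - off] += d[:n - off] * x[off:]
            else:
                y[-off:] += d[:n + off] * x[:n + off]
        return y

    # 1D Laplacian-like operator on 8 points: main diagonal 2, off-diagonals -1
    n = 8
    diags = [np.full(n, 2.0), np.full(n, -1.0), np.full(n, -1.0)]
    offs = [0, 1, -1]
    x = np.arange(1.0, n + 1.0)
    print(dia_matvec(diags, offs, x))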
ISBN (print): 0769509363
We show that within the paradigm of real-time computation, some classes of problems have the property that a solution to a problem in the class, when computed in parallel, is far superior in quality to the best one obtained on a sequential computer. Examples from these classes are presented. In each case, the solution obtained in parallel is significantly, provably, and consistently better than a sequential one. It is important to note that the purpose of this paper is not to demonstrate merely that a parallel computer can obtain a better solution to a computational problem than one derived sequentially. The latter is an interesting (and often surprising) observation in its own right, but we wish to go further. It is shown here that the improvement in quality can be arbitrarily high (and certainly superlinear in the number of processors used by the parallel computer). This result is akin to superlinear speedup, a phenomenon itself originally thought to be impossible.
ISBN (print): 0769509363
Matrix operations are the core of many linear systems. Efficient matrix multiplication is critical to many numerical applications, such as climate modeling, molecular dynamics, and computational fluid dynamics. Much research work has been done to improve the performance of matrix operations; however, the majority of this work is focused on two-dimensional (2D) matrices, and very little has been done on three- or higher-dimensional matrices. Recently, a new structure called Extended Karnaugh Map Representation (EKMR) for n-dimensional (nD) matrix representation has been proposed, which provides better matrix operation performance compared to the traditional matrix representation (TMR). The main idea of EKMR is to represent any nD matrix by 2D matrices; hence, efficient algorithm design for nD matrices becomes less complicated. Parallel matrix operation algorithms based on EKMR and TMR are presented, and analysis and experiments are conducted to assess their performance. Both our analysis and experimental results show that parallel algorithms based on EKMR outperform those based on TMR.
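The exact EKMR index mapping is not reproduced in the abstract; the sketch below uses an assumed (i, j, k, l) -> (i*n3 + k, j*n4 + l) packing merely to illustrate how an nD operation can be carried out as a single 2D operation on the packed layout.

    # Illustrative packing of a 4D array into a 2D layout (stand-in mapping,
    # not the published EKMR scheme).
    import numpy as np

    def pack_4d_to_2d(a):
        n1, n2, n3, n4 = a.shape
        m = np.empty((n1 * n3, n2 * n4))
        for i in range(n1):
            for j in range(n2):
                for k in range(n3):
                    for l in range(n4):
                        m[i * n3 + k, j * n4 + l] = a[i, j, k, l]
        return m

    a = np.arange(2 * 3 * 2 * 3, dtype=float).reshape(2, 3, 2, 3)
    b = np.ones_like(a)
    # 4D addition carried out as a single 2D operation on the packed layout
    c2d = pack_4d_to_2d(a) + pack_4d_to_2d(b)
    print(np.allclose(c2d, pack_4d_to_2d(a + b)))   # True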
The present paper investigates parallelization of general purpose numerical optimization algorithms, where the optimization algorithm is coupled with an existing analysis program. Since these optimization algorithms m...
ISBN (print): 0769509363
We propose a new topology for multicomputer networks: Parametrically described, Regular, and based on Semigroups (PRS) networks (or R-s(N, v, g) graphs with order N, degree v, girth g, and number of equivalence classes s). Many classes of networks, such as hypercubes, circulants, and cube-connected cycles, are shown to be special cases of the proposed network. Here we explore the basic structure, topological properties, optimization of parameters, and synthesis of optimal networks having the minimal diameter for the given parameters of the graph. Correspondingly, we examine the optimal characteristics with respect to transit delays and structural survival in such networks. PRS networks reaching the lower bounds on the diameter were synthesized. In some cases, we found that the new network has a better diameter than classes of networks described in the literature, provided they have the same vertex and edge complexity.
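The PRS construction itself is not detailed in the abstract; as a small illustration of one of the special cases mentioned (circulants) and of the diameter figures being optimized, the following sketch computes the diameter of a circulant graph by breadth-first search.

    # Diameter of a circulant graph C(N; S); not the PRS construction itself.
    from collections import deque

    def circulant_diameter(N, steps):
        nbrs = lambda v: [(v + s) % N for s in steps] + [(v - s) % N for s in steps]
        dist = [-1] * N
        dist[0] = 0
        q = deque([0])
        while q:
            v = q.popleft()
            for w in nbrs(v):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
        return max(dist)          # vertex-transitive, so BFS from node 0 suffices

    print(circulant_diameter(16, [1, 4]))   # degree-4 circulant on 16 nodes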
ISBN (print): 0769509363
Three-dimensional (3D) stacked implementation has been proposed as a new technology for massively parallel computers. However, two major limitations have hindered progress in this direction: the technology of vertical interconnects and the cost in terms of area for these vertical interconnects. Each vertical interconnect requires a 300 μm × 300 μm area, so their liberal use is prohibitive. Clearly, an interconnection philosophy that minimizes these vertical links can contribute to the success of a 3D implementation. A hierarchical 3D-torus network, called H3D-torus, has been proposed to reduce the number of vertical links in a 3D stacked implementation while keeping good network features. This paper addresses the architectural details of the H3D-torus network, and explores aspects such as the network diameter, the peak number of vertical links, and the VLSI layout area for the H3D-torus network as well as for several commonly used networks for parallel computers.
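The H3D-torus structure is not reproduced here; as an assumed baseline for the quantities the paper evaluates, the sketch below counts the diameter and the inter-layer (vertical) links of a plain 3D torus whose z-planes are mapped to silicon layers.

    # Baseline figures for a plain kx x ky x kz 3D torus (not the H3D-torus).
    def torus_3d_stats(kx, ky, kz):
        diameter = kx // 2 + ky // 2 + kz // 2      # wrap-around links in every dimension
        vertical_links = kx * ky * kz if kz > 2 else kx * ky * (kz - 1)
        return diameter, vertical_links

    print(torus_3d_stats(8, 8, 4))    # (10, 256)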