In this paper we present an improved search procedure for the General Parameter Method (GPM) [1]. Our procedure maps uniform dependence algorithms to application-specific processor arrays (PAs). It can optimize genera...
详细信息
ISBN:
(纸本)0818656026
In this paper we present an improved search procedure for the General Parameter Method (GPM) [1]. Our procedure maps uniform dependence algorithms to application-specific processor arrays (PAs). It can optimize general design objectives with certain non-monotonicity properties, i.e., those that do not increase monotonically withthe parameters. An example of such an objective is the minimization of the total completion time, including load and drain times. In contrast, earlier design methods can only deal with monotonic objectives. We present results for the matrix-product problem using this search technique;application of the technique on the transitive-closure problem is presented elsewhere [2]. We also show that the parameters in GPM can be expressed in terms of the schedule vector II and allocation matrix S in the popular dependence-based method (DM), thereby allowing GPM to be used in DM for finding optimal designs for uniform dependence algorithms.
the extended GCD algorithm is very useful for data dependence tests, for example, the Power Test, on supercomputers. this paper parallelizes the extended GCD algorithm on a CREW SM MIMD computer with O(n) processors. ...
详细信息
ISBN:
(纸本)0818656026
the extended GCD algorithm is very useful for data dependence tests, for example, the Power Test, on supercomputers. this paper parallelizes the extended GCD algorithm on a CREW SM MIMD computer with O(n) processors. At first, we improve the sequential extended GCD algorithm. then, we parallelize the extended GCD algorithm by two methods. First, we parallelize to triangularize the matrix by reducing elements in the same column simultaneously. this algorithm almost has no algorithmic redundancy, but has the efficiency O(1/ log2n) where n is the number of variables. Second, some rows, which have been reduced at the current column in the above algorithm, can be reduced at the next column immediately. this algorithms has the efficiency O(min(m, n-1(/n) where m is the number of linear diophantine equations, and is powerful when m is large, when m is equal to n-1, the efficiency of the algorithm is O(1).
this paper presents the results of running the some benchmarks from the Genesis suite on the Transtech Paramid. the benchmarks use the PARMACS parallelprocessing standard, and are based on applications in the fields ...
详细信息
ISBN:
(纸本)0818656026
this paper presents the results of running the some benchmarks from the Genesis suite on the Transtech Paramid. the benchmarks use the PARMACS parallelprocessing standard, and are based on applications in the fields of general relativity, molecular dynamics and QCD. the Paramid is a distributed memory parallel computer, using up to 64 Intel i860-XP processors. the results demonstrate good parallel performance, and the ability of the machine to run standard portable software.
Scalable parallel computer architectures provide the computational performance demanded by advanced biological computing problems. NIH has developed a number of parallelalgorithms and techniques useful in determining...
详细信息
Scalable parallel computer architectures provide the computational performance demanded by advanced biological computing problems. NIH has developed a number of parallelalgorithms and techniques useful in determining biological structure and function. these applications include processing electron micrographs to determine the three-dimensional structure of viruses, calculating the solvent accessible surface area of proteins to predict the three-dimensional conformation of these molecules from their primary structure, and searching for homologous DNA sequences in large genetic databases. Timing results demonstrate substantial performance improvements withparallel implementations compared with conventional sequential systems.
Performance analysis of concurrent executions in parallel systems has been recognized as a challenging problem. the aim of this research is to study approximate solution techniques for this problem. We model the struc...
详细信息
ISBN:
(纸本)0818656026
Performance analysis of concurrent executions in parallel systems has been recognized as a challenging problem. the aim of this research is to study approximate solution techniques for this problem. We model the structure of the parallel machine and the structure of the jobs executing on such a system. WE investigate rich classes of jobs, which can be expressed by sequential, And-Fork/ And-join, Or-Fork/OR-Join, and probabilistic-Fork. We propose an efficient performance prediction technique for these classes of jobs.
In this paper we present a new mapping strategy of the dynamic space warping algorithm (DSWA) onto our micro-grained array processor (MGAP). this new mapping strategy reduces the communication complexity between proce...
详细信息
ISBN:
(纸本)0818656026
In this paper we present a new mapping strategy of the dynamic space warping algorithm (DSWA) onto our micro-grained array processor (MGAP). this new mapping strategy reduces the communication complexity between processing elements and increases the performance due to data pipelining and interleaving. the DSWA, which can be applied to image recognition, originally needs a four-dimensional array. Practically, however, this four-dimensional algorithm must be mapped onto a two-dimensional array processor. A previous mapping used O(NW) processors to compute the distance between an N × N input image and a reference image withthe warping distance W in O(NW) time. Our new mapping scheme uses O(N)2 processors to generate each computation result in O(N+W2) time. We also show the experimental results and performance comparison between Connection Machine (CM) 200 and the MGAP.
Previous work on parallel database systems has paid little attention to the interaction of asynchronous disk prefetching and processor parallelism. this paper investigates this issue for scan operations on shared-memo...
详细信息
ISBN:
(纸本)0818656026
Previous work on parallel database systems has paid little attention to the interaction of asynchronous disk prefetching and processor parallelism. this paper investigates this issue for scan operations on shared-memory multi-processors. Two heuristic methods are developed for the allocation of processors and memory to optimize either the speedup or the benefit/cost ratio of database scan operations. the speedup optimization balances the data production rate of the disks and the data consumption rate of the processors, aiming at optimal speedup while ensuring that resources are not allocated unnecessarily.
the thOREAU simulation of vehicular traffic on city streets and freeways, developed by the MITRE Corporation, has been adapted to run in parallel on a network of Unix workstations connected by ethernet. Tenfold and la...
详细信息
ISBN:
(纸本)0818656026
the thOREAU simulation of vehicular traffic on city streets and freeways, developed by the MITRE Corporation, has been adapted to run in parallel on a network of Unix workstations connected by ethernet. Tenfold and larger speedups were observed by running as many as 40 parallelthreads on 34 processors. the performance curves shows little sign of leveling off with higher degrees of parallelism, which may mean that further gains can be obtained with more processors.
We present an optimal parallel algorithm that runs in O(√n) time on a √n×ROOTn mesh to compute the constrained Delaunay triangulation of a planar straight-line graph G whose vertices lie in an n-element set S. ...
详细信息
ISBN:
(纸本)0818656026
We present an optimal parallel algorithm that runs in O(√n) time on a √n×ROOTn mesh to compute the constrained Delaunay triangulation of a planar straight-line graph G whose vertices lie in an n-element set S. Implications of our result also include an efficient PRAM algorithm for the same problem, a new optimal mesh algorithm to compute a planar Voronoi diagram, as well as a partial solution to the problem of efficient parallel computation of the geodesic Voronoi diagram of a point set inside a simple polygon.
Workstation farms are primarily used in single-threaded batch jobs, but there is a growing community of users interested in using farms as 'degenerate' MPP systems. the lack of stable system software has stall...
详细信息
ISBN:
(纸本)0818656026
Workstation farms are primarily used in single-threaded batch jobs, but there is a growing community of users interested in using farms as 'degenerate' MPP systems. the lack of stable system software has stalled the growth of the MPP market, and 'parallel' farms may potentially follow the same fate. We believe the High Performance Fortran is the right answer to the needs of the parallel programming community. the paper will discuss workstation farms and focus on the HPF language and compiler.
暂无评论