The parallel computation thesis states that time-bounded parallel machines are polynomially related to space-bounded computers. Dymond and Cook (1980) state an extended parallel computation thesis: (1) parallel time and hardware requirements are simultaneously polynomially related to sequential (TM) reversal and space requirements; (2) parallel time and hardware are polynomially related. It is proved that every set that can be accepted by a TM in time T(n) can be accepted by a parallel machine in time log T(n), which gives evidence that both the parallel computation thesis and the extended parallel computation thesis are incorrect. The proof of the theorem depends crucially on two properties: (1) an arbitrary finite number of processors can be activated in one parallel step; (2) each memory cell of an arbitrarily large finite global memory can be accessed from each processor. However, in many papers, at least the first property does not hold.
The solution of the algebraic eigenvalue problem is an important component of many applications in science and engineering. With the advent of novel architecture machines, much research effort is now being expended in the search for parallel algorithms for the computation of eigensystems that can gainfully exploit the processing power these machines provide. Among important recent work, References 1-4 address the real symmetric eigenproblem in both its dense and sparse forms, Reference 5 treats the unsymmetric eigenproblem, and Reference 6 investigates the solution of the generalized eigenproblem. In this paper two algorithms for the parallel computation of the eigensolution of Hermitian matrices on an array processor are presented. These algorithms are based on the parallel Orthogonal Transformation algorithm (POT) for the solution of real symmetric matrices [7,8]. POT was developed to exploit the SIMD parallelism supported by array processors such as the AMT DAP 510. The new algorithms use the highly efficient implementation strategies devised for use in POT. The implementations of the algorithms permit the computation of the eigensolution of matrices whose order exceeds the mesh size of the array processor used. A comparison of the efficiency of the two algorithms for the solution of a variety of matrices is given.
We present an efficient method for the partitioning of rectangular domains into equi-area sub-domains of minimum total perimeter. For a variety of applications in parallel computation, this corresponds to a load-balanced distribution of tasks that minimizes interprocessor communication. Our method is based on utilizing, to the maximum extent possible, a set of optimal shapes for sub-domains. We prove that for a large class of these problems, we can construct solutions whose relative distance from a computable lower bound converges to zero as the problem size tends to infinity. PERIX-GA, a genetic algorithm employing this approach, has successfully solved to optimality million-variable instances of the perimeter-minimization problem and for a one-billion-variable problem has generated a solution within 0.32% of the lower bound. We report on the results of an implementation on a CM-5 supercomputer and make comparisons with other existing codes.
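The abstract does not say how its lower bound is computed. As a rough illustration only, the sketch below assumes the domain is a grid of unit cells and uses the known fact that a connected region of A unit cells has perimeter at least 2*ceil(2*sqrt(A)). Whether this is the bound the authors use is an assumption, and the function names `min_cell_region_perimeter`, `perimeter_lower_bound`, and `relative_gap` are hypothetical.

```python
import math


def min_cell_region_perimeter(area):
    """Smallest possible perimeter of a connected region of `area` unit grid cells.

    Uses the bound 2 * ceil(2 * sqrt(area)), computed in exact integer arithmetic:
    ceil(sqrt(n)) == isqrt(n - 1) + 1 for n >= 1, applied with n = 4 * area.
    """
    if area < 1:
        raise ValueError("area must be a positive integer")
    return 2 * (math.isqrt(4 * area - 1) + 1)


def perimeter_lower_bound(rows, cols, parts):
    """Lower bound on total perimeter when a rows x cols grid is split into
    `parts` regions of equal area (assumed setting, not taken from the paper)."""
    cells = rows * cols
    if cells % parts != 0:
        raise ValueError("an equi-area partition needs parts to divide rows * cols")
    return parts * min_cell_region_perimeter(cells // parts)


def relative_gap(total_perimeter, bound):
    """Relative distance of a candidate solution from the lower bound."""
    return (total_perimeter - bound) / bound


# A 4 x 4 grid split into four 2 x 2 squares has total perimeter 4 * 8 = 32,
# which meets the bound exactly.
bound = perimeter_lower_bound(4, 4, 4)
print(bound, relative_gap(32, bound))
```

Under these assumptions, a figure such as "within 0.32% of the lower bound" corresponds to `relative_gap` returning about 0.0032.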
Research on parallel machine scheduling in combinatorial optimization suggests that desirable parallel efficiency can be achieved when jobs are sorted in non-increasing order of processing time. In this paper, we find that the time spent computing the permanent of a sparse matrix by a hybrid algorithm is strongly correlated with its permanent value. A strategy is introduced to improve a parallel algorithm for sparse permanents. Methods for approximating permanents, which have been studied extensively, are used to approximate the permanent values of submatrices and so decide the processing order of jobs. This gives an improved load balancing method. Numerical results show that parallel efficiency improves remarkably for the permanents of fullerene graphs, which are of great interest in nanoscience. Copyright (c) 2012 John Wiley & Sons, Ltd.
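The scheduling idea in the first sentence of this abstract is the classic longest-processing-time-first (LPT) greedy rule. A minimal sketch follows, assuming each job comes with an estimated cost (in the paper's setting this would be the approximate permanent of a submatrix; here the numbers are made up). The name `lpt_schedule` is hypothetical and this is not the authors' code.

```python
import heapq


def lpt_schedule(estimated_times, workers):
    """Greedy LPT: visit jobs in non-increasing order of estimated time and
    give each one to the currently least-loaded worker.

    Returns (assignment, makespan), where assignment[w] lists the job indices
    given to worker w.
    """
    # Min-heap of (current load, worker id).
    loads = [(0, w) for w in range(workers)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(workers)]

    order = sorted(range(len(estimated_times)), key=lambda j: -estimated_times[j])
    for job in order:
        load, w = heapq.heappop(loads)
        assignment[w].append(job)
        heapq.heappush(loads, (load + estimated_times[job], w))

    makespan = max(load for load, _ in loads)
    return assignment, makespan


# Illustrative (made-up) job costs on two workers.
assignment, makespan = lpt_schedule([7, 5, 4, 3, 3, 2], workers=2)
print(assignment, makespan)
```

The quality of the schedule depends on how well the estimates track the true running times, which is why a cheap estimate correlated with run time is useful for ordering the jobs.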
Author: Kruskal, C. P., Department of Computer Science, University of Illinois
We study the number of comparison steps required for searching, merging, and sorting with P processors. We present a merging algorithm that is optimal up to a constant factor when merging two lists of equal size (independent of the number of processors); as a special case, with N processors it merges two lists, each of size N, in 1.893 lg lg N + 4 comparison steps. We use the merging algorithm to obtain a sorting algorithm that, in particular, sorts N values with N processors in 1.893 lg N lg lg N / lg lg lg N (plus lower-order terms) comparison steps. The algorithms can be implemented on a shared memory machine that allows concurrent reads from the same location with constant overhead at each comparison step.
Convergence proofs are given for one-sided Jacobi/Hestenes methods for the singular value problem. The limiting form of the matrix iterates for the Hestenes method with optimization when the original matrix is normal is derived; this limiting matrix is block diagonal, where the blocks are multiples of unitary matrices. A variation in the algorithm to guarantee convergence to a diagonal matrix for the symmetric eigenvalue problem is shown. Implementation techniques for parallel computation, in particular, on the hypercube are indicated.
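For readers unfamiliar with the one-sided Jacobi (Hestenes) method, the following is a minimal serial sketch for a real matrix. It rotates pairs of columns until all columns are mutually orthogonal; the column norms are then the singular values. It does not include the "optimization" variant, the convergence safeguards, or the hypercube mapping discussed in the paper, and the name `hestenes_singular_values` and its parameters are hypothetical.

```python
import math


def hestenes_singular_values(matrix, max_sweeps=30, tol=1e-12):
    """One-sided Jacobi (Hestenes) iteration on the columns of a real matrix.

    `matrix` is a list of rows. Returns the singular values in descending order.
    """
    a = [row[:] for row in matrix]  # work on a copy
    m, n = len(a), len(a[0])

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(a[i][p] * a[i][p] for i in range(m))
                beta = sum(a[i][q] * a[i][q] for i in range(m))
                gamma = sum(a[i][p] * a[i][q] for i in range(m))
                # Skip pairs that are already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Plane rotation that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    x, y = a[i][p], a[i][q]
                    a[i][p] = c * x - s * y
                    a[i][q] = s * x + c * y
        if not rotated:
            break  # a full sweep with no rotations: columns are orthogonal

    norms = [math.sqrt(sum(a[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)


# For [[3, 0], [4, 5]] the singular values are sqrt(45) and sqrt(5).
print(hestenes_singular_values([[3.0, 0.0], [4.0, 5.0]]))
```

Rotations that touch disjoint column pairs are independent of one another, which is what makes the method attractive for parallel machines.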
The family of decision problems of the threshold languages L(g) is considered. A threshold language L(g) is the set of n-bit vectors having at least g(n) "1"s. Using a new technique for controlling the size and structure of a hypergraph by a potential function, lower bounds are proven for these decision problems on a PRIORITY PRAM with m shared memory cells and any polynomial number of processors. The lower bounds are almost tight for the admissible range (m ≤ n^ε). By combining these results with the results of Vishkin and Wigderson and the results of Li and Yesha, this paper is able to show a complexity gap between an m-cell PRIORITY PRAM having an exponential (or unlimited) number of processors and one having only a polynomial number. A consequence of these results is that PRIORITY PRAM and ARBITRARY PRAM with m shared memory cells and any given polynomial number of processors have the same power (up to a small factor) for computing symmetric functions.
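To make the definition of a threshold language concrete, here is a small sequential membership check. It only illustrates the definition of L(g); it says nothing about the PRAM lower bounds. The names `in_threshold_language` and `majority` are hypothetical.

```python
def in_threshold_language(bits, g):
    """Return True if the n-bit vector `bits` has at least g(n) ones,
    i.e. if it belongs to the threshold language L(g)."""
    n = len(bits)
    return sum(bits) >= g(n)


def majority(n):
    """Example threshold function: strictly more than half the bits."""
    return n // 2 + 1


print(in_threshold_language([1, 0, 1, 1], majority))  # three ones, threshold 3
print(in_threshold_language([1, 0, 0, 1], majority))  # two ones, threshold 3
```

Membership depends only on the number of ones and not on their positions, so every L(g) is a symmetric function of its input bits.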
A time cost model for parallel computation in CORBA-distributed objects is introduced, and a methodology for enhancing the performance of distributed applications is proposed. A new four-tiered architecture, in contrast to the traditional three-tiered one, is derived from the constructed cost model for Internet distributed applications. (C) 2002 Elsevier Science B.V. All rights reserved.
ISBN: (print) 9789881563903
With the development of the active distribution network (ADN), distributed state estimation (DSE) has become an inevitable trend in state estimation (SE). An efficient partitioning strategy is an essential prerequisite for DSE. However, existing methods consider only the basic requirements of equilibrium connectivity and ignore the similarity of buses within one sub-region. This paper proposes a new partitioning strategy that borrows the idea of hierarchical clustering: the buses of the distribution network are aggregated to form sub-regions, and the CUDA platform is used to parallelize the SE computation. The improved IEEE 33-node system and a real distribution network are analyzed as case studies. The results show that the estimation accuracy of the proposed DSE method is higher than that of traditional CSE. In addition, compared with serial computation, the method effectively reduces the running time of DSE in each sub-region and improves overall computational efficiency.
Casting analysis software has continued to develop new ways of approaching real casting processes. These include melt flow analysis, heat transfer analysis for solidification calculation, mechanical property prediction, and microstructure prediction. These efforts have produced results that agree well with real situations, so CAE technologies have become indispensable for designing and developing new casting processes. In manufacturing, however, CAE technologies are used less often, because the software is difficult to use or computing performance is insufficient. To introduce CAE technologies into manufacturing, high-performance analysis is essential for shortening the gap between product design time and prototyping time. Software code optimization can help, but it is not enough, because codes developed by software experts are already well optimized. As an alternative route to high-performance computation, parallel computation technologies are being applied to CAE to shorten analysis time. In this research, SMP (Shared Memory Processing) and MPI (Message Passing Interface) methods for parallelization were applied to the commercial software "Z-Cast" to calculate casting processes. During code parallelization, network stabilization and core optimization were also carried out on the Microsoft Windows platform, and the performance and results were compared with those of the normal linear analysis codes.