This paper presents a new highly parallel algorithm for computing the minimum-norm (T) least-squares (S) solution of inconsistent linear equations Ax = b(A is an element of R-r(mxn), b is an element of R(A)). By this ...
详细信息
This paper presents a new highly parallel algorithm for computing the minimum-norm (T) least-squares (S) solution of inconsistent linear equations Ax = b(A is an element of R-r(mxn), b is an element of R(A)). By this algorithm the solution x = A(S,T)(+)b is obtained in T = (1 + m)(1 + log(2) m) + n (6 + log(2)(n - r + 1) + log(2) m + log(2) n) - r(1 + log(2) n) steps with P = mn processors when m greater than or equal to 2(n - 1) and with P = 2n(n - 1) processors otherwise. (C) 2002 Elsevier Science Inc. All rights reserved.
A parallel domain decomposition algorithm of Schwarz type is introduced to solve the mixed finite element equation for the Dirichlet boundary value problem of a second-order partial differential equation of elliptic t...
详细信息
A parallel domain decomposition algorithm of Schwarz type is introduced to solve the mixed finite element equation for the Dirichlet boundary value problem of a second-order partial differential equation of elliptic type and convergence of this algorithm is analyzed. (C) 2002 Elsevier Science Inc. All rights reserved.
The simulation of biologically realistic neural networks requires the numerical solution of very large systems of differential equations. Variables within the system can be changing at rates that vary by orders of mag...
详细信息
The simulation of biologically realistic neural networks requires the numerical solution of very large systems of differential equations. Variables within the system can be changing at rates that vary by orders of magnitude, not only at different times of the solution, but at the same time in different parts of the network. Therefore, an efficient implementation must be able to vary the solution step size, and do so independently in different subsystems. A single processor algorithm is presented in which each neuron can be solved with its own step size by using a priority queue to integrate them in the correct order. But this leaves the problem of how communication and synchronisation between neurons should be managed when executing in parallel. The proposed solution uses an algorithm based on waveform relaxation, which allows groups of neurons on different processors to be solved independently and hence in parallel, for substantial parts of the computation. Realistic test problems were run on a distributed memory parallel computer and results show that speedups of 10 using 16 processors are achievable, and indicate that further speedups may be possible. (C) 2000 Elsevier Science B.V. All rights reserved.
Based on the general methods in power flow calculation of power system and on conceptions and classifications of parallel algorithm, a new approach named Dynamic Asynchronous parallel algorithm that applies to the onl...
详细信息
Based on the general methods in power flow calculation of power system and on conceptions and classifications of parallel algorithm, a new approach named Dynamic Asynchronous parallel algorithm that applies to the online analysis and real-time dispatching and controlling of large-scale power network was put forward in this paper. Its performances of high speed and dynamic following have been verified on IEEE-14 bus system.
With the emerging of new applications,especially in Web,Such as E-Commerce,Digital Library and DNA Bank,object database systems show their stronger funcitons than other kinds of database systems due to their powerful ...
详细信息
With the emerging of new applications,especially in Web,Such as E-Commerce,Digital Library and DNA Bank,object database systems show their stronger funcitons than other kinds of database systems due to their powerful representation ability on complex semantics and *** distinguished feature of object database systems is path expression,and most queries on an object database ar based on path expression because it is the most natural and convenient way to access the object databse,for example,to navigate the hyper-links in a web-based database,The execution of path expression is usually extremely expensive on a very large ***,the improvement of path expression eecution efficiency is critical for the performance ofobject *** an importan approach realizing high-performance query processing ,the parallel processing of path expression on distributed object databases is explored in this *** to now,some algorithms about how to compute path expressions and how to optimize path expression processing have been proposed for ***,few approaches have been presented for computing path expressions in *** this paper,a new paralle algorithm for computing path expression named parallel Cascade Semijoin(PCSJ)is ***,a new scheduling strategy called right-deep zigzag tree is designed to further improve the performance of the PCSJ *** exper-iments have been implemented in an NOW distributed and parallel *** results show that the PCSJ algorithm outperforms the other two parallel algorithms(the parallel version of forward pointer chasing algorithm(PFPC)and the index splitting parallel algorithm(IndexSplit) when computing path expressions with restrictive predicates and that the right-deep zigzage tree scheduling strategy has better performance than the right-deep tree scheduling strategy.
Data mining is a new, important and fast growing database application. Outlier (exception) detection is one kind of data mining, which can be applied in a variety of areas like monitoring of credit card fraud and crim...
详细信息
Data mining is a new, important and fast growing database application. Outlier (exception) detection is one kind of data mining, which can be applied in a variety of areas like monitoring of credit card fraud and criminal activities in electronic commerce. With the ever-increasing size and attributes (dimensions) of database, previously proposed detection methods for two dimensions are no longer applicable. The time complexity of the Nested-Loop (NL) algorithm (Knorr and Ng, in Proc. 24th VLDB, 1998) is linear to the dimensionality but quadratic to the dataset size, inducing an unacceptable cost for large dataset. A more efficient version (ENL) and its parallel version (PENL) are introduced. In theory, the improvement of performance in PENL is linear to the number of processors, as shown in a performance comparison between ENL and PENL using Bulk Synchronization parallel (BSP) model. The great improvement is further verified by experiments on a parallel computer system IBM 9076 SP2. The results show that it is a very good choice to mine outliers in a cluster of workstations with a low-cost interconnected by a commodity communication network.
A parallel algorithm for prefix computation of N = n(4) elements on an n x n extended multi-mesh network is presented. The A parallel algorithm for prefix computation of N = n(4) network is a modified version of an ea...
详细信息
A parallel algorithm for prefix computation of N = n(4) elements on an n x n extended multi-mesh network is presented. The A parallel algorithm for prefix computation of N = n(4) network is a modified version of an earlier multi-mesh network with a 4-regular structure. The algorithm takes O(N-1/4) time on N processors (13N(1/4)-5 communication steps and log N + 4 arithmetic/logic steps). (C) 2002 Elsevier Science B.V. All rights reserved.
A new algorithm for parallel calculation of the second derivatives (Hessian) of the conformational energy function of biomolecules in internal coordinates is proposed. The basic scheme of this algorithm is the divisio...
详细信息
A new algorithm for parallel calculation of the second derivatives (Hessian) of the conformational energy function of biomolecules in internal coordinates is proposed. The basic scheme of this algorithm is the division of the entire calculation of the Hessian matrix (called "task") into subtasks and the optimization of the assignment of processors to each subtask by considering both the load balancing and reduction of the communication cost. A genetic algorithm is used for this optimization considering the dependencies between subtasks. We applied this method to a glutaminyl transfer RNA (Gln-tRNA) molecule for which the scalability of our previously developed parallel algorithm was significantly decreased when the large number of processors was used. The speedup for the calculation was 32.6 times with 60 processors, which is considerably better than the speedup for our previously reported parallel algorithm. The elapsed time for the calculation of subtasks, data sending, and data receiving was analyzed, and the effect of the optimization using the genetic algorithm is discussed. (C) 2002 John Wiley Sons, Inc.
In this paper, a high-speed parallel residue-to-binary converter is proposed for a recently introduced moduli set S(k) = {2(m) - 1, 2(20m) + 1, 2(21m) + 1, ..., 2(2km) + 1} for a general value of k. The proposed conve...
详细信息
In this paper, a high-speed parallel residue-to-binary converter is proposed for a recently introduced moduli set S(k) = {2(m) - 1, 2(20m) + 1, 2(21m) + 1, ..., 2(2km) + 1} for a general value of k. The proposed converter uses simple cyclic shift and concatenation operations and does not require any multiplier. Individual converters for the cases of k = 0 and k = 1 are derived from the general architecture and compared with those existing in the literature. The converter for S(0) is twice as fast requiring only one-half of the hardware, while that of S(1) is three times as fast, but requiring only 60% of the hardware, as compared to the corresponding ones existing in the literature. Furthermore, the proposed converters are implemented using 0.5-micron CMOS VLSI technology. Based on S(0), the layouts for 8-bit, 16-bit, 32-bit and 64-bit converters are generated, and the corresponding simulation results obtained.
Dissipative particle dynamics (DPD) and its generalization-the fluid particle model (FPM)-represent the 'fluid particle' approach for simulating fluid-like behavior in the mesoscale. Unlike particles from the ...
详细信息
Dissipative particle dynamics (DPD) and its generalization-the fluid particle model (FPM)-represent the 'fluid particle' approach for simulating fluid-like behavior in the mesoscale. Unlike particles from the molecular dynamics (MD) method, the 'fluid particle' can be viewed as a 'droplet' consisting of liquid molecules. In the FPM, 'fluid particles' interact by both central and non-central, short-range forces with conservative, dissipative and Brownian character. In comparison to MD, the FPM method in three dimensions requires two to three times more memory load and a three times greater communication overhead. Computational load per step per particle is comparable to MD due to the shorter interaction range allowed between 'fluid particles' than between MD atoms. The classical linked-cells technique and decomposing the computational box into strips allow for rapid modifications of the code and for implementing non-cubic computational boxes. We show that the efficiency of the FPM code depends strongly on the number of particles simulated, the geometry of the box and the computer architecture. We give a few examples from long FPM simulations involving up to 8 million fluid particles and 32 processors. Results from FPM simulations in three dimensions of the phase separation in binary fluid and dispersion of the colloidal slab are presented. A scaling law for symmetric quench in phase separation has been properly reconstructed. We also show that the microstructure of dispersed fluid depends strongly on the contrast between the kinematic viscosities of this fluid phase and the bulk phase. This FPM code can be applied for simulating mesoscopic flow dynamics in capillary pipes or critical flow phenomena in narrow blood vessels. Copyright (C) 2002 John Wiley Sons, Ltd.
暂无评论