Preconditioning techniques based on ILU decomposition, on Frobenius norm minimization, and on factorized sparse approximate inverses are considered. These algorithms are applied with conjugate gradient-type methods, namely Bi-CGSTAB, QMR and TFQMR, for the solution of complex, large, sparse linear systems. The results of numerical experiments in a scalar environment with matrices arising from transport in porous media, quantum chemistry, structural dynamics and electromagnetism are analysed. The preconditioner that appears most significant in a parallel environment (the one based on the factorized sparse approximate inverse) is then employed on a Cray T3E supercomputer. The experimental results show the satisfactory parallel performance of the proposed algorithm. Copyright (C) 2003 John Wiley & Sons, Ltd.
ISBN: 158113861X (print)
This paper describes a newly proposed simple and efficient parallel algorithm for the construction of the Delaunay triangulation (DT) in E^2 by randomized incremental insertion. The construction of the DT is one of the fundamental problems in computer graphics. The proposed algorithm is designed for parallel systems with shared memory and several processors. Such hardware (especially with two processors) has become available in the last few years thanks to low prices, yet there is still a lack of parallel algorithms that are simple to implement and efficient enough to be an attractive alternative to long-existing serial algorithms. The designed algorithm incorporates a new method for synchronization among processing elements (PEs) based on a simple geometric test (i.e., if no other points lie in the circumcircle of the accessed triangle, this triangle can be modified independently of the other PEs). We implemented the algorithm in C++ and tested it on workstations with up to four processors, where we reached relatively good speed-up relative to our serial implementation. When only two processors were used, we even reached super-linear speed-up.
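The geometric test mentioned in the abstract above can be illustrated with the standard in-circle determinant. The following is a minimal sketch in Python, not the authors' C++ implementation; the function names `orient2d`, `in_circumcircle` and `triangle_is_free` are hypothetical, and plain floating-point arithmetic is assumed (a robust implementation would need exact or adaptive predicates).

```python
def orient2d(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """Return True if p lies strictly inside the circumcircle of triangle abc."""
    # Translate so that p is the origin, then expand the 3x3 in-circle determinant.
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    # The determinant's sign depends on the orientation of abc, so multiply by it.
    return det * orient2d(a, b, c) > 0


def triangle_is_free(tri, other_points):
    """Hypothetical helper: True if no other point lies in the circumcircle of tri."""
    return not any(in_circumcircle(tri[0], tri[1], tri[2], p) for p in other_points)
```

Under these assumptions, a triangle for which `triangle_is_free` returns True is one whose circumcircle contains none of the tested points, which is the condition the abstract gives for modifying it independently of the other PEs.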
ISBN: 1892512416 (print)
The class of complex modified Korteweg-de Vries (CMKdV) equations has many applications. One form of the CMKdV equation has been used to create models for the nonlinear evolution of plasma waves [5] and for the propagation of transverse waves in a molecular chain [3]. Another form of the CMKdV equation has been used for the traveling wave and for a double homoclinic orbit [4]. In this paper we introduce sequential and parallel split-step Fourier methods for numerical simulations of the above equation. The parallel methods are implemented on the Origin 2000 multiprocessor computer. Our numerical experiments have shown that these methods give considerable speedup.
A parallel algorithm for the solution of unsteady Euler equations on unstructured and moving meshes is developed. A cell-centered finite volume scheme is used. The temporal discretization involves an implicit time-integration scheme based on backward-Euler time differencing. The movement of the computational mesh is accomplished by means of a dynamically deforming mesh algorithm. The parallelization is based on decomposition of the domain into a series of subdomains with overlapped interfaces. The scheme is computationally efficient, time accurate, and stable for large time increments. Detailed descriptions of the solution algorithm are given, and computations for airflow around a NACA0012 airfoil and a missile configuration are presented to demonstrate the applications.
The usual concern when scaling an algorithm on a parallel model of computation is preserving efficiency while increasing or decreasing the number of processors. Many algorithms for reconfigurable models, however, attain constant time at the expense of an inefficient algorithm. For these algorithms, scaling down the number of processors while preserving inefficiency is no benefit once constant time execution is lost. In fact, one can often accelerate the efficiency of these algorithms while reducing the number of processors. To quantify this improvement in efficiency, this paper introduces the measure of degree of scalability to complement the insight obtained from efficiency for such algorithms. Demonstrating the utility of this measure, we present new reconfigurable mesh (R-Mesh) algorithms for multiple addition and matrix-vector multiplication, improving both the number of processors and the degree of scalability compared to previous algorithms. We also extend these results to floating point number operands, which have previously received little attention on the R-Mesh.
The list-ranking problem is considered for parallel computers which communicate through an interconnection network. Each PU holds k nodes of a set of linked lists. A novel randomized algorithm gives a considerable improvement over earlier ones: for a large class of networks and sufficiently large k, it takes only twice the number of steps required by a k-k routing. For hypercubes the condition is k = ω(log² N). Even better results are achieved for d-dimensional meshes: we show that the ranking time exceeds the routing time only by lower-order terms for all k = ω(d²). We also show that list-ranking requires at least the time required for k-k routing. Thus, the results are within a factor of two of optimal, and those for meshes even match the lower bound up to lower-order terms. (C) 2002 Elsevier Science (USA). All rights reserved.
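For readers unfamiliar with list ranking, the problem itself can be illustrated with the classic pointer-jumping technique. The sketch below is not the randomized network algorithm of the abstract; it is a sequential Python simulation of synchronous pointer-jumping rounds, and the function name `list_rank` and the convention that a tail node points to itself are assumptions made for illustration.

```python
def list_rank(succ):
    """Rank every node of one or more linked lists by pointer jumping.

    succ[i] is the index of the node after i; a tail node points to itself.
    Returns rank[i], the number of links between node i and the tail of its list.
    """
    n = len(succ)
    nxt = list(succ)
    # A tail is at distance 0 from itself; every other node starts at distance 1.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    # Each round doubles the distance covered by a pointer, so about log2(n)
    # rounds are enough; (n - 1).bit_length() is a safe upper bound.
    rounds = max(1, (n - 1).bit_length())
    for _ in range(rounds):
        # Both updates read only the previous round's arrays, which mimics
        # one synchronous step executed by all nodes at once.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank


# The list 3 -> 0 -> 2 -> 1, where node 1 is the tail.
ranks = list_rank([2, 1, 1, 0])
```

In a distributed setting each round requires every node to fetch data held by its current successor, which is why the cost of list ranking is naturally compared with the cost of routing, as the abstract does.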
This paper presents a fault-tolerant technique based on the modulus replication residue number system (MRRNS), which allows for modular arithmetic computations over identical channels. In this system, fault tolerance is provided by adding extra computational channels that can be used to redundantly compute the mapped output. An algebraic technique is used to determine the error position in the mapped outputs and provide corrections. We also show that by taking advantage of some elementary polynomial properties we obtain the same level of fault tolerance with about a 30% decrease in the number of channels. This new system is referred to as the symmetric MRRNS (SMRRNS).
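The general idea of adding redundant channels to locate and correct a faulty one can be sketched with a conventional redundant residue number system. This is a different and simpler scheme than the polynomial-based MRRNS of the abstract; the moduli, the function names and the drop-one-channel search are assumptions chosen for illustration (Python 3.8+ is assumed for `math.prod` and the three-argument `pow` with exponent -1).

```python
from math import prod


def encode(x, moduli):
    """Map an integer to one residue per channel."""
    return [x % m for m in moduli]


def crt(residues, moduli):
    """Chinese remainder reconstruction for pairwise coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse of partial mod m
    return x % total


def locate_and_correct(residues, moduli, legit_range):
    """Return (value, faulty_channel); faulty_channel is None if no error is seen.

    Valid values lie in [0, legit_range). A single corrupted residue pushes the
    reconstruction out of that range, and dropping the faulty channel brings it back.
    """
    value = crt(residues, moduli)
    if value < legit_range:
        return value, None
    for i in range(len(moduli)):
        rest_r = residues[:i] + residues[i + 1:]
        rest_m = moduli[:i] + moduli[i + 1:]
        candidate = crt(rest_r, rest_m)
        if candidate < legit_range:
            return candidate, i
    raise ValueError("more than one channel appears to be faulty")


# Information moduli 3, 5, 7 (range 105) plus two larger redundant moduli.
MODULI = [3, 5, 7, 11, 13]
LEGIT_RANGE = 3 * 5 * 7
```

With two redundant moduli larger than the information moduli, this sketch can correct one faulty channel; the abstract's contribution is reaching a comparable level of fault tolerance with fewer channels through polynomial properties, which this example does not attempt to reproduce.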
Transactions within a mobile database management system face many restrictions. They cannot afford unlimited delays or participate in multiple retry attempts for execution. The proposed embedded concurrency control (ECC) techniques provide support on three counts, namely to enhance concurrency, to overcome problems due to heterogeneity, and to allocate priority to transactions that originate from mobile hosts. These proposed ECC techniques can be used to enhance the server capabilities within a mobile database management system. Adoption of the techniques can be beneficial in general, and for other special cases of transaction management in distributed real-time database management systems. The proposed model can be applied to other similar problems related to synchronization, such as the generation of a backup copy of an operational database system. (C) 2003 Elsevier Science B.V. All rights reserved.
Although the evolutionary algorithm is a powerful optimization tool, its computation cost in terms of time and hardware resources increases as the size or complexity of the problem increases. One promising approach to overcome this limitation is to exploit the inherent parallelism of evolutionary algorithms by creating the infrastructure necessary to support distributed evolutionary computing using existing Internet and hardware resources. This paper presents a Java-based distributed evolutionary computing software (Paladin-DEC), which enhances the concurrent processing and performance of evolutionary algorithms by allowing inter-communication of subpopulations among various computers over the Internet. Such a distributed system enables individuals to migrate periodically among multiple subpopulations according to some patterns to induce diversity of elite individuals, in a way that simulates how species evolve in a natural environment. The Paladin-DEC software is capable of keeping data integrity throughout the computation, and incorporates the features of robustness, security, fault tolerance, and work balancing. The effectiveness and advantages of Paladin-DEC are illustrated with two case studies: drug scheduling in cancer chemotherapy and searching probe sets of the yeast genome.
ISBN: 3540200541 (print)
Based on Luo's parallel algorithm [4] for certain Toeplitz cyclic tridiagonal systems on distributed-memory multicomputers, we present an improved algorithm. Its communication mechanism is simple and its redundant computation is small when solving massive systems. The numerical experiments show that the parallel efficiency of the improved algorithm is higher than that of Luo's algorithm [4].
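To make the problem class concrete, a Toeplitz cyclic tridiagonal system has constant sub-, main and super-diagonal entries plus two wrap-around corner entries. The sketch below is a sequential baseline only, not Luo's algorithm or the improved parallel one; it assumes the well-known combination of the Thomas algorithm with the Sherman-Morrison formula, and the names `thomas` and `solve_cyclic_toeplitz` are hypothetical. It also assumes at least three unknowns, a nonzero diagonal entry `b`, and a matrix (such as a diagonally dominant one) for which elimination without pivoting is stable.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = upper[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_cyclic_toeplitz(a, b, c, rhs):
    """Solve a*x[i-1] + b*x[i] + c*x[i+1] = rhs[i] with indices taken modulo n."""
    n = len(rhs)
    gamma = -b  # free parameter of the Sherman-Morrison splitting A = B + u v^T
    diag = [b] * n
    diag[0] = b - gamma
    diag[-1] = b - a * c / gamma
    lower = [a] * n
    upper = [c] * n
    u = [0.0] * n
    u[0] = gamma
    u[-1] = c
    y = thomas(lower, diag, upper, rhs)  # B y = rhs
    q = thomas(lower, diag, upper, u)    # B q = u
    # v = (1, 0, ..., 0, a / gamma), so the dot products touch two entries only.
    v_dot_y = y[0] + a * y[-1] / gamma
    v_dot_q = q[0] + a * q[-1] / gamma
    factor = v_dot_y / (1.0 + v_dot_q)
    return [yi - factor * qi for yi, qi in zip(y, q)]
```

The forward and backward sweeps in `thomas` are inherently sequential, which is one reason dedicated parallel algorithms such as those discussed in the abstract are needed on distributed-memory machines.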