Building on the Floyd algorithm with an extended path matrix, a parallel algorithm that solves the all-pairs shortest path (APSP) problem in a cluster environment is analyzed and designed. The parallel APSP pipelining algorithm overlaps computation with communication and, compared with a broadcast-based approach, reduces communication cost. The algorithm has been implemented with MPI on a PC cluster. Theoretical analysis and experimental results show that the parallel algorithm is efficient and scalable.
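For orientation, the sequential baseline that such a parallel scheme starts from can be sketched as below. This is a plain Floyd-Warshall with a next-hop matrix for path recovery, not the paper's extended path matrix or its pipelined MPI version; the function names `floyd_with_paths` and `reconstruct` are illustrative assumptions.

```python
import math


def floyd_with_paths(weights):
    """Sequential Floyd-Warshall on an adjacency matrix.

    weights[i][j] is the edge weight from i to j (math.inf if absent,
    0 on the diagonal). Returns (dist, nxt), where nxt[i][j] is the
    first hop on a shortest path from i to j, or None if unreachable.
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    nxt = [
        [j if i != j and weights[i][j] != math.inf else None for j in range(n)]
        for i in range(n)
    ]
    for k in range(n):          # intermediate vertex
        for i in range(n):
            dik = dist[i][k]
            if dik == math.inf:
                continue        # no path i -> k, nothing to relax
            for j in range(n):
                cand = dik + dist[k][j]
                if cand < dist[i][j]:
                    dist[i][j] = cand
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def reconstruct(nxt, src, dst):
    """Follow next-hop entries from src to dst; [] if dst is unreachable."""
    if src == dst:
        return [src]
    if nxt[src][dst] is None:
        return []
    path = [src]
    while src != dst:
        src = nxt[src][dst]
        path.append(src)
    return path
```

In iteration k, every row update needs only row k of the distance matrix, which is why a row-wise distribution over processes has to communicate that one row per iteration, whether by broadcast or by passing it along a pipeline.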
Cellular automata (CA) are a well-studied non-linear model of complex systems in which an infinite one-dimensional array of finite state machines (cells) updates itself synchronously according to a uniform local rule. The sequence generation problem on CAs has been studied, and several real-time sequence generation algorithms have been proposed for a variety of non-regular sequences such as the prime, Fibonacci, and {2^n | n = 1, 2, 3, ...} sequences. The paper describes the sequence generation power of CAs with a small number of states, focusing on CAs with one, two, and three internal states. The authors enumerate all of the sequences generated by two-state CAs and present several non-regular sequences that can be generated in real time by three-state CAs but not by any two-state CA. This shows that there is a sequence generation gap among the powers of those small CAs.
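The synchronous update under a uniform local rule can be illustrated with a minimal sketch. It assumes a finite array with quiescent (0) boundaries rather than the infinite array of the model, a nearest-neighbour two-state rule given as a Wolfram rule number, and hypothetical function names `step` and `run`; it does not implement any of the paper's sequence generators.

```python
def step(cells, rule):
    """Apply one synchronous update of a two-state, radius-1 CA.

    `rule` is a Wolfram rule number (0..255): bit b of `rule` is the new
    state for the neighbourhood whose (left, self, right) bits encode b.
    Cells outside the array are treated as permanently 0.
    """
    padded = [0] + list(cells) + [0]
    return [
        (rule >> (padded[i] << 2 | padded[i + 1] << 1 | padded[i + 2])) & 1
        for i in range(len(cells))
    ]


def run(cells, rule, steps):
    """Return the list of configurations at times 0..steps."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step(history[-1], rule))
    return history
```

In the sequence generation setting, one would watch a single designated cell over time and record the steps at which it enters a marked state; the sketch only shows the update mechanism that such generators build on.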
This paper presents two parallel algorithms for basic arithmetic operations on multiple-precision integers over the finite field GF(2^n). The parallel algorithms for the reduction operation and the inversion-multiplication operation are designed by analyzing their data dependencies. The time complexities of the parallel and sequential algorithms are calculated for a quantitative comparison. The performance evaluation shows that the proposed parallel algorithms are highly efficient.
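As a point of reference, sequential versions of these operations in a polynomial basis might look like the sketch below. It assumes field elements stored as Python integers used as bit vectors and an irreducible modulus passed in by the caller; the names `gf2_reduce`, `gf2_mul`, and `gf2_inv` are illustrative, and inversion by exponentiation is a swapped-in textbook method, not necessarily the paper's inversion-multiplication algorithm.

```python
def gf2_reduce(value, modulus):
    """Reduce a GF(2) polynomial modulo `modulus` (both as bit vectors)."""
    deg = modulus.bit_length() - 1
    while value.bit_length() > deg:
        # Cancel the leading term by XOR-ing in a shifted copy of the modulus.
        shift = value.bit_length() - 1 - deg
        value ^= modulus << shift
    return value


def gf2_mul(a, b, modulus):
    """Carry-less multiply a*b, then reduce modulo `modulus`."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    return gf2_reduce(product, modulus)


def gf2_inv(a, modulus):
    """Inverse of nonzero a in GF(2^n), computed as a^(2^n - 2)."""
    n = modulus.bit_length() - 1
    result, base, exp = 1, a, (1 << n) - 2
    while exp:
        if exp & 1:
            result = gf2_mul(result, base, modulus)
        base = gf2_mul(base, base, modulus)
        exp >>= 1
    return result
```

With the AES modulus x^8 + x^4 + x^3 + x + 1 (0x11B), `gf2_mul(0x57, 0x83, 0x11B)` gives 0xC1, the worked example from FIPS 197. The loop in `gf2_reduce` makes the data dependency visible: each XOR depends on the leading bit left by the previous one, which is the kind of dependency a parallel design has to restructure.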
This paper implements a parallel differential evolution algorithm with a circular migration strategy. Each population exchanges its good individuals with the others at a fixed generation interval. This implementation can accelerate computation and enhance global search ability. The parallel parameters and algorithm parameters are determined by tests on the CEC 2006 benchmark problems. The best parameter set is used to design a four-arm antenna, which is competitive with the ST5-3-10 antenna designed by NASA Ames Research Center.
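The circular migration idea can be sketched as an island model simulated in one process. Everything here is an assumption made for illustration: the DE/rand/1/bin variant, the parameter values, the replace-worst policy, the `sphere` test objective, and the function names; in a real parallel run each island would live in its own process.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares."""
    return sum(v * v for v in x)


def de_generation(pop, fit, objective, rng, f=0.5, cr=0.9):
    """One DE/rand/1/bin generation; returns the new population and fitness."""
    n, dim = len(pop), len(pop[0])
    new_pop, new_fit = [], []
    for i in range(n):
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        forced = rng.randrange(dim)  # at least one gene comes from the mutant
        trial = [
            pop[a][d] + f * (pop[b][d] - pop[c][d])
            if (rng.random() < cr or d == forced)
            else pop[i][d]
            for d in range(dim)
        ]
        trial_fit = objective(trial)
        if trial_fit <= fit[i]:
            new_pop.append(trial)
            new_fit.append(trial_fit)
        else:
            new_pop.append(pop[i])
            new_fit.append(fit[i])
    return new_pop, new_fit


def migrate_ring(pops, fits):
    """Island k sends a copy of its best individual to island (k+1) mod K,
    where it replaces the worst individual if it is better."""
    best = []
    for pop, fit in zip(pops, fits):
        i = min(range(len(pop)), key=fit.__getitem__)
        best.append((pop[i][:], fit[i]))
    for k, (ind, val) in enumerate(best):
        dest = (k + 1) % len(pops)
        worst = max(range(len(pops[dest])), key=fits[dest].__getitem__)
        if val < fits[dest][worst]:
            pops[dest][worst], fits[dest][worst] = ind, val


def island_de(objective, dim=5, islands=4, pop_size=10,
              generations=60, interval=10, seed=0):
    """Run the island model; returns (initial best, final best) fitness."""
    rng = random.Random(seed)
    pops = [[[rng.uniform(-5, 5) for _ in range(dim)]
             for _ in range(pop_size)] for _ in range(islands)]
    fits = [[objective(x) for x in pop] for pop in pops]
    initial_best = min(min(fit) for fit in fits)
    for gen in range(1, generations + 1):
        for k in range(islands):
            pops[k], fits[k] = de_generation(pops[k], fits[k], objective, rng)
        if gen % interval == 0:
            migrate_ring(pops, fits)
    return initial_best, min(min(fit) for fit in fits)
```

The migration interval trades communication for diversity: frequent migration spreads good individuals quickly but makes the islands converge toward each other, while rare migration keeps them independent for longer.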
Finding the longest common subsequence (LCS) is one of the most important tasks in bioinformatics. The time and space consumption of the algorithm increases dramatically with the scale of the problem. This paper analyzes the existing LCS algorithms and proposes a parallel algorithm designed to run on PC clusters to achieve high performance. Experimental results show that it is a practical, low-cost, and efficient solution for sequence problems.
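The cost growth mentioned above is easy to see in the classical dynamic-programming formulation, sketched here as a sequential baseline. This is the textbook algorithm, not the paper's parallel one, and the function name `lcs` is an illustrative choice.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b.

    Uses the standard (len(a)+1) x (len(b)+1) table, so time and memory
    both grow with the product of the two lengths.
    """
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one subsequence.
    out = []
    i, j = m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on the same anti-diagonal are independent of one another; that independence is the usual starting point for parallelizing the table fill.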
In this paper, a parallel Surface Extraction from Binary Volumes with Higher-Order Smoothness (SEBVHOS) algorithm is proposed to accelerate SEBVHOS execution. The original SEBVHOS algorithm is parallelized first, and then several performance optimization techniques (loop optimization, cache optimization, false sharing optimization, synchronization overhead optimization, and thread affinity optimization) are used to improve the implementation's performance on multi-core systems. The performance of the parallel SEBVHOS algorithm is analyzed on a dual-core system. The experimental results show that the parallel SEBVHOS algorithm achieves an average speedup of 1.86x. More importantly, the method introduces no additional aliasing artifacts compared with the original SEBVHOS algorithm.
Ray representation (Ray-rep) of a solid has been studied and used in the solid modeling community for many years because of its compactness and simplicity. This paper presents a parallel approach for mesh surface modeling from multi-material volume data using an extended Ray-rep as an intermediate representation, where every homogeneous region is enclosed by a set of two-manifold surface meshes in the resultant model. The approach consists of three major algorithms: first, an algorithm converts the given multi-material volumetric data into a Ray-rep for a heterogeneous solid; second, a filtering algorithm processes the rays of the heterogeneous solid in parallel; and last, adaptive mesh surfaces are generated from the Ray-rep through a dual-contouring-like algorithm. The intermediate surfaces between two constituent materials can be extracted directly without building a volumetric mesh, and the manifold topology is preserved on each surface patch. Furthermore, general offset surfaces can be easily computed in this paradigm by designing a special parallel operator for the rays. (C) 2011 Elsevier B.V. All rights reserved.
General purpose computing on graphical processing units (GPGPU) is a paradigm shift in computing that promises a dramatic increase in performance. GPGPU also brings an unprecedented level of complexity in algorithmic design and software development. In this paper, we present an efficient parallel fault simulator, FSimGP², that exploits the high degree of parallelism supported by a state-of-the-art graphics processing unit (GPU) with the NVIDIA Compute Unified Device Architecture (CUDA). A novel 3-D parallel fault simulation technique is proposed to achieve extremely high computation efficiency on the GPU. Global communication is minimized by concentrating as much work as possible in the local device's memory. We present results on an NVIDIA GPU platform (a GeForce GTX 285 graphics card) that demonstrate speedups of up to 63x and 4x over two other GPU-based fault simulators and up to 95x over a state-of-the-art algorithm on conventional processor architectures.
In this paper, we consider the planar multi-facility Weber problem with restricted zones and non-Euclidean distances. We propose an algorithm based on the probability changing method (a special kind of genetic algorithm) and prove its efficiency for solving this problem approximately by replacing the continuous coordinate values with discrete ones. A version of the algorithm for multiprocessor systems is proposed, and experimental results for a high-performance cluster are given.
Based on domain decomposition and finite element discretization, a parallel two-level linearization method for the stationary incompressible Navier-Stokes equations is proposed and analyzed. The basic idea of the method is to first solve the nonlinear problem with m Newton iterations on a coarse grid, and then solve a linearized Oseen problem in parallel on a fine grid to correct the coarse-grid solution. The efficiency of the method is illustrated by numerical results. (C) 2010 Elsevier Ltd. All rights reserved.