We present a parallel elite-subspace evolutionary algorithm for solving systems of nonlinear equations. On the basis of numerical experiments, we find that our algorithm has an outstanding universal characteristics an...
详细信息
ISBN:
(纸本)0780378040
We present a parallel elite-subspace evolutionary algorithm for solving systems of nonlinear equations. On the basis of numerical experiments, we find that our algorithm has an outstanding universal characteristics and superior convergence as well. All of the solutions can be obtained within a short period of time.
In this paper we present a 3-D parallel filtering algorithm. This algorithm is highly parallel and efficient as it eliminates the overhead associated with the overlapping segments in the block-filtering approach. It a...
详细信息
In this paper we present a 3-D parallel filtering algorithm. This algorithm is highly parallel and efficient as it eliminates the overhead associated with the overlapping segments in the block-filtering approach. It also lifts the restrictions on the input size for high efficiency in the block-filtering algorithm, as both the 3-D input data and impulse response of the system are decimated into eight subsections each. These subsections can be simultaneously and independently processed. The results of the implementation of the 3-D parallel filtering algorithm on multiDSP platform is presented and discussed showing a high performance reflected by the highly parallel architecture and good memory distribution of the 3-D parallel algorithm.
Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, i...
详细信息
Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. We have developed a set of parallel algorithms for data cube construction using a new data structure called aggregation tree. Our experience has shown that a number of performance trade-offs arise in developing a parallel data cube implementation. We focus on three important issues, which are: (1) data distribution, i.e., how the original array is distributed among the processors; (2) level of parallelism, i.e., what parts of the computation are parallelized and sequentialized; and (3) frequency of communication, i.e., does the implementation require frequent interprocessor communication (and less memory) or less frequent communication (and more memory). We present a detailed experimental study evaluating the above trade-offs. We consider parallel data cube construction with different cube sizes and sparsity levels. Our experimental results show the following: (1) In all cases, reducing the frequency of communication and using higher memory gave better performance, though the difference was relatively small. (2) Choosing data distribution to minimize communication volume made a substantial difference in the performance in most of the cases. (3) Finally, using parallelism at all levels gave better performance, even though it increases the total communication volume.
Given a set of n intervals representing an interval graph, the problem of finding a maximum matching between pairs of disjoint (nonintersecting) intervals has been considered in the sequential model. In this paper we ...
详细信息
Given a set of n intervals representing an interval graph, the problem of finding a maximum matching between pairs of disjoint (nonintersecting) intervals has been considered in the sequential model. In this paper we present parallel algorithms for computing maximum cardinality matchings among pairs of disjoint intervals in interval graphs in the EREW PRAM and hypercube models. For the general case of the problem, our algorithms compute a maximum matching in O (log(3) n) time using O (nl log(2) n) processors on the EREW PRAM and using n processors on the hypercubes. For the case of proper interval graphs, our algorithm runs in O (log n) time using O (n) processors if the input intervals are not given already sorted and using O (n/log n) processors otherwise, on the EREW PRAM. On n-processor hypercubes, our algorithm for the proper interval case takes O (log n log log n) time for unsorted input and O (log n) time for sorted input. Our parallel results also lead to optimal sequential algorithms for computing maximum matchings among disjoint intervals. In addition, we present an improved parallel algorithm for maximum matching between overlapping intervals in proper interval graphs.
This paper discusses the parallel propagation algorithm applied to the decoding of convolutional codes. The parallel algorithm surpasses the BCJR algorithm in parallel computational complexity and performance when it ...
详细信息
This paper discusses the parallel propagation algorithm applied to the decoding of convolutional codes. The parallel algorithm surpasses the BCJR algorithm in parallel computational complexity and performance when it is applied to tail-biting codes. The complexity of iterative decoding algorithms depends on the calculation of posterior probability and bit error probability by using clique and number of iterations. The computational complexity of the parallel algorithm for tail-biting codes is almost the same as that for zero-tail convolutional codes.
We have implemented sample sort and a parallel version of Quicksort on a cache-coherent shared address space multiprocessor: the SUN ENTERPRISE 10000. Our computational experiments show that parallel Quicksort outperf...
详细信息
We have implemented sample sort and a parallel version of Quicksort on a cache-coherent shared address space multiprocessor: the SUN ENTERPRISE 10000. Our computational experiments show that parallel Quicksort outperforms sample sort. Sample sort has been long thought to be the best, general parallel sorting algorithm, especially for larger data sets. On 32 processors of the ENTERPRISE 10000 the speedup of parallel Quicksort is more than six units higher than the speedup of sample sort, resulting in execution times that were more than 50% faster than sample sort. On one processor, parallel quicksort achieved 15% percent faster execution times than sample sorting. Moreover, because of its low memory requirements, parallel Quicksort could sort data sets at twice the size that sample sort could under the same system memory restrictions.
In this paper, the characteristic of parallel computing model of BSP and NOWS was analyzed. The research indicates that based from some algorithm that designed the rationalization parallel computing, the parallel comp...
详细信息
ISBN:
(纸本)0780377028
In this paper, the characteristic of parallel computing model of BSP and NOWS was analyzed. The research indicates that based from some algorithm that designed the rationalization parallel computing, the parallel computing model is fit with the environment. With the improvement of parallel computing algorithm based on NOWS of linear programming normal, the simplex method has obtained the best result to validate the conclusion.
In this paper, we report on 2-D and 3-D versions of parallelization, and efforts to simplify reconstruction of data during post processing. The parallelization scheme uses single-coordinate domain decomposition with 1...
详细信息
In this paper, we report on 2-D and 3-D versions of parallelization, and efforts to simplify reconstruction of data during post processing. The parallelization scheme uses single-coordinate domain decomposition with 1-cell overlap, and asynchronous time-stepping. We have performed a series of calibration on sheet beam klystron cavity with long drift tube using MAGIC software.
We propose a practical parallel on-the-fly algorithm for enumerative LTL (linear temporal logic) model checking. The algorithm is designed for a cluster of workstations communicating via MPI (message passing interface...
详细信息
ISBN:
(纸本)9780769520353
We propose a practical parallel on-the-fly algorithm for enumerative LTL (linear temporal logic) model checking. The algorithm is designed for a cluster of workstations communicating via MPI (message passing interface). The detection of cycles (faulty runs) effectively employs the so called back-level edges. In particular, a parallel level-synchronized breadth-first search of the graph is performed to discover back-level edges. For each level, the back-level edges are checked in parallel by a nested depth-first search to confirm or refute the presence of a cycle. Several optimizations of the basic algorithm are presented and advantages and drawbacks of their application to distributed LTL model-checking are discussed. Experimental implementation of the algorithm shows promising results.
The processing of three-dimensional (3-D) objects from 3-D digital image data is an important task in the image processing and the computer vision fields. The distance transform (DT) is extensively applied in the imag...
详细信息
The processing of three-dimensional (3-D) objects from 3-D digital image data is an important task in the image processing and the computer vision fields. The distance transform (DT) is extensively applied in the image processing and computer vision areas as a key operation. In a two or three-dimensional image array, the computation of distance transform (DT) is an important task. With the increasing application of 3-D voxel images, it is useful to consider the distance transform of a 3-D digital image array. In order to provide the efficient transform computations, parallelism is employed. We develop parallel algorithms for the three-dimensional Euclidean distance transform (3D-EDT) on the SIMD hypercube computer. The time complexity of our parallel algorithm is O(log/sup 2/N) for an N/spl times/N/spl times/N image array using N/sup 3/ processors. A generalized parallel algorithm for the 3D-EDT is also proposed and it runs O((N/p)/sup 3/log(N)+(N/p)/sup 2/log/sup 2/p) time for an N/spl times/N/spl times/N binary image array on the SIMD hypercube computer using p/sup 3/ PE's, where 1/spl les/p/spl les/N.
暂无评论