A new parallel algorithm for finding the maximum value of a data set is proposed. Execution times are investigated by taking into account the effect of the overhead time of communication for four kinds of interconnect...
详细信息
A new parallel algorithm for finding the maximum value of a data set is proposed. Execution times are investigated by taking into account the effect of the overhead time of communication for four kinds of interconnection networks; cube connection array, linear array, mesh array, and three-dimensional mesh array. The optimal numbers of processors are derived in the case where the number of processors is less than the number of data. Those are O (N 1 2 ), O (N 2 3 ), O (N 3 4 ), and O (N) , respectively, for linear array, mesh array, three-dimensional mesh array, and cube-connected arrays.
This paper describes an O(log(3) n) time O(n/log n) processors parallel algorithm for determining the congruence (exact matching) of two point sets in three-dimensions on a CREW PRAM, where n is the maximum size of th...
详细信息
This paper describes an O(log(3) n) time O(n/log n) processors parallel algorithm for determining the congruence (exact matching) of two point sets in three-dimensions on a CREW PRAM, where n is the maximum size of the input point sets. Although optimal O(n log n) time sequential algorithms were developed for this problem, no efficient parallel algorithm was known previously. In the algorithm, the original problem is reduced to the two-dimensional congruence problem by computing a three-dimensional point set cps(S) for each input point set S, where cps(S) satisfies the following conditions: 0 < \cps(S)\ less than or equal to 12;cps(T(S)) = T(cps(S)) for all isometric transformations T. The two-dimensional problem can be solved efficiently in parallel using a parallel version of a previously-known sequential algorithm. cps(S) is computed recursively in the following way: the size of a point set is reduced by a constant factor in each recursive step. To reduce the size of a point set, a convex hull is constructed and then it is regarded as a planar graph, so that combinatorial properties of a planar graph are used effectively. A sequential version of the algorithm works in O(n log n) time, so that this paper gives another optimal sequential algorithm. The presented algorithm can be applied for graphs such that each vertex corresponds to a point and each edge corresponds to a line segment connecting its endpoints. Moreover, the algorithm can be modified for computing the canonical form of a point set or a graph.
作者:
SHYU, CHAdvanced Technology Center
Electronics Research and Service Organization Industrial Technology Research Institute Taiwan Republic of China
In this paper we study the problem of finding a maximum weight clique of an interval graph. Let n be the number of vertices in the graph. We show that a maximum weight clique of an interval graph can be found in time ...
详细信息
In this paper we study the problem of finding a maximum weight clique of an interval graph. Let n be the number of vertices in the graph. We show that a maximum weight clique of an interval graph can be found in time O(log n ) using n processors on an exclusive-read exclusive-write parallel random access machine.
In this work, the method based on the work of Huang and McColl on analytical inversion of general tridiagonal matrices is parallelized with MPI. The proposed method is not only capable of finding inverses of full pent...
详细信息
In this work, the method based on the work of Huang and McColl on analytical inversion of general tridiagonal matrices is parallelized with MPI. The proposed method is not only capable of finding inverses of full pentadiagonal matrices, but also of those with pentadiagonal envelope, such as tridiagonal matrices. The method is modified to generate an MPI algorithm. The speed-up performance of the parallelized algorithm is also analyzed on different cases.
Reducing work-in-process (WIP) inventory is continuing to be an important business need because of several factors including the need to reduce working capital. Numerous techniques have been suggested for WIP reductio...
详细信息
Reducing work-in-process (WIP) inventory is continuing to be an important business need because of several factors including the need to reduce working capital. Numerous techniques have been suggested for WIP reduction, and CONWIP is a competitive algorithm for WIP reduction. Prior CONWIP algorithms have been primarily sequential algorithms and can be potentially incur significant computing time, especially when dealing with inventories for multiple products. The paper proposes a card-setting algorithm for multiple product types subject to routing and throughput requirements. The proposed algorithm searches the WIP space iteratively and the step-size is adaptively selected based on the known properties of multi-chain, multi-class, closed queuing networks. Furthermore, parallelization of this search algorithm across multiple processors is proposed where each processor searches a different segment of the WIP space while adaptively adjusting its step size for all product types to ensure fast convergence. The proposed parallel algorithm can take advantage of distributed computing architectures to speed-up the overall computation. An experimental implementation of the parallel algorithm using Message Passing Interface (MPI) over a high-speed network is described. Computational results demonstrate that the proposed parallel algorithm can be parallelized over eight to ten processors to obtain a speed-up of three to five.
This study provides a parallel algorithm for meshfree analysis based on the radial point interpolation method. This algorithm is designed to suit graphics hardware that support compute unified device architecture, NVI...
详细信息
This study provides a parallel algorithm for meshfree analysis based on the radial point interpolation method. This algorithm is designed to suit graphics hardware that support compute unified device architecture, NVIDIA's software development environment on graphics processing units (GPUs). The radial point interpolation method is a meshfree method that enables strict imposition of Dirichlet boundary conditions, and it mainly consists of two steps: the process for assembly of the discrete linear system and the process for solving the linear system. This paper shows that both the processes can be performed effectively on the GPU by applying our parallel algorithm and at the same time, the costs for data transfer to and from the GPU can be reduced, because the processes related to the discrete linear system are performed independently on the GPU. Our numerical tests confirm that the algorithm performed on the GPU accelerates the computations by a factor of maximum 15.
In this paper, we propose a parallel algorithm for solving the graph isomorphism problem. Our goal is to construct a suitable vertex substitution or to prove the absence of such. The problem is solved for undirected g...
详细信息
In this paper, we propose a parallel algorithm for solving the graph isomorphism problem. Our goal is to construct a suitable vertex substitution or to prove the absence of such. The problem is solved for undirected graphs without loops and multiple edges;it is assumed that the graphs can be disconnected. The question of the existence or absence of an algorithm with a polynomial complexity is currently open. Therefore, as for any time-consuming task, the question arises about speeding up its solution by parallelizing the algorithm. The RPM_ParLib library developed by the author allows developing effective applications for parallel computing on a local network under the control of the runtime environment .NET Framework. Supporting a recursive-parallel programming style, such applications have the ability to generate parallel branches of computation directly during program execution and dynamically redistribute work between computing modules. The library can be used for applications written in any programming language supported by the .NET Framework. To solve our problem and conduct a numerical experiment, several applications in the C# language were developed. The purpose of the experiment was to study the acceleration achieved due to the recursive-parallel organization of calculations. Specially generated random regular graphs with various degrees of vertices were used as the initial data. A detailed description of the algorithm and experiment, as well as the results obtained, are also given.
Extraction of two-dimensional object locations using current techniques is a computationally intensive process. In this paper a parallel algorithm is presented that can specify the location of objects from edge streak...
详细信息
Extraction of two-dimensional object locations using current techniques is a computationally intensive process. In this paper a parallel algorithm is presented that can specify the location of objects from edge streaks produced by an edge operator. Best-first searches are carried out in a number of non-interacting and localized edge streak spaces. The outcome of each search is a hypothesis. Each edge streak votes for a single hypothesis; it may also take part in the formation of other hypotheses. A poll of the votes determined the stronger hypotheses. The algorithm can be used as a ❉ end to a visual pattern recognition system where features are extracted from the hypothesized object boundary or from the area localized by the hypothesized boundary. Experimental results from a biomedical domain are presented.
This paper presents a parallel algorithm for computing fixpoints of Galois connections induced by object-attribute relational data. The algorithm results as a parallelization of CbO (Kuznetsov 1999) in which we proces...
详细信息
This paper presents a parallel algorithm for computing fixpoints of Galois connections induced by object-attribute relational data. The algorithm results as a parallelization of CbO (Kuznetsov 1999) in which we process disjoint sets of fixpoints simultaneously. One of the distinctive features of the algorithm compared to other parallel algorithms is that it avoids synchronization which has positive impacts on its speed and implementation. We describe the parallel algorithm, prove its correctness, and analyze its asymptotic complexity. Furthermore, we focus on implementation issues, scalability of the algorithm, and provide an evaluation of its efficiency on various data sets.
A parallel algorithm is presented for generating n! permutations of n given items. The algorithm requires 0(n!) time steps and runs on a linear array in which each processing element (PE) contains only constant stora...
详细信息
A parallel algorithm is presented for generating n! permutations of n given items. The algorithm requires 0(n!) time steps and runs on a linear array in which each processing element (PE) contains only constant storage size and the elapsed time of a time step is independent of n. The basic idea for designing the algorithm is the use of iterative method and adding modulo operation. The essential part in the algorithm is that, during the execution of current cycle-n-permutations with a header, the next header for the next cycle-n-permutations must be prepared in time. In the linear array, since each PE is identical and executes the same program, it is suitable for very large-scale integration implementation.
暂无评论