Given two strings A and B of lengths $n_a$ and $n_b$, respectively, the All-substrings Longest Common Subsequence (ALCS) problem asks, for every substring B' of B, for the length of the longest string that is a subsequence of both A and B'. The sequential algorithm for this problem takes $O(n_a n_b)$ time and $O(n_b)$ space. We present a parallel algorithm for the ALCS problem on the Coarse-Grained Multicomputer (BSP/CGM) model with $p < \sqrt{n_a}$ processors that takes $O(n_a n_b / p)$ time, $O(\log p)$ communication rounds, and $O(n_b \sqrt{n_a})$ space per processor. The proposed algorithm also solves the basic Longest Common Subsequence (LCS) problem, which finds the longest string (and not only its length) that is a subsequence of both A and B. To our knowledge, this is the best BSP/CGM algorithm in the literature for the LCS and ALCS problems.
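To make the problem statement concrete, the following is a minimal sequential sketch, not the BSP/CGM algorithm from the abstract. `lcs_length` is the textbook dynamic program, which keeps a single row of the table at a time. `alcs_naive` simply reruns it on every substring of B, so it illustrates what ALCS computes but is far slower than the sequential ALCS algorithm cited above. Both function names are illustrative assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Textbook dynamic program: O(len(a) * len(b)) time, O(len(b)) space.
    """
    prev = [0] * (len(b) + 1)  # DP row for the previous prefix of a
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def alcs_naive(a, b):
    """Brute-force ALCS: LCS length of a against every substring b[i:j].

    Only illustrates the problem definition; it does not achieve the
    bounds of the sequential ALCS algorithm mentioned in the abstract.
    """
    return {
        (i, j): lcs_length(a, b[i:j])
        for i in range(len(b))
        for j in range(i + 1, len(b) + 1)
    }
```

For example, `alcs_naive("abc", "bc")` maps the index pair `(0, 2)` (the substring `"bc"`) to 2 and `(0, 1)` (the substring `"b"`) to 1.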
The paper presents a sublinear time parallel algorithm for computing the greatest common divisor of two integers. Its running time on two n bit integers is $O({{n\log \log n} / {\log n}})$ using the weak concurrent read concurrent write model.
A parallel algorithm for finding a maximal matching in an undirected graph is presented. A matching is maximal if it is not properly contained in any other matching. The model of computation is the CRCW-PRAM. The algorithm improves substantially on the two previous algorithms, by Karp and Wigderson (1984) and Lev (1980). The latter, though it performs better than the former, is suitable only for bipartite graphs; the algorithm developed here applies to general graphs. It also employs a new technique for finding Euler circuits in graphs.
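The definition of a maximal matching can be illustrated with a short sequential sketch. This is the standard greedy method, not the CRCW-PRAM algorithm of the abstract, and the function names are illustrative assumptions.

```python
def greedy_maximal_matching(edges):
    """Scan edges once, keeping each edge whose endpoints are both free.

    The result is maximal (no further edge can be added) but not
    necessarily maximum (it may not have the largest possible size).
    """
    matched = set()   # vertices already covered by the matching
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


def is_maximal(edges, matching):
    """Check that every non-loop edge touches a matched vertex."""
    covered = {x for edge in matching for x in edge}
    return all(u == v or u in covered or v in covered for u, v in edges)
```

On the path 1-2-3-4, scanning the middle edge first gives `[(2, 3)]`, which is maximal although the larger matching `[(1, 2), (3, 4)]` exists.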
A planar fuel cell stack is a layered structure consisting of repeated modules: membrane electrode assemblies (MEAs) separated by bipolar plates (BPs). Generally, the distributions of voltage and temperature over the BP volume are described by three-dimensional Laplace equations. However, the thickness of a BP is much smaller than its in-plane size. This enables us to reduce a three-dimensional Laplace equation to a two-dimensional Poisson equation and to develop an efficient parallel algorithm for stack simulation. In the simplest variant, each individual module "MEA + BP" is solved on a separate processor. Typically, the number of cells in a stack is 10 to 100; this algorithm is thus most suitable for small- and medium-scale parallel machines. A much faster method is to cut every module into a number of "stripes" and to solve each stripe on a separate processor. Numerical tests with this method show that, with eight stripes per module, the solution of the electric problem is obtained roughly ten times faster than expected. Evidently, the striping algorithm provides much faster convergence of the iterative Poisson solver. The effect is presumably due to fast damping of high-frequency modes of the potential in the iteration process. This algorithm may open up possibilities for fast simulation of real 100-cell stacks using massively parallel machines.
Graphics processing units (GPUs) have attracted a lot of attention due to their cost-effectiveness and enormous power for massively data-parallel computing. In this paper, we propose a novel parallel algorithm for exact pattern matching on GPUs. A traditional exact pattern matching algorithm matches multiple patterns simultaneously by traversing a special state machine called an Aho-Corasick machine. Considering the particular parallel architecture of GPUs, we first propose an efficient state machine on which very efficient parallel algorithms can be performed. We also introduce several optimization techniques for GPUs, including reducing global memory transactions for the input buffer, reducing the latency of transition table lookups, eliminating output table accesses, avoiding shared-memory bank conflicts, coalescing writes to global memory, and enhancing data transmission via Peripheral Component Interconnect Express. We evaluate the performance of the proposed algorithm using attack patterns from Snort V2.8 and input streams from DEFCON. The experimental results show that the proposed algorithm, run on NVIDIA GPUs, achieves up to 143.16-Gbps throughput, 14.74 times faster than the Aho-Corasick algorithm implemented on a 3.06-GHz quad-core CPU with OpenMP. The library of the proposed algorithm is publicly accessible through Google Code.
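For readers unfamiliar with the baseline, here is a minimal sequential sketch of a classic Aho-Corasick machine with goto, failure, and output tables. It is not the GPU state machine proposed in the abstract, and the names `build_automaton` and `search` are illustrative assumptions. Patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for non-empty patterns."""
    goto = [{}]   # goto[s][ch] -> next state; state 0 is the root
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit suffix matches
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    s = 0
    hits = []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

The output table lookup in the inner loop of `search` is the kind of access that the abstract reports eliminating in the GPU design.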
In manufacturing memory chips, Redundant Random Access Memory (RRAM) technology has been widely used because it not only provides repair of faulty cells but also enhances the production yield. RRAM has several rows and columns of spare memory cells, which are used to replace the faulty cells. The goal of our algorithm is to find a spare allocation that repairs all the faulty cells in a given faulty-cell map. The parallel algorithm requires $2n$ processing elements for the $n \times n$ faulty-cell map problem. The algorithm is verified by many simulation runs. In simulation, the algorithm finds one of the near-optimum solutions in nearly constant time with $O(n)$ processors. The simulation results show the consistency of our algorithm. The algorithm can easily be extended to solve rectangular or other shapes of faulty-cell map problems.
A simple and efficient algorithm for the bandwidth reduction of sparse symmetric matrices is proposed. It involves column-row permutations and is well suited to mapping onto the linear array topology of SIMD architectures. The efficiency of the algorithm is compared with that of existing algorithms. The interconnectivity and the memory requirement of the linear array are discussed, and the complexity of its layout area is derived. The parallel version of the algorithm mapped onto the linear array is then introduced and explained with the help of an example. The optimality of the parallel algorithm is proved by deriving the time complexities of the algorithm on a single processor and on the linear array.
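The abstract does not describe the reduction procedure itself, so the sketch below only illustrates the two notions it relies on: the bandwidth of a matrix and a symmetric row-column permutation. The function names and the dense list-of-lists representation are assumptions made for illustration.

```python
def bandwidth(matrix):
    """Largest |i - j| over the nonzero entries of a square matrix."""
    n = len(matrix)
    return max(
        (abs(i - j) for i in range(n) for j in range(n) if matrix[i][j] != 0),
        default=0,
    )


def permute_symmetric(matrix, order):
    """Apply the same permutation to rows and columns.

    order[k] is the original index placed at position k, so symmetry
    of the input matrix is preserved.
    """
    return [[matrix[i][j] for j in order] for i in order]
```

For instance, the matrix `[[1, 0, 1], [0, 1, 0], [1, 0, 1]]` has bandwidth 2, and reordering it with `order = [0, 2, 1]` brings the coupled indices 0 and 2 next to each other, which reduces the bandwidth to 1.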
A parallel algorithm for analyzing activity networks is presented. The model of computation is a shared-memory single-instruction-stream, multiple-data-stream computer that does not allow read or write conflicts. The algorithm is adaptive in the sense that it takes $O(n^{1+h})$ time with $n^{1-h}$ processors for an activity network with $n$ events (nodes), where $h$ ($0 \le h \le 1$) depends on the number of available processors.
We propose a parallel algorithm for mining non-redundant recurrent rules from a sequence database. Recurrent rules, proposed by Lo et al. [1], can express statements such as "Whenever a series of precedent events occurs, eventually a series of consequent events occurs," and Lo et al. have shown their usefulness in various domains, including software specification and verification. Although some algorithms, such as NR3, have been proposed, mining non-redundant recurrent rules still requires considerable processing time. To reduce the computation cost, we present a parallel approach to mining non-redundant recurrent rules that fully utilizes the task parallelism in NR3. We also give experimental results that show the effectiveness of the proposed method.
In this paper, we offer an efficient parallel algorithm for solving the NP-complete Knapsack Problem in its basic, so-called 0-1 variant. Branch-and-bound algorithms have long been used to find its exact solution, and various options for parallelizing the computations are used to speed up the solving, with varying degrees of efficiency. We propose here an algorithm for solving the problem based on the paradigm of recursive-parallel computations. We consider this paradigm well suited to problems of this kind, in which it is difficult to break up the computations immediately into a sufficient number of subtasks of comparable complexity, since the subtasks appear dynamically at run time. We used the RPM_ParLib library, developed by the author, as the main tool to program the algorithm. This library allows us to develop effective applications for parallel computing on a local network in the .NET Framework. Such applications can generate parallel branches of computation directly during program execution and dynamically redistribute work between computing modules. Any language that supports the .NET Framework can be used in conjunction with this library. For our experiments, we developed some C# applications using this library. The main purpose of these experiments was to study the acceleration achieved by recursive-parallel computing. A detailed description of the algorithm and its testing, as well as the results obtained, is also given in the paper.
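As a rough illustration of the branch-and-bound approach mentioned above, here is a minimal sequential sketch for the 0-1 knapsack problem. It does not use RPM_ParLib or any parallelism. The function name, the fractional upper bound, and the assumption of strictly positive weights are choices made for this sketch, not details taken from the paper.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Best total value of a 0-1 knapsack, found by depth-first branch and bound.

    Assumes strictly positive weights (an assumption of this sketch).
    """
    # Sort by value density so the fractional bound below is valid.
    items = sorted(
        zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True
    )
    n = len(items)
    best = 0

    def bound(i, value, room):
        # Optimistic estimate: fill the remaining room greedily and
        # allow a fraction of the first item that does not fit.
        for v, w in items[i:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    def branch(i, value, room):
        nonlocal best
        if value > best:
            best = value
        if i == n or bound(i, value, room) <= best:
            return  # leaf reached, or subtree cannot beat the incumbent
        v, w = items[i]
        if w <= room:
            branch(i + 1, value + v, room - w)  # take item i
        branch(i + 1, value, room)              # skip item i

    branch(0, 0, capacity)
    return best
```

The two recursive calls in `branch` are the subtasks that appear dynamically at run time; in a recursive-parallel setting of the kind the abstract describes, such calls are natural candidates for being handed to other workers.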