ISBN (print): 0769525784
IP address lookup is a multifaceted problem because of growing routing table sizes, increased traffic, higher-speed links, and the migration to 128-bit IPv6 addresses. Routers use each packet's destination address to determine its next hop. IP address lookup is complicated because it requires a longest-matching-prefix search. We propose a CREW-based Multiprocessor Organization Lookup (CMOL) that performs sorting and searching concurrently. The technique further uses (N+1)-ary search to speed up address lookup operations. Since multiple processors are used, each processor has fewer prefixes to compare: the number of comparisons for each processor is reduced by a factor of n/N. The time complexity of searching and sorting has been reduced to O(log(N+1) (N+1)). The use of controlled prefix expansion further reduces the storage space.
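As a rough illustration of the (N+1)-ary search idea mentioned in this abstract, the following Python sketch simulates one search over a sorted list of integer keys. It is not the CMOL implementation: the function name, the use of plain integers in place of prefixes, and the reading of N as the number of processors are assumptions made for the example, and the simultaneous probes of a CREW machine are simulated sequentially.

```python
def n_plus_one_ary_search(sorted_keys, target, num_procs):
    """Return (index of the first key >= target, number of probe rounds).

    Hypothetical helper: each round stands for one parallel step in which
    num_procs processors each compare one key against the target.
    """
    lo, hi = 0, len(sorted_keys)  # the answer always lies in [lo, hi]
    rounds = 0
    while lo < hi:
        rounds += 1
        size = hi - lo
        # Up to num_procs probe positions split [lo, hi) into at most
        # num_procs + 1 segments; duplicates collapse when the range is small.
        probes = sorted({lo + (size * k) // (num_procs + 1)
                         for k in range(1, num_procs + 1)})
        # On a CREW PRAM these comparisons would run at the same time,
        # one per processor (concurrent reads, no writes to shared keys).
        below = [sorted_keys[p] < target for p in probes]
        # Keep only the segment between the last "below" probe and the
        # first "not below" probe.
        for p, is_below in zip(probes, below):
            if is_below:
                lo = p + 1
            else:
                hi = p
                break
    return lo, rounds


if __name__ == "__main__":
    keys = list(range(0, 2000, 2))  # 1000 sorted keys
    index, rounds = n_plus_one_ary_search(keys, 777, num_procs=9)
    print(index, rounds)
```

With N probes per round the candidate range shrinks by a factor of about N+1, so the number of rounds grows roughly as the base-(N+1) logarithm of the number of keys.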
ISBN (print): 9781424400379
Phylogenetic trees are commonly reconstructed based on hard optimization problems such as Maximum parsimony (MP) and Maximum likelihood (ML). Conventional MP heuristics produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP and ML are NP-hard, such approaches do not scale to large datasets. In this paper, we present a promising divide-and-conquer technique, the ZAZ method, to construct an evolutionary tree. The algorithm has been implemented and tested against five large biological datasets of 5000 to 7000 sequences, and it achieved a dramatic speedup and a significant improvement in accuracy (better than 94%) compared with existing approaches. Thus, high-quality reconstruction can be obtained for large datasets by using this approach. Moreover, we present another approach that constructs the tree dynamically (when sequences arrive dynamically with partial information). Finally, combining the two approaches, we show parallel approaches to construct the tree when sequences are generated or obtained dynamically.
ISBN (print): 9781424400379
Recent processors utilize a variety of parallel processing technologies to boost their performance, so multimedia applications need to be efficiently parallelized and easily implemented on processors with such parallel processing features. We implemented parallel algorithms for VQ compression in a shared-memory parallel environment and evaluated the effectiveness of the parallel algorithms. On such a system, we evaluate two parallel algorithms for the codebook generation of VQ compression, parallel LBG and parallel tPNN, and find that the parallel tPNN is superior in terms of space complexity, whereas the parallel LBG is superior in terms of time complexity and parallelism. For the codeword search, on the other hand, the p-dist approach and the c-dist approach with the aggregation of synchronizations are suitable for a small codebook, and the c-dist approach and the p-dist approach with the ADM or the strip-mining method are suitable for a large codebook. However, since the aggregation of synchronizations and the strip-mining method increase the space complexity of the algorithm, the p-dist approach and the c-dist approach are more suitable for a small codebook and for a large codebook, respectively.
We present an adaptive and cost-optimal parallel algorithm for generating t-ary trees represented by extended Ballot-sequences. This algorithm is a parallel version of a sequential generation algorithm, which we discuss first. Our parallel algorithm generates t-ary trees in B-order and can be executed on an EREW SM SIMD model.
As product designs have become more sophisticated, both the simulation models (e.g., finite element models) and the design optimization models have grown bigger. To keep pace with this increase in problem size, we present and implement an optimization strategy that can run on a computing cluster with demonstrable efficiency. First, parallelism is implemented in the context of gradient calculations using divided differences. Then, parallelism is achieved in the context of both the direction-finding and line-search steps. Parallel direction finding improves the convergence rate, as opposed to just cutting down the amount of arithmetic. A new algorithm based on the method of feasible directions is discussed that obtains better optima and is also computationally faster. Implementation details regarding the distribution of computing tasks to improve scalability and load balancing are presented. Numerical examples show the efficiency of the developed methodology on a relatively small computing cluster. Gains of about 7:1 were obtained using 16 processors on some test problems. Importantly, the framework presented can be developed by researchers using other gradient-based optimization codes on different computing platforms.
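The first parallelization step in this abstract, gradient calculation by divided differences, lends itself to a small sketch because each partial derivative needs only one extra, independent function evaluation. The Python below is a minimal illustration under stated assumptions, not the authors' code: the function name, the forward-difference formula, the step size h, and the use of a local thread pool in place of cluster nodes are all choices made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def forward_difference_gradient(f, x, h=1e-6, max_workers=4):
    """Approximate the gradient of f at x with forward differences.

    Hypothetical helper: every component is an independent task, which is
    what makes this step easy to spread over several workers.
    """
    f0 = f(x)  # baseline evaluation shared by all components

    def partial(i):
        shifted = list(x)
        shifted[i] += h  # perturb only the i-th design variable
        return (f(shifted) - f0) / h

    # A thread pool stands in for cluster nodes here; an expensive
    # simulation would more likely be dispatched to separate processes
    # or machines.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(partial, range(len(x))))


if __name__ == "__main__":
    def objective(v):
        return v[0] ** 2 + v[1] ** 2 + v[2] ** 2 + 3 * v[0]

    print(forward_difference_gradient(objective, [1.0, 2.0, 3.0]))
```

For a problem with many design variables and an expensive objective, the cost of this step is dominated by the function evaluations, so distributing them is a natural first target for parallelism.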
Computing the shortest path between a pair of points is an important problem in robotics and intelligent transportation systems. The ability to compute this path in real time is valuable in a number of situations, such as an automaton attempting to reach its destination while minimizing the chance of collision with obstacles. Previous work on shortest paths is limited to sequential algorithms and parallel algorithms (for some versions of the problem) on general-purpose architectures. The authors develop a new hardware-efficient algorithm and present an FPGA implementation for shortest path calculation between an automaton's start point and its destination. Results of the implementation on a Xilinx Virtex FPGA are promising: the solution operates at approximately 68 MHz, and the implementation for a graph with 58 nodes and 82 edges fits in one XC2V6000 device. (C) 2006 Elsevier B.V. All rights reserved.
This paper addresses parallel execution of chain code generation on a linear array architecture. The contours in the proposed algorithm are viewed as a set of edges (or contour segments) that can be traced by a top-down contour tracing method to generate the chain codes for the outer and inner object contours. A parallel algorithm that contains the required chain code generating rules and operations is also described, and the algorithm is mapped onto a one-dimensional systolic array containing [(N + 1)/2] processing elements (PEs) to devise this architecture. The architecture extracts the contours of objects and quickly generates the corresponding chain codes after the image data in all rows have been input in a linear fashion. The total processing time for generating the chain codes in an N x N image is O(3N). The real-time requirement is therefore fulfilled, and the execution time is independent of the image content. In addition, a partition method is developed to process an image when the parallel architecture has a fixed number of PEs, say two or more. The total execution time for an N x N image with a fixed number of PEs is N(N + 1)/M + 2(M - 1), where M is the fixed number of PEs. (C) 2002 Elsevier Science Inc. All rights reserved.
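To make the partitioned execution-time expression N(N + 1)/M + 2(M - 1) concrete, the short Python sketch below simply evaluates it for a few PE counts. The function name and the sample values of N and M are assumptions for illustration, and the abstract does not state the time unit, so the results are best read as abstract steps.

```python
from fractions import Fraction


def partitioned_time(n, m):
    """Evaluate N(N + 1)/M + 2(M - 1) exactly for an N x N image and M PEs."""
    return Fraction(n * (n + 1), m) + 2 * (m - 1)


if __name__ == "__main__":
    # Hypothetical sample: a 512 x 512 image with 2, 8, and 32 PEs.
    for m in (2, 8, 32):
        print(m, float(partitioned_time(512, m)))
```

The first term shrinks as more PEs are added while the second grows linearly in M, so for a fixed image size the expression has a trade-off between the two.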
We present an O((log log N)^2)-time algorithm for computing the distance transform of an N x N binary image. Our algorithm is designed for the common concurrent read concurrent write parallel random access machine (CRCW PRAM) and requires O(N^(2+epsilon) / log log N) processors, for any epsilon such that 0 < epsilon < 1. Our algorithm is based on a novel deterministic sampling scheme and can be used for computing distance transforms for a very general class of distance functions. We also present a scalable version of our algorithm for the case where the number of available processors is p^(2+epsilon) / log log p for some p < N. In this case, our algorithm runs in O((N^2/p^2) + (N/p) log log p + (log log p)^2) time. This scalable algorithm is more practical since the number of available processors is usually much smaller than the size of the image.
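For readers unfamiliar with the problem, the following Python sketch computes a distance transform by brute force. It is a sequential reference definition only and has nothing to do with the sampling-based CRCW PRAM algorithm of the abstract; the function name, the convention that pixels with value 1 are the feature pixels, and the choice of the Euclidean distance are assumptions made for the example.

```python
import math


def brute_force_distance_transform(image):
    """For every pixel, return the Euclidean distance to the nearest 1-pixel.

    Hypothetical reference helper: checks every feature pixel for every
    pixel, so it is far slower than any practical algorithm.
    """
    feature_pixels = [(r, c)
                      for r, row in enumerate(image)
                      for c, value in enumerate(row)
                      if value == 1]
    if not feature_pixels:
        raise ValueError("image contains no feature (1-valued) pixels")
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in feature_pixels)
             for c in range(len(row))]
            for r, row in enumerate(image)]


if __name__ == "__main__":
    image = [[1, 0, 0],
             [0, 0, 0],
             [0, 0, 0]]
    for row in brute_force_distance_transform(image):
        print(row)
```

A definition of this kind is mainly useful as a correctness check against which a faster sequential or parallel implementation can be compared on small images.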
In this paper, we propose two novel parallel algorithms for identifying all the basis polygons in an image formed by n straight line segments, each of which is represented by its two end points. The first algorithm is designed to tackle the simple situation where all basis polygons are convex. The second one deals with the general situation where the basis polygons can be both convex and non-convex. These algorithms are based on the idea of traversing the periphery of the basis polygons in a well-defined manner, so that each algorithm needs only O(n) time using an n x n processor array. Simulation results on various test input sets of intersecting line segments have also been found satisfactory. (c) 2005 Elsevier B.V. All rights reserved.