Efficient parallel algorithms for image template matching on hypercube SIMD machines. K.V.K. Prasanna. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 6, pp. 665-669, June 1989. DOI: 10.1109/34.24802.
Efficient parallel algorithms developed on hypercube SIMD (single-instruction, multiple-data-stream) machines for image template matching are presented. Most of these parallel algorithms are asymptotically optimal in their time complexities. These results improve the known bounds in the literature.
Keywords: parallel algorithms, hypercubes, algorithm design and analysis, image processing, computer vision, filtering, image edge detection, image registration, object detection, computational complexity, computerised pattern recognition, computerised picture processing, image template matching, hypercube SIMD machines, time complexities.
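For orientation, the problem itself can be sketched as a sequential baseline. The abstract does not say which similarity measure the paper uses, so the sum of squared differences (SSD) below is an assumption, and the function name `match_template` is hypothetical. This is not the hypercube algorithm. It only shows that every window score is computed independently, which is the kind of data parallelism a SIMD machine can exploit.

```python
def match_template(image, template):
    """Slide `template` over `image` and score each position by SSD.

    Returns (best_row, best_col, scores), where scores[r][c] is the sum of
    squared differences for the window whose top-left corner is (r, c).
    SSD is an assumed measure; the source abstract does not name one.
    """
    n_rows, n_cols = len(image), len(image[0])
    t_rows, t_cols = len(template), len(template[0])
    scores = []
    best = None  # (ssd, row, col) of the best window seen so far
    for r in range(n_rows - t_rows + 1):
        row_scores = []
        for c in range(n_cols - t_cols + 1):
            # Each window score depends only on its own pixels, so all
            # windows could be evaluated at the same time on parallel hardware.
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(t_rows)
                for j in range(t_cols)
            )
            row_scores.append(ssd)
            if best is None or ssd < best[0]:
                best = (ssd, r, c)
        scores.append(row_scores)
    return best[1], best[2], scores
```

For an N x N image and an M x M template this baseline performs on the order of N^2 M^2 arithmetic operations, which is the sequential cost that parallel algorithms for this problem aim to reduce.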
A straight-line grid embedding of a planar graph is a drawing of the graph in the plane in which the vertices are located at grid points and the edges are represented by nonintersecting straight-line segments joining their incident vertices. Given an n-vertex embedded planar graph with n ≥ 3, a straight-line embedding on a grid of size (n - 2) x (n - 2) can be computed deterministically in O(log n log log n) time with n/(log n log log n) processors. If randomization is used, the complexity improves to O(log n) expected time with the same optimal linear work. These algorithms run on a parallel random access machine (PRAM) that allows concurrent reads and concurrent writes of the shared memory and permits an arbitrary processor to succeed in case of a write conflict.
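The "optimal linear work" claim can be checked with the usual processor-time product. The symbols W (work), p (processors), and T (time) are notation introduced here and do not come from the abstract. The processor count in the randomized case is inferred from the stated work and time, not quoted from the source.

```latex
\[
W_{\mathrm{det}} = p \cdot T
  = \frac{n}{\log n \,\log\log n} \cdot O(\log n \,\log\log n)
  = O(n)
\]
\[
W_{\mathrm{rand}} = O(n), \quad T = O(\log n) \ \text{(expected)}
  \;\Longrightarrow\;
  p = O\!\left(\frac{n}{\log n}\right)
\]
```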
A model, or paradigm, for the development of parallel algorithms is proposed, an example of the paradigm is presented, and algorithms developed by applying the technique are shown. The proposed paradigm is to create composite unit operations that combine data movement between data structures with a conventional operation such as compare or add. The composite operation is based on partitioning the data elements into two linear lists. Exchange of data between adjacent elements in each list is then combined with compares and adds to complete the composite operations. This composite operation can be implemented on several computational architectures. The algorithms developed all have the property of linear speed-up with the number of processing elements and include sorting, merging, selection among sets, set intersection, set difference, subset testing, and string matching.
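The abstract does not spell out the composite operation, so the sketch below substitutes a well-known relative: odd-even transposition sort, in which each step pairs a comparison with an exchange between adjacent elements. It is offered as an illustration of the general idea under that assumption, not as the paper's algorithm, and the function name is hypothetical.

```python
def odd_even_transposition_sort(values):
    """Sort by repeated compare-exchange steps between adjacent elements.

    Stand-in illustration (assumption): the source abstract does not define
    its composite operation in enough detail to reproduce it.
    """
    data = list(values)
    n = len(data)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases compare
        # pairs (1,2), (3,4), ... The pairs within one phase are disjoint,
        # so with one processing element per pair they could run at once.
        start = phase % 2
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data
```

With about n/2 processing elements, each of the n phases takes constant time, so the sequential cost of roughly n^2 compare-exchange steps drops to n parallel steps. That is the sense of "linear speed-up with the number of processing elements" for this stand-in.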
In VLSI circuits, signal delays play an important role in design, timing verification and
signal integrity checks. These delays are attributed to the presence of parasitic resistance,
capacitance and inductance. With increasing clock speeds and shrinking feature sizes, these
delays will be dominated by parasitic inductance. In next-generation VLSI circuits, with
millions of components and interconnect segments, fast and accurate inductance
estimation becomes a crucial step.
A generalized approach for inductance extraction requires the solution of a large,
dense, complex linear system that models mutual inductive effects among circuit elements.
Iterative methods are used to solve the system without explicit computation of the system
matrix itself. Fast hierarchical techniques are used to compute approximate matrix-vector
products with the dense system matrix in a matrix-free way. Because the system matrix is
never formed explicitly, constructing a preconditioner to accelerate the convergence of the
iterative method becomes a challenging task.
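To make "matrix-free" concrete, here is a minimal sketch of an iterative solver that only ever calls a matrix-vector product. Several things here are assumptions for illustration: the thesis deals with a dense complex system and does not name its solver in this abstract, whereas the sketch uses unpreconditioned conjugate gradients on a small real symmetric positive definite operator (a 1D Laplacian stencil). The names `conjugate_gradient` and `laplacian_matvec` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def laplacian_matvec(x):
    """Apply the tridiagonal (-1, 2, -1) operator without storing a matrix."""
    n = len(x)
    return [
        2.0 * x[i]
        - (x[i - 1] if i > 0 else 0.0)
        - (x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b given only a function computing A @ v (A must be SPD)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the zero initial guess
    p = list(r)  # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)  # the only place the operator is touched
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

In a setting like the one described above, `matvec` would be replaced by a fast hierarchical approximation of the dense product. Since the solver never sees individual matrix entries, preconditioners that rely on them are not directly available, which is the difficulty the paragraph points to.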
This work presents a class of parallel algorithms for fast and accurate inductance extraction
of VLSI circuits. We use the solenoidal basis approach that converts the linear
system into a reduced system. The reduced system of equations is solved by a preconditioned
iterative solver that uses fast hierarchical methods to compute products with the
dense coefficient matrix. A Green's function based preconditioner is proposed that achieves
near-optimal convergence rates in several cases. By formulating the preconditioner as a
dense matrix similar to the coefficient matrix, we are able to use fast hierarchical methods for the preconditioning step as well. Experiments on a number of benchmark problems
highlight the efficient preconditioning scheme and its advantages over FastHenry.
To further reduce the solution time of the software, we have developed a parallel implementation.
The parallel software package is capable of anal
Energy consumption by computer systems has emerged as an important concern. However, the energy consumed in executing an algorithm cannot be inferred from its performance alone; it must be modeled explicitly. This paper analyzes energy consumption of parallel algorithms executed on a model of shared memory multicore processors. Specifically, we develop a methodology to evaluate how the energy consumption of a given parallel algorithm changes as the number of cores and their frequency are varied. We use this analysis to establish the optimal number of cores to minimize the energy consumed by the execution of a parallel algorithm for a specific problem size while satisfying a given performance requirement, and the optimal number of cores to maximize the performance of a parallel algorithm for a specific problem size under a given energy budget. We study the sensitivity of our analysis to changes in parameters such as the ratio of the power consumed by a computation step to the power consumed in accessing memory. The results show that the relation between the problem size and the optimal number of cores is relatively unaffected for a wide range of these parameters. (C) 2011 Elsevier Inc. All rights reserved.
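The first optimization question (fewest joules subject to a deadline) can be illustrated with a toy model. Everything in it is an assumption made for this sketch and does not come from the paper: parallel overhead that grows linearly with the core count, dynamic power proportional to the cube of the clock frequency, all cores active for the whole run, and no separate memory-access energy term. The function names and parameters are hypothetical.

```python
def parallel_cycles(work, cores, overhead_per_core):
    """Cycles on the critical path: evenly split work plus a linear overhead.

    Toy assumption, not the paper's model.
    """
    return work / cores + overhead_per_core * (cores - 1)


def energy_at_deadline(work, cores, deadline, overhead_per_core):
    """Energy when each core runs at the slowest frequency meeting the deadline."""
    cycles = parallel_cycles(work, cores, overhead_per_core)
    frequency = cycles / deadline   # just fast enough to finish on time
    power_per_core = frequency ** 3  # assumed cubic dynamic-power law
    return cores * power_per_core * deadline


def best_core_count(work, deadline, max_cores, overhead_per_core):
    """Core count in 1..max_cores with minimal energy under the toy model."""
    return min(
        range(1, max_cores + 1),
        key=lambda p: energy_at_deadline(work, p, deadline, overhead_per_core),
    )
```

Under these assumptions, adding cores lets each core run slower and saves energy until the overhead term dominates, so the minimum sits at an intermediate core count. A sensitivity study of the kind the abstract describes would vary constants such as `overhead_per_core` and watch how that minimum moves.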
A geometric approach and the concept of streamlines are used to reformulate the equations governing multiphase multicomponent flow through porous media with allowance for phase transitions. Using streamline simulation technology, the three-dimensional problem is split into a set of one-dimensional subproblems, for which efficient parallel algorithms can be constructed. A new geometric formulation of the streamline simulation technology is presented, which is in a sense rigorous. It is demonstrated how the accuracy of the method can be estimated with the help of this formulation.
The authors give a parallel algorithm for finding vertex-disjoint s1-t1 and s2-t2 paths in an undirected graph G. An important step in solving the general problem is solving the planar case. A new structural property yields the parallelization, as well as a simpler linear-time sequential algorithm for this case. The algorithm is extended to the nonplanar case by giving a parallel algorithm for finding a Kuratowski homeomorph, and, in particular, a homeomorph of K3,3, in a nonplanar graph. The algorithms are processor efficient; in each case, the processor-time product of the algorithms is within a polylogarithmic factor of the best-known sequential algorithm.
Several new variants of the hierarchical basis (HB) preconditioner and Bramble, Pasciak, and Xu's multilevel preconditioner (BPX) are presented and studied, and new parallel algorithms are introduced for both methodologies. It is shown that the performance of both preconditioners is improved by not solving the linear system associated with the initial (coarsest) grid, which need not be a trivial grid. For a class of problems consisting of anisotropic and "piecewise anisotropic" problems, a grid-generation strategy that relates the multilevel preconditioners to the underlying operator is developed, and it is demonstrated that this strategy is effective in terms of both iteration counts and elapsed time. In addition, a new parallel algorithm is presented for the BPX preconditioner that is as efficient as parallel algorithms for the HB preconditioner, requiring O(j) parallel steps for problems with j levels; and a parallel algorithm is presented for the HB preconditioner that requires O(⌈log_2 j⌉) parallel steps. Numerical results on a Connection Machine are reported.
ISBN: 0818692065 (print)
Program visualisation can help make an algorithm understandable. Program visualisation is especially challenging in the area of parallel computations, where many processors are executing simultaneously. Algorithms for parallel machines take advantage of the simultaneous activity of processors to perform operations very quickly. As a result, these algorithms can be difficult to understand. In this paper, we describe a visualisation tool developed specifically for explaining algorithms written for single-instruction, multiple-data (SIMD) computers called torus computers. This tool helps its users to visualise the patterns of activity of the processors during a computation.
The multidimensional assignment problem (MAP) is a combinatorial optimization problem arising in diverse applications such as computer vision and motion tracking. In the MAP, the objective is to match tuples of objects with minimum total cost. Randomized parallel algorithms are proposed to solve MAPs appearing in multi-sensor multi-target applications. A parallel construction heuristic is described, together with some variations, as well as a parallel local search heuristic. Experimental results using the proposed algorithms are discussed. (C) 2003 IMACS. Published by Elsevier B.V. All rights reserved.
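As a rough illustration of the two ingredients named above, the sketch below pairs a randomized greedy construction with a pairwise-exchange local search for the three-dimensional case, where `cost[i][j][k]` is the cost of matching object i with objects j and k. These are generic textbook-style heuristics chosen for brevity, not the authors' algorithms, and all function names are hypothetical. The restarts are independent, so in principle each could run on its own processor.

```python
import itertools
import random


def total_cost(cost, perm_j, perm_k):
    """Cost of the assignment that matches i with (perm_j[i], perm_k[i])."""
    return sum(cost[i][perm_j[i]][perm_k[i]] for i in range(len(perm_j)))


def greedy_construct(cost, rng):
    """Visit i in random order; give each the cheapest still-free (j, k)."""
    n = len(cost)
    free_j, free_k = set(range(n)), set(range(n))
    perm_j, perm_k = [None] * n, [None] * n
    order = list(range(n))
    rng.shuffle(order)
    for i in order:
        j, k = min(
            itertools.product(sorted(free_j), sorted(free_k)),
            key=lambda jk: cost[i][jk[0]][jk[1]],
        )
        perm_j[i], perm_k[i] = j, k
        free_j.remove(j)
        free_k.remove(k)
    return perm_j, perm_k


def local_search(cost, perm_j, perm_k):
    """Swap two entries of one permutation while that strictly lowers cost."""
    n = len(perm_j)
    improved = True
    while improved:
        improved = False
        for perm in (perm_j, perm_k):
            for a, b in itertools.combinations(range(n), 2):
                before = total_cost(cost, perm_j, perm_k)
                perm[a], perm[b] = perm[b], perm[a]
                if total_cost(cost, perm_j, perm_k) < before:
                    improved = True
                else:
                    perm[a], perm[b] = perm[b], perm[a]  # undo the swap
    return perm_j, perm_k


def solve_map(cost, restarts=8, seed=0):
    """Best of several independent construct-then-improve runs."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        perm_j, perm_k = local_search(cost, *greedy_construct(cost, rng))
        value = total_cost(cost, perm_j, perm_k)
        if best is None or value < best[0]:
            best = (value, perm_j, perm_k)
    return best
```

The result is a feasible assignment that is locally optimal under pairwise swaps. It carries no guarantee of global optimality, which is typical of heuristics for this problem.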