The article presents mathematical research that develops and examines the properties of high-performance algorithms based on smoothed particle hydrodynamics (SPH) for solving continuum mechanics problems on central processing unit (CPU) and hybrid architectures. It covers the advantages of SPH, error estimates for SPH-type approximations, and the selection of parameters. It also discusses parallel algorithms for SPH, including their computation time and speedup.
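As a rough illustration of what an SPH-type approximation looks like, the sketch below estimates density at each particle by summing kernel-weighted masses of its neighbours. It is not taken from the article: the one-dimensional setting, the cubic spline kernel, the brute-force neighbour loop, and the names `cubic_spline_kernel` and `sph_density` are all assumptions chosen for brevity.

```python
def cubic_spline_kernel(r, h):
    """1D cubic spline smoothing kernel with support radius 2h (assumed kernel choice)."""
    q = abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalisation so the kernel integrates to 1
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def sph_density(positions, masses, h):
    """Density at every particle: rho_i = sum_j m_j * W(x_i - x_j, h).

    Brute-force O(n^2) summation; a production code would use a neighbour search.
    """
    return [
        sum(m_j * cubic_spline_kernel(x_i - x_j, h)
            for x_j, m_j in zip(positions, masses))
        for x_i in positions
    ]


# Hypothetical test case: uniformly spaced particles representing unit density.
dx = 0.1
positions = [i * dx for i in range(41)]
masses = [1.0 * dx] * len(positions)   # mass = density * spacing
densities = sph_density(positions, masses, h=1.3 * dx)
```

Away from the ends of the particle row, the computed density should stay close to the nominal value of 1; the deviation depends on the ratio of smoothing length to particle spacing, which is one of the parameters such error estimates concern. Each particle's sum is independent of the others, which is what makes this step straightforward to parallelise.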
We present a sequential and a parallel algorithm to solve the maximum-weight independent set problem on a permutation graph. The input is a permutation pi = [pi(1), pi(2), ..., pi(n)] and the weights of the corresponding vertices. Our sequential algorithm takes O(n log log n) time, and our parallel algorithm takes O(log^2 n) time with O(n^3/log n) processors under the CREW PRAM model.
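For orientation, the problem can be sketched as follows. In the permutation graph of pi, two vertices are adjacent exactly when they form an inversion, so an independent set is an increasing subsequence of pi, and the task reduces to finding a maximum-weight increasing subsequence. The code below is a simple O(n log n) version using a Fenwick tree of prefix maxima; it is not the authors' O(n log log n) algorithm, and the function name and the assumption of non-negative weights are illustrative choices.

```python
def max_weight_independent_set(pi, weights):
    """Weight of a maximum-weight independent set of the permutation graph of pi.

    pi is a permutation of 1..n; weights[i] is the weight of the vertex at
    position i. Weights are assumed non-negative (assumption of this sketch).
    """
    n = len(pi)
    tree = [0] * (n + 1)  # Fenwick tree over values, storing prefix maxima

    def prefix_max(v):
        """Best weight of an increasing subsequence ending at a value <= v."""
        best = 0
        while v > 0:
            best = max(best, tree[v])
            v -= v & -v
        return best

    def raise_to(v, candidate):
        """Record that a subsequence ending at value v reaches weight candidate."""
        while v <= n:
            tree[v] = max(tree[v], candidate)
            v += v & -v

    answer = 0
    for value, w in zip(pi, weights):
        best_here = prefix_max(value - 1) + w  # extend the best smaller-valued prefix
        raise_to(value, best_here)
        answer = max(answer, best_here)
    return answer
```

For example, with `pi = [2, 1, 3]` and `weights = [5, 1, 2]`, the increasing subsequence 2, 3 has weight 7 and beats 1, 3 with weight 3.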
Efficient parallel algorithms developed on hypercube SIMD (single-instruction multiple data-stream) machines for image template matching are presented. Most of these parallel algorithms are asymptotically optimal in their time complexities. These results improve the known bounds in the literature. (K.V.K. Prasanna, "Efficient parallel algorithms for image template matching on hypercube SIMD machines," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 6, pp. 665-669, June 1989, doi:10.1109/34.24802.)
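To make the underlying task concrete, here is a plain sequential sketch of template matching: slide the template over every position of the image and keep the position with the smallest mismatch. The abstract does not say which similarity measure the paper uses, so the sum of squared differences, the list-of-lists image representation, and the name `match_template` are assumptions; the hypercube algorithms themselves are not reproduced.

```python
def match_template(image, template):
    """Return ((row, col), score) of the best match of template inside image.

    image and template are rectangular lists of lists of numbers. The score is
    the sum of squared differences (an assumed measure); lower is better.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            # Mismatch between the template and the window whose top-left is (r, c).
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(tpl_h)
                for j in range(tpl_w)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


image = [
    [0, 0, 0, 0],
    [0, 0, 9, 8],
    [0, 0, 7, 6],
    [0, 0, 0, 0],
]
template = [[9, 8], [7, 6]]
position, score = match_template(image, template)
```

The score for each window depends only on the pixels under it, so the windows can be evaluated independently; distributing that work and the associated data movement across processors is the part a parallel algorithm has to organise.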
A straight-line grid embedding of a planar graph is a drawing of the graph on a plane where the vertices are located at grid points and the edges are represented by nonintersecting straight-line segments joining their incident vertices. Given an n-vertex embedded planar graph with n ≥ 3, a straight-line embedding on a grid of size (n - 2) x (n - 2) can be computed deterministically in O(log n log log n) time with n/(log n log log n) processors. If randomization is used, the complexity improves to O(log n) expected time with the same optimal linear work. These algorithms run on a parallel random access machine that allows concurrent reads and concurrent writes of the shared memory and permits an arbitrary processor to succeed in case of a write conflict.
A model, or paradigm, for the development of parallel algorithms is proposed, an example of the paradigm is presented, and algorithms developed by applying the technique are shown. The proposed paradigm is to create composite unit operations that combine data movement between data structures with a conventional operation such as compare or add. The composite operation is based on partitioning the data elements into two linear lists. Exchange of data between adjacent elements in each list is then combined with compares and adds to complete the composite operations. This composite operation can be implemented on several computational architectures. The algorithms developed all achieve linear speed-up with the number of processing elements and include sorting, merging, selection among sets, set interconnection, set difference, subset testing, and string matching.
In VLSI circuits, signal delays play an important role in design, timing verification, and signal integrity checks. These delays are attributed to parasitic resistance, capacitance, and inductance. As clock speeds increase and feature sizes shrink, these delays will be dominated by parasitic inductance. In next-generation VLSI circuits, with millions of components and interconnect segments, fast and accurate inductance estimation becomes a crucial step.
A generalized approach to inductance extraction requires solving a large, dense, complex linear system that models mutual inductive effects among circuit elements. Iterative methods are used to solve the system without explicitly computing the system matrix. Fast hierarchical techniques are used to compute approximate matrix-vector products with the dense system matrix in a matrix-free way. Because the system matrix is unavailable, constructing a preconditioner to accelerate the convergence of the iterative method becomes a challenging task.
This work presents a class of parallel algorithms for fast and accurate inductance extraction of VLSI circuits. We use the solenoidal basis approach, which converts the linear system into a reduced system. The reduced system of equations is solved by a preconditioned iterative solver that uses fast hierarchical methods to compute products with the dense coefficient matrix. A Green's function based preconditioner is proposed that achieves near-optimal convergence rates in several cases. By formulating the preconditioner as a dense matrix similar to the coefficient matrix, we are able to use fast hierarchical methods for the preconditioning step as well. Experiments on a number of benchmark problems highlight the efficiency of the preconditioning scheme and its advantages over FastHenry.
To further reduce the solution time of the software, we have developed a parallel implementation. The parallel software package is capable of anal
A geometric approach and the concept of streamlines are used to reformulate the equations governing multiphase multicomponent flow through porous media with allowance for phase transitions. Using streamline simulation technology, the three-dimensional problem is split into a set of one-dimensional subproblems, for which efficient parallel algorithms can be constructed. A new geometric formulation of the streamline simulation technology is presented, which is in a sense rigorous. It is demonstrated how the accuracy of the method can be estimated with the help of this formulation.
The authors give a parallel algorithm for finding vertex-disjoint s1-t1 and s2-t2 paths in an undirected graph G. An important step in solving the general problem is solving the planar case. A new structural property yields the parallelization, as well as a simpler linear-time sequential algorithm for this case. The algorithm is extended to the nonplanar case by giving a parallel algorithm for finding a Kuratowski homeomorph, and, in particular, a homeomorph of K3,3, in a nonplanar graph. The algorithms are processor efficient; in each case, the processor-time product of the algorithms is within a polylogarithmic factor of the running time of the best-known sequential algorithm.
Energy consumption by computer systems has emerged as an important concern. However, the energy consumed in executing an algorithm cannot be inferred from its performance alone; it must be modeled explicitly. This paper analyzes the energy consumption of parallel algorithms executed on a model of shared-memory multicore processors. Specifically, we develop a methodology to evaluate how the energy consumption of a given parallel algorithm changes as the number of cores and their frequency are varied. We use this analysis to establish the optimal number of cores to minimize the energy consumed by the execution of a parallel algorithm for a specific problem size while satisfying a given performance requirement, and the optimal number of cores to maximize the performance of a parallel algorithm for a specific problem size under a given energy budget. We study the sensitivity of our analysis to changes in parameters such as the ratio of the power consumed by a computation step to the power consumed in accessing memory. The results show that the relation between the problem size and the optimal number of cores is relatively unaffected for a wide range of these parameters. © 2011 Elsevier Inc. All rights reserved.
Several new variants of the hierarchical basis (HB) preconditioner and Bramble, Pasciak, and Xu's multilevel preconditioner (BPX) are presented and studied, and new parallel algorithms are introduced for both methodologies. It is shown that the performance of both preconditioners is improved by not solving the linear system associated with the initial (coarsest) grid, which need not be a trivial grid. For a class of problems consisting of anisotropic and "piecewise anisotropic" problems, a grid-generation strategy that relates the multilevel preconditioners to the underlying operator is developed, and it is demonstrated that this strategy is effective in terms of both iteration counts and elapsed time. In addition, a new parallel algorithm is presented for the BPX preconditioner that is as efficient as parallel algorithms for the HB preconditioner, requiring O(j) parallel steps for problems with j levels; and a parallel algorithm is presented for the HB preconditioner that requires O(⌈log_2 j⌉) parallel steps. Numerical results on a Connection Machine are reported.