The building block layout (BBL) becomes a more and more important approach for VLSI physical design. In this paper, based on the BBL floorplan problem, we discussed several parallel simulated annealing (SA) strategies...
详细信息
ISBN:
(纸本)0780386477
The building block layout (BBL) becomes a more and more important approach for VLSI physical design. In this paper, based on the BBL floorplan problem, we discussed several parallel simulated annealing (SA) strategies. Two parallel simulated annealing algorithms are realized, using sequence-pair (SP) as the representation. The parallel algorithm can be used either to speed up a problem or to achieve a higher accuracy of solutions to a problem. In this work we are interested in the latter goal. The results from the experiment indicate that the proposed method parallelizes the routine of state transitions in SA to obtain better states efficiently.
We define a new restricted type of algebraic circuit (AC) problems, called linear algebraic circuits (LACs), and consider the problem of obtaining efficient solutions for their parallel evaluation. The term "effi...
详细信息
We define a new restricted type of algebraic circuit (AC) problems, called linear algebraic circuits (LACs), and consider the problem of obtaining efficient solutions for their parallel evaluation. The term "efficiency" indicates that for some fixed number of processors, the algorithm can compete with the sequential evaluation of ACs. While parallel evaluation of general ACs is P-complete, there are restricted types of ACs such as prefix-sums circuits (ACs in the form of chains) and arithmetic expressions (ACs in the form of trees) for which efficient parallel evaluation algorithms exist. In this work, we consider a restricted type, other than chains or trees, of ACs where at least one input of each multiplication must be a constant. We show that LACs can be evaluated in n/18p(1/3) log(2)p steps, where n is the size of the circuit and p < n is the number of processors. Thus, for suitable values of p, this algorithm can compete with the sequential evaluation of LACs and achieve speedups of about p(1/3). It follows that LACs can be represented by a product of matrices. Thus, computing this product in parallel yields a possible evaluation algorithm for LACs. parallel computation of a product of matrices is also the main "engine" used in existing parallel evaluation algorithms for ACs. We show, via a lower bound, that such a naive solution based on matrix multiplication alone, is inherently inefficient regardless of the order in which the product of the matrices is computed and hence a more complex solution must be devised. Finally, useful applications for parallelizing sequential code are given. (C) 2003 Elsevier Inc. All rights reserved.
parallel prefix circuits are parallel algorithms performing the prefix operation for the combinational circuit model. The size of a prefix circuit is the number of operation nodes in the circuit, and the depth is the ...
详细信息
parallel prefix circuits are parallel algorithms performing the prefix operation for the combinational circuit model. The size of a prefix circuit is the number of operation nodes in the circuit, and the depth is the maximum level of operation nodes. A circuit with n inputs is depth-size optimal if its depth plus size equals 2n - 2. Smaller depth implies faster computation, while smaller size implies less power consumption, smaller VLSI area, and less cost. A circuit should have a small fan-out and small depth for it to be of practical use. In this paper, we present a new approach to easing the design of parallel prefix circuits, and construct a depth-size optimal parallel prefix circuit, named WE4, with fan-out 4. In many cases of n, WE4 has the smallest depth among all known depth-size optimal prefix circuits with bounded fan-out. (C) 2003 Elsevier Inc. All rights reserved.
We investigate the complexity of computing an optimal m-processor schedule for a system of n unit execution time tasks constrained by an interval order, subject to unit delays for interprocessor communication. Our res...
详细信息
We investigate the complexity of computing an optimal m-processor schedule for a system of n unit execution time tasks constrained by an interval order, subject to unit delays for interprocessor communication. Our results are twofold. First, we show that the problem can be solved by a sequential algorithm in linear time. Second, we present a fast parallel algorithm that solves the problem on the EREW PRAM in time O(log(2)n) using n(3)/log n processors and on the CRCW PRAM in time O(log n) using n(3) processors. This is the first NC algorithm for this problem. (C) 2003 Elsevier Inc. All rights reserved.
Summary form only given. Two parallel computational geometry algorithms are presented: One to calculate a discrete Voronoi diagram when one of the seeds is removed and one to calculate the convex hull in two dimension...
详细信息
Summary form only given. Two parallel computational geometry algorithms are presented: One to calculate a discrete Voronoi diagram when one of the seeds is removed and one to calculate the convex hull in two dimensions. The model of parallel processing used for these algorithms assumes the availability of one processor per pixel. The recent growth in the capacity of FPGAs and systems-on-a-chip makes these algorithms interesting again.
The problem of solving tridiagonal linear systems on parallel distributed-memory environments is considered in this paper. In particular, two common direct methods for solving such systems are considered: odd-even cyc...
详细信息
The problem of solving tridiagonal linear systems on parallel distributed-memory environments is considered in this paper. In particular, two common direct methods for solving such systems are considered: odd-even cyclic reduction and prefix summing. For each method, a variety of lower bounds on execution time for solving tridiagonal linear systems are presented. Specifically, lower bounds are presented that (a) hold when the number of data items per processor is bounded, (b) are general lower bounds, and (c) for specific data layouts commonly used in designing parallel algorithms to solve tridiagonal linear systems. Furthermore, algorithms are presented that have running times within a constant factor of the lower bounds provided. Lastly, a comparison of bounds for odd-even cyclic reduction and prefix summing is given.
This paper presents an efficient parallel algorithm for the shortest-path problem in interval graph for computing shortest-paths in a weighted interval graph that runs in O(n) time with n intervals in a graph. A linea...
详细信息
This paper presents an efficient parallel algorithm for the shortest-path problem in interval graph for computing shortest-paths in a weighted interval graph that runs in O(n) time with n intervals in a graph. A linear processor CRCW algorithm for determining the shortest-paths in an interval graphs is given.
The problem of solving tridiagonal linear systems on parallel distributed-memory environments is considered in this paper. In particular, two common direct methods for solving such systems are considered: odd-even cyc...
详细信息
The problem of solving tridiagonal linear systems on parallel distributed-memory environments is considered in this paper. In particular, two common direct methods for solving such systems are considered: odd-even cyclic reduction and prefix summing. For each method, a variety of lower bounds on execution time for solving tridiagonal linear systems are presented. Specifically, lower bounds are presented that (a) hold when the number of data items per processor is bounded, (b) are general lower bounds, and (c) for specific data layouts commonly used in designing parallel algorithms to solve tridiagonal linear systems. Furthermore, algorithms are presented that have running times within a constant factor of the lower bounds provided. Lastly, a comparison of bounds for odd-even cyclic reduction and prefix summing is given.
In this paper we study scheduling algorithms in WDM optical interconnects with recirculating buffering. The interconnect we consider has wavelength conversion capabilities. We focus on limited range wavelength convers...
详细信息
ISBN:
(纸本)0769521975
In this paper we study scheduling algorithms in WDM optical interconnects with recirculating buffering. The interconnect we consider has wavelength conversion capabilities. We focus on limited range wavelength conversion while considering full range wavelength conversion as a special case. We formalize the problem of maximizing throughput and minimizing packet delay in such an interconnect as a matching problem in a bipartite graph and give an optimal parallel algorithm that runs in O(Bk-2), as compared to O((N + B)(3)k(3)) time if directly applying other existing matching algorithms, where N is the number of input/output fibers, B is the number of fiber delay lines and k is the number of wavelengths per fiber.
In here we consider the problem of parallel execution of Join operation by a J2EE cluster. J2EE clusters are intended for coarse-grain distributed processing of multiple queries/business transactions over the Web. Thu...
详细信息
ISBN:
(纸本)0769522106
In here we consider the problem of parallel execution of Join operation by a J2EE cluster. J2EE clusters are intended for coarse-grain distributed processing of multiple queries/business transactions over the Web. Thus, the possiblity of using it J2EE cluster for fine-grain parallel computations (parallel Joins in our case) is intriguing and of practical interest. We have developed a new variant of the SFR algorithm for parallel computation of Cartesian Product in Join operations and proved its optimality in terms of communication/execution-time tradeoffs via a simple lower bound. Our experimental results show that despite the fact that J2EE is considered to be a platform that uses a complex interfaces and software entities, such as various types of Java beans, J2EE clusters can be efficiently used to execute Join operation in parallel.
暂无评论