Many problems on sequences and on special kinds of graphs involve the computation of longest chains passing points in the plane. Given a set S of n points in the plane, we consider the problem of computing the matrix ...
详细信息
Many problems on sequences and on special kinds of graphs involve the computation of longest chains passing points in the plane. Given a set S of n points in the plane, we consider the problem of computing the matrix of longest chain lengths between all pairs of points in S, and the matrix of ''parent'' pointers that describes the n longest chain trees. We present a simple sequential algorithm for computing these matrices. Our algorithm runs in O(n(2)) time, and hence is optimal. We also present a rather involved parallel algorithm that computes these matrices in O((log n)(2)) time using O(n(2)/log n) processors in the CREW PRAM model. These matrices enable us to report, in O(1) time, the length of a longest chain between any two points in S by using one processor, and the actual chain by using k processors, where k is the number of points of S on that chain. The space complexity of the algorithms is O(n(2)).
We present an efficient and scalable coarse grained multicomputer (CGM) coloring algorithm that colors a graph G with at most Delta + 1 colors where A is the maximum degree in G. This algorithm is given in two variant...
详细信息
We present an efficient and scalable coarse grained multicomputer (CGM) coloring algorithm that colors a graph G with at most Delta + 1 colors where A is the maximum degree in G. This algorithm is given in two variants: randomized and deterministic. We show that on a p-processor CGM model the proposed algorithms require a parallel time of O(\G\/p) and a total work and overall communication cost of O(\G\). These bounds correspond to the average case for the randomized version and to the worst case for the deterministic variant. (C) 2003 Elsevier B.V. All rights reserved.
parallel algorithms for the solution of linear-quadratic optimal control problems are described. The algorithms are based on a straightforward decomposition of the domain of the problem, and are related to multiple sh...
详细信息
parallel algorithms for the solution of linear-quadratic optimal control problems are described. The algorithms are based on a straightforward decomposition of the domain of the problem, and are related to multiple shooting methods for two-point boundary value problems. Their arithmetic cost is approximately twice that of the serial dynamic programming approach;however, they have the advantage that they can be efficiently implemented on a wide variety of parallel architectures. Extension to the case in which there are box constraints on the controls is simple. The algorithms can be used to solve linear-quadratic subproblems arising from the application of Newton's method or two-metric gradient projection methods to nonlinear problems.
An efficient parallel implementation of an algorithm for recursive least squares computations based upon the covariance updating method has been developed. The target architecture is a distributed-memory multiprocesso...
详细信息
An efficient parallel implementation of an algorithm for recursive least squares computations based upon the covariance updating method has been developed. The target architecture is a distributed-memory multiprocessor, and test results on an Intel iPSC/2 hypercube demonstrate the parallel efficiency of the algorithm. A 64-node system is measured to execute the algorithm over 48 times as fast as a single processor for the largest problem that fits on a single node (fixed-size speedup). Moreover, the computation times increase only slightly with an increase in the number of processors when the problem size per processor remains constant. Applications include robust regression in statistics and modification of the Hessian matrix in optimization, but the primary motivation for this work is the need for fast recursive least squares computations in adaptive filtering methods in signal processing.
In this paper, a robust, iterative algorithm is presented for computing Karush–Kuhn–Tucker (KKT) points of nonlinear programs. This algorithm is a variation of the NE/SQP method for solving the nonlinear complementa...
详细信息
In this paper, a robust, iterative algorithm is presented for computing Karush–Kuhn–Tucker (KKT) points of nonlinear programs. This algorithm is a variation of the NE/SQP method for solving the nonlinear complementarily problem; it makes use of a merit function that combines the original objective function of the nonlinear program with a residual function of the KKT conditions formulated as a system of nonsmooth equations. Global convergence and a Q-quadratic rate of convergence of the algorithm are established under some standard constraint qualifications in nonlinear programming theory, but without the strict complementarily condition. parallel implementations of the algorithm are discussed.
The control of robot manipulators is a complex problem because the differential equation that defines the system is nonlinear and coupled. In several control problems such as linear quadratic optimal control, it is ne...
详细信息
The control of robot manipulators is a complex problem because the differential equation that defines the system is nonlinear and coupled. In several control problems such as linear quadratic optimal control, it is necessary to work with a first order approximation of nonlinear equations. To obtain this approximation, the Lagrange-Euler equations have been used based on general theorems of mechanics. The linearization must be computed on-line, and this is the reason why parallel computing has been used. The parallel algorithms have been implemented on a low cost parallel platform based on a PC plus a board with transputers.
In this paper, the author presents an optimised parallel implementation of a flexible maximum a-posteriori decoder for synchronisation error correcting codes, supporting a very wide range of code sizes and channel con...
详细信息
In this paper, the author presents an optimised parallel implementation of a flexible maximum a-posteriori decoder for synchronisation error correcting codes, supporting a very wide range of code sizes and channel conditions. On mid-range GPUs the author demonstrates decoding speedups of more than two orders of magnitude over a central processing unit implementation of the same optimised algorithm, and more than an order of magnitude over the author's earlier GPU implementation. The prominent challenge is to maintain high parallelisation efficiency over a wide range of code sizes and channel conditions, and different execution hardware. The author ensures this with a dynamic strategy for choosing parallel execution parameters at run-time. They also present a variant that trades off some decoding speed for significantly reduced memory requirement, with no loss to the decoder's error correction performance. The increased throughput of their implementation and its ability to work with less memory allow us to analyse larger codes and poorer channel conditions, and makes practical use of such codes more feasible.
Given two sorted arrays A = (a1, a2,....., a(n1)) and B = (b1, b2,....,b(n2)) where their elements are drawn from a linear range in n and n = Max(n (1), n (2)). The merging of two sorted arrays is one of the fundament...
详细信息
Given two sorted arrays A = (a1, a2,....., a(n1)) and B = (b1, b2,....,b(n2)) where their elements are drawn from a linear range in n and n = Max(n (1), n (2)). The merging of two sorted arrays is one of the fundamental problems in computer science. The main contribution of this work is to give a new merging algorithm on a EREW PRAM. The algorithm is cost optimal, deterministic and simple. The algorithm uses n/logn processors and O(n) storage. We also give the conditions that make the algorithm run in a constant time on a EREW PRAM.
The architecture of a multicomputer system with switchable main memory modules (SM3) is presented. This architecture supports the efficient execution of parallel algorithms for nonnumeric processing by 1) allowing the...
详细信息
The architecture of a multicomputer system with switchable main memory modules (SM3) is presented. This architecture supports the efficient execution of parallel algorithms for nonnumeric processing by 1) allowing the sharing of switchable main memory modules between computers, 2) supporting dynamic partitioning of the system, and 3) employing global control lines to efficiently support interprocessor communication. Data transfer time is reduced to memory switching time by allowing some main memory modules to be switched between processors. Dynamic partitioning gives a common bus system the capability of an MIMD machine while performing global operations. The global control lines establish a quick and efficient high-level protocol in the system. The network is supervised by a control computer which oversees network partitioning and other global functions. The hardware involved is quite simple and the network is easily extensible. A simulation study using discrete event simulation techniques has been carried out and the results of the study are presented. The architecture of this system is compared to those of conventional local area networks and shared-memory systems in order to establish the distinct nature and characteristics of a multicomputer system based on the SM3 concept.
In the group mutual exclusion problem [Y Joung, Asynchronous group mutual exclusion, Distrib. Comput. 13 (2000) 189], which generalizes mutual exclusion [E. Dijkstra, Solution of a problem in concurrent programming co...
详细信息
In the group mutual exclusion problem [Y Joung, Asynchronous group mutual exclusion, Distrib. Comput. 13 (2000) 189], which generalizes mutual exclusion [E. Dijkstra, Solution of a problem in concurrent programming control, Comm. ACM 8 (9) (1965) 569], a process chooses a session when it requests entry into the Critical Section. A group mutual exclusion algorithm must ensure that the mutual exclusion property holds: if two processes are in the Critical Section at the same time, then they request the same session. In addition to mutual exclusion, lockout freedom, bounded exit, and concurrent entering are basic properties that are desirable in any group mutual exclusion algorithm. Hadzilacos in [Proc. 20th Annual Symp. on Principles of Distributed Computing, 2001, pp. 100-106] first introduced a fairness condition, called first-come-first-served (FCFS), for group mutual exclusion. The only known FCFS group mutual exclusion algorithm is due to Hadzilacos [Proc. 20th Annual Symp. on Principles of Distributed Computing, 2001, pp. 100-106], and requires Theta(N-2) bounded shared registers, where N is the number of processes. We present a FCFS group mutual exclusion algorithm that uses only Theta(N) bounded shared registers. (The existence of such an algorithm was posed as an open problem by Hadzilacos.) (c) 2005 Elsevier B.V. All rights reserved.
暂无评论