We give the first efficient parallel algorithms for solving the arrangement problem. We give a deterministic algorithm for the CREW PRAM which runs in nearly optimal bounds of O(log n log n) time and n2/log n processo...
详细信息
ISBN:
(纸本)0897913701
We give the first efficient parallel algorithms for solving the arrangement problem. We give a deterministic algorithm for the CREW PRAM which runs in nearly optimal bounds of O(log n log n) time and n2/log n processors. We generalize this to obtain an O(log n log n) time algorithm using nd/log n processors for solving the problem in d dimensions. We also give a randomized algorithm for the EREW PRAM that constructs an arrangement of n lines on-line, in which each insertion is done in optimal O(log n) time using n/log n processors. Our algorithms develop new parallel data structures and new methods for traversing an arrangement.
For programmable high-speed digital signal processing devices the proper architecture has to be carefully selected according to the algorithms to be implemented. The appropriate number of arithmetic units depends on t...
详细信息
For programmable high-speed digital signal processing devices the proper architecture has to be carefully selected according to the algorithms to be implemented. The appropriate number of arithmetic units depends on the degree of parallelism of the signal processing algorithm. The question of parallelism of algorithms is discussed. For the efficient exploitation of a given signal processor hardware, an appropriate processor schedule is necessary. In two examples different approaches for multiprocessor architectures are discussed.
We propose the first optimal parallel algorithm computing arrangements of hyperplanes in Ed (d ≥ 2). The algorithm is randomized and computes the arrangement of n hyperplanes within expected logarithmic time on a CRC...
详细信息
ISBN:
(纸本)0897913701
We propose the first optimal parallel algorithm computing arrangements of hyperplanes in Ed (d ≥ 2). The algorithm is randomized and computes the arrangement of n hyperplanes within expected logarithmic time on a CRCW-PRAM with O(nd/log n) processors.
This paper investigates the parallel time and processor complexities of several searching problems involving Monge and Monge-composite arrays. We present array-searching algorithms for concurrent-read-concurrent-write...
详细信息
ISBN:
(纸本)0897913701
This paper investigates the parallel time and processor complexities of several searching problems involving Monge and Monge-composite arrays. We present array-searching algorithms for concurrent-read-concurrent-write (CRCW) PRAMs, concurrent-read-exclusive-write (CREW) PRAMs, hypercubes, cube-connected-cycles, and shuffle-exchange networks. All these algorithms run in optimal time, and their processor-time products are all within an O(lg n) factor of the worst-case sequential bounds. Several applications of these algorithms are also given. Two applications improve previous results substantially, and the others provide novel parallel algorithms for problems not previously considered.
We introduce a task scheduling model which is useful in the design and analysis of algorithms for small parallel machines. We prove that under our model, the overhead experienced in scheduling an n × n grid graph...
详细信息
ISBN:
(纸本)0897913701
We introduce a task scheduling model which is useful in the design and analysis of algorithms for small parallel machines. We prove that under our model, the overhead experienced in scheduling an n × n grid graph is O(log log n) for p processors, p ≥ 2. We also prove a matching lower bound of Ω(log log n) for p processors, p ≥ 2. We give an extension of the model to cover the case where the processors can have varying speed or are subject to delay.
The time required to compute any function of a collection of circular doubly linked lists on a CROW PRAM is shown to be at most a constant factor more than on a CREW PRAM, but this is not true for singly linked lists....
详细信息
ISBN:
(纸本)0897913701
The time required to compute any function of a collection of circular doubly linked lists on a CROW PRAM is shown to be at most a constant factor more than on a CREW PRAM, but this is not true for singly linked lists. A tight lower bound of Ω(log log n) for colouring an n node doubly linked list on a CROW PRAM using a constant number of colours is also obtained.
Many regular algorithms, suitable for VLSI implementation, are naturally described by sets of integer index vectors together with a rule that assigns a computation to each vector. Regular VLSI structures for such algo...
详细信息
ISBN:
(纸本)0897913701
Many regular algorithms, suitable for VLSI implementation, are naturally described by sets of integer index vectors together with a rule that assigns a computation to each vector. Regular VLSI structures for such algorithms can be found by mapping the index vectors to a discrete space-time with integer coordinates. If the scope is restricted to linear or affine mappings, then the minimization of the execution time for the VLSI implementation with respect to the space-time mapping is essentially an integer linear programming (ILP) problem. If the entries in the vector describing the time function must be integers, ILP techniques can be applied directly. There are, however, index sets that allow space-time mappings with rational, non-integral entries. In such cases, ILP will not consider all possible affine time functions and an optimal solution may go unnoticed. In this paper we give sufficient conditions on the index set for when integer time functions are allowed only. We also give a general algorithm to find a 'preconditioning' affine transformation of the index set, such that the transformed index set allows only integer time functions. ILP methods can then be used to find time-optimal architectures for the transformed algorithm. This considerably extends the class of algorithms for which time-optimal VLSI structures can be found.
The Program Dependence Graph (PDG), which represents the data and the control dependences of a program is considered. Attention is limited to the subgraph of the PDG which contains only the control dependence edges. T...
详细信息
ISBN:
(纸本)0897913701
The Program Dependence Graph (PDG), which represents the data and the control dependences of a program is considered. Attention is limited to the subgraph of the PDG which contains only the control dependence edges. This subgraph is called the Control Dependence Graph (CDG). The authors formalize what it means for a CDG to have a corresponding sequential version and characterize the conditions under which there is such a corresponding sequentialization not requiring duplication.
In this paper we consider the problem of generating random permutations on small parallel machines. The machines that we have in mind are shared memory machines with a constant number of processors such as the Sequent...
详细信息
ISBN:
(纸本)0897913701
In this paper we consider the problem of generating random permutations on small parallel machines. The machines that we have in mind are shared memory machines with a constant number of processors such as the Sequent Symmetry. We describe a parallel implementation of the 'shuffling' algorithm for generating a random permutation. If the hardware operates in a fair manner, this algorithm generates a fully random permutation. However, if the machine resolves contention in a malicious manner, then the algorithm does not generate permutations uniformly. We give almost tight bounds on the degree that an adversary can reduce the randomness. We also discuss the cost of locking data in the algorithm and present a method of generating random permutations with substantially reduced locking cost.
We use O(log2 n) parallel arithmetic steps and n2 processors to compute the least-squares solution x = A+b to a linear system of equations, Ax = b, given a g × h matrix A and a vector b, both filled with integers...
详细信息
ISBN:
(纸本)0897913701
We use O(log2 n) parallel arithmetic steps and n2 processors to compute the least-squares solution x = A+b to a linear system of equations, Ax = b, given a g × h matrix A and a vector b, both filled with integers or with rational numbers, provided that g + h ≤ 2n and that A is given with its displacement generator of length r = O(1) and thus has displacement rank O(1). For a vector b and for a general p × q matrix A (with p + q ≤ n), we compute A+ and A+b by using O(log2 n) parallel arithmetic steps and n2.851 processors, and we may also compute A+b by using O(n2.376) arithmetic operations.
暂无评论