This paper considers the problem of solving tridiagonal linear systems in parallel distributed-memory environments. In particular, it considers two common direct methods for such systems: odd-even cyclic reduction and prefix summing. For each method, a variety of lower bounds on the execution time for solving tridiagonal linear systems are presented. Specifically, the lower bounds presented (a) hold when the number of data items per processor is bounded, (b) hold in general, and (c) hold for specific data layouts commonly used in designing parallel algorithms for tridiagonal linear systems. Furthermore, algorithms are presented whose running times are within a constant factor of these lower bounds. Lastly, the bounds for odd-even cyclic reduction and prefix summing are compared.
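The abstract names odd-even cyclic reduction without showing it. The following is a minimal sequential sketch of the textbook recurrence, written under assumptions of our own: the band storage (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side), the function names, and a matrix that never produces a zero pivot (for example, a diagonally dominant one). It is not the paper's distributed-memory algorithm and does not cover prefix summing. The iterations inside each reduction level are independent of one another, which is where the parallelism, and the roughly $\log_2 n$ levels, come from.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.

    Assumes a[0] == 0 and c[n-1] == 0 (no neighbours outside the system)
    and that no zero pivot b[i] is met (e.g. a diagonally dominant matrix).
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    # Reduction step: each even-indexed equation absorbs its odd neighbours,
    # eliminating x[i-1] and x[i+1]. The iterations are independent of one
    # another, so a parallel machine could perform them all in one step.
    a2, b2, c2, d2 = [], [], [], []
    for i in range(0, n, 2):
        new_a, new_b, new_c, new_d = 0.0, b[i], 0.0, d[i]
        if i - 1 >= 0:
            alpha = -a[i] / b[i - 1]
            new_a = alpha * a[i - 1]
            new_b += alpha * c[i - 1]
            new_d += alpha * d[i - 1]
        if i + 1 < n:
            gamma = -c[i] / b[i + 1]
            new_c = gamma * c[i + 1]
            new_b += gamma * a[i + 1]
            new_d += gamma * d[i + 1]
        a2.append(new_a)
        b2.append(new_b)
        c2.append(new_c)
        d2.append(new_d)
    # Recurse on the half-size system in the even-indexed unknowns.
    x = [0.0] * n
    for k, value in enumerate(cyclic_reduction(a2, b2, c2, d2)):
        x[2 * k] = value
    # Back-substitution: each odd unknown depends only on its even
    # neighbours, so these updates are also mutually independent.
    for i in range(1, n, 2):
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - a[i] * x[i - 1] - right) / b[i]
    return x


def tridiag_matvec(a, b, c, x):
    """Multiply the tridiagonal matrix stored as (a, b, c) by the vector x."""
    n = len(b)
    return [
        (a[i] * x[i - 1] if i > 0 else 0.0)
        + b[i] * x[i]
        + (c[i] * x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


n = 7
a = [0.0] + [1.0] * (n - 1)
b = [4.0] * n
c = [1.0] * (n - 1) + [0.0]
x_true = [float(i + 1) for i in range(n)]
x = cyclic_reduction(a, b, c, tridiag_matvec(a, b, c, x_true))
print([round(v, 9) for v in x])
```

Building the right-hand side from a known solution makes it easy to check that the recovered `x` matches `x_true` for any system size, including sizes that are not powers of two.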
We study the noncommutative rank problem, ncRANK, of computing the rank of matrices with linear entries in n noncommuting variables and the problem of noncommutative Rational Identity Testing, RIT, which is to decide ...
Path checking, the special case of the model checking problem where the model under consideration is a single path, plays an important role in monitoring, testing, and verification. We prove that for linear-time temporal logic (LTL), path checking can be efficiently parallelized. In addition to the core logic, we consider the extensions of LTL with bounded-future (BLTL) and past-time (LTL+Past) operators. Even though both extensions improve the succinctness of the logic exponentially, path checking remains efficiently parallelizable: our algorithm for LTL, LTL+Past, and BLTL+Past is in $\mathrm{AC}^1(\mathrm{logDCFL}) \subseteq \mathrm{NC}$.
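To make the path checking problem concrete, the following sketch evaluates an LTL+Past formula at every position of a single finite path. It assumes finite-trace semantics in which strong next (`X`) is false at the last position and yesterday (`Y`) is false at the first; the tuple encoding of formulas and the function name are our own. This is the straightforward sequential dynamic program, running in time proportional to formula size times path length, not the paper's $\mathrm{AC}^1(\mathrm{logDCFL})$ algorithm, and it omits the bounded-future operators of BLTL.

```python
def holds(formula, path):
    """Truth value of `formula` at every position of the finite `path`.

    A formula is a nested tuple, e.g. ("U", ("ap", "p"), ("ap", "q")).
    `path` is a list of sets: the atomic propositions true at each position.
    """
    n = len(path)
    op = formula[0]
    if op == "true":
        return [True] * n
    if op == "ap":
        return [formula[1] in state for state in path]
    if op == "not":
        return [not v for v in holds(formula[1], path)]
    if op == "and":
        left, right = holds(formula[1], path), holds(formula[2], path)
        return [lv and rv for lv, rv in zip(left, right)]
    if op == "X":  # strong next: false at the last position
        sub = holds(formula[1], path)
        return [i + 1 < n and sub[i + 1] for i in range(n)]
    if op == "Y":  # yesterday (past-time): false at the first position
        sub = holds(formula[1], path)
        return [i > 0 and sub[i - 1] for i in range(n)]
    if op in ("U", "S"):
        left, right = holds(formula[1], path), holds(formula[2], path)
        out = [False] * n
        # Until is resolved backwards from the end of the path,
        # Since forwards from its start; both use the same recurrence.
        order = range(n - 1, -1, -1) if op == "U" else range(n)
        carry = False
        for i in order:
            carry = right[i] or (left[i] and carry)
            out[i] = carry
        return out
    raise ValueError(f"unknown operator {op!r}")


p, q = ("ap", "p"), ("ap", "q")
print(holds(("U", p, q), [{"p"}, {"p"}, {"q"}, set()]))
```

"Eventually q" can be written as `("U", ("true",), q)`; each operator costs one linear pass over the path once its subformulas are evaluated.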
We give an overview of the computational complexity of linear and mesh-connected cellular and iterative arrays with respect to well-known models of sequential and parallel computation. We discuss one-way versus two-way communication, serial versus parallel input, and space-efficient simulations. In particular, we look at the parallel complexity of cellular arrays in terms of PRAM theory and its implications, e.g., for the parallel complexity of recurrence equations and loops. We also point out some important and fundamental problems that remain open. Next, we investigate the solvability of some reachability and safety problems concerning machines operating in parallel and cite some possible applications. Finally, we briefly discuss the complexity of the "commutativity analysis" technique used in parallel computing and parallelizing compilers.
Power circuits were introduced in 2012 by Myasnikov, Ushakov and Won as a data structure for non-elementarily compressed integers that supports the arithmetic operations addition and $(x, y) \mapsto x \cdot 2^y$. The same authors applied power circuits to give a polynomial-time solution to the word problem of the Baumslag group, which has a non-elementary Dehn function. In this work, we examine power circuits and the word problem of the Baumslag group from the perspective of parallel complexity. In particular, we establish that the word problem of the Baumslag group can be solved in NC, even though one of the essential steps is to compare two integers given by power circuits, a problem that is shown to be P-complete in general. The key observation is that the power circuits that occur have logarithmic depth, and such power circuits can be compared in NC.
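The sketch below illustrates why power circuits compress integers non-elementarily. It models a power circuit, as we understand the general idea, as an acyclic graph whose edges carry a sign of +1 or -1, where a node's value is the sum of `sign * 2**value(child)` over its children (so a childless node has value 0), and an integer is read off a "marking", a signed set of nodes. The dictionary encoding, the function names, and the rejection of negative node values are assumptions of this sketch. It evaluates values explicitly, which is only feasible for tiny circuits; it does not implement the operation $(x, y) \mapsto x \cdot 2^y$ or the NC comparison from the paper.

```python
def evaluate(succ):
    """Value of every node; succ maps node -> list of (child, sign) pairs."""
    values = {}

    def value(node):
        if node not in values:
            v = sum(sign * 2 ** value(child) for child, sign in succ[node])
            if v < 0:  # assumption of this sketch: node values stay >= 0
                raise ValueError(f"node {node!r} has negative value {v}")
            values[node] = v
        return values[node]

    for node in succ:
        value(node)
    return values


def depth(succ):
    """Length of the longest path from any node down to a childless node."""
    memo = {}

    def d(node):
        if node not in memo:
            memo[node] = 1 + max((d(child) for child, _ in succ[node]), default=-1)
        return memo[node]

    return max((d(node) for node in succ), default=0)


def marking_value(marking, values):
    """Integer represented by a marking: a map node -> sign in {+1, -1}."""
    return sum(sign * 2 ** values[node] for node, sign in marking.items())


# A chain of five nodes: each node exponentiates the one below it.
tower = {
    "t0": [],
    "t1": [("t0", 1)],
    "t2": [("t1", 1)],
    "t3": [("t2", 1)],
    "t4": [("t3", 1)],
}
vals = evaluate(tower)
print(vals, depth(tower), marking_value({"t4": 1, "t0": -1}, vals))
```

Adding one more node on top of `t4` would represent $2^{65536}$, so the represented value grows as a tower of exponentials in the number of nodes. In this chain the depth equals the number of edges, whereas the abstract's key observation concerns circuits of logarithmic depth.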
To reduce a graph problem to its planar version, a standard technique is to replace crossings in a drawing of the input graph by planarizing gadgets. We show unconditionally that such a reduction is not possible for the perfect matching problem and also extend this to some other problems related to perfect matching. We further show that there is no planarizing gadget for the Hamiltonian cycle problem.
We show that the bipartite perfect matching problem is in quasi-$\mathrm{NC}^2$. That is, it has uniform circuits of quasi-polynomial size $n^{O(\log n)}$ and $O(\log^2 n)$ depth. Previously, only an exponential upper bound was known on the size of such circuits with poly-logarithmic depth. We obtain our result by an almost complete derandomization of the famous Isolation Lemma when applied to yield an efficient randomized parallel algorithm for the bipartite perfect matching problem.
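The randomized algorithm that this result derandomizes relies on the Isolation Lemma: if each of the $m$ edges gets an independent uniformly random weight from $\{1, \dots, 2m\}$, then with probability at least $1/2$ the minimum-weight perfect matching is unique. The following sketch checks that property empirically on a tiny bipartite graph by brute-force enumeration. The function names and the test graph are our own; the enumeration is exponential and purely illustrative, and the sketch does not attempt the paper's derandomization or any parallel matching algorithm.

```python
import itertools
import random


def perfect_matchings(n, edges):
    """All perfect matchings of a bipartite graph with sides {0..n-1}.

    Edges are pairs (left, right). Brute force over permutations: tiny n only.
    """
    edge_set = set(edges)
    for perm in itertools.permutations(range(n)):
        if all((u, perm[u]) in edge_set for u in range(n)):
            yield tuple((u, perm[u]) for u in range(n))


def is_isolated(n, edges, weight):
    """True if exactly one perfect matching attains the minimum total weight."""
    totals = sorted(sum(weight[e] for e in m) for m in perfect_matchings(n, edges))
    return len(totals) == 1 or (len(totals) > 1 and totals[0] < totals[1])


def isolation_rate(n, edges, trials, seed=0):
    """Fraction of random weightings from {1, ..., 2m} that isolate a matching."""
    rng = random.Random(seed)
    m = len(edges)
    hits = 0
    for _ in range(trials):
        weight = {e: rng.randint(1, 2 * m) for e in edges}
        hits += is_isolated(n, edges, weight)
    return hits / trials


k33 = [(u, v) for u in range(3) for v in range(3)]  # complete bipartite K_{3,3}
print(isolation_rate(3, k33, trials=2000))
```

With all weights equal, every one of the six perfect matchings of $K_{3,3}$ ties, so isolation fails; random weights break these ties in at least half of the trials in expectation, which is what lets a parallel algorithm single out one matching.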
ISBN: 9780818685910 (print)
To achieve efficient execution in practice on a parallel architecture, knowledge of the application's data dependencies appears to be the key to building an efficient schedule. By restricting accesses to shared memory, we show that such a data-dependency graph can be computed online on a distributed architecture. The overhead introduced is bounded with respect to the parallelism expressed by the user: each basic computation corresponds to a user-defined task, and each data dependency to a user-defined data structure. We introduce a language named Athapascan-1 that allows a dependency graph to be built from a strong typing of shared-memory accesses. We detail the compilation and implementation of the language. In addition, the performance of a program (parallel time, communication and arithmetic work, memory space) is defined from a cost model without the need for a machine model. We exhibit efficient scheduling with respect to these costs on theoretical machine models.
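The idea of computing the data-dependency graph online from declared shared-memory accesses can be sketched as follows. Each task, when spawned, states which shared objects it reads and writes, and the tracker adds read-after-write, write-after-write, and write-after-read edges. The class and method names are hypothetical and are not Athapascan-1's interface, and the `span` method assumes unit cost per task as a stand-in for the paper's cost model.

```python
from collections import defaultdict


class DependencyTracker:
    """Builds a task dependency graph online from declared accesses.

    Hypothetical interface for illustration; not Athapascan-1's API.
    """

    def __init__(self):
        self.last_writer = {}             # object -> last task that wrote it
        self.readers = defaultdict(list)  # object -> readers since that write
        self.edges = set()                # (predecessor, successor) pairs
        self.order = []                   # tasks in spawn order

    def spawn(self, task, reads=(), writes=()):
        for obj in reads:                 # read-after-write
            if obj in self.last_writer:
                self.edges.add((self.last_writer[obj], task))
        for obj in writes:
            if obj in self.last_writer:   # write-after-write
                self.edges.add((self.last_writer[obj], task))
            for reader in self.readers[obj]:
                self.edges.add((reader, task))  # write-after-read
        for obj in reads:
            self.readers[obj].append(task)
        for obj in writes:
            self.last_writer[obj] = task
            self.readers[obj] = []
        self.order.append(task)

    def span(self):
        """Longest chain of tasks, assuming each task costs one time unit."""
        preds = defaultdict(list)
        for u, v in self.edges:
            preds[v].append(u)
        finish = {}
        for t in self.order:              # edges only point to later spawns
            finish[t] = 1 + max((finish[p] for p in preds[t]), default=0)
        return max(finish.values(), default=0)


g = DependencyTracker()
g.spawn("t1", writes=["x"])
g.spawn("t2", reads=["x"], writes=["y"])
g.spawn("t3", reads=["x"])
g.spawn("t4", writes=["x"])
g.spawn("t5", reads=["x", "y"])
print(sorted(g.edges), "work:", len(g.order), "span:", g.span())
```

Here `t2` and `t3` only read `x` after `t1`, so no edge joins them and a scheduler may run them concurrently; `t4` must wait for both because it overwrites `x`. Because every edge points from an earlier spawn to a later one, the graph is available incrementally, which matches the online setting the abstract describes.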