The Lovász Local Lemma (LLL) shows that, for a collection of "bad" events B in a probability space which are not too likely and not too interdependent, there is a positive probability that no events in ...
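For reference, the symmetric form of the lemma reads as follows (a standard statement supplied here for context, not quoted from the truncated abstract):

```latex
% Symmetric Lovász Local Lemma (standard form): if every bad event B_i
% satisfies Pr[B_i] <= p and is mutually independent of all but at most
% d of the other events, then
\[
  e\,p\,(d+1) \le 1
  \quad\Longrightarrow\quad
  \Pr\Big[\,\bigcap_i \overline{B_i}\,\Big] > 0 .
\]
```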
ISBN (print): 9781510836358
Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low (almost-) independence. A series of papers, beginning with work by Luby (1988) and continuing with Berger & Rompel (1991) and Chari et al. (1994), showed that these techniques can be combined to give deterministic parallel algorithms for combinatorial optimization problems involving sums of w-juntas. We improve these algorithms through derandomized variable partitioning. This reduces the processor complexity to essentially independent of w, while the running time is reduced from exponential in w to linear in w. For example, we improve the time complexity of an algorithm of Berger & Rompel (1991) for rainbow hypergraph coloring by a factor of approximately log^2 n and the processor complexity by a factor of approximately m^(ln 2). As a major application of this, we give an NC algorithm for the Lovász Local Lemma. Previous NC algorithms, including the seminal algorithm of Moser & Tardos (2010) and the work of Chandrasekaran et al. (2013), required that (essentially) the bad events could span only O(log n) variables; we relax this to allow polylog(n) variables. As two applications of our new algorithm, we give algorithms for defective vertex coloring and domatic graph partition. One main sub-problem encountered in these algorithms is to generate a probability space which can "fool" a given list of GF(2) Fourier characters. Schulman (1992) gave an NC algorithm for this; we dramatically improve its efficiency to near-optimal time complexity, processor complexity, and code dimension. This leads to a new algorithm for the heavy-codeword problem, introduced by Naor & Naor (1993), with a near-linear processor complexity of (mn)^(1+o(1)).
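As a minimal illustration of the method of conditional expectations mentioned above, here is the standard textbook derandomization of MAX-CUT (an assumed, simpler setting than the paper's sums of w-juntas): variables are fixed one at a time, always choosing the value that keeps the conditional expected cut at least as large as the unconditional expectation of len(edges)/2.

```python
def conditional_expectation_cut(n, edges):
    """Derandomized MAX-CUT via the method of conditional expectations.

    A uniformly random side assignment cuts each edge with probability
    1/2, so the expected cut is len(edges) / 2.  Fixing vertices one at
    a time, always choosing the side with the larger conditional
    expectation, never decreases it, so the final cut has size
    >= len(edges) / 2.
    """
    side = {}
    for v in range(n):
        # Edges with one still-unfixed endpoint contribute 1/2 to the
        # conditional expectation regardless of the choice for v, so it
        # suffices to compare the contribution of fully fixed edges.
        def fixed_gain(s):
            return sum(1 for (a, b) in edges
                       if (a == v and b in side and side[b] != s)
                       or (b == v and a in side and side[a] != s))
        side[v] = 0 if fixed_gain(0) >= fixed_gain(1) else 1
    cut = sum(1 for (a, b) in edges if side[a] != side[b])
    return side, cut

# Usage: the guarantee is cut >= len(edges) / 2.
edges = [(0, 1), (1, 2), (2, 0), (0, 3)]
side, cut = conditional_expectation_cut(4, edges)
assert cut >= len(edges) / 2
```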
We present a new parallel algorithm for solving triangular systems with multiple right-hand sides (TRSM). TRSM is used extensively in numerical linear algebra computations, both to solve triangular linear systems of equations and to compute factorizations with triangular matrices, such as Cholesky, LU, and QR. Our algorithm achieves better theoretical scalability than known alternatives, while maintaining numerical stability, via selective use of triangular matrix inversion. We leverage the fact that triangular inversion and matrix multiplication are more parallelizable than the standard TRSM algorithm. By inverting only the triangular blocks along the diagonal of the initial matrix, we generalize both the usual way of computing TRSM and the full matrix inversion approach. This flexibility leads to an efficient algorithm for any ratio of the number of right-hand sides to the triangular matrix dimension. We provide a detailed communication cost analysis for our algorithm, as well as for recursive triangular matrix inversion. This cost analysis makes it possible to determine optimal block sizes and processor grids a priori. Relative to the best known algorithms for TRSM, our approach can require asymptotically fewer messages, while performing optimal amounts of computation and communication in terms of words sent.
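A serial NumPy sketch of the central idea, with an assumed block size and illustrative names (the paper's algorithm additionally chooses block sizes and processor grids from its communication-cost analysis): invert only the diagonal blocks, so that the remaining work is matrix multiplication.

```python
import numpy as np

def blocked_trsm(L, B, block=64):
    """Solve L @ X = B for X, with L lower triangular, by inverting only
    the diagonal blocks of L.  The block application and the trailing
    update are plain matrix multiplications, which parallelize better
    than a scalar triangular solve."""
    n = L.shape[0]
    X = B.astype(float)
    for i in range(0, n, block):
        j = min(i + block, n)
        Lii_inv = np.linalg.inv(L[i:j, i:j])   # small diagonal block only
        X[i:j] = Lii_inv @ X[i:j]              # apply via matmul
        if j < n:
            X[j:] -= L[j:, i:j] @ X[i:j]       # trailing update via matmul
    return X

# Usage: compare against a dense solve on a well-conditioned example.
rng = np.random.default_rng(0)
L = np.tril(rng.standard_normal((200, 200))) + 200 * np.eye(200)
B = rng.standard_normal((200, 8))
assert np.allclose(blocked_trsm(L, B), np.linalg.solve(L, B))
```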
Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low independence. A series of papers, beginning with work by Luby (1988), showe...
ISBN (print): 9781509028238
The k-center problem is a classic NP-hard clustering question. For contemporary massive data sets, RAM-based algorithms become impractical. Although there exist good algorithms for k-center, they are all inherently sequential. In this paper, we design and implement parallel approximation algorithms for k-center. We observe that Gonzalez's greedy algorithm can be efficiently parallelized in several MapReduce rounds; in practice, we find that two rounds are sufficient, leading to a 4-approximation. This parallel scheme is about 100 times faster than the sequential Gonzalez algorithm and barely compromises solution quality. We contrast this with an existing parallel algorithm for k-center that offers a 10-approximation. Our analysis reveals that this scheme is often slow, and that its sampling procedure only runs if k is sufficiently small relative to the input size. In practice it is slightly more effective than Gonzalez's approach, but slow. To trade off runtime for approximation guarantee, we parameterize this sampling algorithm. We prove a lower bound on the parameter for effectiveness, and find experimentally that with values even lower than the bound, the algorithm is not only faster, but sometimes more effective.
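A sketch of both ingredients in Python, with illustrative names and an arbitrary partitioning policy (an actual implementation would run the map step as a MapReduce round over the partitions):

```python
import numpy as np

def gonzalez(points, k):
    """Gonzalez's greedy 2-approximation for k-center: repeatedly add
    the point farthest from the centers chosen so far."""
    centers = [points[0]]
    dist = np.linalg.norm(points - centers[0], axis=1)
    for _ in range(k - 1):
        far = int(np.argmax(dist))
        centers.append(points[far])
        dist = np.minimum(dist, np.linalg.norm(points - points[far], axis=1))
    return np.array(centers)

def two_round_kcenter(points, k, n_parts=8):
    """Two-round scheme as described above: round 1 runs Gonzalez on
    each partition (the parallelizable map step); round 2 runs Gonzalez
    on the union of the local centers.  Composing the two
    2-approximations yields the 4-approximation."""
    parts = np.array_split(points, n_parts)
    local = np.vstack([gonzalez(p, k) for p in parts])
    return gonzalez(local, k)

# Usage on synthetic data.
rng = np.random.default_rng(0)
centers = two_round_kcenter(rng.random((10000, 2)), k=10)
```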
We investigate the problem of choosing blocks of operations and threads for a parallel algorithm so as to reduce the number of accesses to global memory and to make efficient use of the caches and shared memory of a graphics processor. We formulate and prove statements that assess the volume of communication transactions generated by alternative block sizes, and that minimize the number of cache misses by exploiting the temporal and spatial locality of data. The approach is constructive and admits a software implementation for practical use.
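As an illustrative back-of-the-envelope instance of this kind of analysis (an assumption for concreteness, not a statement from the paper): for tiled n x n matrix multiplication with T x T tiles staged in shared memory, each input element is loaded about n/T times, so global traffic shrinks as the tile grows until two tiles no longer fit in shared memory.

```python
def pick_tile(n, shared_mem_bytes=48 * 1024, elem_bytes=4):
    """Illustrative locality model (hypothetical parameters): tiled
    n x n matrix multiply incurs roughly 2 * n**3 / T global loads with
    T x T tiles, so choose the largest T whose two input tiles fit in
    the assumed shared-memory budget."""
    best = None
    for T in (8, 16, 32, 64, 128):           # candidate tile widths
        if 2 * T * T * elem_bytes <= shared_mem_bytes and n % T == 0:
            loads = 2 * n**3 // T            # fewer loads as T grows
            best = (T, loads)
    return best

print(pick_tile(1024))  # (64, 33554432) under the 48 KiB assumption
```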
ISBN (print): 9781450350648
We present two new parallel algorithms which compute the GCD of n integers of O(n) bits in O(n / log n) time with O(n^(2+epsilon)) processors in the worst case, for any epsilon > 0, in the CRCW PRAM model. More generally, we prove that computing the GCD of m integers of O(n) bits can be achieved in O(n / log n) parallel time with O(m n^(1+epsilon)) processors, for any 2 <= m <= n^(3/2) / log n; i.e., the parallel time does not depend on the number m of integers considered in this range. We suggest an extended GCD version for many integers, as well as an algorithm to solve linear Diophantine equations.
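The parallel structure of the many-integer problem can be conveyed by a much simpler balanced-tree reduction (a sketch only; the PRAM algorithms above are far more refined): all GCDs at one level of the tree are independent and could run simultaneously.

```python
import math

def tree_gcd(xs):
    """GCD of many integers by balanced-tree reduction.  Each level's
    pairwise GCDs are mutually independent, so one level corresponds to
    one parallel round."""
    xs = list(xs)
    while len(xs) > 1:
        pairs = [math.gcd(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:              # carry an unpaired element forward
            pairs.append(xs[-1])
        xs = pairs
    return xs[0]

assert tree_gcd([84, 105, 210, 63]) == 21
```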
Tensor factorization has proven useful in a wide range of applications, from sensor array processing to communications, speech and audio signal processing, and machine learning. With few recent exceptions, all tensor factorization algorithms were originally developed for centralized, in-memory computation on a single machine; and the few that break away from this mold do not easily incorporate practically important constraints, such as non-negativity. A new constrained tensor factorization framework is proposed in this paper, building upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that this simplifies computations, bypassing the need to solve constrained optimization problems in each iteration; and it naturally leads to distributed algorithms suitable for parallel implementation. This opens the door for many emerging big data-enabled applications. The methodology is exemplified using non-negativity as a baseline constraint, but the proposed framework can incorporate many other types of constraints. Numerical experiments are encouraging, indicating that ADMoM-based non-negative tensor factorization (NTF) has high potential as an alternative to state-of-the-art approaches.
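A minimal sketch of the ADMM building block, assuming a single non-negative least-squares subproblem of the kind an NTF algorithm alternates over factor matrices (names and parameters here are illustrative, not the paper's): the x-update is an unconstrained linear solve, and non-negativity is enforced by a cheap projection rather than a constrained solver.

```python
import numpy as np

def admm_nnls(A, b, rho=1.0, iters=200):
    """ADMM for min 0.5 * ||A x - b||^2  s.t.  x >= 0."""
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    M = AtA + rho * np.eye(n)        # formed once, reused every iteration
    x = z = u = np.zeros(n)
    for _ in range(iters):
        x = np.linalg.solve(M, Atb + rho * (z - u))  # unconstrained solve
        z = np.maximum(0.0, x + u)                   # projection onto x >= 0
        u = u + x - z                                # dual update
    return z

# Usage: approximately recover a non-negative solution.
rng = np.random.default_rng(1)
A, x_true = rng.random((30, 5)), rng.random(5)
print(admm_nnls(A, A @ x_true))
```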
ISBN (print): 9781467386838
This paper proposes a parallel framework for solving robust convex programs over networks in a distributed fashion when the constraints are affected by uncertainty. To this end, we adopt a probabilistic approach based on randomly sampling the uncertainty to obtain a standard convex optimization problem, known as the scenario problem. However, the number of samples needed to attain a high level of probabilistic guarantee of robustness may be large, which results in a large number of constraints in the scenario problem. Instead of using a single processor, we resort to multiple processors distributed among the nodes of a network. We study recursive algorithms which parallelize the computational task across the nodes and collaboratively solve the problem effectively. Under local communication links, we show that each node asymptotically provides a solution to the scenario optimization problem.
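A centralized sketch of the scenario approach itself (the paper instead spreads the sampled constraints over network nodes; the data and parameters here are invented for illustration): sample the uncertainty, impose one constraint per sample, and solve the resulting ordinary convex program.

```python
import numpy as np
from scipy.optimize import linprog

# Robustify a^T x <= b against an uncertain row a by sampling N
# realizations and imposing each sampled row as its own constraint.
rng = np.random.default_rng(2)
a_nom, b, N = np.array([1.0, 2.0]), 4.0, 500
A_ub = a_nom + 0.3 * rng.standard_normal((N, 2))  # sampled uncertain rows
res = linprog(c=[-1.0, -1.0],                     # maximize x1 + x2
              A_ub=A_ub, b_ub=np.full(N, b),
              bounds=[(0, None), (0, None)])
print(res.x)  # feasible for all N sampled scenarios
```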