ISBN (print): 9781611974782
Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low (almost-) independence. A series of papers, beginning with work by Luby (1988) and continuing with Berger & Rompel (1991) and Chari et al. (1994), showed that these techniques can be combined to give deterministic parallel algorithms for combinatorial optimization problems involving sums of w-juntas. We improve these algorithms through derandomized variable partitioning. This makes the processor complexity essentially independent of w, while the running time is reduced from exponential in w to linear in w. For example, we improve the time complexity of an algorithm of Berger & Rompel (1991) for rainbow hypergraph coloring by a factor of approximately log^2 n and the processor complexity by a factor of approximately m^(ln 2). As a major application of this, we give an NC algorithm for the Lovasz Local Lemma. Previous NC algorithms, including the seminal algorithm of Moser & Tardos (2010) and the work of Chandrasekaran et al. (2013), required that (essentially) the bad events could span only O(log n) variables; we relax this to polylog(n) variables. As two applications of our new algorithm, we give algorithms for defective vertex coloring and domatic graph partition. One main sub-problem encountered in these algorithms is to generate a probability space which can "fool" a given list of GF(2) Fourier characters. Schulman (1992) gave an NC algorithm for this; we dramatically improve its efficiency to near-optimal time and processor complexity and code dimension. This leads to a new algorithm for the heavy-codeword problem, introduced by Naor & Naor (1993), with a near-linear processor complexity of (mn)^(1+o(1)).
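To illustrate what it means for a small probability space to "fool" GF(2) Fourier characters, here is a minimal Python sketch. It is not the paper's near-optimal construction: it simply derives each variable from a short common seed via a GF(2) inner product, so that every character whose column vectors XOR to a nonzero value has bias exactly zero over the tiny seed space. The parameters n and SEED_BITS and the random choice of columns are illustrative only.

import itertools
import random

n = 8            # number of variables x_1..x_n
SEED_BITS = 4    # the whole space has 2**SEED_BITS points instead of 2**n

# give each variable a distinct nonzero GF(2) column vector of length SEED_BITS
columns = random.sample(range(1, 2 ** SEED_BITS), n)

def sample_point(seed):
    # x_i = <a_i, seed> over GF(2); the seed ranges over all 2**SEED_BITS values
    return [bin(seed & a).count("1") % 2 for a in columns]

def bias(S):
    # E[(-1)^(XOR of x_i for i in S)] computed over the whole small space
    total = 0
    for seed in range(2 ** SEED_BITS):
        x = sample_point(seed)
        total += (-1) ** (sum(x[i] for i in S) % 2)
    return total / 2 ** SEED_BITS

# every character whose column vectors XOR to a nonzero value is fooled exactly
for S in itertools.combinations(range(n), 2):
    if columns[S[0]] ^ columns[S[1]] != 0:
        assert bias(S) == 0.0
print("all size-2 characters with nonzero column-XOR have bias exactly 0")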
The nearest neighbor search problem in general dimensions finds application in computational geometry, computational statistics, pattern recognition, and machine learning. Although there is a significant body of work on theory and algorithms, surprisingly little work has been done on algorithms for high-end computing platforms, and no open source library exists that can scale efficiently to thousands of cores. In this paper, we present algorithms and a library built on top of the message passing interface (MPI) and OpenMP that enable nearest neighbor searches to scale to hundreds of thousands of cores for arbitrary-dimensional datasets. The library supports both exact and approximate nearest neighbor searches. The latter is based on iterative, randomized, and greedy KD-tree (k-dimensional tree) searches. We describe novel algorithms for the construction of the KD-tree, give complexity analysis, and provide experimental evidence for the scalability of the method. In our largest runs, we were able to perform an all-neighbors query search on a 13 TB synthetic dataset of 0.8 billion points in 2,048 dimensions on 131K cores of Oak Ridge's XK6 "Jaguar" system. These results represent several orders of magnitude improvement over current state-of-the-art methods. Also, we apply our method to nonsynthetic data from machine learning data repositories. For example, we perform an all-nearest-neighbors search on a variant of the "MNIST" handwritten digit dataset with 8 million points in 784 dimensions on 16,384 cores of the "Stampede" system at the Texas Advanced Computing Center, achieving less than one second per randomized KD-tree (RKDT) iteration.
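For intuition about the randomized, greedy KD-tree searches the library builds on, here is a minimal single-node Python sketch. It has no MPI or OpenMP component and does not reproduce the library's actual algorithms: each tree splits on a random coordinate at the median, a query descends greedily to a single leaf, and repeating the search over several independent random trees improves the chance of finding the true nearest neighbor. The leaf size, the number of trees, and the data are illustrative.

import numpy as np

def build(points, idx, leaf_size, rng):
    # split on a random coordinate at the median; stop at small or degenerate nodes
    if len(idx) <= leaf_size:
        return ("leaf", idx)
    dim = rng.integers(points.shape[1])
    vals = points[idx, dim]
    median = np.median(vals)
    left, right = idx[vals <= median], idx[vals > median]
    if len(left) == 0 or len(right) == 0:
        return ("leaf", idx)
    return ("node", dim, median,
            build(points, left, leaf_size, rng),
            build(points, right, leaf_size, rng))

def greedy_query(tree, points, q):
    # descend to a single leaf (no backtracking), then scan that leaf exactly
    while tree[0] == "node":
        _, dim, median, lo, hi = tree
        tree = lo if q[dim] <= median else hi
    idx = tree[1]
    dists = np.linalg.norm(points[idx] - q, axis=1)
    return idx[np.argmin(dists)], dists.min()

rng = np.random.default_rng(0)
pts = rng.normal(size=(10000, 64))
q = rng.normal(size=64)
# several independent random trees; keep the best greedy answer found
best = min((greedy_query(build(pts, np.arange(len(pts)), 16, rng), pts, q)
            for _ in range(8)), key=lambda t: t[1])
print("approximate nearest neighbor (index, distance):", best)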
Parallel parameterized complexity theory studies how fixed-parameter tractable (fpt) problems can be solved in parallel. Previous theoretical work focused on parallel algorithms that are very fast in principle, but di...
ISBN (print): 9781510836358
The Lovasz Local Lemma (LLL) is a cornerstone principle in the probabilistic method of combinatorics, and a seminal algorithm of Moser & Tardos (2010) provides an efficient randomized algorithm to implement it. This algorithm can be parallelized to give an algorithm that uses polynomially many processors and runs in O(log^3 n) time, stemming from O(log n) adaptive computations of a maximal independent set (MIS). Chung et al. (2014) developed faster local and parallel algorithms, potentially running in time O(log^2 n), but these algorithms work under significantly more stringent conditions than the LLL. We give a new parallel algorithm that works under essentially the same conditions as the original algorithm of Moser & Tardos but uses only a single MIS computation, thus running in O(log^2 n) time. This conceptually new algorithm also gives a clean combinatorial description of a satisfying assignment which might be of independent interest. Our techniques extend to the deterministic LLL algorithm given by Chandrasekaran et al. (2013), leading to an NC algorithm running in O(log^2 n) time as well. We also provide improved bounds on the running times of the sequential and parallel resampling-based algorithms originally developed by Moser & Tardos. Our bounds extend to any problem instance in which the tighter Shearer LLL criterion is satisfied. We also improve on the analysis of Kolipaka & Szegedy (2011) to give tighter concentration results.
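For context, the sequential resampling algorithm of Moser & Tardos that this abstract builds on can be stated in a few lines. The sketch below runs it on a tiny, illustrative 3-SAT instance (a bad event is a violated clause); the parallel, MIS-based, and deterministic variants discussed above are not shown.

import random

# clauses as lists of signed literals: +i means x_i, -i means NOT x_i (1-indexed);
# this tiny formula is illustrative only
clauses = [[1, 2, -3], [-1, 3, 4], [2, -4, 5], [-2, -3, -5]]
n = 5

def violated(assign, clause):
    # a clause is a bad event when every one of its literals is false
    return all(assign[abs(lit)] != (lit > 0) for lit in clause)

random.seed(1)
assign = {i: random.random() < 0.5 for i in range(1, n + 1)}
resamplings = 0
while True:
    bad = [c for c in clauses if violated(assign, c)]
    if not bad:
        break
    # pick any currently bad event and resample exactly its variables
    for lit in bad[0]:
        assign[abs(lit)] = random.random() < 0.5
    resamplings += 1
print("satisfying assignment after", resamplings, "resamplings:", assign)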
Designing and implementing cooperation schemes for parallel algorithms has become a very important task recently. The scheme, which defines the cooperation topology, frequency and strategies for handling transferred s...
ISBN (print): 9781450341325
We consider the problem of finding the kth highest element in a totally ordered set of n elements (SELECT), and partitioning a totally ordered set into the top k and bottom n-k elements (PARTITION) using pairwise comparisons. Motivated by settings like peer grading or crowdsourcing, where multiple rounds of interaction are costly and queried comparisons may be inconsistent with the ground truth, we evaluate algorithms based both on their total runtime and the number of interactive rounds in three comparison models: noiseless (where the comparisons are correct), erasure (where comparisons are erased with probability 1-gamma), and noisy (where comparisons are correct with probability 1/2 + gamma/2 and incorrect otherwise). We provide numerous matching upper and lower bounds in all three models. Even our results in the noiseless model, which is quite well-studied in the TCS literature on parallel algorithms, are novel.
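As a concrete illustration of the noisy comparison model, the sketch below boosts a comparison that is correct with probability 1/2 + gamma/2 by repeating it and taking a majority vote, then runs an ordinary quickselect on top of the boosted comparator. This is a textbook-style illustration, not the paper's algorithms, and it does not attain the paper's round or query bounds; GAMMA and REPEATS are illustrative parameters.

import random

GAMMA = 0.4     # comparison is correct with probability 1/2 + GAMMA/2
REPEATS = 61    # votes per boosted comparison; more repetitions, fewer errors

def noisy_less(a, b):
    truth = a < b
    return truth if random.random() < 0.5 + GAMMA / 2 else (not truth)

def boosted_less(a, b):
    # majority vote over repeated noisy comparisons
    votes = sum(noisy_less(a, b) for _ in range(REPEATS))
    return votes > REPEATS // 2

def quickselect(items, k):
    # k-th smallest (1-indexed) using only boosted comparisons
    pivot = random.choice(items)
    rest = [x for x in items if x != pivot]
    lo, hi = [], []
    for x in rest:
        (lo if boosted_less(x, pivot) else hi).append(x)
    if k <= len(lo):
        return quickselect(lo, k)
    if k == len(lo) + 1:
        return pivot
    return quickselect(hi, k - len(lo) - 1)

random.seed(0)
data = random.sample(range(1000), 200)   # distinct values
# the estimate usually matches the truth; a rare boosted-comparison error can
# shift it by a rank or two
print("estimated 50th smallest:", quickselect(data, 50))
print("true 50th smallest     :", sorted(data)[49])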
ISBN (print): 9781450342100
The problem of exactly summing n floating-point numbers is a fundamental problem that has many applications in large-scale simulations and computational geometry. Unfortunately, due to the round-off error in standard floating-point operations, this problem becomes very challenging. Moreover, all existing solutions rely on sequential algorithms which cannot scale to the huge datasets that need to be processed. In this paper, we provide several efficient parallel algorithms for summing n floating-point numbers, so as to produce a faithfully rounded floating-point representation of the sum. We present algorithms in PRAM, external-memory, and MapReduce models, and we also provide an experimental analysis of our MapReduce algorithms, due to their simplicity and practical efficiency.
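The sequential building block behind faithfully rounded summation is the error-free transformation: a + b can be written exactly as a rounded sum plus a rounding error. The Python sketch below (a sequential toy, not the paper's PRAM, external-memory, or MapReduce algorithms) carries all such error terms forward so that no information is lost; the final single-pass rounding is usually, though not always, faithful.

import math
import random

def two_sum(a, b):
    # Knuth's error-free transformation: a + b == s + e exactly in floating point
    s = a + b
    bv = s - a
    av = s - bv
    return s, (a - av) + (b - bv)

def expansion_sum(values):
    partials = []                    # rounding errors carried forward, never discarded
    for x in values:
        kept = []
        for p in partials:
            x, e = two_sum(x, p)
            if e != 0.0:
                kept.append(e)
        kept.append(x)
        partials = kept
    # a single naive pass over the expansion; a fully faithful rounding needs a
    # more careful final step, as in math.fsum
    return sum(partials)

random.seed(0)
data = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(0, 12) for _ in range(10000)]
print("naive left-to-right sum:", sum(data))
print("expansion sum          :", expansion_sum(data))
print("math.fsum reference    :", math.fsum(data))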
Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric read-write costs, with the goal of reducing the number of...
For solving systems of linear algebraic equations with block five-diagonal matrices arising in geoelectrics and diffusion problems, the parallel matrix square root method, the conjugate gradient method with a preconditioner, the conjugate gradient method with regularization, and a parallel matrix sweep algorithm are proposed, and some of them are implemented numerically on multi-core Intel CPUs. The efficiency and optimization of the parallel algorithms are investigated by solving a problem with quasi-model data.
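As a small illustration of one of the iterative methods named above, here is a Jacobi-preconditioned conjugate gradient solver applied to a scalar five-diagonal symmetric positive definite test matrix. It is a sequential sketch standing in for the block five-diagonal, parallel setting of the paper; the matrix, tolerance, and preconditioner choice are illustrative.

import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=500):
    # conjugate gradients with a diagonal (Jacobi) preconditioner
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r
    p = z.copy()
    rz = r @ z
    for it in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            return x, it + 1
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# small symmetric positive definite five-diagonal test matrix (bandwidth 2)
n = 400
A = 5.0 * np.eye(n)
for off, val in [(1, -1.0), (2, -0.5)]:
    A += val * (np.eye(n, k=off) + np.eye(n, k=-off))
b = np.ones(n)
x, iters = pcg(A, b, 1.0 / np.diag(A))
print("iterations:", iters, "residual norm:", np.linalg.norm(b - A @ x))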
The Bethe-Salpeter eigenvalue problem is a dense structured eigenvalue problem arising from the discretized Bethe-Salpeter equation in the context of computing exciton energies and states. A computational challenge is that at least half of the eigenvalues and the associated eigenvectors are desired in practice. We establish the equivalence between Bethe-Salpeter eigenvalue problems and real Hamiltonian eigenvalue problems. Based on theoretical analysis, structure-preserving algorithms for a class of Bethe-Salpeter eigenvalue problems are proposed. We also show that for this class of problems all eigenvalues obtained from the Tamm-Dancoff approximation are overestimated. In order to solve large scale problems of practical interest, we discuss parallel implementations of our algorithms targeting distributed memory systems. Several numerical examples are presented to demonstrate the efficiency and accuracy of our algorithms.
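A small numerical illustration of the Hamiltonian structure underlying the Bethe-Salpeter eigenvalue problem (not the paper's structure-preserving solver): for H = [[A, B], [-conj(B), -conj(A)]] with A Hermitian and B complex symmetric, the spectrum is closed under lambda -> -conj(lambda), so real eigenvalues occur in +/- pairs. The sizes and random matrices below are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (A + A.conj().T) / 2 + 4 * np.eye(n)   # Hermitian, shifted diagonally
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = (B + B.T) / 2                          # complex symmetric

H = np.block([[A, B], [-B.conj(), -A.conj()]])
eigs = np.linalg.eigvals(H)

# the spectrum is closed under lambda -> -conj(lambda): every eigenvalue has a
# reflected partner (up to rounding), hence real eigenvalues come in +/- pairs
reflected = -eigs.conj()
pairing_error = max(min(abs(lam - reflected)) for lam in eigs)
print("eigenvalues:", np.round(np.sort_complex(eigs), 3))
print("max distance to a reflected partner:", pairing_error)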