In this paper we study the adaptive complexity of submodular optimization. Informally, the adaptive complexity of a problem is the minimal number of sequential rounds required to achieve a constant factor approximatio...
详细信息
ISBN:
(纸本)9781450355599
In this paper we study the adaptive complexity of submodular optimization. Informally, the adaptive complexity of a problem is the minimal number of sequential rounds required to achieve a constant factor approximation when polynomially-many queries can be executed in parallel at each round. Adaptivity is a fundamental concept that is heavily studied in computer science, largely due to the need for parallelizing computation. Somewhat surprisingly, very little is known about adaptivity in submodular optimization. For the canonical problem of maximizing a monotone submodular function under a cardinality constraint, to the best of our knowledge, all that is known to date is that the adaptive complexity is between 1 and Omega(n). Our main result in this paper is a tight characterization showing that the adaptive complexity of maximizing a monotone submodular function under a cardinality constraint is (Theta) over tilde (log n): We describe an algorithm which requires O(log n) sequential rounds and achieves an approximation that is arbitrarily close to 1/3;We show that no algorithm can achieve an approximation log better than O(1/log n) with fewer O(log n/log log n) rounds. Thus, when allowing for parallelization, our algorithm achieves a constant factor approximation exponentially faster than any known existing algorithm for submodular maximization. Importantly, the approximation algorithm is achieved via adaptive sampling and complements a recent line of work on optimization of functions learned from data. In many cases we do not know the functions we optimize and learn them from labeled samples. Recent results show that no algorithm can obtain a constant factor approximation guarantee using polynomially-many labeled samples as in the PAC and PMAC models, drawn from any distribution. Since learning with non-adaptive samples over any distribution results in a sharp impossibility, we consider learning with adaptive samples where the learner obtains poly(n) samples drawn
Geometric problems of interest to mathematical visualization applications involve changing structures, such as the moves that transform one knot into an equivalent knot. In this paper, we describe mathematical entitie...
详细信息
ISBN:
(数字)9781728108582
ISBN:
(纸本)9781728108599
Geometric problems of interest to mathematical visualization applications involve changing structures, such as the moves that transform one knot into an equivalent knot. In this paper, we describe mathematical entities (curves and surfaces) as link-node graphs, and make use of energy-driven relaxation algorithms to optimize their geometric shapes by moving knots and surfaces to their simplified equivalence. Furthermore, we design and conFigure parallel functional units in the relaxation algorithms to accelerate the computation these mathematical deformations require. Results show that we can achieve significant performance optimization via the proposed threading model and level of parallelization.
Truss decomposition is a method used to analyze large sparse graphs in order to identify successively better connected subgraphs. Since in many domains the underlying graph changes over time, its associated truss deco...
详细信息
ISBN:
(数字)9781728141299
ISBN:
(纸本)9781728152080
Truss decomposition is a method used to analyze large sparse graphs in order to identify successively better connected subgraphs. Since in many domains the underlying graph changes over time, its associated truss decomposition needs to be updated as well. This work focuses on the problem of incrementally updating an existing truss decomposition and makes the following three significant contributions. First, it presents a theory that identifies how the truss decomposition can change as new edges get added. Second, it develops an efficient incremental algorithm that incorporates various optimizations to update the truss decomposition after every edge addition. These optimizations are designed to reduce the number of edges that are explored by the algorithm. Third, it extends this algorithm to batch updates (i.e., where the truss decomposition needs to be updated after a set of edges are added), which reduces the overall computations that need to be performed. We evaluated the performance of our algorithms on real-world datasets. Our incremental algorithm achieves over 250000x average speedup for inserting an edge in a graph with 10 million edges relative to the non-incremental algorithm. Further, our experiments on batch updates show that our batch algorithm consistently performs better than the incremental algorithm.
Interactive information retrieval services, such as enterprise search and document search, must provide relevant results with consistent, low response times in the face of rapidly growing data sets and query loads. Th...
详细信息
ISBN:
(纸本)9781450349826
Interactive information retrieval services, such as enterprise search and document search, must provide relevant results with consistent, low response times in the face of rapidly growing data sets and query loads. These growing demands have led researchers to consider a wide range of optimizations to reduce response latency, including query processing parallelization and acceleration with co-processors such as GPUs. However, previous work runs queries either on GPU or CPU, ignoring the fact that the best processor for a given query depends on the query's characteristics, which may change as the processing proceeds. We present Griffin, an IR systems that dynamically combines GPU- and CPU-based algorithms to process individual queries according to their characteristics. Griffin uses state-of-the-art CPU-based query processing techniques and incorporates a novel approach to GPU-based query evaluation. Our GPU-based approach, as far as we know, achieves the best available GPU search performance by leveraging a new compression scheme and exploiting an advanced merge-based intersection algorithm. We evaluate Griffin with real world queries and datasets, and show that it improves query performance by 10x compared to a highly optimized CPU-only implementation, and 1.5x compared to our GPU-approach running alone. We also find that Griffin helps reduce the 95th-, 99th-, and 99.9th-percentile query response time by 10.4x, 16.1x, and 26.8x, respectively.
To take full advantage of multi-core processor resources, this paper examines the multi-core platform using OpenMP implementation of two kinds of pi parallel algorithms, numerical integration and Monte Carlo. The expe...
详细信息
The matricized-tensor times Khatri-Rao product (MTTKRP) computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel al...
详细信息
ISBN:
(纸本)9781538643686
The matricized-tensor times Khatri-Rao product (MTTKRP) computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish communication lower bounds that identify how much data movement is required for this computation in the case of dense tensors. We also present sequential and parallel algorithms that attain the lower bounds and are therefore communication optimal. In particular, we show that the structure of the computation allows for less communication than the straightforward approach of casting the computation as a matrix multiplication operation.
In this paper, aiming at the problems of traditional K-means clustering algorithm in big data processing, such as performance and determination of initial clustering center, an improved k-means clustering algorithm ba...
详细信息
ISBN:
(纸本)9781728150956
In this paper, aiming at the problems of traditional K-means clustering algorithm in big data processing, such as performance and determination of initial clustering center, an improved k-means clustering algorithm based on Hadoop platform is proposed. This algorithm uses canopy algorithm and cosine similarity to calculate, optimizes the determination of initial clustering center by K-means algorithm, and uses parallel computing framework to expand the algorithm in parallel. To adapt to big data processing. The experimental results show that the improved k-means clustering algorithm based on Hadoop platform has better clustering effect, and also has good speedup and scalability when processing a large number of data.
A parallel algorithm for maximal independent set (MIS) in hypergraphs has been a long-standing algorithmic challenge, dating back nearly 30 years to a survey of Karp & Ramachandran (1990). Despite its apparent sim...
详细信息
ISBN:
(纸本)9781611975031
A parallel algorithm for maximal independent set (MIS) in hypergraphs has been a long-standing algorithmic challenge, dating back nearly 30 years to a survey of Karp & Ramachandran (1990). Despite its apparent simplicity, there have been no general sub-polynomialtime algorithms or hardness reductions. The best randomized parallel algorithm for hypergraphs of fixed rank r was developed by Beame & Luby (1990) and Kelsen (1992), running in time roughly (log n)(r!). The key probabilistic tool underlying this algorithm is a concentration bound for low-degree polynomials applied to independent input variables;this is a natural generalization of concentration bounds for sums of independent random variables, which are ubiquitous in combinatorics and computer science. These concentration bounds for polynomials do not lend themselves to standard derandomization techniques. Thus, the algorithm of Kelsen cannot be derandomized in any known way. There are no deterministic parallel algorithms for hypergraph MIS for any fixed rank r > 3. We improve the randomized algorithm of Kelsen to obtain a running time of (log n)(2r). We also give a method for derandomizing concentration bounds for polynomials, thus obtaining a deterministic algorithm running in (log n)(2r+3) time and (mn)(O(1)) processors. Our analysis can also apply when r is slowly growing;using this in conjunction with a strategy of Bercea et al. (2015) gives a deterministic MIS algorithm running in time exp(O(log m/log log m + log log n).
The sequence alignment is an important basic work in analyzing large biological data. For the massive short reads alignment problem, based on the dynamic programming approach, divide and conquer principle, and FUSE ke...
详细信息
ISBN:
(数字)9781728126166
ISBN:
(纸本)9781728126173
The sequence alignment is an important basic work in analyzing large biological data. For the massive short reads alignment problem, based on the dynamic programming approach, divide and conquer principle, and FUSE kernel module, a parallel short-read alignment method allowing the optimal number of inserting gaps depending on species and sequence length is developed on multi-core cluster. The experimental results on real and synthetic data show that the proposed parallel alignment method can achieve good speedup with the same alignment accuracy as the sequential alignment method. Compared with the existing parallel alignment method, the proposed method can remarkably reduce the time of partitioning reference genome and reads files and accelerate the large-scale short-read alignment.
暂无评论