As the scale of the problems we want to solve in real life grows, input sizes can far exceed the memory of a single computer. In such cases, classical algorithms may no longer be feasible, even when they run in linear time and linear space. In this thesis, we study various combinatorial problems in computation models that process large inputs using limited resources; in particular, we consider the query model, the streaming model, and the massively parallel computation model. We also study the tradeoffs between the adaptivity and the performance of algorithms in these models. We first consider two graph problems: vertex coloring and metric traveling salesman (TSP). The main results are structural results for these problems, which give frameworks for obtaining sublinear algorithms in different models. We also show that the sublinear algorithms for the (∆ + 1)-coloring problem are tight. We then consider graph sparsification, an important technique for designing sublinear algorithms. We prove the existence of a linear-size hypergraph cut sparsifier and give a polynomial-time algorithm that computes one; we also give sublinear algorithms for this problem in the streaming and query models. Finally, we study the round complexity of submodular function minimization (SFM). In particular, we give a polynomial lower bound on the number of rounds needed to compute s-t max flow, a special case of SFM, in the streaming model, and a polynomial lower bound on the number of rounds needed to solve general SFM using polynomially many queries.
ISBN (print): 9798350318944
We show a fully dynamic algorithm for maintaining a (1 + ε)-approximation of the size of the maximum matching of a graph with n vertices and m edges using m^(0.5 − Ω_ε(1)) update time. This is the first polynomial improvement over the long-standing O(n) update time, which can be obtained trivially by periodic recomputation. Thus, we resolve the value version of a major open question of the dynamic graph algorithms literature (see, e.g., [Gupta and Peng FOCS'13], [Bernstein and Stein SODA'16], [Behnezhad and Khanna SODA'22]). Our key technical component is the first algorithm for (1, εn)-approximate maximum matching with sublinear running time on dense graphs. All previous sublinear-time algorithms suffered a multiplicative approximation factor of at least 1.499 or assumed that the graph has a very small maximum degree.
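The trivial baseline mentioned above can be sketched as follows. This is only an illustration of the periodic-recomputation idea: for simplicity it recomputes a greedy maximal matching (a 2-approximation of the maximum matching size), whereas the actual O(n)-update baseline would plug in a static (1 + ε)-approximate matching algorithm; the class name and `period` parameter are hypothetical.

```python
class PeriodicMatchingSize:
    """Toy dynamic baseline: recompute a matching from scratch every
    `period` updates. Recomputing a greedy MAXIMAL matching gives a
    2-approximation of the maximum matching size; the abstract's O(n)
    baseline uses a static (1+eps)-approximate algorithm instead, but
    the amortization idea is identical."""

    def __init__(self, period=1):
        self.edges = set()
        self.period = period
        self.updates = 0
        self.size = 0  # last recomputed matching size

    def _recompute(self):
        matched = set()
        size = 0
        for u, v in sorted(self.edges):  # greedy maximal matching
            if u not in matched and v not in matched:
                matched.update((u, v))
                size += 1
        self.size = size

    def _maybe_recompute(self):
        self.updates += 1
        if self.updates % self.period == 0:
            self._recompute()

    def insert(self, u, v):
        self.edges.add((min(u, v), max(u, v)))
        self._maybe_recompute()

    def delete(self, u, v):
        self.edges.discard((min(u, v), max(u, v)))
        self._maybe_recompute()
```

With `period = 1` the structure recomputes on every update; choosing `period ~ εn` and adjusting the answer by the at most `period` edges changed since the last recomputation is what yields the amortized O(n) update time the paper improves on.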
Range partitioning is a typical and widely used data partitioning method and has become a core operation in most big data computing platforms. Given an input L of N data items admitting a total order, the goal of range partitioning is to divide the input into k ranges containing the same number of data items. There is a trivial lower bound of Ω(N) for exact partitioning algorithms, since they must at least make a full scan of the data. In the context of big data computing, even O(N)-time algorithms are not always considered efficient enough; the ultimate goal when designing algorithms on big data is usually to solve problems in sublinear time. It is therefore well motivated and important to study sublinear algorithms for the range partitioning problem. This paper aims to answer three questions. For the internal memory (RAM) model, since a sophisticated sampling-based (ε, δ)-approximation partitioning algorithm with O(k·log(N/δ)/ε²) time cost has been proposed, the first question is what lower bound we can obtain for sublinear partitioning algorithms. For the external memory (I/O) model, as far as we know, no previous work gives external partitioning algorithms with a performance guarantee in sublinear time; the remaining two questions are therefore what upper and lower bounds we can achieve for sublinear external partitioning algorithms. To answer these questions, the paper studies lower and upper bounds for the range partitioning problem in the RAM and I/O models. For the RAM model, a lower bound of Ω(k(1−δ)/ε²) on the cost of sampling-based partitioning algorithms is proved. For the I/O model, two lower bounds on the sampling cost required by sublinear external range partitioning algorithms are proved, which indicate that at least a full scan of the whole input is needed in the worst case and that a general sublinear external partitioning algorithm does not exist. Motivated by the hard in
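The sampling-based approach the abstract refers to can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the function name and the `sample_size` parameter are assumptions, and the abstract's algorithm would tune the sample size to roughly k·log(N/δ)/ε² to get the (ε, δ)-guarantee.

```python
import random

def approx_range_partition(items, k, sample_size):
    """Sampling-based range partitioning sketch: draw a uniform sample,
    sort it, and read off k-1 splitter keys so that the k resulting
    ranges are approximately equal-sized. Only the sample is sorted,
    so the work is sublinear in len(items) when sample_size is."""
    sample = random.sample(items, min(sample_size, len(items)))
    sample.sort()
    # Evenly spaced order statistics of the sample serve as splitters.
    return [sample[(i * len(sample)) // k] for i in range(1, k)]
```

Items below the first splitter form range 1, items between consecutive splitters form the middle ranges, and items at or above the last splitter form range k; the sampling error ε bounds how far each range's size may deviate from N/k.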
We study lower bounds for the problem of approximating a one-dimensional distribution given (noisy) measurements of its moments. We show that there are distributions on [−1, 1] that cannot be approximated to accuracy ε in Wasserstein-1 distance even if we know all of their moments to multiplicative accuracy (1 ± 2^(−Ω(1/ε))); this result matches an upper bound of Kong and Valiant [Annals of Statistics, 2017]. To obtain our result, we provide a hard instance involving distributions induced by the eigenvalue spectra of carefully constructed graph adjacency matrices. Efficiently approximating such spectra in Wasserstein-1 distance is a well-studied algorithmic problem, and a recent result of Cohen-Steiner et al. [KDD 2018] gives a method based on accurately approximating spectral moments using 2^(Ω(1/ε)) random walks initiated at uniformly random nodes in the graph. As a strengthening of our main result, we show that improving the dependence on 1/ε in this result would require a new algorithmic approach. Specifically, no algorithm can compute an ε-accurate approximation to the spectrum of a normalized graph adjacency matrix with constant probability, even when given the transcript of 2^(Ω(1/ε)) random walks of length 2^(Ω(1/ε)) started at random nodes.
Estimating the number of triangles in a graph is one of the most fundamental problems in sublinear algorithms. In this work, we provide an algorithm that approximately counts the number of triangles in a graph using only polylogarithmically many queries, provided the number of triangles on any edge of the graph is polylogarithmically bounded. Our query oracle, Tripartite Independent Set (TIS), takes three disjoint sets of vertices as input and answers whether there exists a triangle having one endpoint in each of the three sets. Our query model belongs to the class of group queries (Ron and Tsur, ACM Trans. Comput. Theory 8(4), 15, 2016; Dell and Lapinskas 2018) and is in particular inspired by the Bipartite Independent Set (BIS) query oracle of Beame et al. (2018). We extend the algorithmic framework of Beame et al., with TIS replacing BIS, to approximately count triangles in graphs.
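The TIS oracle described above is easy to simulate over an explicit edge list, which makes the query model concrete. This sketch is only an illustration of the oracle's interface (the brute-force check is of course not sublinear; the point of the model is that the algorithm sees only the yes/no answers, never the edges):

```python
from itertools import product

def make_tis_oracle(edges):
    """Build a TIS (Tripartite Independent Set) oracle from an explicit
    edge list: given three disjoint vertex sets A, B, C, the oracle
    answers whether some triangle has one endpoint in each set."""
    edge_set = {frozenset(e) for e in edges}

    def tis(A, B, C):
        for a, b, c in product(A, B, C):
            if (frozenset((a, b)) in edge_set
                    and frozenset((b, c)) in edge_set
                    and frozenset((a, c)) in edge_set):
                return True
        return False

    return tis
```

A counting algorithm in this model repeatedly partitions the vertices into three groups and uses such answers, on cleverly chosen sets, to estimate the triangle count.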
ISBN (print): 9783030895433; 9783030895426
Given an n-point metric space (M, d), metric 1-median asks for a point p ∈ M minimizing Σ_{x ∈ M} d(p, x). We show that for each computable function f: Z⁺ → Z⁺ satisfying f(n) = ω(1), metric 1-median has a deterministic, nonadaptive, o(n)-query, o(f(n)·log n)-approximation algorithm. Previously, no deterministic o(n)-query, o(n)-approximation algorithm was known for metric 1-median.
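For contrast, the exact solution that the sublinear algorithm avoids can be written in one line: it evaluates the objective Σ_x d(p, x) for every candidate p, spending Θ(n²) distance queries. This is an illustrative baseline, not the paper's algorithm.

```python
def metric_1_median(points, d):
    """Exact 1-median by exhaustive search: for each candidate p,
    sum its distances to all points and return the minimizer.
    Uses Theta(n^2) distance queries -- the baseline the o(n)-query
    approximation algorithm avoids."""
    return min(points, key=lambda p: sum(d(p, x) for x in points))
```

On the line with the absolute-difference metric, the 1-median is just a (discrete) median of the point set, which makes the objective easy to sanity-check by hand.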
In this paper we extend the deterministic sublinear FFT algorithm of Plonka et al. (Numer. Algorithms 78:133-159, 2018. https://***/10.1007/s11075-017-0370-5) for fast reconstruction of M-sparse vectors x of length N = 2^J, where we assume that all components of the discrete Fourier transform x̂ = F_N x are available. The sparsity M need not be known a priori but is determined by the algorithm. If the sparsity M is larger than 2^(J/2), then the algorithm turns into a usual FFT algorithm with runtime O(N log N). For M² < N, the runtime of the algorithm is O(M² log N). The proposed modifications of the approach in Plonka et al. (2018) lead to a significant improvement in the condition numbers of the Vandermonde matrices employed in the iterative reconstruction. Our numerical experiments show that this modification has a huge impact on the stability of the algorithm: while the algorithm in Plonka et al. (2018) starts to be unreliable for M > 20 because of numerical instabilities, the modified algorithm is still numerically stable for M = 200.
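The core idea behind such Fourier-based sparse recovery is visible already in the 1-sparse case: if x has a single nonzero entry x[j] = c, then x̂[k] = c·e^(−2πi·jk/N), so j can be read off from the phase of x̂[1]/x̂[0]. The sketch below shows only this toy case; the function name is an assumption, and the paper's algorithm handles general M-sparse vectors iteratively with the conditioning issues the abstract discusses.

```python
import cmath

def recover_1_sparse(xhat, N):
    """Recover (index, coefficient) of a 1-sparse vector x of length N
    from its full DFT xhat. Since xhat[k] = c * exp(-2*pi*i*j*k/N),
    the ratio xhat[1]/xhat[0] has phase -2*pi*j/N, revealing j;
    the coefficient c is xhat[0] itself."""
    ratio = xhat[1] / xhat[0]
    j = round((-cmath.phase(ratio) * N) / (2 * cmath.pi)) % N
    return j, xhat[0]
```

Recovering higher sparsity reduces to solving small Vandermonde systems built from such phase information, which is exactly where the improved condition numbers of the modified algorithm matter.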
Authors: Luan, Qi; Zhao, Liang
Affiliations: CUNY Grad Ctr, PhD Program in Math, New York, NY 10010, USA; CUNY Lehman Coll, Dept of Comp Sci, New York, NY, USA
ISBN (print): 9781728192284
Matrix CUR decomposition aims at representing a large matrix A by the product C · U · R, where C (resp. R) consists of a small collection of the original columns (resp. rows), and U is a small intermediate matrix connecting C and R. While modern randomized CUR algorithms provide many efficient methods for choosing representative columns and rows, there has not been a method for finding the optimal U matrix. In this paper, we present a sublinear-time randomized method that finds good choices of the U matrix. Our proposed algorithm treats the task of finding U as a double-sided least squares problem min_Z ‖A − CZR‖_F and guarantees a close-to-optimal solution by solving a down-sampled problem of much smaller size. We provide a worst-case analysis of its approximation error relative to the theoretical optimal low-rank approximation error, and we demonstrate empirically how this method can improve the approximation of several large-scale real data matrices with a small number of additional computations.
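The double-sided least squares problem min_Z ‖A − CZR‖_F has the closed-form solution Z = C⁺AR⁺ (pseudoinverses of C and R). The sketch below shows only the toy case where C and R are 2×2 and invertible, so the pseudoinverses are ordinary inverses and the residual is zero; all helper names are assumptions, and the paper's contribution is solving a down-sampled version of this problem in sublinear time rather than this full-size closed form.

```python
def matmul(X, Y):
    """Plain dense matrix product for lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Inverse of an invertible 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def optimal_U(A, C, R):
    """Closed-form minimizer of ||A - C Z R||_F in the special case of
    square invertible C and R: Z = C^-1 A R^-1, so C Z R reproduces A
    exactly. In general one uses pseudoinverses Z = C^+ A R^+."""
    return matmul(inv2(C), matmul(A, inv2(R)))
```

Solving the down-sampled problem means applying the same closed form to sketched versions of A, C, and R with far fewer rows and columns, which is what makes the method sublinear-time.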
ISBN (digital): 9783030148126
ISBN (print): 9783030148119; 9783030148126
Identifying up to d defective items and up to h inhibitors in a set of n items is the main task of non-adaptive group testing with inhibitors. To reduce the cost of this Herculean task, a subset of the n items is formed and then tested; this is called group testing. A test outcome on a subset of items is positive if the subset contains at least one defective item and no inhibitors, and negative otherwise. We present two decoding schemes for efficiently identifying the defective items and the inhibitors in the presence of e erroneous outcomes in time poly(d, h, e, log² n), which is sublinear in the number of items. This decoding complexity significantly improves on the state-of-the-art schemes, in which the decoding time is linear in the number of items, i.e., poly(d, h, e, n). Moreover, each column of the measurement matrices associated with the proposed schemes can be generated nonrandomly in time polynomial in the number of rows; as a result, one can save the space needed for storing them. Simulation results confirm our theoretical analysis. When the number of items is sufficiently large, the decoding time of our proposed scheme is the smallest in comparison with existing work. In addition, when some erroneous outcomes are allowed, the number of tests in the proposed scheme is often smaller than in existing work.
ISBN (print): 9781450367059
We consider the extensively studied problem of ℓ₂/ℓ₂ compressed sensing. The main contribution of our work is an improvement over [Gilbert, Li, Porat and Strauss, STOC 2010], with faster decoding time and significantly smaller column sparsity, answering two open questions of the aforementioned work. Previous work on sublinear-time compressed sensing employed an iterative procedure, recovering the heavy coordinates in phases. We completely depart from that framework and give the first sublinear-time ℓ₂/ℓ₂ scheme that achieves the optimal number of measurements without iterating; this new approach is the key step to our progress. Toward that end, we satisfy the ℓ₂/ℓ₂ guarantee by exploiting the heaviness of coordinates in a way not exploited in previous work. Via our techniques we obtain improved results for various sparse recovery tasks, and we indicate possible further applications to problems in the field, to which the aforementioned iterative procedure creates significant obstructions.