In this paper we develop fast and memory-efficient numerical methods for learning functions of many variables that admit sparse representations in terms of general bounded orthonormal tensor product bases. Such functions appear in many applications including, e.g., various Uncertainty Quantification (UQ) problems involving the solution of parametric PDEs that are approximately sparse in Chebyshev or Legendre product bases (Chkifa et al., Polynomial approximation via compressed sensing of high-dimensional functions on lower sets, 2016; Rauhut and Schwab, Math Comput 86(304):661-700, 2017). We expect that our results provide a starting point for a new line of research on sublinear-time solution techniques for UQ applications of the type above, which will eventually be able to scale to significantly higher-dimensional problems than are currently computationally feasible. More concretely, let B be a finite Bounded Orthonormal Product Basis (BOPB) of cardinality |B| = N. Herein we develop methods that rapidly approximate any function f that is sparse in the BOPB, that is, f: D ⊂ ℝ^D → ℂ of the form f(x) = Σ_{b∈S} c_b · b(x) with S ⊂ B of cardinality |S| = s ≪ N. Our method adapts the CoSaMP algorithm (Needell and Tropp, Appl Comput Harmon Anal 26(3):301-321, 2009) to use additional function samples from f along a randomly constructed grid G ⊂ ℝ^D with universal approximation properties in order to rapidly identify the multi-indices of the most dominant basis functions in S, component by component, during each CoSaMP iteration. It has a runtime of just (s log N)^{O(1)}, uses only (s log N)^{O(1)} function evaluations on the fixed and nonadaptive grid G, and requires no more than (s log N)^{O(1)} bits of memory. We emphasize that nothing about S or any of the coefficients c_b ∈ ℂ is assumed in advance other than that S ⊂ B has |S| ≤ s. Both S and its related coefficients …
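To make the setup concrete, here is a minimal Python sketch (names and sizes invented for illustration) of a function of D variables that is s-sparse in a Chebyshev product basis, evaluated pointwise the way a recovery algorithm would request samples; normalization constants and the paper's randomized grid construction are omitted.

```python
# Minimal sketch of the sampling model, assuming a Chebyshev product basis.
import numpy as np

D, s = 10, 3
rng = np.random.default_rng(0)

# Support S: s multi-indices, each picking a Chebyshev degree per coordinate.
S = [tuple(rng.integers(0, 4, size=D)) for _ in range(s)]
coeffs = {k: rng.standard_normal() + 1j * rng.standard_normal() for k in S}

def cheb(n, t):
    """Chebyshev polynomial T_n(t) on [-1, 1]."""
    return np.cos(n * np.arccos(t))

def f(x):
    """f(x) = sum_{b in S} c_b * prod_j T_{b_j}(x_j): sparse in the product basis."""
    return sum(c * np.prod([cheb(k[j], x[j]) for j in range(D)])
               for k, c in coeffs.items())

print(f(rng.uniform(-1, 1, size=D)))   # one sample at a random grid point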
In the problem of compressed phase retrieval, the goal is to reconstruct a sparse or approximately k-sparse vector x ∈ ℂ^n given access to y = |Φx|, where |v| denotes the vector obtained by taking the absolute value of v ∈ ℂ^n coordinate-wise. In this paper we present sublinear-time algorithms for a few for-each variants of the compressive phase retrieval problem, akin to the variants considered for the classical compressive sensing problem in theoretical computer science. Our algorithms use purely combinatorial techniques and a near-optimal number of measurements.
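As a worked illustration of the measurement model only (a Gaussian Φ is assumed here; the paper's combinatorial measurements differ), the following numpy snippet generates y = |Φx| for a k-sparse x and checks the global-phase ambiguity that distinguishes phase retrieval from ordinary compressive sensing.

```python
# Tiny illustration of y = |Phi x| and its inherent global-phase ambiguity.
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 64, 4, 32
x = np.zeros(n, dtype=complex)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k) + 1j * rng.standard_normal(k)

Phi = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * m)
y = np.abs(Phi @ x)                       # only magnitudes are observed

# x and e^{i*theta} x are indistinguishable from the measurements:
theta = 0.7
assert np.allclose(y, np.abs(Phi @ (np.exp(1j * theta) * x)))
```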
Longest common substring (LCS), longest palindrome substring (LPS), and Ulam distance (UL) are three fundamental string problems that can be classically solved in near-linear time. In this work, we present sublinear-time quantum algorithms for these problems along with quantum lower bounds. Our results shed light on a very surprising fact: although the classic solutions for LCS and LPS are almost identical (via suffix trees), their quantum computational complexities are different. While we give an exact Õ(√n)-time algorithm for LPS, we prove that LCS needs at least Ω̃(n^{2/3}) time even for 0/1 strings.
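For contrast with these sublinear quantum bounds, the classical near-linear-time baseline for LPS is easy to state; the sketch below uses Manacher's linear-time algorithm rather than the suffix-tree solution the abstract alludes to.

```python
def longest_palindromic_substring(s: str) -> str:
    # Manacher's algorithm: insert separators so every palindrome has odd length.
    t = "#" + "#".join(s) + "#"
    n = len(t)
    p = [0] * n                       # p[i] = palindrome radius centered at i in t
    center = right = 0
    for i in range(n):
        if i < right:
            p[i] = min(right - i, p[2 * center - i])
        while i - p[i] - 1 >= 0 and i + p[i] + 1 < n and t[i - p[i] - 1] == t[i + p[i] + 1]:
            p[i] += 1
        if i + p[i] > right:
            center, right = i, i + p[i]
    best = max(range(n), key=lambda i: p[i])
    start = (best - p[best]) // 2
    return s[start:start + p[best]]

assert longest_palindromic_substring("cbbd") == "bb"
```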
We study the problem of estimating the value of sums of the form S_p = Σ_i C(x_i, p) when one has the ability to sample x_i with probability proportional to its magnitude. When p = 2, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when {x_i} is the degree sequence of a graph, which corresponds to counting the number of p-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a (1 ± ε)-multiplicative approximation of S_p has query and time complexities … . Here, m is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, n is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when {x_i} is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al., SIAM J Comput 25:1365-1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where … , our upper bound is … , in contrast to their lower bound when no random edge queries are available. In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is bounded, or equivalently, when the ratio between …
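The core sampling primitive can be illustrated with a toy importance-sampling estimator. For concreteness this sketch estimates the power sum Σ_i x_i^p rather than the binomial sums above, and it assumes the total Σ_i x_i (i.e., 2m for a degree sequence) is known; the paper's algorithm and its complexity analysis are more involved.

```python
# Toy sketch (not the paper's algorithm): sample index i with probability
# x_i / sum(x), average x_i^(power-1), and rescale by sum(x). This is unbiased
# for sum_i x_i^power. For a degree sequence, sampling proportional to d_i is
# exactly sampling an endpoint of a uniformly random edge.
import numpy as np

def estimate_power_sum(x, power, samples, rng=np.random.default_rng(2)):
    total = x.sum()                                   # known: 2m for degrees
    idx = rng.choice(len(x), size=samples, p=x / total)
    return total * np.mean(x[idx] ** (power - 1))

deg = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
print(estimate_power_sum(deg, power=2, samples=20000), (deg ** 2).sum())
```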
The challenges posed by big data in the 21st century are complex: under the previous common sense, we considered polynomial-time algorithms to be practical; however, when we handle big data, even a linear-time algorithm may be too slow. Thus, sublinear- and constant-time algorithms are required. The academic research project "Foundations of Innovative Algorithms for Big Data," which was started in 2014 and will finish in September 2021, aimed at developing various techniques and frameworks to design algorithms for big data. In this project, we introduce a "Sublinear Computation Paradigm." Toward this purpose, we first provide a survey of constant-time algorithms, which are the most investigated framework in this area, and then present our recent results on sublinear progressive algorithms. A sublinear progressive algorithm first outputs a temporary approximate solution in constant time, then suggests better solutions gradually in sublinear time, and finally finds the exact solution. We present Sublinear Progressive Algorithm Theory (SPA Theory, for short), which makes it possible to obtain a sublinear progressive algorithm for any property that has both a constant-time algorithm and an exact algorithm (an exponential-time one is allowed), without losing any computation time in the big-O sense.
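The progressive behavior described above can be mimicked by a toy estimator (purely illustrative, not the SPA construction): a constant-time first answer, sublinear-time refinements on geometrically growing samples, and the exact answer last.

```python
# Illustrative-only sketch of the "progressive" idea for summing a list.
import random

def progressive_sum(a):
    """Yield successively better estimates of sum(a), ending with the exact value."""
    n = len(a)
    k = 16                                   # constant-time first answer
    while k < n:
        sample = random.choices(a, k=k)      # O(k) work, sublinear while k << n
        yield n * sum(sample) / k            # unbiased estimate of sum(a)
        k *= 4
    yield sum(a)                             # exact solution last

for est in progressive_sum(list(range(10000))):
    print(est)
```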
Recent works in graph visualization attempt to reduce the runtime of repulsion force computation in force-directed algorithms using sampling. However, they fail to reduce the runtime of attraction force computation to sublinear in the number of edges. We present the sublinearForce framework for fully sublinear-time force computation for drawing large complex graphs. More precisely, we present new sublinear-time algorithms for the attraction force computation of force-directed algorithms. We then integrate them with sublinear-time repulsion force computation to obtain fully sublinear-time force computation. Extensive experiments show that our algorithms compute layouts on average 80% faster than the existing linear-time force computation algorithm, while obtaining significantly better quality on metrics such as edge crossings and shape-based metrics.
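A hedged sketch of the underlying idea (function and parameter names invented; not the sublinearForce implementation): estimate the attraction term from a random sample of edges and rescale by m/|sample|, so the per-iteration cost is independent of the number of edges.

```python
# Sampling-based attraction forces, a sketch assuming a simple spring model.
import numpy as np

rng = np.random.default_rng(3)

def sampled_attraction(pos, edges, k_spring=0.1, sample_size=256):
    """Estimate attraction forces from a random subset of edges ((m, 2) int array)."""
    m = len(edges)
    picked = edges[rng.choice(m, size=min(sample_size, m), replace=False)]
    forces = np.zeros_like(pos)
    for u, v in picked:
        d = pos[v] - pos[u]                  # spring pull along the edge
        forces[u] += k_spring * d
        forces[v] -= k_spring * d
    return forces * (m / len(picked))        # rescale estimate to the full edge set

pos = rng.standard_normal((100, 2))
edges = rng.integers(0, 100, size=(1000, 2))
print(sampled_attraction(pos, edges).shape)  # (100, 2)
```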
Construction of error-correcting codes achieving a designated minimum distance parameter is a central problem in coding theory. In this work, we study a very simple construction of binary linear codes that correct a given number of errors K. Moreover, we design a simple, nearly optimal syndrome decoder for the code as well. The running time of the decoder is only logarithmic in the block length of the code and nearly linear in the number of errors K. This decoder can be applied to exact for-all sparse recovery over any field, improving upon previous results with the same number of measurements. Furthermore, computation of the syndrome from a received word can be done in nearly linear time in the block length. We also demonstrate an application of these techniques in nonadaptive group testing and construct simple explicit measurement schemes with O(K^2 log^2 N) tests and O(K^3 log^2 N) recovery time for identifying up to K defectives in a population of size N.
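For the group-testing application, the flavor of nonadaptive pooling with fast decoding can be seen in the standard COMP scheme sketched below (random pools plus a clear-by-negative-tests decoder); this is a textbook baseline under assumed parameters, not the paper's explicit construction.

```python
# COMP group testing: any item appearing in a negative pool is cleared;
# whatever is never cleared is declared defective.
import numpy as np

rng = np.random.default_rng(4)
N, K, T = 200, 3, 120                       # population, defectives, tests
defect = set(rng.choice(N, K, replace=False))

pools = rng.random((T, N)) < 1.0 / K        # each item joins each pool w.p. ~1/K
results = np.array([any(j in defect for j in np.flatnonzero(row)) for row in pools])

cleared = pools[~results].any(axis=0)       # seen in a negative test => not defective
decoded = set(np.flatnonzero(~cleared))
print(decoded == defect)                    # True with high probability
```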
In this paper we develop a sublinear-time compressive sensing algorithm for approximating functions of many variables which are compressible in a given Bounded Orthonormal Product Basis (BOPB). The resulting algorithm is shown both to have an associated best s-term recovery guarantee in the given BOPB, and also to work well numerically for solving sparse approximation problems involving functions contained in the span of fairly general sets of as many as ~10^230 orthonormal basis functions. All code is made publicly available. As part of the proof of the main recovery guarantee, new variants of the well-known CoSaMP algorithm are proposed which can utilize any sufficiently accurate support identification procedure satisfying a Support Identification Property (SIP) in order to obtain strong sparse approximation guarantees. These new CoSaMP variants are then proven to have both runtime and recovery error behavior largely determined by the runtime and error behavior of the chosen support identification method. The main theoretical results of the paper are then shown by developing a sublinear-time support identification algorithm for general BOPB sets which is robust to arbitrary additive errors. Using this new support identification method to create a new CoSaMP variant then results in a new robust sublinear-time compressive sensing algorithm for BOPB-compressible functions of many variables.
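For reference, here is a skeleton of one CoSaMP iteration with the support-identification step isolated, mirroring the modularity described above. The identification shown is the classic proxy A*r on a dense matrix; the paper's contribution is replacing exactly this step with a sublinear-time BOPB procedure satisfying the SIP.

```python
# CoSaMP skeleton with a swappable support-identification step (classic proxy shown).
import numpy as np

def cosamp(A, y, s, iters=20):
    n = A.shape[1]
    x = np.zeros(n, dtype=A.dtype)
    for _ in range(iters):
        r = y - A @ x
        proxy = A.conj().T @ r                     # support identification step
        omega = np.argsort(np.abs(proxy))[-2 * s:] # 2s new candidate indices
        T = np.union1d(omega, np.flatnonzero(x))   # merge with current support
        b, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)
        x = np.zeros(n, dtype=A.dtype)
        keep = np.argsort(np.abs(b))[-s:]          # prune back to s terms
        x[T[keep]] = b[keep]
    return x

rng = np.random.default_rng(6)
A = rng.standard_normal((60, 200))
x0 = np.zeros(200); x0[[3, 70, 150]] = [1.0, -2.0, 0.5]
print(np.round(cosamp(A, A @ x0, s=3)[[3, 70, 150]], 3))
```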
ISBN (print): 9781450392648
In this paper, we consider problems from the area of sublinear-time algorithms: edge sampling, edge counting, and triangle counting. Part of our contribution is that we consider three different settings, differing in the way in which one may access the neighborhood of a given vertex. In previous work, people have considered indexed neighbor access, with a query returning the i-th neighbor of a given vertex. The full neighborhood access model, which has a query that returns the entire neighborhood at unit cost, has recently been considered in the applied community. Between these, we propose hash-ordered neighbor access, inspired by coordinated sampling: we have a global fully random hash function and can access neighbors in order of their hash values, paying a constant for each accessed neighbor. For edge sampling and counting, our new lower bounds are in the most powerful full neighborhood access model. We provide matching upper bounds in the weaker hash-ordered neighbor access model. Our new faster algorithms can be provably implemented efficiently on massive graphs in external memory and with the current APIs for, e.g., Twitter or Wikipedia. For triangle counting, we provide a separation: a better upper bound with full neighborhood access than the known lower bounds with indexed neighbor access. The technical core of our paper is our edge-sampling algorithm, on which the other results depend. We now describe our results on the classic problems of edge and triangle counting. We give an algorithm that uses hash-ordered neighbor access to approximately count edges in time Õ(n/(ε√m) + 1/ε^2) (compare to the state of the art without hash-ordered neighbor access of Õ(n/(ε^2 √m)) by Eden, Ron, and Seshadhri [ICALP 2017]). We present an Ω(n/(ε√m)) lower bound for ε ≥ √m/n in the full neighborhood access model. This improves the lower bound of Ω(n/√(εm)) by Goldreich and Ron [R…
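As a point of comparison for these bounds, the naive unbiased edge-count estimator from uniform vertex sampling and degree queries takes a few lines (a baseline sketch only; the hash-ordered machinery that achieves the stated bounds is omitted).

```python
# Baseline estimator: E[deg(v)] = 2m/n for a uniformly random vertex v,
# so n * mean(deg) / 2 is an unbiased estimate of the edge count m.
import random

def estimate_edges(n, degree, samples=10000, rng=random.Random(5)):
    """degree(v) plays the role of a degree query on a vertex v in [0, n)."""
    total = sum(degree(rng.randrange(n)) for _ in range(samples))
    return n * total / (2 * samples)

# toy graph: star on 6 vertices (center 0), so m = 5
deg = {0: 5, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
print(estimate_edges(6, deg.__getitem__))
```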
We give two fully dynamic algorithms that maintain a (1 + ε)-approximation of the weight M of a minimum spanning forest (MSF) of an n-node graph G with edge weights in [1, W], for any ε > 0. (1) Our deterministic algorithm takes O(W^2 log W/ε^3) worst-case update time, which is O(1) if both W and ε are constants. (2) Our randomized (Monte Carlo style) algorithm works with high probability and runs in worst-case O(log W/ε^4) update time if W = O((m*)^{1/6}/log^{2/3} n), where m* is the minimum number of edges in the graph throughout all the updates. It works even against an adaptive adversary. We complement our algorithmic results with two cell-probe lower bounds for dynamically maintaining an approximation of the weight of an MSF of a graph.
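The standard identity behind MSF-weight approximation (due to Chazelle, Rubinfeld, and Trevisan) expresses the weight through component counts of bottom-weight subgraphs: for a connected graph with integer weights in {1, ..., W}, the MST weight equals n - W + Σ_{i=1}^{W-1} c_i, where c_i counts connected components using only edges of weight ≤ i. The exact demo below evaluates this with union-find; dynamic algorithms of the kind described above maintain approximations of these counts.

```python
# Exact demo of MST weight = n - W + sum_{i=1}^{W-1} c_i via union-find.
def mst_weight_via_components(n, edges, W):
    def components(max_w):
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]    # path halving
                a = parent[a]
            return a
        comps = n
        for u, v, w in edges:
            if w <= max_w:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    comps -= 1
        return comps                             # c_i for threshold max_w
    return n - W + sum(components(i) for i in range(1, W))

# triangle with weights 1, 2, 3: MST = {1, 2}, weight 3
print(mst_weight_via_components(3, [(0, 1, 1), (1, 2, 2), (0, 2, 3)], W=3))
```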