Over the past 10 years, powerful new algorithms based on efficient data structures have been proposed for Nearest Neighbors search (and Approximate Nearest Neighbors search). The Euclidean Locality Sensitive Hashing (LSH) algorithm, which returns approximate nearest neighbors in a Euclidean space with sublinear complexity, is probably the most popular of these, but the Euclidean metric does not always give results as accurate and relevant as similarity measures such as the Earth Mover's Distance and the $\chi^2$ distance. In this paper, we present a new LSH scheme adapted to the $\chi^2$ distance for approximate nearest neighbors search in high-dimensional spaces. We define the specific hashing functions, prove their locality sensitivity, and compare our method experimentally with the Euclidean Locality Sensitive Hashing algorithm in the context of image retrieval on real image databases. The results demonstrate the relevance of the new LSH scheme: it provides either far better accuracy than the Euclidean scheme at equivalent speed, or equivalent accuracy with a large gain in processing speed.
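To make the contrast between the two metrics in the abstract above concrete, here is a minimal sketch that compares Euclidean and $\chi^2$ distances on small histograms. It assumes one common convention for the $\chi^2$ distance (square root of the sum of squared differences divided by the bin sums); the abstract does not state the exact definition or the hashing functions, so the function names and the formula below are illustrative assumptions, not the paper's LSH scheme.

```python
import math


def euclidean_distance(x, y):
    """Plain Euclidean (L2) distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def chi2_distance(x, y):
    """Chi-squared distance between two non-negative histograms.

    Assumed convention: sqrt(sum_i (x_i - y_i)^2 / (x_i + y_i)),
    skipping bins where both entries are zero.
    """
    if len(x) != len(y):
        raise ValueError("histograms must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        s = a + b
        if s > 0:
            total += (a - b) ** 2 / s
    return math.sqrt(total)


if __name__ == "__main__":
    # Two pairs of histograms with the same Euclidean distance.
    p, q = [0.9, 0.1], [0.8, 0.2]
    r, s = [0.5, 0.5], [0.4, 0.6]
    print(euclidean_distance(p, q), euclidean_distance(r, s))
    # The chi-squared distance weights differences in small bins more heavily.
    print(chi2_distance(p, q), chi2_distance(r, s))
```

Under this convention the two pairs are equally far apart in the Euclidean sense, while the $\chi^2$ distance separates them, because a difference of 0.1 matters more in a nearly empty bin than in a well-populated one.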
Given two compact convex sets $P$ and $Q$ in the plane, we compute an image of $P$ under a rigid motion that approximately maximizes the overlap with $Q$. More precisely, for any $\epsilon > 0$, we compute a rigid motion such that the area of overlap is at least $1-\epsilon$ times the maximum possible overlap. Our algorithm uses $O(1/\epsilon)$ extreme point and line intersection queries on $P$ and $Q$, plus $O((1/\epsilon^2) \log(1/\epsilon))$ running time. If only translations are allowed, the extra running time reduces to $O((1/\epsilon) \log(1/\epsilon))$. If $P$ and $Q$ are convex polygons with $n$ vertices in total that are given in an array or balanced tree, the total running time is $O((1/\epsilon) \log n + (1/\epsilon^2) \log(1/\epsilon))$ for rigid motions and $O((1/\epsilon) \log n + (1/\epsilon) \log(1/\epsilon))$ for translations. (c) 2006 Elsevier B.V. All rights reserved.
In this paper we extend the deterministic sublinear FFT algorithm in Plonka et al. (Numer algorithms 78:133-159, 2018. https://***/10.1007/s11075-017-0370-5) for fast reconstruction of $M$-sparse vectors $x$ of length $N = 2^J$, where we assume that all components of the discrete Fourier transform $\hat{x} = F_N x$ are available. The sparsity of $x$ need not be known a priori, but is determined by the algorithm. If the sparsity $M$ is larger than $2^{J/2}$, then the algorithm turns into a usual FFT algorithm with runtime $O(N \log N)$. For $M^2 < N$, the runtime of the algorithm is $O(M^2 \log N)$. The proposed modifications of the approach in Plonka et al. (2018) lead to a significant improvement of the condition numbers of the Vandermonde matrices which are employed in the iterative reconstruction. Our numerical experiments show that our modification has a huge impact on the stability of the algorithm. While the algorithm in Plonka et al. (2018) starts to be unreliable for $M > 20$ because of numerical instabilities, the modified algorithm is still numerically stable for $M = 200$.
ISBN: (print) 9783030895433; 9783030895426
Given an $n$-point metric space $(M, d)$, metric 1-median asks for a point $p \in M$ minimizing $\sum_{x \in M} d(p, x)$. We show that for each computable function $f : \mathbb{Z}^+ \to \mathbb{Z}^+$ satisfying $f(n) = \omega(1)$, metric 1-median has a deterministic, $o(n)$-query, $o(f(n) \cdot \log n)$-approximation and nonadaptive algorithm. Previously, no deterministic $o(n)$-query $o(n)$-approximation algorithms were known for metric 1-median.
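As a reference point for the objective defined in the abstract above, the following sketch computes an exact metric 1-median by brute force. It is only a baseline that evaluates the sum of distances for every candidate point using about $n^2$ distance queries; it is not the sublinear-query algorithm of the paper, and the function name `one_median_exact` is a hypothetical choice for illustration.

```python
def one_median_exact(points, dist):
    """Return (index, cost) of a point minimizing the sum of distances.

    Brute-force baseline: evaluates dist(p, x) for every ordered pair,
    so it makes n^2 distance queries on n points.
    """
    if not points:
        raise ValueError("the metric space must contain at least one point")
    best_index, best_cost = None, float("inf")
    for i, p in enumerate(points):
        cost = sum(dist(p, x) for x in points)
        if cost < best_cost:  # strict comparison keeps the first minimizer
            best_index, best_cost = i, cost
    return best_index, best_cost


if __name__ == "__main__":
    # Points on the real line with the absolute-difference metric.
    line_points = [0, 1, 3, 10, 11]
    index, cost = one_median_exact(line_points, lambda a, b: abs(a - b))
    print(line_points[index], cost)
```

An approximation algorithm for this problem is judged by how far its returned cost can exceed the cost this baseline finds, relative to the number of distance queries it makes.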
ISBN: (print) 9781450367059
We consider the extensively studied problem of $\ell_2/\ell_2$ compressed sensing. The main contribution of our work is an improvement over [Gilbert, Li, Porat and Strauss, STOC 2010] with faster decoding time and significantly smaller column sparsity, answering two open questions of the aforementioned work. Previous work on sublinear-time compressed sensing employed an iterative procedure, recovering the heavy coordinates in phases. We completely depart from that framework, and give the first sublinear-time $\ell_2/\ell_2$ scheme which achieves the optimal number of measurements without iterating; this new approach is the key step to our progress. Towards that, we satisfy the $\ell_2/\ell_2$ guarantee by exploiting the heaviness of coordinates in a way that was not exploited in previous work. Via our techniques we obtain improved results for various sparse recovery tasks, and indicate possible further applications to problems in the field, to which the aforementioned iterative procedure creates significant obstructions.
Authors: Luan, Qi; Zhao, Liang
Affiliations: CUNY Grad Ctr, PhD Program Math, New York, NY 10010, USA; CUNY Lehman Coll, Dept Comp Sci, New York, NY, USA
ISBN: (print) 9781728192284
Matrix CUR decomposition aims at representing a large matrix $A$ with the product $C \cdot U \cdot R$, where $C$ (resp. $R$) consists of a small collection of the original columns (resp. rows), and $U$ is a small intermediate matrix connecting $C$ and $R$. While modern randomized CUR algorithms have provided many efficient methods of choosing representative columns and rows, there hasn't been a method to find the optimal $U$ matrix. In this paper, we present a sublinear-time randomized method to find good choices of the $U$ matrix. Our proposed algorithm treats the task of finding $U$ as a double-sided least squares problem $\min_Z \lVert A - CZR \rVert_F$, and is able to guarantee a close-to-optimal solution by solving a down-sampled problem of much smaller size. We provide a worst-case analysis of its approximation error relative to the theoretically optimal low-rank approximation error, and we demonstrate empirically how this method can improve the approximation of several large-scale real data matrices with a small number of additional computations.
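The double-sided least squares problem in the abstract above has a standard closed-form minimizer in terms of Moore-Penrose pseudoinverses, and a down-sampled variant can be written in the same form. The sketch below states both. The sampling matrices $S_C$ and $S_R$ (which select or rescale a subset of rows and columns of $A$) are hypothetical symbols introduced here for illustration; the abstract does not specify how the paper constructs its down-sampled problem.

```latex
% Full problem: minimum-norm minimizer via pseudoinverses.
\[
  Z^{\star} \;=\; \operatorname*{arg\,min}_{Z}\, \lVert A - C Z R \rVert_F
  \;=\; C^{\dagger} A R^{\dagger}.
\]
% Down-sampled problem (assumed form): S_C samples rows, S_R samples columns.
\[
  \tilde{Z} \;=\; \operatorname*{arg\,min}_{Z}\,
  \lVert S_C^{\top} (A - C Z R)\, S_R \rVert_F
  \;=\; (S_C^{\top} C)^{\dagger}\, (S_C^{\top} A S_R)\, (R S_R)^{\dagger}.
\]
```

Computing $Z^{\star}$ requires reading all of $A$, whereas $\tilde{Z}$ only touches the sampled block $S_C^{\top} A S_R$, which is where a sublinear running time can come from under this assumed formulation.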
We study lower bounds for the problem of approximating a one-dimensional distribution given (noisy) measurements of its moments. We show that there are distributions on $[-1, 1]$ that cannot be approximated to accuracy $\epsilon$ in Wasserstein-1 distance even if we know all of their moments to multiplicative accuracy $(1 \pm 2^{-\Omega(1/\epsilon)})$; this result matches an upper bound of Kong and Valiant [Annals of Statistics, 2017]. To obtain our result, we provide a hard instance involving distributions induced by the eigenvalue spectra of carefully constructed graph adjacency matrices. Efficiently approximating such spectra in Wasserstein-1 distance is a well-studied algorithmic problem, and a recent result of Cohen-Steiner et al. [KDD 2018] gives a method based on accurately approximating spectral moments using $2^{\Omega(1/\epsilon)}$ random walks initiated at uniformly random nodes in the graph. As a strengthening of our main result, we show that improving the dependence on $1/\epsilon$ in this result would require a new algorithmic approach. Specifically, no algorithm can compute an $\epsilon$-accurate approximation to the spectrum of a normalized graph adjacency matrix with constant probability, even when given the transcript of $2^{\Omega(1/\epsilon)}$ random walks of length $2^{\Omega(1/\epsilon)}$ started at random nodes.
The prototypical signal processing pipeline can be divided into four blocks: representation of the signal in a basis suitable for processing; enhancement of the meaningful part of the signal and noise reduction; estimation of important statistical properties of the signal; and adaptive processing to track and adapt to changes in the signal statistics. This thesis revisits each of these blocks and proposes new algorithms, borrowing ideas from information theory, theoretical computer science, or communications. First, we revisit the Walsh-Hadamard transform (WHT) for the case of a signal that is sparse in the transformed domain, namely one that has only $K \le N$ non-zero coefficients. We show that an efficient algorithm exists that can compute these coefficients in $O(K \log_2(K) \log_2(N/K))$ time using only $O(K \log_2(N/K))$ samples. This algorithm relies on a fast hashing procedure that computes small linear combinations of transformed domain coefficients. A bipartite graph is formed with linear combinations on one side, and non-zero coefficients on the other. A peeling decoder is then used to recover the non-zero coefficients one by one. A detailed analysis of the algorithm based on error correcting codes over the binary erasure channel is given. The second chapter is about beamforming. Inspired by the rake receiver from wireless communications, we recognize that echoes in a room are an important source of extra signal diversity. We extend several classic beamforming algorithms to take advantage of echoes and also propose new optimal formulations. We explore formulations both in time and frequency domains. We show theoretically and in numerical simulations that the signal-to-interference-and-noise ratio increases proportionally to the number of echoes used. Finally, beyond objective measures, we show that echoes also directly improve speech intelligibility as measured by the perceptual evaluation of speech quality (PESQ) metric. Next, we attack the problem of direction of arrival of a
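For readers unfamiliar with the Walsh-Hadamard transform mentioned in the abstract above, the following is a minimal sketch of the standard dense fast WHT, which costs $O(N \log N)$ operations on a length-$N$ input. It is the classical butterfly algorithm in natural (Hadamard) ordering without normalization, shown only as the baseline that a sparse algorithm would improve on; it is not the hashing and peeling-decoder algorithm described in the thesis, and the function name `fwht` is an illustrative choice.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering).

    The input length must be a power of two. Applying the transform
    twice returns the input scaled by its length.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


if __name__ == "__main__":
    # Build a time-domain signal whose transform has only K = 2 non-zeros.
    sparse_coefficients = [0, 0, 3, 0, 0, 0, 0, -1]
    signal = fwht(sparse_coefficients)
    # Transforming again recovers the sparse coefficients scaled by N = 8.
    print(fwht(signal))
```

The dense transform reads all $N$ samples even when only $K$ coefficients are non-zero, which is the cost that the sample and time bounds quoted in the abstract avoid.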