We give a randomized algorithm in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text, and the number of matches is counted for each position. A direct application is approximate string matching. The randomized algorithm uses convolution to find an estimator of the scores; the variance of the estimator is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumption is made about the probabilistic characteristics of the input, or about the size of the alphabet. The solution extends to string matching with classes, class complements, "never match" and "always match" symbols, to the weighted case and to higher dimensions.
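A deterministic baseline for the score vector can be computed with one correlation per distinct pattern symbol; the paper's randomized estimator instead correlates random symbol codes in a single pass. The sketch below (function name illustrative, not from the paper) computes the exact score vector via per-symbol convolutions:

```python
import numpy as np

def match_scores(text, pattern):
    # Exact score vector: for each distinct pattern symbol, correlate the
    # 0/1 indicator of that symbol in the text with its indicator in the
    # pattern, and accumulate. scores[i] = matches of pattern at offset i.
    n, m = len(text), len(pattern)
    scores = np.zeros(n - m + 1)
    for c in set(pattern):
        t = np.array([1.0 if ch == c else 0.0 for ch in text])
        p = np.array([1.0 if ch == c else 0.0 for ch in pattern])
        # np.convolve(t, p[::-1])[m-1+i] = sum_j t[i+j] * p[j]
        scores += np.convolve(t, p[::-1])[m - 1 : n]
    return scores
```

For example, `match_scores("abracadabra", "abra")` reports a full score of 4 at offsets 0 and 7, where the pattern occurs exactly.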
An analysis of high-dimensional data can offer a detailed description of a system but is often challenged by the curse of dimensionality. General dimensionality reduction techniques can alleviate such difficulty by extracting a few important features, but they are limited due to the lack of interpretability and connectivity to actual decision making associated with each physical variable. Variable selection techniques, as an alternative, can maintain the interpretability, but they often involve a greedy search that is susceptible to failure in capturing important interactions or a metaheuristic search that requires extensive computations. This research proposes a novel method that identifies critical subspaces, reduced-dimensional physical spaces, to achieve dimensionality reduction and variable selection. We apply a randomized search for subspace exploration and leverage ensemble techniques to enhance model performance. When applied to high-dimensional data collected from the failure prediction of a composite/metal hybrid structure exhibiting complex progressive damage failure under loading, the proposed method outperforms the existing and potential alternatives in prediction and important variable selection.
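The combination of randomized subspace exploration with ensembling can be illustrated with a generic random-subspace ensemble; this is a minimal sketch in that spirit, not the authors' exact procedure (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_random_subspaces(X, y, n_models=30, k=3):
    # Randomized subspace search: each model fits ordinary least squares
    # on k randomly chosen features (a candidate "critical subspace").
    models = []
    for _ in range(n_models):
        idx = rng.choice(X.shape[1], size=k, replace=False)
        coef, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        models.append((idx, coef))
    return models

def ensemble_predict(models, X):
    # Ensemble step: average the subspace models' predictions.
    return np.mean([X[:, idx] @ coef for idx, coef in models], axis=0)
```

Feature importance can then be read off from how often (and how strongly) each variable appears in the better-performing subspaces, which preserves interpretability in terms of the original physical variables.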
Random projection (RP) is a classical technique for reducing storage and computational costs. We analyze RP-based approximations of convex programs, in which the original optimization problem is approximated by solving a lower dimensional problem. Such dimensionality reduction is essential in computation-limited settings, since the complexity of general convex programming can be quite high (e.g., cubic for quadratic programs, and substantially higher for semidefinite programs). In addition to computational savings, RP is also useful for reducing memory usage, and has useful properties for privacy-preserving optimization. We prove that the approximation ratio of this procedure can be bounded in terms of the geometry of the constraint set. For a broad class of RPs, including those based on various sub-Gaussian distributions as well as randomized Hadamard and Fourier transforms, the data matrix defining the cost function can be projected to a dimension proportional to the squared Gaussian width of the tangent cone of the constraint set at the original solution. This effective dimension of the convex program is often substantially smaller than the original dimension. We illustrate consequences of our theory for various cases, including unconstrained and l(1)-constrained least squares, support vector machines, low-rank matrix estimation, and discuss implications for privacy-preserving optimization, as well as connections with denoising and compressed sensing.
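For the unconstrained least-squares case mentioned above, the RP idea reduces to classical sketch-and-solve: project the tall data matrix down with a random map and solve the small problem. A minimal sketch (dimensions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 10, 200                    # m: sketch (projection) dimension
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# Sketch-and-solve: replace min ||Ax - b|| by an m-dimensional problem.
S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian random projection
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
```

The theory in the abstract says how small m may be taken for constrained problems: proportional to the squared Gaussian width of the tangent cone at the solution, which for the simple unconstrained case is on the order of d.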
In this paper, we consider the weighted online set k-multicover problem. In this problem, we have a universe V of elements, a family S of subsets of V with a positive real cost for every S ∈ S, and a "coverage factor" (positive integer) k. A subset {i_0, i_1, ...} ⊆ V of elements is presented online in an arbitrary order. When each element i_p is presented, we are also told the collection S_{i_p} ⊆ S of all (at least k) sets to which i_p belongs, together with their costs, and we need to select additional sets from S_{i_p} if necessary so that our collection of selected sets contains at least k sets that contain i_p. The goal is to minimize the total cost of the selected sets. In this paper, we describe a new randomized algorithm for the online multicover problem based on a randomized version of the winnowing approach of [N. Littlestone, Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm, Machine Learning 2 (1988) 285-318]. This algorithm generalizes and improves some earlier results in [N. Alon, B. Awerbuch, Y. Azar, N. Buchbinder, J. Naor, A general approach to online network optimization problems, in: Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms, 2004, pp. 570-579; N. Alon, B. Awerbuch, Y. Azar, N. Buchbinder, J. Naor, The online set cover problem, in: Proceedings of the 35th Annual ACM Symposium on the Theory of Computing, 2003, pp. 100-105]. We also discuss lower bounds on competitive ratios for deterministic algorithms for general k based on the approaches in [N. Alon, B. Awerbuch, Y. Azar, N. Buchbinder, J. Naor, The online set cover problem, in: Proceedings of the 35th Annual ACM Symposium on the Theory of Computing, 2003, pp. 100-105].
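The multiplicative-weight flavor of such online covering algorithms can be sketched for the simplest case k = 1. This is not the paper's winnowing-based algorithm, only a simplified illustration of the pattern (maintain fractional weights, boost weights of sets containing an uncovered element, round randomly, and fall back to a cheapest set to guarantee feasibility):

```python
import random

random.seed(1)

def online_set_cover(sets, costs, stream):
    # sets: dict name -> frozenset of elements; stream: elements online.
    w = {s: 1.0 / (2 * len(sets)) for s in sets}   # fractional weights
    chosen, total = set(), 0.0
    for e in stream:
        containing = [s for s in sets if e in sets[s]]
        if any(s in chosen for s in containing):
            continue                               # already covered
        # Multiplicative update until e is fractionally covered.
        while sum(w[s] for s in containing) < 1.0:
            for s in containing:
                old = w[s]
                w[s] = min(1.0, 2 * w[s] + 1.0 / (len(containing) * costs[s]))
                # Randomized rounding: buy with prob. equal to the increase.
                if random.random() < (w[s] - old) and s not in chosen:
                    chosen.add(s)
                    total += costs[s]
        if not any(s in chosen for s in containing):
            # Safety net: buy the cheapest containing set.
            s = min(containing, key=lambda t: costs[t])
            chosen.add(s)
            total += costs[s]
    return chosen, total
```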
We consider a special case of the weighted caching problem where the weight of every page is either 1 or some fixed number M > 1. We present a randomized algorithm that achieves a competitive ratio of O(log k), where k is the number of pages that can fit in the cache.
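The classical randomized tool in this area is the marking algorithm, which is O(log k)-competitive for the unit-weight case; the paper's algorithm extends such ideas to two weight classes {1, M}. A sketch of the unit-weight marking algorithm:

```python
import random

random.seed(0)

def marking_cache(requests, k):
    # Randomized marking for unit-weight paging: on a miss with a full
    # cache, evict a uniformly random unmarked page; when every cached
    # page is marked, a new phase begins and all marks are cleared.
    cache, marked, misses = set(), set(), 0
    for p in requests:
        if p not in cache:
            misses += 1
            if len(cache) == k:
                unmarked = cache - marked
                if not unmarked:          # phase ends
                    marked.clear()
                    unmarked = set(cache)
                cache.remove(random.choice(sorted(unmarked)))
        cache.add(p)
        marked.add(p)
    return misses
```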
The paper tackles the power of randomization in the context of local distributed computing by analyzing the ability to "boost" the success probability of deciding a distributed language using a Monte-Carlo algorithm. We prove that, in many cases, the ability to increase the success probability for deciding distributed languages is rather limited. This contrasts with the sequential computing setting where boosting can systematically be achieved by repeating the randomized execution.
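The sequential boosting that the paper contrasts with is simple repetition with a majority vote: if one run is correct with probability p > 1/2, the majority of t independent runs is correct with probability approaching 1. A short calculation of this amplified success probability:

```python
from math import comb

def majority_success(p, t):
    # Probability that a majority of t independent Monte-Carlo runs,
    # each correct with probability p, returns the right answer (t odd).
    return sum(comb(t, i) * p**i * (1 - p)**(t - i)
               for i in range(t // 2 + 1, t + 1))
```

For instance, a success probability of 0.6 per run is boosted well above 0.95 by a majority over 101 runs; it is exactly this kind of systematic amplification that the paper shows is often unavailable in the local distributed setting.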
The technique of ℋ-matrices was introduced to deal with (large-scale) dense matrices in an elegant way. It provides a data-sparse format and allows an approximate matrix algebra of nearly optimal complexity. The primary step in the construction of an ℋ-matrix is approximating the subblocks of a given dense matrix by matrices of low numerical rank. Randomized algorithms have recently been introduced as a highly efficient tool for computing approximate factorizations of low-rank matrices. In this paper, we consider various randomized algorithms to construct an ℋ-matrix and to perform the corresponding ℋ-matrix algebra. In particular, by taking advantage of randomization, we present a simple but fast algorithm for truncating a low-rank matrix of (fixed) rank k, with lower asymptotic complexity than the corresponding efficient deterministic algorithm. We provide numerical examples, applied to a BEM model associated with the Laplace equation in one and two dimensions, showing the efficiency of the proposed randomized methods versus the previously known efficient deterministic methods. An efficiency gain of about 10-25% is achievable when a moderate oversampling parameter is used in the computations. In this case, the experimental results show that the proposed algorithm not only has a lower cost but is also more accurate than the conventional methods.
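The randomized low-rank truncation underlying such methods is typically a variant of the randomized range finder of Halko, Martinsson, and Tropp: sample the range of the block with a random test matrix, orthonormalize, then run a small deterministic SVD. A generic sketch (the oversampling parameter `p` plays the role mentioned in the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_truncate(A, k, p=10):
    # Randomized range finder: Y = A @ Omega samples range(A);
    # QR gives an orthonormal basis; the SVD of the small matrix
    # Q.T @ A yields the rank-k truncation A ≈ U diag(s) Vt.
    Omega = rng.standard_normal((A.shape[1], k + p))
    Q, _ = np.linalg.qr(A @ Omega)
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ U[:, :k], s[:k], Vt[:k]
```

On an exactly rank-k block the basis captures the range with probability one, and the truncation is exact up to floating-point error; on numerically low-rank BEM blocks the oversampling `p` controls the accuracy/cost trade-off.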
Recently, Charikar et al. investigated the problem of evaluating AND/OR trees with non-uniform costs on their leaves from the perspective of competitive analysis. For an AND/OR tree T they presented a mu(T)-competitive deterministic polynomial-time algorithm, where mu(T) is the number of leaves that must be read, in the worst case, in order to determine the value of T. Furthermore, they proved that mu(T) is a lower bound on the deterministic competitiveness, which establishes the optimality of their algorithm. The power of randomization in this context has remained an open question. Here, we take a step towards solving this problem by presenting a 5/6 mu(T)-competitive randomized polynomial-time algorithm. This contrasts with the best known lower bound of mu(T)/2.
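The basic way randomization helps in AND/OR tree evaluation is short-circuiting in a random child order, as in the classical randomized game-tree evaluation; the sketch below illustrates only that mechanism, not the paper's 5/6 mu(T)-competitive algorithm, which additionally accounts for non-uniform leaf costs:

```python
import random

random.seed(0)

def evaluate(node):
    # node is ('LEAF', bool) or ('AND'/'OR', [children]).
    # Shuffling children before short-circuit evaluation lets the
    # algorithm skip subtrees whose value cannot change the result.
    kind, payload = node
    if kind == 'LEAF':
        return payload
    children = payload[:]
    random.shuffle(children)
    if kind == 'AND':
        return all(evaluate(c) for c in children)
    return any(evaluate(c) for c in children)
```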
Reliability of compute-intensive applications can be improved by introducing fault tolerance into the system. Algorithm-based fault tolerance (ABFT) is a low-cost scheme which provides the required fault tolerance to the system through system level encoding. In this paper, we propose randomized construction techniques, under an extended model, for the design of ABFT systems with the required fault tolerance capability. The model considers failures in the processors performing the checking operations.
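The system-level encoding behind ABFT is easiest to see in the classical Huang–Abraham checksum scheme for matrix multiplication, which the paper's randomized constructions generalize (including to faulty checkers). A minimal sketch of fault detection and location:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Huang–Abraham style ABFT: append a column-checksum row to A and a
# row-checksum column to B; the product C then carries both checksums.
Ac = np.vstack([A, A.sum(axis=0)])
Br = np.hstack([B, B.sum(axis=1, keepdims=True)])
C = Ac @ Br                                  # C[:4, :4] == A @ B

C[1, 2] += 5.0                               # inject a single fault
row_err = np.abs(C[:4, :4].sum(axis=1) - C[:4, 4])
col_err = np.abs(C[:4, :4].sum(axis=0) - C[4, :4])
fault = (int(np.argmax(row_err)), int(np.argmax(col_err)))
```

The mismatching row and column checksums intersect at the corrupted entry, so a single fault can be both detected and located (and, for additive errors, corrected from the checksum discrepancy).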
We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation and in the high-dimensional regime. We show that the answer is negative for both deterministic and randomized algorithms applied to essentially any of the interesting geometries and nonsmooth, weakly-smooth, or smooth objective functions. In particular, we show that it is not possible to obtain a polylogarithmic (in the sequential complexity of the problem) number of parallel rounds with a polynomial (in the dimension) number of queries per round. In the majority of these settings and when the dimension of the space is polynomial in the inverse target accuracy, our lower bounds match the oracle complexity of sequential convex optimization, up to at most a logarithmic factor in the dimension, which makes them (nearly) tight. Another conceptual contribution of our work is in providing a general and streamlined framework for proving lower bounds in the setting of parallel convex optimization. Prior to our work, lower bounds for parallel convex optimization algorithms were only known in a small fraction of the settings considered in this paper, mainly applying to Euclidean (l(2)) and l(infinity) spaces.