We present an efficient probabilistic algorithm for testing approximate membership of words to regular languages modulo the edit distance. Our algorithm runs in polynomial time in the size of the input automaton and t...
详细信息
We present an efficient probabilistic algorithm for testing approximate membership of words to regular languages modulo the edit distance. Our algorithm runs in polynomial time in the size of the input automaton and the inverse error precision in contrast to all previous approaches, and independently of the size of the input word. We also improve a previous approximate membership tester modulo the Hamming distance such that it runs in polynomial complexity time, but with larger polynomials than for the edit distance. (C) 2013 Elsevier B.V. All rights reserved.
A program correctness checker is an algorithm for checking the output of a computation. That is, given a program and an instance on which the program is run, the checker certifies whether the output of the program on ...
详细信息
A program correctness checker is an algorithm for checking the output of a computation. That is, given a program and an instance on which the program is run, the checker certifies whether the output of the program on that instance is correct. This paper defines the concept of a program checker. It designs program checkers for a few specific and carefully chosen problems in the class FP of functions computable in polynomial time. Problems in FP for which checkers are presented in this paper include Sorting, Matrix Rank and GCD. It also applies methods of modern cryptography, especially the idea of a probabilistic interactive proof, to the design of program checkers for group theoretic computations. Two structural theorems are proven here. One is a characterization of problems that can be checked. The other theorem establishes equivalence classes of problems such that whenever one problem in a class is checkable, all problems in the class are checkable.
To exploit the similarity information hidden in the hyperlink structure of the Web, this paper introduces algorithms scalable to graphs with billions of vertices on a distributed architecture. The similarity of multis...
详细信息
To exploit the similarity information hidden in the hyperlink structure of the Web, this paper introduces algorithms scalable to graphs with billions of vertices on a distributed architecture. The similarity of multistep neighborhoods of vertices are numerically evaluated by similarity functions including SimRank [1], a recursive refinement of cocitation, and PSimRank, a novel variant with better theoretical characteristics. Our methods are presented in a general framework of Monte Carlo similarity search algorithms that precompute an index database of random fingerprints, and at query time, similarities are estimated from the fingerprints. We justify our approximation method by asymptotic worst-case lower bounds: We show that there is a significant gap between exact and approximate approaches, and suggest that the exact computation, in general, is infeasible for large-scale inputs. We were the first to evaluate SimRank on real Web data. On the Stanford WebBase [2] graph of 80M pages the quality of the methods increased significantly in each refinement step until step four.
Nonparametric neighborhood methods for learning entail estimation of class conditional probabilities based on relative frequencies of samples that are "near-neighbors" of a test point. We propose and explore...
详细信息
Nonparametric neighborhood methods for learning entail estimation of class conditional probabilities based on relative frequencies of samples that are "near-neighbors" of a test point. We propose and explore the behavior of a learning algorithm that uses linear interpolation and the principle of maximum entropy (LIME). We consider some theoretical properties of the LIME algorithm: LIME weights have exponential form;the estimates are consistent;and the estimates are robust to additive noise. In relation to bias reduction, we show that near-neighbors contain a test point in their convex hull asymptotically. The common linear interpolation solution used for regression on grids or look-up-tables is shown to solve a related maximum entropy problem. LIME simulation results support use of the method, and performance on a pipeline integrity classification problem demonstrates that the proposed algorithm has practical value.
We present the mathematical analysis of the Isolation Random Forest Method (IRF Method) for anomaly detection, proposed by Liu F.T., Ting K.M. and Zhou Z. H. in their seminal work as a heuristic method for anomaly det...
详细信息
We present the mathematical analysis of the Isolation Random Forest Method (IRF Method) for anomaly detection, proposed by Liu F.T., Ting K.M. and Zhou Z. H. in their seminal work as a heuristic method for anomaly detection in Big Data. We prove that the IRF space can be endowed with a probability induced by the Isolation Tree algorithm (iTree). In this setting, the convergence of the IRF method is proved, using the Law of Large Numbers. A couple of counterexamples are presented to show that the method is inconclusive and no certificate of quality can be given, when using it as a means to detect anomalies. Hence, an alternative version of the method is proposed whose mathematical foundation is fully justified. Furthermore, a criterion for choosing the number of sampled trees needed to guarantee confidence intervals of the numerical results is presented. Finally, numerical experiments are presented to compare the performance of the classic method with the proposed one.
We prove upper bounds on the order and degree of the polynomials involved in a resolvent representation of the prime differential ideal associated with a polynomial differential system for a particular class of ordina...
详细信息
We prove upper bounds on the order and degree of the polynomials involved in a resolvent representation of the prime differential ideal associated with a polynomial differential system for a particular class of ordinary first order algebraic-differential equations arising in control theory. We also exhibit a probabilistic algorithm which computes this resolvent representation within time polynomial in the natural syntactic parameters and the degree of a certain algebraic variety related to the input system. In addition, we give a probabilistic polynomial-time algorithm for the computation of the differential Hilbert function of the ideal. (C) 2005 Elsevier Inc. All rights reserved.
This paper investigates the problem of assigning unique identifiers to processors that communicate through shared memory. Solutions to fundamental multiprocessor coordination problems such as the Mutual Exclusion Prob...
详细信息
This paper investigates the problem of assigning unique identifiers to processors that communicate through shared memory. Solutions to fundamental multiprocessor coordination problems such as the Mutual Exclusion Problem and the Choice Coordination Problem often rely on unique identifiers. We present a probabilistic protocol that solves this Processor Identity Problem for asynchronous processors that communicate through a common shared memory. This protocol requires no central arbiter, and all processors start in exactly the same state. The use of our protocol simplifies shared memory processor design by eliminating the need to encode processor identifiers in system hardware or software structures.
In experiments designed for family-based association studies, methods such as transmission disequilibrium test require large number of trios to identify single-nucleotide polymorphisms associated with the disease. How...
详细信息
In experiments designed for family-based association studies, methods such as transmission disequilibrium test require large number of trios to identify single-nucleotide polymorphisms associated with the disease. However, unavailability of a large number of trios is the Achilles' heel of many complex diseases, especially for late-onset diseases. In this paper, we propose a novel approach to this problem by means of the Dempster-Shafer method. The simulation studies show that the Dempster-Shafer method has a promising overall performance, in identifying single-nucleotide polymorphisms in the correct association class, as it has 90 percent accuracy even with 60 trios.
We show that the size of any sample space that could be used in Freivalds' probabilistic matrix product verification algorithm for n x n matrices is at least (n - 1)/epsilon if the probability of error is at most ...
详细信息
We show that the size of any sample space that could be used in Freivalds' probabilistic matrix product verification algorithm for n x n matrices is at least (n - 1)/epsilon if the probability of error is at most epsilon, matching the upper bound of Kimbrel and Sinha. We also provide a characterization of any sample space for which Freivalds' algorithm has probability of error at most epsilon. We then provide a generalization of Freivalds' algorithm and give a tight lower bound on the time-randomness tradeoff for this class of algorithms.
暂无评论