The well-studied "power of two choices" family of algorithms creates balanced allocations of m balls into n bins by, for each ball, selecting a few bins at random and then placing the ball in the least-loaded bin. A natural variation is to create an unbalanced allocation by, for each ball, selecting a few bins at random and then placing the ball in the most-loaded bin. Surprisingly, this variation has not been previously studied. This paper introduces this family of unbalanced allocation processes and begins its analysis. The behavior of the bounded m case is analyzed in detail via differential equations and coupling, and some preliminary results for the general case are presented.
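The two processes described in the abstract above can be compared with a short simulation. This is an illustrative sketch only: the function name `allocate`, the parameter `d` (number of sampled bins), sampling with replacement, and breaking ties in favor of the first sampled bin are all assumptions, not details taken from the paper.

```python
import random

def allocate(m, n, d=2, prefer_most_loaded=True, seed=None):
    """Throw m balls into n bins, sampling d candidate bins per ball.

    prefer_most_loaded=True gives the unbalanced variant (most-loaded
    candidate wins); False gives the classic balanced process.
    Assumptions: candidates are drawn with replacement, and ties go to
    the first candidate drawn.
    """
    rng = random.Random(seed)
    loads = [0] * n
    pick = max if prefer_most_loaded else min
    for _ in range(m):
        candidates = [rng.randrange(n) for _ in range(d)]
        target = pick(candidates, key=lambda b: loads[b])
        loads[target] += 1
    return loads

if __name__ == "__main__":
    unbalanced = allocate(1000, 100, seed=0)
    balanced = allocate(1000, 100, prefer_most_loaded=False, seed=0)
    print("max load, most-loaded rule: ", max(unbalanced))
    print("max load, least-loaded rule:", max(balanced))
```

Running both rules with the same seed gives a quick feel for how differently the load profiles evolve; it is not a substitute for the differential-equation analysis the paper describes.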
Differential privacy is a framework to quantify to what extent individual privacy in a statistical database is preserved while releasing useful aggregate information about the database. In this paper, within the classes of mechanisms oblivious of the database and the queries beyond the global sensitivity, we characterize the fundamental tradeoff between privacy and utility in differential privacy, and derive the optimal $\epsilon$-differentially private mechanism for a single real-valued query function under a very general utility-maximization (or cost-minimization) framework. The class of noise probability distributions in the optimal mechanism has staircase-shaped probability density functions which are symmetric (around the origin), monotonically decreasing and geometrically decaying. The staircase mechanism can be viewed as a geometric mixture of uniform probability distributions, providing a simple algorithmic description for the mechanism. Furthermore, the staircase mechanism naturally generalizes to discrete query output settings as well as more abstract settings. We explicitly derive the parameter of the optimal staircase mechanism for $\ell^1$ and $\ell^2$ cost functions. Comparing the optimal performances with those of the usual Laplacian mechanism, we show that in the high privacy regime ($\epsilon$ is small), the Laplacian mechanism is asymptotically optimal as $\epsilon \to 0$; in the low privacy regime ($\epsilon$ is large), the minimum magnitude and second moment of noise are $\Theta(\Delta e^{-\epsilon/2})$ and $\Theta(\Delta^2 e^{-2\epsilon/3})$ as $\epsilon \to +\infty$, respectively, while the corresponding figures when using the Laplacian mechanism are $\Delta/\epsilon$ and $2\Delta^2/\epsilon^2$, where $\Delta$ is the sensitivity of the query function. We conclude that the gains of the staircase mechanism are more pronounced in the moderate-low privacy regime.
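The "geometric mixture of uniform probability distributions" mentioned in the abstract above suggests a simple sampler. The sketch below is one way to realize such a mixture, under stated assumptions: the decomposition into a random sign, a geometric step index, and a uniform offset is my reading of that phrase, the function name `staircase_noise` is hypothetical, and the step-shape parameter `gamma` (a value in [0, 1]) is left to the caller because the abstract does not state its optimal value.

```python
import math
import random

def staircase_noise(epsilon, sensitivity, gamma, rng=random):
    """Draw one noise sample from a staircase-shaped density.

    Assumed structure: each unit-width step [g, g+1) * sensitivity is
    split at gamma into a higher inner part and a lower outer part, and
    step heights decay geometrically with ratio exp(-epsilon).
    """
    b = math.exp(-epsilon)
    sign = 1 if rng.random() < 0.5 else -1
    # Geometric step index G with P(G = i) = (1 - b) * b**i for i >= 0.
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    g = math.floor(math.log(1.0 - rng.random()) / math.log(b))
    u = rng.random()
    # Probability of landing in the inner (higher) part of the chosen step.
    p_inner = gamma / (gamma + (1.0 - gamma) * b)
    if rng.random() < p_inner:
        magnitude = (g + gamma * u) * sensitivity
    else:
        magnitude = (g + gamma + (1.0 - gamma) * u) * sensitivity
    return sign * magnitude

if __name__ == "__main__":
    rng = random.Random(42)
    true_answer = 10.0
    print(true_answer + staircase_noise(1.0, 1.0, 0.5, rng))
```

The noise is added to the true query answer before release; `epsilon` must be positive for the geometric draw to be well defined.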
The main idea of the "black box" approach in exact linear algebra is to reduce matrix problems to the computation of minimum polynomials. In most cases preconditioning is necessary to obtain the desired result. Here good preconditioners will be used to ensure geometrical/algebraic properties on matrices, rather than numerical ones, so we do not address a condition number. We offer a review of problems for which (algebraic) preconditioning is used, provide a bestiary of preconditioning problems, and discuss several preconditioner types to solve these problems. We present new conditioners, including conditioners to preserve low displacement rank for Toeplitz-like matrices. We also provide new analyses of preconditioner performance and results on the relations among preconditioning problems and with linear algebra problems. Thus, improvements are offered for the efficiency and applicability of preconditioners. The focus is on linear algebra problems over finite fields, but most results are valid for entries from arbitrary fields. (C) 2002 Elsevier Science Inc. All rights reserved.
Kernel analog forecasting (KAF) is a methodology for data-driven, nonparametric forecasting of dynamically generated time series data. This approach has a rigorous foundation in Koopman operator theory and it produces good forecasts in practice, but it suffers from the heavy computational costs common to kernel methods. This paper proposes a streaming algorithm for KAF that only requires a single pass over the training data. This algorithm dramatically reduces the costs of training and prediction without sacrificing forecasting skill. Computational experiments demonstrate that the streaming KAF method can successfully forecast several classes of dynamical systems (periodic, quasi-periodic, and chaotic) in both data-scarce and data-rich regimes. The overall methodology may have wider interest as a new template for streaming kernel regression.
We consider the computation of the convex hull of a given n-point set in three-dimensional Euclidean space in an output-sensitive manner. Clarkson and Shor proposed an optimal randomized algorithm for this problem, with an expected running time $O(n \log h)$, where h denotes the number of points on the surface of the convex hull. In this note we point out that the algorithm can be made deterministic by using recently developed techniques, thus obtaining an optimal deterministic algorithm.
Both linear and nonlinear relationships may exist among process variables, and monitoring a process with such complex relationships among variables is imperative. However, individual principal component analysis (PCA) or kernel PCA (KPCA) may not be able to characterize these complex relationships well. This paper proposes a parallel PCA-KPCA (P-PCA-KPCA) modeling and monitoring scheme that incorporates a randomized algorithm (RA) and a genetic algorithm (GA) for efficient fault detection in a process with linearly correlated and nonlinearly related variables. First, to determine the variables included in the parallel PCA (P-PCA) and the parallel KPCA (P-KPCA) models, GA-based optimization is performed, in which RA is used to generate faulty validation data. Second, monitoring statistics are established for the P-PCA and the P-KPCA models, in which the process status is determined. The proposed monitoring scheme discriminates the linear and nonlinear relationships among variables in a process and deals with nonlinear processes efficiently. We provide case studies on a numerical example and the continuous stirred tank reactor process. These case studies demonstrate that the proposed P-PCA-KPCA monitoring scheme is better than conventional PCA- or KPCA-based methods at performing nonlinear process monitoring.
This paper describes a novel algorithm for approximate nearest neighbor searching. For solving this problem, especially in high-dimensional spaces, one of the best-known algorithms is Locality-Sensitive Hashing (LSH). This paper presents a variant of the LSH algorithm that outperforms previously proposed methods when the dataset consists of vectors normalized to unit length, which is often the case in pattern recognition. The LSH scheme is based on a family of hash functions that preserves the locality of points. This paper points out that for our special case problem we can design efficient hash functions that map a point on the hypersphere into the closest vertex of the randomly rotated regular polytope. The computational analysis confirmed that the proposed method could improve the exponent ρ, the main indicator of the performance of the LSH algorithm. The practical experiments also supported the efficiency of our algorithm both in time and in space.
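The hash described in the abstract above, mapping a unit vector to the closest vertex of a randomly rotated regular polytope, can be sketched as follows. The abstract does not say which polytope is used; this sketch assumes the cross-polytope (vertices at plus or minus each coordinate axis), where the closest vertex is simply the coordinate of largest absolute value together with its sign. The function names `random_rotation` and `orthoplex_hash` are hypothetical, and the "rotation" is a random orthogonal matrix built by Gram-Schmidt, which may include a reflection.

```python
import math
import random

def random_rotation(dim, rng):
    """Return dim orthonormal rows via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:  # skip numerically degenerate draws
            basis.append([x / norm for x in v])
    return basis

def orthoplex_hash(point, rotation):
    """Hash a vector to the nearest cross-polytope vertex after rotating.

    Returns (axis, sign): the vertex sign * e_axis. For a unit vector the
    nearest vertex is the one with the largest dot product, i.e. the
    rotated coordinate of largest absolute value.
    """
    rotated = [sum(r * x for r, x in zip(row, point)) for row in rotation]
    axis = max(range(len(rotated)), key=lambda i: abs(rotated[i]))
    return (axis, 1 if rotated[axis] >= 0 else -1)

if __name__ == "__main__":
    rng = random.Random(0)
    rot = random_rotation(3, rng)
    print(orthoplex_hash([0.6, 0.8, 0.0], rot))
```

Nearby points on the sphere tend to share a vertex, which is the locality property an LSH family needs; in practice several independent rotations would be combined into one hash key.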
Let G be a directed graph with an integral cost on each edge. For a given positive integer k, the k-length negative cost cycle (kLNCC) problem is to determine whether G contains a negative cost cycle with at least k edges. Because of its applications in deadlock avoidance in synchronized streaming computing networks, kLNCC was first studied in (Li et al. in Proceedings of the 22nd ACM symposium on parallelism in algorithms and architectures, pp 243-252, 2010), but it remained open whether the problem is NP-hard. In this paper, we first show that an even harder problem, the fixed-point k-length negative cost cycle trail (FPkLNCCT) problem, which is to determine whether G contains a negative closed trail passing through a given vertex (as the fixed point) and containing only cycles with at least k edges, is NP-complete in a multigraph even when k = 3, by reducing from the 3SAT problem. Then, we prove the NP-completeness of kLNCC by giving a more sophisticated reduction from the 3 occurrence 3-satisfiability (3O3SAT) problem, which is known to be NP-complete. The complexity result for kLNCC is interesting since polynomial-time algorithms are known for both 2LNCC, which is actually equivalent to negative cycle detection, and the k-cycle problem, which is to determine whether G contains a cycle of length at least k. Thus, this paper closes the open problem proposed by Li et al. (2010) of whether kLNCC admits polynomial-time algorithms. Finally, we present for 3LNCC a randomized algorithm that, if G contains a negative cycle of length at most L, can find a solution with probability $1 - \epsilon$ for any $\epsilon \in (0, 1]$ within runtime O(2^{min{L, h}} mn is an element of ln 1 is an element of is an element of), where m, n and h are respectively the numbers of edges, vertices and length 2 negative cost cycles in G.
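The abstract above notes that 2LNCC is equivalent to plain negative cycle detection, for which polynomial-time algorithms are known. A minimal sketch of that baseline, using the standard Bellman-Ford relaxation rather than anything from the paper itself, is shown below; the function name `has_negative_cycle` and the edge-list representation `(u, v, cost)` are assumptions made for illustration.

```python
def has_negative_cycle(num_vertices, edges):
    """Return True if the directed graph contains a negative cost cycle.

    edges is a list of (u, v, cost) triples with vertices 0..num_vertices-1.
    Initializing every distance to 0 behaves like a virtual source joined
    to each vertex by a zero-cost edge, so cycles anywhere are detected.
    """
    dist = [0] * num_vertices
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle
    # Any edge that can still be relaxed lies on or after a negative cycle.
    return any(dist[u] + cost < dist[v] for u, v, cost in edges)

if __name__ == "__main__":
    triangle = [(0, 1, 1), (1, 2, -3), (2, 0, 1)]
    print(has_negative_cycle(3, triangle))
```

This runs in O(mn) time and says nothing about the length of the cycle it finds, which is exactly the gap that makes kLNCC for larger k a different problem.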
We present parameterized algorithms for the k-path problem, the p-packing of q-sets problem, and the q-dimensional p-matching problem. Our algorithms solve these problems with high probability in time exponential only in the parameter (k, p, q) and using polynomial space. The constant bases of the exponentials are significantly smaller than in previous works; for example, for the k-path problem the improvement is from 2 to 1.66. We also show how to detect if a d-regular graph admits an edge coloring with d colors in time within a polynomial factor of $2^{(d-1)n/2}$. Our techniques generalize an algebraic approach studied in various recent works. (c) 2017 Elsevier Inc. All rights reserved.
For a set of n disjoint line segments S in $\mathbb{R}^2$, the visibility testing problem (VTP) is to test whether the query point p sees a query segment $s \in S$. For this configuration, the visibility counting problem (VCP) is to preprocess S such that the number of visible segments in S from any query point p can be computed quickly. In this paper, we solve VTP in expected logarithmic query time using quadratic preprocessing time and space. Moreover, we propose a $(1 + \delta)$-approximation algorithm for VCP using at most quadratic preprocessing time and space. The query time of this method is $O_\epsilon\left(\frac{1}{\delta^2}\sqrt{n}\right)$, where $O_\epsilon(f(n)) = O(f(n)\,n^{\epsilon})$ and $\epsilon > 0$ is an arbitrary constant. (C) 2015 Elsevier B.V. All rights reserved.