Querying the k nearest neighbors of a query point from a data set in high-dimensional space is one of the most important operations in spatial databases. The classic nearest neighbor query algorithms are based on the R-tree. However, the R-tree suffers from overlap among minimum bounding rectangles, which causes its time complexity to depend exponentially on the dimensionality of the space, so reducing the dimensionality is the key point. A Hilbert curve fills high-dimensional space linearly: it divides the space into equal-size grid cells and maps the points lying in those cells into a linear space. Exploiting this dimensionality-reducing property of the Hilbert curve, this paper presents an approximate k nearest neighbor query algorithm, AKNN, and analyzes the quality of the returned k nearest neighbors in theory. Experimental results show that the execution time of AKNN is shorter than that of the R-tree-based nearest neighbor query algorithm in high-dimensional space, and the quality of the approximate k nearest neighbors satisfies the needs of real applications.
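The abstract does not give AKNN's details, so the following is only a minimal 2-D sketch of the general map-sort-filter idea: map each grid cell to its position on a Hilbert curve, sort points by that key, and examine a window of candidates around the query's position on the curve. The curve order and the `window` candidate-multiplier are illustrative assumptions, not the paper's parameters.

```python
import bisect
import math

def hilbert_index(order, x, y):
    """Map grid cell (x, y) to its index on a 2^order x 2^order Hilbert curve."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:          # rotate the quadrant so the curve stays contiguous
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

def approx_knn(points, query, k, order=8, window=4):
    """Approximate kNN: sort by Hilbert index, then check only the
    window*k candidates on either side of the query's curve position."""
    side = 2 ** order
    def key(p):
        # assumes coordinates are already normalized to [0, 1)
        return hilbert_index(order, min(int(p[0] * side), side - 1),
                                    min(int(p[1] * side), side - 1))
    indexed = sorted((key(p), p) for p in points)
    keys = [kv[0] for kv in indexed]
    pos = bisect.bisect_left(keys, key(query))
    lo = max(0, pos - window * k)
    hi = min(len(indexed), pos + window * k)
    candidates = [p for _, p in indexed[lo:hi]]
    candidates.sort(key=lambda p: math.dist(p, query))
    return candidates[:k]
```

Because only a bounded window of the sorted order is examined, the result is approximate: points that are close in space but far apart on the curve can be missed.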
Core Vector Machine (CVM) is a promising technique for scaling up a binary Support Vector Machine (SVM) to handle large data sets, using an approximate Minimum Enclosing Ball (MEB) algorithm. However, experimental results show that there is always some redundancy in the final core set used to determine the decision function. We propose an approximate MEB algorithm in this paper to reduce the number of redundant core vectors as much as possible. Simulations on synthetic data sets demonstrate competitive performance in training time, number of core vectors, and training accuracy.
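The paper's redundancy-reducing variant is not specified in the abstract; as background, here is a minimal sketch of the classic Badoiu-Clarkson (1+eps)-approximate MEB iteration that CVM-style methods build on. The iteration count 1/eps^2 and the core-set bookkeeping follow the standard scheme; everything else is an illustrative assumption.

```python
import numpy as np

def approx_meb(points, eps=0.1):
    """Badoiu-Clarkson (1+eps)-approximate minimum enclosing ball.
    Returns (center, radius, sorted core-set indices)."""
    pts = np.asarray(points, dtype=float)
    c = pts[0].copy()
    core = {0}
    for i in range(1, int(np.ceil(1.0 / eps ** 2)) + 1):
        dists = np.linalg.norm(pts - c, axis=1)
        far = int(np.argmax(dists))       # farthest point from current center
        core.add(far)
        c += (pts[far] - c) / (i + 1)     # step toward the farthest point
    radius = float(np.linalg.norm(pts - c, axis=1).max())
    return c, radius, sorted(core)
```

The core set collects every point that was ever farthest from the running center; the redundancy the abstract mentions arises because this set can contain vectors that the final ball no longer needs.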
Trajectory extraction has been studied in many research areas, including traditional spatio-temporal databases, advanced vehicle information systems, and military surveillance. In wireless sensor networks, several factors make it difficult to acquire an object's trajectory, including imprecise and stream-oriented localized locations, limited sensor storage, and limited bandwidth. This paper proposes the Possible Presence Zone Trajectory Extraction Method (PPZTEM) with an error bound control mechanism to extract the approximate object trajectory from imprecise localized locations. PPZTEM constructs a trajectory that describes the most probable path of an object in wireless sensor networks. The constructed trajectory of PPZTEM satisfies the given error bound constraint and requires only a small amount of data. Experiments on a broad variety of synthetic and real-world object trajectories reveal that PPZTEM significantly reduces the data size of the trajectory by fusing the localized locations. At the same time PPZTEM achieves user-specified error constraints on the estimated locations. (C) 2012 Elsevier B.V. All rights reserved.
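PPZTEM's fusion of possible presence zones is specific to the paper, but the error-bound constraint it enforces resembles classic error-bounded polyline simplification. A minimal Douglas-Peucker sketch of that general idea (not the paper's method) is:

```python
import math

def point_segment_dist(p, a, b):
    """Distance from point p to the segment from a to b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(traj, err):
    """Keep few vertices while guaranteeing every dropped location
    lies within err of the simplified trajectory (Douglas-Peucker)."""
    if len(traj) < 3:
        return list(traj)
    dmax, idx = 0.0, 0
    for i in range(1, len(traj) - 1):
        d = point_segment_dist(traj[i], traj[0], traj[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= err:                      # whole span satisfies the error bound
        return [traj[0], traj[-1]]
    left = simplify(traj[:idx + 1], err)  # split at the worst-offending point
    right = simplify(traj[idx:], err)
    return left[:-1] + right
```

As in PPZTEM, the payoff is a much smaller trajectory representation whose deviation from the raw localized locations is capped by a user-specified bound.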
Data sharing in today's information society poses a threat to individual privacy and organisational confidentiality. k-anonymity is a widely adopted model to prevent the owner of a record from being re-identified. By generalising and/or suppressing certain portions of the released dataset, it guarantees that no record can be uniquely distinguished from at least k-1 other records. A key requirement for the k-anonymity problem is to minimise the information loss resulting from data modifications. This article proposes a top-down approach to solve this problem. It first treats each record as a vertex and the similarity between two records as an edge weight to construct a complete weighted graph. Then, an edge-cutting algorithm divides the complete graph into multiple trees/components. Large components with more than 2k-1 vertices are subsequently split to guarantee that each resulting component has between k and 2k-1 vertices. Finally, a generalisation operation is applied to the vertices in each component (i.e. equivalence class) so that all the records inside have identical quasi-identifier values. We prove that the proposed approach runs in polynomial time with a theoretical performance guarantee of O(k). Empirical experiments show that our approach yields substantial improvements over the baseline heuristic algorithms, as well as over the bottom-up approach with the same approximation bound O(k). Compared to the baseline bottom-up O(log k)-approximation algorithm, when k is smaller than 50, the top-down strategy achieves similar information loss while requiring much less computing time. This makes our approach a strong choice for the k-anonymity problem when both data utility and runtime matter, especially when k is smaller than 50 and the record set is large enough that runtime must be taken into account.
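The paper's graph-cut partitioning is its core contribution; the sketch below illustrates only the last two steps in simplified form: partitioning records into equivalence classes of size between k and 2k-1, and generalising a single numeric quasi-identifier to the group's [min, max] range. The sort-and-chunk partition is a hypothetical stand-in for the paper's edge-cutting algorithm.

```python
def partition(records, k):
    """Split records (sorted by a quasi-identifier) into groups of
    size between k and 2k-1. Sort-and-chunk stand-in for the paper's
    graph-based partitioning; assumes len(records) >= k."""
    recs = sorted(records)
    groups, i = [], 0
    while i < len(recs):
        rest = len(recs) - i
        size = rest if rest < 2 * k else k   # never leave a remainder < k
        groups.append(recs[i:i + size])
        i += size
    return groups

def generalise(group):
    """Replace each record's QI value with the group's [min, max] range,
    making all records in the equivalence class indistinguishable."""
    lo, hi = group[0], group[-1]
    return [(lo, hi)] * len(group)
```

Grouping similar records (here, adjacent after sorting) keeps the generalised ranges narrow, which is exactly the information-loss objective the paper minimises via its weighted graph.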
The bilinear relation between chemical shifts and charges of some atoms was studied. Illustrative calculations for lysine and some propyne derivatives are given. The results show that the bilinear relation exists, and an approximate algorithm is proposed. Some experimentally unavailable C-13 chemical shifts in the compound series [Mn2(μ-PPh2)(μ-PPhR)(CO)8], R = H, Cl, Me, Et, or Mn(CO) were predicted by the algorithm.
Maximum consensus estimation plays a critically important role in several robust fitting problems in computer vision. Currently, the most prevalent algorithms for consensus maximization draw from the class of randomized hypothesize-and-verify algorithms, which are cheap but usually deliver only rough approximate solutions. At the other extreme, there are exact algorithms, which are exhaustive searches in nature and can be costly for practical-sized inputs. This paper fills the gap between the two extremes by proposing deterministic algorithms to approximately optimize the maximum consensus criterion. Our work begins by reformulating consensus maximization with linear complementarity constraints. Then, we develop two novel algorithms: one based on a non-smooth penalty method with a Frank-Wolfe style optimization scheme, the other based on the Alternating Direction Method of Multipliers (ADMM). Both algorithms solve convex subproblems to perform the optimization efficiently. We demonstrate the capability of our algorithms to greatly improve a rough initial estimate, such as one obtained using least squares or a randomized algorithm. Compared to the exact algorithms, our approach is much more practical on realistic input sizes. Further, our approach is naturally applicable to estimation problems with geometric residuals. Matlab code and a demo program for our methods can be downloaded from https://***/FQcxpi.
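The paper's LCP reformulation and Frank-Wolfe/ADMM schemes are beyond an abstract-level sketch, but the maximum consensus criterion itself, and the refine-from-a-rough-estimate workflow the abstract describes, can be illustrated for linear (line-fitting) residuals. The refit-on-inliers loop below is a simple stand-in for the paper's deterministic optimizers, not their method.

```python
import numpy as np

def consensus(theta, X, y, eps):
    """Maximum consensus criterion: number of points whose absolute
    residual under model theta is at most eps."""
    return int(np.sum(np.abs(X @ theta - y) <= eps))

def refine(theta, X, y, eps, iters=20):
    """Deterministic refinement of a rough initial estimate: repeatedly
    refit least squares on the current inlier set until it stabilizes
    (a crude stand-in for the paper's penalty/ADMM schemes)."""
    for _ in range(iters):
        inl = np.abs(X @ theta - y) <= eps
        if inl.sum() < X.shape[1]:        # too few inliers to refit
            break
        new_theta, *_ = np.linalg.lstsq(X[inl], y[inl], rcond=None)
        if np.allclose(new_theta, theta):
            break
        theta = new_theta
    return theta
```

Unlike this heuristic, the paper's algorithms optimize the consensus objective directly, which is why they can improve even initializations that such refit loops leave stuck.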
We have developed two new approximate methods for stochastically simulating chemical systems. The methods are based on the idea of representing all the reactions in the chemical system by a single reaction, i.e., the representative reaction approach (RRA). The article discusses the concepts underlying the new methods, along with a flowchart of all the steps required for their implementation. It is shown that the two RRA methods (with the reaction $2A \rightarrow B$ as the representative reaction (RR)) perform creditably with regard to accuracy and computational efficiency in comparison to the exact stochastic simulation algorithm (SSA) developed by Gillespie, and are able to successfully reproduce at least the first two moments of the probability distribution of each species in the systems studied. As such, the RRA methods represent a promising new approach for stochastically simulating chemical systems. (C) 2011 Wiley Periodicals, Inc. J Comput Chem, 2012
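The RRA construction itself is the paper's contribution; as background, here is a minimal exact Gillespie SSA for the single reaction $2A \rightarrow B$ used as the representative reaction. The rate constant and population values are illustrative assumptions.

```python
import random

def ssa_2a_to_b(n_a, c, t_end, seed=0):
    """Exact Gillespie SSA for the single reaction 2A -> B with
    stochastic rate constant c. Returns (n_a, n_b) at time t_end."""
    rng = random.Random(seed)
    n_b, t = 0, 0.0
    while True:
        a = c * n_a * (n_a - 1) / 2.0     # propensity of 2A -> B
        if a <= 0:                        # no reactant pairs left
            break
        t += rng.expovariate(a)           # exponential time to next firing
        if t > t_end:
            break
        n_a -= 2                          # fire the reaction
        n_b += 1
    return n_a, n_b
```

An RRA method replaces the full reaction network's many propensities with the single propensity of one representative reaction, trading exactness for speed while aiming to preserve the leading moments of each species' distribution.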
Given the n complex coefficients of a degree n - 1 complex polynomial, we wish to evaluate the polynomial at a large number m >= n of points on the complex plane. This problem is required by many algebraic computations and so is considered in most basic algorithm texts (e.g., [A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974]). We assume an arithmetic model of computation, where each step executes one arithmetic operation, computed exactly. All previous exact algorithms [C. M. Fiduccia, Proceedings 4th Annual ACM Symposium on Theory of Computing, 1972, pp. 88-93; H. T. Kung, Fast Evaluation and Interpolation, Carnegie-Mellon, 1973; A. B. Borodin and I. Munro, The Computational Complexity of Algebraic and Numerical Problems, American Elsevier, 1975; V. Pan, A. Sadikou, E. Landowne, and O. Tiga, Comput. Math. Appl., 25 (1993), pp. 25-30] cost at least Ω(log² n) work per point, and previously there were no known approximation algorithms for complex polynomial evaluation within the unit circle with work bounds better than the fastest known exact algorithms. There are known approximation algorithms [V. Rokhlin, J. Complexity, 4 (1988), pp. 12-32; V. Y. Pan, J. H. Reif, and S. R. Tate, in Proceedings 32nd Annual IEEE Symposium on Foundations of Computer Science, 1992, pp. 703-713] for polynomial evaluation at real points, but these do not extend to evaluation at general points on the complex plane. We provide approximation algorithms for complex polynomial evaluation that cost, in many cases, near-constant amortized work per point. Let k = log(|P|/ε), where |P| is the sum of the moduli of the coefficients of the input polynomial P(z). We call P̃(z_j) an ε-approximation of P(z_j) if ε upper-bounds the modulus of the error at each evaluation point z_j, that is, |P(z_j) - P̃(z_j)| ≤ ε.
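The paper's near-constant amortized-work approximation algorithms are involved; the exact baseline they improve on is Horner evaluation, which spends Θ(n) arithmetic operations per point and Θ(nm) in total over m points:

```python
def horner(coeffs, z):
    """Evaluate P(z) = c0 + c1*z + ... + c_{n-1}*z^{n-1}, with coeffs
    in increasing-degree order, using n-1 multiplies and n-1 adds."""
    acc = 0j
    for c in reversed(coeffs):
        acc = acc * z + c
    return acc

def evaluate_at_points(coeffs, points):
    """Naive exact multipoint evaluation: O(n) work per point."""
    return [horner(coeffs, z) for z in points]
```

The fast exact schemes cited above bring the per-point cost down to polylogarithmic via remainder trees, and the paper's contribution is to go below that, to near-constant amortized work per point, by accepting an ε-approximate answer.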