The nearestneighbor search problem in general dimensions finds application in computational geometry, computational statistics, pattern recognition, and machine learning. Although there is a significant body of work ...
详细信息
The nearestneighbor search problem in general dimensions finds application in computational geometry, computational statistics, pattern recognition, and machine learning. Although there is a significant body of work on theory and algorithms, surprisingly little work has been done on algorithms for high-end computing platforms, and no open source library exists that can scale efficiently to thousands of cores. In this paper, we present algorithms and a library built on top of the message passing interface (MPI) and OpenMP that enable nearestneighbor searches to hundreds of thousands of cores for arbitrary-dimensional datasets. The library supports both exact and approximate nearestneighbor searches. The latter is based on iterative, randomized, and greedy KD-tree (k-dimensional tree) searches. We describe novel algorithms for the construction of the KD-tree, give complexity analysis, and provide experimental evidence for the scalability of the method. In our largest runs, we were able to perform an all-neighbors query search on a 13 TB synthetic dataset of 0.8 billion points in 2,048 dimensions on the 131K cores on Oak Ridge's XK6 "Jaguar" system. These results represent several orders of magnitude improvement over current state-of-the-art methods. Also, we apply our method to nonsynthetic data from machine learning data repositories. For example, we perform an all-nearest-neighbors search on a variant of the "MNIST" handwritten digit dataset with 8 million points in 784 dimensions on 16,384 cores of the "Stampede" system at the Texas Advanced Computing Center, achieving less than one second per RKDT iteration.
The property of almost every point being a Lebesgue point has proven to be crucial for the consistency of several classification algorithms based on nearestneighbors. We characterize Lebesgue points in terms of a 1-n...
详细信息
The sa-tree is an interesting metric space indexing structure that is inspired by the Voronoi diagram. In essence, the sa-tree records a portion of the Delaunay graph of the data set, a graph whose vertices are the Vo...
详细信息
The sa-tree is an interesting metric space indexing structure that is inspired by the Voronoi diagram. In essence, the sa-tree records a portion of the Delaunay graph of the data set, a graph whose vertices are the Voronoi cells, with edges between adjacent cells. An improvement is presented on the original search strategy for the sa-tree. This consists of details on the intuition behind the improvement as well as the original search strategy and a proof of their correctness. Furthermore, it is shown how to adapt an incremental nearestneighbor algorithm to the sa-tree, which allows computing nearestneighbor in a progressive manner. Unlike other adaptations, the resulting algorithm does not take the unnecessary steps to ensure that keys of "node" elements are monotonically non-decreasing. (C) 2003 Elsevier B.V. All rights reserved.
In vivo, some proteins exist as monomers (single polypeptide chains) and others as oligomers. The latter are composed of two or more chains (subunits) that are associated with each other through noncovalent interactio...
详细信息
In vivo, some proteins exist as monomers (single polypeptide chains) and others as oligomers. The latter are composed of two or more chains (subunits) that are associated with each other through noncovalent interactions and, occasionally, disulfide bonds. Oligomers can be further classified into homo-oligomers (formed by identical subunits) and hetero-oligomers (formed by different subunits), and they form the structural basis of various biological functions such as cooperative effects, the allosteric mechanism and ion-channel gating. Therefore, it would be of less interest or of low priority for crystallographic scientists to crystallize a single protein chain and determine its three-dimensional structure if it is already known as part of an oligomer. However, it is both time-consuming and laborious to acquire such information on the quaternary structure attribute purely by experiment. In particular, with the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop an automated method by which crystallographic scientists can rapidly and effectively identify which quaternary attribute a particular protein chain has according to its sequence information. In view of this, a computational method has been developed by hybridizing the approaches of functional domain composition and pseudo amino acid composition. For the convenience of crystallographic scientists, a user-friendly web server, PQSA-Pred, has been established at http://218.65.61.89:8080/bioinfo/pqsa-pred, by which the desired information can be easily obtained.
In this work we study speed-ups and time-space trade-offs for solving the shortest vector problem (SVP) on Euclidean lattices based on tuple lattice sieving. Our results extend and improve upon previous work of Bai-La...
详细信息
ISBN:
(纸本)9783319765785;9783319765778
In this work we study speed-ups and time-space trade-offs for solving the shortest vector problem (SVP) on Euclidean lattices based on tuple lattice sieving. Our results extend and improve upon previous work of Bai-Laarhoven-Stehle [ANTS'16] and Herold-Kirshanova [PKC'17], with better complexities for arbitrary tuple sizes and offering tunable time-memory trade-offs. The trade-offs we obtain stem from the generalization and combination of two algorithmic techniques: the configuration framework introduced by Herold-Kirshanova, and the spherical locality-sensitive filters of Becker-Ducas-Gama-Laarhoven [SODA'16]. When the available memory scales quasi-linearly with the list size, we show that with triple sieving we can solve SVP in dimension n in time 2(0.3588n+o(n)) and space 2(0.1887n+o(n)), improving upon the previous best triple sieve time complexity of 2(0.3717n+o(n)) of Herold-Kirshanova. Using more memory we obtain better asymptotic time complexities. For instance, we obtain a triple sieve requiring only 2(0.3300n+o(n)) time and 2(0.2075n+o(n)) memory to solve SVP in dimension n. This improves upon the best double Gauss sieve of Becker-Ducas-Gama-Laarhoven, which runs in 2(0.3685n+o(n)) time when using the same amount of space.
The current base station management faces challenges such as imprecise information perception, a lack of precise prediction techniques for load and energy consumption, and the absence of refined optimization methods f...
详细信息
ISBN:
(纸本)9798400708305
The current base station management faces challenges such as imprecise information perception, a lack of precise prediction techniques for load and energy consumption, and the absence of refined optimization methods for multi-source comprehensive scheduling. It can only achieve a quantitative complementarity of energy on the supply side, the grid side, and the load side, without considering the differences in energy quality of various forms of energy and their optimal scheduling in conversion, transmission, storage, and utilization. Simultaneously, there is a severe deficiency in crowss-temporal and spatial allocation and utilization of energy, as well as the use of edge computing and big data analytics for precise prediction and optimization scheduling. This has resulted in low overall energy utilization efficiency, high carbon emissions, and other issues. There is an urgent need to break through the key technologies of accurate perception, precise prediction, precise scheduling, and fine control in the energy Internet of Things system for base stations. The project team has put forth a scientifically sound solution, addressing issues related to precise perception of base station status, accurate load prediction, fine optimization of energy management, and precise control of comprehensive energy systems. They have proposed the "Uncertain Data Processing Algorithm for Base Station Energy Consumption" to tackle and solve the challenge of precise load prediction in energy IoT based on high-noise, low-quality data.
GraCT and ContaCT were the first compressed data structures to represent object trajectories, demonstrating that it was possible to use orders of magnitude less space than classical indexes while staying competitive i...
详细信息
GraCT and ContaCT were the first compressed data structures to represent object trajectories, demonstrating that it was possible to use orders of magnitude less space than classical indexes while staying competitive in query times. In this paper we considerably enhance their space, query capabilities, and time performance with three contributions. (1) We design and evaluate algorithms for more sophisticated nearestneighbor queries, finding the trajectories closest to a given trajectory or to a given point during a time interval. (2) We modify the data structure used to sample the spatial positions of the objects along time. This improves the performance on the classic spatio-temporal and the nearestneighbor queries, by orders of magnitude in some cases. (3) We introduce RelaCT, a tradeoff between the faster and larger ContaCT and the smaller and slower GraCT, offering a new relevant space-time tradeoff for large repetitive datasets of trajectories.
暂无评论