The challenges posed by big data in the 21st Century are complex: Under the previous common sense, we considered that polynomial-time algorithms are practical;however, when we handle big data, even a linear-time algor...
详细信息
The challenges posed by big data in the 21st Century are complex: Under the previous common sense, we considered that polynomial-time algorithms are practical;however, when we handle big data, even a linear-time algorithm may be too slow. Thus, sublinear- and constant-time algorithms are required. The academic research project, "Foundations of Innovative algorithms for Big Data," which was started in 2014 and will finish in September 2021, aimed at developing various techniques and frameworks to design algorithms for big data. In this project, we introduce a "Sublinear Computation Paradigm." Toward this purpose, we first provide a survey of constant-time algorithms, which are the most investigated framework of this area, and then present our recent results on sublinear progressive algorithms. A sublinear progressive algorithm first outputs a temporary approximate solution in constant time, and then suggests better solutions gradually in sublinear-time, finally finds the exact solution. We present Sublinear progressive Algorithm Theory (SPA Theory, for short), which enables to make a sublinear progressive algorithm for any property if it has a constant-time algorithm and an exact algorithm (an exponential-time one is allowed) without losing any computation time in the big-O sense.
A top-k dominating query reports the k items with the highest domination score. algorithms for efficient processing of this query have been recently proposed in the literature. Those methods, either index based or ind...
详细信息
A top-k dominating query reports the k items with the highest domination score. algorithms for efficient processing of this query have been recently proposed in the literature. Those methods, either index based or index free, apply a series of pruning criteria toward efficient processing. However, they are characterized by several limitations, such as (1) they lack progressiveness (they report the k best items at the end of the processing), (2) they require a multi-dimensional index or they build a grid-based index on-the-fly, which suffers from performance degradation, especially in high dimensionalities, and (3) they do not support vertically decomposed data. In this paper, we design efficient algorithms that can handle any subset of the dimensions in a progressive manner. Among the studied algorithms, the Differential Algorithm shows the best overall performance.
Group skyline query is a powerful tool for optimal group analysis. Most of the existing group skyline queries select optimal groups by comparing the dominance relationship between aggregate-based points;such feature c...
详细信息
Group skyline query is a powerful tool for optimal group analysis. Most of the existing group skyline queries select optimal groups by comparing the dominance relationship between aggregate-based points;such feature creates difficulties for users to specify an appropriate aggregate function. Besides, many significant groups that have great attractions to users in practice may be overlooked. To address these issues, the group skyline (GSky) query is formulated on the basis of a general definition of group dominance operator. While the existing GSky query algorithms are effective, there is still room for improvement in terms of progressiveness and efficiency. In this paper, we propose some new lemmas which facilitate direct generation of the GSky query results. Consecutively, we design a layered unit-based (LU) algorithm that applies a layered optimumstrategy. Additionally, for the GSky query over the data that are dynamically produced and cannot be indexed, we propose a novel index-independent algorithm, called sorted-based progressive (SP) algorithm. The experimental results demonstrate the effectiveness, efficiency, and progressiveness of the proposed algorithms. By comparing with the state-of-the-art algorithm for the GSky query, our LU algorithmis more scalable and two orders of magnitude faster.
With the further development of sensor techniques in wireless sensor networks (WSNs), it is becoming urgent that they should be able to support complicated queries like skyline query for multi-preference and decision ...
详细信息
ISBN:
(纸本)9781424454686
With the further development of sensor techniques in wireless sensor networks (WSNs), it is becoming urgent that they should be able to support complicated queries like skyline query for multi-preference and decision making. In this paper, we consider skyline query evaluation in WSNs by devising evaluation algorithms for finding skyline points on a dataset progressively. The core techniques adopted are to partition the dataset into several disjoint subsets and output the skyline points by examining each subsequent subset progressively, using some of the skyline points obtained so far to filter out those unlikely skyline points in the current processing subset from transmission. We finally conduct extensive experiments by simulations to evaluate the performance of the proposed algorithms on synthetic and real datasets. The experimental results show that the proposed algorithms outperform existing algorithms significantly in network lifetime prolongation.
Developing algorithms that produce approximate solutions is always interesting when we are generating the final solution for a problem. progressive algorithms report a partial solution to the user which approximates t...
详细信息
Developing algorithms that produce approximate solutions is always interesting when we are generating the final solution for a problem. progressive algorithms report a partial solution to the user which approximates the final solution in some specific steps. Thus, the user can stop the algorithm if the error of the partial solution is tolerable in terms of the application. In this paper, we study the closest pair problem under the Euclidean metric. A progressive algorithm is designed for the closest pair problem, which consists of O (log n) steps and spends O (n) time in each step. In step r, the error of the partial solution is bounded by O (alpha/2(r)) , where alpha is the ratio of the maximum pairwise distance and the minimum pairwise distance of points.
With the deployment of wireless sensor networks (WSNs) for environmental monitoring and event surveillance, WSNs can be treated as virtual databases to respond to user queries. It thus becomes more urgent that such da...
详细信息
With the deployment of wireless sensor networks (WSNs) for environmental monitoring and event surveillance, WSNs can be treated as virtual databases to respond to user queries. It thus becomes more urgent that such databases are able to support complicated queries like skyline queries. Skyline query which is one of popular queries for multi-criteria decision making has received much attention in the past several years. In this paper we study skyline query optimization and maintenance in WSNs. Specifically, we first consider skyline query evaluation on a snapshot dataset, by devising two algorithms for finding skyline points progressively without examining the entire dataset. Two key strategies are adopted: One is to partition the dataset into several disjoint subsets and produce the skyline points in each subset progressively. Another is to employ a global filter that consists of some skyline points in the processed subsets to filter out unlikely skyline points from the rest of unexamined subsets. We then consider the query maintenance issue by proposing an algorithm for incremental maintenance of the skyline in a streaming dataset. A novel maintenance mechanism is proposed, which is able to identify which skyline points from past skylines to be the global filter and determine when the global filter is broadcast. We finally conduct extensive experiments by simulations to evaluate the performance of the proposed algorithms on both synthetic and real sensing datasets, and the experimental results demonstrate that the proposed algorithms significantly outperform existing algorithms in terms of network lifetime prolongation.
The G-Skyline (GSky) query is a powerful tool to analyze optimal groups in decision support. Compared with other group skyline queries, it releases users from providing an aggregate function. Besides, it can get much ...
详细信息
The G-Skyline (GSky) query is a powerful tool to analyze optimal groups in decision support. Compared with other group skyline queries, it releases users from providing an aggregate function. Besides, it can get much comprehensive results without overlooking some important results containing non-skylines. However, it is hard for the users to make sensible choices when facing so many results the GSky query returns, especially over a large, high-dimensional dataset or with a large group size. In this article, we investigate k representative G-Skyline (kGSky) queries to obtain a manageable size of optimal groups. The kGSky query can also inherit the advantage of the GSky query;its results are representative and diversified. Next, we propose three exact algorithms with novel techniques including an upper bound pruning, a grouping strategy, a layered optimum strategy, and a hybrid strategy to efficiently process the kGSky query. Consider these exact algorithms have high time complexity and the precise results are not necessary in many applications. We further develop two approximate algorithms to trade off some accuracy for efficiency. Extensive experiments on both real and synthetic datasets demonstrate the efficiency, scalability, and accuracy of the proposed algorithms.
暂无评论