Neighborhood discovery is a precursor to knowledge discovery in large, complex datasets such as temporal data, that is, sequences of data tuples measured at successive time instances. Instead of mining the entire dataset, we are interested in dividing it into several smaller intervals of interest, which we call temporal neighborhoods. In this paper we propose a class of algorithms that generate temporal neighborhoods through unequal depth discretization. We describe four novel algorithms: (a) Similarity based Merging (SMerg), (b) Stationary distribution based Merging (StMerg), (c) Greedy Merge (GMerg), and (d) Optimal Merging (OptMerg). SMerg and StMerg are based on the robust framework of Markov models and the Markov stationary distribution, respectively. GMerg is a greedy approach, and OptMerg is geared towards discovering optimal binning strategies for the most effective partitioning of the data into temporal neighborhoods; neither GMerg nor OptMerg uses Markov models. We identify temporal neighborhoods with distinct demarcations based on unequal depth discretization of the data. We discuss detailed experimental results on both synthetic and real-world data. Specifically, we show (i) the efficacy of our algorithms through precision and recall of labeled bins, (ii) ground truth validation on real-world traffic monitoring datasets, and (iii) knowledge discovery in the temporal neighborhoods, such as global anomalies. Our results indicate that we are able to identify valuable knowledge, as confirmed by ground truth validation on real-world traffic data.
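The abstract does not state the merge criterion GMerg uses, so the following is only a minimal sketch of greedy merging toward unequal depth bins. It assumes the series starts as equal-width index intervals and that the adjacent pair with the closest mean values is merged first; the function name `greedy_merge_bins` and its parameters are hypothetical.

```python
from statistics import mean


def greedy_merge_bins(values, initial_width, target_bins):
    """Greedily merge adjacent intervals until target_bins remain.

    Assumption (not from the paper): the merge cost of two neighbouring
    intervals is the absolute difference of their mean values.
    Intervals are half-open index ranges (start, end).
    """
    if initial_width < 1 or target_bins < 1:
        raise ValueError("initial_width and target_bins must be positive")

    # Start from equal-width intervals over the time axis.
    bins = [
        (start, min(start + initial_width, len(values)))
        for start in range(0, len(values), initial_width)
    ]

    while len(bins) > target_bins:
        # Cost of merging each pair of neighbours (a, b) and (b, c).
        costs = [
            abs(mean(values[a:b]) - mean(values[b:c]))
            for (a, b), (_, c) in zip(bins, bins[1:])
        ]
        i = costs.index(min(costs))
        # Replace the cheapest pair with one wider interval.
        bins[i:i + 2] = [(bins[i][0], bins[i + 1][1])]

    return bins


series = [1, 1, 1, 1, 9, 9, 9, 9, 9, 9, 2, 2]
neighborhoods = greedy_merge_bins(series, initial_width=2, target_bins=3)
print(neighborhoods)  # intervals of different lengths
```

The resulting intervals have different lengths, which is the sense in which the discretization is of unequal depth.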
Data stream management systems (DSMSs) are conceived for running continuous queries (CQs) on the most recently streamed data. This model does not completely fit the needs of several modern data-intensive applications, which must manage recent, historical, and static data and execute both CQs and one-time queries (OTQs) that join such data. To cope with these new needs, some DSMSs have moved toward integrating database management system (DBMS) functionalities to augment their capabilities. In this paper we adopt the opposite perspective and lay the groundwork for extending DBMSs to natively support streaming facilities. To this end, we introduce a new kind of table, the streaming table, as a persistent structure where streaming data enters and remains stored for a long period, ideally forever. Streaming tables feature a novel access paradigm: continuous writes, with both one-time and continuous reads. We present a streaming table implementation and two novel types of indices that efficiently support high update and scan rates. A detailed experimental evaluation shows the effectiveness of the proposed technology.
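The access paradigm described above (continuous writes, one-time reads, continuous reads) can be illustrated with a small in-memory sketch. This is not the paper's implementation or its index structures: the class `StreamingTable`, its method names, and the assumption that rows arrive in timestamp order are all hypothetical choices made for illustration.

```python
from bisect import bisect_left


class StreamingTable:
    """Toy append-only table with one-time and continuous reads.

    Assumption (not from the paper): rows arrive in non-decreasing
    timestamp order, so a sorted timestamp list can serve as a
    trivial index for range scans.
    """

    def __init__(self):
        self._timestamps = []   # sorted, parallel to _rows
        self._rows = []         # stored (timestamp, payload) tuples
        self._subscribers = []  # (predicate, callback) pairs

    def append(self, timestamp, payload):
        """Continuous write: store the row, then notify continuous queries."""
        row = (timestamp, payload)
        self._timestamps.append(timestamp)
        self._rows.append(row)
        for predicate, callback in self._subscribers:
            if predicate(row):
                callback(row)

    def scan(self, t_from, t_to):
        """One-time read: rows with t_from <= timestamp < t_to."""
        lo = bisect_left(self._timestamps, t_from)
        hi = bisect_left(self._timestamps, t_to)
        return self._rows[lo:hi]

    def subscribe(self, predicate, callback):
        """Continuous read: callback fires for each future matching row."""
        self._subscribers.append((predicate, callback))


table = StreamingTable()
alerts = []
table.subscribe(lambda row: row[1] > 5, alerts.append)
table.append(1, 7)
table.append(2, 3)
table.append(3, 9)
history = table.scan(1, 3)
```

Here `history` holds the stored rows for the requested time range, while `alerts` collects only the rows that matched the continuous query as they arrived.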
The increasing availability of large-scale trajectory data provides a great opportunity for knowledge discovery in transportation systems using advanced data mining techniques. Nowadays, a large number of taxicabs in major metropolitan cities are equipped with GPS devices. Since taxis are on the road nearly 24 h a day (with drivers changing shifts), they can act as reliable sensors for monitoring traffic behavior. In this article, we use GPS data from taxis to monitor the emergence of unexpected behavior in the Beijing metropolitan area, which has the potential to estimate and improve traffic conditions in advance. We adapt the likelihood ratio test (LRT) statistic, which has previously been used mostly in epidemiological studies, to describe traffic patterns. To the best of our knowledge, the use of LRT in the traffic domain is not only novel but also results in accurate and rapid detection of anomalous behavior. Crown Copyright (C) 2013 Published by Elsevier B.V. All rights reserved.
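The abstract does not say which statistical model underlies the LRT statistic. As an illustration only, the sketch below assumes a Poisson count model of the kind commonly used in epidemiological scan statistics: a region with observed count `count_in` and baseline `base_in` is compared against the rest of the study area. The function name `poisson_llr` and all parameter names are hypothetical.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(anything) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def poisson_llr(count_in, base_in, count_total, base_total):
    """Log-likelihood ratio for an elevated rate inside a region.

    Assumption (not from the abstract): counts are Poisson with mean
    rate * baseline. The null hypothesis is a single rate everywhere;
    the alternative allows a higher rate inside the region.
    """
    count_out = count_total - count_in
    base_out = base_total - base_in
    if base_in <= 0 or base_out <= 0:
        raise ValueError("baselines inside and outside must be positive")

    rate_in = count_in / base_in
    rate_out = count_out / base_out
    if rate_in <= rate_out:
        return 0.0  # no evidence of an elevated rate inside the region

    rate_all = count_total / base_total
    return (
        _xlogy(count_in, rate_in)
        + _xlogy(count_out, rate_out)
        - _xlogy(count_total, rate_all)
    )


# Hypothetical numbers: a road segment expected to carry 10% of the
# traffic (baseline 10 of 100) but observing 30 of 100 taxi pings.
score = poisson_llr(count_in=30, base_in=10, count_total=100, base_total=100)
```

A larger score means the observed count departs more strongly from what the baseline predicts; regions can then be ranked by score to flag candidate anomalies.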
While location-aware services are widely used today in both professional and private contexts, not all of the available knowledge is exploited. The predicted path that a moving object follows when guided, e.g. by a smartphone, is not used; only the current position is taken into account. In this article, we describe how exploiting route information in addition to point information can offer new value to users and reduce resource usage. It allows queries on the future position of moving objects that take the probability of the estimate into account. We introduce an index structure to handle this kind of object and query, and we propose different methods for updating objects that reduce resource consumption even further. A set of experiments shows the performance of our approach. The approach enables new services to cope with growing demand, in both quality and quantity, by using sophisticated algorithms and returning real value to the user. Furthermore, the communication between server and client is optimized to avoid overloading the mobile network. (C) 2014 Elsevier B.V. All rights reserved.
ISBN: (print) 9783319237817; 9783319237800
In this paper, we propose an efficient query processing algorithm that returns trajectory results progressively. We limit the calculation of pairwise shortest path distances between the set of query locations and the spatial nodes, which greatly reduces the preprocessing requirements. We also introduce a spatiotemporal similarity measure that makes it easy to adjust the temporal-to-spatial significance of the trajectory results and to prioritize the query locations spatially according to users' preferences. In our experiments with a real-world road network, we show that the proposed method has approximately ten times lower preprocessing requirements than competing methods and reduces the search time by at least two orders of magnitude.
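The abstract does not give the formula of the similarity measure. The sketch below only illustrates the two properties it mentions, an adjustable temporal-to-spatial balance and per-location priorities, under assumed definitions: distances are turned into similarities with an exponential decay, and the two parts are mixed linearly. The function name, the parameter `lam`, and the decay choice are all hypothetical.

```python
import math


def spatiotemporal_similarity(spatial_dists, temporal_dists, location_weights, lam):
    """Weighted mix of spatial and temporal similarity in [0, 1].

    Assumptions (not from the paper):
    - spatial_dists[i] is the network distance from query location i
      to the trajectory, temporal_dists[i] the matching time gap;
    - exp(-d) maps a distance to a similarity;
    - lam = 1 ranks purely by space, lam = 0 purely by time;
    - location_weights prioritise some query locations spatially.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if not (len(spatial_dists) == len(temporal_dists) == len(location_weights)):
        raise ValueError("inputs must have one entry per query location")

    total_weight = sum(location_weights)
    spatial = sum(
        w * math.exp(-d) for w, d in zip(location_weights, spatial_dists)
    ) / total_weight
    temporal = sum(math.exp(-t) for t in temporal_dists) / len(temporal_dists)
    return lam * spatial + (1.0 - lam) * temporal


# Two query locations; the first one matters twice as much spatially.
score = spatiotemporal_similarity(
    spatial_dists=[0.2, 1.5],
    temporal_dists=[0.1, 0.4],
    location_weights=[2.0, 1.0],
    lam=0.7,
)
```

Raising `lam` shifts the ranking toward spatially close trajectories, and raising a location's weight makes closeness to that location count for more.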