Short text classification has achieved excellent performance in the past few years. However, massive short texts with few words, such as invoice data, differ from traditional short texts such as tweets in their no...
ISBN: 9783540717027 (print)
Advances in wireless networks and positioning technologies (e.g., GPS) have enabled new data management applications that monitor moving objects. In such applications, real-time data analysis such as clustering is becoming one of the most important requirements. In this paper, we present the problem of clustering moving objects in spatial networks and propose a unified framework to address it. Because the positions of moving objects change continuously, the clustering results change dynamically. By exploiting the unique features of road networks, our framework first introduces the notion of a cluster block (CB) as the underlying clustering unit. We then divide the clustering process into the continuous maintenance of CBs and the periodic construction of clusters, under different criteria, from those CBs. We propose algorithms for efficiently maintaining and organizing the CBs to construct clusters. Extensive experimental results show that our clustering framework achieves high efficiency for clustering moving objects in real road networks.
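The abstract does not define a cluster block precisely, so the following is only a minimal sketch of one plausible reading: objects on the same road segment are chained into a block when neighbouring objects are at most `eps` apart along that segment. The tuple layout `(object_id, segment_id, offset)`, the parameter `eps`, and the function name `build_cluster_blocks` are all assumptions made for illustration, not the paper's definitions.

```python
from collections import defaultdict


def build_cluster_blocks(objects, eps):
    """Group objects on the same road segment into blocks (illustrative only).

    objects: iterable of (object_id, segment_id, offset) tuples, where offset
             is the distance travelled along the segment (assumed layout).
    eps:     largest allowed gap between neighbouring objects in one block.
    Returns a list of (segment_id, [object_id, ...]) pairs.
    """
    by_segment = defaultdict(list)
    for obj_id, seg_id, offset in objects:
        by_segment[seg_id].append((offset, obj_id))

    blocks = []
    for seg_id, members in by_segment.items():
        members.sort()  # order objects by position along the segment
        current = [members[0][1]]
        for (prev_off, _), (off, obj_id) in zip(members, members[1:]):
            if off - prev_off <= eps:
                current.append(obj_id)  # close enough: extend the block
            else:
                blocks.append((seg_id, current))  # gap too large: new block
                current = [obj_id]
        blocks.append((seg_id, current))
    return blocks


if __name__ == "__main__":
    snapshot = [("a", "s1", 0.0), ("b", "s1", 1.0), ("c", "s1", 5.0), ("d", "s2", 2.0)]
    print(build_cluster_blocks(snapshot, eps=2.0))
```

A full framework in the spirit of the abstract would update such blocks incrementally as objects move rather than rebuilding them from each snapshot; this sketch shows only the grouping step.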
ISBN: 9783642235344; 9783642235351 (print)
Duplicate detection is well recognized as a crucial task for improving data quality. Related work on this problem mainly proposes efficient approaches for a single machine. However, as data volumes grow, the performance of duplicate identification is still far from satisfactory. Hence, we address duplicate detection over MapReduce, a shared-nothing paradigm. We argue that the performance of MapReduce-based duplicate detection depends mainly on the number of candidate record pairs. In this paper, we propose a new signature scheme with a new pruning strategy over MapReduce to minimize the number of candidate record pairs. Our experimental results on both real and synthetic datasets demonstrate that the proposed signature-based method is efficient and scalable.
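The abstract does not describe the signature scheme itself. To illustrate the general idea of signature-based candidate generation in a map/reduce style, here is a hedged sketch that substitutes the well-known prefix-filtering technique for Jaccard similarity. It is not the paper's scheme, and the function names and the single-process simulation of the map and reduce phases are assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def map_phase(records, threshold):
    """Emit (signature_token, record_id) for each token in a record's prefix.

    With tokens in one global order, two records whose Jaccard similarity is
    at least `threshold` must share a token among their first
    len - ceil(threshold * len) + 1 tokens (the prefix-filtering principle).
    """
    for rec_id, tokens in records.items():
        ordered = sorted(tokens)  # lexicographic order as the global order
        prefix_len = len(ordered) - math.ceil(threshold * len(ordered)) + 1
        for tok in ordered[:prefix_len]:
            yield tok, rec_id


def reduce_phase(pairs):
    """Group record ids by signature and emit candidate pairs within a group."""
    groups = defaultdict(set)
    for tok, rec_id in pairs:
        groups[tok].add(rec_id)
    candidates = set()
    for ids in groups.values():
        candidates.update(combinations(sorted(ids), 2))
    return candidates


def find_duplicates(records, threshold):
    """Verify only the candidate pairs instead of all record pairs."""
    candidates = reduce_phase(map_phase(records, threshold))
    return {(a, b) for a, b in candidates
            if jaccard(records[a], records[b]) >= threshold}


if __name__ == "__main__":
    data = {1: {"a", "b", "c", "d"}, 2: {"a", "b", "c", "e"}, 3: {"x", "y", "z"}}
    print(find_duplicates(data, threshold=0.5))
```

Records that share no signature token never meet in a reducer, so they are never compared; this is how a signature scheme reduces the number of candidate record pairs.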
This paper proposes a new locking protocol, SeCCX, for isolation of concurrent transactions on XML data. This protocol adopts the semantics of operations issued by users. Compared with previous XML locking protocols,...
Feature selection is a powerful tool for reducing the dimensionality of datasets. In the last decade, more and more researchers have paid attention to feature selection. Further, some researchers have begun to focus on feature s...
In data management systems, query processing on GPUs or distributed clusters has proven to be an effective way to achieve high efficiency. However, the high PCIe data transfer overhead between CPUs and GPUs, and the comm...
ISBN: 9783642258558 (print)
Correctly predicting a new user's reaction to a recommended candidate partner is critical to improving recommendation accuracy in online dating systems. However, the new-user (cold start) problem and the data sparseness problem in online dating systems make this task very challenging. In this paper, we propose a hybrid method, called crowd wisdom based behavior prediction, to solve these two problems and achieve good prediction accuracy. In this method, old users who have been recommended partners before are first separated into groups, such that users in each group have similar preferences for partners. Then, we propose a novel measure that combines the collective behavior of a group's users to predict one user's behavior, which addresses the data sparseness problem. By calculating the probability that a new user belongs to each group and utilizing each group's behavior, we address the new-user problem. Based on these strategies, we develop a behavior prediction algorithm for new users. Experimental results on a real online dating dataset show that our proposed method performs better than other traditional methods.
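The abstract does not give the paper's measure for combining group behavior. As a rough illustration of the last step only, the sketch below assumes the simplest possible combination: a membership-weighted average of per-group acceptance rates. The function name `predict_acceptance`, the dictionary inputs, and the weighting rule are hypothetical.

```python
def predict_acceptance(membership, group_accept_rate):
    """Estimate how likely a new user is to accept a recommended partner.

    membership:        {group_id: weight}, how strongly the new user is
                       believed to belong to each group (assumed input).
    group_accept_rate: {group_id: rate in [0, 1]}, the fraction of the
                       group's old users who accepted this kind of partner.
    Returns the membership-weighted average of the group rates.
    """
    total = sum(membership.values())
    if total <= 0:
        raise ValueError("membership weights must sum to a positive value")
    # Normalise the weights so that they behave like probabilities.
    return sum(weight / total * group_accept_rate[group]
               for group, weight in membership.items())


if __name__ == "__main__":
    weights = {"g1": 0.75, "g2": 0.25}
    rates = {"g1": 0.4, "g2": 0.8}
    print(predict_acceptance(weights, rates))
```

Because the estimate depends only on group-level statistics, it needs no interaction history from the new user, which is the sense in which grouping can sidestep the cold start problem.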
The n-gram approach additionally takes position information into account and can thus offer higher accuracy in query answering than keyword-based approaches; it is widely used in IR and NLP. However, in large-scale RD...
This paper proposes a new method for clustering law texts based on the referential relations among laws. We extract law entities (an entity represents a law) and their referential relations from law texts. Then the SimRank algorithm i...
Previous methods on knowledge base question generation (KBQG) primarily focus on refining the quality of a single generated question. However, considering the remarkable paraphrasing ability of humans, we believe that...