Collaborative filtering is an important personalized recommendation technique widely applied in e-commerce. It is not well suited to multi-interest or title recommendation because of the 'general neighbourhood' problem w...
Detecting and exploiting correlations among columns in relational databases is of great value to query optimizers for generating better query execution plans (QEPs). We propose a more robust and informative metric, nam...
Database-as-a-Service is a promising data management paradigm in which data is encrypted before being sent to an untrusted server. Efficient querying over encrypted data is a performance-critical problem which has vari...
Extracting multiple records from web pages is useful because it allows us to integrate information from multiple sources to provide value-added services. Existing techniques still have some limitations because of their several ...
Sequential pattern mining is an important problem in mining streams, which are continuous, fast, dynamic and unbounded. Recently proposed approximate mining algorithms consume too many system resources and capture only part of the stream's features. In this paper, we present ESPMM, a multi-level evolving sequential pattern mining model that addresses this problem by capturing nearly all of the stream's features. Furthermore, because sequential patterns have smaller support at each level, we propose BMLA, a mining method based on Levenshtein automata that builds a state-transition model to compute the similarity between sequences in linear time. Experimental results show that the model is effective and efficient.
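The abstract says BMLA uses a Levenshtein-automaton state model to compare sequences in linear time but does not describe that model. As a generic illustration only (not BMLA), the sketch below simulates the nondeterministic Levenshtein automaton for a pattern sequence with at most `k` edits. The function name is hypothetical. A state `(i, e)` means "`i` pattern symbols matched using `e` edits"; since there are at most `(n + 1)(k + 1)` states, each input symbol costs bounded work for a fixed pattern and `k`, so the check is linear in the length of the compared sequence. Elements may be characters or any comparable events.

```python
def levenshtein_nfa_accepts(pattern, text, k):
    """Return True if edit_distance(pattern, text) <= k by simulating the
    nondeterministic Levenshtein automaton of `pattern` over `text`.

    A state (i, e) means: i symbols of `pattern` matched using e edits.
    """
    n = len(pattern)

    def close(states):
        # Epsilon moves: skip pattern[i] without reading input (deletion, +1 edit).
        stack, seen = list(states), set(states)
        while stack:
            i, e = stack.pop()
            if i < n and e < k and (i + 1, e + 1) not in seen:
                seen.add((i + 1, e + 1))
                stack.append((i + 1, e + 1))
        return seen

    states = close({(0, 0)})
    for symbol in text:
        nxt = set()
        for i, e in states:
            if i < n and pattern[i] == symbol:
                nxt.add((i + 1, e))          # match, no cost
            if e < k:
                nxt.add((i, e + 1))          # insertion: extra input symbol
                if i < n:
                    nxt.add((i + 1, e + 1))  # substitution
        states = close(nxt)
        if not states:                       # every branch exceeded k edits
            return False
    # Accept if some branch consumed the whole pattern within k edits.
    return any(i == n for i, _ in states)


print(levenshtein_nfa_accepts(["a", "b", "c"], ["a", "x", "c"], 1))
```

A deterministic Levenshtein automaton precomputes these state sets so each symbol becomes a single table lookup; the set-based simulation is used here only because it is short and easy to check.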
ISBN (print): 9781605581934
In this paper, we present a system called CRO (Chinese Review Observer) for online product review structurization. By structurization, we mean identifying, extracting and summarizing information from unstructured review text into a structured table. The core tasks include review collection, extraction of product features and user opinions, and polarity analysis of opinions. Existing research in this area mainly targets English text. To handle Chinese effectively, we propose several novel approaches to these core tasks. We then integrate these approaches and implement the complete review-structurization pipeline in CRO. Results on reviews of real products show that its performance is satisfactory.
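The abstract does not say how CRO assigns polarity or builds its table. The sketch below illustrates only the final summarization step, under loud assumptions: (feature, opinion-word) pairs have already been extracted, and polarity comes from a tiny hypothetical lexicon (`POLARITY`), with no handling of negation or degree adverbs. The function name `structurize` is likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical polarity lexicon; CRO's actual resources are not described.
POLARITY = {
    "好": 1,    # good
    "清晰": 1,  # clear
    "流畅": 1,  # smooth
    "差": -1,   # poor
    "模糊": -1, # blurry
}


def structurize(pairs):
    """Summarize extracted (feature, opinion) pairs into a per-feature table
    of positive / negative / unknown opinion counts."""
    table = defaultdict(lambda: {"positive": 0, "negative": 0, "unknown": 0})
    for feature, opinion in pairs:
        score = POLARITY.get(opinion, 0)  # words outside the lexicon are "unknown"
        key = "positive" if score > 0 else "negative" if score < 0 else "unknown"
        table[feature][key] += 1
    return dict(table)


# Features: 屏幕 screen, 电池 battery, 系统 system.
pairs = [("屏幕", "清晰"), ("屏幕", "模糊"), ("屏幕", "好"), ("电池", "差"), ("系统", "流畅")]
for feature, counts in structurize(pairs).items():
    print(feature, counts)
```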
In update-intensive applications, main memory database systems produce a large volume of log records, so writing them out efficiently is critical to speeding up transaction processing. We propose a parallel recovery scheme based on XOR differential logging for main memory database systems in such environments. A small amount of NVRAM temporarily holds log records and decouples transaction commit from disk writes, and the inherent parallelism of differential logging is exploited to accelerate log flushing across multiple log disks. During recovery, log records are loaded from multiple log disks and applied to the data partitions immediately, without being reordered by serialization order, which cuts total recovery time. The scheme employs a partition-based consistent checkpointing method, and log records are classified by the IDs of the data partitions they access. Data partitions are recovered according to loading priorities computed from update frequencies and transaction waiting times, so the data access demands of new transactions arriving after a failure are served immediately. The scheme thus keeps the system available during recovery, which is important for large-scale main memory database systems.
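The recovery scheme relies on the fact that XOR differential log records need no reordering: each record is the XOR of a data item's before- and after-images, and because XOR is commutative and associative, applying all records to the checkpoint image in any order telescopes to the same final image. The sketch below demonstrates only that property; the helper names are hypothetical, and NVRAM buffering, partition-based checkpointing and priority-driven loading are omitted.

```python
from functools import reduce
from itertools import permutations


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length images."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


def make_log(versions):
    """One differential record per update: before-image XOR after-image."""
    return [xor_bytes(old, new) for old, new in zip(versions, versions[1:])]


def replay(image, records):
    """Redo by XOR-ing every record onto the checkpoint image."""
    return reduce(xor_bytes, records, image)


# Successive versions of one 4-byte data item, starting from the checkpoint.
versions = [b"\x00\x00\x00\x00", b"ABCD", b"ABxD", b"zBxD"]
log = make_log(versions)

# Records could come back from several log disks in any interleaving.
for order in permutations(log):
    assert replay(versions[0], list(order)) == versions[-1]
print(replay(versions[0], log))  # b'zBxD'
```

This order independence is what lets records be flushed to, and reloaded from, several log disks in parallel without a global merge by serialization order.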
ISBN (print): 9781424432004; 9780769531854
Recently there has been growing interest in applications of wireless sensor networks such as traffic tracking, environmental surveillance, and network monitoring. In these applications, exploring the relationships and links between sensing data and other data sources can naturally be expressed as an external join, in which sensory tuples are joined with an external table at the base station. However, executing such join queries in a highly distributed, resource-constrained sensor network is challenging. In this paper, we propose a partition-based algorithm called NEJA (in-network external join algorithm) for processing external joins in sensor networks. NEJA organizes the network's sensory data through an optimized "value-to-storage" mapping, under which each storage point stores the tuples that fall in the same subrange of the join attribute. The subrange of each storage point is then further partitioned into unit ranges, and the tuples in each unit range choose the joining point that incurs the least communication cost, according to a cost metric based on the latest historical statistics. NEJA also adopts optimization techniques to handle changes in the sensory data and uses approximation to reduce the maintenance cost of the mechanism. Experimental results indicate that the scheme effectively reduces the number of transmissions for real-time external join processing, especially when the external table is relatively large.
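The abstract does not define NEJA's "optimized" mapping or its cost metric, so the sketch below substitutes simple stand-ins to show the structure of the idea. It uses an equal-width range partition of the join attribute across storage points, equal unit ranges within each subrange, and a hypothetical cost in tuple-hops that compares shipping sensory tuples up to the base station against shipping external tuples down and join results back up. All function names and the cost formula are assumptions, not NEJA's.

```python
def storage_point(value, lo, hi, num_points):
    """Hypothetical equal-width value-to-storage mapping: the join-attribute
    domain [lo, hi] is cut into num_points subranges, one per storage point."""
    width = (hi - lo) / num_points
    idx = int((value - lo) // width)
    return min(max(idx, 0), num_points - 1)  # clamp boundary values


def unit_ranges(point, lo, hi, num_points, units_per_point):
    """Split one storage point's subrange into equal unit ranges."""
    width = (hi - lo) / num_points
    start = lo + point * width
    step = width / units_per_point
    return [(start + j * step, start + (j + 1) * step) for j in range(units_per_point)]


def choose_join_point(sensor_tuples, external_tuples, est_results, hops):
    """Hypothetical per-unit-range cost metric in tuple-hops between the
    storage point and the base station, using recent historical counts."""
    at_base = sensor_tuples * hops                       # ship sensory tuples up
    in_network = (external_tuples + est_results) * hops  # ship external tuples down, results up
    return "storage point" if in_network < at_base else "base station"


print(storage_point(27, 0, 100, 4), unit_ranges(1, 0, 100, 4, 5))
print(choose_join_point(sensor_tuples=50, external_tuples=5, est_results=10, hops=3))
```

Choosing per unit range rather than per storage point lets dense and sparse parts of the same subrange make different decisions as the statistics change.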
ISBN (print): 9780769532639
Frequent itemset mining is an important problem in data mining. Frequent closed itemset mining provides complete yet condensed information for frequent pattern analysis and thus reduces memory cost without loss of accuracy. As streaming applications become more common, more research has focused on stream mining. Streams are fast and unbounded, so data must be kept in limited memory, and saving running time and memory usage is the most important goal. In this paper, we propose CLIMB (closed itemset mining with bitmap), an improved frequent closed itemset mining method over a stream's sliding window that extends the traditional stream mining algorithm CFI-stream with bitmap coding. The distinct items are kept in memory in lexicographic order, and each itemset is encoded as a bit sequence following that item order; moreover, the bit sequence is split into sections that are recoded to reduce memory cost. Experimental results on real-life data show that CLIMB is effective and efficient.
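A minimal sketch of the bitmap coding described above, under stated assumptions. Distinct items get bit positions in lexicographic order, itemsets and transactions become integer bitmasks, and a `deque` with `maxlen` stands in for the sliding window. The section recoding shown (keep only non-zero fixed-width sections as `(section_no, bits)` pairs) is a hypothetical stand-in, since the abstract does not specify CLIMB's scheme. `is_closed` uses the standard definition: an itemset is closed if every single-item extension has strictly lower support.

```python
from collections import deque


def build_index(items):
    """Map each distinct item to a bit position in lexicographic order."""
    return {item: pos for pos, item in enumerate(sorted(set(items)))}


def encode(itemset, index):
    """Encode an itemset as an integer bitmask."""
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]
    return mask


def split_sections(mask, width):
    """Hypothetical recoding: keep only non-zero fixed-width sections."""
    sections, section_no = [], 0
    while mask:
        bits = mask & ((1 << width) - 1)
        if bits:
            sections.append((section_no, bits))
        mask >>= width
        section_no += 1
    return sections


def support(mask, window):
    """Number of transactions in the window that contain the itemset."""
    return sum(1 for t in window if (t & mask) == mask)


def is_closed(mask, window, num_items):
    """Closed iff no single-item extension keeps the same support."""
    s = support(mask, window)
    return all(support(mask | (1 << i), window) < s
               for i in range(num_items) if not (mask & (1 << i)))


txns = [["b", "a"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
index = build_index(i for t in txns for i in t)  # a -> bit 0, b -> bit 1, c -> bit 2
window = deque(maxlen=3)                         # sliding window of the last 3 transactions
for t in txns:
    window.append(encode(t, index))
print(support(encode(["c"], index), window), is_closed(encode(["a"], index), window, len(index)))
```

Bitmask subset tests reduce each containment check to one AND and one comparison, which is what makes repeated closure checks over the window cheap.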