Sequential pattern mining is an important problem in mining continuous, fast, dynamic, and unbounded data streams. Recently proposed approximate mining algorithms consume too many system resources and can only obtain...
ISBN: 9783540881919 (print)
Both content analysis and link analysis have their advantages in measuring relationships among documents. In this paper, we propose a new method that combines the two to compute the similarity of research papers so that the papers can be clustered more accurately. To improve the efficiency of similarity calculation, we develop a strategy to deal with the relationship graph separately, without affecting accuracy. We also design an approach that assigns different weights to the different links to a paper, which improves the accuracy of similarity calculation. Experimental results on the ACM data set show that our new algorithm, S-SimRank, outperforms other algorithms.
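The idea of weighting links in a link-based similarity can be illustrated with a minimal sketch. This is not the S-SimRank algorithm itself, whose details the abstract does not give; it is the standard SimRank iteration with a weighted average over in-links swapped in as an assumption. The function name `weighted_simrank`, the `in_links` layout, and the decay value are all hypothetical.

```python
def weighted_simrank(in_links, decay=0.8, iterations=10):
    """SimRank with weighted in-links (illustrative assumption, not S-SimRank).

    in_links maps each node to a dict {in_neighbor: weight}. Every
    in-neighbor must itself appear as a key of in_links.
    """
    nodes = list(in_links)
    # Start from the identity: a node is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                wa = sum(in_links[a].values())
                wb = sum(in_links[b].values())
                if wa == 0 or wb == 0:
                    # A node without in-links is similar to nothing else.
                    new[a][b] = 0.0
                    continue
                # Weighted average of the similarities of all in-neighbor pairs.
                total = sum(
                    w_i * w_j * sim[i][j]
                    for i, w_i in in_links[a].items()
                    for j, w_j in in_links[b].items()
                )
                new[a][b] = decay * total / (wa * wb)
        sim = new
    return sim


# Hypothetical citation graph: paper "p" cites "c" (weight 2) and "d";
# paper "q" cites "c" only.
graph = {"p": {}, "q": {}, "c": {"p": 2.0, "q": 1.0}, "d": {"p": 1.0}}
scores = weighted_simrank(graph)
```

With all weights equal to 1 this reduces to ordinary SimRank; heavier links simply pull the average toward the similarities of the neighbors they connect.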
Database-as-a-Service is a promising data management paradigm in which data is encrypted before being sent to the untrusted server. Efficient querying on encrypted data is a performance-critical problem which has vari...
With the continuing growth of XML data on the World Wide Web, XML document retrieval and the clustering of retrieval results face both challenges and opportunities. One of the challenges is how to improve the quality of XML retrieval results. First, based on the features of XML documents, a method for modeling XML retrieval result documents is proposed that integrates both the structural semantic features and the content information of XML documents. Then, a measure for computing the similarity between retrieval result documents, combining structural semantic similarity and keyword similarity, is suggested, and a strategy named Item Frequency in Cluster-Inverse Cluster Frequency for extracting labels from result clusters is presented. Experiments indicate that clustering XML retrieval results with the hybrid similarity yields clearly better quality than clustering based on content similarity alone.
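The label extraction strategy can be sketched by analogy with TF-IDF. The abstract does not give the exact formula, so the scoring below (an item's frequency inside a cluster multiplied by the logarithm of the inverse fraction of clusters containing it) is an assumption, and `extract_labels` is a hypothetical name.

```python
import math
from collections import Counter


def extract_labels(clusters, top_k=1):
    """Pick labels per cluster with an assumed IF-ICF score.

    clusters is a list of clusters; each cluster is a list of documents;
    each document is a list of items (for example keywords or tag names).
    """
    n = len(clusters)
    # Item frequency inside each cluster.
    counts = [Counter(item for doc in cluster for item in doc) for cluster in clusters]
    # Number of clusters in which each item occurs at least once.
    cluster_freq = Counter(item for c in counts for item in c)
    labels = []
    for c in counts:
        scores = {item: f * math.log(n / cluster_freq[item]) for item, f in c.items()}
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda item: (-scores[item], item))
        labels.append(ranked[:top_k])
    return labels


# Hypothetical retrieval result clusters.
example = [
    [["xml", "index"], ["xml", "index", "query"]],
    [["xml", "cluster"], ["cluster", "label"]],
]
print(extract_labels(example))
```

An item that appears in every cluster, such as "xml" here, scores zero and is never preferred as a label, which is the usual motivation for an inverse-frequency term.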
Collaborative filtering is an important personalized recommendation technique widely applied in e-commerce. It is not adapted to multi-interest or title recommendation for the 'general neighbourhood' problem w...
Detecting and exploiting correlations among columns in relational databases is of great value for query optimizers to generate better query execution plans (QEPs). We propose a more robust and informative metric, nam...
Extracting multi-records from web pages is useful because it allows us to integrate information from multiple sources to provide value-added services. Existing techniques still have some limitations because of their several ...
In update-intensive applications, main memory database systems produce a large volume of log records, so it is critical to write out the log records efficiently to speed up transaction processing. We propose a parallel recovery scheme based on XOR differential logging for main memory database systems in such environments. NVRAM is used to hold log records temporarily and to decouple transaction commit from disk writes, and the inherent parallelism of differential logging is exploited to accelerate log flushing across multiple log disks. During recovery, log records are loaded from multiple log disks and applied to data partitions promptly, without reordering them into serialization order, which cuts total recovery time. The scheme employs a consistent checkpointing method based on data partitions. Log records are classified according to the IDs of the data partitions accessed. Data partitions are recovered according to loading priorities computed from update frequencies and transaction waiting times, so the data access demands of new transactions arriving after the failure are served immediately. The scheme thus provides system availability during recovery, which is important for large-scale main memory database systems.
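Why differential log records need no reordering can be shown with a minimal sketch of XOR differential logging in general, not of the proposed scheme. A log record stores the bytewise XOR of the before and after images; because XOR is commutative and associative, records can be applied in any order, and applying a record a second time undoes it. The function names and the fixed-size byte images are illustrative assumptions.

```python
def diff_log(before: bytes, after: bytes) -> bytes:
    """Build a differential log record: before XOR after, byte by byte."""
    if len(before) != len(after):
        raise ValueError("before and after images must have the same length")
    return bytes(b ^ a for b, a in zip(before, after))


def apply_log(image: bytes, record: bytes) -> bytes:
    """Apply a record to an image; the same call serves for redo and undo."""
    if len(image) != len(record):
        raise ValueError("image and record must have the same length")
    return bytes(i ^ r for i, r in zip(image, record))


def recover(checkpoint: bytes, records) -> bytes:
    """Replay records onto a checkpoint image, in whatever order they arrive."""
    image = checkpoint
    for record in records:
        image = apply_log(image, record)
    return image


# Two successive updates to a 4-byte slot.
v0 = bytes([0, 0, 0, 0])
v1 = bytes([1, 2, 0, 0])
v2 = bytes([1, 2, 7, 9])
r1 = diff_log(v0, v1)
r2 = diff_log(v1, v2)
```

Replaying `[r1, r2]` or `[r2, r1]` onto `v0` gives the same final image, which is what lets records from several log disks be applied as soon as they are loaded.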
With the rapid development of information retrieval technology and the daily growth of information on the Internet, ordinary users can search many text-based databases and obtain part of the information through search engines such as Google and Baidu. However, a great amount of data resides in the relational databases behind web pages, and much research has therefore focused on keyword search over these relational databases. In contrast to that work, our algorithms are mainly based on bags, use greedy algorithms, and support phrase recognition by utilizing multiple dictionaries. We compare our algorithm with existing ones. The experimental results show that our algorithm is both effective and efficient.
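One common way to recognize phrases greedily with several dictionaries is longest-match-first segmentation of the keyword query. The sketch below shows that technique as an assumption; the abstract does not say how the authors' phrase recognition works, and `recognize_phrases`, `max_len`, and the sample dictionaries are hypothetical.

```python
def recognize_phrases(tokens, dictionaries, max_len=4):
    """Greedy longest-match segmentation of query tokens into phrases.

    dictionaries is a list of sets of space-separated phrases. At each
    position the longest phrase found in any dictionary is taken; a token
    that starts no known phrase is kept as a single keyword.
    """
    known = set().union(*dictionaries)
    result, i = [], 0
    while i < len(tokens):
        # Try the longest window first and shrink down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in known:
                result.append(candidate)
                i += n
                break
    return result


# Hypothetical dictionaries: place names and technical terms.
places = {"new york"}
terms = {"database systems"}
query = ["new", "york", "database", "systems", "conference"]
print(recognize_phrases(query, [places, terms]))
```

Treating a recognized phrase as one search unit avoids matching its words separately in unrelated tuples.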