Author:
Li Yu, School of Information
Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing, China
ISBN:
(print) 9781424432004; 9780769531854
Collaborative filtering is an important personalized recommendation technique that is widely applied in e-commerce. It is poorly suited to multi-interest or title recommendation because of the 'general neighbourhood' problem, which is analyzed in this paper. To address this problem, community-based collaborative filtering is presented by introducing the concept of a 'community neighbourhood'. Unfortunately, this leads to a more severe sparsity problem, which heavily degrades its performance. To overcome the sparsity, an ontological a-priori score is first used to infer user preferences and pre-fill null ratings. After pre-filling with the ontology method, community-based collaborative filtering is executed on a dense rating matrix. The experiment shows that community-based collaborative filtering generally performs better than the traditional method when the data is not very sparse, and that the ontology method genuinely enhances community-based collaborative filtering because the sparsity is overcome.
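The pre-fill-then-filter pipeline described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the ontological a-priori score, the similarity measure, or how communities are formed, so `prior`, `cosine`, `predict`, and the `community` argument are hypothetical stand-ins rather than the author's method.

```python
import math

def prefill(ratings, prior):
    """Return a dense copy of `ratings` with each None replaced by prior(user, item).

    `prior` is a hypothetical stand-in for the ontological a-priori score.
    """
    return [
        [r if r is not None else prior(u, i) for i, r in enumerate(row)]
        for u, row in enumerate(ratings)
    ]

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(dense, user, item, community):
    """Similarity-weighted average of the ratings that the other members
    of `community` (a list of user indices) gave to `item`."""
    num = den = 0.0
    for other in community:
        if other == user:
            continue
        sim = cosine(dense[user], dense[other])
        num += sim * dense[other][item]
        den += abs(sim)
    # Fall back to the user's own (possibly pre-filled) value if no neighbour helps.
    return num / den if den else dense[user][item]

# Toy data: rows are users, columns are items, None marks a missing rating.
ratings = [
    [5, None, 1],
    [4, 2, None],
    [None, 5, 4],
]
# Assumed prior: a constant mid-scale score for every missing cell.
dense = prefill(ratings, prior=lambda user, item: 3.0)
estimate = predict(dense, user=0, item=1, community=[0, 1])
```

With a single neighbour in the community, the estimate reduces to that neighbour's rating for the item; larger communities blend ratings by similarity.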
In the research field of supply chain coordination, many coordination contracts have been well *** chain members still feel confused about which contract should be chosen for their specific needs and *** paper starts from an analysis of the essence of supply chain coordination, and summarizes four important influencing factors as well as their attributes in coordination, including market demand, competitors' relationship, supply chain structure, and decisions *** importantly this paper studies the characteristics of the related products, such as storage life, customer loyalty, etc., which have seldom been discussed in coordination research before, and analyzes the influence in *** on this research, chain members can analyze their specific product's characteristics and influencing factors, and then choose the proper coordination contracts.
To enhance the results of keyword search in relational databases, semantic relationships among relations and tuples are employed and a semantic ranking function is proposed. In addition to considering current ranking principles, the proposed semantic ranking function provides new metrics to measure query relevance. Based on it, two top-k search algorithms, BA (blocking algorithm) and EBA (early-stopping blocking algorithm), are presented. EBA improves on BA by providing a filtering threshold to terminate iterations as early as possible. Finally, experimental results show that the semantic ranking function guarantees search results with high precision and recall, and that the proposed BA and EBA algorithms improve query performance over existing approaches.
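The idea of terminating a block-wise top-k scan with a filtering threshold can be illustrated with a generic sketch. It is not the paper's BA or EBA: the abstract gives no details of either algorithm, so the block layout, the per-block `upper_bound`, and the function name `top_k_early_stop` are all assumptions made for illustration.

```python
import heapq

def top_k_early_stop(blocks, k):
    """Scan blocks of scored candidates and stop once no later block can matter.

    `blocks` is a list of (upper_bound, [(item, score), ...]) pairs sorted by
    descending upper_bound, where upper_bound is at least every score in the
    block. Returns (top-k items by descending score, number of blocks scanned).
    """
    heap = []  # min-heap of (score, item) holding the best k seen so far
    scanned = 0
    for upper_bound, entries in blocks:
        # Filtering threshold: the current k-th best score already matches or
        # beats anything this block (and every later block) could contain.
        if len(heap) == k and heap[0][0] >= upper_bound:
            break
        scanned += 1
        for item, score in entries:
            if len(heap) < k:
                heapq.heappush(heap, (score, item))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, item))
    ranked = sorted(heap, reverse=True)
    return [(item, score) for score, item in ranked], scanned

# Toy input: three blocks of hypothetical tuples with relevance scores.
blocks = [
    (0.9, [("t1", 0.9), ("t2", 0.7)]),
    (0.6, [("t3", 0.6), ("t4", 0.5)]),
    (0.3, [("t5", 0.3)]),
]
results, scanned = top_k_early_stop(blocks, k=2)
```

In this toy input the scan stops after the first block, because the second-best score found so far (0.7) already exceeds the upper bound of the next block (0.6).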
Data mining research focuses on algorithms that mine valuable patterns from particular domains. Apart from the theoretical research, experiments take a vast amount of effort to build. In this paper, we propose an integrated framework that uses a multi-agent system to help researchers develop experiments rapidly. Moreover, the proposed framework allows extension and integration for future research in the mutual aspects of agents and data mining. The paper describes the details of the framework and also presents a sample implementation.
This paper analyzes how OLTP workloads interact with modern processors and cache behavior. First, we extend TPC-C, the OLTP-oriented benchmark, into the ETPC-C benchmark in order to measure the performance of main-memory database systems (MMDBMS) more precisely. Because the performance of an MMDBMS is not affected by disk I/O, it is more sensitive to cache usage. Then, using the ETPC-C benchmark, we investigate the behavior of caches and processors extensively. We find that miss stall time is mostly spent on on-chip caches; that is, first- and second-level cache misses are dominant. Furthermore, we find that on-chip instruction cache (I-cache) stall time is a major component of memory stall time. The fewer the emulated users, the larger the share of memory stall time contributed by on-chip I-cache stall time. However, when an index is employed, the system under test (SUT) has more total I-cache stall time than the SUT without an index at the same number of emulated users and data population. Another observation is that the SUT with an index has, on average, a slightly higher branch misprediction rate than the SUT without an index. Finally, we find that only the third-level (L3) D-cache stall time rate increases with the number of users, because L3 D-cache misses grow at the largest rate. Under TPC-C and ETPC-C evaluation, we find that, for optimized database performance on modern computers, reducing the instruction miss penalty is as important as reducing the data miss penalty; since these two efforts conflict, the best approach is to balance them.
Many popular Peer-to-Peer (P2P) systems now suffer from simultaneous pollution attacks of several kinds, including file-targeted attacks and index-targeted attacks. However, to our knowledge, no existing model takes both into consideration. In fact, each attack influences the effect of the other, so models that consider only one kind of pollution fail to describe the actual pollution accurately. In this paper, we develop a unified model to remedy this defect. Through an analysis from the perspective of user behavior, the two attacks are integrated into the unified model as two factors that affect users' choice of files to download. The modeled file proliferation processes are consistent with those measured in real P2P systems. Moreover, the co-effect of the two attacks is analyzed, and its extremum point is found, which corresponds to the most efficient pollution attack. Further analysis of the model's accuracy requires a quantitative comparison between the modeled effects of pollution and the measured ones. However, no such metric has been proposed so far, which also makes it difficult to evaluate the effect of pollution and of anti-pollution techniques. To address this deficiency, we propose several metrics to assess the effect of pollution, including abort ratio, average download time of unpolluted files, etc. These metrics estimate the effect of pollution from different aspects. They are applied to the analysis of pollution emulated by our unified model, where they capture the co-effect of pollution. Furthermore, they also reflect the difference between our model and previously developed ones.
Most traditional approaches to mining frequent itemsets target databases, so they can use secondary storage and need multiple scans, which makes them unsuitable for stream mining. Some new algorithms over a stream's sliding window have been presented recently; they perform addition and deletion over the stream independently, so the common deletion strategy, which removes the earliest transaction, is used when the window slides. This paper considers both operations together to reduce the computation cost; consequently, three deletion strategies are proposed to improve performance with little loss of precision. The experimental results show that these strategies are effective and efficient compared with the current method.
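For context, the common baseline mentioned above (add the new transaction, then delete the earliest one when the window slides) can be sketched as follows. This shows only that baseline, not the three deletion strategies proposed in the paper, which the abstract does not describe; the class name `SlidingWindowCounter` and the `max_len` cap on itemset size are illustrative assumptions.

```python
from collections import Counter, deque
from itertools import combinations

class SlidingWindowCounter:
    """Maintain exact itemset counts over the last `window_size` transactions."""

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size
        self.max_len = max_len      # assumed cap on itemset size, keeps the sketch small
        self.window = deque()       # transactions currently inside the window
        self.counts = Counter()     # itemset (sorted tuple) -> support count

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for size in range(1, self.max_len + 1):
            yield from combinations(items, size)

    def add(self, transaction):
        """Add one transaction; if the window overflows, remove the earliest one."""
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            oldest = self.window.popleft()
            for itemset in self._itemsets(oldest):
                self.counts[itemset] -= 1
                if self.counts[itemset] == 0:
                    del self.counts[itemset]

    def frequent(self, min_support):
        """Return the itemsets whose count in the window is at least min_support."""
        return {s: c for s, c in self.counts.items() if c >= min_support}

# Toy stream of four transactions over a window of three.
counter = SlidingWindowCounter(window_size=3)
for transaction in (["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]):
    counter.add(transaction)
frequent_itemsets = counter.frequent(min_support=2)
```

Here addition and deletion are handled as two independent steps, which is the behavior the abstract attributes to existing sliding-window algorithms.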
Compared with traditional magnetic disks, flash memory has many advantages and has been used as the external storage medium for a wide spectrum of electronic devices (such as PDAs, MP3 players, digital cameras and mobile phones). As its capacity increases and its price drops, it looks like a perfect alternative to magnetic disks. However, due to the hardware limitations of flash memory, techniques originally designed for magnetic disks, including storage subsystems and indexing, cannot run smoothly on flash memory without modification. In this paper, we explore the problems of indexing flash-resident data and present a new dynamic hash index for flash memory in two schemas. The analysis and experimental results validate the efficiency of our design.
Web document structural clustering is a useful task for many intelligent web applications; however, processing based on the structure of web documents has not yet received strong attention. In this paper, we propose ...
More and more people now publish their comments on products on the Web. Mining such unstructured data (product reviews) is an exciting, popular and challenging topic for both research and applications. In this paper, we focus on mining product reviews written in Chinese. We aim to extract structural information from Chinese product reviews. By structural information, we mean the product features and the corresponding opinion words expressed in each review text. Some work has already been done on reviews written in English, but less on reviews in Chinese. We propose an effective method to extract candidate features and some effective pruning rules to prune the features. We also introduce a pattern extraction and matching step to improve our results. The experimental results show that our approach is very effective and achieves good recall and precision.