Data mining research focuses on algorithms that mine valuable patterns from a particular domain. Beyond the theoretical work, building experiments takes a great deal of effort. In this paper, we propose an integrated framework that uses a multi-agent system to help researchers develop experiments rapidly. Moreover, the framework allows extension and integration for future research on the interaction between agents and data mining. The paper describes the framework in detail and presents a sample implementation.
To improve keyword search results in relational databases, the semantic relationships among relations and tuples are exploited and a semantic ranking function is proposed. In addition to current ranking principles, the proposed function provides new metrics for measuring query relevance. Based on it, two top-k search algorithms are presented: BA (blocking algorithm) and EBA (early-stopping blocking algorithm). EBA improves on BA by using a filtering threshold to terminate iterations as early as possible. Experimental results show that the semantic ranking function yields search results with high precision and recall, and that BA and EBA improve query performance over existing approaches.
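The abstract does not give the details of BA or EBA, so the following is only a generic sketch of threshold-based early termination in top-k search, not the authors' algorithm. It assumes that candidates arrive in blocks, that each block carries an upper bound on the score of any candidate in it, and that blocks are sorted by non-increasing bound. The names `top_k_early_stop`, `blocks`, and `score` are hypothetical.

```python
import heapq
from typing import Callable, Iterable, List, Sequence, Tuple


def top_k_early_stop(
    blocks: Iterable[Tuple[float, Sequence[str]]],
    score: Callable[[str], float],
    k: int,
) -> List[Tuple[float, str]]:
    """Return the k best (score, candidate) pairs, best first.

    Assumption: `blocks` yields (upper_bound, candidates) pairs in order of
    non-increasing upper_bound, and no candidate scores above its block's bound.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, int, str]] = []  # min-heap keyed on score
    seen = 0  # tie-breaker so candidates themselves are never compared
    for upper_bound, candidates in blocks:
        # Filtering threshold: once the current k-th best score is at least
        # the bound of every remaining block, no later candidate can enter.
        if len(heap) == k and heap[0][0] >= upper_bound:
            break
        for cand in candidates:
            s = score(cand)
            seen += 1
            if len(heap) < k:
                heapq.heappush(heap, (s, seen, cand))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, seen, cand))
    return [(s, cand) for s, _, cand in sorted(heap, reverse=True)]
```

Without the threshold check, the loop would score every candidate in every block; with it, the scan stops as soon as the remaining blocks cannot change the result.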
This paper analyzes how OLTP workloads interact with modern processors and cache behavior. First, we extend TPC-C, the OLTP-oriented benchmark, into the ETPC-C benchmark to measure the performance of main-memory database systems (MMDBMS) more precisely. Because MMDBMS performance is not affected by disk I/O, it is more sensitive to cache usage. Then, using the ETPC-C benchmark, we investigate the behavior of caches and processors extensively. We find that miss stall time is mostly spent on on-chip caches; that is, first- and second-level cache misses dominate. Furthermore, on-chip instruction cache (I-cache) stall time is a major component of memory stall time. The fewer the emulated users, the larger the share of memory stall time contributed by on-chip I-cache stalls. However, with an index, the system under test (SUT) has more total I-cache stall time than the SUT without an index at the same number of emulated users and data population. Another observation is that the SUT with an index has a slightly higher branch misprediction rate on average than the SUT without one. Finally, we find that only the third-level (L3) D-cache stall time rate increases with the number of users, because the L3 D-cache miss rate grows the fastest. Under TPC-C and ETPC-C evaluation, we find that, for optimized database performance on modern computers, reducing the instruction miss penalty is as important as reducing the data miss penalty; since the two efforts conflict, the best approach is to balance them.
Compared with traditional magnetic disks, flash memory has many advantages and has been used as external storage in a wide range of electronic devices (such as PDAs, MP3 players, digital cameras, and mobile phones). As its capacity increases and its price drops, it looks like a perfect alternative to magnetic disks. However, because of the hardware limitations of flash memory, techniques originally designed for magnetic disks, including storage subsystems and indexing, cannot run smoothly on flash memory without modification. In this paper, we explore the problems of indexing flash-resident data and present a new dynamic hash index for flash memory in two schemas. Analysis and experimental results validate the efficiency of our design.
Structural clustering of web documents is useful for many web intelligence applications; however, processing based on the structure of web documents has not yet received much attention. In this paper, we propose ...
Nowadays, more and more people publish comments on products on the Web. Mining such unstructured data (product reviews) is an exciting and challenging topic in both research and applications. In this paper, we focus on mining product reviews written in Chinese, aiming to extract structural information from them. By structural information, we mean the product features and the corresponding opinion words expressed in each review. Some work has already been done on reviews written in English, but less on Chinese. We propose a method for extracting candidate features, along with pruning rules to filter them. We also introduce a pattern extraction and matching step to improve the results. Experimental results show that our approach is effective and achieves good recall and precision.
To meet increasingly complex recommendation needs, it is important to implement hybrid recommendation for mobile commerce. In this paper, we propose a design for open hybrid recommendation systems in mobile commerce that integrates multiple recommendation algorithms to improve recommendation performance. First, three solutions for an open hybrid recommendation approach are discussed in detail: a generic customer profile, a weighted hybrid recommendation algorithm, and mobile device profile creation. Next, we present a multi-agent architecture that makes the three solutions work together. Finally, a prototype system based on the proposed architecture is implemented to demonstrate the feasibility of the design and to evaluate the performance of the proposed open hybrid recommendation system.
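To illustrate what a weighted hybrid of several recommenders can look like, here is a minimal sketch. It is not the paper's algorithm: it assumes each recommender returns a per-item score on a comparable scale, that an item a recommender does not mention contributes 0 from that recommender, and that the weights are non-negative with a positive sum. The function name `weighted_hybrid` and the recommender names in the test data are hypothetical.

```python
from typing import Dict, List, Tuple


def weighted_hybrid(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Blend per-recommender item scores into one ranked list.

    scores:  recommender name -> {item -> score}
    weights: recommender name -> non-negative weight (normalized here)
    """
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    combined: Dict[str, float] = {}
    for name, item_scores in scores.items():
        w = weights[name] / total  # normalized weight of this recommender
        for item, s in item_scores.items():
            combined[item] = combined.get(item, 0.0) + w * s
    # Highest blended score first; ties broken by item name for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

Keeping the weights outside the recommenders is one way to make such a hybrid "open": a new algorithm can be added by registering its scores and a weight, without touching the others.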
With the continuing growth of XML data on the World Wide Web, XML document retrieval and the clustering of retrieval results face both challenges and opportunities. One challenge is how to improve the quality of XML retrieval results. First, based on the features of XML documents, a method for modeling XML retrieval result documents is put forward that integrates both the structural semantic features and the content of XML documents. Then, a method for computing similarity between retrieval result documents, covering both structural semantic similarity and keyword similarity, is proposed, and a strategy named Item Frequency in Cluster-Inverse Cluster Frequency is presented for extracting labels from result clusters. Experiments indicate that clustering XML retrieval results with the hybrid similarity gives clearly better quality than clustering with content similarity alone.
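The abstract names the labeling strategy but does not define it. The sketch below assumes, by analogy with TF-IDF, that an item's weight in a cluster is its frequency in that cluster multiplied by the logarithm of the number of clusters divided by the number of clusters containing the item. That formula, the function name `cluster_labels`, and the input layout are all assumptions made for illustration, not the paper's definition.

```python
import math
from collections import Counter
from typing import List


def cluster_labels(clusters: List[List[List[str]]], n_labels: int = 3) -> List[List[str]]:
    """Pick label terms per cluster with a TF-IDF-style weight (assumed form).

    clusters: list of clusters; each cluster is a list of documents;
              each document is a list of terms.
    """
    # Item frequency in cluster: how often each term occurs inside a cluster.
    term_counts = [Counter(t for doc in cluster for t in doc) for cluster in clusters]
    # Cluster frequency: in how many clusters each term occurs at least once.
    cluster_freq: Counter = Counter()
    for counts in term_counts:
        cluster_freq.update(counts.keys())
    n = len(clusters)
    labels = []
    for counts in term_counts:
        weight = {t: f * math.log(n / cluster_freq[t]) for t, f in counts.items()}
        # Highest weight first; ties broken alphabetically for determinism.
        ranked = sorted(weight, key=lambda t: (-weight[t], t))
        labels.append(ranked[:n_labels])
    return labels
```

Under this weighting, a term that appears in every cluster gets weight 0, so labels favor terms that are frequent in one cluster and rare elsewhere.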
In this paper, we discuss the energy-efficient multicast problem for discrete power levels in wireless ad hoc sensor networks. The problem is as follows: given n nodes, where each node v has l(v) transmission power levels, and a multicast request (s, D), find a multicast tree rooted at s and spanning all destinations in D such that the total energy cost of the tree is minimized. This problem is NP-hard. We propose an algorithm, NWM_DST, with a theoretically guaranteed approximation ratio, and two efficient heuristics, MNJT and g-D-MIP, for the multicast tree problem. Simulation results demonstrate the efficiency of the proposed algorithms.
The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y. Ordinarily, string editing is based on character insert, delete, and substitute operations. It has been suggested that extending this model with block edits would be useful in applications such as DNA sequence comparison and sentence similarity computation. However, existing algorithms have generally focused on the normalized edit distance, and few of them consider block swap operations at a higher level. In this paper, we introduce an extended edit distance algorithm that permits insertions, deletions, and substitutions at the character level and also permits block swap operations. Experimental results on randomly generated strings verify the algorithm's soundness and efficiency. The main contributions of this paper are an algorithm that computes the lowest edit cost for string transformation with block swaps in polynomial time and a breaking-point selection algorithm that improves computation speed.
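For reference, the character-level model that the paper extends can be sketched with the classic dynamic program below. This is the standard unit-cost edit distance only; it does not implement the block swap operation or the breaking-point selection described in the abstract. The function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Classic edit distance with unit-cost insert, delete, and substitute."""
    # prev[j] is the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete cx
                    curr[j - 1] + 1,     # insert cy
                    prev[j - 1] + cost,  # substitute cx with cy, or match
                )
            )
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

Under this baseline, moving a whole block of characters can only be expressed as a series of single-character edits, which is the limitation that block-level operations are meant to address.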