Most traditional approaches to mining frequent item sets target databases, so they can rely on secondary storage and multiple scans, which makes them unsuitable for mining streams. Several algorithms over a stream's sliding window have been presented recently. They perform addition and deletion over the stream independently, so they use the common deleting strategy of removing the earliest transaction when the window slides. This paper considers both operations together to reduce the computation cost and proposes three deleting strategies that improve performance with little loss of precision. The experimental results show that these strategies are effective and efficient compared with the current method.
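As a rough illustration of the baseline that the abstract describes (addition of the arriving transaction, then deletion of the earliest one when the window slides), the following Python sketch may help. It is not the paper's method and does not implement the three proposed deleting strategies. The class name `SlidingWindowCounter`, the `max_size` cap on itemset length, and the exact counting scheme are assumptions made for this example only.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowCounter:
    """Counts itemsets of size <= max_size over the last window_size transactions.

    Hypothetical helper for illustration; exact counting, no approximation.
    """

    def __init__(self, window_size, max_size=2):
        self.window_size = window_size
        self.max_size = max_size
        self.window = deque()
        self.counts = Counter()

    def _itemsets(self, transaction):
        # Enumerate each distinct itemset of the transaction once, in sorted order.
        items = sorted(set(transaction))
        for k in range(1, self.max_size + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # Addition: count every itemset of the arriving transaction.
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        # Deletion: when the window overflows, remove the earliest transaction.
        if len(self.window) > self.window_size:
            oldest = self.window.popleft()
            self.counts.subtract(self._itemsets(oldest))
            # Drop itemsets that no longer occur anywhere in the window.
            for itemset in list(self._itemsets(oldest)):
                if self.counts[itemset] <= 0:
                    del self.counts[itemset]

    def frequent(self, min_support):
        """Return the itemsets whose count is at least min_support."""
        return {s: c for s, c in self.counts.items() if c >= min_support}
```

For example, with `window_size=2`, adding the transactions `["a", "b"]`, `["a", "c"]` and `["b", "c"]` in turn leaves only the last two in the window, so the pair `("a", "b")` is no longer counted.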
Compared with traditional magnetic disks, flash memory has many advantages and has been used as the external storage medium for a wide spectrum of electronic devices (such as PDAs, MP3 players, digital cameras and mobile phones). As its capacity increases and its price drops, it looks like a perfect alternative to magnetic disks. However, due to the hardware limitations of flash memory, techniques originally designed for magnetic disks, including storage subsystems and indexing, cannot run smoothly on flash memory without modification. In this paper we explore the problems of indexing flash-resident data and present a new dynamic hash index for flash memory in two schemes. The analysis and experimental results validate the efficiency of our design.
More and more people now publish their comments on products on the Web. Mining such unstructured data (product reviews) is an active and challenging topic in both research and applications. In this paper, we focus on mining product reviews written in Chinese, with the aim of extracting structural information from them. By structural information, we mean the product features and the corresponding opinion words expressed in each review text. Some work has already been done on reviews written in English, but less on reviews in Chinese. We propose an effective method for extracting candidate features, together with pruning rules that filter those candidates. We also introduce a pattern extraction and matching step to improve the results. The experimental results show that our approach is effective and achieves good recall and precision.
In this paper, we discuss the energy-efficient multicast problem for discrete power levels in ad hoc wireless sensor networks. The problem is as follows: given n nodes, where each node v has l(v) transmission power levels, and a multicast request (s, D), find a multicast tree rooted at s and spanning all destinations in D such that the total energy cost of the tree is minimized. This problem is NP-hard. We propose an algorithm, NWM_DST, with a theoretically guaranteed approximation ratio, and two efficient heuristics, MNJT and g-D-MIP, for the multicast tree problem. Simulation results show the efficiency of the proposed algorithms.
The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y. Ordinarily, string editing is based on character insert, delete, and substitute operations. It has been suggested that extending this model with block edits would be useful in applications such as DNA sequence comparison and sentence similarity computation. However, existing algorithms have generally focused on the normalized edit distance, and few of them consider block swap operations at a higher level. In this paper, we introduce an extended edit distance algorithm that permits insertions, deletions, and substitutions at the character level and also permits block swap operations. Experimental results on randomly generated strings verify the algorithm's soundness and efficiency. The main contributions of this paper are an algorithm that computes the lowest edit cost for string transformation with block swaps in polynomial time, and a breaking-point selection algorithm that improves the computation speed.
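For reference, the ordinary character-level edit distance mentioned at the start of this abstract can be sketched with the standard dynamic program below. This is only the baseline model (insert, delete, substitute, each with an assumed unit cost); it does not implement the block swap extension or the breaking-point selection that the paper proposes. The function name `edit_distance` is chosen for this example.

```python
def edit_distance(x, y):
    """Classic edit distance between strings x and y with unit-cost operations."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            substitute_cost = 0 if cx == cy else 1
            curr.append(min(
                prev[j] + 1,                    # delete cx
                curr[j - 1] + 1,                # insert cy
                prev[j - 1] + substitute_cost,  # substitute cx with cy, or match
            ))
        prev = curr
    return prev[-1]
```

The sketch keeps only two rows of the table, so it uses memory proportional to the length of Y while still taking time proportional to the product of the two lengths.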
Integer sign vulnerabilities are a comparatively new and subtle class of vulnerability that can compromise system security. In particular, a sign vulnerability in an operating system kernel may result in very serious invalid read/write operations on kernel memory. Unfortunately, little attention has been paid to detecting them statically and automatically. This paper presents a novel approach to detecting sign vulnerabilities in the Linux kernel using type qualifiers. We introduce three pairs of type qualifiers and their corresponding lattices to identify key kernel data and the relationships between them. Based on an extended type inference tool, we are able to detect both known and unknown sign vulnerabilities in carefully preprocessed Linux kernel files. Our experience demonstrates that the type qualifier technique can be applied effectively to detect sign vulnerabilities.
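To make the bug class concrete, the short Python sketch below mimics the usual shape of a sign error using fixed-width integers from `ctypes`: a length is bounds-checked as a signed value and later interpreted as unsigned. It only illustrates the problem and says nothing about the paper's type qualifier analysis. `BUF_SIZE` and both function names are hypothetical, and the 32-bit widths are an assumption.

```python
import ctypes

BUF_SIZE = 64  # hypothetical buffer capacity used only in this example


def passes_signed_check(length):
    """Mimic a C check `if (len > BUF_SIZE) reject;` where len is a signed 32-bit int."""
    signed_len = ctypes.c_int32(length).value
    return signed_len <= BUF_SIZE


def as_unsigned_size(length):
    """Mimic passing the same value to a parameter declared as unsigned 32-bit."""
    return ctypes.c_uint32(length).value
```

A negative length such as -1 satisfies the signed check, yet `as_unsigned_size(-1)` yields a value far larger than `BUF_SIZE`, which is the kind of mismatch that leads to out-of-bounds reads or writes.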
Monitoring data streams is an efficient way of acquiring the characteristics of a data stream. However, the resources available for each data stream are limited, so how to process an infinite data stream with limited resources remains an open and challenging problem. In this paper, we combine wavelet and sliding window methods to design a multi-resolution summarization data structure, the Multi-Resolution Summarization Tree (MRST), which can be updated incrementally as data arrives and supports point queries, range queries, and multi-point queries while preserving query precision. We use both synthetic and real-world data to evaluate our algorithm. The experimental results indicate that MRST exceeds the current algorithm in query efficiency and adaptability, and that it is also simpler to implement than other approaches.
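The idea of a multi-resolution wavelet summary can be sketched with a plain Haar decomposition over one fixed window, as below. This is not MRST: it is not incremental, it has no tree maintenance, and it assumes a window whose length is a power of two. The function names and the unnormalized averaging form of the Haar transform are choices made for this example.

```python
def haar_decompose(values):
    """Return (overall_average, details), with details ordered coarsest level first."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    current = list(values)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details.insert(0, diffs)  # coarser levels end up at the front
        current = averages
    return current[0], details


def haar_reconstruct(average, details, levels=None):
    """Rebuild the window using only the first `levels` detail levels (all if None)."""
    current = [average]
    used = details if levels is None else details[:levels]
    for diffs in used:
        refined = []
        for a, d in zip(current, diffs):
            refined.extend((a + d, a - d))  # undo one averaging step
        current = refined
    return current
```

Reconstructing with fewer detail levels gives a coarser view of the window, in which each returned value is the average of a block of original points; using all levels recovers the original values, so a point query at full resolution is answered exactly.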
WSMO (Web Service Modeling Ontology) has received great attention from the academic and business communities because of its potential to provide a dynamic and scalable infrastructure for web services. Ther...
Recently there has been growing interest in the applications of wireless sensor networks. Innovative techniques that improve energy efficiency and so prolong the network lifetime are in high demand. Clustering is an ef...
Recent research has focused on density queries for moving objects in highly dynamic scenarios. An area is dense if the number of moving objects it contains is above some threshold. Monitoring dense areas has applicati...