The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y. In ordinary course, string editing is based on character insert, delete, and substitute operations....
详细信息
The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y. In ordinary course, string editing is based on character insert, delete, and substitute operations. It has been suggested that extending this model with block edits would be useful in applications such as DNA sequence comparison and sentence similarity computation. However, the existing algorithms have generally focused on the normalized edit distance, and seldom of them consider the block swap operations at a higher level. In this paper, we introduce an extended edit distance algorithm which permits insertions, deletions, and substitutions at character level, and also permits block swap operations. Experimental results on randomly generated strings verify the algorithm's rationality and efficiency. The main contribution of this paper is that we present an algorithm to compute the lowest edit cost for string transformation with block swap in polynomial time, and propose a breaking points selection algorithm to improve the computation speed.
With the unceasing growth of XML data in World Wide Web, XML document retrieval and clustering retrieval results are confronted with both challenges and opportunities. One of the challenges is how to improve the quali...
详细信息
With the unceasing growth of XML data in World Wide Web, XML document retrieval and clustering retrieval results are confronted with both challenges and opportunities. One of the challenges is how to improve the quality of XML retrieval results. Firstly, according to the features of XML documents, a method of modeling XML retrieval result documents is brought forward, which integrates both structural semantic features and content information of XML documents. Then, a measure method to compute similarity, including structural semantic similarity and keywords similarity, between retrieval result documents is suggested;and a strategy named Item Frequency in Cluster-Inverse Cluster Frequency to extract labels from result clusters is presented. Experiments indicate that the clustering quality for XML retrieval results based on hybrid similarity is obviously better than the one only based on content similarity.
The integer sign vulnerability is a comparatively new and subtle type of vulnerabilities, they can compromise system security. Especially, if a sign vulnerability occurs in operating system kernel, it may result in ve...
详细信息
The integer sign vulnerability is a comparatively new and subtle type of vulnerabilities, they can compromise system security. Especially, if a sign vulnerability occurs in operating system kernel, it may result in very serious invalid read/write operations to kernel memory area. Unfortunately, little attention has been paid to static detecting them automatically. This paper presents a novel approach to detecting sign vulnerabilities in Linux kernel using type qualifier technique. We introduce three pairs of type qualifier and corresponding lattices to identify some key kernel data and relationships between them. Based on an extended type inference tool, we are able to effectively detect known and unknown sign vulnerabilities from elaborately preprocessed Linux kernel files. Our experiences demonstrate that type qualifier technique can be applied to detect sign vulnerabilities effectively.
Monitoring on data streams is an efficient method of acquiring the characters of data stream. However the available resources for each data stream are limited, so the problem of how to use the limited resources to pro...
详细信息
Monitoring on data streams is an efficient method of acquiring the characters of data stream. However the available resources for each data stream are limited, so the problem of how to use the limited resources to process infinite data stream is an open challenging problem. In this paper, we adopt the wavelet and sliding window methods to design a multi-resolution summarization data structure, the Multi-Resolution Summarization Tree (MRST) which can be updated incrementally with the incoming data and can support point queries, range queries, multi-point queries and keep the precision of queries. We use both synthetic data and real-world data to evaluate our algorithm. The results of experiment indicate that the efficiency of query and the adaptability of MRST have exceeded the current algorithm, at the same time the realization of it is simpler than others.
Advances in wireless sensor networks and positioning technologies enable new applications monitoring moving objects. Some of these applications, such as traffic management, require the possibility to query the future ...
详细信息
Advances in wireless sensor networks and positioning technologies enable new applications monitoring moving objects. Some of these applications, such as traffic management, require the possibility to query the future trajectories of the objects. In this paper, we propose an original data access method, the ANR-tree, which supports predictive queries. We focus on real life environments, where the objects move within constrained networks, such as vehicles on roads. We introduce a simulation-based prediction model based on graphs of cellular automata, which makes full use of the network constraints and the stochastic traffic behavior. Our technique differs strongly from the linear prediction model, which has low prediction accuracy and requires frequent updates when applied to real traffic with velocity changing frequently. The data structure extends the R-tree with adaptive units which group neighbor objects moving in the similar moving patterns. The predicted movement of the adaptive unit is not given by a single trajectory, but instead by two trajectory bounds based on different assumptions on the traffic conditions and obtained from the simulation. Our experiments, carried on two different datasets, show that the ANR-tree is essentially one order of magnitude more efficient than the TPR-tree, and is much more scalable.
In open environment, an effective trust mechanism should be built to ensure transaction safety. But current trust evaluation approaches solely based on the certificate or feedback are inaccurate and ineffective. Thus,...
详细信息
In open environment, an effective trust mechanism should be built to ensure transaction safety. But current trust evaluation approaches solely based on the certificate or feedback are inaccurate and ineffective. Thus, referring to the social trust models, this paper presents a capability enhanced method to improve the accuracy of evaluating trustworthiness. Based on it, this paper also designs a trust evaluation model ServTrust that naturally provides a solution to the trust problem. Compared with current feedback based approaches, simulation experimental results show the effectiveness of the proposed model in trust evaluating.
The processing of XML queries can result in evaluation of various structural relationships. Efficient algorithms for evaluating ancestor-descendant and parent-child relationships have been proposed. Whereas the proble...
详细信息
The processing of XML queries can result in evaluation of various structural relationships. Efficient algorithms for evaluating ancestor-descendant and parent-child relationships have been proposed. Whereas the problems of evaluating preceding-sibling-following-sibling and preceding-following relationships are still open. In this paper, we studied the structural join and staircase join for sibling relationship. First, the idea of how to filter out and minimize unnecessary reads of elements using parent's structural information is introduced, which can be used to accelerate structural joins of parent-child and preceding-sibling-following-sibling relationships. Second, two efficient structural join algorithms of sibling relationship are proposed. These algorithms lead to optimal join performance: nodes that do not participate in the join can be judged beforehand and then skipped using B^+-tree index. Besides, each element list joined is scanned sequentially once at most. Furthermore, output of join results is sorted in document order. We also discussed the staircase join algorithm for sibling axes. Studies show that, staircase join for sibling axes is close to the structural join for sibling axes and shares the same characteristic of high efficiency. Our experimental results not only demonstrate the effectiveness of our optimizing techniques for sibling axes, but also validate the efficiency of our algorithms. As far as we know, this is the first work addressing this problem specially.
Recently there have been growing interests in the applications of wireless sensor networks. Innovative techniques that improve energy efficiency to prolong the network lifetime are highly required. Clustering is an ef...
详细信息
Nowadays, WSMO (Web Service Modeling Ontology)1 has received great attention of academic and business communities, since its potential to achieve dynamic and scalable infrastructure for web services is extracted. Ther...
详细信息
keyword Search Over Relational databases (KSORD) enables casual or Web users easily access databases through free-form keyword queries. Improving the performance of KSORD systems is a critical issue in this area. In...
详细信息
keyword Search Over Relational databases (KSORD) enables casual or Web users easily access databases through free-form keyword queries. Improving the performance of KSORD systems is a critical issue in this area. In this paper, a new approach CLASCN (Classification, Learning And Selection of Candidate Network) is developed to efficiently perform top-κ keyword queries in schema-graph-based online KSORD systems. In this approach, the Candidate Networks (CNs) from trained keyword queries or executed user queries are classified and stored in the databases, and top-κ results from the CNs are learned for constructing CN Language Models (CNLMs). The CNLMs are used to compute the similarity scores between a new user query and the CNs from the query. The CNs with relatively large similarity score, which are the most promising ones to produce top-κ results, will be selected and performed. Currently, CLASCN is only applicable for past queries and New All-keyword-Used (NAU) queries which are frequently submitted queries. Extensive experiments also show the efficiency and effectiveness of our CLASCN approach.
暂无评论