As one part of preprocessing, automatic word segmentation is a key issue in Chinese information retrieval. Since integral words are put wholly together to compose into the more meaningful words and more express users...
The main problem of existing static vulnerability detection methods based on source code analysis is their high false positive and false negative rates. One main reason is the lack of accurate and effective identification and analysis of security-related program elements, e.g., data validation checks and tainted data sources. A static vulnerability detection method based on data security state tracing and checking is proposed. In this method, the state space of the state machine model is extended; the security state of a variable is identified by a vector that may correspond to multiple security-related properties rather than by a single property; fine-grained state transitions are provided to support accurate recognition of security-related program behaviors; the recognition of validation checks is introduced into the vulnerability state machine to reduce false positives; and a systematic discrimination mechanism for tainted data is constructed to prevent false negatives that result from neglecting tainted data sources. Experimental results from a prototype system show that this method can effectively detect buffer overflows and other types of vulnerabilities in software systems, with a markedly lower false positive rate than existing mainstream static detection methods, and that it avoids some serious false negatives of those methods.
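The idea of tracking a vector of security properties per variable, rather than a single tainted/untainted flag, can be illustrated with a minimal sketch. This is not the paper's state machine: the property names (`tainted`, `length_checked`) and the three transition functions are hypothetical, chosen only to show how recognizing a validation check can suppress a false positive at a buffer-copy sink.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SecurityState:
    """Hypothetical security-state vector for one program variable."""
    tainted: bool = False          # value derives from an untrusted source
    length_checked: bool = False   # a length/bounds validation has been seen


def on_source(state: SecurityState) -> SecurityState:
    # Reading from an untrusted source taints the variable and
    # invalidates any earlier validation.
    return replace(state, tainted=True, length_checked=False)


def on_length_check(state: SecurityState) -> SecurityState:
    # A recognized validation check is recorded; the taint itself remains.
    return replace(state, length_checked=True)


def on_copy_to_buffer(state: SecurityState) -> bool:
    # Sink: report a potential overflow only for tainted, unvalidated data.
    return state.tainted and not state.length_checked


s = on_source(SecurityState())
unchecked_report = on_copy_to_buffer(s)   # tainted and not yet validated
s = on_length_check(s)
checked_report = on_copy_to_buffer(s)     # validated before reaching the sink
```

A single-property model would flag both sink uses, since the variable is tainted in both cases; the second property is what lets the checked use pass.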
Sensor networks are widely used in many applications to collaboratively collect information from the physical environment. In these applications, the exploration of relationships and linkage among sensing data from multiple regions can be naturally expressed by joining tuples in these regions. However, the highly distributed and resource-constrained nature of the network makes the join a challenging query. In this paper, we address the problem of processing join queries among different regions progressively and energy-efficiently in sensor networks. The proposed algorithm, PEJA (Progressive Energy-efficient Join Algorithm), adopts an event-driven strategy to output join results as soon as possible and alleviates the storage shortage problem at the in-network nodes. It also installs filters in the joining regions to prune unmatchable tuples in the early processing phase, saving many unnecessary transmissions. Extensive experiments on both synthetic and real-world data sets indicate that the PEJA scheme outperforms other join algorithms and that it is effective in reducing both the number of transmissions and the delay of query results during join processing.
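The effect of installing filters in the joining regions can be sketched in a few lines. This is an illustration of filter-based pruning in general, not PEJA itself: the tuple layout `(join_key, value)`, the use of an exact key set as the filter, and the function names are assumptions, and the event-driven, progressive output described above is not modeled.

```python
def prune_with_filter(tuples, key_filter):
    """Keep only tuples whose join key can still find a partner."""
    return [t for t in tuples if t[0] in key_filter]


def filtered_join(region_a, region_b):
    """Join two regions on the first field, pruning before transmission.

    Returns the join results and the number of tuples actually sent.
    """
    # Each region receives a compact filter of the other region's join keys.
    filter_from_b = {key for key, _ in region_b}
    filter_from_a = {key for key, _ in region_a}

    # Unmatchable tuples are dropped inside their own region.
    sent_a = prune_with_filter(region_a, filter_from_b)
    sent_b = prune_with_filter(region_b, filter_from_a)

    # Hash join on the surviving tuples.
    index = {}
    for key, value in sent_b:
        index.setdefault(key, []).append(value)
    results = [(key, va, vb) for key, va in sent_a for vb in index.get(key, [])]
    return results, len(sent_a) + len(sent_b)


region_a = [(1, "a1"), (2, "a2"), (3, "a3")]
region_b = [(2, "b2"), (3, "b3"), (4, "b4")]
results, transmitted = filtered_join(region_a, region_b)
```

In this toy input four tuples are transmitted instead of six. A real deployment would also have to account for the cost of distributing the filters themselves, which this sketch ignores.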
In keyword search over relational databases (KSORD), the results retrieved for a user's initial query are often unsatisfying. The user has to reformulate the query and execute the new query, which costs much time and effort. In this paper, a method for automatically reformulating user queries by relevance feedback, named VSM-RF, is introduced. Targeting the results of KSORD systems, VSM-RF adopts a ranking method based on the vector space model to rank KSORD results. After the first retrieval, using either user feedback or pseudo feedback as the user prefers, VSM-RF computes expansion terms based on probability and reformulates the query using query expansion. After the KSORD system executes the new query, more relevant results appear in the result list presented to the user. Experimental results verify the method's effectiveness.
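Ranking results with a vector space model can be sketched as follows. This is a generic cosine-similarity ranking over raw term counts, not the VSM-RF ranking function, whose weighting scheme is not given in the abstract; treating each KSORD result as a plain text string is also an assumption made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, results: list[str]) -> list[str]:
    """Order results by descending similarity to the query."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(r.lower().split())), r) for r in results]
    # sorted() is stable, so ties keep their original relative order.
    return [r for _, r in sorted(scored, key=lambda pair: pair[0], reverse=True)]


candidates = [
    "database query optimization",
    "keyword search database",
    "sensor network join",
]
ranked = rank("keyword search", candidates)
```

Here `ranked[0]` is the only candidate sharing terms with the query; the other two score zero and keep their input order.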
Keyword search over relational databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need to write SQL queries. In KSORD, the results retrieved for a user's initial query are often unsatisfying. The user has to reformulate the query and execute the new query, which costs much time and effort. This paper introduces a method, named UFBP (user feedback based on probability), for automatically reformulating user queries from user feedback on KSORD results. After the first retrieval, UFBP uses the user's feedback to compute, based on probability, the terms to be added to the expanded query, and reformulates the query using query expansion. After the KSORD system automatically executes the new query, more relevant results are presented to the user. Experimental results verify its effectiveness.
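Probability-based query expansion from feedback can be illustrated with a small sketch. The scoring rule below, the difference between a term's document frequency among results marked relevant and among all results, is an assumption chosen for simplicity; the abstract does not state the formula UFBP uses, and the function names are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents containing it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return counts


def expansion_terms(query, relevant, all_results, n=2):
    """Pick up to n terms that are more likely in relevant results.

    Assumes `relevant` and `all_results` are both non-empty.
    """
    rel_df = document_frequency(relevant)
    all_df = document_frequency(all_results)
    query_terms = set(query.lower().split())
    scores = {
        term: rel_df[term] / len(relevant) - all_df[term] / len(all_results)
        for term in rel_df
        if term not in query_terms
    }
    # Highest score first; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


def expand(query, relevant, all_results, n=2):
    """Return the original query terms followed by the expansion terms."""
    return query.split() + expansion_terms(query, relevant, all_results, n)


all_results = [
    "smith database paper",
    "smith xml paper",
    "jones database paper",
    "lee network paper",
]
relevant = ["smith database paper", "jones database paper"]
new_query = expand("smith", relevant, all_results)
```

The term "paper" occurs in every result, so it scores zero and is not selected, while "database" is concentrated in the relevant results and ranks first.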
With the development of relational databases, people require better databases not only in terms of performance but also in terms of interactive ability. So that the database is much m...
To improve the results of keyword search in relational databases, semantic relationships among relations and tuples are exploited and a semantic ranking function is proposed. In addition to considering current ranking principles, the proposed semantic ranking function provides new metrics to measure query relevance. Based on it, two top-k search algorithms, BA (blocking algorithm) and EBA (early-stopping blocking algorithm), are presented. EBA improves on BA by providing a filtering threshold that terminates iterations as early as possible. Experimental results show that the semantic ranking function yields search results with high precision and recall, and that the proposed BA and EBA algorithms improve on the query performance of existing approaches.
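The role of a filtering threshold in terminating a top-k search early can be sketched as below. This is a generic early-stopping scheme, not the paper's BA or EBA: it assumes candidates arrive in blocks, that each block carries an upper bound on the scores it contains, and that blocks are ordered by descending upper bound. How such bounds would be derived from the semantic ranking function is not specified in the abstract.

```python
import heapq


def top_k_early_stop(blocks, k):
    """Return the k best (score, item) pairs and the number of blocks read.

    `blocks` is a list of (upper_bound, candidates) pairs sorted by
    descending upper_bound; `candidates` is a list of (score, item).
    Assumes k >= 1.
    """
    heap = []          # min-heap holding the k best pairs seen so far
    blocks_read = 0
    for upper_bound, candidates in blocks:
        # Filtering threshold: once the k-th best score is at least the
        # best score any remaining block could contain, stop iterating.
        if len(heap) == k and heap[0][0] >= upper_bound:
            break
        blocks_read += 1
        for score, item in candidates:
            if len(heap) < k:
                heapq.heappush(heap, (score, item))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, item))
    return sorted(heap, reverse=True), blocks_read


blocks = [
    (1.0, [(0.9, "t1"), (0.7, "t2")]),
    (0.8, [(0.8, "t3"), (0.5, "t4")]),
    (0.6, [(0.6, "t5")]),
]
best, blocks_read = top_k_early_stop(blocks, k=2)
```

With `k=2` the third block is never read, because the current second-best score (0.8) already exceeds that block's upper bound (0.6). Without the threshold check, every block would be scanned.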
This paper analyzes how OLTP workloads interact with modern processors and cache behavior. First, we extend TPC-C, the OLTP-oriented benchmark, into the ETPC-C benchmark in order to measure the performance of main-memory database systems (MMDBMS) more precisely. Because the performance of an MMDBMS is not affected by disk I/O, it is more sensitive to cache usage. Then, using the ETPC-C benchmark, we investigate the behavior of caches and processors extensively. We find that the miss stall time is mostly spent on on-chip caches; that is, first- and second-level cache misses are dominant. Furthermore, we find that on-chip instruction cache (I-cache) stall time is a major component of the memory stall time. The fewer the emulated users, the larger the proportion of memory stall time contributed by on-chip I-cache stall time. However, when an index is employed, the system under test (SUT) has more total I-cache stall time than the SUT without an index at the same number of emulated users and the same data population. Another observation is that the SUT with an index has, on average, a slightly higher branch misprediction rate than the SUT without an index. Finally, we find that only the third-level (L3) D-cache stall time rate increases with the number of users, because the L3 D-cache miss rate grows the fastest. Under TPC-C and ETPC-C evaluation, we find that, for optimized database performance on modern computers, reducing the instruction miss penalty is as important as reducing the data miss penalty; since these are conflicting goals, the best approach is to balance them.
Compared with traditional magnetic disks, flash memory has many advantages and has been used as an external storage medium for a wide spectrum of electronic devices (such as PDAs, MP3 players, digital cameras and mobile phones) in r...
Web document structural clustering is a useful task for many intelligent web applications; however, processing based on the structure of web documents has not yet received much attention. In this paper, we propose ...