Search Results - Inner Mongolia University Library

Chinese integral word segmentation recognition based on reliability

Journal of Information and Computational Science, 2009, Vol. 6, No. 1, pp. 533-542

Authors: Wan, Changxuan; Wang, Fang. Affiliations: School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China; Jiangxi Key Laboratory of Data and Knowledge Engineering, Jiangxi University of Finance and Economics, Nanchang 330013, China; School of Humanities, Nanchang University, Nanchang 330013, China

As part of preprocessing, automatic word segmentation is a key issue in Chinese information retrieval. Because integral words are kept whole to form more meaningful units that better express users' query intent in information retrieval, we combine the calculation of mutual information and the prefix-and-suffix information of integral words with the reliability of composing an integral word, and propose three reliability-based methods for recognizing Chinese integral words in segmentation, according to the characteristics of integral words. We then design and implement a prototype system for the three methods. Finally, tests and analysis on the 2nd SIGHAN (2005) PKU test corpus show that the system's recognition performance is good and that it can satisfy different needs at the same time. © 2009 Binary Information Press.

Keywords: Reliability
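
The abstract above names mutual information as one of the signals it combines. As a rough illustration only, the sketch below computes pointwise mutual information (PMI) for adjacent character pairs; the function name `adjacent_pmi` and the toy corpus are hypothetical, and the prefix-and-suffix and reliability components described in the abstract are not modeled here.

```python
import math
from collections import Counter

def adjacent_pmi(corpus):
    """Return the PMI (base 2) of every adjacent character pair in a list of strings."""
    chars = Counter()
    pairs = Counter()
    for sentence in corpus:
        chars.update(sentence)
        pairs.update(zip(sentence, sentence[1:]))
    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())
    pmi = {}
    for (a, b), count in pairs.items():
        p_ab = count / n_pairs          # probability of the pair
        p_a = chars[a] / n_chars        # probability of each character alone
        p_b = chars[b] / n_chars
        pmi[a + b] = math.log2(p_ab / (p_a * p_b))
    return pmi

# Toy corpus (an assumption for illustration, not data from the paper).
corpus = ["信息检索系统", "信息检索方法", "检索信息"]
scores = adjacent_pmi(corpus)
```

In this toy corpus, a pair that forms a word, such as "信息", scores higher than a pair that straddles a word boundary, such as "索信", which is the intuition behind using mutual information as segmentation evidence.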

Static vulnerabilities detection method based on security state tracing and checking

Jisuanji Xuebao/Chinese Journal of Computers, 2009, Vol. 32, No. 5, pp. 899-909

Authors: Liang, Bin; Hou, Kan-Kan; Shi, Wen-Chang; Liang, Zhao-Hui. Affiliations: School of Information, Renmin University of China, Beijing 100872, China; MOE Key Laboratory of Data Engineering and Knowledge Engineering, Beijing 100872, China

The main problem of existing static vulnerability detection methods based on source code analysis is their high false positive and false negative rates. One main reason is the lack of accurate and effective identification and analysis of security-related program elements, e.g. data validation checks and tainted data sources. A static vulnerability detection method based on data security state tracing and checking is proposed. In this method, the state space of the state machine model is extended; the security state of a variable is identified by a vector that may correspond to multiple security-related properties rather than by a single property; fine-grained state transitions are provided to support accurate recognition of security-related program behaviors; the recognition of validation checks is introduced into the vulnerability state machine to reduce false positives; and a systematic discrimination mechanism for tainted data is constructed to prevent the false negatives that result from neglecting tainted data sources. Experimental results from a prototype system show that this method can effectively detect buffer overflows and other types of vulnerabilities in software systems, with a markedly lower false positive rate than existing mainstream static detection methods, while avoiding some of their serious false negatives.

Keywords: Static analysis

PEJA: Progressive Energy-Efficient Join Processing for Sensor Networks

Journal of Computer Science & Technology, 2008, Vol. 23, No. 6, pp. 957-972

Authors: Lai Yongxuan (赖永炫); Chen Yilong (陈毅隆); Chen Hong (陈红). Affiliations: School of Information, Renmin University of China; Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education

Sensor networks are widely used in many applications to collaboratively collect information from the physical environment. In these applications, exploring the relationships and linkage of sensing data across multiple regions can be naturally expressed by joining tuples from these regions. However, the highly distributed and resource-constrained nature of the network makes the join a challenging query. In this paper, we address the problem of processing join queries among different regions progressively and energy-efficiently in sensor networks. The proposed algorithm, PEJA (Progressive Energy-efficient Join Algorithm), adopts an event-driven strategy to output join results as soon as possible and alleviates the storage shortage problem in the in-network nodes. It also installs filters in the joining regions to prune unmatchable tuples in the early processing phase, saving many unnecessary transmissions. Extensive experiments on both synthetic and real-world data sets indicate that the PEJA scheme outperforms other join algorithms and is effective in reducing the number of transmissions and the delay of query results during join processing.

Keywords: progressive join; minimal join set; in-network processing; sensor network

VSM-RF: A Method of Relevance Feedback in Keyword Search over Relational Databases

IEEE International Symposium on Information (IT) in Medicine and Education, ITME

Authors: Zhao-hui Peng; Jun Zhang; Shan Wang; Chang-liang Wang; Li-zhen Cui. Affiliations: School of Computer Science and Technology, Shandong University, Jinan 250101, China; School of Computer Science and Technology, Dalian Maritime University, 116026, China; Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing 100872, China; The Hong Kong University of Science and Technology, China

In keyword search over relational databases (KSORD), the results retrieved for a user's initial query are often unsatisfying. The user has to reformulate the query and execute the new one, which costs much time and effort. This paper introduces VSM-RF, a method that automatically reformulates user queries through relevance feedback. Targeting the results of KSORD systems, VSM-RF adopts a ranking method based on the vector space model to rank KSORD results. After the first retrieval, using either user feedback or pseudo feedback as the user prefers, VSM-RF computes expansion terms based on probability and reformulates the query using query expansion. After the KSORD system executes the new query, more relevant results appear in the result list presented to the user. Experimental results verify the method's effectiveness.

Keywords: Feedback; Keyword search; Relational databases; Computer science; Information retrieval; Laboratories; Data engineering; Knowledge engineering; Computer science education; Costs
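
The abstract above describes ranking results with a vector space model and then expanding the query from feedback. The sketch below is a minimal illustration of those two ideas, not the VSM-RF method itself: it uses raw term frequencies with cosine similarity, and it picks expansion terms by simple frequency in the results marked relevant, whereas the abstract says the expansion terms are computed from probabilities. The function names and sample texts are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, results):
    """Sort result texts by cosine similarity to the query, best first."""
    q = Counter(query.lower().split())
    return sorted(results,
                  key=lambda r: cosine(q, Counter(r.lower().split())),
                  reverse=True)

def expand(query, relevant, k=2):
    """Append the k most frequent unseen terms from results marked relevant.

    Frequency-based selection is an assumption made for this sketch.
    """
    seen = set(query.lower().split())
    counts = Counter(t for r in relevant for t in r.lower().split()
                     if t not in seen)
    return query + " " + " ".join(t for t, _ in counts.most_common(k))

results = [
    "sensor network join",
    "keyword query expansion feedback",
    "keyword search relational database",
]
ranked = rank("keyword search", results)
new_query = expand("keyword search", ranked[:1], k=2)
```

Here the first-ranked result stands in for the user's feedback (a form of pseudo feedback), and the expanded query would then be executed again against the same system.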

Bring User Feedback into Keyword Search over Databases

Web Information Systems and Applications Conference (WISA)

Authors: Zhaohui Peng; Jun Zhang; Shan Wang; Changliang Wang. Affiliations: School of Computer Science & Technology, Shandong University, Jinan, China; Key Laboratory of Data Engineering & Knowledge Engineering, Ministry of Education, Beijing, China; School of Computer Science & Technology, Dalian Maritime University, Dalian, China; Hong Kong University of Science and Technology, Hong Kong, China

Keyword search over relational databases (KSORD) enables casual users to search relational databases with keyword queries (sets of keywords), just like searching the Web, without any knowledge of the database schema and without writing SQL queries. In KSORD, the results retrieved for a user's initial query are often unsatisfying. The user has to reformulate the query and execute the new one, which costs much time and effort. This paper introduces UFBP (user feedback based on probability), a method that automatically reformulates user queries using user feedback on KSORD results. After the first retrieval, UFBP uses the users' feedback to compute, based on probability, the terms to add to the expanded query, and reformulates the query using query expansion. After the KSORD system automatically executes the new query, more relevant results are presented to the user. Experimental results verify its effectiveness.

Keywords: Feedback; Keyword search; Relational databases; Tree graphs; Information retrieval; Computer science; Educational technology; Writing; Information systems; Application software

The Research on the Algorithms of Keyword Search in Relational Database

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2008, Vol. 4977, pp. 134-143

Authors: Li, Peng; Zhu, Qing; Wang, Shan. Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, School of Information, Renmin University of China, Beijing 100872, China

ISBN: (print) 354089375X

With the development of relational databases, people expect better databases not only in terms of performance but also in terms of interactive ability, so that a database is friendlier than before and an ordinary user without specialized database knowledge can access it without knowing the schema or writing intricate SQL. Because information retrieval on the web is already well developed to some extent, the technology of keyword search in relational databases can draw ideas from information retrieval. However, the great differences between text databases on the web and relational databases also bring new challenges: 1) The answer a user needs is not a single tuple in the database but a set of tuples connected across different tables through "primary key-foreign key" relationships. 2) The criteria for evaluating results are more important, because they directly determine the effectiveness of the search system. 3) The structure of a relational database is much more intricate than that of a text database, and information retrieval algorithms do not fit relational databases. In this paper, we therefore introduce a novel keyword search algorithm and modified criteria for evaluating answers, in order to enhance the efficiency of keyword search and return more effective information to users. Finally, the search algorithm’s performance is tested and evaluated. © Springer-Verlag Berlin Heidelberg 2008

Keywords: Information retrieval

Top-K keyword search for supporting semantics in relational databases

Ruan Jian Xue Bao/Journal of Software, 2008, Vol. 19, No. 9, pp. 2362-2375

Authors: Wang, Bin; Yang, Xiao-Chun; Wang, Guo-Ren. Affiliations: College of Information Science and Engineering, Northeastern University, Shenyang 110004, China; Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing 100872, China

To improve the results of keyword search in relational databases, the semantic relationships among relations and tuples are employed and a semantic ranking function is proposed. In addition to considering current ranking principles, the proposed semantic ranking function provides new metrics to measure query relevance. Based on it, two top-k search algorithms, BA (blocking algorithm) and EBA (early-stopping blocking algorithm), are presented. EBA improves on BA by providing a filtering threshold to terminate iterations as early as possible. Finally, experimental results show that the semantic ranking function guarantees search results with high precision and recall, and that the proposed BA and EBA algorithms improve on the query performance of existing approaches.

Keywords: Information retrieval
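
The abstract above says EBA uses a filtering threshold to stop iterating as early as possible. The sketch below shows a generic threshold-style early termination for top-k search and is not the paper's BA or EBA: it assumes candidates arrive in blocks ordered by a decreasing upper bound on their scores, and it stops once no remaining block can beat the current k-th best score. The block layout, scores and the name `top_k_early_stop` are all hypothetical.

```python
import heapq

def top_k_early_stop(blocks, k):
    """Return (top-k results best first, number of blocks read).

    blocks: list of (upper_bound, [(score, item), ...]) sorted by
    decreasing upper_bound (an assumption of this sketch).
    """
    if k <= 0:
        return [], 0
    heap = []  # min-heap holding the k best (score, item) pairs seen so far
    blocks_read = 0
    for upper_bound, candidates in blocks:
        # Once k results are held and no remaining block can beat the
        # weakest of them, further iterations cannot change the answer.
        if len(heap) == k and upper_bound <= heap[0][0]:
            break
        blocks_read += 1
        for score, item in candidates:
            if len(heap) < k:
                heapq.heappush(heap, (score, item))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, item))
    return sorted(heap, reverse=True), blocks_read

blocks = [
    (1.0, [(0.9, "t1"), (0.7, "t2")]),
    (0.8, [(0.8, "t3"), (0.5, "t4")]),
    (0.6, [(0.6, "t5"), (0.4, "t6")]),
]
result, blocks_read = top_k_early_stop(blocks, k=2)
```

With k=2, the third block is never read, because its upper bound of 0.6 cannot exceed the current second-best score of 0.8.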

OLTP workloads on modern processor: Characterization and analysis

Journal of Computational Information Systems, 2008, Vol. 4, No. 1, pp. 389-394

Authors: Liu, Dawei; Qin, Biao; Wang, Shan; Gong, Weiwei. Affiliations: School of Information, Renmin University of China, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing 100872, China

This paper analyzes how OLTP workloads interact with modern processors and how caches behave. First, we extend TPC-C, the OLTP-oriented benchmark, into the ETPC-C benchmark to measure the performance of main-memory database systems (MMDBMS) more precisely. Because the performance of an MMDBMS is not affected by disk I/O, it is more sensitive to cache usage. Then, using the ETPC-C benchmark, we investigate the behavior of caches and processors extensively. We find that the miss stall time is mostly spent on on-chip caches; that is, first- and second-level cache misses are dominant. Furthermore, we find that on-chip instruction cache (I-cache) stall time is a major component of the memory stall time. The fewer the emulated users, the larger the proportion of the memory stall time contributed by on-chip I-cache stalls. However, when an index is employed, the system under test (SUT) has more total I-cache stall time than the SUT without an index at the same number of emulated users and data population. Another observation is that the SUT with an index has, on average, a slightly higher branch misprediction rate than the SUT without an index. Finally, we find that only the third-level (L3) D-cache stall time rate increases with the number of users, because L3 D-cache misses increase at the highest rate. Under TPC- and ETPC-evaluation, we find that for optimized database performance on modern computers, reducing the instruction miss penalty is as important as reducing the data miss penalty; since these are conflicting efforts, the best approach is to balance them.

Keywords: Program processors

A new dynamic hash index for flash-based storage

9th International Conference on Web-Age Information Management, WAIM 2008

Authors: Xiang, Li; Zhou, Da; Meng, Xiaofeng. Affiliations: School of Information, Renmin University of China; Key Laboratory of Data Engineering and Knowledge Engineering, MOE

ISBN: (print) 9780769531854

Compared with traditional magnetic disks, flash memory has many advantages and in recent years has been used as the external storage medium for a wide spectrum of electronic devices (such as PDAs, MP3 players, digital cameras and mobile phones). As its capacity increases and its price drops, it looks like a perfect alternative to magnetic disks. However, due to hardware limitations of flash memory, techniques originally designed for magnetic disks, including storage subsystems and indexing, cannot run smoothly on flash memory without modification. In this paper we explore the problems of indexing flash-resident data and present a new dynamic hash index for flash memory in two schemas. The analysis and experimental results validate the efficiency of our design. © 2008 IEEE.

Keywords: Flash memory

Shingles-based structural clustering of web documents

Journal of Computational Information Systems, 2008, Vol. 4, No. 6, pp. 2777-2785

Authors: Xia, Tian. Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing 100872, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China

Structural clustering of web documents is useful for many web intelligence applications; however, processing based on the structure of web documents has not yet received much attention. In this paper, we propose a shingles-based approach to clustering web documents by structure. First, semi-structured web documents are converted into structured trees composed of a limited set of nodes, and structural features are extracted by shingles. Second, we define the document distance and a structural distance matrix, and calculate structural similarity from this matrix. Finally, we cluster the document structures using a modified k-means algorithm. Unlike existing methods, we construct shingles that include not only real vertical paths but also virtual horizontal paths. Weight factors are also considered to optimize the algorithm. Experimental results show the effectiveness of the new shingles-based similarity measure and the structural clustering. The proposed document similarity, as well as the structural shingle analysis, could be applied to other web-based research issues. © 2008 Binary Information Press.

Keywords: Information systems
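
The abstract above extracts structural features as shingles over vertical and horizontal paths of a document tree. The sketch below is one possible reading of that idea, not the paper's definition: vertical shingles are taken as windows of w tags along parent-to-child chains, horizontal shingles as windows of w tags over sibling nodes, and Jaccard similarity stands in for the paper's distance matrix and weight factors. The `(tag, children)` tree encoding and the sample pages are hypothetical.

```python
def shingles(tree, w=2):
    """Collect w-length tag shingles from a (tag, children) tree.

    Vertical shingles follow parent-to-child chains; horizontal shingles
    slide over the tags of sibling nodes. Both readings are assumptions.
    """
    found = set()

    def walk(node, path):
        tag, children = node
        path = path + [tag]
        if len(path) >= w:
            found.add(("v",) + tuple(path[-w:]))
        sibling_tags = [child[0] for child in children]
        for i in range(len(sibling_tags) - w + 1):
            found.add(("h",) + tuple(sibling_tags[i:i + w]))
        for child in children:
            walk(child, path)

    walk(tree, [])
    return found

def jaccard(a, b):
    """Jaccard similarity of two shingle sets (1.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 1.0

page_a = ("html", [("body", [("div", [("p", []), ("p", [])]), ("div", [])])])
page_b = ("html", [("body", [("div", [("p", []), ("p", [])]), ("div", [])])])
page_c = ("html", [("body", [("table", [("tr", [])])])])

same = jaccard(shingles(page_a), shingles(page_b))
different = jaccard(shingles(page_a), shingles(page_c))
```

Pages with identical tag structure share all their shingles, while the table-based page shares only the html-to-body path with the div-based one; similarities of this kind could then feed a clustering step such as k-means.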
