In this paper, we analyse the data access characteristics of a typical XML information retrieval system and propose a new query-aware buffer replacement algorithm based on prediction of the Minimum Reuse Distance (MRD for short). The algorithm predicts an object's next reference distance according to the retrieval system's running status and replaces the objects that have the maximum reuse distances. The factors considered in the replacement algorithm include the access frequency, creation cost, and size of objects, as well as the queries being executed. By taking into account the queries currently running or queued in the system, the MRD algorithm can predict the reuse distances of index data objects more accurately.
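The replacement idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the prediction formula, so the names `CachedObject`, `predicted_reuse_distance`, `choose_victim`, the `horizon` parameter, and the way size and creation cost are folded into the eviction score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    key: str
    size: int             # space the object occupies in the buffer
    creation_cost: float  # cost of rebuilding the object after eviction (> 0)
    frequency: int = 1    # number of accesses observed so far (>= 1)


def predicted_reuse_distance(obj, pending_queries, horizon=1000):
    """Hypothetical predictor of the next reference distance.

    pending_queries is an ordered list of sets of keys, one set per running
    or queued query. If a pending query needs the object, the distance is
    that query's position; otherwise fall back to a frequency-based guess
    that lies beyond the end of the queue.
    """
    for position, needed_keys in enumerate(pending_queries):
        if obj.key in needed_keys:
            return position
    return len(pending_queries) + horizon / obj.frequency


def choose_victim(buffer, pending_queries):
    """Pick the object with the largest (assumed) eviction score."""
    def eviction_score(obj):
        distance = predicted_reuse_distance(obj, pending_queries)
        # Assumption: at equal distance, prefer evicting large objects
        # that are cheap to recreate.
        return (distance + 1) * obj.size / obj.creation_cost

    return max(buffer, key=eviction_score)
```

In this sketch an object that no pending query needs always receives a larger distance than any object in the queue, which is one simple way of making the policy query-aware.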
XML retrieval is becoming a focus of research in the fields of information retrieval and databases. Summarizing the results returned by XML search engines can ease the user's reading burden. However, the construction of a query-oriented XML text summarization corpus, which is the basis of such research, has not yet received enough attention. In this paper, we describe our work on constructing such a corpus, including the selection of topics and XML elements/documents, the construction process, and the features of the resulting corpus. To date, the corpus contains 25 English query topics covering 422 elements for summarization and 32 Chinese topics covering 402 elements. For each topic, 4 extracted summaries and 4 generated summaries were produced manually by 4 experts.
In update-intensive main memory database applications, a huge volume of log records is generated. To maintain the ACID properties of the database system, these log records must be made persistent efficiently. We propose delegating the logging of one main memory database to another main memory database. The scheme is described in detail in terms of architecture, logging and safeness levels, checkpointing, and recovery. Both strict durability and relaxed durability are provided. When some form of non-volatile memory is used to hold log records temporarily, logging efficiency improves and the scheme can also guarantee the full ACID properties of the system. We also propose parallel logging, which speeds up log persistence by writing logs to multiple disks in parallel. Because interconnection network technology is advancing rapidly, the concern that bandwidth and latency limitations might slow down the system's overall performance is eliminated. Experimental results demonstrate the feasibility of the proposal.
Pseudo-relevance feedback has been regarded as an effective solution for automatic query expansion. However, a recent study has shown that traditional pseudo-relevance feedback may introduce topic drift and hence harm retrieval performance. It is therefore crucial to identify good feedback documents from which useful expansion terms can be added to the query. Compared with traditional query expansion, XML query expansion requires not only content expansion but also structural expansion. This paper presents a solution for both identifying related documents and selecting good expansion information with new content and path constraints. A naïve document similarity measure that incorporates XML semantic features is proposed. Based on this measure, a k-median clustering algorithm is first applied to find related documents. Query expansion is then performed in two steps over the set of related documents only: in the first step, a key phrase extraction algorithm expands the original query, and in the second step, structural expansion is carried out based on the expanded key phrases. Finally, a full-fledged content-and-structure query expression that represents the user's intention is formalized. Experimental results on the IEEE CS collection show that the proposed method can effectively reduce topic drift and achieve better retrieval quality.
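As a rough illustration of the clustering step mentioned above, the following sketch runs a plain k-median procedure over numeric feature vectors. The abstract does not specify the document similarity measure, so Manhattan distance over hypothetical feature vectors stands in for it here, and the function names `manhattan` and `k_median` as well as the deterministic initialization are assumptions.

```python
from statistics import median


def manhattan(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_median(points, k, iterations=20):
    """Cluster points into k groups; centers are coordinate-wise medians."""
    # Assumption: the first k points serve as initial centers.
    centers = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        assignment = [
            min(range(k), key=lambda c: manhattan(p, centers[c]))
            for p in points
        ]
        # Recompute each center as the per-dimension median of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centers.append([median(dim) for dim in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assignment
```

Documents that fall into the same cluster as the top-ranked feedback documents could then be treated as the related set, although how the paper selects that set is not stated in the abstract.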
Much research has been done on the integrated use of ISO management system standards. The integrated use of management systems has been found to offer shared value, with varied impacts on building resource efficiency and on the sustainable development of business processes. However, little research has been done on the integrated use of business continuity management systems (BCMS), records management systems (RMS), and knowledge management systems (KMS). This paper proposes a holistic integrated management approach for the collaboration, optimization, and innovation of the three management systems through a mapping/building/operationalizing cycle, which supplies an efficiency-building strategy for the dynamic accumulation, sharing, and exchange of an organization's memory, evidence, and knowledge.
ISBN (print): 9781424467013; 9780769540191
A Top-k aggregate query, a powerful technique for dealing with large quantities of data, ranks groups of tuples by their aggregate values and returns the k groups with the highest aggregate values. However, compared with Top-k queries in traditional databases, queries over uncertain databases are more complicated because of the exponential number of possible worlds. As a powerful Top-k semantics for uncertain databases, Global Top-k returns the k highest-ranked tuples according to their probabilities of being in the Top-k answers across possible worlds. We propose an x-tuple-based method to process Global Top-k aggregate queries in uncertain databases. Our method has two levels: group state generation and G-x-Top-k query processing. In the former, group states, which satisfy the properties of x-tuples, are generated one after another according to their aggregate values; in the latter, dynamic-programming-based Global x-tuple Top-k query processing is employed to return the answers. Comprehensive experiments on different data sets demonstrate the effectiveness of the proposed solutions.
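To make the Global Top-k semantics concrete, here is a simplified dynamic-programming sketch. It assumes independent tuples with distinct scores and omits both the x-tuple constraints and the aggregation level of the proposed method, so it illustrates the semantics only; the function name `global_topk` and the input layout are assumptions.

```python
def global_topk(tuples, k):
    """Return the k tuples most likely to appear in the top-k answers.

    tuples: list of (name, score, probability). Tuples are assumed to exist
    independently of each other and to have distinct scores.
    """
    ranked = sorted(tuples, key=lambda t: t[1], reverse=True)
    # counts[j] = probability that exactly j higher-scored tuples exist (j < k).
    counts = [1.0] + [0.0] * (k - 1)
    topk_prob = {}
    for name, _score, p in ranked:
        # The tuple is in the top-k if it exists and fewer than k
        # higher-scored tuples exist.
        topk_prob[name] = p * sum(counts)
        # Fold this tuple into the count distribution for the next ones.
        updated = [0.0] * k
        for j in range(k):
            updated[j] += counts[j] * (1.0 - p)
            if j + 1 < k:
                updated[j + 1] += counts[j] * p
        counts = updated
    best = sorted(topk_prob, key=topk_prob.get, reverse=True)[:k]
    return best, topk_prob
```

The dynamic program avoids enumerating possible worlds: it only tracks how many higher-scored tuples exist, which takes O(nk) time after sorting.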
Web applications are becoming more and more important, and the corresponding security problems have attracted growing attention. This paper presents TASA, an ASP static analyzer that employs a path-sensitive, inter-procedural, and context-sensitive data flow analysis, mainly concerned with taint propagation and sanitization. This paper also discusses several techniques used in TASA to prune false positives, such as sanitization routine modeling, handling of ASP-specific features, alias analysis, and path-related routine modeling. Experiments on four open source applications show that TASA has a false positive rate of 4.98% and that it can avoid certain false warnings owing to the proposed approaches.
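The core notion of taint propagation with sanitization can be shown on a toy example. The sketch below is far simpler than the analysis described above: it handles only straight-line code, with no path sensitivity, procedure calls, or aliasing. The statement encoding and the function name `find_taint_warnings` are assumptions, not part of TASA.

```python
def find_taint_warnings(statements, sources):
    """Report sink calls that receive tainted data.

    statements: list of (op, target, args) where op is one of
      "assign"   - target receives a value computed from args
      "sanitize" - target receives a sanitized copy of args
      "sink"     - target is a sink routine called with args
    sources: names whose values are attacker-controlled.
    """
    tainted = set(sources)
    warnings = []
    for index, (op, target, args) in enumerate(statements):
        if op == "assign":
            if any(arg in tainted for arg in args):
                tainted.add(target)
            else:
                tainted.discard(target)  # overwritten with clean data
        elif op == "sanitize":
            tainted.discard(target)      # sanitizer output is trusted
        elif op == "sink":
            if any(arg in tainted for arg in args):
                warnings.append((index, target))
    return warnings
```

Modeling a sanitization routine as an operation that clears taint is what lets the second sink call in an input like the one below pass without a warning; an analyzer that did not model the sanitizer would report it as a false positive.

```python
program = [
    ("assign", "name", ["Request.QueryString"]),
    ("sink", "Response.Write", ["name"]),
    ("sanitize", "safe", ["name"]),
    ("sink", "Response.Write", ["safe"]),
]
print(find_taint_warnings(program, {"Request.QueryString"}))
```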
The Top-k query is a powerful technique in uncertain databases. Because of the exponential number of possible worlds, it is necessary to combine the score and confidence of tuples to derive the top k answers. Different semantics, that is, different ways of combining score and confidence, lead to different results. U-kRanks and Global Top-k are two semantics of Top-k queries in uncertain databases; they treat every alternative in an x-tuple as a single tuple and return the tuples that have the highest probability of appearing at a given rank or in the top k. However, whichever alternative (tuple) of an x-tuple appears in a possible world, the x-tuple itself undoubtedly appears in that possible world. Thus, instead of ranking every individual tuple, we define two novel Top-k query semantics for uncertain databases, Uncertain x-kRanks queries (U-x-kRanks) and Global x-Top-k queries (G-x-Top-k), which return k entities according to the score and the confidence of the alternatives in x-tuples. To reduce the search space, we present an efficient algorithm to process U-x-kRanks and G-x-Top-k queries. Comprehensive experiments on different data sets demonstrate the effectiveness of the proposed solutions.
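The observation that an x-tuple appears whenever any one of its alternatives appears can be illustrated by brute-force enumeration of possible worlds. This sketch reflects one plausible reading of entity-level ranking and is not the efficient algorithm the abstract refers to; the function name `x_topk_probabilities`, the input layout, and the rule that an entity is ranked by the score of its realized alternative are all assumptions.

```python
from itertools import product


def x_topk_probabilities(x_tuples, k):
    """Probability that each x-tuple (entity) is among the top-k entities.

    x_tuples: {entity: [(score, probability), ...]}. The alternatives of one
    entity are mutually exclusive and their probabilities sum to at most 1;
    different entities are assumed to be independent.
    """
    entities = list(x_tuples)
    choices = []
    for entity in entities:
        alternatives = list(x_tuples[entity])
        absent = max(0.0, 1.0 - sum(p for _, p in alternatives))
        # A score of None marks the case where no alternative appears.
        choices.append(alternatives + [(None, absent)])

    result = {entity: 0.0 for entity in entities}
    for world in product(*choices):
        world_prob = 1.0
        for _, p in world:
            world_prob *= p
        if world_prob == 0.0:
            continue
        present = [(s, e) for e, (s, _) in zip(entities, world) if s is not None]
        present.sort(reverse=True)  # highest realized score first
        for _, entity in present[:k]:
            result[entity] += world_prob
    return result
```

Enumeration is exponential in the number of x-tuples, so it is only suitable for checking small examples by hand.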
Certificateless cryptography eliminates the key escrow problem in identity-based cryptography. Hierarchical cryptography exploits a practical security model to mirror the organizational hierarchy in the real world. In this paper, to combine the advantages of both types of cryptosystems, we instantiate hierarchical certificateless cryptography by formalizing the notion of hierarchical certificateless signatures (HCLS). Furthermore, we propose an HCLS scheme which, under the hardness of the computational Diffie-Hellman (CDH) problem, is proven to be existentially unforgeable against adaptive chosen-message attacks in the random oracle model. As to efficiency, our scheme has constant complexity regardless of the depth of the hierarchy. Hence, the proposal is secure and scalable for practical applications.