Update-intensive main memory database applications generate a huge volume of log records, and to maintain the ACID properties of the database system these log records must be persisted efficiently. We propose delegating the logging of one main memory database to another main memory database. The scheme is elaborated in terms of architecture, logging and safeness levels, checkpointing, and recovery. Both strict durability and relaxed durability are provided. When some form of non-volatile memory is used to hold log records temporarily, logging efficiency improves and the scheme can also guarantee full ACID properties for the system. We also propose parallel logging, which speeds up log persistence by writing logs to multiple disks in parallel. Because interconnection network technology is progressing rapidly, the scheme eliminates the concern that bandwidth and latency limitations may slow down the system's overall performance. Experimental results demonstrate the feasibility of the proposal.
Pseudo-relevance feedback has been regarded as an effective solution for automatic query expansion. However, a recent study has shown that traditional pseudo-relevance feedback may introduce topic drift and thereby harm retrieval performance. It is therefore often crucial to identify the good feedback documents from which useful expansion terms can be added to the query. Compared with traditional query expansion, XML query expansion requires structural expansion in addition to content expansion. This paper presents a solution for both identifying related documents and selecting good expansion information with new content and path constraints. A naïve document similarity measure that incorporates XML semantic features is proposed. Based on this measure, a k-median clustering algorithm is first applied to find a set of related documents. Query expansion is then performed only within this set of related documents, in two steps: first, a key phrase extraction algorithm expands the original query; second, structural expansion is carried out based on the expanded key phrases. Finally, a full-fledged content-and-structure query expression that represents the user's intention is formalized. Experimental results on the IEEE CS collection show that the proposed method effectively reduces topic drift and achieves better retrieval quality.
SimRank is a well-known algorithm that uses link analysis to measure the similarity between each pair of nodes (node pair). However, it suffers from high computational cost, which limits its use on large-scale datasets. Moreover, links between nodes change over time, so it may be desirable to quickly approximate the similarity score of a particular node pair without performing a large-scale computation on the entire graph. We propose a method that efficiently estimates the similarity score using only a small subgraph of the entire graph. We call this novel algorithm "Local-SimRank". Experimental results on real and synthetic datasets show that our algorithm efficiently produces good approximations to the global SimRank scores. We also prove mathematically that the Local-SimRank score LS(a, b) is always less than the original SimRank score S(a, b).
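For orientation, the global SimRank computation that the abstract describes as costly can be sketched as follows. This is a minimal sketch of the standard iterative SimRank definition, not the Local-SimRank algorithm, whose details are not given in the abstract. The function name `simrank`, the decay factor `C=0.8`, the iteration count, and the `in_neighbors` dictionary layout are all assumptions made for illustration.

```python
def simrank(in_neighbors, C=0.8, iterations=10):
    """Iterative SimRank over a graph given as {node: [in-neighbor, ...]}.

    Assumption: every node, including those with no in-neighbors,
    appears as a key of in_neighbors.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node with no in-neighbors has similarity 0 to others.
                    new[a][b] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors, decayed by C.
                total = sum(sim[x][y] for x in ia for y in ib)
                new[a][b] = C * total / (len(ia) * len(ib))
        sim = new
    return sim


if __name__ == "__main__":
    # Hypothetical toy graph: c links to both a and b.
    graph = {"c": [], "a": ["c"], "b": ["c"]}
    scores = simrank(graph)
    print(scores["a"]["b"])
```

Every iteration touches all node pairs and all pairs of their in-neighbors, which is the cost that motivates estimating a single pair's score from a small subgraph.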
Given a set of lists in which the items of each list are sorted in ascending order of their values, the objective of this paper is to efficiently find the common items that appear in all of the lists. This problem is sometimes known as common items extraction from sorted lists. A common approach is to scan the items of all lists sequentially, side by side, until one of the lists is exhausted. However, we observe that when the overlap of items across the lists is not high, this sequential access approach can be significantly improved. We propose two algorithms, MergeSkip and MergeESkip, which solve the problem by skipping as many list items as possible. As a result, a large number of comparisons among items is saved and efficiency improves. We conduct an extensive analysis of the proposed algorithms on one real dataset and two synthetic datasets with different data distributions, and report all our findings.
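The general idea of skipping can be sketched as below. This is a hedged illustration and not the authors' MergeSkip or MergeESkip, whose details are not given in the abstract: it keeps a candidate value and uses binary search (`bisect_left`) to jump each list directly to the first item not smaller than the candidate. The function name `intersect_sorted` is hypothetical, and each list is assumed to hold distinct items in ascending order.

```python
from bisect import bisect_left


def intersect_sorted(lists):
    """Return the items common to all ascending-sorted lists.

    Assumption: items within each list are distinct.
    """
    if not lists or any(not lst for lst in lists):
        return []
    n = len(lists)
    pos = [0] * n            # current read position in each list
    result = []
    candidate = lists[0][0]  # value we are currently trying to confirm
    agreed = 0               # how many lists in a row contain the candidate
    i = 0
    while True:
        lst = lists[i]
        # Skip every item smaller than the candidate in one jump.
        pos[i] = bisect_left(lst, candidate, pos[i])
        if pos[i] == len(lst):
            return result    # one list is exhausted: no more common items
        if lst[pos[i]] == candidate:
            agreed += 1
            if agreed >= n:
                result.append(candidate)
                pos[i] += 1
                if pos[i] == len(lst):
                    return result
                candidate = lst[pos[i]]
                agreed = 1
        else:
            # This list has jumped past the candidate; adopt its value instead.
            candidate = lst[pos[i]]
            agreed = 1
        i = (i + 1) % n


if __name__ == "__main__":
    print(intersect_sorted([[1, 3, 5, 7, 9], [3, 4, 5, 9, 10], [0, 3, 9, 11]]))
```

When the lists overlap little, most binary-search jumps pass over long runs of items that a sequential scan would have compared one by one.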
Web applications are becoming more and more important, and their security problems have drawn increasing concern. This paper presents TASA, a static analyzer for ASP that employs a path-sensitive, inter-procedural, and context-sensitive data flow analysis focused mainly on taint propagation and sanitization. The paper also discusses several techniques used in TASA to prune false positives, such as modeling of sanitization routines, handling of ASP-specific features, alias analysis, and modeling of path-related routines. Experiments on four open source applications show that TASA has a false positive rate of 4.98% and that the proposed approaches allow it to avoid certain false warnings.
Certificateless cryptography eliminates the key escrow problem in identity-based cryptography. Hierarchical cryptography exploits a practical security model to mirror the organizational hierarchy in the real world. In...
In recent years, many scientists have done some work in monitoring the fog, and achieved fruitful results. Now we want to conduct more in-depth study. In this paper, we utilize MODTRAN to simulate the relationship bet...
Top-k query is a powerful technique in uncertain databases because of the existence of exponential possible worlds, and it is necessary to combine score and confidence of tuples to derive top k answers. Different sema...
ISBN: (print) 9781424467013; 9780769540191
A Top-k aggregate query, a powerful technique for dealing with large quantities of data, ranks groups of tuples by their aggregate values and returns the k groups with the highest aggregate values. Compared with Top-k queries in traditional databases, however, queries over uncertain databases are more complicated because of the exponential number of possible worlds. Global Top-k, a powerful Top-k semantics for uncertain databases, returns the k highest-ranked tuples according to their probabilities of being in the Top-k answers across the possible worlds. We propose an x-tuple based method to process Global Top-k aggregate queries in uncertain databases. Our method has two levels: group state generation and G-x-Top-k query processing. In the former, group states that satisfy the properties of x-tuples are generated one after another according to their aggregate values; in the latter, dynamic programming based Global x-tuple Top-k query processing is employed to return the answers. Comprehensive experiments on different data sets demonstrate the effectiveness of the proposed solutions.
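To make the Global Top-k semantics concrete, the following brute-force sketch enumerates every possible world of a small uncertain table. It is not the x-tuple or dynamic programming method of the paper. It assumes independent tuples (no x-tuple mutual exclusion, no grouping or aggregation) and distinct scores, and the function name `global_topk` and the `(tuple_id, score, probability)` layout are hypothetical. Its exponential running time is exactly the difficulty the abstract refers to.

```python
from itertools import product


def global_topk(tuples, k):
    """Return the k tuples most likely to appear in a world's Top-k by score.

    tuples: list of (tuple_id, score, probability); tuples are assumed
    to exist independently of one another.
    """
    topk_prob = {tid: 0.0 for tid, _, _ in tuples}
    # Each possible world is one choice of present/absent per tuple.
    for presence in product([True, False], repeat=len(tuples)):
        world_prob = 1.0
        world = []
        for present, (tid, score, p) in zip(presence, tuples):
            world_prob *= p if present else (1.0 - p)
            if present:
                world.append((score, tid))
        if world_prob == 0.0:
            continue  # impossible world contributes nothing
        # Credit the world's probability to each tuple in its Top-k.
        for _, tid in sorted(world, reverse=True)[:k]:
            topk_prob[tid] += world_prob
    ranked = sorted(topk_prob.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    table = [("t1", 10, 0.3), ("t2", 8, 1.0), ("t3", 5, 0.9)]
    print(global_topk(table, 1))
```

In this toy table, t1 has the highest score but exists with probability only 0.3, so t2 is the more probable Top-1 answer.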
This paper considers the problem of constructing data aggregation trees in wireless sensor networks (WSNs) for a group of sensor nodes to send collected information to a single sink node. A data aggregation tree contains the sink node, all the source nodes, and some other non-source nodes. The goal of constructing such a data aggregation tree is to minimize the number of non-source nodes included in the tree so as to save energy. We prove that the data aggregation tree problem is NP-hard and then propose an approximation algorithm with a performance ratio of four and a greedy algorithm. We also give a distributed version of the approximation algorithm. Simulations are performed to study the performance of the proposed algorithms. The results show that the proposed algorithms find trees that closely approximate the optimal tree and are highly scalable.