Load shedding has been widely used in data stream management systems (DSMSs) to keep DSMSs running steadily. One key problem in load shedding is determining how much system load to shed. Existing works tend to adapt c...
详细信息
Load shedding has been widely used in data stream management systems (DSMSs) to keep DSMSs running steadily. One key problem in load shedding is determining how much system load to shed. Existing works tend to adapt coarse algorithm (CA) to solve this problem. In this paper, we present an adaptive PI controller-based load shedding framework for data stream. The main contribution of this paper is our use of feedback control theory to design the load shedding scheme. In contrast to the existing approaches, we firstly apply system identification to establish a dynamic model to describe DSMS, which enables us analyze DSMS quantitatively. Then, based on the model, we use the Root Locus method to design the PI controller with proven performance guarantees. The adaptive framework has been implemented by modifying Borealis system. Theoretic analysis and experimental results demonstrate that our approach is robust even when system load changes frequently. Comparing to existing strategies, our approach also achieves significantly better performance.
An effective way to optimize XML queries is to minimize XML queries. In this paper, we improve redundance elimination in XPath queries greatly by incorporating two novel kinds of constraints: parent constraint and sib...
详细信息
An effective way to optimize XML queries is to minimize XML queries. In this paper, we improve redundance elimination in XPath queries greatly by incorporating two novel kinds of constraints: parent constraint and sibling constraint, and by extending the tractable fragment to include descendant-or-self axis. The two novel kinds of constraints, together with child constraint and descendant constraint, form a family of constraints, which complicate the problem but offer possibilities for further minimization. Two techniques, tree augmentation and simulation augmentation, are employed to cope with constraints. We elaborate on the minimizing algorithms and running efficiencies both in the absence and in the presence of various kinds of constraints.
We study the problem of answering queries given a set of mappings between peer ontologies. In addition to the schema mapping between peer ontologies, there are axioms to give constraints to classes and properties. We ...
详细信息
We study the problem of answering queries given a set of mappings between peer ontologies. In addition to the schema mapping between peer ontologies, there are axioms to give constraints to classes and properties. We propose a set of rules to build graphs for the axioms. Because the axioms have different properties, the generated graphs are classified into four sets. In each peer, its RDF/OWL query languages can support regular expressions. If it wants to be transitive along semantic paths in peer knowledge management systems, we must rewrite conjunctive and disjunctive queries between peers. Because conjunctive queries are well-understood, we focus on a novel algorithm to rewrite disjunctive queries along semantic paths based on the graphs. For all atoms of a disjunctive query, we consider its union as a set and find the maximum rewritings over peers through a graphical way. Finally we do extensive simulation experiments. The simulation results show our algorithm can generate more rewritings than the naive rewriting algorithm at each distance.
In domain ontology, semantic association (SA) is used to depict the correlation between two concepts. In this paper, we define semantic association degree (SAD) for measuring SA in the domain ontology. We first presen...
详细信息
In domain ontology, semantic association (SA) is used to depict the correlation between two concepts. In this paper, we define semantic association degree (SAD) for measuring SA in the domain ontology. We first present a method to measure SAD of two direct related concepts by evaluating the semantic relationship between them, and then give another method to measure SAD of two indirect related concepts though SAD of two directed neighboring concepts. A set of comparison experiments show the benefit of our approaches.
It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper, we study processing an l-keyword query, p 1 , p...
详细信息
ISBN:
(纸本)1424408024
It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper, we study processing an l-keyword query, p 1 , p 1 , ..., p l , against a relational database which can be modeled as a weighted graph, G(V, E). Here V is a set of nodes (tuples) and E is a set of edges representing foreign key references between tuples. Let V i ⊆ V be a set of nodes that contain the keyword p i . We study finding top-k minimum cost connected trees that contain at least one node in every subset V i , and denote our problem as GST-k When k = 1, it is known as a minimum cost group Steiner tree problem which is NP-complete. We observe that the number of keywords, l, is small, and propose a novel parameterized solution, with l as a parameter, to find the optimal GST-1, in time complexity O(3 l n + 2 l ((l + logn)n + m)), where n and m are the numbers of nodes and edges in graph G. Our solution can handle graphs with a large number of nodes. Our GST-1 solution can be easily extended to support GST-k, which outperforms the existing GST-k solutions over both weighted undirected/directed graphs. We conducted extensive experimental studies, and report our finding.
Association rule mining is one of the most important and basic technique in data mining, which has been studied extensively and has a wide range of applications. However, as traditional data mining algorithms usually ...
详细信息
Association rule mining is one of the most important and basic technique in data mining, which has been studied extensively and has a wide range of applications. However, as traditional data mining algorithms usually only focus on analyzing data organized in single table, applying these algorithms in multi-relational data environment will result in many problems. This paper summarizes these problems, proposes a framework for the mining of multi-relational association rule, and gives a definition of the mining task. After classifying the existing work into two categories, it describes the main techniques used in several typical algorithms, and it also makes comparison and analysis among them. Finally, it points out some issues unsolved and some future further research work in this area.
Recent advances in database related applications propose many new challenges and have inspired database researchers and practitioners to further make their efforts on new database technologies.
Recent advances in database related applications propose many new challenges and have inspired database researchers and practitioners to further make their efforts on new database technologies.
作者:
王珊杜小勇孟小峰陈红School of Information
Renmin University of China MOE Key Lab of Data Engineering and Knowledge Engineering Beijing 100872 P.R. China
database system is the infrastructure of the modern information system. The R&D in the database system and its technologies is one of the important research topics in the field. The database R&D in China took off la...
详细信息
database system is the infrastructure of the modern information system. The R&D in the database system and its technologies is one of the important research topics in the field. The database R&D in China took off later but it moves along by giant steps. This report presents the achievements Renmin University of China (RUC) has made in the past 25 years and at the same time addresses some of the research projects we, RUC, are currently working on. The National Natural Science Foundation of China supports and initiates most of our research projects and these successfully conducted projects have produced fruitful results.
Update management is very important for data integration systems. So update management in peer data management systems (PDMSs) is a hot research area. This paper researches on view maintenance in PDMSs. First, the d...
详细信息
Update management is very important for data integration systems. So update management in peer data management systems (PDMSs) is a hot research area. This paper researches on view maintenance in PDMSs. First, the definition of view is extended and the peer view, local view and global view are proposed according to the requirements of applications. There are two main factors to influence materialized views in PDMSs. One is that schema mappings between peers are changed, and the other is that peers update their data. Based on the requirements, this paper proposes an algorithm called 2DCMA, which includes two sub-algorithms: data and definition consistency maintenance algorithm% to effectively maintain views. For data consistency maintenance, Mork's rules are extended for governing the use of updategrams and boosters. The new rule system can be used to optimize the execution plan. And are extended for the data consistency maintenance algorithm is based on the new rule system. Furthermore, an ECA rule is adopted for definition consistency maintenance. Finally, extensive simulation experiments are conducted in SPDMS. The simulation results show that the 2DCMA algorithm has better performance than that of Mork's when maintaining data consistency. And the 2DCMA algorithm has better performance than that of centralized view maintenance algorithm when maintaining definition consistency.
For ontology-based applications, the efficiency of ontology query is vital. Different from existing approaches, the paper improves performance of ontology query by materializing some derived relations. Experimental re...
详细信息
暂无评论