Existing methods on knowledge base question generation (KBQG) learn a one-size-fits-all model by training together all subgraphs without distinguishing the diverse semantics of subgraphs. In this work, we show that ma...
ISBN: (print) 9781605586502
Database-as-a-Service (DAS) is an emerging database management paradigm in which a partition-based index is an effective way to query encrypted data. However, previous research either focuses on one-dimensional partitioning or ignores the distribution characteristics of multidimensional data, especially sparsity and locality. In this paper, we propose Cluster-based Onion Partition (COP), which is designed to reduce both false positives and dead space at the same time. COP consists of two steps. First, it partitions the covered space level by level, much like peeling an onion; second, at each level, a clustering algorithm based on local density is used to achieve a locally optimal secure partition. Extensive experiments on real and synthetic datasets show that COP is a secure multidimensional partition with much less efficiency loss than previous top-down or bottom-up counterparts. Copyright 2009 ACM.
ISBN: (print) 9783642235344; 9783642235351
Duplicate detection is well recognized as a crucial task for improving data quality. Related work on this problem mainly proposes efficient approaches for a single machine. However, as data volumes grow, the performance of duplicate identification remains far from satisfactory. Hence, we address duplicate detection over MapReduce, a shared-nothing paradigm. We argue that the performance of MapReduce-based duplicate detection mainly depends on the number of candidate record pairs. In this paper, we propose a new signature scheme with a new pruning strategy over MapReduce to minimize the number of candidate record pairs. Experimental results on both real and synthetic datasets demonstrate that the proposed signature-based method is efficient and scalable.
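The abstract does not describe the signature scheme or the pruning strategy, so the following is only a rough illustration of why signatures reduce the number of candidate record pairs. It swaps in the well-known prefix-filtering technique for Jaccard similarity and simulates the map and reduce phases in a single Python process. The function names, the whitespace tokenization, and the threshold value are assumptions made for this sketch, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def find_duplicates(records, threshold):
    """Return (candidate_pairs, duplicate_pairs) for a dict {record_id: text}.

    Assumption: records are compared by Jaccard similarity over
    whitespace-separated, lower-cased tokens.
    """
    token_sets = {rid: set(text.lower().split()) for rid, text in records.items()}

    # Global token order: rare tokens first, so prefixes contain selective tokens.
    freq = Counter(tok for toks in token_sets.values() for tok in toks)

    # "Map" phase: emit (signature token, record id) for each prefix token.
    # Prefix filtering: two sets with Jaccard >= threshold must share at least
    # one token among their first len - ceil(threshold * len) + 1 tokens.
    buckets = defaultdict(set)
    for rid, toks in token_sets.items():
        ordered = sorted(toks, key=lambda t: (freq[t], t))
        prefix_len = len(ordered) - math.ceil(threshold * len(ordered)) + 1
        for tok in ordered[:prefix_len]:
            buckets[tok].add(rid)

    # "Reduce" phase: records sharing a signature become candidate pairs.
    candidates = set()
    for rids in buckets.values():
        candidates.update(combinations(sorted(rids), 2))

    # Verification: only candidate pairs are compared in full.
    duplicates = {
        (a, b) for a, b in candidates
        if jaccard(token_sets[a], token_sets[b]) >= threshold
    }
    return candidates, duplicates


if __name__ == "__main__":
    sample = {
        1: "john smith new york",
        2: "john smith new york city",
        3: "alice brown boston",
    }
    cands, dups = find_duplicates(sample, threshold=0.5)
    print("candidates:", cands)
    print("duplicates:", dups)
```

In this toy input only records 1 and 2 share a signature, so a single pair is verified instead of all three possible pairs. On a real cluster the bucket construction would be the shuffle between mappers and reducers.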
For ontology-based applications, the efficiency of ontology query is vital. Different from existing approaches, the paper improves performance of ontology query by materializing some derived relations. Experimental re...
A large percentage of queries issued to search engines are broad or ambiguous. Search result diversification aims to solve this problem, by returning diverse results that can fulfill as many different information need...
Configuration tuning is essential to optimize the performance of systems (e.g., databases, key-value stores). High performance usually indicates high throughput and low latency. At present, most of the tuning tasks of systems are performed manually (e.g., by database administrators), but it is hard for them to achieve high performance through tuning in various types of systems and in various *** recent years, there have been some studies on tuning traditional database systems, but all these methods have some *** this article, we put forward a tuning system based on attention-based deep reinforcement learning named WATuning, which can adapt to changes in workload characteristics and optimize system performance efficiently and ***, we design the core algorithm named ATT-Tune for WATuning to achieve the tuning task of *** algorithm uses workload characteristics to generate a weight matrix that acts on the internal metrics of systems, and then ATT-Tune uses the internal metrics with the assigned weight values to select the appropriate ***, WATuning can generate multiple instance models according to the change of the workload so that it can complete targeted recommendation services for different types of ***, WATuning can also dynamically fine-tune itself according to the constantly changing workload in practical applications so that it can better fit the actual environment to make *** experimental results show that, compared with CDBTune, an existing optimal tuning method, WATuning improves throughput by 52.6% and reduces latency by 31%.
Link-based similarity measures play a significant role in many graph-based applications. Consequently, measuring node similarity in a graph is a fundamental problem in graph data mining. Personalized PageRank (PPR) and SimRank (SR) have emerged as the most popular and influential link-based similarity measures. Recently, a novel link-based similarity measure, penetrating rank (P-Rank), which enriches SR, was proposed. In practice, PPR, SR and P-Rank scores are calculated by iterative methods. As the number of iterations increases, so does the overhead of the calculation. Ideally, similarity would be computed with the minimum number of iterations sufficient to guarantee a desired accuracy. However, the existing upper bounds are too coarse to be useful in general. Therefore, we focus on designing accurate and tight upper bounds for PPR, SR, and P-Rank in this paper. Our upper bounds are designed based on the following intuition: the smaller the difference between two consecutive iteration steps is, the smaller the difference between the theoretical and iterative similarity scores becomes. Furthermore, we demonstrate the effectiveness of our upper bounds in the scenario of top-k similar-node queries, where our upper bounds help accelerate the query. We also run a comprehensive set of experiments on real-world data sets to verify the effectiveness and efficiency of our upper bounds.
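The intuition stated in this abstract, that a small change between consecutive iterations implies a small distance to the exact scores, can be illustrated for PPR with the textbook fixed-point argument. The sketch below does not reproduce the paper's bounds, which the abstract does not give. It assumes the common formulation p = alpha * e_s + (1 - alpha) * P^T p with restart probability alpha, for which the iteration is a contraction in the L1 norm and the error after a step is at most (1 - alpha) / alpha times the L1 difference between the last two iterates. The function name, graph encoding, and parameter values are hypothetical.

```python
def personalized_pagerank(out_links, source, alpha=0.15, tol=1e-8, max_iter=1000):
    """Power iteration for PPR that stops once an error bound drops below tol.

    out_links maps each node to the list of nodes it links to.
    Returns (scores, iterations_used, error_bound).
    """
    nodes = list(out_links)
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0
    iterations, bound = 0, float("inf")

    for iterations in range(1, max_iter + 1):
        # One step of p <- alpha * e_source + (1 - alpha) * P^T p.
        nxt = {v: 0.0 for v in nodes}
        nxt[source] = alpha
        for v, targets in out_links.items():
            if not targets:
                continue  # assumption: mass at dangling nodes is dropped
            share = (1.0 - alpha) * scores[v] / len(targets)
            for t in targets:
                nxt[t] += share

        # L1 difference between two consecutive iterates.
        diff = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt

        # Contraction bound: ||p* - p_k||_1 <= (1 - alpha) / alpha * diff.
        bound = (1.0 - alpha) / alpha * diff
        if bound < tol:
            break

    return scores, iterations, bound


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ppr, steps, err = personalized_pagerank(graph, "a")
    print(ppr, steps, err)
```

The stopping rule uses only quantities available during the iteration, so the loop ends as soon as the requested accuracy is certified rather than after a fixed, worst-case number of steps.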
In this paper, we study how to perform XML query expansion effectively using high-quality pseudo-relevance documents. We present a solution for selecting good expansion information, in which various features that affect term weight, such as term element frequency, term inverse element frequency, semantic weight of tags, and level information, are analyzed, and the terms with high weight values are selected as expansion terms. Experimental results show that the proposed expansion method is feasible. Compared with the original query and with a traditional expansion method that considers no structural features, our method achieves better retrieval performance.
To support dramatically increased traffic loads, communication networks become *** cell association (CA) schemes are time-consuming, forcing researchers to seek fast *** paper proposes a deep Q-learning based scheme, whose main idea is to train a deep neural network (DNN) to calculate the Q values of all the state-action pairs, and the cell holding the maximum Q value is *** the training stage, the intelligent agent continuously generates samples through the trial-and-error method to train the DNN until *** the application stage, state vectors of all the users are input to the trained DNN to quickly obtain a satisfactory CA result for a scenario with the same BS locations and user *** demonstrate that the proposed scheme provides satisfactory CA results in a computational time several orders of magnitude shorter than traditional ***, performance metrics, such as capacity and fairness, can be guaranteed.
Recently, social tagging systems have become more and more popular in many Web 2.0 applications. In such systems, users are allowed to annotate a particular resource with a freely chosen set of tags. These user-generated...