Previous methods on knowledge base question generation (KBQG) primarily focus on refining the quality of a single generated question. However, considering the remarkable paraphrasing ability of humans, we believe that...
Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors. Due to its ability to capture long-range dependencies, the Transformer model provides a powerful mec...
Existing methods on knowledge base question generation (KBQG) learn a one-size-fits-all model by training together all subgraphs without distinguishing the diverse semantics of subgraphs. In this work, we show that ma...
ISBN: 9781605586502 (print)
Database-as-a-Service (DAS) is an emerging database management paradigm in which a partition-based index is an effective way to query encrypted data. However, previous research either focuses on one-dimensional partitioning or ignores the distribution characteristics of multidimensional data, especially sparsity and locality. In this paper, we propose Cluster-based Onion Partition (COP), which is designed to reduce both false positives and dead space at the same time. COP consists of two steps. First, it partitions the covered space level by level, much like peeling an onion. Second, at each level, a clustering algorithm based on local density is used to achieve a locally optimal secure partition. Extensive experiments on real and synthetic datasets show that COP is a secure multidimensional partition with much less efficiency loss than previous top-down or bottom-up counterparts. Copyright 2009 ACM.
ISBN: 9783642235344; 9783642235351 (print)
Duplicate detection is well recognized as a crucial task for improving data quality. Related work on this problem mainly proposes efficient approaches for a single machine. However, as data volume grows, the performance of duplicate identification remains far from satisfactory. We therefore address duplicate detection over MapReduce, a shared-nothing paradigm. We argue that the performance of MapReduce-based duplicate detection depends mainly on the number of candidate record pairs. In this paper, we propose a new signature scheme with a new pruning strategy over MapReduce to minimize the number of candidate record pairs. Experimental results on both real and synthetic datasets demonstrate that the proposed signature-based method is efficient and scalable.
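To make the role of signatures in limiting candidate pairs concrete, the following is a minimal single-process sketch shaped like a map, shuffle, and reduce pipeline. It uses prefix filtering over token sets, a well-known signature technique swapped in purely for illustration; it is not the signature scheme or pruning strategy proposed in the paper. The function names, the whitespace tokenizer, and the Jaccard threshold are all assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def prefix_signatures(tokens, threshold):
    """Return the prefix-filter signatures of a token set.

    With a fixed global token order, two sets whose Jaccard similarity is
    at least `threshold` must share a token among their first
    len - ceil(threshold * len) + 1 tokens.
    """
    ordered = sorted(tokens)  # lexicographic order as the global order
    prefix_len = len(ordered) - math.ceil(threshold * len(ordered)) + 1
    return ordered[:prefix_len]


def map_phase(records, threshold):
    """Emit (signature, record_id) pairs, as a mapper would."""
    for rec_id, text in records.items():
        tokens = set(text.lower().split())
        for sig in prefix_signatures(tokens, threshold):
            yield sig, rec_id


def shuffle(pairs):
    """Group record ids by signature, mimicking the shuffle step."""
    groups = defaultdict(set)
    for sig, rec_id in pairs:
        groups[sig].add(rec_id)
    return groups


def reduce_phase(groups, records, threshold):
    """Verify only the pairs of records that share a signature."""
    token_sets = {rid: set(t.lower().split()) for rid, t in records.items()}
    duplicates = set()
    for ids in groups.values():
        for a, b in combinations(sorted(ids), 2):
            if jaccard(token_sets[a], token_sets[b]) >= threshold:
                duplicates.add((a, b))
    return duplicates


def find_duplicates(records, threshold=0.5):
    """Run the three phases in sequence on an in-memory dict of records."""
    grouped = shuffle(map_phase(records, threshold))
    return reduce_phase(grouped, records, threshold)
```

Records that share no signature are never compared, which is where the saving in candidate pairs comes from; a real deployment would also need to deduplicate pairs that meet under several signatures across reducers.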
For ontology-based applications, the efficiency of ontology query is vital. Different from existing approaches, the paper improves performance of ontology query by materializing some derived relations. Experimental re...
A large percentage of queries issued to search engines are broad or ambiguous. Search result diversification aims to solve this problem, by returning diverse results that can fulfill as many different information need...
With the growth of XML data on the Internet, managing and analyzing huge numbers of XML documents has come to play an important role in information management. Clustering, as an intelligent technique, has been used to group documents by their content or structure. However, the key problem is how to measure similarity between XML documents. In this paper, we propose an extended vector space model and, on this basis, put forward an effective semantic similarity measurement method that combines content and structure semantics. The method analyzes a variety of XML document features that affect similarity measurement, such as term element frequency, term inverse element frequency, the semantic weight of tags, and the level information of terms. In addition, information gain is introduced to evaluate clustering quality, motivated by the fact that the collection has no classification information in advance. Experimental results show that the proposed similarity method (EVSM_SS) outperforms both the content and structure integration measurement based on structure paths (VSM_SP) and the traditional document clustering measurement (CO) in information gain, and produces better clustering quality.
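As a rough illustration of how tag weight and level information could enter a vector space model, here is a minimal sketch of a weighted term vector with cosine similarity. The abstract does not give the EVSM_SS formula, so the way the factors are combined here (tag weight times an inverse frequency, divided by element depth) is an assumption, as are the `TAG_WEIGHT` values and the simplification of counting inverse frequency per document rather than per element.

```python
import math
from collections import defaultdict

# Hypothetical tag weights: the abstract mentions a "semantic weight of tag"
# but gives no values, so these are placeholders.
TAG_WEIGHT = {"title": 2.0, "abstract": 1.5, "body": 1.0}


def weighted_vectors(docs):
    """Build one weighted term vector per document.

    docs: {doc_id: [(term, tag, level), ...]} with one tuple per term
    occurrence; level is the depth of the enclosing element (root = 1).
    """
    # Count in how many documents each term appears (simplified inverse
    # frequency; the paper works at element granularity).
    doc_freq = defaultdict(int)
    for occurrences in docs.values():
        for term in {t for t, _, _ in occurrences}:
            doc_freq[term] += 1
    n = len(docs)

    vectors = {}
    for doc_id, occurrences in docs.items():
        vec = defaultdict(float)
        for term, tag, level in occurrences:
            inverse = math.log(1 + n / doc_freq[term])
            # Assumed combination: tag weight boosts, depth discounts.
            vec[term] += TAG_WEIGHT.get(tag, 1.0) * inverse / level
        vectors[doc_id] = dict(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

The resulting pairwise similarities could feed any standard clustering routine; which clustering algorithm the authors use is not stated in the abstract.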
Configuration tuning is essential to optimize the performance of systems (e.g., databases, key-value stores). High performance usually indicates high throughput and low latency. At present, most of the tuning tasks of systems are performed manually (e.g., by database administrators), but it is hard for them to achieve high performance through tuning in various types of systems and in various *** recent years, there have been some studies on tuning traditional database systems, but all these methods have some *** this article, we put forward a tuning system based on attention-based deep reinforcement learning named WATuning, which can adapt to changes in workload characteristics and optimize system performance efficiently and ***, we design the core algorithm named ATT-Tune for WATuning to achieve the tuning task of *** algorithm uses workload characteristics to generate a weight matrix that acts on the internal metrics of systems, and then ATT-Tune uses the internal metrics with weight values assigned to select the appropriate ***, WATuning can generate multiple instance models according to changes in the workload so that it can provide targeted recommendation services for different types of ***, WATuning can also dynamically fine-tune itself according to the constantly changing workload in practical applications so that it can better fit the actual environment to make *** experimental results show that, compared with CDBTune, an existing state-of-the-art tuning method, WATuning improves throughput by 52.6% and reduces latency by 31%.
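The idea of deriving weights from workload characteristics and applying them to internal metrics can be sketched as follows. This is not ATT-Tune: the abstract does not specify the network, so the sketch assumes a single linear projection followed by a softmax and produces a weight vector rather than a full weight matrix. The names `attention_weights`, `weight_metrics`, and `projection` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(workload, projection):
    """Map workload features to one weight per internal metric.

    projection[i][j] is a hypothetical parameter linking workload feature j
    to metric i; in a learned system it would come from training.
    """
    scores = [sum(p * f for p, f in zip(row, workload)) for row in projection]
    return softmax(scores)


def weight_metrics(metrics, weights):
    """Scale each internal metric by its attention weight."""
    return [m * w for m, w in zip(metrics, weights)]


# Example: a read-heavy workload emphasises the first (read-related) metric.
workload = [0.9, 0.1]                  # assumed features: read ratio, write ratio
projection = [[2.0, 0.0], [0.0, 2.0]]  # assumed parameters, not learned here
weights = attention_weights(workload, projection)
weighted = weight_metrics([120.0, 30.0], weights)
```

In a reinforcement learning setup the weighted metrics would form part of the state passed to the agent that recommends configuration values; that part is omitted here.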
The defining characteristic of XML retrieval is its ability to return element nodes as results. This paper studies the clustering of XML element search results and proposes a similarity measurement method based on term semantics, in which the "core" concepts between terms are obtained through latent semantic indexing (LSI) and, at the same time, the content and semantic structure properties (CASS) of XML element nodes are combined. In addition, two new performance evaluation measures, R_ClusterRatio and R_DocuRatio, are introduced to evaluate clustering quality. They are motivated by observations of the distribution of relevant documents and by the fact that the experimental data collection, the IEEE CS corpus, does not provide classification information. Experimental results show that the proposed similarity method, which combines term semantics with integrated content and structure semantics (LSI-CASS), is feasible and produces better clustering quality than LSI-CAS and CASS.