ISBN: 9783030590031; 9783030590024 (print)
Decentralization allows users to regain freedom and control over their digital life. As a global shared data space, Linked Data already supports decentralization. Data providers are free to publish their data on their web domains, and users can execute decentralized SPARQL queries over multiple data sources. However, decentralization makes query processing challenging, raising well-known problems of source discovery, answer completeness, and performance. Existing approaches for decentralized SPARQL query processing raise issues related to autonomy and answer completeness. In this paper, we propose Qasino, an original approach for querying decentralized RDF data that targets both answer completeness and source autonomy. Qasino is based on a decentralized random service that allows for discovering all relevant data sources. To speed up query processing, sources executing similar queries cooperate by sharing their intermediate results. Our experimental results demonstrate that collaborative query processing can significantly speed up query processing in a decentralized setup.
ISBN: 9781538672471 (print)
SPARQL is a standard query language for knowledge graphs (KGs). However, it is hard to find correct answers if KGs are incomplete or incorrect. Knowledge graph embedding (KGE) enables answering queries on such KGs by inferring unknown knowledge and removing incorrect knowledge. Hence, our long-term goal in this line of research is to propose a new framework that integrates KGE and SPARQL, which opens various research problems to be addressed. In this paper, we solve one of the most critical problems, that is, optimizing the performance of nearest neighbor (NN) search. In our evaluations, we demonstrate that the search time of state-of-the-art NN search algorithms is improved by 40% without sacrificing answer accuracy.
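To make the role of NN search concrete, here is a minimal sketch of how a pattern such as (head, relation, ?tail) might be answered in embedding space. It assumes a translation-style embedding model in which head + relation lies close to tail; the abstract does not name the embedding model, so this is an assumption. The vectors, the entity names, and the function `nearest_tails` are invented for illustration. The exhaustive scan shown is only the naive baseline, not the optimization the paper proposes.

```python
import math

# Toy 2-D embeddings; every value here is invented for illustration only.
entity_vecs = {
    "Tokyo": (1.0, 1.0),
    "Japan": (2.0, 3.0),
    "Paris": (5.0, 1.0),
    "France": (6.0, 3.1),
}
relation_vecs = {"capitalOf": (1.0, 2.0)}


def nearest_tails(head, relation, k=1):
    """Return the k entities closest to head + relation (exhaustive scan)."""
    hx, hy = entity_vecs[head]
    rx, ry = relation_vecs[relation]
    target = (hx + rx, hy + ry)
    # Assumed translation-style scoring: smaller distance means a better answer.
    candidates = [e for e in entity_vecs if e != head]
    candidates.sort(key=lambda e: math.dist(entity_vecs[e], target))
    return candidates[:k]


print(nearest_tails("Tokyo", "capitalOf"))
print(nearest_tails("Paris", "capitalOf"))
```

The scan touches every entity for every query, which is why the cost of NN search dominates once the entity set is large.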
This paper revisits the classical problem of multiple query optimization in federated RDF systems. We propose a heuristic, query rewriting-based approach to optimize the evaluation of multiple queries. This approach can take advantage of SPARQL 1.1 to share the common computation of multiple queries while considering the cost of both query evaluation and data shipment. Although we prove that finding the optimal rewriting for multiple queries is NP-complete, we propose a heuristic rewriting algorithm with a bounded approximation ratio. Furthermore, we propose an efficient method that uses the interconnection topology between RDF sources to filter out irrelevant sources, and we utilize some characteristics of SPARQL 1.1 to optimize multiple joins of intermediate matches. Extensive experimental studies show that the proposed techniques are effective, efficient, and scalable.
State-of-the-art distributed RDF systems partition data across multiple computer nodes (workers). Some systems perform cheap hash partitioning, which may result in expensive query evaluation. Others try to minimize inter-node communication, which requires an expensive data preprocessing phase, leading to a high startup cost. A priori knowledge of the query workload has also been used to create partitions, which, however, are static and do not adapt to workload changes. In this paper, we propose AdPart, a distributed RDF system, which addresses the shortcomings of previous work. First, AdPart applies lightweight partitioning on the initial data, which distributes triples by hashing on their subjects; this renders its startup overhead low. At the same time, the locality-aware query optimizer of AdPart takes full advantage of the partitioning to (1) support the fully parallel processing of join patterns on subjects and (2) minimize data communication for general queries by applying hash distribution of intermediate results instead of broadcasting, wherever possible. Second, AdPart monitors the data access patterns and dynamically redistributes and replicates the instances of the most frequent ones among workers. As a result, the communication cost for future queries is drastically reduced or even eliminated. To control replication, AdPart implements an eviction policy for the redistributed patterns. Our experiments with synthetic and real data verify that AdPart: (1) starts faster than all existing systems; (2) processes thousands of queries before other systems come online; and (3) gracefully adapts to the query load, being able to evaluate queries on billion-scale RDF data in sub-second time.
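The following sketch illustrates why hashing triples on their subjects lets subject-subject joins run without communication: all triples that share a subject land on the same worker. It is a simplified illustration, not AdPart's implementation. The function names, the toy triples, and the choice of CRC32 as the hash function are assumptions made for the example.

```python
import zlib
from collections import defaultdict


def worker_for(subject, num_workers):
    """Map a subject to a worker id with a deterministic hash (CRC32 is an assumption)."""
    return zlib.crc32(subject.encode("utf-8")) % num_workers


def partition_by_subject(triples, num_workers):
    """Assign every (s, p, o) triple to the worker chosen by hashing its subject."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[worker_for(s, num_workers)].append((s, p, o))
    return partitions


def local_star_join(partition, pred_a, pred_b):
    """Evaluate { ?s pred_a ?x . ?s pred_b ?y } using one worker's triples only."""
    objects_a = defaultdict(list)
    for s, p, o in partition:
        if p == pred_a:
            objects_a[s].append(o)
    return [(s, x, o) for s, p, o in partition if p == pred_b
            for x in objects_a.get(s, [])]


# Toy data, invented for illustration.
triples = [
    ("alice", "worksAt", "acme"),
    ("alice", "livesIn", "paris"),
    ("bob", "worksAt", "acme"),
    ("bob", "livesIn", "berlin"),
    ("carol", "livesIn", "rome"),
]

partitions = partition_by_subject(triples, num_workers=4)
# Each worker joins its own triples; the union of local results is the full answer.
results = sorted(r for part in partitions.values()
                 for r in local_star_join(part, "worksAt", "livesIn"))
print(results)
```

Joins on other positions (for example, object to subject) do not enjoy this locality, which is where hash distribution of intermediate results and adaptive replication become relevant.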
The amount of RDF data being published on the Web is increasing at a massive rate. MapReduce-based distributed frameworks have become the general trend in processing SPARQL queries against RDF data. Currently, query processing systems that use MapReduce have not been able to keep up with the increase of semantically annotated data, resulting in non-interactive SPARQL query processing. The principal reason is that intermediate query results from join operations in a MapReduce framework are so massive that they consume all available network bandwidth. In this article, the authors present an efficient SPARQL processing system that uses MapReduce and HBase. The system runs a job-optimized query plan using their proposed abstract RDF data to decrease both the number of jobs and the amount of input data. The authors also present an efficient algorithm that uses Map-side joins together with the abstract RDF data to filter out unneeded RDF data. Experimental results show that the proposed approach performs better than previous works when processing queries with a large amount of input data.
Future data analytics will require enormous storage space for data-driven decisions, necessitating alternative storage sources for massive data archives. Storage solutions have always been in demand due to the limitations of existing media. Deoxyribonucleic acid (DNA) is an emergent storage medium suitable for archival storage of rapidly increasing digital volumes. Due to its longevity, DNA storage technology has led to numerous applications that store and retrieve data in its entirety. In this way, DNA synthesis and sequencing costs can be reduced by compressing data in full before it is stored. However, prior works have not used DNA storage to retrieve partial data from complex graphs while taking advantage of cost-effective advanced analytics. In this paper, we present an efficient DNA-based query processing system to retrieve partial information using RDF graph data. Moreover, using binary search, we fetch and decode significantly fewer DNA strands to obtain partial information about RDF graph data based on SPARQL queries. Specifically, the experimental analysis shows that the average data retrieved per query is less than 1% for RDF graphs larger than 1 MB, which consequently reduces sequencing costs significantly.
ISBN: 9783319682884; 9783319682877 (print)
In this paper we present SPREFQL, an extension of the SPARQL language that allows appending a "PREFER" clause that expresses 'soft' preferences over the query results obtained by the main body of the query. The extension does not add expressivity, and any SPREFQL query can be transformed to an equivalent standard SPARQL query. However, clearly separating preferences from the 'hard' patterns and filters in the "WHERE" clause gives queries where the intention of the client is more cleanly expressed, an advantage for both human readability and machine optimization. In the paper we formally define the syntax and the semantics of the extension, and we also provide empirical evidence that optimizations specific to SPREFQL improve run-time efficiency by comparison to the usually applied optimizations on the equivalent standard SPARQL query.
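As a rough illustration of the split between 'hard' results and 'soft' preferences, the sketch below post-filters a result set so that only results not beaten by any other result remain. This reading of preference semantics is an assumption; the abstract does not spell out how "PREFER" is evaluated. The preference relation `prefers`, the function `apply_preferences`, and the hotel data are all hypothetical.

```python
def prefers(a, b):
    """Hypothetical preference: a beats b if it is no worse on both criteria and better on one."""
    no_worse = a["price"] <= b["price"] and a["rating"] >= b["rating"]
    better = a["price"] < b["price"] or a["rating"] > b["rating"]
    return no_worse and better


def apply_preferences(results, prefers):
    """Keep only the results that no other result is preferred over."""
    return [r for r in results if not any(prefers(other, r) for other in results)]


# Stand-in for the bindings produced by the hard WHERE patterns (invented data).
hard_results = [
    {"hotel": "A", "price": 80, "rating": 4},
    {"hotel": "B", "price": 120, "rating": 5},
    {"hotel": "C", "price": 100, "rating": 3},
]

best = apply_preferences(hard_results, prefers)
print([r["hotel"] for r in best])
```

Hotel C drops out because A is both cheaper and better rated, while A and B both survive because neither beats the other. The pairwise comparison is quadratic in the number of results, which hints at why preference-aware optimization can pay off.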
ISBN: 9781450349512 (print)
With the advent of huge data management systems storing voluminous data, there arises a need to develop efficient data analytics techniques for knowledge discovery at different levels of granularity. Resource Description Framework (RDF), mainly developed for the Semantic Web, is presumably a good option when considering graph databases dealing with huge real-world data. RDF models information in the form of triples, and is considered a useful tool for storing graph data (aka linked data), where each edge can be stored as a triple. Due to the existence of huge amounts of linked data, mostly in the form of graphs, graph mining has been successful in attracting researchers from different research fields for efficient handling (storage, indexing, retrieval, etc.) of graph data. As a result, various APIs like GraphX and GraphFrames have been developed to facilitate relational queries over graph data. Though GraphX is older than GraphFrames and processing SPARQL queries over GraphX has been explored by some researchers, to the best of our knowledge, SPARQL query processing over GraphFrames has not been explored yet. In this paper, we present an initial study on a query-specific search space pruning and query optimization approach to process SPARQL queries over GraphFrames in an efficient manner. The experimental results, in terms of low response time for query execution, are encouraging and justify investing more research effort in this direction.