ISBN: 9781538608074 (print)
Query processing in a mobile computing environment is a technique for answering queries across mobile hosts. Global query processing is carried out jointly by dissimilar sites, including fixed servers and mobile computers, and gives users faster access to data. Because of the need to save energy and the asymmetric features of a mobile computing environment, conventional query processing for a distributed database cannot be applied directly to a mobile computing system. The mobile environment is a collection of diverse mobile hosts that communicate over wireless links. These wireless links change according to the nature of mobile networks. Moreover, nodes in an ad hoc network have to connect without any centralized control or assistance. Usability for the user also changes with a query processing tool. A mechanism that allows functionality to be shared among different devices in the same environment therefore changes how users communicate and shortens query time in a mobile computing environment. The aim of this research is to innovate in query optimization techniques for the mobile computing environment. It focuses on improving various query optimization techniques to obtain highly effective and trustworthy query optimization methods.
ISBN: 9781450367356 (print)
User-defined functions (UDFs) in modern SQL database systems and Big Data processing systems such as Spark (which offer API bindings in high-level languages such as Python or Scala) make automatic optimization challenging. The foundation of modern database query optimization is the collection of statistics describing the data to be processed, but when a database or Big Data computation is partially obscured by UDFs, good statistics are often unavailable. In this paper, we describe a query optimizer called the Monsoon optimizer. In the presence of UDFs, the Monsoon optimizer may choose to collect statistics on the UDFs and then run the computation. Alternatively, it may optimize and execute part of the plan, collect statistics on the result of the partial plan, and then re-optimize, repeating the process as needed. Monsoon decides how to interleave execution and statistics collection in a principled fashion by formalizing the problem as a Markov decision process.
ISBN: 9781450367356 (print)
One can audit SQL applications by running SQL programs over sequences of persistent snapshots, but care is needed to avoid wasteful duplicate computation. This paper describes the design, implementation, and performance of RID, the first language-independent optimization framework that eliminates duplicate computations in SQL programs running over low-level snapshots by exploiting snapshot metadata efficiently.
ISBN: 9781538655207 (print)
We introduce a simple data model to process non-relational data for relational operations, and SHC (Apache Spark - Apache HBase Connector), an implementation of this model in the cluster computing framework Spark. SHC applies optimization techniques of relational data processing to a distributed, column-oriented key-value store (i.e., HBase). Compared to existing systems, SHC makes two major contributions. First, SHC offers a much tighter integration between the optimizations of relational data processing and the non-relational data store, through a plug-in implementation that integrates with Spark SQL, a distributed in-memory computing engine for relational data. This design makes system maintenance relatively easy and enables users to perform complex data analytics on top of a key-value store. Second, SHC leverages the Spark SQL Catalyst engine for high-performance query optimization and processing, e.g., data partition pruning, column pruning, predicate pushdown, and data locality. SHC has been deployed and used in multiple production environments with hundreds of nodes, and it provides efficient OLAP query processing on petabytes of data.
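To make column pruning and predicate pushdown concrete, the following is a minimal sketch in plain Python over a toy in-memory key-value store. It is not SHC, Spark SQL, or HBase code: the `scan` and `cells` helpers, the store layout, and the sample rows are all hypothetical and exist only to show why pushing a row-key range and a column list into the scan reduces the data handed to the query engine.

```python
def scan(store, columns=None, key_range=None):
    """Simulate a store-side scan over a dict of row_key -> {column: value}.

    columns:   optional list of column names to return (column pruning).
    key_range: optional (low, high) pair; only keys with low <= key < high
               are returned (a row-key predicate pushed down to the store).
    Both parameters are illustrative assumptions, not a real connector API.
    """
    low, high = key_range if key_range is not None else (None, None)
    for key in sorted(store):
        if low is not None and key < low:
            continue
        if high is not None and key >= high:
            continue
        row = store[key]
        if columns is None:
            yield key, dict(row)
        else:
            yield key, {c: row[c] for c in columns if c in row}


def cells(rows):
    """Count the cells returned by a scan, as a stand-in for data transferred."""
    return sum(len(row) for _, row in rows)


# Hypothetical sample data.
store = {
    "r1": {"name": "a", "age": 30, "city": "x"},
    "r2": {"name": "b", "age": 41, "city": "y"},
    "r3": {"name": "c", "age": 25, "city": "z"},
}

full = list(scan(store))                                         # no pushdown
pushed = list(scan(store, columns=["age"], key_range=("r2", "r4")))
```

Here the unoptimized scan returns every cell of every row and leaves filtering to the engine, while the pushed-down scan returns only the `age` column of the rows in the requested key range.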
ISBN: 9781450337168 (print)
Despite the importance and widespread use of range data, e.g., time intervals and spatial ranges, little attention has been devoted to the processing and querying of range data in the context of big data. The main challenge lies in the nature of traditional index structures, e.g., the B-Tree and R-Tree, which are centralized and hence are almost crippled when deployed in a distributed environment. To address this challenge, this paper presents Kangaroo, a system built on top of Hadoop to optimize the execution of range queries over range data. The main idea behind Kangaroo is to split the data into non-overlapping partitions in a way that minimizes query execution time. Kangaroo is query-workload aware, i.e., it produces partitioning layouts that minimize the query processing time of given query patterns. In this paper, we study the design challenges Kangaroo addresses in order to be deployed on top of a distributed file system, i.e., HDFS. We also study four different partitioning schemes that Kangaroo can support. With extensive experiments using real range data of more than one billion records and a real query workload of more than 30,000 queries, we show that the partitioning schemes of Kangaroo can significantly reduce the I/O of range queries on range data.
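The effect of partitioning on range-query I/O can be illustrated with a small Python sketch. It does not reproduce any of Kangaroo's four partitioning schemes or its workload-aware layout selection; it assumes, purely for illustration, that each interval record is assigned to exactly one partition by its start point and that each partition keeps its minimum start and maximum end so that non-matching partitions can be skipped. The names `build_partitions` and `range_query` are hypothetical.

```python
from bisect import bisect_right


def build_partitions(records, boundaries):
    """Place each (start, end) record in one partition chosen by its start.

    boundaries is a sorted list of split points, giving len(boundaries) + 1
    non-overlapping partitions. Assigning by start point is an assumption
    made for this sketch.
    """
    parts = [{"records": [], "min_start": None, "max_end": None}
             for _ in range(len(boundaries) + 1)]
    for start, end in records:
        part = parts[bisect_right(boundaries, start)]
        part["records"].append((start, end))
        if part["min_start"] is None or start < part["min_start"]:
            part["min_start"] = start
        if part["max_end"] is None or end > part["max_end"]:
            part["max_end"] = end
    return parts


def range_query(parts, q_start, q_end):
    """Return records overlapping [q_start, q_end] and the partitions scanned."""
    results, scanned = [], 0
    for part in parts:
        if not part["records"]:
            continue
        # Skip partitions whose summary shows that no record can overlap.
        if part["min_start"] > q_end or part["max_end"] < q_start:
            continue
        scanned += 1
        results.extend((s, e) for s, e in part["records"]
                       if s <= q_end and e >= q_start)
    return sorted(results), scanned


# Hypothetical sample data: five intervals split at 10 and 20.
parts = build_partitions([(1, 3), (2, 12), (11, 15), (21, 25), (22, 30)], [10, 20])
narrow = range_query(parts, 13, 14)
wide = range_query(parts, 12, 22)
```

The narrow query touches one of the three partitions, whereas the wide query has to scan all of them, which is why a layout tuned to the expected query patterns matters.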
Similarity Joins are extensively used in multiple application domains and are recognized among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold epsilon. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Joins as physical database operators. In this paper, we focus on the study, design, implementation, and optimization of a Similarity Join database operator for metric spaces. We present DBSimJoin, a physical database operator that integrates techniques to: enable a non-blocking behavior, prioritize the early generation of results, and fully support the database iterator interface. The proposed operator can be used with multiple distance functions and data types. We describe the changes in each query engine module to implement DBSimJoin and provide details of our implementation in PostgreSQL. We also study ways in which DBSimJoin can be combined with other similarity and non-similarity operators to answer more complex queries, and how DBSimJoin can be used in query transformation rules to improve query performance. The extensive performance evaluation shows that DBSimJoin significantly outperforms alternative approaches and scales very well when important parameters like epsilon, data size, and number of dimensions increase. (C) 2015 Elsevier Ltd. All rights reserved.
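As a point of reference for the operator described above, the sketch below shows a similarity join written as a Python generator, so that pairs are emitted as soon as they are found and the consumer pulls results through the iterator protocol. It is a plain nested-loop baseline, not the DBSimJoin algorithm or its PostgreSQL implementation; the function names and sample points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.dist(a, b)


def similarity_join(left, right, eps, dist=euclidean):
    """Lazily yield every pair (a, b) with dist(a, b) < eps.

    Nested-loop baseline for illustration only. The distance function is
    pluggable, so any metric over the chosen data type can be supplied.
    """
    right = list(right)  # the inner input is materialized for repeated scans
    for a in left:
        for b in right:
            if dist(a, b) < eps:
                yield (a, b)


def manhattan(a, b):
    """Manhattan (L1) distance, as an example of an alternative metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical sample data.
left = [(0, 0), (5, 5)]
right = [(0, 1), (5, 6), (10, 10)]

pairs = list(similarity_join(left, right, eps=1.5))
first = next(similarity_join(left, right, eps=1.5))  # pulled without a full scan
l1_pairs = list(similarity_join(left, right, eps=1.5, dist=manhattan))
```

Because the join is a generator, a caller that needs only the first few pairs does not pay for computing the rest, which mirrors the non-blocking, early-result behavior the abstract asks of a physical operator.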
In a top-k Geometric Intersection Query (top-k GIQ) problem, a set of n weighted geometric objects in R^d is to be preprocessed into a compact data structure so that for any query geometric object, q, and integer k > 0, the k largest-weight objects intersected by q can be reported efficiently. While the top-k problem has been studied extensively for non-geometric problems (e.g., recommender systems), the geometric version has received little attention. This paper gives a general technique to solve any top-k GIQ problem efficiently. The technique relies only on the availability of an efficient solution for the underlying (non-top-k) GIQ problem, which is often the case. Using this, asymptotically efficient solutions are derived for several top-k GIQ problems, including top-k orthogonal and circular range search, point enclosure search, halfspace range search, etc. Implementations of some of these solutions, using practical data structures, show that they are quite efficient in practice. This paper also does a formal investigation of the hardness of the top-k GIQ problem, which reveals interesting connections between the top-k GIQ problem and the underlying (non-top-k) GIQ problem.
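The problem statement can be made concrete with a naive baseline for one instance, top-k orthogonal range search over weighted points on a line. The sketch first answers the underlying (non-top-k) query by reporting every intersected object and then keeps the k heaviest. This is only an illustration of the problem definition, not the paper's general technique, and it scans all points instead of using a preprocessed data structure; the function names and data are hypothetical.

```python
import heapq


def report_intersected(points, low, high):
    """Underlying (non-top-k) query: all points with low <= coordinate <= high.

    points is a list of (coordinate, weight) pairs. Results are returned as
    (weight, coordinate) so that they order by weight first.
    """
    return [(w, x) for x, w in points if low <= x <= high]


def top_k_range(points, low, high, k):
    """Return the k largest-weight points inside [low, high], heaviest first."""
    return heapq.nlargest(k, report_intersected(points, low, high))


# Hypothetical sample data: (coordinate, weight).
points = [(1, 5.0), (2, 9.0), (4, 1.0), (7, 8.0), (9, 3.0)]
top2 = top_k_range(points, 2, 8, 2)
```

The cost of this baseline grows with the number of intersected objects even when k is small, which is the inefficiency a dedicated top-k structure is meant to avoid.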
Personal information is increasingly spread across heterogeneous sources, which presents a major obstacle to its management and retrieval. To address this problem, this paper presents the blueprint of a novel Personal Information Management (PIM) system named 3SEPIAS (short for Semi-Structured Search Engine for Personal Information in dAtaspace System). 3SEPIAS has three main features: data integration without upfront semantic reconciliation, a flexible query model for data with sparse and evolving schemas, and an efficient best-effort proximity search approach on graphs. To this end, we first propose a semi-structured graph data model called the Interpreted Object Model (IOM), which uniformly represents a user's heterogeneous personal information and loosely integrates it into a dataspace in a schema-later way. A Semi-Structured Search Engine (3SE) can then be used to search over the personal dataspaces. We propose an intuitive 3SE Query Language (3SQL) that enables users to query with varying degrees of structural constraint according to their knowledge of the underlying schemas. Moreover, a best-effort top-k proximity search optimization strategy and corresponding graph index structures are proposed to improve the efficiency of query processing. We perform comprehensive experiments to test both the effectiveness and the efficiency of our proximity search approach. The results reveal that 3SE can beat previous proximity search systems by a large margin with little or no loss of result quality, especially for large graphs. (C) 2012 Elsevier Inc. All rights reserved.
ISBN: 9781450313070 (print)
The purpose of this talk is to provide a comprehensive state of the art concerning the evolution of data management systems from uni-processor systems to large scale distributed systems. We focus our study on the query processing and optimization methods. For each environment, we recall their motivations and point out main characteristics of proposed methods, especially, the nature of decision-making (centralized or decentralized control for high level of scalability), adaptive level (intra-operator and/or inter-operator), impact of parallelism (partitioned and pipelined parallelism) and dynamicity (e.g. elasticity) of execution models.
An extent join to compute path expressions containing parent-children and ancestor-descendent operations and two path expression optimization rules, path-shortening and path-complementing, are presented in this paper. Path-shortening reduces the number of joins by shortening the path, while path-complementing optimizes the path execution by using an equivalent complementary path expression to compute the original one. Experimental results show that the algorithms proposed are more efficient than traditional algorithms.