This paper focuses on parallel query optimization. We consider the operator problem and introduce a new class of execution strategies called Linear-oriented Bushy Trees (LBT). Compared to the related approach of the G...
详细信息
This paper focuses on parallel query optimization. We consider the operator problem and introduce a new class of execution strategies called Linear-oriented Bushy Trees (LBT). Compared to the related approach of the General Bushy Trees (GBT) a significant complexity reduction of the operator ordering problem can be derived theoretically and demonstrated experimentally (e.g, compared with GBTs, LBTs authorize optimization time improvement that can reach up to 49%) without losing quality. Finally we demonstrate that existing commercial parallelquery optimizers need little extension mod ifications in order to handle LBTs. (C) 2000 Elsevier Science B.V. All rights reserved.
We propose a parallel optimizer for queries containing a large number of joins, as well as set operators and aggregate functions. The platform of execution is a shared-disk multiprocessor machine supporting bushy para...
详细信息
We propose a parallel optimizer for queries containing a large number of joins, as well as set operators and aggregate functions. The platform of execution is a shared-disk multiprocessor machine supporting bushy parallelism and pipeline. Our model partitions the query into almost independent subtrees that can be optimized simultaneously and applies an enhanced variation of the iterative improvement technique on those of the subtrees, which contain a large number of joins. This technique is parallelized, too. In order to estimate the cost of the states constructed during optimization of join subtrees, cost formulae are developed that estimate the cost of relational algebra operators when executed across coalescing pipes.
Queries containing aggregate functions often combine multiple tables through join operations. This query is subsequently called "Groupby-Join". There is a special category of this query whereby the group-by ...
详细信息
Queries containing aggregate functions often combine multiple tables through join operations. This query is subsequently called "Groupby-Join". There is a special category of this query whereby the group-by operation can only be performed after the join operation. This is known as "Groupby-After-Join" queries-the focus of this paper. In parallel processing of such queries, it must be decided which attribute is used as a partitioning attribute, particularly join attribute or group-by attribute. Based on the partitioning attribute, two parallel processing methods, namely join partition method (JPM) and aggregate partition method (APM) are discussed. The behaviours of these parallelization methods are described in terms of cost models. Experiments are performed based on simulations. The simulation results show that the aggregate partition method performs better than the join partition method. (C) 2004 Elsevier Inc. All rights reserved.
Analytics on modern data analytic and data warehouse systems often need to run large complex queries on increasingly complex database schemas. A lot of progress has been made on executing such complex queries using te...
详细信息
ISBN:
(数字)9781665408837
ISBN:
(纸本)9781665408837
Analytics on modern data analytic and data warehouse systems often need to run large complex queries on increasingly complex database schemas. A lot of progress has been made on executing such complex queries using techniques like scale out query processing, hardware accelerators like GPUs and code generation techniques. However, optimization of such queries remains a challenge. Existing optimal solutions either cannot be effectively parallelized, or are inefficient while doing a lot of unnecessary work. In this demonstration, we present our system, GPU-QO, which aims to demonstrate queryoptimization techniques for large analytical queries using GPUs. We first demonstrate Massively parallel Dynamic Programming (MPDP) - a novel queryoptimization technique that can run on GPUs to generate optimal plans in a (massively) parallel and efficient manner. We then showcase IDP2-MPDP and UnionDP - two heuristic techniques, again using GPUs, that can even optimize queries containing 1000s of joins. Furthermore, we compare our techniques with current state-of-the-art solutions, and demonstrate how our techniques can reduce optimization time for optimal solutions by nearly two orders of magnitude and produce much better query plans for heuristics (up to 7x).
Modern data analytical workloads often need to run queries over a large number of tables. An optimal query plan for such queries is crucial for being able to run these queries within acceptable time bounds. However, w...
详细信息
ISBN:
(纸本)9781450392495
Modern data analytical workloads often need to run queries over a large number of tables. An optimal query plan for such queries is crucial for being able to run these queries within acceptable time bounds. However, with queries involving many tables, finding the optimal join order becomes a bottleneck in queryoptimization. Due to the exponential nature of join order optimization, optimizers resort to heuristic solutions after a threshold number of tables. Our objective is two fold: (a) reduce the optimization time for generating optimal plans;and (b) improve the quality of the heuristic solution. In this paper, we propose a new massively parallel algorithm, MPDP, that can efficiently prune the large search space (via a novel plan enumeration technique) while leveraging the massive parallelism offered by modern hardware (Eg: GPUs). When evaluated on real-world benchmark queries with PostgreSQL, MPDP is at least an order of magnitude faster compared to state-of-the-art techniques for large analytical queries. As a result, we can increase the heuristic-fall-back limit from 12 relations to 25 relations with the same time budget in PostgreSQL. Also, to handle queries with even larger number of tables, we augment MPDP to a well-known heuristic, IDP 2 (iterative DP version 2) and a novel heuristic UnionDP. By systematically exploring a much larger search space, these heuristics provides query plans that are up to 7 times cheaper as compared to the state-of-the-art techniques while being faster to compute.
暂无评论