Collection join queries are join queries based on collection attributes (i.e., non-atomic attributes), which are common in object-oriented databases. We have identified three kinds of collection join queries, namely collection-equijoin, collection-intersect join, and sub-collection join. In this paper, we propose parallel join algorithms for these three collection join query types based on a combination of sort and hash methods, which we call parallel sort-hash collection join algorithms. The proposed join algorithms play an important role in parallel object-oriented query processing because they outperform the conventional join methods, which usually take the form of relational division, and because they avoid the inefficiency of processing the original join predicate directly. In our implementation of these algorithms on a shared-memory machine, we show that combining sort and hash methods performs better than conventional sort-merge and nested-loop based parallel join processing.
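The three join predicates named in this abstract can be illustrated with a small sequential sketch. This is not the paper's parallel algorithm: the function names, the `(id, collection)` tuple layout, and the choice to treat every collection as a set are all assumptions made for illustration. The equijoin sorts each collection into a canonical form and then hashes it, which is one plausible reading of "sort-hash"; the sub-collection join is shown as a plain nested loop purely for clarity.

```python
from collections import defaultdict

def collection_equijoin(r, s):
    """Pair (rid, sid) whose collections hold the same elements (set semantics assumed)."""
    table = defaultdict(list)
    for sid, coll in s:
        # Sort step: canonicalise the collection so equal sets hash identically.
        table[tuple(sorted(set(coll)))].append(sid)
    # Hash step: probe with the canonical form of each collection in r.
    return [(rid, sid) for rid, coll in r
            for sid in table.get(tuple(sorted(set(coll))), [])]

def collection_intersect_join(r, s):
    """Pair (rid, sid) whose collections share at least one element."""
    index = defaultdict(set)  # element -> ids in s containing it
    for sid, coll in s:
        for elem in coll:
            index[elem].add(sid)
    result = set()
    for rid, coll in r:
        for elem in coll:
            for sid in index.get(elem, ()):
                result.add((rid, sid))
    return sorted(result)

def sub_collection_join(r, s):
    """Pair (rid, sid) where the r collection is a subset of the s collection."""
    # Nested loop for readability only; not the method the abstract advocates.
    return [(rid, sid) for rid, rc in r for sid, sc in s if set(rc) <= set(sc)]
```

For example, with `r = [(1, [3, 1, 2]), (3, [1, 2])]` and `s = [("a", [1, 2, 3])]`, only object 1 matches `"a"` under the equijoin, while both objects match under the intersect and sub-collection joins.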
In this paper, parallelization models for path expression queries are studied. Path expression queries involve multiple classes along aggregation/association hierarchies. The parallelization models for path expression queries are "inter-object parallelization" and "inter-class parallelization". Inter-object parallelization exploits the associativity within complex objects, whereas inter-class parallelization relies on process independence. The behaviours of these parallelization models are described in terms of analytical models. A performance evaluation is also carried out to confirm the results of the quantitative analysis. (C) 1999 Elsevier Science Inc. All rights reserved.
ISBN: 3540644431 (print)
As data volume and query processing loads increase, companies that provide information retrieval services are turning to high-performance parallel computing, storage, and searching. In this paper we present a new paradigm of semantic parallelism dedicated to documentary databases. Based on existing parallel database techniques, our approach uses particular features of documentary databases to retrieve semantically relevant information more rapidly and efficiently. It can thus greatly alleviate both the information overload and the vocabulary problem of information retrieval. Extensive simulation results confirm the efficiency of our approach.
In this paper, we explore an approach of interleaving a bushy execution tree with hash filters to improve the execution of multi-join queries. Similar to semi-joins in distributed query processing, hash filters can be applied to eliminate non-matching tuples from joining relations before the execution of a join, thus reducing the join cost. Note that hash filters built in different execution stages of a bushy tree can have different costs and effects. The effect of hash filters is evaluated first. Then, an efficient scheme to determine an effective sequence of hash filters for a bushy execution tree is developed, where hash filters are built and applied based on the join sequence specified in the bushy tree so that the reduction effect is optimized and the associated cost is minimized. Various schemes using hash filters are implemented and evaluated via simulation. It is experimentally shown that the application of hash filters is in general a very powerful means to improve the execution of multi-join queries, and the improvement becomes more prominent as the number of relations in a query increases.
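The basic mechanism of a hash filter, as described in this abstract, can be sketched as follows. This is a minimal single-join illustration, not the paper's scheme for sequencing filters over a bushy tree. The function names, the tuple layout, and the filter size `num_bits` are assumptions. The filter may let some non-matching tuples through (false positives) but never drops a tuple that would join.

```python
def build_hash_filter(tuples, key, num_bits=1024):
    """Mark the hash slot of every join-key value seen in one relation."""
    bits = bytearray(num_bits)  # one byte per slot, for simplicity
    for t in tuples:
        bits[hash(t[key]) % num_bits] = 1
    return bits

def apply_hash_filter(tuples, key, bits):
    """Keep only tuples whose join key hashes to a marked slot."""
    n = len(bits)
    return [t for t in tuples if bits[hash(t[key]) % n]]

def hash_join(left, right, left_key, right_key):
    """Plain in-memory hash join: build on left, probe with right."""
    table = {}
    for l in left:
        table.setdefault(l[left_key], []).append(l)
    return [(l, r) for r in right for l in table.get(r[right_key], [])]
```

A typical use would build the filter from the smaller relation, apply it to the larger one, and then join the reduced input: `hash_join(orders, apply_hash_filter(items, 0, build_hash_filter(orders, 0)), 0, 0)` returns the same pairs as joining `items` directly, while probing with fewer tuples.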
The pipelined execution of multijoin queries in a multiprocessor-based database system is explored in this paper. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join for processing. The execution of a query is usually denoted by a query execution tree. To improve the execution of pipelined hash joins, an innovative approach to query execution tree selection is proposed that exploits segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment, and then, in light of the model, develop heuristic schemes to determine the query execution plan based on a segmented right-deep tree so that the query can be efficiently executed. As shown by our simulation, the proposed approach, without incurring additional overhead on plan execution, offers more flexibility in query plan generation and can lead to query plans that perform better than those achievable by previous schemes using right-deep trees.
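The pipelining idea in this abstract, where a join forwards each result to the next join before it has finished, can be sketched with Python generators. This models a single right-deep pipeline segment on one processor; the segmentation heuristics and the analytical model from the paper are not reproduced. Rows are assumed to be dictionaries, and `build_table`, `probe`, and `pipeline_segment` are hypothetical names.

```python
def build_table(relation, key):
    """Build phase: hash one inner relation on its join attribute."""
    table = {}
    for row in relation:
        table.setdefault(row[key], []).append(row)
    return table

def probe(stream, table, key):
    """Probe phase: emit each joined row as soon as its probe row arrives."""
    for row in stream:
        for match in table.get(row[key], []):
            yield {**match, **row}

def pipeline_segment(probe_relation, build_specs):
    """Chain the probes of one segment; build_specs lists (relation, key), first join first."""
    # All hash tables of the segment are built before any probing starts.
    tables = [(build_table(rel, key), key) for rel, key in build_specs]
    stream = iter(probe_relation)
    for table, key in tables:
        stream = probe(stream, table, key)  # output of one join feeds the next
    return stream
```

Because each stage is a generator, no intermediate join result is materialized: a probe row flows through every hash table in the segment before the next probe row is read.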
With advances in microprocessor, memory, and communication technology, it is economically feasible to develop a parallel database computer system to improve the performance of database systems. Relations in such an environment are usually partitioned and distributed across computing units. To achieve optimal performance, it is essential for each unit to have a perfectly balanced load (i.e., an identical amount of data). However, fragment sizes may vary due to insertions into and deletions from a relation. To retain good performance, the system needs to periodically rebalance the load of the processors by redistributing data among computing units. Traditionally, the redistribution is performed by reshuffling tuples among processors through a relation repartitioning (e.g., rehashing) process. The computation of this process is at the tuple level. In this paper, we present a self-adjusting data distribution scheme that balances computer workload at the cell level (a coarser grain than the tuple) during query processing to minimize redistribution cost. The entire scheme is built on top of the popular grid file structure. The adaptivity of the scheme and its relevant features are discussed. The cost of load rebalancing is estimated. The results show that, under our assumptions, it is always beneficial to rebalance computer workload before performing a join on skewed data.
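To make the contrast between tuple-level and cell-level redistribution concrete, the following sketch moves whole cells between processors with a simple greedy rule. It is not the paper's grid-file scheme: the greedy policy, the function name `rebalance`, and the dictionary-based inputs are assumptions chosen only to show that rebalancing decisions can be made per cell without rehashing individual tuples.

```python
def rebalance(assignment, cell_sizes, processors):
    """Greedily move whole cells from the most to the least loaded processor.

    assignment: dict cell -> processor (updated in place)
    cell_sizes: dict cell -> number of tuples in that cell
    Returns the list of (cell, source, destination) moves performed.
    """
    load = {p: 0 for p in processors}
    for cell, p in assignment.items():
        load[p] += cell_sizes[cell]
    moves = []
    while True:
        src = max(processors, key=lambda p: load[p])
        dst = min(processors, key=lambda p: load[p])
        gap = load[src] - load[dst]
        # Moving a cell shrinks the imbalance only if its size is below the gap.
        candidates = [c for c, p in assignment.items()
                      if p == src and 0 < cell_sizes[c] < gap]
        if not candidates:
            return moves
        # Prefer the cell whose size is closest to half the gap.
        cell = min(candidates, key=lambda c: abs(cell_sizes[c] - gap / 2))
        assignment[cell] = dst
        load[src] -= cell_sizes[cell]
        load[dst] += cell_sizes[cell]
        moves.append((cell, src, dst))
```

Each move strictly reduces the sum of squared loads, so the loop terminates. The number of moves is bounded by cell-level decisions rather than by the number of tuples, which is the cost advantage the abstract points to.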
In this paper, the performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results of this study are a step toward the design of a query optimization strategy that is fit for parallel execution of complex queries. Among other things, synchronization issues are identified as limiting the performance gain from parallelism. A new hash-join algorithm is introduced that has fewer synchronization constraints than the known hash-join algorithms. Also, the behavior of individual join operations in a join-tree is studied in a simulation experiment. The results show that the introduced Pipelining hash-join algorithm yields better performance for multi-join queries. The format of the optimal join-tree appears to depend on the size of the operands of the join: a multi-join between small operands performs best with a bushy schedule; larger operands are better off with a linear schedule. The results from the simulation study are confirmed with an analytic model for dataflow query execution.
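The abstract does not spell out how the Pipelining hash-join reduces synchronization. One well-known design with that property is a symmetric hash join, which keeps a hash table for each operand and produces output as soon as a tuple from either side arrives, so neither operand has to be fully read before the other is probed. The sketch below shows that general technique; treating it as representative of the paper's algorithm is an assumption, and the `(side, key, payload)` input format is hypothetical.

```python
def symmetric_hash_join(interleaved):
    """Join two inputs that arrive interleaved as (side, key, payload) triples.

    side is "L" or "R". Results are yielded as (key, left_payload, right_payload)
    as soon as both matching tuples have been seen.
    """
    tables = {"L": {}, "R": {}}
    for side, key, payload in interleaved:
        other = "R" if side == "L" else "L"
        # Insert the new tuple into its own side's table ...
        tables[side].setdefault(key, []).append(payload)
        # ... then probe the other side's table for partners seen so far.
        for match in tables[other].get(key, []):
            if side == "L":
                yield (key, payload, match)
            else:
                yield (key, match, payload)
```

Because every tuple is inserted before it probes, each matching pair is emitted exactly once, by whichever of the two tuples arrives later.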
ISBN: 0818622954 (print)