The emergence of modern data-intensive applications requires sophisticated database techniques for processing advanced types of user queries on massive data. In this paper, we study one such new type of query, called progressive queries. A progressive query is defined as a set of inter-related and incrementally formulated step-queries. A step-query in a progressive query PQ is specified on the fly based on the results of previously executed step-queries in PQ. Hence, a progressive query cannot be fully formulated before its execution, which raises challenges for its processing and optimization. We introduce a query model to characterize different types of progressive queries. We then present a new index structure, called the collective index, to efficiently process progressive queries. The collective index technique incrementally evaluates step-queries via dynamically maintained member indexes. Utilizing the special structure of a collective index, the (member) indexes on the input relation(s) of a step-query are efficiently transformed into indexes on the result relation. Algorithms to efficiently process single-input (unary) linear and multiple-input (join) linear progressive queries based on the collective index are presented. Our experimental results show that the proposed collective index technique outperforms conventional query processing methods in processing progressive queries.
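The idea of deriving indexes on a step-query's result from the indexes on its input, rather than rebuilding them, can be illustrated with a minimal sketch. This is not the collective index structure from the paper: the plain dictionary indexes, the equality-only selection, and the names `build_index` and `step_select` are all assumptions made for illustration.

```python
def build_index(rows, attr):
    """Build a simple value -> [row ids] index on one attribute (illustrative only)."""
    index = {}
    for rid, row in enumerate(rows):
        index.setdefault(row[attr], []).append(rid)
    return index


def step_select(rows, indexes, attr, value):
    """Evaluate the step-query 'attr == value' through the index on attr,
    then derive indexes on the result from the input indexes."""
    kept = indexes[attr].get(value, [])            # input row ids that qualify
    new_id = {old: new for new, old in enumerate(kept)}
    result = [rows[old] for old in kept]

    result_indexes = {}
    for a, idx in indexes.items():
        derived = {}
        for v, rids in idx.items():
            # Keep only surviving rows, renumbered to positions in the result.
            surviving = [new_id[r] for r in rids if r in new_id]
            if surviving:
                derived[v] = surviving
        result_indexes[a] = derived
    return result, result_indexes


# Hypothetical data: each step-query is chosen after seeing the previous result.
rows = [
    {"city": "A", "year": 2001},
    {"city": "B", "year": 2001},
    {"city": "A", "year": 2002},
]
indexes = {a: build_index(rows, a) for a in ("city", "year")}
r1, i1 = step_select(rows, indexes, "city", "A")   # first step-query
r2, i2 = step_select(r1, i1, "year", 2002)         # second step, on r1
```

Each step reuses the indexes carried over from the previous step, so the second step-query never scans `r1` directly.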
Much work has been accomplished in the past on the subject of parallel query processing and optimization in parallel relational database systems; however, little work on the same subject has been done in parallel object-oriented database systems. Since the object-oriented view of a database and its processing are quite different from those of a relational system, it can be expected that techniques of parallel query processing and optimization for the latter may differ from those for the former. In this paper, we present a general framework for parallel object-oriented database systems and several implemented query processing and optimization strategies together with some performance evaluation results. In this work, multiwavefront algorithms are used in query processing to allow a higher degree of parallelism than the traditional tree-based query processing. Four optimization strategies, which are designed specifically for the multiwavefront algorithms and for the optimization of single as well as multiple queries, are introduced. The query processing algorithms and optimization strategies have been implemented on a parallel computer, nCUBE2, and the results of a performance evaluation are presented in this paper. The main emphases and the intended contributions of this paper are (1) data partitioning, query processing and optimization strategies suitable for parallel OODBMSs, (2) the implementation of the multiwavefront algorithms and optimization strategies, and (3) the performance evaluation results.
Decision support queries typically involve several joins, a grouping with aggregation, and/or sorting of the result tuples. We propose two new classes of query evaluation algorithms that can be used to speed up the execution of such queries. The algorithms are based on (1) early sorting and (2) early partitioning, or a combination of both. The idea is to push the sorting and/or the partitioning to the leaves, i.e., the base relations, of the query evaluation plans (QEPs) and thereby avoid sorting or partitioning large intermediate results generated by the joins. Both early sorting and early partitioning are used in combination with hash-based algorithms for evaluating the join(s) and the grouping. To enable early sorting, the sort order generated at an early stage of the QEP is retained through an arbitrary number of so-called order-preserving hash joins. To make early partitioning applicable to a large class of decision support queries, we generalize the so-called hash teams proposed by Graefe et al. [GBC98]. Hash teams make it possible to perform several hash-based operations (join and grouping) on the same attribute in one pass without repartitioning intermediate results. Our generalization consists of indirectly partitioning the input data. Indirect partitioning means partitioning the input data on an attribute that is not directly needed for the next hash-based operation, and it involves the construction of bitmaps to approximate the partitioning for the attribute that is needed in the next hash-based operation. Our performance experiments show that such QEPs based on early sorting, early partitioning, or both in combination perform significantly better than conventional strategies for many common classes of decision support queries.
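The basic hash-team idea, partitioning the base relations once and then running both the join and the grouping inside each partition, can be sketched as follows. This is a simplified in-memory illustration, not the algorithm of Graefe et al. or the generalized version in the abstract; the relation layout, the attribute names `cust_id`, `name`, `amount`, and the function names are assumptions. Indirect partitioning with bitmaps is not shown.

```python
from collections import defaultdict


def partition(rows, key, n_parts):
    """Split rows into n_parts buckets by hashing the partitioning attribute."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def team_join_group(orders, customers, n_parts=4):
    """Join orders with customers on cust_id and sum amount per cust_id.
    Each input is partitioned exactly once; the join output is never repartitioned."""
    order_parts = partition(orders, "cust_id", n_parts)
    cust_parts = partition(customers, "cust_id", n_parts)

    totals = {}
    for o_part, c_part in zip(order_parts, cust_parts):
        # Build side of the hash join for this partition.
        names = {c["cust_id"]: c["name"] for c in c_part}
        # Probe and aggregate in the same pass: matching rows share the
        # partition because join and grouping use the same attribute.
        sums = defaultdict(int)
        for o in o_part:
            if o["cust_id"] in names:
                sums[o["cust_id"]] += o["amount"]
        for cid, total in sums.items():
            totals[cid] = (names[cid], total)
    return totals


# Hypothetical sample data.
orders = [
    {"cust_id": 1, "amount": 10},
    {"cust_id": 2, "amount": 5},
    {"cust_id": 1, "amount": 7},
    {"cust_id": 3, "amount": 4},   # no matching customer, dropped by the join
]
customers = [{"cust_id": 1, "name": "a"}, {"cust_id": 2, "name": "b"}]
result = team_join_group(orders, customers)
```

Because the grouping attribute equals the join attribute, every group is complete within a single partition, which is why no second partitioning step is needed.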
A federated database system (FDBS) is a collection of cooperating database systems that are autonomous and possibly heterogeneous. In this paper, we define a reference architecture for distributed database management systems from system and schema viewpoints and show how various FDBS architectures can be developed. We then define a methodology for developing one of the popular architectures of an FDBS. Finally, we discuss critical issues related to developing and operating an FDBS.