In recent years, parallel processing and optimization algorithms for object-oriented databases have drawn considerable attention from the database research community. Two general types of algorithms have been introduced: hybrid-hash, pointer-based algorithms and multiwavefront algorithms. In this work, we quantitatively analyze the two algorithms and develop analytical formulas to capture the main performance features of the two approaches. We study their performance in three application environments: the first is characterized by large databases having many object classes, each of which contains a large number of instances; the second is characterized by large databases having many object classes, each of which contains a relatively small number of instances; and the third is characterized by large databases having object classes of varying sizes. A horizontal data partitioning strategy, in which each object class is partitioned into horizontal segments stored across all processors, is used in the first environment. A class-per-node assignment strategy, in which the instances of each object class are stored in a single processor, is used in the second environment. In the third environment, object classes are partitioned horizontally and assigned to a varying number of processors depending on their sizes. Our analytical results show that the multiwavefront algorithm has three distinguishing features that contribute to its better performance: 1) a two-phase processing strategy, 2) vertical partitioning of horizontal segments, and 3) dynamic determination of the "collision point" in multiwavefront propagations, which results in an optimized query execution plan. We show that if these features are adopted by a hybrid-hash, pointer-based algorithm, its performance will be comparable with that of the multiwavefront algorithm, because the difference in CPU time between them is negligible. The assumed computing environment is a network of workstations having a sha
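
To make the three data placement strategies concrete, the following is a minimal Python sketch, not the implementation analyzed in this work. It assumes a database is a dictionary mapping a class name to a list of instances, and that processors are numbered from 0. The function names, the round-robin assignment, and the `segment_size` parameter are all hypothetical choices made for illustration; the abstract does not specify how segments are sized or ordered.

```python
from collections import defaultdict


def horizontal_partition(classes, num_procs):
    """Environment 1 (assumed round-robin): spread each class over all processors."""
    placement = defaultdict(list)
    for cls, instances in classes.items():
        for i, obj in enumerate(instances):
            placement[i % num_procs].append((cls, obj))
    return dict(placement)


def class_per_node(classes, num_procs):
    """Environment 2: keep all instances of a class on a single processor."""
    placement = defaultdict(list)
    for k, (cls, instances) in enumerate(sorted(classes.items())):
        for obj in instances:
            placement[k % num_procs].append((cls, obj))
    return dict(placement)


def size_proportional(classes, num_procs, segment_size):
    """Environment 3: larger classes are spread over more processors.

    segment_size is a hypothetical tuning knob: a class with n instances
    is given ceil(n / segment_size) processors, capped at num_procs.
    """
    placement = defaultdict(list)
    next_proc = 0
    for cls, instances in sorted(classes.items()):
        wanted = -(-len(instances) // segment_size)  # ceiling division
        width = min(num_procs, max(1, wanted))
        procs = [(next_proc + j) % num_procs for j in range(width)]
        for i, obj in enumerate(instances):
            placement[procs[i % width]].append((cls, obj))
        next_proc = (next_proc + width) % num_procs
    return dict(placement)


def procs_of(placement, cls):
    """Return the set of processors that hold at least one instance of cls."""
    return {p for p, items in placement.items() if any(c == cls for c, _ in items)}
```

With four processors, a class of eight instances and a class of two instances, and `segment_size=4`, the third strategy places the larger class on two processors and the smaller one on a single processor, whereas the first strategy spreads the larger class over all four.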