ISBN: (print) 0769505686
Networks of Workstations (NOWs) are attractive for parallel processing due to their cost advantage. This paper investigates the performance issues in processing join operations and the inherent tradeoff in the networked workstation environment. Specifically, we look at the performance of the nested-loop join algorithm. Since NOWs are heterogeneous in nature, load sharing is important for their performance. We evaluated the performance of three load sharing methods: static equal, static proportional, and dynamic scheduling with fixed chunk size. The three scheduling methods are evaluated on an experimental heterogeneous network of workstations with non-query background loads. Our experimental results suggest that, when there is no background load, dynamic scheduling outperforms static equal scheduling (by up to 40%) and is marginally better (about 10% better speedup) than static proportional scheduling. When there is dynamic background load on nodes, dynamic scheduling provides substantial performance improvement over static proportional scheduling (up to 50%) and static equal scheduling (up to about 100%). In all cases, selection of an appropriate chunk size is important in dynamic scheduling.
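The three load sharing methods named in the abstract can be compared with a small makespan model. This is a minimal sketch, not the paper's experimental setup: it assumes each node has a constant relative speed, no background load, and a per-chunk `overhead` parameter; the function names and all numbers are hypothetical.

```python
import heapq

def static_equal(total_work, speeds):
    """Makespan when every node receives the same share of the work."""
    share = total_work / len(speeds)
    return max(share / s for s in speeds)

def static_proportional(total_work, speeds):
    """Makespan when each node's share is proportional to its speed."""
    total_speed = sum(speeds)
    return max((total_work * s / total_speed) / s for s in speeds)

def dynamic_fixed_chunk(total_work, speeds, chunk, overhead=0.0):
    """Makespan when idle nodes repeatedly pull fixed-size chunks.

    `overhead` is an assumed fixed cost (in time units) per chunk request.
    """
    # Min-heap of (time at which the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    remaining = total_work
    finish = 0.0
    while remaining > 0:
        t, i = heapq.heappop(free_at)      # earliest idle node
        work = min(chunk, remaining)
        remaining -= work
        done = t + overhead + work / speeds[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish

speeds = [1.0, 2.0, 4.0]   # hypothetical relative node speeds
work = 700.0               # hypothetical number of work units
equal = static_equal(work, speeds)
proportional = static_proportional(work, speeds)
dynamic = dynamic_fixed_chunk(work, speeds, chunk=10.0)
print(equal, proportional, dynamic)
```

In this model a smaller chunk tracks the ideal balance more closely but pays `overhead` more often, which is one way to read the abstract's remark that chunk size matters.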
Many parallel join algorithms have been proposed for parallel relational database systems. Among them, the parallel hash-based join algorithm (PHJA) has been found to be superior to other join algorithms for the uniform distribution of data. In real databases, it is often found that certain values for a given attribute occur more frequently than other values. This phenomenon is referred to as data skew. An efficient algorithm, called the skew resolution join algorithm, is proposed for parallel join operations with skewed data. A methodology is proposed for partitioning relations evenly across all processors in a parallel database system. Using the histogram equalization technique, the framework transforms the histogram of skewed data into a uniform distribution that corresponds to the relative power of the node processors in the system. The proposed algorithm exhibits better performance than the conventional PHJA in the presence of data skew, with negligible overhead in the absence of data skew.
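One plausible reading of the histogram equalization step is sketched below: build a histogram of the join attribute, then pick range boundaries so that each node's tuple count is roughly proportional to its relative power. The abstract does not give the algorithm's details, so the function names, the range-partitioning scheme, and the sample data are assumptions made for illustration.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate

def equalized_boundaries(keys, powers):
    """Return len(powers) - 1 upper-bound keys, one per node except the last.

    Node i is meant to receive about powers[i] / sum(powers) of the tuples.
    """
    hist = sorted(Counter(keys).items())   # (value, count) in key order
    total = len(keys)
    total_power = sum(powers)
    # Cumulative tuple-count target at the end of each node's range.
    targets = [total * c / total_power for c in accumulate(powers)]
    bounds, running, node = [], 0, 0
    for value, count in hist:
        running += count
        # A single very frequent value may satisfy several targets at once;
        # it cannot be split here, so later nodes may get empty ranges.
        while node < len(powers) - 1 and running >= targets[node]:
            bounds.append(value)
            node += 1
    return bounds

def partition_loads(keys, bounds, n_nodes):
    """Count the tuples each node receives under the given boundaries."""
    loads = [0] * n_nodes
    for k in keys:
        loads[bisect_left(bounds, k)] += 1
    return loads

# Skewed sample: the value 1 accounts for half of all tuples.
skewed = [1] * 6 + [2] * 2 + [3] * 2 + [4] * 2
bounds = equalized_boundaries(skewed, powers=[1, 1])
print(bounds, partition_loads(skewed, bounds, 2))
```

With equal powers, the frequent value gets a node to itself while the remaining values share the other node, so both nodes hold the same number of tuples.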
Shared-nothing multiprocessor architecture is known to be more scalable for supporting very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this hardware structure is very sensitive to skew in the tuple distribution. Unless the parallel hash join algorithm includes some dynamic load balancing mechanism, the skew effect can severely degrade system performance. In this paper, we investigate this issue; in particular, three parallel hash join algorithms are presented. We implement a simulator to study the effectiveness of these schemes. The simulation model is validated by comparing the simulation results to those produced by an actual implementation of the algorithms running on a multiprocessor system. Our performance study indicates that a naive approach is not able to provide tangible savings. However, carefully designed strategies can offer substantial improvement over conventional techniques for a wide range of skew conditions.
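The sensitivity to skew can be seen with a toy hash-partitioning example. This is only an illustrative sketch: it assumes integer join keys, uses `key % n_nodes` as a stand-in hash function, and does not reproduce any of the three algorithms the abstract mentions; all names and data are hypothetical.

```python
def hash_partition_loads(keys, n_nodes):
    """Count the tuples sent to each node when partitioning by key modulo."""
    loads = [0] * n_nodes
    for k in keys:
        loads[k % n_nodes] += 1   # stand-in hash for integer keys
    return loads

def imbalance(loads):
    """Ratio of the heaviest node's load to the average load (1.0 is ideal)."""
    return max(loads) / (sum(loads) / len(loads))

uniform = list(range(1000))          # every key appears once
skewed = uniform + [7] * 600         # key 7 appears 601 times

uniform_loads = hash_partition_loads(uniform, 4)
skewed_loads = hash_partition_loads(skewed, 4)
print(uniform_loads, imbalance(uniform_loads))
print(skewed_loads, imbalance(skewed_loads))
```

Because every copy of a key hashes to the same node, the node that owns the frequent key ends up with far more tuples than the others, and in a shared-nothing system that node's local join tends to bound the overall completion time.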