We investigate the effect that caches, particularly caches for remote accesses, have on the performance of hash join algorithms. The join is a computationally intensive operation of relational databases and is used in...
详细信息
We investigate the effect that caches, particularly caches for remote accesses, have on the performance of hash join algorithms. The join is a computationally intensive operation of relational databases and is used in many important applications. Thus, there are a considerable number of studies on the parallel hashjoin. However, most of the previous research does not show how cache affects the performance of these algorithms. In this paper, we show the impact and benefits of remote caches (Netcaches) on the overall performance of parallel hash join algorithms running on SMP clusters. Furthermore, we show the effects of these caches on speedup and scalability of these algorithms. Our simulation results leads us to conclude that the execution time of hash join algorithms on modem's multiprocessors, with large local and remote caches, could be reduced up to 70%. Finally, we show results for verifying the big effects of netcaches on scalability of these algorithms.
The goal of this research is to design efficient relational joinalgorithms for large databases on a hypercube multicomputer in which data and processing power are distributed. The Cube Hybrid-hashjoin algorithm was ...
详细信息
The goal of this research is to design efficient relational joinalgorithms for large databases on a hypercube multicomputer in which data and processing power are distributed. The Cube Hybrid-hashjoin algorithm was shown to outperform other algorithms in our previous research. Unfortunately, its performance greatly deteriorates when bucket overflow occurs in the inner relation of the join operation. In this paper, we present the Cube Adaptive-hashjoin algorithm, which is designed to combine the merits of Nested-Loop and Hybrid-hash. The performance of these algorithms are compared through analytical cost modeling. The nonuniform data value distribution of the inner relation is shown to have a greater impact than that of the outer relation. The Cube Adaptive-hashjoin algorithm outperforms the Cube Hybrid-hashjoin algorithm when bucket overflow occurs. In the worst case, this algorithm converges to the Cube Nested-Loop-hashjoin algorithm. When there is no hash table overflow, the Cube Adaptive-hashjoin algorithm converges to the Cube Hybrid-hashjoin algorithm. Since the Cube Adaptive-hashjoin algorithm adapts itself depending on the characteristics of the relations, it is relatively immune to the data distribution. We believe that the Cube Adaptive-hashjoin algorithm should be the algorithm of choice to perform the relational join operator for large databases on the hypercube multicomputer.
Advanced processor architectures have been driving new designs, implementations and optimizations of main-memory hash join algorithms recently. The newly released Intel Xeon Phi many-core processor of the Knights Land...
详细信息
ISBN:
(纸本)9781450349185
Advanced processor architectures have been driving new designs, implementations and optimizations of main-memory hash join algorithms recently. The newly released Intel Xeon Phi many-core processor of the Knights Landing architecture (KNL) embraces interesting hardware features such as many low-frequency out-of-order cores connected on a 2D mesh, and high-bandwidth multi-channel memory (MCDRAM). In this paper, we experimentally revisit the state-of-the-art main-memory hash join algorithms to study how the new hardware features of KNL affect the algorithmic design and tuning as well as to identify the opportunities for further performance improvement on KNL. Our experiments show that, although many existing optimizations are still valid on KNL with proper timing, even the state-of-the-art algorithms have severely underutilized the memory bandwidth and other hardware resources.
暂无评论