ISBN: 9781450347563 (print)
This paper investigates the partition skew problem in the reduce phase of MapReduce jobs. Prior work on Hadoop addresses this problem in either an offline or an online manner. The offline approach is heuristics-based: it must wait for the map tasks to complete and incurs computational overhead to estimate partition sizes. The other approach redistributes overloaded tasks across other nodes, which requires extra split and merge operations. These extra operations, in turn, degrade system performance. In this paper, we propose Aegeus, an online, streaming-based skew mitigation approach for MapReduce jobs that avoids both the long waiting time and the extra operations otherwise needed to address the skew problem. Aegeus predicts the partition sizes produced by each map task and creates resource specifications according to these requirements even before the map phase completes. Hence, the proposed system can create containers sized to the workload, which improves overall job completion time and system performance. We evaluated Aegeus on benchmark datasets and compared its performance with that of naive Hadoop. In our experiments, Aegeus outperforms naive Hadoop by 42%, improving the overall performance of both the application and the system.
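The abstract does not specify how Aegeus predicts partition sizes. The following is a minimal sketch of the general idea under stated assumptions, not the paper's predictor. It assumes each running map task periodically reports two things: the fraction of its input split consumed so far and the bytes it has emitted for each reduce partition. It also assumes that output grows linearly with input. Under these assumptions, final partition sizes can be extrapolated and turned into per-reducer container memory requests before the map phase ends. The names `MapProgress`, `predict_partition_sizes`, and `container_memory_mb`, along with the parameters `overhead`, `min_mb`, and `increment_mb`, are hypothetical.

```python
"""Hypothetical sketch: online partition-size prediction for reduce-side
container sizing. The linear extrapolation and all names are illustrative
assumptions, not the Aegeus algorithm."""
import math
from dataclasses import dataclass


@dataclass
class MapProgress:
    """Progress snapshot reported by one running map task (hypothetical)."""
    fraction_done: float        # share of the input split processed so far, in (0, 1]
    partition_bytes: list[int]  # intermediate bytes emitted so far, per reduce partition


def predict_partition_sizes(snapshots: list[MapProgress]) -> list[float]:
    """Extrapolate each map task's output linearly to completion, then sum per partition."""
    num_partitions = len(snapshots[0].partition_bytes)
    totals = [0.0] * num_partitions
    for snap in snapshots:
        if not 0.0 < snap.fraction_done <= 1.0:
            raise ValueError("fraction_done must be in (0, 1]")
        # Assumption: a task's output grows linearly with the input it has consumed.
        scale = 1.0 / snap.fraction_done
        for p, emitted in enumerate(snap.partition_bytes):
            totals[p] += emitted * scale
    return totals


def container_memory_mb(predicted_bytes: list[float],
                        min_mb: int = 1024,
                        overhead: float = 1.5,
                        increment_mb: int = 512) -> list[int]:
    """Turn each predicted partition size into a container memory request in MB.

    overhead: assumed ratio of reducer memory to partition size.
    Requests are rounded up to a multiple of increment_mb and never fall below min_mb.
    """
    requests = []
    for size in predicted_bytes:
        needed_mb = size * overhead / (1024 * 1024)
        rounded = math.ceil(needed_mb / increment_mb) * increment_mb
        requests.append(max(min_mb, rounded))
    return requests


if __name__ == "__main__":
    MB = 1024 * 1024
    # Two map tasks, half and a quarter done; partition 1 is clearly skewed.
    snaps = [
        MapProgress(0.5, [100 * MB, 600 * MB, 50 * MB]),
        MapProgress(0.25, [25 * MB, 400 * MB, 25 * MB]),
    ]
    sizes = predict_partition_sizes(snaps)
    print(container_memory_mb(sizes))  # prints [1024, 4608, 1024]
```

In this example the skewed partition receives a larger container than the others, and that request can be issued while the map tasks are still running. A real system would likely replace the naive linear extrapolation with a better estimator, such as one based on sampled key distributions.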