Details
ISBN (digital): 9789819708628
ISBN (print): 9789819708611; 9789819708628
MapReduce is a programming framework for processing and analyzing large volumes of data in a distributed computing environment. Despite its capabilities, it is vulnerable to silent data corruption during task execution, which can yield inaccurate results. Ensuring fault tolerance in the MapReduce framework while minimizing communication overhead is a considerable challenge. This study presents CDCFT (Coded Distributed Computing Fault Tolerance), a novel approach to fault tolerance within the MapReduce paradigm that combines the strengths of TMR (Triple Modular Redundancy) and CDC (Coded Distributed Computing). By applying task-level TMR with a voting mechanism, CDCFT defends against silent data corruption. To reduce communication further, CDCFT relays intermediate messages through intra-group broadcasts and pairs a finely tuned node grouping with a strategic data and task allocation procedure. Through theoretical analysis, we establish that CDCFT's communication overhead during the Shuffle stage is lower than that of traditional CDC methods that rely on triple modular redundancy. Experimental results show that CDCFT substantially reduces overall communication overhead and execution time compared with conventional fault-tolerant methods.
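The task-level TMR and voting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the CDCFT implementation, which the abstract does not detail. It only shows generic majority voting over three replica outputs of one map task. The names `tmr_vote`, `run_map_task_with_tmr`, `word_length`, and `corrupted_word_length` are hypothetical and introduced purely for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def tmr_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value reported by at least two of the three replicas."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        # All three replicas disagree, so no majority exists.
        raise RuntimeError("no majority among replica outputs")
    return value


def run_map_task_with_tmr(
    replicas: Sequence[Callable[[str], Hashable]], record: str
) -> Hashable:
    """Run the same map task on three replicas and vote on the results."""
    return tmr_vote([replica(record) for replica in replicas])


def word_length(record: str) -> int:
    # Hypothetical map function: a correct replica.
    return len(record)


def corrupted_word_length(record: str) -> int:
    # Hypothetical faulty replica: simulates silent corruption by
    # flipping the lowest bit of the correct result.
    return len(record) ^ 1


result = run_map_task_with_tmr(
    [word_length, corrupted_word_length, word_length], "shuffle"
)
print(result)
```

In this sketch a single corrupted replica is outvoted by the two correct ones. If all three outputs differ, the vote fails and the task would have to be re-executed. How CDCFT groups nodes and encodes Shuffle-stage messages is outside what this example covers.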