文献详情 >A dynamic re-partitioning stra... 收藏

A dynamic re-partitioning strategy based on the distribution of key in Spark

作者机构：1School of computer science and technology Chongqing University of Posts and Telecommunications Chongqing 400065 China 2Chongqing Engineering Research Center of Mobile Internet Data Application Chongqing 400065 China

出版物：《AIP Conference Proceedings》

年卷期：2018年第1967卷第1期

学科分类：07[理学] 0702[理学-物理学]

摘要：Spark is a memory-based distributed data processing framework, has the ability of processing massive data and becomes a focus in Big Data. But the performance of Spark Shuffle depends on the distribution of data. The naive Hash partition function of Spark can not guarantee load balancing when data is skewed. The time of job is affected by the node which has more data to process. In order to handle this problem, dynamic sampling is used. In the process of task execution, histogram is used to count the key frequency distribution of each node, and then generate the global key frequency distribution. After analyzing the distribution of key, load balance of data partition is achieved. Results show that the Dynamic Re-Partitioning function is better than the default Hash partition, Fine Partition and the Balanced-Schedule strategy, it can reduce the execution time of the task and improve the efficiency of the whole cluster.

本地馆藏 | 借阅须知 | 我要预约

已订购，未入库

sda

目录详情 | 试阅读 |

读者评论与其他读者分享你的观点

学校读者

用户名:未登录

我的评分

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

看过本文的还看了

相关文献

该作者的其他文献

CADAL相关文献

A dynamic re-partitioning strategy based on the distribution of key in Spark

读者评论与其他读者分享你的观点

请选择收藏分类：

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

看过本文的还看了

相关文献

该作者的其他文献

CADAL相关文献

A dynamic re-partitioning strategy based on the distribution of key in Spark

读者评论 与其他读者分享你的观点

请选择收藏分类： 新增自定义分类 确定 取消

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

读者评论与其他读者分享你的观点

请选择收藏分类：