Details
ISBN:
(Print) 9781728199160
Data shuffling can improve the statistical performance of distributed machine learning, but the main obstacle to applying it is the high communication cost. Existing works use coding techniques to reduce this cost, assuming a master-worker storage architecture. However, because it demands unbounded storage on the master, the master-worker architecture is not always practical in common data centers. In this paper, we propose a new coding method for data shuffling in a decentralized storage architecture built on a fat-tree data center network. The method determines which data samples should be encoded together and from where each encoded packet should be sent so as to minimize the communication cost. We develop a real-world testbed to evaluate our method. The results show that our method reduces transmission time by 6.4% over the state-of-the-art coding method and by 27.8% over unicasting.
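To illustrate the basic idea behind coded data shuffling that the existing works build on (a minimal sketch, not the paper's exact method; the sample values and worker roles here are hypothetical): when two workers each hold a sample the other needs, one XOR-coded packet multicast to both can replace two separate unicast transmissions, halving the traffic for that exchange.

```python
# Minimal sketch of XOR-based coded shuffling between two workers.
# Each worker decodes the single multicast packet using the sample
# it already stores locally.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical samples of equal length.
sample_1 = b"sample-A"   # held by worker 1, needed by worker 2
sample_2 = b"sample-B"   # held by worker 2, needed by worker 1

# One coded packet is multicast instead of two unicast packets.
coded = xor_bytes(sample_1, sample_2)

# Each worker cancels out its own sample to recover the other's.
decoded_at_worker_1 = xor_bytes(coded, sample_1)  # recovers sample_2
decoded_at_worker_2 = xor_bytes(coded, sample_2)  # recovers sample_1

assert decoded_at_worker_1 == sample_2
assert decoded_at_worker_2 == sample_1
```

The paper's contribution lies in choosing which samples to encode together and from which node to send each coded packet, so that this kind of saving is realized under a fat-tree topology without a central master.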