ISBN (print): 9781450392495
Operator fusion is essential and widely used in a large number of matrix computation systems in science and industry. Existing distributed operator fusion methods focus only on either low communication cost, at the risk of running out of memory, or large-scale processing, at the price of high communication cost. We propose a distributed elastic fused operator called Cuboid-based Fused Operator (CFO) that achieves both low communication cost and large-scale processing. We also propose a novel fusion plan generator called Cuboid-based Fusion plan Generator (CFG) that finds a fusion plan fusing more operators, including large-scale matrix multiplication. We implement a fast distributed matrix computation engine called FuseME by seamlessly integrating CFO and CFG. FuseME outperforms state-of-the-art systems, including SystemDS, by orders of magnitude.
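The abstract does not spell out how CFO performs the fusion, but the general idea behind operator fusion, evaluating an expression block by block so that intermediate matrices are never fully materialized, can be illustrated with a minimal single-machine sketch. The sketch below assumes NumPy; the names `unfused` and `fused_blockwise` are hypothetical and it is not FuseME's CFO/CFG implementation.

```python
import numpy as np

def unfused(X, Y, Z):
    """Materializes the full intermediate Y @ Z before reducing it."""
    T = Y @ Z                 # intermediate matrix, same shape as X
    return np.sum(X * T)

def fused_blockwise(X, Y, Z, block_rows=256):
    """Fused evaluation: each row block of the intermediate is produced,
    consumed, and discarded, so peak memory stays bounded by the block size."""
    acc = 0.0
    for start in range(0, X.shape[0], block_rows):
        end = start + block_rows
        partial = Y[start:end] @ Z            # only one block of the intermediate
        acc += np.sum(X[start:end] * partial)
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((1000, 300))
    Y = rng.random((1000, 500))
    Z = rng.random((500, 300))
    assert np.isclose(unfused(X, Y, Z), fused_blockwise(X, Y, Z))
```

In a distributed setting the same trade-off appears as shipping and storing full intermediates versus fusing the producing and consuming operators inside one task, which is the setting the paper addresses.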
ISBN (print): 9781450356435
Matrix computation, in particular matrix multiplication, is time-consuming but essential and widely used in a large number of applications in science and industry. Existing distributed matrix multiplication methods focus only on either low communication cost (i.e., high performance), at the risk of running out of memory, or large-scale processing, at the price of high communication overhead. We propose a distributed elastic matrix multiplication method called CuboidMM that achieves both high performance and large-scale processing. We also propose a GPU acceleration method that can be combined with CuboidMM. CuboidMM partitions matrices into cuboids to optimize the network communication cost while taking the memory usage per task into account, and the GPU acceleration method partitions a cuboid into subcuboids to optimize the PCI-E communication cost while taking GPU memory usage into account. We implement a fast and elastic matrix computation engine called DistME by integrating CuboidMM with GPU acceleration on top of Apache Spark. Through extensive experiments, we demonstrate that CuboidMM and DistME significantly outperform the state-of-the-art methods and systems, respectively, in terms of both performance and data size.
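As a rough illustration of cuboid-style partitioning, the sketch below splits the i/j/k index space of a matrix multiplication into p x q x r blocks and sums the partial products along the k dimension. It is a single-machine NumPy stand-in for what CuboidMM does on Spark; the function name `cuboid_matmul` and the fixed partition counts are assumptions, since the real method chooses the cuboid shape from memory and network cost, which this sketch does not model.

```python
import numpy as np
from itertools import product

def cuboid_matmul(A, B, p=2, q=2, r=2):
    """Multiply A (m x k) by B (k x n) by splitting the i/j/k index space into
    p x q x r cuboids. Each cuboid reads one block of A and one block of B and
    produces a partial result for output block (i, j); partials along k are
    summed. The cuboid shape trades per-task memory against data shipped."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    i_cuts = np.array_split(np.arange(m), p)
    j_cuts = np.array_split(np.arange(n), q)
    l_cuts = np.array_split(np.arange(k), r)

    C = np.zeros((m, n))
    for ib, jb, lb in product(i_cuts, j_cuts, l_cuts):
        # one "task": a block of A, a block of B, one partial output block
        C[np.ix_(ib, jb)] += A[np.ix_(ib, lb)] @ B[np.ix_(lb, jb)]
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.random((6, 5))
    B = rng.random((5, 4))
    assert np.allclose(cuboid_matmul(A, B, p=3, q=2, r=2), A @ B)
```

Smaller cuboids keep each task's blocks within memory (or GPU memory, for the subcuboid case) at the price of replicating input blocks across more tasks, which is the communication/memory trade-off the abstract describes.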
ISBN (print): 9781538673089
Statistical machine translation (SMT) is an important research branch in natural language processing (NLP). As in many other NLP applications, large-scale training data can potentially bring higher translation accuracy for SMT models. However, traditional single-node SMT model training systems can hardly cope with the fast-growing amount of training corpora in the big data era, which creates an urgent need for efficient large-scale machine translation model training systems. In this paper, we propose Seal, an efficient, scalable, end-to-end offline SMT model training toolkit based on Apache Spark, a widely used distributed data-parallel platform. Seal parallelizes the training of all three key SMT models: the word alignment model, the translation model, and the N-gram language model. To further improve the performance of model training in Seal, we also propose a number of system optimizations. In word alignment model training, tuning the block size greatly reduces the I/O and communication overhead. In translation model training, carefully encoding the training corpus significantly reduces the data size transferred over the network, thus improving the overall training efficiency. We also optimize the maximum likelihood estimation (MLE) algorithm to solve the data skew issue on the join operation, which is used in both the translation model training and the language model training. The experimental results show that Seal outperforms the well-known SMT training system Chaski with about 5x speedup for word alignment model training. For syntactic translation model and language model training, Seal outperforms the existing cutting-edge tools with about 9~18x and 8~9x speedup on average, respectively. Overall, Seal outperforms the existing distributed system with 4~6x speedup and the single-node system w
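The abstract does not describe how Seal's MLE optimization resolves the skewed join, so the sketch below shows one standard mitigation, two-stage aggregation with key salting, in plain Python standing in for distributed reducers. The function `salted_phrase_counts` and the toy data are hypothetical, not Seal's code; the merged counts feed a simple relative-frequency (MLE) estimate of P(target | source).

```python
import random
from collections import Counter, defaultdict

def salted_phrase_counts(pairs, num_salts=4):
    """Two-stage aggregation with key salting, a common way to handle data
    skew: a hot key (e.g., a very frequent source phrase) is first split
    across `num_salts` salted keys so no single reducer receives all of its
    records (stage 1); the salted partial counts are merged afterwards
    (stage 2) and turned into MLE probabilities P(target | source)."""
    # Stage 1: count per salted key (each salted key could live on a different reducer).
    partial = defaultdict(Counter)
    for source, target in pairs:
        salted_key = (source, random.randrange(num_salts))
        partial[salted_key][target] += 1

    # Stage 2: merge the salted partials back into one count table per source phrase.
    merged = defaultdict(Counter)
    for (source, _salt), counts in partial.items():
        merged[source].update(counts)

    # Relative-frequency (MLE) translation probabilities.
    probs = {}
    for source, counts in merged.items():
        total = sum(counts.values())
        probs[source] = {t: c / total for t, c in counts.items()}
    return probs

if __name__ == "__main__":
    data = [("the", "le")] * 7 + [("the", "la")] * 3 + [("house", "maison")] * 2
    table = salted_phrase_counts(data)
    print(table["the"])   # le -> 0.7, la -> 0.3 (key order may vary)
```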