Distributed tensor decomposition (DTD) is a fundamental data-analytics technique that extracts important latent properties from high-dimensional multi-attribute datasets distributed over edge devices. Conventionally, its wireless implementation follows a one-shot approach that first computes local results at devices using local data and then aggregates them at a server for global computation using communication-efficient techniques such as over-the-air computation (AirComp). Such an implementation is confronted with the issues of limited storage-and-computation capacities and link interruption, which motivates us to propose a framework of on-the-fly communication-and-computing (FlyCom²) in this work. The proposed framework enables streaming computation with low complexity by leveraging a random sketching technique and achieves progressive global aggregation through the integration of progressive uploading and multiple-input-multiple-output (MIMO) AirComp. To develop FlyCom², an on-the-fly subspace estimator is designed to take the real-time sketches accumulated at the server and generate online estimates of the decomposition. Its performance is evaluated by deriving both deterministic and probabilistic error bounds using perturbation theory and the concentration of measure. Both results reveal that the decomposition error is inversely proportional to the population of sketching observations received by the server. To further rein in the effect of noise on the error, we propose a threshold-based scheme that selects a subset of sufficiently reliable received sketches for DTD at the server. Experimental results validate the performance gain of the proposed selection algorithm and show that, compared to its one-shot counterparts, the proposed FlyCom² achieves comparable (or even better, in the case of large eigen-gaps) decomposition accuracy while dramatically reducing devices' complexity costs.
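To make the sketching-and-aggregation idea concrete, here is a minimal NumPy sketch of how a server could accumulate randomly sketched local data and refresh a subspace estimate on the fly. The data shapes, oversampling width, and the additive-noise stand-in for the MIMO AirComp link are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each device holds a local block of columns of a tall
# data matrix (e.g., a mode-1 unfolding of its local tensor slice); shapes
# and noise level are illustrative only.
d, n_local, num_devices, r = 128, 50, 20, 5
devices = [rng.standard_normal((d, n_local)) for _ in range(num_devices)]

sketches = []
for X_local in devices:
    # Each device compresses its local data with a random test matrix
    # (the low-complexity streaming step) and uploads only the small sketch.
    Omega = rng.standard_normal((n_local, r + 5))   # oversampled sketch width
    Y = X_local @ Omega                             # d x (r + 5) sketch
    # Toy stand-in for the noisy MIMO AirComp uplink.
    Y_noisy = Y + 0.01 * rng.standard_normal(Y.shape)
    sketches.append(Y_noisy)

    # On-the-fly estimate: the server refreshes its rank-r subspace estimate
    # from all sketches received so far.
    U, _, _ = np.linalg.svd(np.hstack(sketches), full_matrices=False)
    U_est = U[:, :r]

print("estimated subspace shape:", U_est.shape)  # (128, 5)
```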
ISBN: 9798350310900 (Print)
Distributed tensor decomposition (DTD) is a fundamental data-analytics technique that extracts important latent properties from multi-attribute datasets distributed over edge devices. Its conventional one-shot implementation with over-the-air computation (AirComp) is confronted with the issues of limited storage-and-computation capacities and link interruption, which motivates us to propose a framework of on-the-fly communication-and-computing (FlyCom²) in this work. The proposed framework enables streaming computation with low complexity by leveraging a random sketching technique and achieves progressive global aggregation through the integration of progressive uploading and multiple-input-multiple-output (MIMO) AirComp. To develop FlyCom², an on-the-fly subspace estimator is designed to take the real-time sketches accumulated at the server and generate online estimates of the decomposition. Its performance is evaluated by deriving both deterministic and probabilistic error bounds, which reveal the scaling laws of the decomposition error and inspire a threshold-based scheme for selecting reliably received sketches. Experimental results validate the performance gain of the proposed selection algorithm and show that, compared to its one-shot counterparts, FlyCom² achieves comparable (or even better, with large eigen-gaps) decomposition accuracy while dramatically reducing devices' complexity costs.
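The threshold-based selection mentioned above can be illustrated with a small NumPy example: received sketches whose effective post-reception SNR falls below a chosen threshold are excluded before the subspace is estimated. The SNR model, threshold value, and shapes are hypothetical placeholders rather than the paper's actual receiver design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical received sketches with per-sketch effective SNRs after
# MIMO AirComp reception; the SNR model and threshold are placeholders.
d, k, num_sketches, r = 64, 8, 30, 5
true_basis = np.linalg.qr(rng.standard_normal((d, r)))[0]
snrs = rng.uniform(0.5, 20.0, size=num_sketches)

received = []
for snr in snrs:
    clean = true_basis @ rng.standard_normal((r, k))
    noise = rng.standard_normal((d, k)) / np.sqrt(snr)
    received.append((snr, clean + noise))

# Threshold-based selection: fuse only sketches whose effective SNR exceeds
# the reliability threshold; the rest are discarded at the server.
threshold = 5.0
selected = [Y for snr, Y in received if snr >= threshold]

U, _, _ = np.linalg.svd(np.hstack(selected), full_matrices=False)
U_est = U[:, :r]
# Quick sanity check: alignment with the true subspace (1.0 = perfect).
print("subspace alignment:", np.linalg.norm(true_basis.T @ U_est) / np.sqrt(r))
```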
ISBN: 9781728119854 (Print)
Matrix factorizations and their multi-linear extensions, known as tensor factorizations, are widely used methods in data analysis and machine learning for feature extraction and dimensionality reduction. Recently, a new class of factorization models has appeared: tensor network (TN) factorizations. They reduce storage and computational complexity and aim to alleviate the curse of dimensionality when decomposing multi-way data. The tensor train (TT) is one of the most popular TN models and is used in a wide range of areas, such as quantum physics and chemistry. In this study, we improve TTs for classification tasks by combining the fundamental TT model with randomized decompositions and extending it to a distributed version following the MapReduce paradigm. As a result, the proposed approach is not only scalable but also much faster than competing algorithms, and is able to perform large-scale dimensionality reduction, e.g., in classification tasks.
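For reference, the baseline TT model that this study builds on can be sketched with a plain (non-randomized, single-node) TT-SVD in NumPy; the randomized and MapReduce-distributed extensions described in the abstract are not reproduced here.

```python
import numpy as np

def tt_svd(tensor, ranks):
    """Plain TT-SVD: sequentially reshape the tensor and truncate with SVD.

    `ranks` lists the target TT-ranks r_1..r_{d-1}; the boundary ranks are 1.
    Cores are returned with shape (r_{k-1}, n_k, r_k).
    """
    dims = tensor.shape
    d = len(dims)
    cores, r_prev = [], 1
    unfolding = tensor.reshape(r_prev * dims[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(unfolding, full_matrices=False)
        r_k = min(ranks[k], len(s))
        cores.append(U[:, :r_k].reshape(r_prev, dims[k], r_k))
        unfolding = (np.diag(s[:r_k]) @ Vt[:r_k]).reshape(r_k * dims[k + 1], -1)
        r_prev = r_k
    cores.append(unfolding.reshape(r_prev, dims[-1], 1))
    return cores

# Toy usage on a random 4-way tensor.
rng = np.random.default_rng(2)
T = rng.standard_normal((6, 7, 8, 9))
cores = tt_svd(T, ranks=[4, 4, 4])
print([c.shape for c in cores])  # [(1, 6, 4), (4, 7, 4), (4, 8, 4), (4, 9, 1)]
```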
ISBN: 9798350311990 (Print)
Tensor train (TT) decomposition is a method for approximating and analysing tensors. TT-SVD, the most commonly used TT decomposition algorithm, computes the TT-format of a tensor in a sequential manner by alternately reshaping and compressing the tensor. For large tensors, this requires a large amount of computation time and memory. In this paper, we propose a distributed parallel algorithm, PTTD, to perform TT decomposition, which distributes parts of the tensor to all processes, decomposes them in parallel using TT-SVD, and merges the results to obtain the TT-format of the original tensor. Rounding is applied to reduce the size of the merged TT-formats. The algorithm is deterministic, which means that the approximation error is controllable and there is no need to know the TT-ranks of the tensor in advance. Experimental results show that PTTD achieves an average speedup of 5384x using 8192 cores, and that the approximation error decreases as the number of cores increases, at the cost of slowly growing TT-ranks.
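One way to picture the merge step is the following toy NumPy sketch, which combines the TT-formats of two sub-tensors split along the first mode by zero-padding the first cores and making the middle cores block-diagonal, so the TT-ranks add up (which is why a rounding step would normally follow). This is an illustrative merge under that mode-1 splitting assumption, not the paper's exact PTTD procedure.

```python
import numpy as np

def merge_tt_mode1(cores_a, cores_b):
    """Merge the TT-formats of two sub-tensors that were split along mode 1.

    Cores are shaped (r_{k-1}, n_k, r_k). The merged TT-ranks are the sums of
    the input ranks, so rounding is needed afterwards to compress them again.
    """
    d = len(cores_a)
    merged = []
    for k, (Ga, Gb) in enumerate(zip(cores_a, cores_b)):
        ra0, na, ra1 = Ga.shape
        rb0, nb, rb1 = Gb.shape
        if k == 0:
            # First core: stack along the physical index, zero-pad the ranks.
            G = np.zeros((1, na + nb, ra1 + rb1))
            G[0, :na, :ra1] = Ga[0]
            G[0, na:, ra1:] = Gb[0]
        elif k == d - 1:
            # Last core: stack along the left rank index.
            G = np.concatenate([Ga, Gb], axis=0)
        else:
            # Middle cores: block-diagonal in both rank indices.
            G = np.zeros((ra0 + rb0, na, ra1 + rb1))
            G[:ra0, :, :ra1] = Ga
            G[ra0:, :, ra1:] = Gb
        merged.append(G)
    return merged

def tt_to_full(cores):
    """Contract a TT-format back into a dense tensor (for small checks only)."""
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=([full.ndim - 1], [0]))
    return full[0, ..., 0]

# Toy check with random TT cores: merging matches concatenation along mode 1.
rng = np.random.default_rng(3)
A = [rng.standard_normal(s) for s in [(1, 3, 2), (2, 4, 2), (2, 5, 1)]]
B = [rng.standard_normal(s) for s in [(1, 2, 3), (3, 4, 3), (3, 5, 1)]]
ref = np.concatenate([tt_to_full(A), tt_to_full(B)], axis=0)
print(np.allclose(tt_to_full(merge_tt_mode1(A, B)), ref))  # True
```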