ISBN: 9798350362480 (digital)
ISBN: 9798350362497 (print)
We present a general framework for compressing unstructured scientific data with known local connectivity. A common application is simulation data defined on arbitrary finite element meshes. The framework employs a greedy, topology-preserving reordering of the original nodes, which allows for seamless integration into existing data processing pipelines. This reordering depends solely on mesh connectivity and can be performed offline for optimal efficiency. However, the algorithm's greedy nature also supports on-the-fly implementation. The proposed method is compatible with any compression algorithm that leverages spatial correlations within the data. The effectiveness of this approach is demonstrated on a large-scale real dataset using several compression methods, including MGARD, SZ, and ZFP.
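As a rough illustration of the idea in this abstract, the sketch below reorders nodes using connectivity alone, so that consecutive positions in the new order tend to be mesh neighbors. It is a simple depth-first greedy walk written for this listing, not the authors' algorithm; the function name `greedy_reorder`, the adjacency-list input, and the toy data are all assumptions.

```python
def greedy_reorder(adjacency, start=0):
    """Return a node order in which consecutive nodes tend to be neighbors.

    adjacency[i] is the list of nodes connected to node i (hypothetical input
    format). Only connectivity is used, never the data values.
    """
    n = len(adjacency)
    visited = [False] * n
    order = []
    # Try the requested start first, then any node left in other components.
    seeds = [start] + [i for i in range(n) if i != start]
    for seed in seeds:
        if visited[seed]:
            continue
        visited[seed] = True
        order.append(seed)
        stack = [seed]
        while stack:
            current = stack[-1]
            # Greedy step: move to the first unvisited neighbor, if any.
            nxt = next((v for v in adjacency[current] if not visited[v]), None)
            if nxt is None:
                stack.pop()  # dead end: backtrack to an earlier node
            else:
                visited[nxt] = True
                order.append(nxt)
                stack.append(nxt)
    return order


# Toy example: a path 0-3-1-4-2 whose node labels are scrambled.
adjacency = [[3], [3, 4], [4], [0, 1], [1, 2]]
values = [10, 30, 50, 20, 40]  # one smooth field value per node

order = greedy_reorder(adjacency)
reordered = [values[i] for i in order]  # stream handed to a compressor

# The permutation is invertible, so the original layout can be restored.
restored = [0] * len(values)
for position, node in enumerate(order):
    restored[node] = reordered[position]
```

After reordering, neighboring entries of `reordered` come from neighboring mesh nodes, which is the kind of spatial correlation that compressors such as those named in the abstract are designed to exploit.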
Authors: Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan
Affiliations: Computer, Electrical and Math. Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, KSA, and Skolkovo Institute of Science and Technology, Moscow, Russia, and School of Applied Mathematics and Informatics, Moscow Institute of Physics and Technology, Moscow, Russia; Computer, Electrical and Math. Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, KSA
In the last few years, various communication compression techniques have emerged as an indispensable tool for alleviating the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. We prove that the distributed compressed SGD method, employed with an error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp\left[-\frac{\mu K}{\delta L}\right] + \frac{C + \delta D}{K\mu}\right)$, where $\delta \geq 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C = 0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D = 0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.
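To make the error feedback mechanism mentioned in this abstract concrete, here is a minimal single-node sketch. It uses Top-k sparsification as a familiar example of a biased compressor; the abstract does not say which operators the paper studies, so that choice, the function names `top_k` and `ef_gradient_descent`, and the toy quadratic objective are all assumptions.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec, zero the rest (biased)."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def ef_gradient_descent(grad, x0, step, k, iters):
    """Gradient descent with Top-k compression and error feedback.

    Returns the final iterate and the residual error that was never applied.
    """
    x = list(x0)
    error = [0.0] * len(x)
    for _ in range(iters):
        g = grad(x)
        # Add back whatever earlier compression steps dropped.
        p = [e + step * gi for e, gi in zip(error, g)]
        sent = top_k(p, k)  # the only part that would be communicated
        error = [pi - si for pi, si in zip(p, sent)]  # remember the rest
        x = [xi - si for xi, si in zip(x, sent)]
    return x, error


# Hypothetical toy problem: f(x) = 0.5 * sum(a_i * x_i^2), minimized at 0.
a = [1.0, 2.0, 4.0]


def objective(x):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def gradient(x):
    return [ai * xi for ai, xi in zip(a, x)]


x_start = [1.0, -1.0, 2.0]
x_final, residual = ef_gradient_descent(gradient, x_start, step=0.1, k=1, iters=200)
```

Only one of the three coordinates is applied per step here, yet the dropped coordinates are not lost: they accumulate in `error` and are applied once they become large enough to be selected.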
The problem of cutting triangular cycles of lines in space was investigated. It was shown that a collection of lines in 3-space can be cut into a subquadratic number of pieces, such that all depth cycles defined by triples of lines are eliminated. This solves a long-standing open problem in computational geometry, motivated by hidden-surface removal in computer graphics.
In this paper, we study the heterogeneous use of two programming paradigms for heterogeneous computing called Cluster-M and HAsC. Both paradigms can efficiently support heterogeneous networks by preserving a level of ...
Let $B$ be a set of $n$ unit balls in $\mathbb{R}^3$. We show that the combinatorial complexity of the space of lines in $\mathbb{R}^3$ that avoid all the balls of $B$ is $O(n^{3+\varepsilon})$, for any $\varepsilon > 0$. This result has connections to problems in visibility, ray shooting, motion planning and geometric optimization.
In this paper we prove the nonexistence of quaternary linear codes with parameters $[51, 4, 37]$. This result gives the exact value of $n_q(k, d)$ for $q = 4$, $k = 4$, $d = 37$ and $38$. These were the only minimum distances for ...
Multithreaded execution models attempt to combine some aspects of dataflow-like execution with von Neumann model execution. Their main objective is to mask the latency of inter-processor communications and remote memory accesses in large scale multiprocessors. An important issue in the analysis and evaluation of multithreaded execution is the design and performance of the storage hierarchy. Because of the sequential execution of threads, the locality of access within an executing thread can be exploited using registers and cache. At the inter-thread level, however, the locality of accesses to memory and its effect on the cache is not yet well understood. A storage model which can exploit this locality is developed and evaluated. The results indicate there is a large amount of inter-thread locality that can be exploited and that we can get an efficient storage system by exploiting the characteristics of nonblocking threads.