ISBN (print): 9798400700392
Pan's four-decades-old fast matrix multiplication algorithms have the lowest asymptotic complexity of all currently known algorithms applicable to matrices of feasible dimensions. However, the large coefficients in the arithmetic cost of these algorithms make them impractical. We reduce these coefficients by 90%-98%, in some cases to a value of 2, the same leading coefficient as that of the classical, cubic-time algorithm. For this purpose, we utilize fast recursive transformations with sparsification of the linear operators of Pan's algorithms. Existing decomposition methods cannot be applied to Pan's algorithms due to their large base cases. We describe two new methods for finding such decompositions, utilizing the underlying symmetries of the algorithms and the linear transformations within them. With these tools, we obtain algorithms with the same asymptotic complexity as Pan's algorithms, but with small leading coefficients, often the same as that of the cubic-time algorithm. In practice, these new algorithms have the potential to outperform the fastest currently known matrix multiplication algorithms on feasible-sized inputs. Matched against known lower bounds, we show that our results are optimal or close to optimal.
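The claimed coefficient reductions can be read off a standard cost recurrence for recursive bilinear algorithms; the derivation below is generic, not specific to Pan's construction. For an algorithm with an $n_0 \times n_0$ base case, $t$ multiplications, and $q$ linear operations (the nonzeros of its encoding/decoding operators) per recursive step, the arithmetic cost satisfies

$F(n) = t \, F(n/n_0) + q \, (n/n_0)^2, \qquad F(1) = 1,$

which solves to

$F(n) = \left(1 + \frac{q}{t - n_0^2}\right) n^{\log_{n_0} t} - \frac{q}{t - n_0^2} \, n^2.$

Sparsifying the linear operators lowers $q$, and with it the leading coefficient $1 + q/(t - n_0^2)$, while leaving the exponent $\log_{n_0} t$ untouched.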
ISBN (print): 9783031695827; 9783031695834
Long integer multiplication is a fundamental kernel in many linear algebra and cryptography computations. Toom-Cook-$k$ ($k \in \mathbb{N}$) is a family of fast long integer multiplication algorithms frequently used in many applications, particularly for small $k$ (2, 3, and 4). Previous studies focus on minimizing Toom-Cook's arithmetic cost, sometimes at the expense of asymptotically higher communication costs and memory footprint. For many high-performance computing applications, the bottleneck is communication rather than arithmetic. We propose new versions of Toom-Cook-$k$ algorithms that simultaneously reduce their arithmetic cost, communication cost, and memory footprint. We obtain these results by utilizing the alternative basis and Toom-Graph techniques. The arithmetic costs of the new algorithms are only slightly higher than those of the best previous solutions, while the communication costs and memory footprint are significantly reduced and proved to be optimal.
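For illustration, here is a minimal sketch of the smallest family member, Toom-Cook-2 (Karatsuba), on Python's built-in integers. It shows only the arithmetic structure (three half-size products instead of four) and none of the communication or memory-footprint optimizations that are the subject of the paper; the function name toom2 and the 2**64 cutoff are arbitrary choices.

def toom2(x: int, y: int) -> int:
    """Toom-Cook-2 (Karatsuba) for nonnegative integers:
    three half-size products instead of four."""
    if x < (1 << 64) or y < (1 << 64):
        return x * y                          # base case: hardware multiply
    m = min(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> m, x & ((1 << m) - 1)       # x = x1 * 2**m + x0
    y1, y0 = y >> m, y & ((1 << m) - 1)       # y = y1 * 2**m + y0
    lo = toom2(x0, y0)                        # evaluation at 0
    hi = toom2(x1, y1)                        # evaluation at infinity
    mid = toom2(x0 + x1, y0 + y1) - lo - hi   # evaluation at 1, interpolated
    return (hi << (2 * m)) + (mid << m) + lo

Toom-Cook-$k$ generalizes this scheme by splitting each operand into $k$ parts and evaluating/interpolating at $2k - 1$ points.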
ISBN (print): 9781450361842
Fast matrix multiplication algorithms are of practical use only if the leading coefficient of their arithmetic complexity is sufficiently small. Many algorithms with low asymptotic cost have large leading coefficients, and are thus impractical. Karstadt and Schwartz have recently demonstrated a technique that reduces the leading coefficient by introducing fast $O(n^2 \log n)$ basis transformations, applied to the input and output matrices. We generalize their technique, by allowing larger bases for the transformations while maintaining low overhead. Thus we accelerate several matrix multiplication algorithms, beyond what is known to be possible using the previous technique. Of particular interest are a few new sub-cubic algorithms with leading coefficient 2, matching that of classical matrix multiplication. For example, we obtain an algorithm with arithmetic complexity of $2n^{\log_3 23} + o(n^{\log_3 23})$ compared to $2n^3 - n^2$ of the classical algorithm. Such new algorithms can outperform previous ones (classical included) even on relatively small matrices. We obtain lower bounds matching the coefficient of several of our algorithms, proving them to be optimal.
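The following sketch shows the shape of such a fast basis transformation for a $2 \times 2$ recursive structure: an invertible $4 \times 4$ map over the block coordinates is applied at every level of the recursion, for a total cost of $T(n) = 4T(n/2) + O(n^2) = O(n^2 \log n)$. Here psi is a placeholder for the change-of-basis matrix; the specific (larger) bases constructed in the paper are not reproduced.

import numpy as np

def basis_transform(A, psi):
    """Recursively apply psi, an invertible 4x4 map on 2x2 block
    coordinates, to an n x n matrix A (n a power of 2).
    Cost: T(n) = 4*T(n/2) + O(n^2) = O(n^2 log n)."""
    n = A.shape[0]
    if n == 1:
        return A.copy()
    h = n // 2
    # transform the four quadrants recursively ...
    sub = [basis_transform(A[:h, :h], psi), basis_transform(A[:h, h:], psi),
           basis_transform(A[h:, :h], psi), basis_transform(A[h:, h:], psi)]
    # ... then mix them linearly according to psi
    out = np.empty(A.shape, dtype=np.result_type(A, psi))
    quads = [(slice(0, h), slice(0, h)), (slice(0, h), slice(h, n)),
             (slice(h, n), slice(0, h)), (slice(h, n), slice(h, n))]
    for i, (r, c) in enumerate(quads):
        out[r, c] = sum(psi[i, j] * sub[j] for j in range(4))
    return out

A multiplication in the alternative basis then transforms the inputs at $O(n^2 \log n)$ cost, runs a recursion with fewer additions per step, and maps the result back through the inverse transformation, so the transforms only affect lower-order terms.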
A graph-theoretic model is introduced for bilinear algorithms. This facilitates, in particular, the investigation of the additive complexity of matrix multiplication. The numbers of additions/subtractions required for the problems defined by symmetric permutations on the dimensions of the matrices are shown to vary inversely with the size of the corresponding product matrix. It is noted that this result holds for any system of dual problems, not only dual matrix multiplication problems. This additive symmetry is employed to obtain various results, including the fact that 15 additive operations are necessary and sufficient to multiply two $2 \times 2$ matrices by a bilinear algorithm using at most 7 multiplication operations.
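The 7-multiplication, 15-addition bound is attained by Winograd's variant of Strassen's $2 \times 2$ scheme. The sketch below is a standard formulation from the literature, written out so the operations can be counted; the function name is illustrative.

def winograd_2x2(A, B):
    """2x2 matrix product using 7 multiplications and 15
    additions/subtractions (Winograd's variant of Strassen)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # 8 pre-additions on the inputs
    s1 = a21 + a22; s2 = s1 - a11; s3 = a11 - a21; s4 = a12 - s2
    t1 = b12 - b11; t2 = b22 - t1; t3 = b22 - b12; t4 = t2 - b21
    # 7 multiplications
    m1 = s2 * t2; m2 = a11 * b11; m3 = a12 * b21; m4 = s3 * t3
    m5 = s1 * t1; m6 = s4 * b22; m7 = a22 * t4
    # 7 post-additions on the products
    u1 = m1 + m2; u2 = u1 + m4
    return ((m2 + m3, u1 + m5 + m6),
            (u2 - m7, u2 + m5))

assert winograd_2x2(((1, 2), (3, 4)), ((5, 6), (7, 8))) == ((19, 22), (43, 50))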
Author: Pan, V. Ya. (CUNY, Department of Mathematics & Computer Science, Lehman College, Bronx, NY 10468, USA; CUNY Graduate Center, New York, NY 10036, USA)
Matrix multiplication is among the most fundamental operations of modern computations. By 1969 it was still commonly believed that the classical algorithm was optimal, although the experts already knew that this was not so. Worldwide interest in matrix multiplication instantly exploded in 1969, when Strassen decreased the exponent 3 of cubic time to 2.807. Then everyone expected to see matrix multiplication performed in quadratic or nearly quadratic time very soon. Further progress, however, turned out to be capricious. It was at a stalemate for almost a decade; then a combination of surprising techniques (completely independent of Strassen's original ones and much more advanced) enabled a new decrease of the exponent in 1978-1981 and then again in 1986, to 2.376. By 2017 the exponent had still not passed the barrier of 2.373, but most disturbing was the curse of recursion: even the decrease of exponents below 2.7733 required numerous recursive steps, and each of them squared the problem size. As a result, all algorithms supporting such exponents supersede the classical algorithm only for inputs of immense sizes, far beyond any potential interest for the user. We survey the long study of fast matrix multiplication, focusing on neglected algorithms for feasible matrix multiplication. We comment on their design, the techniques involved, implementation issues, the impact of their study on the modern theory and practice of Algebraic Computations, and perspectives for fast matrix multiplication. Bibliography: 163 titles.
Strassen's algorithm (1969) was the first sub-cubic matrix multiplication algorithm. Winograd (1971) improved the leading coefficient of its complexity from 7 to 6. There have been many subsequent asymptotic improvements. Unfortunately, most of these have the disadvantage of very large, often gigantic, hidden constants. Consequently, Strassen-Winograd's $O(n^{\log_2 7})$ algorithm often outperforms other fast matrix multiplication algorithms for all feasible matrix dimensions. The leading coefficient of Strassen-Winograd's algorithm has been generally believed to be optimal for matrix multiplication algorithms with a $2 \times 2$ base case, due to the lower bounds by Probert (1976) and Bshouty (1995). Surprisingly, we obtain a faster matrix multiplication algorithm, with the same base case size and asymptotic complexity as Strassen-Winograd's algorithm, but with the leading coefficient reduced from 6 to 5. To this end, we extend Bodrato's (2010) method for matrix squaring, and transform matrices to an alternative basis. We also prove a generalization of Probert's and Bshouty's lower bounds that holds under change of basis, showing that for matrix multiplication algorithms with a $2 \times 2$ base case, the leading coefficient of our algorithm cannot be further reduced, and is therefore optimal. We apply our method to other fast matrix multiplication algorithms, improving their arithmetic and communication costs by significant constant factors.
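The leading coefficients discussed here follow from the standard recurrence $F(n) = t \, F(n/n_0) + q \, (n/n_0)^2$, whose solution has leading coefficient $1 + q/(t - n_0^2)$. The snippet below checks the values 7, 6, and 5; the 12-addition count for the alternative-basis variant is taken from Karstadt and Schwartz's paper rather than from this abstract, and the function name is illustrative.

def leading_coefficient(n0: int, t: int, q: int) -> float:
    """Leading coefficient of F(n) = t*F(n/n0) + q*(n/n0)**2, F(1) = 1:
    a recursive bilinear algorithm with an n0 x n0 base case,
    t multiplications, and q additions per recursive step."""
    return 1 + q / (t - n0 ** 2)

print(leading_coefficient(2, 7, 18))  # Strassen, 18 additions -> 7.0
print(leading_coefficient(2, 7, 15))  # Strassen-Winograd, 15 additions -> 6.0
print(leading_coefficient(2, 7, 12))  # alternative basis, 12 additions -> 5.0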