A bilinear algorithm of bilinear complexity 22 for approximate multiplication of 2 x 7 and 7 x 2 matrices is presented. An upper bound is given for the bilinear complexity of approximate multiplication of 2 x 2 and 2 x n matrices (n ≥ 1).
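For a point of reference on the bilinear complexity figures above: the trivial bilinear algorithm uses one scalar multiplication per entry triple, which a one-line sketch makes concrete (the function name is illustrative, not from the paper):

```python
# Count scalar multiplications in the trivial bilinear algorithm for
# multiplying an m x k matrix by a k x n matrix: one per (i, j, l) triple.
def naive_bilinear_rank(m, k, n):
    return m * k * n

# The trivial algorithm for 2 x 7 times 7 x 2 uses 28 products;
# the abstract's approximate algorithm needs only 22.
print(naive_bilinear_rank(2, 7, 2))  # 28
```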
ISBN (digital): 9781510617421
ISBN (print): 9781510617421
This paper proposes a novel enhancement method based exclusively on the bilinear interpolation algorithm for capsule endoscopy images. The proposed method does not convert the original RGB image components to HSV or any other color space or model; instead, it processes the RGB components directly. In each component, a group of four adjacent pixels and half-unit weights in the bilinear weighting function are used to calculate the average pixel value, identical for each pixel in that particular group. After these calculations, groups of identical pixels are overlapped successively in the horizontal and vertical directions to obtain a preliminary-enhanced image. The final-enhanced image is obtained by halving the sum of the original and preliminary-enhanced image pixels. Quantitative and qualitative experiments were conducted, focusing on pairwise comparisons between original and enhanced images. Final-enhanced images generally have the best diagnostic quality and give more detail about the visibility of vessels and structures in capsule endoscopy images.
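The pipeline described above can be sketched in a few lines of NumPy. This is a minimal interpretation, assuming 2 x 2 pixel groups overlapped by single-pixel shifts and edge padding; those details are assumptions the abstract leaves open:

```python
import numpy as np

def enhance(img):
    # img: H x W x 3 float RGB image in [0, 1]; each component is
    # processed directly, with no color-space conversion.
    img = img.astype(float)
    out = np.empty_like(img)
    for ch in range(3):
        c = img[..., ch]
        # Average each 2 x 2 group of adjacent pixels (half-unit
        # bilinear weights); overlapping the shifted groups in the
        # horizontal and vertical directions gives the preliminary image.
        p = np.pad(c, ((0, 1), (0, 1)), mode="edge")
        prelim = (p[:-1, :-1] + p[:-1, 1:] + p[1:, :-1] + p[1:, 1:]) / 4.0
        # Final enhancement: halve the sum of the original and
        # preliminary-enhanced pixels.
        out[..., ch] = (c + prelim) / 2.0
    return out
```

A constant image passes through unchanged, which is a quick sanity check that the weights sum to one.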
We develop lower bounds on communication in the memory hierarchy or between processors for nested bilinear algorithms, such as Strassen's algorithm for matrix multiplication. We build on a previous framework that establishes communication lower bounds by use of the rank expansion, or the minimum rank of any fixed size subset of columns of a matrix, for each of the three matrices encoding a bilinear algorithm. This framework provides lower bounds for a class of dependency directed acyclic graphs (DAGs) corresponding to the execution of a given bilinear algorithm, in contrast to other approaches that yield bounds for specific DAGs. However, our lower bounds only apply to executions that do not compute the same DAG node multiple times. Two bilinear algorithms can be nested by taking Kronecker products between their encoding matrices. Our main result is a lower bound on the rank expansion of a matrix constructed by a Kronecker product derived from lower bounds on the rank expansion of the Kronecker product's operands. We apply the rank expansion lower bounds to obtain novel communication lower bounds for nested Toom-Cook convolution, Strassen's algorithm, and fast algorithms for contraction of partially symmetric tensors.
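The nesting construction above can be made concrete with Strassen's own encoding matrices (U, V, W). The sketch below checks, in plain NumPy, that Kronecker products of the encodings nest Strassen with itself into a 49-multiplication algorithm for 4 x 4 matrices; the block-wise vectorization is chosen so that the Kronecker products line up:

```python
import numpy as np

# Strassen's 2x2 algorithm encoded by three matrices:
# c = W @ ((U @ a) * (V @ b)), with a, b, c row-major vectorizations.
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1, -1, 0, 1], [0, 0, 1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0, 0, 0], [1, -1, 1, 0, 0, 1, 0]])

def vec(M):
    # Vectorize a 4x4 matrix block-wise: outer 2x2 block position first,
    # then the position inside the block, matching the Kronecker nesting.
    return M.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(16)

def unvec(v):
    return v.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)

# Nesting = Kronecker products of the encoding matrices:
# a 7*7 = 49-multiplication bilinear algorithm for 4x4 matrices.
U2, V2, W2 = np.kron(U, U), np.kron(V, V), np.kron(W, W)

rng = np.random.default_rng(0)
A, B = rng.integers(-5, 5, (4, 4)), rng.integers(-5, 5, (4, 4))
C = unvec(W2 @ ((U2 @ vec(A)) * (V2 @ vec(B))))
assert np.array_equal(C, A @ B)
```

The three matrices U2, V2, W2 are exactly the encodings whose rank expansion the communication lower bounds are stated in terms of.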
Binary field multiplication is widely used in quantum information processing, such as quantum algorithms, cryptanalysis and mathematical arithmetic. The core quantum resources of binary field multiplication are the qubit count and Toffoli depth of its quantum circuit, both of which depend largely on the Toffoli gate count. In this paper, we analyze the multiplicative complexity of binary fields and present quantum circuits for F_{2^8} multiplication from the perspectives of time and space. We find that the Toffoli gate count of a quantum circuit corresponds to the bilinear complexity of F_{2^n} multiplication. The Toffoli gate count obtained by the algebraic curve method increases linearly with n, growing more slowly than the sub-quadratic complexity of the Karatsuba algorithm and the iterated-logarithm complexity of the Chinese remainder theorem (CRT). To demonstrate the advantages of the algebraic curve method, we use an elliptic curve bilinear algorithm in F_{(2^2)^4} and composite field arithmetic (CFA) to present two types of quantum circuits for F_{2^8} multiplication, both of which have 24 Toffoli gates, the lowest at present. The Toffoli depth of the time-efficient quantum circuit is only 1, and the depth-width product D·W of the circuit is 72, lower than previous designs. The space-efficient quantum circuits require 24 qubits and maintain a Toffoli depth of 4; their D·W and Toffoli depth are reduced by at least 77.8% compared with the most advanced prior work.
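For orientation, a classical F_{2^8} multiplication looks as follows. This sketch fixes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 as an assumption (the abstract does not specify a modulus); each AND in the partial products is what a Toffoli gate realizes in the quantum circuits, and each XOR a CNOT:

```python
def gf256_mul(a, b, mod=0x11B):
    # Schoolbook multiplication in F_{2^8}: XOR is field addition; the
    # reduction polynomial (0x11B = x^8 + x^4 + x^3 + x + 1, the AES
    # choice -- an assumption, the paper does not fix one) folds the
    # carry-less product back into 8 bits.
    r = 0
    for i in range(8):
        if (b >> i) & 1:
            r ^= a << i          # carry-less partial product (AND/XOR)
    for i in range(15, 7, -1):   # reduce the degree-14..8 terms
        if (r >> i) & 1:
            r ^= mod << (i - 8)
    return r

# 0x53 and 0xCA are multiplicative inverses in the AES field.
assert gf256_mul(0x53, 0xCA) == 0x01
```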
The known fast sequential algorithms for multiplying two N x N matrices (over an arbitrary ring) have time complexity O(N^α), where 2 < α < 3. The current best value of α is less than 2.3755. We show that, for all 1 ≤ p ≤ N^α, multiplying two N x N matrices can be performed on a p-processor linear array with a reconfigurable pipelined bus system (LARPBS) in O(N^α/p + (N^2/p^(2/α)) log p) time. This is currently the fastest parallelization of the best known sequential matrix multiplication algorithm on a distributed memory parallel system. In particular, for all 1 ≤ p ≤ N^2.3755, multiplying two N x N matrices can be performed on a p-processor LARPBS in O(N^2.3755/p + (N^2/p^(2/α)) log p) time, and linear speedup can be achieved for p as large as O(N^2.3755/(log N)^6.3262). Furthermore, multiplying two N x N matrices can be performed on an LARPBS with O(N^α) processors in O(log N) time. This compares favorably with the performance on a PRAM.
We study asymptotically fast multiplication algorithms for matrix pairs of arbitrary dimensions, and optimize the exponents of their arithmetic complexity bounds. For a large class of input matrix pairs, we improve the known exponents. We also show some applications of our results: (i) we decrease from O(n^2 + n^(1+o(1)) log q) to O(n^1.9998 + n^(1+o(1)) log q) the known arithmetic complexity bound for univariate polynomial factorization of degree n over a finite field with q elements; (ii) we decrease from 2.837 to 2.7945 the known exponent of the work and arithmetic processor bounds for fast deterministic (NC) parallel evaluation of the determinant, the characteristic polynomial, and the inverse of an n x n matrix, as well as for the solution of a nonsingular linear system of n equations; (iii) we decrease from O(m^1.575 n) to O(m^1.5356 n) the known bound for computing basic solutions to a linear programming problem with m constraints and n variables.
Matrix multiplication is one of the most extensively used kernels in scientific computing. Although subcubic algorithms exist, most high performance implementations are based on the classical Θ(n^3) matrix multiplication. Designing an algorithm that obtains even modest improvements in performance over existing implementations requires carefully addressing challenges such as reducing computation costs, communication costs, and memory footprint. We provide the first high performance general matrix-matrix multiplication that utilizes the alternative basis method on Strassen's algorithm. We reduce the basis transformation overheads and decrease the memory footprint of the bilinear phase by using the pebbling game optimization scheme, consequently improving both arithmetic and communication costs. Our algorithm outperforms DGEMM on feasible matrix dimensions starting at n = 96. It obtains an increasing speedup, up to nearly 2x, for larger matrix dimensions when running sequentially, and even larger speedups for certain matrix dimensions when running in parallel.
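The bilinear (recursive) phase of Strassen's algorithm, before any alternative-basis transformation, can be sketched as follows; the cutoff to a classical product stands in for the tuned DGEMM-style kernel used in practice, and the power-of-two size restriction is a simplification:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    # Recursive Strassen for square matrices whose size is a power of
    # two; below `cutoff`, fall back to the classical product.
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive products instead of eight.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])
```

The pre-additions (A11 + A22 and so on) are exactly the linear combinations the alternative basis method moves into a cheaper basis-transformation step.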