Scientific and engineering applications rely on floating-point arithmetic to approximate real numbers. Because of the rounding errors inherent in floating-point numbers, errors can propagate and accumulate during calculations, leading to serious inaccuracies that may compromise the safety and reliability of a program. In theory, the most accurate method of error detection is to exhaustively search all possible floating-point inputs, but this is infeasible in practice because of the huge search space involved. Detecting maximum floating-point errors both effectively and efficiently has therefore remained a challenge. To address it, we design and implement an error detection tool for floating-point arithmetic expressions called HSED. It modifies mantissas under the double-precision floating-point type to simulate hierarchical searches from half or single precision up to double precision. Experimental results on 32 single-parameter arithmetic expressions from the FPBench benchmark suite show that both the detection effectiveness and the performance of HSED are significantly better than those of the state-of-the-art error detection tools Herbie, S3FP and ATOMU. HSED outperforms Herbie, Herbie+, S3FP and ATOMU in 24, 19, 27 and 25 cases, respectively. The average time taken by Herbie, Herbie+ and S3FP is 1.82, 11.20 and 129.15 times that of HSED, respectively.
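The core idea, a coarse sweep followed by refinement in the mantissa neighborhood of promising inputs, can be conveyed independently of the tool. Below is a minimal sketch assuming Python with only the standard library; the expression, search range, and function names are illustrative and are not HSED's actual algorithm or interface.

    import math
    from fractions import Fraction

    def f_double(x):
        # The expression under test, evaluated in binary64; it cancels
        # catastrophically for large x.
        return 1.0 / (x + 1.0) - 1.0 / x

    def f_exact(x):
        # The same expression over exact rationals, used as the oracle.
        q = Fraction(x)
        return 1 / (q + 1) - 1 / q

    def ulp_error(x):
        got = f_double(x)
        return abs(Fraction(got) - f_exact(x)) / Fraction(math.ulp(got))

    # Stage 1: coarse scan over the input range, standing in for the
    # low-precision sweep.
    best = max((2.0 ** e for e in range(1, 40)), key=ulp_error)

    # Stage 2: refine by perturbing the mantissa, i.e. stepping through
    # neighboring doubles while the error keeps growing.
    for _ in range(1000):
        cand = math.nextafter(best, math.inf)
        if ulp_error(cand) <= ulp_error(best):
            break
        best = cand
    print(best, float(ulp_error(best)))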
Assume we use a binary floating-point arithmetic and that RN is the round-to-nearest function. Also assume that c is a constant or a real function of one or more variables, and that we have at our disposal a correctly rounded implementation of c, say ĉ = RN(c). For evaluating x·c (resp. x/c or c/x), the natural way is to replace it by RN(x·ĉ) (resp. RN(x/ĉ) or RN(ĉ/x)), that is, to call function ĉ and to perform a floating-point multiplication or division. This can be generalized to the approximation of n/d by RN(n̂/d̂) and the approximation of n·d by RN(n̂·d̂), where n̂ = RN(n) and d̂ = RN(d), and n and d are functions for which we have at our disposal a correctly rounded implementation. We discuss tight error bounds in ulps of such approximations. From our results, one immediately obtains tight error bounds for calculations such as x*pi, ln(2)/x, x/(y+z), (x+y)*z, x/sqrt(y), sqrt(x)/y, (x+y)*(z+t), (x+y)/(z+t), (x+y)/(z*t), etc. in floating-point arithmetic.
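For a concrete instance of the x·c case with c = pi, the ulp error of RN(x·ĉ) can be measured directly against a high-precision reference. A small sketch, assuming Python with the mpmath package; the function name is ours.

    import math
    from mpmath import mp, mpf

    mp.prec = 200                        # reference precision far beyond binary64

    def ulps_of_error(x):
        c_hat = math.pi                  # c_hat = RN(pi) in binary64
        computed = x * c_hat             # RN(x * c_hat)
        exact = mpf(x) * mp.pi           # near-exact reference for x * pi
        return abs(mpf(computed) - exact) / mpf(math.ulp(computed))

    for x in (1.0, 3.5, 1e10, 7e-5):
        print(x, float(ulps_of_error(x)))

The observed errors remain within a couple of ulps (each rounding contributes at most half an ulp, plus the propagated error of ĉ), which is the regime whose tight bounds the paper establishes.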
Floating-point arithmetic is a well-known and extremely efficient way of performing approximate computations over the real numbers. Although it requires some careful considerations, floating-point numbers are nowadays routinely used to prove mathematical theorems. Numerical computations have been applied in the context of formal proofs too, as illustrated by the CoqInterval library. But these computations do not benefit from the powerful floating-point units available in modern processors, since they are emulated inside the logic of the formal system. This paper experiments with the use of hardware floating-point numbers for numerically intensive proofs verified by the Coq proof assistant. This gives rise to various questions regarding the formalization, the implementation, the usability, and the level of trust. The approach has been applied to the CoqInterval and ValidSDP libraries, and demonstrates a speedup of at least one order of magnitude.
ISBN (digital): 9789819777372
ISBN (print): 9789819777365; 9789819777372
We show that there is a discrepancy between the emulated floating-point multiplication in the submission package of the digital signature Falcon and the claimed behavior. In particular, we show that some floating-point products with absolute values smaller than the smallest normal positive floating-point number are incorrectly zeroized. However, we show that the discrepancy does not affect the complex fast Fourier transform in the signature generation of Falcon by modeling the floating-point addition, subtraction, and multiplication in CryptoLine. We later implement our own floating-point multiplications in Armv7-M assembly and Jasmin and prove their equivalence with our model, demonstrating the possibility of transferring the challenging verification task (verifying highly optimized assembly) to the presumably more readable code base (Jasmin).
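The failure mode can be reproduced with a toy model; the sketch below is ours (Python), not Falcon's emulation code, and simply zeroizes any product that falls below the normal range where IEEE 754 arithmetic would produce a subnormal.

    MIN_NORMAL = 2.0 ** -1022   # smallest positive normal binary64 number

    def mul_flush_to_zero(a, b):
        # Toy emulated multiply that incorrectly zeroizes results whose
        # magnitude lies below the normal range, instead of returning a
        # subnormal as IEEE 754 multiplication does.
        p = a * b
        return 0.0 if 0.0 < abs(p) < MIN_NORMAL else p

    x, y = 2.0 ** -600, 2.0 ** -450     # exact product 2**-1050 is subnormal
    print(x * y)                        # IEEE 754: a nonzero subnormal
    print(mul_flush_to_zero(x, y))      # toy emulation: 0.0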
Motivated by the unexpected failure of the triangle intersection component of the Projection Algorithm for Nonmatching Grids (PANG), this article provides a robust version with proof of backward stability. The new triangle intersection algorithm ensures consistency and parsimony across three types of calculations. The set of intersections produced by the algorithm, called representations, is shown to match the set of geometric intersections, called models. The article concludes with a comparison between the old and new intersection algorithms for PANG using an example found to reliably generate failures in the former.
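The kind of inconsistency that breaks naive intersection code is visible already in the 2D orientation predicate. A sketch in Python; exact rational evaluation is shown as a generic remedy and is not the specific algorithm of the paper.

    from fractions import Fraction

    def orient_float(a, b, c):
        # Sign of the doubled signed area of triangle abc in binary64;
        # rounding can flip the sign for nearly collinear points, making
        # the pairwise tests of an intersection disagree with each other.
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

    def orient_exact(a, b, c):
        # The same determinant over exact rationals: never inconsistent.
        fa, fb, fc = (tuple(map(Fraction, p)) for p in (a, b, c))
        return (fb[0] - fa[0]) * (fc[1] - fa[1]) - (fb[1] - fa[1]) * (fc[0] - fa[0])

A robust routine can evaluate the cheap floating-point predicate first and fall back on the exact one whenever the result is too close to zero to be trusted.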
This paper concerns test matrices for numerical linear algebra using an error-free transformation of floating-point arithmetic. For eigenvalues specified by a user, we propose methods of generating a matrix whose eigenvalues are exactly known, based on, for example, the Schur or Jordan normal form and a block diagonal form. It is also possible to produce a real matrix with specified complex eigenvalues. Such test matrices with exactly known eigenvalues are useful for checking the accuracy of results computed by numerical algorithms. In particular, exact errors of eigenvalues can be monitored. To generate test matrices, we first propose an error-free transformation for the product of three matrices YSX. We approximate S by S' to compute YS'X without a rounding error. Next, the error-free transformation is applied to the generation of test matrices with exactly known eigenvalues. Note that the exactly known eigenvalues of the constructed matrix may differ from the anticipated given eigenvalues. Finally, numerical examples are introduced to check the accuracy of numerical computations for symmetric and unsymmetric eigenvalue problems.
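The flavor of the construction can be shown with a much simpler device than the paper's YSX transformation: conjugate a diagonal matrix by an integer matrix whose inverse is also an integer matrix, so that every product is exact and the eigenvalues of the stored matrix are known exactly. A sketch assuming Python with NumPy; the particular matrices are ours.

    import numpy as np

    # X = I + 7*e2*e0^T is unimodular, so X^{-1} = I - 7*e2*e0^T exactly.
    n = 4
    D = np.diag([1, 10, 100, 1000])
    X = np.eye(n, dtype=np.int64);    X[2, 0] = 7
    Xinv = np.eye(n, dtype=np.int64); Xinv[2, 0] = -7

    # All entries are small integers, so A = X @ D @ Xinv is computed
    # exactly and is exactly representable in binary64.
    A = X @ D @ Xinv
    print(np.linalg.eigvals(A.astype(float)))   # exact truth: 1, 10, 100, 1000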
ISBN (digital): 9781665478274
ISBN (print): 9781665478274
Some recent processors are not equipped with an integer division unit. Compilers then implement division by a call to a special function supplied by the processor designers, which implements division by a loop producing one bit of quotient per iteration. This hinders compiler optimizations and results in non-constant time computation, which is a problem in some applications. We advocate instead using the processor's floating-point unit, and propose code that the compiler can easily interleave with other computations. We fully proved the correctness of our algorithm, which mixes floating-point and fixed-bitwidth integer computations, using the Coq proof assistant and successfully integrated it into the CompCert formally verified compiler.
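The gist of the floating-point approach can be sketched in a few lines. This simplified version, with an explicit correction step, is our illustration in Python and not the paper's formally verified algorithm.

    def udiv32(n, d):
        # Unsigned 32-bit division via the FPU: n and d (d != 0) are below
        # 2**32, so both convert to binary64 exactly, and the rounded
        # quotient is off by at most one; one correction step repairs it.
        q = int(n / d)               # floating-point divide, then truncate
        if q * d > n:                # quotient rounded up past the target
            q -= 1
        elif (q + 1) * d <= n:       # quotient rounded down below it
            q += 1
        return q

    assert all(udiv32(n, d) == n // d
               for n in (0, 1, 3**20, 2**32 - 1)
               for d in (1, 3, 2**31, 2**32 - 1))

Unlike the bit-by-bit loop, the floating-point route runs in a fixed number of operations, which is what makes it attractive for constant-time code.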
We present algorithms for performing the five elementary arithmetic operations (+, −, ×, ÷, and √) in floating-point arithmetic with stochastic rounding, and demonstrate the value of these algorithms by discussing various applications where stochastic rounding is beneficial. The algorithms require that the hardware be compliant with the IEEE 754 floating-point standard and that a floating-point pseudorandom number generator be available. The goal of these techniques is to emulate stochastic rounding when the underlying hardware does not support this rounding mode, as is the case for most existing CPUs and GPUs. By simulating stochastic rounding in software, one has the possibility to explore the behavior of this rounding mode and develop new algorithms even without having access to hardware implementing stochastic rounding; once such hardware becomes available, it suffices to replace the proposed algorithms by calls to the corresponding hardware routines. When stochastically rounding double precision operations, the algorithms we propose are between 7.3 and 19 times faster than the implementations that use the GNU MPFR library to simulate extended precision. We test our algorithms on various tasks, including summation algorithms and solvers for ordinary differential equations, where stochastic rounding is expected to bring advantages.
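For addition, the emulation can be sketched with an error-free transformation: 2Sum recovers the exact rounding error of the nearest-rounded sum, and that residual drives the random rounding decision. A minimal Python sketch under the same IEEE 754 assumption; the paper's algorithms, including those for division and square root, are more involved.

    import math
    import random

    def add_sr(a, b):
        # Stochastically rounded a + b, emulated in binary64.
        s = a + b                          # RN(a + b)
        t = s - a                          # 2Sum: recover the exact error,
        err = (a - (s - t)) + (b - t)      # so that a + b == s + err exactly
        if err == 0.0:
            return s
        # Round toward the neighbor in the direction of err with probability
        # |err| / gap, where gap is the distance to that neighbor.
        neighbor = math.nextafter(s, math.copysign(math.inf, err))
        gap = abs(neighbor - s)
        return neighbor if random.random() < abs(err) / gap else s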
ISBN (digital): 9798331528850
ISBN (print): 9798331528867
With the advancements in image processing and machine learning, greater challenges have been posed to parallel computing, especially in the realm of floating-point arithmetic. In the face of increasingly complex application scenarios, single-precision floating-point units (FPUs) are proving inadequate in terms of flexibility and versatility. Therefore, this paper proposes a design for a multi-precision floating-point unit. The FPU of this design supports multiple precision formats, including fp32, fp16, fp8, and variable-precision fp16, achieving high multi-precision flexibility. The design effectively reduces the calculation cycle while ensuring efficient execution performance, meeting the diverse accuracy needs of different application scenarios. In addition to basic arithmetic functions, the design also implements single instruction multiple data (SIMD) functionality, further enhancing the processing power and efficiency of the arithmetic unit. Meanwhile, it introduces a series of custom simd_fmt instructions, extending the RISC-V instruction set to support a wider range of computational operations. These instructions exhibit significant performance advantages when dealing with multithreaded tasks and vector operations.
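In software, the SIMD flavor of such a unit can be mimicked by packing two fp16 lanes into one 32-bit word and operating lane-wise. A sketch assuming Python with NumPy; the lane layout and names are ours, not the paper's hardware design.

    import numpy as np

    def simd_add_2xfp16(x, y):
        # x, y: uint32 arrays, each word holding two fp16 lanes.
        xl = x.view(np.float16).reshape(-1, 2)    # reinterpret words as lane pairs
        yl = y.view(np.float16).reshape(-1, 2)
        return (xl + yl).view(np.uint32).ravel()  # lane-wise add, then repack

    a = np.array([1.5, -2.0], dtype=np.float16).view(np.uint32)  # one packed word
    b = np.array([0.25, 4.0], dtype=np.float16).view(np.uint32)
    print(simd_add_2xfp16(a, b).view(np.float16))                # [1.75  2.0]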
ISBN (digital): 9798350316926
ISBN (print): 9798350316933
Efficient multiple precision linear numerical computation libraries such as MPLAPACK are critical in dealing with ill-conditioned problems. Specifically, there are optimization methods for matrix multiplication, such as the Strassen algorithm and the Ozaki scheme, which can be used to speed up computation. For complex matrix multiplication, the 3M method can also be used, which requires only three multiplications of real matrices, instead of the 4M method, which requires four multiplications of real matrices. In this study, we extend these optimization methods to arbitrary precision complex matrix multiplication and verify the possible increase in computation speed through benchmark tests. The optimization methods are also applied to complex LU decomposition using matrix multiplication to demonstrate that the Ozaki scheme can be used to achieve higher computation speeds.
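The 3M trick itself is compact. A NumPy sketch over ordinary double-precision matrices (the arbitrary-precision setting of MPLAPACK is assumed away here):

    import numpy as np

    def cgemm_3m(A, B):
        # (Ar + i*Ai)(Br + i*Bi) with three real products instead of four:
        # real part = t1 - t2, imaginary part = t3 - t1 - t2.
        Ar, Ai = A.real, A.imag
        Br, Bi = B.real, B.imag
        t1 = Ar @ Br
        t2 = Ai @ Bi
        t3 = (Ar + Ai) @ (Br + Bi)
        return (t1 - t2) + 1j * (t3 - t1 - t2)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 50)) + 1j * rng.standard_normal((50, 50))
    B = rng.standard_normal((50, 50)) + 1j * rng.standard_normal((50, 50))
    print(np.allclose(cgemm_3m(A, B), A @ B))    # True, up to rounding

The saved multiplication comes at the cost of a slightly weaker componentwise error bound, one reason benchmarks like the paper's matter before adopting 3M in high-precision libraries.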