The widespread usage of the discrete wavelet transform (DWT) has motivated the development of fast DWT algorithms and their tuning on all sorts of computer systems. Several studies have compared the performance of the most popular schemes, known as Filter Bank Scheme (FBS) and Lifting Scheme (LS), and have always concluded that LS is the most efficient option. However, there is no such study on streaming processors such as modern Graphics Processing Units (GPUs). Current trends have transformed these devices into powerful stream processors with enough flexibility to perform intensive and complex floating-point calculations. The opportunities opened up by these platforms, as well as the growing popularity of the DWT within the computer graphics field, make a new performance comparison of great practical interest. Our study indicates that FBS outperforms LS in current-generation GPUs. In our experiments, the actual FBS gains range between 10 percent and 140 percent, depending on the problem size and the type and length of the wavelet filter. Moreover, design trends suggest higher gains in future-generation GPUs.
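To make the two schemes concrete, here is a minimal Python sketch of a single DWT level for the Haar wavelet only. Haar is chosen for brevity; the study's longer filters and GPU implementations are not reproduced here. The function names `fbs_level` and `lifting_haar_level` are hypothetical. Both routines produce the same approximation and detail coefficients. The filter bank version computes each output as an independent dot product of a filter with the input. The lifting version splits the signal into even and odd samples and refines them in sequential predict and update steps.

```python
import math

SQRT2 = math.sqrt(2.0)

# Haar analysis filters in correlation form: out[k] = sum_j f[j] * x[2k + j]
HAAR_LO = [1 / SQRT2, 1 / SQRT2]
HAAR_HI = [-1 / SQRT2, 1 / SQRT2]


def fbs_level(x, lo=HAAR_LO, hi=HAAR_HI):
    """One DWT level via the filter bank scheme: filter, then keep every 2nd output.

    Assumes len(x) is even; indices wrap around (periodic extension) so that
    filters longer than 2 taps also work.
    """
    n = len(x)
    approx, detail = [], []
    for k in range(n // 2):
        approx.append(sum(c * x[(2 * k + j) % n] for j, c in enumerate(lo)))
        detail.append(sum(c * x[(2 * k + j) % n] for j, c in enumerate(hi)))
    return approx, detail


def lifting_haar_level(x):
    """One Haar DWT level via lifting: split, predict, update, then normalize."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]        # predict odd samples from even ones
    a = [e + di / 2 for e, di in zip(even, d)]    # update: a becomes the pairwise mean
    return [ai * SQRT2 for ai in a], [di / SQRT2 for di in d]  # match the FBS scaling


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
print(fbs_level(signal))
print(lifting_haar_level(signal))
```

Both calls print the same coefficients up to floating-point rounding. The schemes differ only in how the work is organized, not in the result.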
We describe how using a redundant Montgomery representation allows for high-performance SIMD-based implementations of RSA and elliptic curve cryptography. This is in addition to the known benefit of immunity from timing attacks afforded by such a representation. We present some preliminary implementation timings using the SSE2 instruction set on a Pentium 4 processor and show that an SIMD parallel implementation of RSA can be around twice as fast as traditional sequential code. This is especially useful given the larger 2,048-bit RSA keys that are now being proposed for standard security levels. Finally, we remark on other application areas that improve the security of our work in the context of side-channel analysis while maintaining high performance.
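As a rough illustration of what "redundant" means here, the following scalar Python sketch performs Montgomery multiplication with R = 2**r_bits and omits the usual final conditional subtraction. It relies on the standard bound that, when R > 4n and both inputs lie in [0, 2n), the result also lies in [0, 2n). Values can therefore stay in this redundant range between operations, and the data-dependent subtraction branch disappears. The helper names `montgomery_setup` and `mont_mul_redundant` are hypothetical. The SSE2/SIMD data layout described in the abstract is not shown.

```python
def montgomery_setup(n, r_bits):
    """Precompute constants for an odd modulus n with R = 2**r_bits."""
    r = 1 << r_bits
    assert n % 2 == 1 and r > 4 * n, "redundant form needs odd n and R > 4n"
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime = -1 (mod R); pow(.., -1, ..) needs Python 3.8+
    return r, n_prime


def mont_mul_redundant(a, b, n, r_bits, n_prime):
    """Return u = a*b*R^-1 (mod n) with 0 <= u < 2n, without a final subtraction.

    Inputs must lie in [0, 2n). Because R > 4n, a*b < 4n^2 and the result
    stays below 2n, so no data-dependent correction step is needed.
    """
    mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # m = t * n' mod R, so t + m*n = 0 (mod R)
    return (t + m * n) >> r_bits        # exact division by R


n = 1000003          # small odd modulus, for illustration only
r_bits = 32          # R = 2**32 > 4n
r, n_prime = montgomery_setup(n, r_bits)

a, b = 123456, 987654
a_m, b_m = (a * r) % n, (b * r) % n                       # into Montgomery form
p_m = mont_mul_redundant(a_m, b_m, n, r_bits, n_prime)    # still in [0, 2n)
p = mont_mul_redundant(p_m, 1, n, r_bits, n_prime) % n    # out of Montgomery form
print(p == (a * b) % n)   # True
```

A full reduction to [0, n) is only needed once, when leaving Montgomery form.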
The design of black-box PDE solvers (for nonlinear systems of elliptic and parabolic PDEs) requires many compromises between efficiency and robustness, which we call 'Numerical Engineering'. The requirements for a black-box solver are formulated and the way to meet them is presented, guided by many years of practical experience in designing the program packages FIDISOL/CADSOL, VECFEM and LINSOL. The basic approach of the new finite difference element method (FDEM) program package, an FDM on an unstructured FEM grid, is discussed. The common feature of all these methods is the error equation, which allows transparent balancing of all errors. The discretization errors are estimated from difference formulae of different consistency orders. The error balancing must include the iterative solution of the large, sparse linear systems by the LINSOL program package. The real challenge is parallelization on distributed-memory parallel computers, which is solved by suitable data structures with optimal communication patterns and by redistribution after each grid refinement cycle. (C) 2000 IMACS. Published by Elsevier Science B.V. All rights reserved.
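The idea of estimating a discretization error from difference formulae of different consistency orders can be sketched in one dimension. This is a minimal sketch assuming a uniform grid and a smooth function. FDEM itself works on unstructured FEM grids, which this does not attempt, and the function names below are hypothetical. The gap between an order-4 and an order-2 approximation of f'(x) approximates the error of the order-2 formula, because the order-4 result is much closer to the exact derivative.

```python
import math


def d1_order2(f, x, h):
    """Central difference for f'(x), consistency order 2."""
    return (f(x + h) - f(x - h)) / (2 * h)


def d1_order4(f, x, h):
    """Five-point central difference for f'(x), consistency order 4."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12 * h)


def estimated_error(f, x, h):
    """Estimate the error of the order-2 formula by comparing it with the order-4 one."""
    return d1_order4(f, x, h) - d1_order2(f, x, h)


x0, h = 1.0, 0.1
exact = math.cos(x0)                       # d/dx sin(x) = cos(x)
actual = exact - d1_order2(math.sin, x0, h)
estimate = estimated_error(math.sin, x0, h)
print(f"actual error {actual:.3e}, estimated {estimate:.3e}")
```

The estimate needs no knowledge of the exact solution, which is what makes this kind of comparison usable inside a solver.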