Search Results - Inner Mongolia University Library

Parallel implementation of the 2D discrete wavelet transform on Graphics Processing Units: Filter Bank versus Lifting

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2008, vol. 19, no. 3, pp. 299-310

Authors: Tenllado, Christian; Setoain, Javier; Prieto, Manuel; Pinuel, Luis; Tirado, Francisco (Univ Complutense Madrid, Fac Ciencias Fis, Dept Comp Architecture, ArTeCS Grp, E-28040 Madrid, Spain)

The widespread usage of the discrete wavelet transform (DWT) has motivated the development of fast DWT algorithms and their tuning on all sorts of computer systems. Several studies have compared the performance of the most popular schemes, known as Filter Bank Scheme (FBS) and Lifting Scheme (LS), and have always concluded that LS is the most efficient option. However, there is no such study on streaming processors such as modern Graphics Processing Units (GPUs). Current trends have transformed these devices into powerful stream processors with enough flexibility to perform intensive and complex floating-point calculations. The opportunities opened up by these platforms, as well as the growing popularity of the DWT within the computer graphics field, make a new performance comparison of great practical interest. Our study indicates that FBS outperforms LS in current-generation GPUs. In our experiments, the actual FBS gains range between 10 percent and 140 percent, depending on the problem size and the type and length of the wavelet filter. Moreover, design trends suggest higher gains in future-generation GPUs.

Keywords: graphics processors; parallel processing; parallel algorithms; parallel and vector implementations; wavelets and fractals; SIMD processors; optimization; parallel discrete wavelet transform; lifting; filter bank; GPU; stream processors
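
As a rough illustration of the two schemes the abstract compares, the sketch below computes one level of a 1D Haar transform both ways. The Haar wavelet, the function names, and the plain-Python setting are assumptions chosen for brevity; the record does not describe the paper's filters or its GPU implementation, and nothing here says anything about relative speed.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_filter_bank(x):
    """One DWT level as a two-channel filter bank (assumes even len(x))."""
    low = [1 / SQRT2, 1 / SQRT2]      # low-pass analysis taps
    high = [-1 / SQRT2, 1 / SQRT2]    # high-pass analysis taps
    approx, detail = [], []
    for k in range(0, len(x), 2):     # stepping by 2 performs the downsampling
        approx.append(low[0] * x[k] + low[1] * x[k + 1])
        detail.append(high[0] * x[k] + high[1] * x[k + 1])
    return approx, detail

def haar_lifting(x):
    """The same transform as split, predict, update, and scale steps."""
    even, odd = x[0::2], x[1::2]                         # split
    detail = [o - e for e, o in zip(even, odd)]          # predict odd from even
    approx = [e + d / 2 for e, d in zip(even, detail)]   # update: pairwise mean
    return [s * SQRT2 for s in approx], [d / SQRT2 for d in detail]  # scale

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
fb_approx, fb_detail = haar_filter_bank(signal)
ls_approx, ls_detail = haar_lifting(signal)
```

Both functions return the same coefficients up to rounding; the filter bank computes each output independently from the input, while lifting reuses intermediate results across steps.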

Parallel cryptographic arithmetic using a redundant Montgomery representation

IEEE TRANSACTIONS ON COMPUTERS, 2004, vol. 53, no. 11, pp. 1474-1482

Authors: Page, D; Smart, NP (Univ Bristol, Dept Comp Sci, Bristol BS8 1UB, Avon, England)

We describe how using a redundant Montgomery representation allows for high-performance SIMD-based implementations of RSA and elliptic curve cryptography. This is in addition to the known benefits of immunity from timing attacks afforded by the use of such a representation. We present some preliminary implementation timings using the SSE2 instruction set on a Pentium 4 processor and show that an SIMD parallel implementation of RSA can be around twice as fast as traditional sequential code. This is especially useful given the larger 2,048 bit RSA keys which are now being proposed for standard security levels. Finally, we remark on other application areas that improve the security of our work in the context of side-channel analysis while maintaining high performance.

Keywords: public key cryptosystems; algorithm design and analysis; parallel and vector implementations; performance measures
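
The record does not spell out the paper's construction, so the following is only a sketch of one common reading of a "redundant" Montgomery representation: residues are allowed to live in [0, 2n) rather than [0, n), which lets the reduction skip its data-dependent final subtraction when R > 4n. All function names and the toy modulus are assumptions for illustration, and no SIMD or SSE2 code is shown.

```python
def montgomery_params(n, bits):
    """Return (R, n_prime) for an odd modulus n, with R = 2**bits > 4*n."""
    r = 1 << bits
    if n % 2 == 0 or r <= 4 * n:
        raise ValueError("need odd n and R > 4n")
    n_prime = (-pow(n, -1, r)) % r    # -n^(-1) mod R (requires Python 3.8+)
    return r, n_prime

def redc(t, n, bits, n_prime):
    """Compute t * R^(-1) mod n without the final conditional subtraction.

    For 0 <= t < 4*n*n the result lies in [0, 2n), so it may be a
    non-canonical (redundant) representative of its residue class.
    """
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask
    return (t + m * n) >> bits        # t + m*n is divisible by R

def to_mont(a, n, bits, n_prime):
    r2 = pow(1 << bits, 2, n)         # R^2 mod n
    return redc(a * r2, n, bits, n_prime)

def mont_mul(a_bar, b_bar, n, bits, n_prime):
    return redc(a_bar * b_bar, n, bits, n_prime)

def from_mont(a_bar, n, bits, n_prime):
    # A single canonical reduction when leaving the representation.
    return redc(a_bar, n, bits, n_prime) % n

n, bits = 101, 9                      # toy modulus; R = 512 > 4 * 101
_, n_prime = montgomery_params(n, bits)
a_bar = to_mont(57, n, bits, n_prime)
b_bar = to_mont(88, n, bits, n_prime)
product = from_mont(mont_mul(a_bar, b_bar, n, bits, n_prime), n, bits, n_prime)
```

Because every intermediate value stays below 2n, each multiplication performs the same sequence of operations regardless of the operands, which is the property usually associated with resistance to timing attacks.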

Numerical engineering: design of PDE black-box solvers

MATHEMATICS AND COMPUTERS IN SIMULATION, 2000, vol. 54, no. 4-5, pp. 269-277

Author: Schönauer, W (Univ Karlsruhe, Rech Zentrum, D-76128 Karlsruhe, Germany)

The design of PDE black-box solvers (for nonlinear systems of elliptic and parabolic PDEs) needs many compromises between efficiency and robustness which we call 'Numerical Engineering'. The requirements for a black-box solver are formulated and the way how to meet them is presented, guided by many years of practical experience in the design of the program packages FIDISOL/CADSOL, VECFEM and LINSOL. The basic approach to the new finite difference element method (FDEM) program package, an FDM on an unstructured FEM grid, is discussed. The common feature of all these methods is the error equation that allows a transparent balancing of all errors. The discretization errors are estimated from difference formulae of different consistency orders. The error balancing must include the iterative solution of the large and sparse linear systems by the LINSOL program package. The real challenge is the parallelization on distributed memory parallel computers which is solved by corresponding data structures with optimal communication patterns and redistribution after each grid refinement cycle. (C) 2000 IMACS. Published by Elsevier Science B.V. All rights reserved.

Keywords: G4 mathematical software; algorithm design and analysis; efficiency; parallel and vector implementations; reliability and robustness
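
The abstract's remark that discretization errors are estimated from difference formulae of different consistency orders can be illustrated on a single first derivative. This is a minimal sketch on a uniform 1D grid with standard central differences; the function names and the test function are assumptions, and it is not taken from FIDISOL/CADSOL, FDEM, or LINSOL.

```python
import math

def diff_order2(u, x, h):
    """Second-order central difference for u'(x)."""
    return (u(x + h) - u(x - h)) / (2 * h)

def diff_order4(u, x, h):
    """Fourth-order central difference for u'(x)."""
    return (-u(x + 2 * h) + 8 * u(x + h) - 8 * u(x - h) + u(x - 2 * h)) / (12 * h)

def estimated_error_order2(u, x, h):
    """Estimate the error of the order-2 formula as its difference from the
    order-4 formula, which is treated as a stand-in for the exact derivative."""
    return diff_order2(u, x, h) - diff_order4(u, x, h)

x0, h = 1.0, 0.1
estimate = estimated_error_order2(math.sin, x0, h)
true_error = diff_order2(math.sin, x0, h) - math.cos(x0)
```

For a smooth function such as this one, the estimate and the true error agree to within the much smaller error of the higher-order formula, which is what makes the estimate usable without knowing the exact solution.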
