Search results - Inner Mongolia University Library

22nd International Conference on High Performance Computing

Authors: Enfedaque, Pablo; Auli-Llinas, Francesc; Moure, Juan C.
Affiliations: Univ Autonoma Barcelona Dept Informat & Commun Engn Bellaterra Spain; Univ Autonoma Barcelona Dept Comp Architecture & Operating Syst Bellaterra Spain

ISBN: (print) 9781467384889

The main difficulty in implementing modern image coding systems on a GPU is that the algorithms employed in the core of the coding scheme are inherently sequential. We recently proposed bitplane image coding with parallel coefficient processing (BPC-PaCo), a coding scheme that, contrary to most systems, permits the processing of multiple coefficients of the image in parallel. This enables the use of SIMD computing, ideal for its implementation on a GPU. This paper introduces and evaluates the GPU implementation of BPC-PaCo employing two different strategies that trade off computational throughput and compression efficiency. The proposed implementation is compared to the best CPU and GPU implementations of JPEG2000, the state-of-the-art image compression standard. Experimental results indicate that BPC-PaCo achieves a computational throughput that is an order of magnitude superior to that achieved with such implementations, with a small reduction in coding efficiency.

Keywords: image coding; parallel architectures; SIMD computing; GPU

String searching with mismatches using AVX2 and AVX-512 instructions

INFORMATION PROCESSING LETTERS, 2025, Vol. 189

Authors: Chhabra, Tamanna; Ghuman, Sukhpal Singh; Tarhio, Jorma
Affiliations: Sheridan Coll Fac Appl Sci & Technol Brampton ON Canada; Aalto Univ Dept Comp Sci Espoo Finland

We present new algorithms for the k mismatches version of approximate string matching. Our algorithms utilize the SIMD (Single Instruction Multiple Data) instruction set extensions, particularly AVX2 and AVX-512 instructions. Our approach is an extension of an earlier algorithm for exact string matching with SSE2 and AVX2. In addition, we modify this exact string matching algorithm to work with AVX-512. We demonstrate the competitiveness of our solutions by practical experiments. Our algorithms outperform earlier algorithms for both exact and approximate string matching on various benchmark data sets.

Keywords: Approximate string matching; Hamming distance; Exact string matching; SIMD computing; Experimental comparison
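
For readers unfamiliar with the k mismatches problem named in this record, the following is a minimal scalar sketch of the problem statement only. It is not the authors' AVX2/AVX-512 algorithm; the function name `k_mismatch_search` and its parameters are hypothetical, chosen for illustration. It reports every alignment whose Hamming distance to the pattern is at most k.

```python
def k_mismatch_search(text: str, pattern: str, k: int) -> list[int]:
    """Return start positions where pattern matches text with at most k mismatches.

    Hypothetical scalar baseline: a SIMD version would compare many
    characters per instruction instead of one at a time.
    """
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for offset in range(m):
            if text[start + offset] != pattern[offset]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        else:
            # inner loop finished without exceeding k mismatches
            hits.append(start)
    return hits
```

The inner comparison loop is the part that SIMD instruction sets are typically used to accelerate, since each character comparison within an alignment is independent of the others.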

A DIRECTIONAL EQUISPACED INTERPOLATION-BASED FAST MULTIPOLE METHOD FOR OSCILLATORY KERNELS

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2023, Vol. 45, No. 1, pp. C20-C48

Authors: Chollet, Igor; Claeys, Xavier; Fortin, Pierre; Grigori, Laura
Affiliations: INRIA Alpines Inst Sci Calcul & Donnees ISCD Paris France; Sorbonne Univ Inria Equipe ALPINES Lab Jacques Louis Lions F-75005 Paris France; Sorbonne Univ CNRS LIP6 F-75005 Paris France; Univ Lille CNRS Cent Lille UMR CRIStAL 9189 F-59000 Lille France

Fast multipole methods (FMMs) based on the oscillatory Helmholtz kernel can reduce the cost of solving N-body problems arising from boundary integral equations (BIEs) in acoustics or electromagnetics. However, their cost strongly increases in the high-frequency regime. This paper introduces a new directional FMM for oscillatory kernels (defmm: directional equispaced interpolation-based fmm), whose precomputation and application are FFT-accelerated due to polynomial interpolations on equispaced grids. We demonstrate the consistency of our FFT approach and show how symmetries can be exploited in the Fourier domain. We also describe the algorithmic design of defmm, well-suited for the BIE nonuniform particle distributions, and present performance optimizations on one CPU core. Finally, we exhibit important performance gains on all test cases for defmm over a state-of-the-art FMM library for oscillatory kernels.

Keywords: directional fast multipole method; fast Fourier transform; high performance computing; symmetries; SIMD computing

GPU Implementation of Bitplane Coding with Parallel Coefficient Processing for High Performance Image Compression

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, Vol. 28, No. 8, pp. 2272-2284

Authors: Enfedaque, Pablo; Auli-Llinas, Francesc; Moure, Juan Carlos
Affiliations: Univ Autonoma Barcelona Dept Informat & Commun Engn E-08193 Barcelona Spain; Univ Autonoma Barcelona Dept Comp Architecture & Operating Syst E-08193 Barcelona Spain

The fast compression of images is a requisite in many applications like TV production, teleconferencing, or digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High performance implementations of such algorithms often require specialized hardware like field integrated gate arrays. Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because they do not exhibit fine-grain parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems. It is tailored for massive parallel architectures. It is called bitplane coding with parallel coefficient processing (BPC-PaCo). This paper introduces the first high performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm aids its implementation in the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, a smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation matches the requirements for high resolution (4 K) digital cinema in real time, yielding speedups of 30x with respect to the fastest implementations of current compression standards. Also, a power consumption evaluation shows that our implementation consumes 40x less energy for equivalent performance than state-of-the-art methods.

Keywords: Image coding; SIMD computing; graphics processing unit (GPU); compute unified device architecture (CUDA)

SWAMP: Smith-Waterman using Associative Massive Parallelism

10th Workshop on Advances in Parallel and Distributed Computational Models/22nd IEEE International Parallel and Distributed Processing Symposium

Authors: Steinfadt, Shannon; Baker, Johnnie W.
Affiliations: Kent State Univ Dept Comp Sci Kent OH 44242 USA

ISBN: (print) 9781424416936

One of the most commonly used tools by computational biologists is some form of sequence alignment. Heuristic alignment algorithms developed for speed and their multiple results such as BLAST [1] and FASTA [2] are not a total replacement for the more rigorous but slower algorithms like Smith-Waterman [3]. The different techniques complement one another. A heuristic can filter dissimilar sequences from a large database such as GenBank [4] and the Smith-Waterman algorithm performs more detailed, in-depth alignment in a way not adequately handled by heuristic methods. An associative parallel Smith-Waterman algorithm has been improved and further parallelized. Analysis between different algorithms, different types of file input, and different input sizes have been performed and are reported here. The newly developed associative algorithm reduces the running time for rigorous pairwise local sequence alignment.

Keywords: associative computing; SIMD computing; parallel algorithms; sequence alignment
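
As background for the Smith-Waterman algorithm this record refers to, here is a minimal sequential sketch of the standard local-alignment score recurrence with a linear gap penalty. It is not the associative parallel SWAMP algorithm from the paper; the function name `smith_waterman_score` and the scoring values (match=2, mismatch=-1, gap=-1) are assumptions made for illustration.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between a and b.

    Illustrative scoring values; real tools use substitution matrices
    and often affine gap penalties.
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)  # 0 restarts the local alignment
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal can be computed independently; parallel formulations of the algorithm generally exploit this property.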

The use of nanoelectronic devices in highly parallel computing systems

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 1998, Vol. 6, No. 1, pp. 31-38

Authors: Fountain, TJ; Duff, MJB; Crawley, DG; Tomlinson, CD; Moffat, CD
Affiliations: UCL Dept Phys & Astron Image Proc Grp London WC1E 7HN England

The continuing development of smaller electronic devices into the nanoelectronic regime offers great possibilities for the construction of highly parallel computers. This paper describes work designed to discover the best ways to take advantage of this opportunity. Simulated results are presented which indicate that improvements in clock rates of two orders of magnitude, and in packing density of three orders of magnitude, over the best current systems, should be attainable. These results apply to the class of data-parallel computers, and their attainment demands modifications to the design which are also described. Evaluation of the requirements of alternative classes of parallel architecture is currently under way, together with a study of the vitally important area of fault-tolerance.

Keywords: fault tolerance; high-performance computing; nanotechnology; resonant tunneling diode; SIMD computing
