In this paper, we propose and present implementation results of a high-speed turbo decoding algorithm. With the proposed design, the latency caused by (de)interleaving and iterative decoding in a conventional maximum a posteriori turbo decoder can be dramatically reduced. The latency reduction stems from combining the radix-4, center-to-top, parallel decoding, and early-stop algorithms. This reduced latency enables the use of the turbo decoder as a forward error correction scheme in real-time wireless communication services. The proposed scheme incurs a slight degradation in bit error rate performance for large block sizes, because the effective interleaver size in a radix-4 implementation is halved relative to the conventional method. To demonstrate the latency reduction, we implemented the proposed scheme on a field-programmable gate array and compared its decoding speed with that of a conventional decoder. The results show at least a fivefold improvement for a single iteration of turbo decoding.
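One common form of the early-stop test the abstract mentions is hard-decision agreement: iterations halt once the decoded bits stop changing between passes. The sketch below is illustrative only (the function names and the stopping rule are assumptions, not taken from the paper):

```python
def hard_decisions(llrs):
    # Map each log-likelihood ratio to a bit: non-negative LLR -> 0, negative -> 1.
    return tuple(0 if l >= 0 else 1 for l in llrs)

def decode_with_early_stop(iterate, llrs, max_iters=8):
    """Run turbo iterations, stopping early when the hard decisions
    agree between two consecutive iterations (assumed convergence).
    `iterate` is a user-supplied function performing one full
    turbo iteration on the LLR vector."""
    prev = None
    for it in range(1, max_iters + 1):
        llrs = iterate(llrs)
        cur = hard_decisions(llrs)
        if cur == prev:            # decisions stable -> stop iterating
            return cur, it
        prev = cur
    return prev, max_iters
```

With a toy `iterate` that only grows LLR magnitudes, the decisions stabilize immediately and the loop exits after the second pass, which is the latency saving the early-stop criterion buys in the well-conditioned case.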
Despite continuous improvement in the computational power of multi-/many-core processors, their memory access performance has not improved commensurately, so the overall performance of recent processors is often limited by the delay of off-chip memory accesses. Low-delay data compression for the last-level cache (LLC) can improve processor performance because compression increases the effective LLC capacity and thus reduces the number of off-chip memory accesses. This paper proposes a novel data compression method suited to high-speed parallel decoding in the LLC. Since cache line data often exhibit periodicity of certain lengths, such as 32- or 64-bit instructions, 32-bit integers, and 64-bit floating-point numbers, an information word is encoded as a base pattern plus a differential pattern between the original word and the base pattern. Evaluation on a GPU simulator shows that the compression ratio of the proposed coding is comparable to LZSS coding and X-Match Pro, and superior to other conventional compression algorithms for cache memories. This paper also presents an experimental decoder designed for ASIC; synthesis results show that the decoder can decompress a 32-byte cache line in four clock cycles. IPC evaluation on the GPU simulator shows that, for several benchmark programs, the proposed coding achieves a higher IPC than the conventional BΔI coding, with a maximum IPC improvement of 20%.
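The base-plus-differential idea can be sketched in a few lines. This is a generic base+delta scheme in the spirit of the abstract, not the paper's actual format: word width, delta width, and base selection are all assumptions here.

```python
def compress_line(words):
    """Sketch of base+delta encoding for a cache line of integer words:
    the first word is taken as the base pattern and each word is stored
    as a small signed difference from it. Returns None when any delta
    does not fit in one signed byte (line stays uncompressed)."""
    base = words[0]
    deltas = [w - base for w in words]
    if all(-128 <= d <= 127 for d in deltas):
        return base, bytes(d & 0xFF for d in deltas)
    return None  # incompressible with this base/delta width

def decompress_line(base, delta_bytes):
    # Every output word depends only on the base and its own delta,
    # so a hardware decoder can reconstruct all words in parallel --
    # the property that enables the few-cycle decompression the
    # abstract targets.
    return [base + (d - 256 if d > 127 else d) for d in delta_bytes]
```

The key design point is that decompression is embarrassingly parallel across words, unlike dictionary schemes such as LZSS whose output positions depend on earlier ones.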
A simple scheme was proposed by Knuth to generate binary balanced codewords from any information word. However, this method is limited in that its redundancy is twice that of the full sets of balanced codes. The gap between the redundancy of Knuth's algorithm and that of the full sets of balanced codes is considerable, and this paper attempts to reduce it. Furthermore, many constructions assume that full balancing can be performed without showing the steps; full balancing refers to balancing the encoded information together with the prefix. We propose an efficient full balancing scheme that uses neither lookup tables nor enumerative coding.
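For readers unfamiliar with Knuth's construction, its core step is simple: for any even-length binary word there is an index k such that inverting the first k bits balances the word, and only k needs to be communicated (via a prefix, whose encoding is where the redundancy discussed above arises). A minimal sketch of that step:

```python
def knuth_balance(bits):
    """Sketch of Knuth's balancing step: invert the first k bits of an
    even-length word; some k in [0, n] always yields weight n/2.
    Encoding k into a (balanced) prefix is the separate step whose
    cost the paper addresses."""
    n = len(bits)
    for k in range(n + 1):
        w = [b ^ 1 for b in bits[:k]] + bits[k:]
        if sum(w) == n // 2:
            return k, w
    raise ValueError("word length must be even")

def knuth_restore(k, w):
    # The receiver simply re-inverts the first k bits.
    return [b ^ 1 for b in w[:k]] + w[k:]
```

The existence of a valid k follows because inverting one more bit changes the weight by exactly ±1, so the weight sweeps continuously from w to n−w and must pass through n/2.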
ISBN (print): 9781479923427
Nowadays, mobile devices are capable of displaying video at up to HD resolution. In this paper, we propose two acceleration strategies for an Audio Video coding Standard (AVS) software decoder on a multi-core ARM NEON platform. First, data-level parallelism is exploited to make effective use of the SIMD capability of NEON, and key modules are redesigned to be SIMD-friendly. Second, macroblock-level wavefront parallelism is designed around the decoding dependencies among macroblocks to utilize the processing capability of multiple cores. Experimental results show that an AVS (IEEE 1857) HD video stream can be decoded in real time when the two proposed acceleration strategies are applied.
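Macroblock wavefront parallelism follows from the standard intra/deblocking dependency pattern: MB (x, y) needs its left neighbour (x−1, y) and the row above completed through (x+1, y−1). A common consequence, sketched below under that assumption (the paper's exact dependency set may differ), is that macroblocks with equal x + 2y are mutually independent:

```python
def wavefront_order(cols, rows):
    """Sketch of macroblock-level wavefront scheduling: under the
    assumed dependencies (left and top-right neighbours), all MBs on
    the anti-diagonal d = x + 2*y are independent and can be decoded
    concurrently on different cores."""
    waves = {}
    for y in range(rows):
        for x in range(cols):
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[d] for d in sorted(waves)]
```

Each returned wave is a batch that can be dispatched to the core pool; a row starts as soon as the row above is two macroblocks ahead.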
ISBN (print): 0780378407
In this paper, a new parallel Turbo encoding and decoding technique is introduced. In this technique, a long information data frame is first divided into sub-blocks, which are then encoded with trellis termination and decoded by multiple parallel SISO modules. It is shown that, at the cost of a slight increase in hardware complexity and a slight loss in transmission efficiency due to the extra terminating bits appended, the proposed scheme can effectively reduce the decoding delay while achieving noticeably better error performance than the regular schemes, especially at high code rates.
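The partition step can be sketched as follows. This is a simplified model, not the paper's encoder: trellis termination is represented as appending tail bits (a memory-m convolutional constituent needs m of them per sub-block), which is exactly the source of the transmission-efficiency loss mentioned above.

```python
def split_with_termination(frame, num_blocks, term_bits=3):
    """Sketch of the sub-block partition: the frame is split into
    near-equal sub-blocks and each is terminated independently
    (modelled here as appending `term_bits` zero tail bits), so the
    sub-blocks can be decoded by parallel SISO modules."""
    n = len(frame)
    size = -(-n // num_blocks)  # ceiling division
    return [frame[i:i + size] + [0] * term_bits
            for i in range(0, n, size)]
```

The overhead is num_blocks × term_bits extra bits per frame, i.e. it grows with the degree of parallelism, which is why the scheme trades a small rate loss for an almost proportional cut in decoding delay.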
ISBN (print): 9781479927654
This paper presents a new hybrid parallelization method for a High Efficiency Video Coding (HEVC) decoder. The proposed method groups the HEVC decoding modules into entropy decoding, pixel decoding, and in-loop filtering parts, parallelizing each part according to its characteristics. For the pixel decoding part, the method employs a coding tree unit (CTU)-level 2D wavefront. To reduce the delay between entropy decoding and pixel decoding, task-level parallelism (TLP) is additionally employed across the two parts. For the HEVC deblocking filter, CTU-level data-level parallelism (DLP) with equally partitioned CTUs is proposed. In addition, CTU row-level DLP for sample adaptive offset (SAO) is proposed to maximize parallel performance and to minimize the overhead of maintaining a backup buffer. The experimental results show that, on the multi-core platform, the proposed parallel deblocking filter achieves a speed-up of up to 5.4x and the parallel SAO approach a speed-up of up to 3.7x. Furthermore, the proposed parallel HEVC decoder shows a speed-up of 2.9x with 6 threads, without any encoder-side parallel tools such as wavefront parallel processing (WPP) coding or picture partitioning with tiles and slice segments.
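"Equally partitioned CTUs" for DLP amounts to splitting the CTU range into near-equal contiguous chunks, one per thread. The sketch below shows one plausible partitioning (an assumption; the paper's exact assignment policy is not stated in the abstract):

```python
def partition_ctus(num_ctus, num_threads):
    """Sketch of equally partitioned CTU-level DLP: split the CTU
    index range into near-equal contiguous [start, end) ranges, one
    per thread, distributing the remainder one CTU at a time so no
    thread gets more than one extra CTU."""
    base, extra = divmod(num_ctus, num_threads)
    ranges, start = [], 0
    for t in range(num_threads):
        end = start + base + (1 if t < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

Equal-sized static ranges keep scheduling overhead near zero, which matters for a filter stage whose per-CTU work is comparatively uniform.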
ISBN (print): 9781479947607
Existing scheduling schemes for decoding multiple H.264/AVC streams on multi-core processors are largely limited by ineffective use of the multi-core architecture. Among the reasons are inefficient load balancing, in which common load metrics (e.g. tasks, frames, bytes) fail to reflect the actual processing load at each core; scheduling algorithms that do not scale to large core counts; and bottlenecks at the schedulers during multi-stream decoding. In this paper, we propose a scalable adaptive Highest Random Weight (HA-HRW) hash scheduler for distributed shared-memory multi-core architectures that considers: 1) the memory access and core/cache topology of the multi-core architecture; 2) a processing-time load metric that enforces true load balancing; 3) hierarchical parallel scheduling to decode multiple streams simultaneously; 4) the locality characteristics of candidate processing units, limiting the search to neighboring cores to keep scheduling scalable. We implement and evaluate our approach on a 32-core SGI server with a realistic workload. Compared with existing schemes, our scheme achieves higher throughput, better load balancing, better CPU utilization, and no jitter problem. Our scheme scales with core count and stream count, as its time complexity is O(1).
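The Highest Random Weight (rendezvous) hashing at the core of such a scheduler is compact: each candidate core is scored with a hash of (key, core) and the top score wins. The sketch below shows plain HRW without the paper's adaptive weighting (an intentional simplification); restricting the candidate list to a neighbourhood of cores is what bounds the per-decision cost:

```python
import hashlib

def hrw_pick(key, cores):
    """Sketch of Highest Random Weight (rendezvous) hashing: score
    every candidate core with a hash of (key, core) and select the
    highest score. The choice is deterministic per key and minimally
    disrupted when a non-selected candidate disappears."""
    def weight(core):
        h = hashlib.sha256(f"{key}:{core}".encode()).digest()
        return int.from_bytes(h[:8], "big")
    return max(cores, key=weight)
```

Minimal disruption is the property that makes HRW attractive for stream-to-core assignment: removing a core only reassigns the streams that were mapped to it, leaving all other assignments intact.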
ISBN (print): 9781509028610
In this paper, we propose a parallel block-based Viterbi decoder (PBVD) on the graphics processing unit (GPU) platform for decoding convolutional codes. The decoding procedure is simplified and parallelized, and the structure of the trellis is exploited to reduce the metric computation. Based on the compute unified device architecture (CUDA), two kernels with different degrees of parallelism are designed to map the two decoding phases. Moreover, optimized data structures for several kinds of intermediate information are presented to improve the efficiency of internal memory transactions. Experimental results demonstrate that the proposed decoder achieves throughputs of 598 Mbps on an NVIDIA GTX580 and 1802 Mbps on a GTX980 for the 64-state convolutional code, a 1.5x speedup over the fastest existing GPU implementations.
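The per-trellis-step kernel in any Viterbi decoder is add-compare-select (ACS), and its GPU-friendliness comes from every state being updatable independently. The sketch below is a generic ACS step in plain Python (the data layout and names are illustrative, not the paper's CUDA structures):

```python
def acs_step(path_metrics, transitions):
    """Sketch of one add-compare-select (ACS) trellis step.
    transitions[s] lists (prev_state, branch_metric) pairs feeding
    state s. Each state's update reads only shared old metrics and
    writes its own slot, so all states can run as parallel threads."""
    new_metrics, survivors = [], []
    for preds in transitions:
        # Add path + branch metrics, compare candidates, select the best.
        metric, best_prev = min((path_metrics[p] + bm, p) for p, bm in preds)
        new_metrics.append(metric)
        survivors.append(best_prev)  # stored for traceback
    return new_metrics, survivors
```

On a GPU, the outer loop becomes the thread index; the survivor array written each step is what the traceback phase (the paper's second kernel) later consumes.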