In this letter, we present two mechanisms based on the soft-output Viterbi algorithm (SOVA) that achieve performance close to maximum a posteriori (MAP) decoding. First, we propose a new technique called alternating-direction SOVA, which improves the performance of SOVA to a degree comparable with bidirectional SOVA while barely affecting area complexity. Then, we introduce the idea of parallel turbo decoding based on path-metric/state border exchange. Unlike other techniques, no overlapping between adjacent sub-blocks is required, which dramatically reduces decoding latency. We evaluate both mechanisms in the context of 3GPP LTE and show that their combination achieves a BER performance close to parallel MAP decoding at lower complexity.
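The abstract describes both mechanisms only at a high level. The Python sketch below illustrates the iteration structure they suggest, under the assumption that each sub-block runs one SOVA sweep per half-iteration, that the sweep direction simply alternates from one half-iteration to the next, and that only a small vector of border path metrics is exchanged with the neighboring sub-block instead of using overlapping windows. The function sova_half_iteration is a hypothetical stub standing in for a real SOVA recursion, not the authors' implementation.

```python
import numpy as np

P, S = 4, 8                                    # sub-blocks and trellis states

def sova_half_iteration(sub_block_id, direction, border_metrics):
    """Hypothetical stub for one SOVA sweep over one sub-block in the given
    direction, seeded with border path metrics received from a neighbor.
    A real decoder would run the add-compare-select recursion and the
    soft-output traceback here and return the metrics at its far border."""
    return border_metrics                       # stub: pass the metrics through

border = np.zeros((P, S))                       # path metrics at sub-block borders

for half_iteration in range(8):
    # Alternate the processing direction instead of running both directions
    # at once, which is the idea behind alternating-direction SOVA.
    direction = "forward" if half_iteration % 2 == 0 else "backward"
    new_border = border.copy()
    for p in range(P):                          # the P sub-blocks run concurrently
        far_metrics = sova_half_iteration(p, direction, border[p])
        neighbor = p + 1 if direction == "forward" else p - 1
        if 0 <= neighbor < P:                   # pass metrics to the next sub-block;
            new_border[neighbor] = far_metrics  # no sub-block overlap is needed
    border = new_border
```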
ISBN (Print): 9781467374880
This paper presents the main sorting methods used when a parallel decoding architecture is employed for Long Term Evolution (LTE) turbo codes. Usually, the parallelization factor N determines the number of required interleavers. We propose a decoding architecture with only one Quadratic Permutation Polynomial (QPP) interleaver. As a consequence of the algebraic properties of the QPP interleaver, all interleaved addresses are placed in the same memory location, but sorting is needed before the correct data can be sent to each decoder unit. The sorting methods are compared in order to find the one best suited to a Field Programmable Gate Array (FPGA) implementation. The even-odd merge sorting method is selected, and the results obtained are reported in terms of occupied resources and maximum speed.
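Even-odd (odd-even) merge sorting, attributed to Batcher, is attractive for FPGA implementation because its comparator network is fixed and independent of the data. Below is a minimal Python sketch of the standard network construction for power-of-two input sizes; it only illustrates the sorting method named in the abstract, not the paper's hardware design, and the names odd_even_merge_sort_network/apply_network and the 8-input example are choices made for the sketch.

```python
def odd_even_merge_sort_network(n):
    """Comparator list (i, j) of Batcher's odd-even merge sort for n = 2^k inputs
    (a data-independent network, which is what suits a hardware realization)."""
    comparators = []

    def merge(lo, length, step):
        double = 2 * step
        if double < length:
            merge(lo, length, double)            # merge the even sub-sequence
            merge(lo + step, length, double)     # merge the odd sub-sequence
            for i in range(lo + step, lo + length - step, double):
                comparators.append((i, i + step))
        else:
            comparators.append((lo, lo + step))

    def sort(lo, length):
        if length > 1:
            half = length // 2
            sort(lo, half)
            sort(lo + half, half)
            merge(lo, length, 1)

    sort(0, n)
    return comparators

def apply_network(values, comparators):
    """Run the comparator network sequentially (smaller value goes first)."""
    a = list(values)
    for i, j in comparators:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a

net = odd_even_merge_sort_network(8)             # 19 comparators for 8 inputs
assert apply_network([5, 3, 7, 0, 2, 6, 1, 4], net) == list(range(8))
```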
ISBN (Print): 9780819484086
A context-adaptive variable-length coding (CAVLC) decoder does not know the exact start position of the k-th syntax element in a bitstream until it has finished parsing the (k-1)-th syntax element, which makes parallel CAVLC decoding difficult. Predicting the exact start position of a syntax element before parsing the previous one significantly increases implementation cost. In this paper, we propose a new bitstream structure that allows multiple syntax elements to be accessed concurrently for parallel CAVLC decoding. The method divides a bitstream into N kinds of segments of M bits each and places the syntax elements into these segments according to a proposed rule. A CAVLC decoder can then access N segments simultaneously, reading N syntax elements from a single bitstream and decoding them in parallel. This technique increases CAVLC decoding speed by up to N times. Since the method merely rearranges the generated bitstream, it does not affect coding efficiency. Experimental results show that the proposed algorithm significantly increases decoding speed.
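As a rough illustration of the segment idea (not the paper's actual placement rule, which the abstract only calls "a proposed rule"), the Python sketch below distributes already-encoded syntax elements over N segments in simple round-robin order and then simulates N parsers that each walk only their own segment, so no parser needs to know where the other parsers' elements start. The segment representation as lists (rather than M-bit buffers) and the names pack/parallel_decode are assumptions made for the sketch.

```python
N = 4  # number of segments and parallel parsers

def pack(syntax_elements):
    """Distribute already-encoded syntax elements over N segments.
    Assumption: simple round-robin placement; the paper defines its own rule."""
    segments = [[] for _ in range(N)]
    for k, element_bits in enumerate(syntax_elements):
        segments[k % N].append(element_bits)
    return segments

def parallel_decode(segments):
    """Sequentially simulate the concurrent access pattern: each of the N
    parsers walks only its own segment; results are re-merged in order."""
    streams = [iter(seg) for seg in segments]
    decoded, k = [], 0
    while True:
        try:
            decoded.append(next(streams[k % N]))
        except StopIteration:
            break
        k += 1
    return decoded

elements = ["101", "0", "1100", "11", "001"]   # placeholder variable-length codes
assert parallel_decode(pack(elements)) == elements
```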
ISBN (Print): 9781424425198
The quadratic permutation polynomial (QPP) interleaver has the advantage of contention-free parallel memory access and has been adopted for turbo coding in 3GPP LTE. Conventional implementations of the QPP interleaver, based on look-up tables or on-line calculation, usually result in large circuit area or a higher clock rate for parallel turbo decoding. In this paper, an architecture for the QPP interleaver in parallel turbo decoding is presented that provides parallel memory access without extra storage of interleaving patterns or an increase in clock rate compared with conventional approaches. The proposed design is also reconfigurable for variable interleaver lengths.
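For context, a QPP interleaver maps an input index x to pi(x) = (f1*x + f2*x^2) mod K, and the "on-line calculation" mentioned above usually exploits the fact that consecutive addresses can be generated with modular additions only, since pi(x+1) - pi(x) = f1 + f2*(2x+1) mod K and this increment itself grows by a constant 2*f2 per step. The Python sketch below shows that well-known multiplier-free recursion, not the paper's specific parallel-access architecture; the coefficient pair is illustrative (the assert verifies it against the direct formula).

```python
def qpp_direct(x, f1, f2, K):
    """Reference QPP address: pi(x) = (f1*x + f2*x^2) mod K."""
    return (f1 * x + f2 * x * x) % K

def qpp_recursive(f1, f2, K):
    """Generate pi(0), ..., pi(K-1) with additions only:
    pi(x+1) = pi(x) + g(x) (mod K), g(x+1) = g(x) + 2*f2 (mod K)."""
    pi, g, step = 0, (f1 + f2) % K, (2 * f2) % K
    out = []
    for _ in range(K):
        out.append(pi)
        pi = (pi + g) % K
        g = (g + step) % K
    return out

K, f1, f2 = 40, 3, 10   # illustrative LTE-style parameters for a short block
assert qpp_recursive(f1, f2, K) == [qpp_direct(x, f1, f2, K) for x in range(K)]
```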
Parallel decoding of turbo codes is vital for applications with very high data rates. There are mainly two existing methods for parallel decoding of turbo codes: one is to overlap the sub-blocks (OL); the other is to store the intermediate information of the last iteration at the sub-block boundary (SBI). The overlapping in OL methods slows down decoding, while SBI requires extra memory. In this paper, we present a new method that is essentially a combination of OL and SBI but introduces two new features: (1) instead of storing the full boundary distribution as SBI does, the proposed method stores only the index of the most probable state at the boundary and a reliability metric for initialization; (2) the boundary positions of the sub-blocks move in each iteration of the turbo decoding process. The proposed method outperforms the existing methods and is flexible in design and implementation.
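A small Python sketch of the compressed boundary information described in feature (1): only the index of the most probable boundary state and one reliability value are stored between iterations. The choice of the reliability as the gap to the runner-up metric, and the way the vector is re-expanded at the next iteration, are assumptions made for illustration rather than the authors' exact definitions.

```python
def compress_boundary(state_metrics):
    """Keep only the most likely boundary state and one reliability value,
    taken here (as an assumption) to be the gap to the runner-up metric."""
    order = sorted(range(len(state_metrics)),
                   key=state_metrics.__getitem__, reverse=True)
    best = order[0]
    reliability = state_metrics[order[0]] - state_metrics[order[1]]
    return best, reliability

def expand_boundary(best, reliability, num_states=8):
    """Rebuild an initialization vector for the next iteration: the stored
    state is favored by the stored reliability, all others start equal."""
    metrics = [0.0] * num_states
    metrics[best] = reliability
    return metrics

# Example with the 8 trellis states of the LTE turbo code
best, rel = compress_boundary([0.1, 2.3, -0.4, 0.0, 1.9, -1.2, 0.7, 0.3])
init = expand_boundary(best, rel)   # state 1 favored by ~0.4, others at zero
```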
In order to effectively reduce the decoding latency of the successive cancellation aided Fano (SC-Fano) decoder employed for polarization-adjusted convolutional (PAC) codes, a fast parallel SC-Fano decoding algorithm is designed that leverages multiple individual decoding components. Critical challenges arise from how to appropriately initialize the thresholds of the different SC-Fano decoding components and how these components should cooperate with each other. After addressing these challenges, our design is capable of halving the decoding latency without any noticeable frame error rate (FER) performance degradation, while the overall computational complexity remains below one and a half times that of conventional SC-Fano decoders in the simulated typical configurations.
To reduce the computational decoding delay of turbo codes, we propose a parallel algorithm for maximum a posteriori (MAP) decoders. We divide the whole noisy codeword into sub-blocks and use multiple processors to perform sub-block MAP decoding in parallel. Unlike the previously proposed approach with sub-block overlapping, we utilize the forward and backward variables computed in the previous iteration to provide boundary distributions for each sub-block MAP decoder. Our scheme exhibits asymptotically optimal performance in the sense that the BER is the same as that of the regular turbo decoder.
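The following Python skeleton shows the data flow this abstract describes: the forward (alpha) and backward (beta) boundary distributions produced by each sub-block in one iteration initialize the neighboring sub-blocks in the next iteration, with no sub-block overlap. The BCJR recursion itself is replaced by a stub (sub_block_map), and the uniform initialization and iteration count are illustrative choices, not the authors' settings.

```python
import numpy as np

P, S = 4, 8    # number of sub-blocks / parallel processors, trellis states

def sub_block_map(alpha_first, beta_last):
    """Hypothetical stub for one sub-block's forward/backward (BCJR) pass.
    A real decoder would recurse over the sub-block's trellis sections and
    return the forward distribution at its last step and the backward
    distribution at its first step."""
    return alpha_first.copy(), beta_last.copy()

# Boundary distributions carried over from the previous iteration. They start
# uniform here; a real decoder would pin the outermost boundaries to the known
# initial and terminated trellis states.
alpha_b = np.full((P, S), 1.0 / S)   # forward boundary at each sub-block's start
beta_b  = np.full((P, S), 1.0 / S)   # backward boundary at each sub-block's end

for iteration in range(8):
    new_alpha, new_beta = alpha_b.copy(), beta_b.copy()
    for p in range(P):                       # these P calls run concurrently
        a_last, b_first = sub_block_map(alpha_b[p], beta_b[p])
        if p + 1 < P:
            new_alpha[p + 1] = a_last        # seeds the right neighbor next time
        if p - 1 >= 0:
            new_beta[p - 1] = b_first        # seeds the left neighbor next time
    alpha_b, beta_b = new_alpha, new_beta
```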
This letter proposes a parallel belief propagation algorithm for fast iterative decoding of rate-compatible modulation. In the proposed algorithm, the processing at each symbol node is partitioned into several low-complexity sub-processors. All sub-processors have similar structures and work in parallel without cross connections among them. Numerical and simulation results show that a proper partition yields a significant reduction in computational complexity without decoding performance degradation.
During the check node (CN) update, some elements of the input message vectors are redundant for the output message vectors. In this paper, we therefore select exactly those elements of the input message vectors that actually contribute to the error-correction performance; they constitute the minimum set for the CN update. Adopting the forward-backward (FB) scheme, an adaptive minimum-set min-sum algorithm (AMSA) is proposed to reduce the computational complexity of the FB process. In order to update the output vectors belonging to the same CN concurrently, we present a parallel minimum-set min-sum algorithm (PMSA) with lower memory complexity than the AMSA. Compared with the min-sum algorithms, the two proposed minimum-set-based algorithms introduce no error-performance loss.
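For reference, the baseline min-sum check-node update mentioned in the last sentence can be computed with the classic two-minimum trick, sketched below in Python for scalar (binary-code) messages. The minimum-set selection over message vectors that the AMSA/PMSA perform is not reproduced here.

```python
def min_sum_cn_update(msgs):
    """Baseline min-sum check-node update: for each edge, the outgoing
    magnitude is the smallest input magnitude over all *other* edges and the
    outgoing sign is the product of the other edges' signs.  Tracking the two
    smallest magnitudes suffices to exclude each edge's own contribution."""
    signs = [1 if m >= 0 else -1 for m in msgs]
    mags = [abs(m) for m in msgs]
    total_sign = 1
    for s in signs:
        total_sign *= s
    i1 = min(range(len(mags)), key=mags.__getitem__)        # index of the minimum
    min1 = mags[i1]
    min2 = min(mags[j] for j in range(len(mags)) if j != i1)
    return [total_sign * signs[j] * (min2 if j == i1 else min1)
            for j in range(len(msgs))]

# Example: check node of degree 4
print(min_sum_cn_update([1.5, -0.3, 2.0, -0.8]))   # -> [0.3, -0.8, 0.3, -0.3]
```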
This letter presents an optimized Viterbi decoder for convolutional codes on a graphics processing unit (GPU) for software-defined radio (SDR) platforms. Before the forward process, channel messages are interleaved with coalesced global memory accesses, and the interleaved messages are represented with 4 bits to improve shared-memory efficiency. Moreover, we optimize the on-chip memory allocation of the forward process to accelerate instruction execution. Excluding the data-transfer latency between host and device, the proposed Viterbi decoder achieves 22.2 and 76.5 Gb/s throughput on a Tesla V100 and an RTX 4090, respectively. Compared with related works, the throughput speedups achieved by the proposed decoder range from 2.06x to 2.93x.
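The 4-bit message representation mentioned here can be illustrated with a GPU-agnostic NumPy sketch: LLRs are clamped to the 4-bit two's-complement range and two values are packed per byte, halving the buffer footprint. The scaling, clamping range, and packing layout are plausible assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def pack_llrs_4bit(llrs, scale=1.0):
    """Clamp scaled LLRs to the 4-bit two's-complement range [-8, 7] and pack
    two values per byte."""
    q = np.clip(np.round(np.asarray(llrs) * scale), -8, 7).astype(np.int8)
    nibbles = (q & 0x0F).astype(np.uint8)
    if nibbles.size % 2:                       # pad to an even number of values
        nibbles = np.append(nibbles, np.uint8(0))
    return nibbles[0::2] | (nibbles[1::2] << 4)

def unpack_llrs_4bit(packed):
    """Recover the signed 4-bit values (including any padding nibble)."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    lo[lo > 7] -= 16                           # restore two's-complement sign
    hi[hi > 7] -= 16
    out = np.empty(2 * packed.size, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return out

packed = pack_llrs_4bit([3.7, -5.2, 0.4, -9.9])
assert list(unpack_llrs_4bit(packed)) == [4, -5, 0, -8]
```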