ISBN: (print) 0780393333
The Koetter-Vardy algorithm is an algebraic soft-decision decoding algorithm for Reed-Solomon codes. Software implementations of the Koetter-Vardy algorithm are considered as part of a redecoding architecture that augments a hardware hard-decision decoder with soft-decision decoding software on an embedded processor. In this paper we investigate the implementation of the interpolation step of the Koetter-Vardy algorithm on SIMD processor architectures. A parallelization of the algorithm is given using the K'th order Horner's rule for parallel polynomial evaluation. The SIMD algorithm runs 2.5 to 4 times faster than a serial implementation on a DSP processor. To gain further speedup, we propose a merged-SIMD architecture that calculates the Hasse derivative in parallel with the polynomial updates.
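The K'th order Horner's rule splits a polynomial into K interleaved sub-polynomials in x^K, each of which can be evaluated independently (for example, one per SIMD lane) before a short recombination step. The following is a minimal sketch of that idea, not the paper's implementation: it assumes coefficients in a small prime field GF(p) for readability (Koetter-Vardy interpolation actually works over the code's field GF(2^m)), and the function names are hypothetical.

```python
def horner(coeffs, x, p):
    """Evaluate sum(coeffs[i] * x**i) mod p with the ordinary serial Horner rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def horner_kth_order(coeffs, x, k, p):
    """K'th order Horner rule: p(x) = sum_j x**j * q_j(x**k), where q_j collects
    every k-th coefficient starting at index j (assumes a prime modulus p)."""
    y = pow(x, k, p)
    # k independent Horner recurrences in y = x**k: these are the parallel lanes.
    lanes = [horner(coeffs[j::k], y, p) for j in range(k)]
    # Recombine the lane results: sum_j x**j * lanes[j], itself a short Horner pass.
    return horner(lanes, x, p)
```

For any k from 1 up to the number of coefficients, `horner_kth_order(coeffs, x, k, p)` returns the same value as the serial `horner(coeffs, x, p)`; the serial dependency chain shrinks to roughly len(coeffs)/k steps per lane plus k recombination steps.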
ISBN: (print) 9781538604465
The paper presents the results of design space explorations for the implementation of the Smith-Waterman (S-W) algorithm for DNA and protein sequence alignment. Both the design space exploration studies and the FPGA implementations are obtained by developing a dynamic dataflow program implementing the algorithm and by direct high-level synthesis (HLS) to FPGA HDL. The main feature of the resulting implementation is a low-latency, pipelinable multistage processing element (PE), which substantially decreases resource utilization and increases computation throughput compared with state-of-the-art solutions. The implementation is also fully scalable and can be efficiently reconfigured according to the DNA sequence sizes and the performance requirements of the system architecture. It scales efficiently up to 250 MHz, achieving 14,746 alignments/s with a single S-W core of 4 PEs, and up to 31.8 million alignments/min with 36 S-W cores on the same FPGA, for sequences of 160 × 100 nucleotides.
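To show why the S-W recurrence maps well onto a chain of pipelined PEs, the sketch below fills the scoring matrix in anti-diagonal (wavefront) order: every cell on one anti-diagonal depends only on the two previous anti-diagonals, so a hardware array could compute them in the same cycle. This is a minimal illustration, assuming a linear gap penalty and the hypothetical scores match=2, mismatch=-1, gap=-1; it is not the paper's dataflow program or PE design.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of sequences a and b (linear gap penalty assumed).
    Cells are visited by anti-diagonal d = i + j to expose the PE-level parallelism."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]  # row/column 0 stay 0 (local alignment)
    best = 0
    for d in range(2, n + m + 1):
        # All cells (i, d - i) on this anti-diagonal are mutually independent.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          h[i - 1][j] + gap,    # gap in b
                          h[i][j - 1] + gap)    # gap in a
            best = max(best, h[i][j])
    return best
```

For example, `smith_waterman("TTACGTTT", "ACG")` scores the exact local match of "ACG" as 3 × 2 = 6.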
This paper presents a systematic high-speed VLSI implementation of the discrete wavelet transform (DWT) based on hardware-efficient parallel FIR filter structures. A high-speed 2-D DWT with computation time as low as N^2/12 can easily be achieved for an N × N image with a controlled increase in hardware cost. Compared with recently published 2-D DWT architectures with computation times of N^2/3 and 2N^2/3, the proposed designs can also save a large number of multipliers and/or storage elements. The approach can also implement 2-D DWTs traditionally suited to lifting- or flipping-based designs, such as the (9,7) and (6,10) DWTs. The proposed approach improves the throughput rate by a factor of 4, while the hardware cost increases by a factor of about 3. Furthermore, the proposed designs have very simple control signals, regular structures, and 100% hardware utilization for continuous images.
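Each DWT analysis stage filters the input and then discards every other output. A polyphase decomposition splits the filter and the input into even and odd phases, so that two half-rate sub-filters compute only the outputs that are kept; this is the basic building block behind parallel FIR structures. The sketch below is a minimal 1-D illustration under that assumption (finite-length full convolution, hypothetical function names), not the paper's 2-D architecture.

```python
def convolve(x, h):
    """Full linear convolution of two non-empty finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def filter_then_downsample(x, h):
    """Reference: compute every filter output, then keep the even-indexed ones."""
    return convolve(list(x), list(h))[::2]


def polyphase_filter_downsample(x, h):
    """Same outputs from two half-rate branches:
    y[n] = (h_even * x_even)[n] + (h_odd * x_odd)[n], with x_odd[m] = x[2m - 1]."""
    x, h = list(x), list(h)
    h_even, h_odd = h[0::2], h[1::2]
    x_even = x[0::2]          # x[0], x[2], x[4], ...
    x_odd = [0.0] + x[1::2]   # x[-1] = 0, x[1], x[3], ...
    n_out = (len(x) + len(h)) // 2  # = ceil((len(x) + len(h) - 1) / 2)
    out = [0.0] * n_out
    branches = [convolve(x_even, h_even)]
    if h_odd:
        branches.append(convolve(x_odd, h_odd))
    for branch in branches:
        for i, v in enumerate(branch[:n_out]):
            out[i] += v
    return out
```

Both branches run at half the input rate and can operate concurrently; replicating this decomposition further is one way to trade hardware for throughput.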
ISBN: (print) 0780393333
The purpose of this work is to show how important adequate generation of the excitation signal is for the performance of bandwidth extension algorithms for speech signals. Two previously proposed methods of obtaining the excitation signal are analyzed and, based on this analysis, a new method is proposed. The influence of each method on the quality of the reconstructed wideband speech signal is evaluated with quantitative speech-quality measures.
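The abstract does not name the excitation methods it compares. As background only, one widely used way to obtain a wideband excitation is spectral folding: upsampling the narrowband excitation by zero insertion, which mirrors its 0 to fs/2 band into the new upper band. The minimal sketch below illustrates that generic technique (not necessarily one of the methods analyzed in the paper); the function names are hypothetical.

```python
import cmath


def spectral_folding(narrowband_excitation):
    """Upsample by 2 with zero insertion and no interpolation filter.
    The output DFT repeats the input DFT twice, so the original band is
    mirrored into the upper half of the new, doubled bandwidth."""
    wideband = []
    for sample in narrowband_excitation:
        wideband.extend([sample, 0.0])
    return wideband


def dft(x):
    """Naive DFT, used here only to check the mirroring property."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

For an input of length N, the 2N-point DFT of the folded signal satisfies Y[k] = X[k mod N], which is exactly the spectral repetition that fills the upper band.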
ISBN: (print) 9781424403820
This paper details the design of a new high-speed pipelined elliptic curve cryptography (ECC) application-specific instruction set processor (ASIP) using field-programmable gate array (FPGA) technology. A six-stage pipeline is applied in the design, and pipeline stalls are avoided via instruction reordering and data forwarding. Three complex instructions are introduced to reduce latency by reducing the overall number of instructions. The new processor improves on previously reported designs in terms of throughput, latency and area. Its higher clock frequency and low latency lead to the fastest point multiplication time reported in the literature. An FPGA implementation over GF(2^163) is shown, which achieves a point multiplication time of 36.77 microseconds at 77.01 MHz on a Xilinx Virtex-E device, over 50% faster than the best figure previously reported.
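Point multiplication over GF(2^163) is built from many field multiplications, which is what an ECC datapath spends most of its time on. The sketch below shows only that field operation: MSB-first shift-and-add multiplication with interleaved reduction, with elements stored as integers whose bit i is the coefficient of x^i. It assumes the NIST B-163/K-163 reduction polynomial x^163 + x^7 + x^6 + x^3 + 1; the abstract does not state which irreducible polynomial the ASIP uses, and the names are hypothetical.

```python
M = 163
# Assumed reduction polynomial f(x) = x^163 + x^7 + x^6 + x^3 + 1 (NIST B-163/K-163).
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf2m_mul(a, b, m=M, f=F):
    """Multiply a, b in GF(2^m) (both < 2^m) with MSB-first shift-and-add."""
    result = 0
    for i in reversed(range(m)):
        result <<= 1                 # result *= x
        if (result >> m) & 1:        # degree reached m: subtract (XOR) f
            result ^= f
        if (b >> i) & 1:             # add a when bit i of b is set
            result ^= a
    return result
```

As a quick check, `gf2m_mul(1 << 162, 2)` computes x^162 · x = x^163, which reduces to x^7 + x^6 + x^3 + 1, i.e. `0xC9`. A hardware multiplier typically processes several bits of `b` per cycle instead of one.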
It is noted that signal processing designs for real-time large-scale systems are increasingly confronted with two conflicting objectives: the traditional objective of optimal design in low signal-to-noise ratio environments, and the need for simplicity of implementation and speed of computation. The inclusion of high throughput and efficient hardware utilization as constraints on digital filter designs is considered. In particular, implementation of the design via an array processor is introduced, so that fast processing becomes synonymous with high throughput and efficient implementation on such a device. Using an array interpretation of the FFT structure, the retention of this highly efficient structure in a general design setting is demonstrated. For a typical signal extraction design, a constrained least-squares minimization is introduced to determine optimal enhancing filters with highly efficient array implementations.
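The "array interpretation" of the FFT rests on its regular stage structure: each of the log2(N) stages consists of N/2 butterflies that do not depend on one another, so an array of processing elements can execute a whole stage at once. The minimal sketch below is a standard iterative radix-2 decimation-in-time FFT written stage by stage to make that structure visible; it does not cover the paper's constrained least-squares filter design, and the function name is hypothetical.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Bit-reversal permutation of the input.
    a = [0j] * n
    for i in range(n):
        r = 0
        for b in range(bits):
            if (i >> b) & 1:
                r |= 1 << (bits - 1 - b)
        a[r] = complex(x[i])
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        # The n/2 butterflies of this stage are independent of each other,
        # so an array of processing elements could run them concurrently.
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```

For example, `fft_radix2([1, 1, 1, 1])` returns (up to rounding) `[4, 0, 0, 0]`.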
ISBN: (print) 0780377958
In this paper, receiver synthesis for nonlinearly amplified orthogonal frequency division multiplexing (OFDM) signals is presented. An optimal maximum-likelihood (ML) receiver is proposed and its computational complexity is discussed. Further, a sub-optimal receiver suitable for OFDM signals with a large number of sub-carriers and high-order constellations is presented. The performance of the optimal and sub-optimal receivers for nonlinearly amplified m-QAM-OFDM signals is studied by means of simulation.
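The complexity problem of the optimal ML receiver can be made concrete with a brute-force sketch: the receiver must pass every candidate symbol vector through the known transmitter nonlinearity and compare it with the received samples, so the search grows as M^N for M-point constellations on N sub-carriers. The sketch assumes a memoryless soft-limiter (clipping) amplifier, AWGN, QPSK on every sub-carrier and no cyclic prefix; the abstract does not specify the paper's nonlinearity model, and all names are hypothetical.

```python
import cmath
import itertools

QPSK = [complex(i, q) for i in (-1, 1) for q in (-1, 1)]


def ofdm_modulate(symbols):
    """IDFT of one OFDM symbol (cyclic prefix omitted for brevity)."""
    n = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * t / n) for k, s in enumerate(symbols)) / n
            for t in range(n)]


def soft_limiter(x, a_max):
    """Assumed amplifier model: clip the envelope at a_max, keep the phase."""
    return [v if abs(v) <= a_max else a_max * v / abs(v) for v in x]


def ml_detect(received, n_sub, a_max, constellation=QPSK):
    """Exhaustive ML search under AWGN: len(constellation) ** n_sub candidates."""
    best, best_metric = None, float("inf")
    for cand in itertools.product(constellation, repeat=n_sub):
        y = soft_limiter(ofdm_modulate(cand), a_max)
        metric = sum(abs(r - v) ** 2 for r, v in zip(received, y))
        if metric < best_metric:
            best, best_metric = list(cand), metric
    return best, best_metric
```

With only 4 QPSK sub-carriers the search already visits 4^4 = 256 candidates; at 64 sub-carriers and 16-QAM it would be 16^64, which is why a sub-optimal receiver is needed in that regime.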
ISBN: (print) 0780393333
We studied the efficient implementation of a motion estimation algorithm for H.264/AVC on the TMS320C64x, a VLIW (Very Long Instruction Word) SIMD (Single Instruction Multiple Data) digital signal processor. H.264 motion estimation algorithms demand many arithmetic operations, especially because of variable block size optimization. The SAD (Sum of Absolute Differences) reuse method is chosen not only to reduce computation but also to exploit its regular algorithmic structure, which is essential for efficient implementation on parallel and pipelined processors. We applied several techniques: loop length increase for efficient software pipelining, multiblock SAD computation to reduce memory access overhead, block processing to minimize cache misses, and improved quarter-pixel processing. The implementation results show that real-time motion estimation (ME) for D1-size (720×480) video is possible on a 720 MHz TMS320C6416 digital signal processor.
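SAD reuse means that, for each candidate motion vector, only the sixteen 4x4 SADs of a 16x16 macroblock are computed from pixels; the SADs of larger partitions are then obtained by additions alone. The minimal sketch below illustrates this for the 8x8, 16x8, 8x16 and 16x16 partitions (8x4 and 4x8 follow the same pattern). It assumes frames stored as lists of rows and integer-pixel motion vectors that stay inside the frame; the function names are hypothetical, not the paper's code.

```python
def sad_4x4_grid(cur, ref, bx, by, dx, dy):
    """4x4 SADs of the 16x16 macroblock at (bx, by) for motion vector (dx, dy).
    grid[r][c] is the SAD of the 4x4 sub-block in row r, column c."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            diff = abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
            grid[y // 4][x // 4] += diff
    return grid


def variable_block_sads(grid):
    """Build larger-partition SADs from the 4x4 SADs using additions only.
    Partition names are width x height, as in H.264."""
    s8x8 = [[grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
             for c in range(2)] for r in range(2)]
    return {
        "4x4": grid,
        "8x8": s8x8,
        "16x8": [s8x8[0][0] + s8x8[0][1], s8x8[1][0] + s8x8[1][1]],  # top, bottom
        "8x16": [s8x8[0][0] + s8x8[1][0], s8x8[0][1] + s8x8[1][1]],  # left, right
        "16x16": sum(sum(row) for row in s8x8),
    }
```

The pixel-level loop runs once per candidate vector, and every partition size is evaluated from its 16 partial sums, which is the computation saving the reuse method relies on.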
ISBN: (print) 9781538604465
In this paper, we present a new software tool, called HTGS Model-based Engine (HMBE), for the design and implementation of multicore signal processing applications. HMBE complements HTGS (Hybrid Task Graph Scheduler), a recently introduced software tool for implementing scalable workflows for high-performance computing applications. HMBE integrates the advanced design optimization techniques provided in HTGS with model-based approaches founded on dataflow principles. This integration contributes to (a) making the application of HTGS more systematic and less time-consuming, (b) combining additional dataflow-based optimization capabilities with HTGS optimizations, and (c) automating significant parts of the HTGS-based design process. We present HMBE with an emphasis on the novel dynamic scheduling techniques developed as part of the tool, and demonstrate its utility through a case study involving an image stitching application for large-scale microscopy images.
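As background on the dataflow principles mentioned above, the sketch below shows the simplest form of dynamic dataflow scheduling: actors communicate through FIFO edges, and the scheduler repeatedly fires whichever actor has a token on every input edge. It is a generic illustration in Python with hypothetical class and function names; it is not the HTGS API (HTGS is a separate C++ library) and not HMBE's scheduling technique.

```python
from collections import deque


class Actor:
    """Hypothetical actor: when fired, consumes one token from each input edge
    and pushes the result of fn onto every output edge."""

    def __init__(self, name, fn, inputs, outputs):
        if not inputs:
            raise ValueError("this sketch seeds sources via pre-filled input edges")
        self.name, self.fn, self.inputs, self.outputs = name, fn, inputs, outputs

    def can_fire(self):
        return all(len(q) > 0 for q in self.inputs)

    def fire(self):
        result = self.fn(*[q.popleft() for q in self.inputs])
        for q in self.outputs:
            q.append(result)


def run_dynamic(actors):
    """Fire enabled actors until none can fire; return the firing trace."""
    trace, progress = [], True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                trace.append(actor.name)
                progress = True
    return trace
```

A usage example with a three-stage pipeline fed by a pre-filled edge: with `tiles = deque([1, 2, 3])`, actors `scale` (x * 10), `offset` (x + 1) and `sink` (identity) chained through deques, `run_dynamic` fires nine times and leaves `[11, 21, 31]` on the sink's output edge.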
ISBN: (print) 0780393333
Presented in this paper is a low-complexity iris identification architecture built upon an enhanced periodicity transform, referred to as the prime subspace periodicity transform (PSPT). The proposed PSPT achieves efficient computation by partitioning periodic subspaces into hierarchical prime subspaces. Data decomposition at prime subspaces can be implemented simply by exploiting the redundancy in correlation computation. The proposed PSPT establishes a theoretical foundation for our work on integrated biometric systems for identity authentication. A PSPT-based iris identification architecture is developed that achieves a 32.1% to 56.2% reduction in computational complexity. Experimental results show that it is an efficient solution for reliable and accurate iris identification. The proposed PSPT algorithm, in combination with architecture optimizations, addresses the challenges of single-chip implementation of biometric systems.
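The abstract does not give enough detail to reproduce the PSPT itself. As background, the core operation of a periodicity transform is the orthogonal projection of a signal onto the subspace of p-periodic sequences, which reduces to averaging all samples that share the same phase i mod p. The minimal sketch below shows that projection (hypothetical function name). It also illustrates the redundancy between subspaces: when p divides q and q divides the signal length, the p-periodic subspace lies inside the q-periodic one, so projecting onto period q and then onto period p gives the same result as projecting onto period p directly.

```python
def project_periodic(x, p):
    """Orthogonal projection of x onto the p-periodic sequences of the same length:
    each output sample is the mean of all samples of x with the same phase i mod p."""
    n = len(x)
    if not 1 <= p <= n:
        raise ValueError("period must be between 1 and len(x)")
    means = []
    for phase in range(p):
        samples = x[phase::p]
        means.append(sum(samples) / len(samples))
    return [means[i % p] for i in range(n)]
```

For instance, a signal that repeats `[1, 2, 3]` is returned unchanged by `project_periodic(x, 3)`, while its projection onto period 1 is the constant sequence of its mean.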