ISBN: (print) 9780769520858
Mobile video processing as defined in standards like MPEG-4 and H.263 contains a number of time-consuming computations that cannot be efficiently executed on current hardware architectures. The authors recently introduced a reconfigurable SoC platform that permits a low-power, high-throughput and flexible implementation of the motion estimation and DCT algorithms. The computations are done using domain-specific reconfigurable arrays that have demonstrated up to 75% reduction in power consumption when compared to a generic FPGA architecture, which makes them suitable for portable devices. This paper presents and compares different configurations of the arrays for efficiently implementing DCT and motion estimation algorithms. A number of algorithms are mapped onto the various reconfigurable fabrics, demonstrating the flexibility of the new reconfigurable SoC architecture and its ability to support a number of implementations having different performance characteristics.
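To illustrate why motion estimation is computationally heavy, here is a minimal software sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost. It is not the authors' array mapping; the function names `sad` and `full_search`, the frame layout (lists of rows), and the parameters are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy). Hypothetical helper."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search):
    """Try every displacement within +/- `search` pixels and return
    (best_cost, dx, dy), or None if no candidate lies inside the frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidate blocks that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Each block costs up to (2·search + 1)² SAD evaluations of n² operations each, which is the kind of regular, repetitive workload that suits a dedicated array.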
The proceedings contain 56 papers from the SPIE conference on Advanced Signal Processing Algorithms, Architectures, and Implementations XI. The topics discussed include: modulation frequency and efficient audio coding; application of wavelet- and wavelet-packet-transform to human skin data; Wigner distribution and pulse propagation; time-frequency analysis using sidelobe apodization; spectral phase algorithm for detecting and estimating pitch; and minimum entropy approach to denoising time-frequency distributions.
ISBN: (print) 0819450782
Inter-symbol interference (ISI) and co-channel interference (CCI) are two major sources of signal impairment in mobile communications. Space-time adaptive processing (STAP) systems have been shown to suppress both ISI and CCI effectively, leading to increased capacity and improved quality of service. High complexity and slow convergence, however, are often hurdles in the practical implementation of STAP systems. Several subband array implementations have been proposed for STAP over the past few years. These methods aim to provide optimal or sub-optimal steady-state performance with reduced implementation complexity and improved convergence. The purpose of this paper is to investigate the steady-state performance of subband arrays with different decimation rates and to derive analytical expressions for the minimum mean square error (MMSE). Discrete Fourier transform (DFT) based subband arrays and both unconstrained and constrained weight adaptations are considered.
This paper presents single-chip FPGA implementations of the Advanced Encryption Standard (AES) algorithm, Rijndael. In particular, the designs utilise look-up tables to implement the entire Rijndael Round function. A comparison is provided between these designs and similar existing implementations. Hardware implementations of encryption algorithms prove much faster than equivalent software implementations, and since data often must be encrypted in real time, speed is very important. Field Programmable Gate Arrays (FPGAs) are well suited to encryption implementations due to their flexibility and an architecture that can be exploited to accommodate typical encryption transformations. In this paper, a Look-Up Table (LUT) methodology is introduced where complex and slow operations are replaced by simple LUTs. A LUT-based fully pipelined Rijndael implementation is described which has a pre-placement performance of 12 Gbits/sec. This is 1.2 times faster than an alternative design in which look-up tables are utilised to implement only one of the Round function transformations, and 6 times faster than other previous single-chip implementations. Iterative Rijndael implementations based on the LUT design approach are also discussed and prove faster than typical iterative implementations.
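The idea of replacing a slow operation with a table can be sketched in software on one small piece of Rijndael: multiplication by {02} in GF(2^8), which MixColumns uses. This is only an illustration of the LUT principle, not the paper's design (which tabulates the entire Round function); the names `xtime`, `XTIME_LUT`, and `mul_by_3` are assumptions.

```python
def xtime(b):
    """Multiply byte b by {02} in GF(2^8), reducing modulo the AES
    polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


# Precompute all 256 results once; afterwards the shift-and-reduce
# logic is replaced by a single table read.
XTIME_LUT = [xtime(b) for b in range(256)]


def mul_by_3(b):
    """Multiply by {03} = {02} + {01}: one table read and one XOR."""
    return XTIME_LUT[b] ^ b
```

The same trade applies in hardware: memory is spent on a precomputed table so that the critical path contains a read instead of combinational logic.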
This document presents a methodology based on a signal algebra operator theoretic approach for the mathematical formulation of signal processing algorithms, together with efficient systematic procedures for mapping these algorithms to target hardware computing structures through iconic and functional programming techniques and automatic core generation efforts. An algorithm development and implementation environment is described in this work as a central theme in studying DSP computing methods. This environment is an aggregate of the following items: a PC workstation platform, MATLAB® tools, digital signal processing (DSP) microprocessor units, and field programmable gate array (FPGA) units. Special emphasis is given to the concepts of modularity and scalability during a hardware implementation. A main goal of this ongoing work is to establish formal links amongst the elements of the environment in order to help reduce the algorithm development and implementation timeline. The results presented here center on the formulation of a methodology for computing methods as an operator theoretic approach to the digital processing of signals, and on the study of the computing hardware structure and overall architecture of floating point DSP microprocessor units and FPGAs for the implementation of complex fast Fourier transform (FFT) cores. The new methodology presented in this work was successfully utilized for the generation of signal processing cores for DSP and FPGA hardware units in a signal algebra setting, with advantages such as improved latency and modular and reconfigurable features, which make the developed cores desirable for the implementation of more advanced digital signal processing applications.
ISBN: (print) 0819445584
This paper presents a general FIR filter architecture utilizing truncated tree multipliers for computation. The average error, maximum error, and variance of error due to truncation are derived for the proposed architecture. A novel technique that reduces the average error of the filter is presented, along with equations for computing the signal-to-noise ratio of the truncation error. A software tool written in Java is described that automatically generates structural VHDL models for specific filters based on this architecture, given parameters such as the number of taps, operand lengths, number of multipliers, and number of truncated columns. We show that a 22.5% reduction in area can be achieved for a 24-tap filter with 16-bit operands, 4 parallel multipliers, and 12 truncated columns. For this implementation, the average reduction error is only 9.18 × 10^-5 ulps, and the reduction error SNR is only 2.4 dB less than the roundoff SNR of an equivalent filter without truncation.
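As a rough illustration of what truncating multiplier columns means, the sketch below forms an unsigned product while omitting every partial-product bit in the k least significant columns. It assumes plain truncation with no correction constant and no rounding, so it does not reproduce the paper's error-reduction technique or its reported figures; the names `truncated_multiply` and `max_truncation_error` are hypothetical.

```python
def truncated_multiply(a, b, n, k):
    """Unsigned n-bit by n-bit multiply that skips all partial-product
    bits a_i * b_j whose column index i + j is below k."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= k and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    return total


def max_truncation_error(k):
    """Largest possible value of the omitted bits, valid for k <= n:
    column c holds c + 1 partial-product bits, each of weight 2**c."""
    return sum((c + 1) << c for c in range(k))
```

With k = 0 the function returns the exact product. For larger k the result can only undershoot, by at most `max_truncation_error(k)`, which is why practical designs add a compensating constant.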
Architectures for low-density parity-check (LDPC) decoders are discussed, with methods to reduce their complexity. Serial implementations similar to traditional microprocessor datapaths are compared against implementations with multiple processing elements that exploit the inherent parallelism in the decoding algorithm. Several classes of LDPC codes, such as those based on irregular random graphs and on geometric properties of finite fields, are evaluated in terms of their suitability for VLSI implementation and their performance as measured by bit-error rate. Efficient realizations of low-density parity-check decoders under area, power, and throughput constraints are of particular interest in the design of communications receivers.
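The parallelism mentioned above comes from the fact that every parity check and every bit can be updated independently within an iteration. A minimal sketch, assuming simple hard-decision bit flipping (the abstract does not say which decoding algorithm the paper uses) and a toy 4 × 6 parity-check matrix `H_TOY` invented for this example:

```python
# Toy parity-check matrix (an assumption for illustration): the bits are the
# six edges of a complete graph on four vertices, and each row is one vertex.
# Every bit takes part in exactly two checks.
H_TOY = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]


def bit_flip_decode(H, word, max_iters=10):
    """Hard-decision bit flipping: repeatedly flip the bits that take part
    in the largest number of failed parity checks."""
    bits = list(word)
    for _ in range(max_iters):
        # Each check is independent of the others (parallel in hardware).
        syndrome = [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]
        if not any(syndrome):
            break  # all parity checks satisfied
        # Each bit counts its failed checks, again independently per bit.
        fails = [sum(s for s, row in zip(syndrome, H) if row[j])
                 for j in range(len(bits))]
        worst = max(fails)
        bits = [b ^ 1 if f == worst else b for b, f in zip(bits, fails)]
    return bits
```

A serial datapath would walk through the two list comprehensions one element at a time, whereas a design with multiple processing elements would evaluate many checks and bits at once.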