It is noted that signal processing designs for real-time large-scale systems are increasingly confronted with two conflicting objectives. The traditional objective of optimal design in low signal-to-noise ratio environments conflicts with the need for simplicity in implementation and speed of computation. The inclusion of high throughput and efficient hardware utilization as constraints on digital filter designs is considered. In particular, implementation of the design via an array processor is introduced. The concept of fast processing becomes synonymous with high throughput and efficient implementation on such a device. Using an array interpretation of the FFT structure, the retention of this highly efficient structure in a general design setting is demonstrated. For a typical signal extraction design, a constrained least-squares minimization is introduced to determine optimal enhancing filters with highly efficient array implementation.
ISBN: 0780377958 (print)
In this paper, receiver synthesis for nonlinearly amplified orthogonal frequency division multiplexing (OFDM) signals is presented. An optimal maximum-likelihood (ML) receiver is proposed and its computational complexity is discussed. Further, a sub-optimal receiver suitable for OFDM signals with a large number of sub-carriers and high-order constellations is presented. The performance of the optimal and sub-optimal receivers for nonlinearly amplified m-QAM-OFDM signals is studied by means of simulation.
ISBN: 0780393333 (print)
We studied the efficient implementation of a motion estimation (ME) algorithm for H.264/AVC on the TMS320C64x, a VLIW (Very Long Instruction Word) SIMD (Single Instruction Multiple Data) digital signal processor. H.264 motion estimation algorithms demand many arithmetic operations, especially because of the variable block size optimization. The SAD (Sum of Absolute Differences) reuse method is chosen not only to reduce the computation but also to exploit the regular algorithmic structure, which is essential for efficient implementation on parallel and pipelined processors. We applied several techniques: loop length increase for efficient software pipelining, multiblock SAD computation for reducing memory access overhead, block processing for cache miss minimization, and improved quarter-pixel processing. The implementation results show that a real-time implementation of ME for D1 size (720×480) video is possible using a 720 MHz TMS320C6416 digital signal processor.
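The SAD reuse idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' C64x implementation: it only shows the general principle that, for one candidate motion vector, the SADs of the 4x4 sub-blocks of a 16x16 macroblock can be computed once and then summed to obtain the SADs of the larger partitions. The function names (`sad_4x4_grid`, `combine`, `variable_block_sads`) and the plain-list block layout are assumptions made for illustration.

```python
def sad_4x4_grid(cur, ref, size=16):
    """Return a (size/4) x (size/4) grid holding the SAD of every 4x4 sub-block.

    cur and ref are size x size lists of pixel values: the current macroblock
    and the reference block at one candidate motion vector.
    """
    n = size // 4
    grid = [[0] * n for _ in range(n)]
    for y in range(size):
        for x in range(size):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def combine(grid, by, bx, h, w):
    """Sum the 4x4 SADs covering h x w sub-blocks whose top-left is (by, bx)."""
    return sum(grid[y][x] for y in range(by, by + h) for x in range(bx, bx + w))


def variable_block_sads(grid):
    """Derive SADs of larger partitions (named width x height) from 4x4 SADs.

    No pixel is read again here: every value is a sum of grid entries.
    """
    return {
        "16x16": combine(grid, 0, 0, 4, 4),
        "16x8": [combine(grid, by, 0, 2, 4) for by in (0, 2)],
        "8x16": [combine(grid, 0, bx, 4, 2) for bx in (0, 2)],
        "8x8": [combine(grid, by, bx, 2, 2) for by in (0, 2) for bx in (0, 2)],
    }
```

The reuse is only valid when all partitions are evaluated at the same candidate motion vector, which is why the 4x4 grid is computed once per candidate. The fixed, regular summation pattern is what makes the method a natural fit for software pipelining and SIMD execution.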
ISBN: 9781538604465 (print)
In this paper, we present a new software tool, called HTGS Model-based Engine (HMBE), for the design and implementation of multicore signal processing applications. HMBE provides complementary capabilities to HTGS (Hybrid Task Graph Scheduler), which is a recently introduced software tool for implementing scalable workflows for high performance computing applications. HMBE integrates advanced design optimization techniques provided in HTGS with model-based approaches that are founded on dataflow principles. Such integration contributes to (a) making the application of HTGS more systematic and less time consuming, (b) incorporating additional dataflow-based optimization capabilities with HTGS optimizations, and (c) automating significant parts of the HTGS-based design process. In this paper, we present HMBE with an emphasis on novel dynamic scheduling techniques that are developed as part of the tool. We demonstrate the utility of HMBE through a case study involving an image stitching application for large-scale microscopy images.
ISBN: 9781424403820 (print)
We present two case studies of different architectures for an H.264 video decoder. The objective of these case studies is to show a design methodology that can maximize the flexibility of the video decoder. First, the H.264 decoder is designed based on a configurable processor. The configurable processor is used to complement the existing functional units with instruction extensions for the H.264 hardware kernel. Second, we profile the H.264 application to capture the amount of data traffic among modules. We use this information to guide the placement of H.264 hardware modules in the dataflow architecture. A simulated-annealing-based placement algorithm produces the final placement, aiming to optimize the communication costs between the modules in a dataflow architecture. With both design methodologies, emerging embedded applications requiring several GOPS to meet real-time constraints can be drafted within a reasonable amount of design time with maximum design flexibility.
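As a rough illustration of traffic-guided placement by simulated annealing, the sketch below swaps modules between grid slots and accepts or rejects each swap with the usual Metropolis rule. It is a generic textbook formulation, not the algorithm from the paper. The cost model (traffic volume times Manhattan distance), the cooling schedule, and all names and parameters are assumptions chosen for illustration.

```python
import math
import random


def comm_cost(placement, traffic):
    """Sum of traffic volume times Manhattan distance over communicating pairs."""
    total = 0
    for (a, b), volume in traffic.items():
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += volume * (abs(xa - xb) + abs(ya - yb))
    return total


def anneal_placement(modules, slots, traffic,
                     t0=10.0, cooling=0.995, steps=5000, seed=0):
    """Search for a low-cost assignment of modules to slots.

    Assumes len(modules) == len(slots) and at least two modules.
    Returns the best placement seen and its cost.
    """
    rng = random.Random(seed)
    placement = dict(zip(modules, slots))
    cost = comm_cost(placement, traffic)
    best, best_cost = dict(placement), cost
    t = t0
    for _ in range(steps):
        # Propose a move: swap the slots of two randomly chosen modules.
        a, b = rng.sample(modules, 2)
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = comm_cost(placement, traffic)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with prob. exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            # Rejected: undo the swap.
            placement[a], placement[b] = placement[b], placement[a]
        t *= cooling  # geometric cooling schedule
    return best, best_cost
```

In a flow like the one described, the `traffic` dictionary would come from profiling the decoder, with keys being pairs of module names and values being the measured data volume between them.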
ISBN: 0780393333 (print)
Presented in this paper is a low-complexity iris identification architecture built upon an enhanced periodicity transform, referred to as the prime subspace periodicity transform (PSPT). The proposed PSPT achieves efficient computation by partitioning periodic subspaces into hierarchical prime subspaces. Data decomposition at prime subspaces can be implemented in a simple manner by exploiting the redundancy in correlation computation. The proposed PSPT establishes a theoretical foundation for our work in developing integrated biometric systems for identity authentication. A PSPT-based iris identification architecture is developed that achieves a 32.1% to 56.2% reduction in computational complexity. Experimental results demonstrate an efficient solution for reliable and accurate iris identification. The proposed PSPT algorithm, in combination with architecture optimizations, addresses the challenges in single-chip implementation of biometric systems.
ISBN: 9781424403820 (print)
In this paper, a reduced-complexity, scalable implementation of an LDPC decoder is presented. The decoder architecture in this paper is an improved version of [1, 2]. The new architecture makes the implementation of an LDPC decoder supporting multiple code rates, multiple block sizes, and multiple standards very straightforward. As an example, we implemented a parameterized decoder that supports the LDPC code in the IEEE 802.16e standard, which requires code rates of 1/2, 2/3 and 3/4, with block sizes varying from 576 to 2304. The decoder is synthesized with Texas Instruments' 90 nm ASIC process technology. With a target operating frequency of 100 MHz and 15 decoding iterations, the maximum data rate is up to 256 Mbps.
ISBN: 0780377958 (print)
This paper presents the implementation of a wireless multimedia DSP chip for mobile applications. The implemented DSP chip supports communication instructions for Viterbi, timing synchronization, etc. as well as multimedia instructions. The DSP can handle variable length data and perform four MACs in a cycle. The proposed DSP employs parallel processing techniques, such as SIMD, vector processing, and DSP schemes, and adopts low power features for wireless applications. The implemented DSP chip includes test circuits and various peripherals, such as DMA, bus arbitration, timer, etc. This chip has been modeled in Verilog HDL and implemented using the 0.35 μm HCB60 library. The total gate count excluding memory is about 170,000 gates and the clock frequency is 100 MHz.
ISBN: 0780338065 (print)
To cope with the increasing number of functions that need to be implemented on a single chip as telecommunication products become more complex, a rapid trend towards programmable architectures as a base for digital signal processing (DSP) systems is occurring. The reason for this is that extremely complex algorithms and protocols must be implemented to economically use the available bandwidth for the next generation of wireless networks. Rapidly changing system requirements, design productivity, and intellectual property reuse are also promoting this trend.
ISBN: 9781424403820 (print)
A novel on-line Mixed-Scaling-Rotation CORDIC (MSR-CORDIC) VLSI architecture is proposed. This architecture not only maintains the scaling-free property of the original MSR-CORDIC, but also achieves the target of on-line angle computation. Compared with other existing CORDIC solutions, the proposed architecture is faster and more cost-efficient, especially for QRD-RLS filtering systems. Moreover, this on-line MSR-CORDIC can also be adopted by other rotation-based DSP applications.
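For readers unfamiliar with the baseline, the following is a minimal floating-point sketch of conventional rotation-mode CORDIC. It is not the MSR-CORDIC scheme proposed in the paper. It shows the shift-and-add micro-rotations and the accumulated gain that a conventional design must compensate, which is the scaling overhead a scaling-free variant aims to avoid. The function name and iteration count are assumptions for illustration.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians with conventional CORDIC.

    Converges for |angle| up to about 1.74 rad (the sum of all atan(2**-i)).
    Each step multiplies by 2**-i, which is a bit shift in fixed-point hardware.
    """
    z = angle      # residual angle still to be rotated
    gain = 1.0     # accumulated magnitude growth of the micro-rotations
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # rotate towards zero residual
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
        gain *= math.sqrt(1.0 + shift * shift)
    # Conventional CORDIC needs this final scale correction.
    return x / gain, y / gain
```

Rotating `(1.0, 0.0)` by an angle within the convergence range yields approximately the cosine and sine of that angle, with accuracy improving by roughly one bit per iteration.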