In this paper we describe two efficient software implementations of the bi-dimensional IDCT (Inverse Discrete Cosine Transform). Instead of using a traditional separation into eight horizontal and vertical mono-dimensional IDCT stages, we apply a novel approach that directly decomposes the bi-dimensional IDCT into only eight mono-dimensional units followed by a network of addition and subtraction operations. We have then optimized this method in pure ANSI C for 32-bit VLIW (Very Long Instruction Word) processors. By arranging the network structure to exploit sub-word parallelism and by defining entirely new multimedia instructions, we have implemented a second version that is 23% more efficient than the first. Our fixed-point arithmetic IDCT implementations are fully compliant with the IEEE 1180 standard, as required by most video compression standards.
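For orientation, the conventional row-column separation that the abstract contrasts against can be sketched as follows. This is a floating-point reference sketch only, assuming the orthonormal 8-point DCT scaling; it is not the paper's eight-unit decomposition, nor its fixed-point ANSI C code, and the names `idct_1d` and `idct_2d` are hypothetical.

```python
import math

N = 8  # block size assumed here (8x8, as in common video codecs)

def idct_1d(coeffs):
    """Orthonormal 1-D inverse DCT of a length-N sequence."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            cu = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            s += cu * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(s)
    return out

def idct_2d(block):
    """Row-column 2-D IDCT: a 1-D pass over every row, then over every column."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d([rows[i][j] for i in range(N)]) for j in range(N)]
    # cols[j][i] holds the sample at row i, column j; transpose back.
    return [[cols[j][i] for j in range(N)] for i in range(N)]
```

A block whose only non-zero coefficient is the DC term should reconstruct to a constant block, which makes a convenient sanity check.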
We develop new algorithms and architectures for matrix multiplication on configurable hardware. These designs significantly reduce the latency as well as the area. Our designs improve on the previous designs in terms of the area/speed metric, where speed denotes the maximum achievable running frequency. The area/speed metrics for the previous designs and our design are 14.45, 4.93, and 2.35, respectively, for 4 × 4 matrix multiplication. The latency of one of the previous designs is 0.57 μs, while our design takes 0.15 μs using 18% less area. The area of our designs is 11% to 46% smaller than that of the best known systolic designs with the same latency for matrices of sizes 3 × 3 to 12 × 12. The performance improvements tend to grow with the problem size.
ISBN: 0780370805 (print)
This paper describes two new matrix transform algorithms for the Max-Log-MAP decoding of turbo codes. In the proposed algorithms, the successive decoding procedures carried out in the conventional Max-Log-MAP algorithm are performed in parallel and formulated as a set of simple and regular matrix operations, which considerably speeds up decoding and reduces the computational complexity. The matrix Max-Log-MAP algorithms also retain the advantage of the general logarithmic MAP-like algorithms in avoiding complex numerical representation problems. They particularly facilitate the implementation of logarithmic MAP-like algorithms in special-purpose parallel processing VLSI hardware architectures. The matrix algorithms also allow simple implementations using shift registers. The proposed implementation architectures for matrix Max-Log-MAP decoding can effectively reduce the memory capacity and simplify the data accesses and transfers required by the conventional Max-Log-MAP and MAP algorithms.
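To illustrate how a Max-Log-MAP recursion can be written as a regular matrix operation, the sketch below expresses the standard forward state-metric update, alpha_k(s') = max over s of [alpha_{k-1}(s) + gamma_k(s, s')], as a vector-matrix product in max-plus algebra. This is a generic textbook formulation and not necessarily the matrix transform proposed in the paper; the function names and the toy two-state branch metrics are assumptions for illustration.

```python
NEG_INF = float("-inf")  # metric of an impossible state or transition

def maxplus_vecmat(alpha, gamma):
    """Max-plus product: out[j] = max over i of (alpha[i] + gamma[i][j])."""
    n_next = len(gamma[0])
    return [max(alpha[i] + gamma[i][j] for i in range(len(alpha)))
            for j in range(n_next)]

def forward_metrics(alpha0, gammas):
    """Run the forward recursion over a list of per-step branch-metric matrices."""
    alphas = [alpha0]
    for g in gammas:
        alphas.append(maxplus_vecmat(alphas[-1], g))
    return alphas

# Toy two-state trellis starting in state 0 (hypothetical branch metrics).
gamma = [[0.5, -0.5],
         [-1.0, 2.0]]
alphas = forward_metrics([0.0, NEG_INF], [gamma, gamma])
```

Because every step has the same add-then-maximize shape, the backward recursion can reuse `maxplus_vecmat` with transposed matrices.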
ISBN: 078037147X (print)
Sonar beamforming is an ideal application for reconfigurable computing due to its high available levels of parallelism, relatively low sample rates, and modest word sizes. In this paper we describe a family of beamforming algorithms and their implementation using configurable computing technology. These include algorithms for time-delay, frequency-domain, and matched field beamforming. Configurable computing architectures appropriate for each are described, and the tradeoffs associated with mapping each to a concrete platform are discussed.
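As a rough illustration of the time-delay family mentioned above, the following sketch implements basic delay-and-sum beamforming with integer sample delays. It is a software toy under simplifying assumptions (whole-sample delays, equal channel weights, no interpolation), not the configurable-hardware design from the paper; `delay_and_sum` and its parameters are hypothetical names.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its arrival delay (in samples) and average.

    channels: list of equal-length sample lists, one per sensor.
    delays:   integer arrival delay of each channel for the steered direction.
    """
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(n):
            src = t + d  # advance the channel by its delay
            if 0 <= src < n:
                out[t] += ch[src]
    return [v / len(channels) for v in out]

# A pulse reaching three sensors one sample apart (toy data).
chans = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
]
steered = delay_and_sum(chans, [0, 1, 2])    # delays match the arrival pattern
unsteered = delay_and_sum(chans, [0, 0, 0])  # no alignment
```

When the delays match the arrival pattern the pulses add coherently, while the unsteered sum spreads the same energy over three samples. Each channel is processed independently before the final sum, which is the parallelism the abstract refers to.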
The paper describes two approaches suitable for a field-programmable gate-array (FPGA) implementation of fast Walsh-Hadamard transforms. These transforms are important in many signal-processing applications including speech compression, filtering and coding. Two novel architectures for the fast Hadamard transforms using both a systolic architecture and distributed arithmetic techniques are presented. The first approach uses the Baugh-Wooley multiplication algorithm for a systolic architecture implementation. The second approach is based on both a distributed arithmetic ROM and accumulator structure, and a sparse matrix-factorisation technique. Implementations of the algorithms on a Xilinx FPGA board are described. The distributed arithmetic approach exhibits better performance than the systolic architecture approach.
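For readers unfamiliar with the transform itself, a minimal software sketch of the standard butterfly-based fast Walsh-Hadamard transform (natural Hadamard ordering, unnormalized) is given below. It illustrates only the add/subtract structure that makes the transform attractive for hardware; it is not the systolic or distributed-arithmetic architecture described in the paper, and the function name `fwht` is an assumption.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (Hadamard order).

    Uses log2(n) stages of add/subtract butterflies; n must be a power of two.
    """
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    return a
```

Applying `fwht` twice returns the input scaled by its length, so the same routine serves as the inverse after dividing by `n`.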
ISBN: 0780370805 (print)
This paper describes two new matrix transform algorithms for the max-log-MAP decoding of turbo codes. In the proposed algorithms, the successive decoding procedures carried out in the conventional max-log-MAP algorithm are performed in parallel and formulated as a set of simple and regular matrix operations, which considerably speeds up decoding and reduces the computational complexity. The matrix max-log-MAP algorithms also retain the advantage of the general logarithmic MAP-like algorithms in avoiding complex numerical representation problems. They particularly facilitate the implementation of logarithmic MAP-like algorithms in special-purpose parallel processing VLSI hardware architectures. The matrix algorithms also allow simple implementations using shift registers. The proposed implementation architectures for matrix max-log-MAP decoding can effectively reduce the memory capacity and simplify the data accesses and transfers required by the conventional max-log-MAP and MAP algorithms.
ISBN: 078037147X (print)
Sonar beamforming is an ideal application for reconfigurable computing due to its high available levels of parallelism, relatively low sample rates and modest word sizes. We describe a family of beamforming algorithms and their implementation using configurable computing technology. These include algorithms for time-delay, frequency-domain and matched field beamforming. Configurable computing architectures appropriate for each are described, and the tradeoffs associated with mapping each to concrete platforms are discussed.
Reports on a new recursive discrete cosine transform (DCT) architecture that is more efficient in terms of area and power than recently published recursive DCT architectures. Our approach employs A-, B- and C-type Goertzel filters. These three different realizations of Goertzel filters, together with a multiplier-less implementation of loop multiplications, are used to reduce the area, the multiplier delay and undesirable transitions, and hence the power consumption. The newly proposed DCT structure has been compared with conventional recursive implementations at different transform lengths, and the comparison shows potential savings in both area and power.
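As background on the recursive building block, the sketch below shows the textbook Goertzel recursion for a single DFT bin: a second-order real-coefficient loop followed by one complex correction. It is the classic DFT form, assumed here for illustration; the A-, B- and C-type realizations and the DCT mapping used in the paper are not reproduced, and the function name `goertzel` is hypothetical.

```python
import cmath
import math

def goertzel(x, k):
    """Compute DFT bin k of real sequence x with the Goertzel recursion."""
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)  # the single loop multiplier
    s1 = 0.0  # s[n-1]
    s2 = 0.0  # s[n-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    # One complex step after the loop yields X[k].
    return cmath.exp(1j * w) * s1 - s2
```

The loop needs only one real multiplication per sample, by `coeff`. That multiplication is the "loop multiplication" a multiplier-less design would replace with shifts and adds.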
The proceedings contain 58 papers from the SPIE conference Advanced Signal Processing Algorithms, Architectures, and Implementations VIII. The topics discussed include: blind channel identification and extraction of more sources than sensors; blind channel estimation for CDMA systems with orthogonal modulation; blind equalization and source separation with MSK inputs and adaptive blind channel estimation by least-squares smoothing for CDMA.