An energy-aware DCT (Discrete Cosine Transform) architecture based on the distributed arithmetic concept is proposed. Architectures based on distributed arithmetic are inherently low power because they are multiplication free. One characteristic of the DCT is that, upon transformation, signal energy is concentrated in only a few coefficients (less than 25%), with the remaining coefficients (75%) being insignificant and negligible. One can skip the computation of these terms without seriously affecting the output signal quality. Exploiting this idea, we propose a low-energy DCT architecture that can achieve 55% savings in energy dissipation and 28 dB in signal quality. In addition, we propose an adaptive energy-aware DCT architecture that trades off energy consumption for signal quality. Using this adaptive architecture, we present a study of the effect of coefficient elimination on energy consumption and signal quality.
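The energy-compaction argument above can be illustrated with a small sketch. It is not the proposed architecture: it is a plain floating-point 8-point DCT, the sample row is hypothetical, and the rule "keep the first `kept` coefficients" is an assumption made only for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(s)
    return out

def keep_first(coeffs, kept):
    """Zero every coefficient except the first `kept` (assumed elimination rule)."""
    return [c if k < kept else 0.0 for k, c in enumerate(coeffs)]

def energy_fraction(coeffs, kept):
    """Share of the total signal energy held by the first `kept` coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:kept]) / total

def snr_db(original, approx):
    """Signal-to-noise ratio of an approximation, in dB."""
    err = sum((a - b) ** 2 for a, b in zip(original, approx))
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(sum(a * a for a in original) / err)

# Hypothetical row of 8 pixel values, used only to exercise the functions.
samples = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(samples)
for kept in (1, 2, 4):
    approx = idct(keep_first(coeffs, kept))
    print(kept, round(energy_fraction(coeffs, kept), 4), round(snr_db(samples, approx), 1))
```

For smooth inputs like this row, most of the energy sits in the lowest-order coefficients, which is why skipping the rest costs little signal quality.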
ISBN: 0907776205 (print)
Fast implementations of the inner product of two (n×1) vectors, of the (n×n) MVM (matrix-vector multiplication) operation, and of the (n×n) MMM (matrix-matrix multiplication) operation are of paramount importance in many research fields such as signal processing, control systems, and robotics. The distributed arithmetic (DA) architectures proposed so far provide fast implementations of such products, but they require that the elements of one of the vectors be constant values known a priori. In this paper we propose a new general-purpose DA architecture in which the elements of both vectors or matrices are variable. The block diagram of the proposed hardware design is given and its performance is estimated theoretically.
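For context, the conventional constant-coefficient DA scheme that this abstract contrasts itself with can be sketched as follows. This is a software model under assumptions, not the proposed variable-operand architecture: the function names are hypothetical, the inputs are taken to be two's complement integers of a fixed width, and one bit plane is processed per loop iteration.

```python
def build_lut(coeffs):
    """Entry `addr` holds the sum of coeffs[k] over every set bit k of addr."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]

def da_inner_product(lut, xs, bits):
    """Bit-serial inner product of the constant coefficients behind `lut`
    with the inputs `xs`, each a two's complement integer of `bits` bits."""
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]          # two's complement bit patterns
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one LUT address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        partial = lut[addr] << b            # shift replaces multiplication
        if b == bits - 1:
            acc -= partial                  # the sign bit has negative weight
        else:
            acc += partial
    return acc

# Hypothetical constants and inputs, chosen only to exercise the sketch.
coeffs = [3, -5, 7, 2]
xs = [1, -2, 3, -8]
lut = build_lut(coeffs)
print(da_inner_product(lut, xs, bits=4), sum(c * x for c, x in zip(coeffs, xs)))
```

The table has 2^n entries and depends only on the coefficients, which is why the conventional scheme needs one operand to be known in advance.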
Real-time segmentation and tracking of biopsy needles is a very important part of image-guided surgery. Since the needle appears as a straight line in medical images, the Hough transform for straight-line detection is a natural and powerful choice for needle segmentation. However, the transform is computationally expensive and, in its standard form, is ineffective for real-time segmentation applications. This paper proposes a dedicated hardware architecture for the Hough transform based on distributed arithmetic (DA) principles that results in a real-time implementation. The architecture exploits the inherent parallelism of the Hough transform and reduces the overall computation time. The DA-Hough transform architecture has been implemented on a Xilinx field-programmable gate array (FPGA). For a 256 x 256-bit image, the proposed design takes between 0.1 ms and 1.2 ms to process the Hough transform when the feature points in the image are varied from 2% to 50% of the total image; these values are well within the bounds of real-time operation and thus can facilitate needle segmentation in real time.
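A minimal software sketch of the standard straight-line Hough transform shows where the cost comes from: every feature point votes once per angle. This is the textbook form, not the DA-based hardware design; the function names, the 1-degree angle step, and the test line are assumptions.

```python
import math

def hough_lines(points, width, height, n_theta=180):
    """Vote in (theta, rho) space using rho = x*cos(theta) + y*sin(theta)."""
    diag = int(math.ceil(math.hypot(width, height)))
    # Rows index theta, columns index rho shifted by `diag` to stay non-negative.
    acc = [[0] * (2 * diag + 1) for _ in range(n_theta)]
    for t in range(n_theta):
        theta = math.pi * t / n_theta
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = int(round(x * c + y * s))
            acc[t][rho + diag] += 1
    return acc, diag

def strongest_line(acc, diag):
    """Return (theta index, rho, votes) of the accumulator cell with most votes."""
    votes, t, r = max((v, t, r)
                      for t, row in enumerate(acc)
                      for r, v in enumerate(row))
    return t, r - diag, votes

# Hypothetical feature points lying on the horizontal line y = 7.
points = [(x, 7) for x in range(64)]
acc, diag = hough_lines(points, width=64, height=64)
print(strongest_line(acc, diag))
```

The inner loop is a two-term inner product with constant cos/sin values per angle, and the angles are independent of each other, which is the kind of structure a parallel DA implementation can exploit.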
ISBN: 9781424417650 (print)
This paper presents an efficient intra prediction implementation for H.264/AVC in the frequency domain. Intra prediction in the frequency domain is vital for transform-domain heterogeneous video transcoding in wireless and mobile networks, where limited computational capability is a burden. A new distributed intra prediction arithmetic is proposed that uses only addition and shift operations to compute the transform-domain intra prediction modes. Compared to previous attempts, the proposed method reduces the computations extensively by eliminating the expensive matrix multiplications, and it removes the need for excessive memory storage.
ISBN: 9781424421701 (print)
In this paper, an efficient design scheme for implementing a high-speed CNC Position Controller (PC) using Field Programmable Gate Array (FPGA) technology is presented. The algorithm is implemented using a distributed arithmetic (DA)-based scheme in which the Look-Up-Table (LUT) mechanism inside the FPGA is utilized. Two novel DA-based CNC Position Controllers are proposed for FPGA implementation. The implementation results show that the two DA-based PCs use 0.8% and 1.5% of the logic resources of the FPGA device, respectively, whereas the multiplier-based design uses 51.1%. With a 32 MHz input clock, these two DA-based designs can ensure that the servo loop update frequency reaches 1 MHz, satisfying the high-speed CNC requirement.
ISBN: 9781424424085 (print)
Multimedia applications are becoming ever more demanding. Hence, next-generation codecs should invariably be floating-point compliant. With field programmable gate array (FPGA) technology maturing, more and more signal processing applications are finding their niche in FPGAs. Current-generation FPGAs have hardware multipliers. However, these are general-purpose multipliers and cannot be used for specific purposes. This paper presents a novel FPGA implementation of a one-dimensional (8 x 1) point, multiplierless, floating-point Discrete Cosine Transform. Distributed arithmetic, parallelism, and pipelining are exploited to produce a DCT implementation on a single FPGA. Two implementations are presented, one using a single LUT and the other using two parallel LUTs, utilizing 68% and 89% of the area respectively, with a maximum clock frequency of around 50 MHz.
A recent trend in low-power design has been the employment of reduced precision processing methods for decreasing arithmetic activity and average power dissipation. Such designs can trade off power and arithmetic precision as system requirements change. This work explores the potential of distributed arithmetic (DA) computation structures for low-power precision-on-demand computation. We present an ultralow-power DSP which uses variable precision arithmetic, low-voltage circuits, and conditional clocks to implement a biomedical detection and classification algorithm using only 560 nW. Low energy consumption enables self-powered operation using ambient mechanical vibrations, converted to electric energy by a MEMS transducer and accompanying power electronics. The MEMS energy scavenging system is estimated to deliver 4.3 to 5.6 μW of power to the DSP load.
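One way a DA structure can offer precision on demand is to process the input bits most significant first and stop early. The sketch below assumes that approach; the abstract does not spell out the mechanism, so the function names, the early-stop rule, and the sample values are all hypothetical.

```python
def build_lut(coeffs):
    """Entry `addr` holds the sum of coeffs[k] over every set bit k of addr."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]

def da_msb_first(coeffs, xs, bits, used_bits):
    """DA inner product that consumes only the top `used_bits` bit planes of
    the `bits`-wide two's complement inputs, starting from the sign bit."""
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]
    acc = 0
    for step in range(used_bits):
        b = bits - 1 - step                  # walk from MSB towards LSB
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        term = lut[addr] << b
        acc += -term if b == bits - 1 else term   # sign bit weighs negative
    return acc

# Hypothetical coefficients and 8-bit samples.
coeffs = [3, -5, 7]
xs = [100, -77, 45]
for used in (2, 4, 6, 8):
    print(used, da_msb_first(coeffs, xs, bits=8, used_bits=used))
```

Stopping after `used_bits` planes gives the exact inner product of the inputs with their low bits cleared, so each skipped plane saves one lookup and one accumulation at the cost of a bounded truncation error.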
An efficient architecture for an FPGA symmetric FIR filter is proposed that employs 2-bit parallel distributed arithmetic (2-bit PDA). The partial products are pre-calculated and saved in the distributed ROM. This eliminates the large amount of logic needed to compute multiplication results. The proposed architecture consumes less area and offers higher-speed operation because the multiplier is omitted.
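A software model of the idea, under assumptions: symmetric taps are folded by pre-adding the mirrored samples, the partial sums live in a table that stands in for the distributed ROM, and two bit planes are consumed per iteration. The function names and the folding step are illustrative choices, not details taken from the paper.

```python
def build_lut(coeffs):
    """Pre-computed partial sums: entry `addr` sums coeffs[k] for set bits k."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]

def bit_plane_address(words, b):
    """Collect bit b of every word into one table address."""
    addr = 0
    for k, w in enumerate(words):
        addr |= ((w >> b) & 1) << k
    return addr

def fir_output_2bit_da(half_taps, window, bits):
    """One output of a symmetric FIR filter with taps half_taps + reversed
    half_taps. `window` holds 2*len(half_taps) two's complement samples."""
    n = len(half_taps)
    # Symmetry: h[k] == h[N-1-k], so mirrored samples can share one tap.
    folded = [window[k] + window[2 * n - 1 - k] for k in range(n)]
    width = bits + 1                 # the pre-addition can grow by one bit
    if width % 2:
        width += 1                   # pad (sign-extend) to an even width
    mask = (1 << width) - 1
    words = [f & mask for f in folded]
    lut = build_lut(half_taps)
    acc = 0
    for b in range(0, width, 2):     # two bit planes per iteration
        low = lut[bit_plane_address(words, b)] << b
        high = lut[bit_plane_address(words, b + 1)] << (b + 1)
        if b + 1 == width - 1:
            high = -high             # the top plane is the sign plane
        acc += low + high
    return acc

# Hypothetical 6-tap symmetric filter and 4-bit samples.
half_taps = [1, -2, 3]
window = [5, -7, 2, 4, -8, 6]
print(fir_output_2bit_da(half_taps, window, bits=4))
```

Handling two planes per iteration halves the number of iterations compared with 1-bit serial DA, at the cost of a second table lookup in each one.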
This letter proposes a new distributed arithmetic (DA) algorithm for low-power finite-impulse response (FIR) filter implementation. The characteristic of the proposed algorithm is that FIR filters using it do not need to employ two's complement representation in the lookup tables or in the multiply-and-accumulation blocks. Thus, the proposed algorithm can minimize the dynamic power consumption of the FIR filters. The experimental results show that a lowpass FIR filter using the proposed algorithm achieves 29% and 26% reductions in power consumption compared to one using the conventional algorithm, for zero-mean random inputs and speech inputs, respectively.
ISBN: 9788955191318 (print)
The Discrete Cosine Transform (DCT), an important component of image and video compression, is adopted in various standardized coding schemes, such as JPEG, MPEGx, and H.26x. However, computing a two-dimensional (2D) DCT by the direct approach requires a large number of multiplications and additions. Multiplications, which are the most time-consuming operations in a simple processor, can be completely avoided in the proposed architecture for real-time image compression. An area-efficient, high-performance VLSI architecture for the DCT based on distributed arithmetic (DA) is proposed in this paper. The number of additions used to compute the DCT is minimized by exploiting the timing property of the DA-based DCT. A case study of an 8 x 8 DCT architecture based on DA is analyzed. Savings exceeding 97% are achieved for the DCT implementation.