This paper discusses the recursive implementation of the discrete cosine transform (DCT) and its inverse (IDCT). The transform is constructed by using a recursive filter structure to generate the transform kernel values. We first derive two trigonometric equations, which can be represented as Chebyshev polynomials. We then demonstrate that DCTs and IDCTs of general length can be efficiently implemented using the regressive structure derived from the recursive formulae. The computational complexity per data throughput in these architectures is up to 50% lower than in conventional ones. The proposed architectures are regular and suitable for parallel VLSI implementation.
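As a rough illustration of generating kernel values recursively, the sketch below computes an unnormalized DCT-II in which each cosine comes from the Chebyshev-type recurrence c[n+1] = 2·cos(θ)·c[n] − c[n−1] rather than from a fresh cosine evaluation. This is only a minimal software sketch of that recurrence, not the architecture proposed in the paper; the function names `dct_recursive` and `dct_direct` are hypothetical.

```python
import math


def dct_recursive(x):
    """Unnormalized DCT-II with kernel values produced by a two-term recurrence.

    For output index k, let theta = pi*k/N. The kernel values
    c[n] = cos((n + 1/2) * theta) satisfy c[n+1] = 2*cos(theta)*c[n] - c[n-1].
    """
    n_points = len(x)
    out = []
    for k in range(n_points):
        theta = math.pi * k / n_points
        two_cos = 2.0 * math.cos(theta)
        prev = math.cos(theta / 2.0)  # c[-1] = cos(-theta/2) = cos(theta/2)
        cur = prev                    # c[0]  = cos(theta/2)
        acc = 0.0
        for sample in x:
            acc += sample * cur
            prev, cur = cur, two_cos * cur - prev  # advance the recurrence
        out.append(acc)
    return out


def dct_direct(x):
    """Reference DCT-II that evaluates every cosine explicitly."""
    n_points = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
            for n in range(n_points))
        for k in range(n_points)
    ]
```

Only two cosines per output index are evaluated directly; the remaining kernel values cost one multiplication and one subtraction each. For long transforms the recurrence accumulates rounding error, so this sketch is meant for short lengths.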
An n-bit fixed-width multiplier keeps the input and output widths the same by truncating the n least significant output bits. To reduce complexity, direct-truncation multipliers omit the half of the partial products that corresponds to the truncated part. However, this introduces a large truncation error. Error compensation, which amounts to estimating the carry bits, is therefore required. In this paper, three carry estimation schemes based on the dependency among the partial products and the inputs are proposed. In addition to investigating this dependency, the paper provides statistical analysis of these estimation approaches. With the proposed schemes, the truncation error can be reduced by at least 84%.
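To make the truncation error concrete, the sketch below models an unsigned n-bit fixed-width multiplier at the level of individual partial-product bits. It compares direct truncation with a very simple carry estimate that rounds half the number of ones in the most significant truncated column. That estimator is an assumption chosen for illustration and is not one of the three schemes proposed in the paper; all function names are hypothetical.

```python
def partial_product_bits(a, b, n):
    """Yield (column, bit) for every partial-product bit a_i AND b_j."""
    for i in range(n):
        for j in range(n):
            yield i + j, ((a >> i) & 1) & ((b >> j) & 1)


def fixed_width_multiply(a, b, n, compensate=False):
    """Upper n bits of a*b built only from columns n and above.

    With compensate=True, a carry estimate derived from column n-1
    (the most significant truncated column) is added. Illustrative only.
    """
    kept = 0
    ones_in_top_truncated_column = 0
    for col, bit in partial_product_bits(a, b, n):
        if col >= n:
            kept += bit << (col - n)
        elif col == n - 1:
            ones_in_top_truncated_column += bit
    if compensate:
        kept += (ones_in_top_truncated_column + 1) // 2  # rounded half
    return kept


def mean_abs_error(n, compensate):
    """Mean |exact - approximate| in output LSBs over all input pairs."""
    total = 0.0
    for a in range(1 << n):
        for b in range(1 << n):
            exact = (a * b) / (1 << n)
            total += abs(exact - fixed_width_multiply(a, b, n, compensate))
    return total / (1 << (2 * n))
```

Calling `mean_abs_error(6, False)` and `mean_abs_error(6, True)` gives the average error of the two variants for 6-bit operands. Direct truncation always underestimates the product, while the compensated variant recovers part of the lost carry.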
ISBN: 9781538647271 (print)
Large multi-user MIMO systems with spatial multiplexing are among the most promising approaches for increasing wireless throughput while serving many clients. Yet, the achievable spectral efficiency of current large MIMO systems is limited by the adoption of simple, but sub-optimal, linear precoding techniques (e.g., minimum-mean-square-error (MMSE)). Non-linear precoding methods, like Vector Perturbation (VP), claim to be able to provide improved network throughput. However, such methods are still purely theoretical and do not account for the practical aspects of actual wireless systems, such as the corresponding complexity and latency requirements, or the need for feasible rate adaptation. This paper presents ViPer, the first practical VP-based MIMO system design. ViPer substantially reduces the latency requirements of VP by employing massively parallel processing and realizes a practical rate adaptation method that efficiently translates VP's signal-to-noise-ratio (SNR) gains into actual throughput gains. In our first systematic experimental evaluation of VP-based precoders, we show that ViPer can deliver in practice up to 30% higher throughput than MMSE precoding with comparable latency requirements. In addition, ViPer can match the performance of state-of-the-art parallel VP precoding schemes while utilizing less than one tenth of the processing elements.
We present an algorithmic noise-tolerance (ANT) technique for designing low-power DSP systems. The proposed technique achieves substantial energy savings via voltage overscaling, whereby the supply voltage is scaled beyond the minimum supply voltage V_dd-crit at which the architecture operates correctly for a given throughput specification. The resulting input-dependent soft errors are corrected via a low-complexity error canceller, and the technique is hence referred to as adaptive error-cancellation. The trade-off between energy savings and algorithmic performance is illustrated by employing a reduced-order least mean square (LMS) algorithm to compensate for the design overhead. Simulation results in a 0.35 μm CMOS technology demonstrate that the proposed technique achieves up to 73% energy savings in a multiuser communication scenario over present-day voltage scaling, with a 3 dB algorithmic performance loss. Moreover, a 40% energy reduction is obtained over conventional DSP systems without algorithmic performance degradation.
Multiple-input multiple-output (MIMO) systems are of significant interest due to their ability to increase the capacity of wireless communications systems, but to be useful they must also be practical to implement in VLSI circuits. A particularly difficult part of these systems is the decoder, where the optimal maximum-likelihood (ML) solution is desirable but cannot be directly implemented due to its exponential complexity. The paper presents the first published 8×8 MIMO detection engine with an integrated channel preprocessing unit, achieving near-ML BER results at 57.6 Mbps, using QPSK in an extended HSDPA application. Other novelties include the high-speed sorting mechanism and power-saving features.
ISBN: 9781467383615 (print)
We develop and describe algorithms for detecting characteristic types of human movement using ultra-wideband radar. The derived methods are implemented in a real-time system in which breathing is detected and its basic parameters, e.g. frequency and depth, are estimated. We also present some crucial details of the software implementation, which handles radar signal acquisition and processing in real time. The presented method has many interesting applications in healthcare: noninvasive respiration or heartbeat monitoring, and even through-the-wall measurement. Unlike a camera, radar respects anonymity. All development was carried out as part of the Radcare project.
ISBN: 9781467311854 (print)
Providing a low-power, high-throughput and flexible solution is a challenge in the design of a mobile software defined radio (SDR) system. The need for simple software generation using common programming tools is also becoming a very significant factor. The paper presents the design and implementation of a chip multithreading general-purpose processor core (GPP), as the first step towards designing a flexible and programmer-friendly SDR processor platform. Software tools developed for the hardware are described. Future work will focus on designing tightly-coupled coprocessor extensions (TCC) for application-specific digital signal processing (DSP) purposes. The AGATE processor system is described in the form of a highly configurable library in the Verilog language. Concept verification was performed on the Xilinx Virtex-6 ML605 FPGA evaluation board. The maximum achieved frequency for the 8-thread processor is 190 MHz. Gate-level simulation along with Value Change Dump (VCD) power estimation analysis was performed using three CMOS technologies: 130 nm, 90 nm and 65 nm. AGATE is capable of performing up to 0.72 DMIPS/MHz/thread with a maximum frequency of over 700 MHz and a power consumption of about 3 mW/core using a 65 nm process.
A cost-effective hardware architecture for a low-bit-rate 1.6 Kbit/s LPC (linear predictive coefficient)-based vocoder is proposed. The proposed architecture integrates the algorithms of both the encoder and the decoder. In the encoder, a simple finite state machine is presented to compute the autocorrelation function of speech. At the decoder side, efficient circuits are designed to convert LSP (line spectrum pair) to LPC. Only 29,000 gates of a Xilinx XC4036XL FPGA are used to implement the vocoder.
Adaptive filters are used in many applications of digital signal processing. Digital communications and digital video broadcasting are just two examples. The GSFAP algorithm, discussed in the paper, is characterized by convergence superior to that of the popular NLMS, with only slightly higher complexity. The paper deals with a floating-point-like implementation of the algorithm using FPGA hardware. We present an optimized core for the GSFAP, built using logarithmic arithmetic, which provides very low-cost multiplication and division. The design is crafted to make efficient use of the pipelined logarithmic addition units. The resulting GSFAP core can be clocked at more than 80 MHz on the one-million-gate Xilinx XC2V1000-4 device. It can be used to implement filters of orders 20 to 1000 with a sampling rate exceeding 50 kHz. For comparison, we implemented a similar NLMS core and found that, although it is slightly smaller than the GSFAP core and allows a higher signal sampling rate (around 70 kHz) for the corresponding filter orders, GSFAP has adaptation properties that are much superior to those of NLMS, and that our core can provide very sophisticated adaptive filtering capabilities for resource-constrained embedded systems.
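For readers unfamiliar with the NLMS baseline mentioned above, the following minimal floating-point sketch identifies an unknown FIR system with the standard normalized LMS update w ← w + μ·e·x / (ε + xᵀx). It does not implement GSFAP or logarithmic arithmetic; the function name `nlms_identify`, the step size, the regularization constant and the white-noise test signal are all assumptions made for illustration.

```python
import random


def nlms_identify(taps, num_samples=2000, mu=0.5, eps=1e-9, seed=0):
    """Estimate the FIR coefficients `taps` with the standard NLMS update.

    The unknown system is driven by uniform white noise and its output is
    observed without measurement noise (an idealized, hypothetical setup).
    """
    rng = random.Random(seed)
    order = len(taps)
    weights = [0.0] * order
    window = [0.0] * order  # most recent input sample first
    for _ in range(num_samples):
        window = [rng.uniform(-1.0, 1.0)] + window[:-1]
        desired = sum(t * s for t, s in zip(taps, window))
        estimate = sum(w * s for w, s in zip(weights, window))
        error = desired - estimate
        energy = eps + sum(s * s for s in window)  # normalization term
        step = mu * error / energy
        weights = [w + step * s for w, s in zip(weights, window)]
    return weights
```

The division by the input energy is the operation that makes NLMS more expensive than plain LMS in conventional fixed-point hardware, which is one reason low-cost division is attractive for this class of algorithms.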
ISBN: 078037147X (print)
A novel, high-performance fixed-point inner-product processor based on a redundant binary number system is presented in this paper. The proposed scheme decreases the number of partial products to 50%, compared to other methods, while achieving better speed and area performance and providing pipeline extension opportunities. When modified Booth encoding is used, partial products are reduced by almost 75%, thereby significantly reducing the multiplier addition depth. The design is applicable for digital signal and image processing applications that require inner-product arithmetic, such as digital filters, correlation and convolution. The proposed design is well suited for VLSI implementation, and it can also be embedded as an inner-product core inside a DSP processor or FPGA-based processor.
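The partial-product reduction attributed to modified Booth encoding can be illustrated with standard radix-4 Booth recoding, which turns an n-bit two's-complement multiplier into n/2 digits in {−2, −1, 0, 1, 2}, so only n/2 partial products need to be summed. The sketch below shows only this textbook recoding, not the redundant binary scheme of the paper; the function names are hypothetical.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement integer y (n even) into n/2 digits.

    Digit k equals y[2k-1] + y[2k] - 2*y[2k+1], with y[-1] taken as 0,
    so every digit lies in {-2, -1, 0, 1, 2}. Least significant digit first.
    """
    if n % 2:
        raise ValueError("n must be even")
    bits = y & ((1 << n) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit y[-1]
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, n):
    """Multiply x by the n-bit signed y using one partial product per digit."""
    return sum((d * x) << (2 * k)
               for k, d in enumerate(booth_radix4_digits(y, n)))
```

Each digit selects 0, ±x or ±2x, all of which are cheap to form in hardware (a shift and an optional negation), which is why halving the number of partial products shortens the addition tree.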