ISBN: 0780393333 (print)
In this paper, we describe a novel algorithm for modular exponentiation of large integers and present its hardware implementation. This algorithm combines elements of Montgomery's modular multiplication technique with carry-save and carry-delayed number representations. The major advantage of this algorithm over previously reported algorithms is that it does not require the result of each modular multiplication in the exponentiation process to be converted from the redundant representation back to a nonredundant form. In our algorithm, the conversion is only necessary at the end of all the modular multiplications. Avoiding the conversion speeds up the modular exponentiation process. In addition, the algorithm allows for a fast, modular, and scalable hardware implementation.
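As a point of reference for the abstract above, the following is a minimal software sketch of Montgomery-based modular exponentiation. It is not the paper's algorithm: it uses ordinary (nonredundant) Python integers and does not model the carry-save or carry-delayed representations. The function names `mont_mul` and `mont_pow` are hypothetical. What it does illustrate is that operands stay in the Montgomery domain throughout the loop and are converted back only once, at the end.

```python
def mont_mul(a, b, n, k, n_prime):
    """Return a * b * 2**(-k) mod n for odd n < 2**k and 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m*n is divisible by 2**k
    u = (t + m * n) >> k
    return u - n if u >= n else u


def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply using Montgomery products (n must be odd)."""
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r      # n * n_prime = -1 (mod 2**k); needs Python 3.8+
    x = (base % n) * r % n              # operand mapped into the Montgomery domain
    acc = r % n                         # the value 1 in the Montgomery domain
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)   # single conversion back at the end
```

For any odd modulus `n`, `mont_pow(base, exp, n)` should agree with Python's built-in `pow(base, exp, n)`.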
ISBN: 0780338065 (print)
An approach to the realisation of 2D FIR filters based on a novel radix-differential arithmetic is introduced. The differential algorithm codes the input video signal more efficiently using a DPCM coding system, whereas the filter's coefficients are fed in digit-serial fashion and specified using radix-2^n arithmetic. The proposed approach provides a spectrum of architectures that allows a more flexible design trade-off analysis between throughput rate and hardware cost.
ISBN: 0780393333 (print)
This paper addresses a new kind of security vulnerability introduced by the use of Network-on-Chip (NoC) in System-on-Chip (SoC) design. The study is based on experience with a CAD framework for NoC design and proposes a classification of weaknesses with regard to common routing and interface techniques. Finally, design strategies are proposed and a new path routing technique (SCP) is introduced with the aim of enforcing security.
ISBN: 0780393333 (print)
We studied the efficient implementation of a motion estimation (ME) algorithm for H.264/AVC on the TMS320C64x, a VLIW (Very Long Instruction Word) SIMD (Single Instruction Multiple Data) digital signal processor. H.264 motion estimation algorithms demand many arithmetic operations, especially because of the variable block size optimization. The SAD (Sum of Absolute Differences) reuse method is chosen not only to reduce the computation but also to exploit the regular algorithmic structure, which is essential for efficient implementation on parallel and pipelined processors. We applied several techniques, such as loop length increase for efficient software pipelining, multiblock SAD computation for reducing memory access overhead, block processing for cache miss minimization, and improved quarter-pixel processing. The implementation results show that a real-time implementation of ME for D1-size (720×480) video is possible using a 720 MHz TMS320C6416 digital signal processor.
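The SAD reuse idea mentioned above can be sketched as follows, under the assumption that SADs are first computed once per 4×4 sub-block of a 16×16 macroblock and then summed to obtain the SADs of the larger partitions. The function names and the dictionary layout are hypothetical, and the sketch ignores the DSP-specific optimizations (software pipelining, cache blocking, quarter-pixel processing) that the abstract describes.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid holding the SAD of each 4x4 sub-block of a 16x16 block."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def merge(grid, by, bx, h, w):
    """Sum the SADs of an h x w group of 4x4 sub-blocks starting at grid cell (by, bx)."""
    return sum(grid[by + j][bx + i] for j in range(h) for i in range(w))


def variable_block_sads(cur, ref):
    """Derive larger-partition SADs from the 4x4 SADs instead of recomputing them."""
    g = sad_4x4_grid(cur, ref)
    return {
        "4x4": g,
        "8x8": [[merge(g, 2 * j, 2 * i, 2, 2) for i in range(2)] for j in range(2)],
        "16x8": [merge(g, 0, 0, 2, 4), merge(g, 2, 0, 2, 4)],   # top and bottom halves
        "8x16": [merge(g, 0, 0, 4, 2), merge(g, 0, 2, 4, 2)],   # left and right halves
        "16x16": merge(g, 0, 0, 4, 4),
    }
```

Each pixel difference is computed only once; every larger partition costs only a few additions per candidate motion vector.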
ISBN: 0780385047 (print)
This paper presents the implementation of the decorrelating (DECOR) transformation technique for low-power FIR filtering cores. The technique was introduced in the past, but was not fully evaluated for its area, delay, and power performance. Early evaluations did not consider the whole implementation and were based merely on analytical methods or high-level simulation models. This paper presents the complete VLSI implementation of the technique and a study of its area, delay, and power performance with different orders of coefficient differences and various multiplier types. We show that although the technique achieves up to 47% power saving in the multiplier unit, the overall power saving is up to 25%, with up to a 24% increase in area.
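To make the idea of coefficient differences concrete, here is a minimal sketch of a first-order version, assuming the usual formulation in which the transfer function H(z) is multiplied by (1 - z^-1) and the result is recovered by a running sum. The function names are hypothetical and the sketch says nothing about the paper's VLSI architecture or higher-order differences. Adjacent coefficients of a smooth filter differ by small amounts, so the multiplications operate on smaller values.

```python
def fir_direct(c, x):
    """Direct-form FIR: y[n] = sum_k c[k] * x[n-k], with zero initial state."""
    return [
        sum(c[k] * x[n - k] for k in range(len(c)) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_decor_first_order(c, x):
    """Same filter computed from first-order coefficient differences."""
    # d[k] = c[k] - c[k-1], taking c[-1] = c[len(c)] = 0
    d = [c[0]] + [c[k] - c[k - 1] for k in range(1, len(c))] + [-c[-1]]
    y, prev = [], 0
    for n in range(len(x)):
        acc = sum(d[k] * x[n - k] for k in range(len(d)) if n - k >= 0)
        prev += acc          # y[n] = y[n-1] + sum_k d[k] * x[n-k]
        y.append(prev)
    return y
```

With integer coefficients and samples the two functions return identical outputs, which is a convenient check of the transformation.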
ISBN: 9781424403820 (print)
Multiband orthogonal frequency-division multiplexing (MB-OFDM) systems employ frequency-hopping technology to achieve multiple access and frequency diversity. However, frequency hopping also complicates the packet detector (PD), which requires high hardware complexity. In this paper, we propose several low-cost design schemes for the PD, such as Walsh-Hadamard decomposition, buffered summation, and sign-bit-remaining methods. The estimated gate count of the resulting PD implementation is less than half that of existing solutions.
ISBN: 9781424403820 (print)
Low-Density Parity-Check (LDPC) codes have been adopted in the physical layer of many communication systems because of their superior performance. The direct implementation of these codes on an existing software-defined radio (SDR) platform is likely to be inefficient. Our approach is to design the LDPC code to match the constraints imposed by the existing architecture, without compromising the communication performance. We present a procedure for architecture-aware code design that involves feature identification, code construction, and verification. Details of the procedure are presented for the case in which the SDR platform is equipped with a multi-stage interconnection network (MIN). By analyzing the characteristics of the MIN, simple yet explicit constraints are derived and used in the code construction step. The resulting LDPC code can not only be mapped very efficiently onto the SDR platform but also has very good bit error rate (BER) performance.
ISBN: 0780385047 (print)
Modular adders are fundamental arithmetic components that are employed in Residue Number System (RNS) based digital signal processing (DSP) systems. They are widely used in modular multipliers, residue-to-binary converters, and in implementing other arithmetic operations such as scaling. In addition, increasing operating frequencies as well as a growing demand for portable electronics have brought power reduction to the forefront of modern-day design methodologies. Thus, the design of power-efficient modular adders is of great significance if RNS circuits are to be utilized in future DSP systems. In this paper, we propose a new modular adder that is based on the ELM addition algorithm. VLSI implementations using 0.13 µm standard-cell technology show that the proposed architecture exhibits not only power efficiency but also delay × area efficiency when compared to existing modular adder designs in the literature.
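For readers unfamiliar with modular adders, the sketch below shows one common generic scheme, not the ELM-based design proposed in the paper: two sums are formed, a + b and a + b - m (the latter by adding the two's complement of m in n bits), and the carry-out of the second sum selects the result. The function name `mod_add` and the parameter `n` (operand width in bits) are assumptions made for illustration.

```python
def mod_add(a, b, m, n):
    """Return (a + b) mod m for 0 <= a, b < m <= 2**n using a two-sum, select scheme."""
    assert 0 <= a < m and 0 <= b < m and m <= (1 << n)
    s = a + b                        # first adder: plain sum
    t = a + b + ((1 << n) - m)       # second adder: sum plus two's complement of m
    carry = t >> n                   # carry-out is 1 exactly when a + b >= m
    return t & ((1 << n) - 1) if carry else s
```

In hardware the two sums are typically computed in parallel and a multiplexer performs the selection; the sketch only reproduces the arithmetic.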
ISBN: 0780393333 (print)
The Koetter-Vardy algorithm is an algebraic soft-decision decoding algorithm for Reed-Solomon codes. Software implementations of the Koetter-Vardy algorithm are considered as part of a redecoding architecture that augments a hardware hard-decision decoder with soft-decision decoding software on an embedded processor. In this paper, we investigate the implementation of the interpolation step of the Koetter-Vardy algorithm on SIMD processor architectures. A parallelization of the algorithm is given using the Kth-order Horner's rule for parallel polynomial evaluation. The SIMD algorithm runs 2.5 to 4 times faster than a serial implementation on a DSP processor. To gain further speedup, we propose a merged-SIMD architecture that calculates the Hasse derivative in parallel with the polynomial updates.
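The Kth-order Horner's rule mentioned above can be sketched as follows. The polynomial is split into K interleaved sub-polynomials in x^K, each of which is evaluated independently by ordinary Horner's rule (these are the lanes that a SIMD unit could process in parallel), and the partial results are then combined. The sketch assumes ordinary integer arithmetic rather than the finite-field arithmetic a Reed-Solomon decoder would use, and the function names are hypothetical.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given coefficients in ascending order of degree."""
    acc = 0
    for a in reversed(coeffs):
        acc = acc * x + a
    return acc


def horner_kth_order(coeffs, x, k):
    """Evaluate p(x) = sum_j x**j * P_j(x**k), where P_j uses coefficients j, j+k, j+2k, ..."""
    xk = x ** k
    lanes = [horner(coeffs[j::k], xk) for j in range(k)]   # k independent evaluations
    return sum(lane * x ** j for j, lane in enumerate(lanes))
```

Both functions return the same value for any `k >= 1`; only the dependency structure of the computation differs.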
ISBN: 9781424403820 (print)
In this paper, we propose a methodology that takes bit-width into account to optimize the area and power consumption of hardware architectures produced by high-level synthesis tools. The methodology is based on a bit-width analysis that uses information supplied by the designer. This bit-width information is propagated through a graph that models the application. The resulting annotated graph enables datapath structure optimizations for high-level synthesis without dramatically increasing its processing time (complexity: O(n)). The methodology was applied to several signal and image processing applications. Our results demonstrate the effectiveness of the approach. It can also be applied in a more general design context for sizing the data of an application, given the input data formats and their potential correlation.
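A single forward pass over a dataflow graph illustrates how bit-width information might be propagated in linear time. This is a sketch under stated assumptions, not the paper's method: operands are unsigned, the graph is already in topological order, and the width rules are the usual worst-case ones (an addition needs one bit more than its wider operand, a multiplication needs the sum of the operand widths). The function name and the node tuple layout are hypothetical.

```python
def propagate_widths(nodes, input_widths):
    """Annotate each node with a worst-case unsigned bit-width.

    nodes: list of (name, op, (lhs, rhs)) tuples in topological order.
    input_widths: dict mapping input names to designer-supplied widths.
    """
    widths = dict(input_widths)
    for name, op, (lhs, rhs) in nodes:      # each node is visited once: O(n)
        a, b = widths[lhs], widths[rhs]
        if op == "add":
            widths[name] = max(a, b) + 1
        elif op == "mul":
            widths[name] = a + b
        else:
            raise ValueError(f"unsupported operation: {op}")
    return widths
```

For example, with an 8-bit input `x`, a 4-bit coefficient `c`, and a 10-bit input `y`, the graph `p = x * c; s = p + y` yields a 12-bit `p` and a 13-bit `s`.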