ISBN: 0780393333 (print)
In this paper we present an automatic design generation methodology for heterogeneous architectures composed of processors, DSPs and FPGAs. This methodology is based on Adequation Algorithm Architecture, in which the application is represented by a control data flow graph and the architecture by an architecture graph. We focus on how to take into account the specificities of partially reconfigurable components during the adequation process and during design generation. We present a method that automatically generates the design for both the fixed and the partially reconfigurable parts of an FPGA. This method uses a prefetching technique to minimize the reconfiguration latency of runtime reconfiguration, and buffer merging to minimize the memory requirements of the generated design.
ISBN: 9781424403820 (print)
Multiplication over the field GF(2^m) is computationally expensive, not least because the operation involves modulo reduction. It is typical to fix the field and field representation to improve performance, but some applications need to operate over multiple fields. This work investigates the cost of this flexibility with application to elliptic curve cryptography (ECC), both analytically and empirically through FPGA implementation. A design methodology is presented for limiting the flexibility to a number of prescribed fields with the representation fixed for each, and the methodology is applied to the design of a bit-serial multiplier over GF(2^m). FPGA implementation results are given, and it is shown that the proposed approach offers a considerable practical advantage in the speed versus area trade-off. In fact, the flexible implementation incurred only a 12.3% area overhead compared to the fixed-field implementation, while still achieving the same speed.
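To make the idea of a bit-serial multiplier restricted to a few prescribed fields concrete, here is a minimal software sketch. It is not the authors' hardware design: the function names, the `FIELDS` table and the choice of fields and reduction polynomials are assumptions made for illustration only.

```python
# Software model of an MSB-first bit-serial multiplier over GF(2^m).
# Field elements are Python ints whose bit i is the coefficient of x^i.

# Hypothetical table of "prescribed" fields: m -> reduction polynomial
# (including the x^m term). These entries are illustrative assumptions.
FIELDS = {
    3: 0b1011,                                         # x^3 + x + 1
    8: 0x11B,                                          # x^8 + x^4 + x^3 + x + 1
    163: (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1,
}


def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Multiply a and b in GF(2^m), processing one bit of b per step."""
    acc = 0
    for i in range(m - 1, -1, -1):
        acc <<= 1                  # multiply the accumulator by x
        if (acc >> m) & 1:         # degree reached m: reduce modulo poly
            acc ^= poly
        if (b >> i) & 1:           # conditionally add a (XOR in GF(2))
            acc ^= a
    return acc


def flexible_mul(a: int, b: int, m: int) -> int:
    """Multiply in one of the prescribed fields, selected by m."""
    return gf2m_mul(a, b, m, FIELDS[m])
```

Each loop iteration corresponds to one clock cycle of a bit-serial datapath, so a multiplication takes m steps. In this model, supporting several fields only means selecting a different reduction polynomial and iteration count.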
ISBN: 0780364880 (print)
The aim of this paper is to present a new approach to creating low-power, high-performance DSPs using delay-insensitive asynchronous circuits. To attain this, we pipeline the asynchronous circuit at the logic-gate level in such a way that every functional unit can be pipelined into many stages, up to as many as half the number of gate levels. We also want to integrate this approach with the traditional method for synthesising synchronous circuits. To achieve this, we create a new library of gates that satisfy the constraints of asynchronous design. Finally, we present the results of building a pipelined multiplier with both the synchronous and the asynchronous approach.
ISBN: 9781424403820 (print)
Low-Density Parity-Check (LDPC) codes have been adopted in the physical layer of many communication systems because of their superior performance. The direct implementation of these codes on an existing software-defined radio (SDR) platform is likely to be inefficient. Our approach is to design the LDPC code to match the constraints imposed by the existing architecture, without compromising the communication performance. We present a procedure for architecture-aware code design that involves feature identification, code construction and verification. Details of the procedure are presented for the case in which the SDR platform is equipped with a multi-stage interconnection network (MIN). By analyzing the characteristics of the MIN, simple yet explicit constraints are derived and used in the code construction step. The resulting LDPC code not only can be mapped very efficiently onto the SDR platform but also has very good bit error rate (BER) performance.
The design and implementation of signal processing (SP) systems (DISPS) includes the development of software tools and methodologies to support the design of these complex systems. In its early days, DISPS focused on the hardware-based design of SP algorithms to meet real-time requirements. This topic was often called very large-scale integration (VLSI) SP. The emphasis has gradually shifted to include software and hardware/software codesign and implementation aspects. Programmable digital signal processors (DSPs) and embedded central processing units (CPUs) are now popular for real-time SP in applications such as mobile phones. Field-programmable gate array (FPGA)-based designs are also replacing application-specific integrated circuits (ASICs) in many applications.
ISBN: 0780393333 (print)
The Koetter-Vardy algorithm is an algebraic soft-decision decoding algorithm for Reed-Solomon codes. Software implementations of the Koetter-Vardy algorithm are considered as part of a redecoding architecture that augments a hardware hard-decision decoder with soft-decision decoding software on an embedded processor. In this paper we investigate the implementation of the interpolation step of the Koetter-Vardy algorithm on SIMD processor architectures. The algorithm is parallelized using the Kth-order Horner's rule for parallel polynomial evaluation. The SIMD algorithm runs 2.5 to 4 times faster than a serial implementation on a DSP. To gain further speedup we propose a merged-SIMD architecture that calculates the Hasse derivative in parallel with the polynomial updates.
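The Kth-order Horner's rule mentioned above can be illustrated with a short sketch. It splits a polynomial into K interleaved sub-polynomials in x^K, evaluates each with ordinary Horner steps (these K evaluations are independent and could occupy K SIMD lanes), and then combines the lane results. The function names are hypothetical, and plain integers stand in for the finite-field arithmetic a Reed-Solomon decoder would actually use.

```python
# k-th order Horner's rule:
#   p(x) = sum_{j=0}^{k-1} x^j * p_j(x^k),
# where p_j collects the coefficients c_j, c_{j+k}, c_{j+2k}, ...
# Integer arithmetic is an assumption made to keep the sketch short.

def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with the classic Horner recurrence."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def horner_kth_order(coeffs, x, k):
    """Evaluate the same polynomial using k independent Horner lanes."""
    y = x ** k
    # Each lane is independent of the others; in a SIMD implementation
    # the k lanes would advance in lock step, one multiply-add per step.
    lanes = [horner(coeffs[j::k], y) for j in range(k)]
    # Combine: lanes[0] + lanes[1]*x + ... + lanes[k-1]*x^(k-1).
    return horner(lanes, x)
```

With K lanes, the sequential chain of multiply-add steps shrinks from roughly n to roughly n/K plus a short combination step, which is where the parallel speedup comes from.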
ISBN: 9781538604465 (print)
The paper presents the results of design space explorations for the implementation of the Smith-Waterman (S-W) algorithm, which performs DNA and protein sequence alignment. Both the design exploration studies and the FPGA implementations are obtained by developing a dynamic dataflow program implementing the algorithm and by direct high-level synthesis (HLS) to FPGA HDL. The main feature of the obtained implementation is a low-latency, pipelinable multistage processing element (PE), which provides a substantial decrease in resource utilization and an increase in computation throughput compared to state-of-the-art solutions. The implementation is also fully scalable and can be efficiently reconfigured according to the DNA sequence sizes and the performance requirements of the system architecture. The implementation presented in the paper can efficiently scale up to 250 MHz, obtaining 14746 Alignments/s using a single S-W core with 4 PEs, and up to 31.8 MegaAlignments/min using 36 S-W cores on the same FPGA, for sequences of 160 x 100 nucleotides.
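For readers unfamiliar with the Smith-Waterman recurrence that such processing elements compute, a minimal software sketch follows. It is not the paper's dataflow program: the function name, the linear gap model and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration.

```python
# Smith-Waterman local alignment score with a linear gap penalty.
# Only the best score is returned (no traceback), using two rows of
# the dynamic-programming matrix at a time.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)      # row for the previous character of a
    best = 0
    for ca in a:
        curr = [0]                 # column 0 is always zero
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap     # gap in b
            left = curr[j - 1] + gap  # gap in a
            score = max(0, diag, up, left)  # 0 restarts the alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Each cell depends only on its left, upper and upper-left neighbours, so cells on the same anti-diagonal are independent. Hardware implementations commonly exploit this by assigning cells to an array of processing elements.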
This paper presents a systematic high-speed VLSI implementation of the discrete wavelet transform (DWT) based on hardware-efficient parallel FIR filter structures. High-speed 2-D DWT with computation time as low as N^2/12 can be easily achieved for an N x N image with a controlled increase in hardware cost. Compared with recently published 2-D DWT architectures with computation times of N^2/3 and 2N^2/3, the proposed designs can also save a large number of multipliers and/or storage elements. The approach can also be used to implement those 2-D DWTs traditionally suited to lifting-based or flipping-based designs, such as the (9,7) and (6,10) DWT. The throughput rate can be improved by a factor of 4 by the proposed approach, but the hardware cost increases by a factor of around 3. Furthermore, the proposed designs have very simple control signals, regular structures and 100% hardware utilization for continuous images.
ISBN: 9781424403820 (print)
This paper details the design of a new high-speed pipelined elliptic curve cryptography (ECC) application-specific instruction set processor (ASIP) using field-programmable gate array (FPGA) technology. A six-stage pipeline has been applied to the design, and pipeline stalls are avoided via instruction reordering and data forwarding. Three complex instructions are introduced to reduce the latency by reducing the overall number of instructions. The new processor shows improvements over previously reported designs in terms of throughput, latency and area. The higher clock frequencies and low latencies lead to the fastest point multiplication time reported in the literature. An FPGA implementation over GF(2^163) is shown, which achieves a point multiplication time of 36.77 microseconds at 77.01 MHz on a Xilinx Virtex-E device, over 50% faster than the best figure previously reported.
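As a quick reading aid, the two reported figures can be combined into an approximate clock-cycle count for one point multiplication. The cycle count itself is not stated in the abstract; it is derived here under the assumption that the whole operation runs at the quoted clock frequency, and the variable names are illustrative.

```python
# Derive an approximate cycle count from the figures in the abstract.
# Assumption: the entire point multiplication runs at the quoted clock.

point_mult_time_s = 36.77e-6   # 36.77 microseconds (from the abstract)
clock_hz = 77.01e6             # 77.01 MHz (from the abstract)

cycles = point_mult_time_s * clock_hz
print(f"approx. {cycles:.0f} clock cycles per point multiplication")
```

This works out to roughly 2,800 cycles for a scalar multiplication over a 163-bit field.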
ISBN: 0780393333 (print)
The purpose of this work is to show how important adequate generation of the excitation signal is to the performance of bandwidth extension algorithms for speech signals. Two previously proposed methods of obtaining the excitation signal are analyzed and, based on this analysis, a new method is proposed. The influence of each method on the quality of the reconstructed wideband speech signal is evaluated using quantitative parameters of speech quality.