With the growing importance of energy efficiency, heterogeneous computing has become more popular in recent years, and field-programmable gate array (FPGA) devices are no exception: offering highly parallel execution at low power, they are an option worth considering for many tasks and are increasingly available to users through cloud computing services. While FPGA devices offer a lower barrier to entry to logic design than integrated circuit design, they are still difficult to design for compared with instruction set processors. While tools exist for translating a high-level language description of an algorithm into an FPGA design, they still require expertise most software designers do not have. One way around this problem is building soft processors onto the programmable logic as a programmability layer for software designers. Transport-triggered architectures (TTAs) are a promising avenue of research in this area for their simple implementation and inherently parallel programming model. This thesis presents FPGA-centric optimizations for transport-triggered architectures and an evaluation of these optimizations through synthesis. Together, these optimizations yielded a reduction of between 20 and 30 percent in logic utilization in the tested architectures while having little effect on the clock frequency. Additionally, the scalability of TTAs for more parallel workloads is evaluated with various configurations of a TTA vector processor as well as a convolutional neural network processor case study.
This study presents the detailed design and analysis of a new memristor-based look-up table (LUT) for field-programmable gate arrays (FPGAs). The proposed memory utilises memristors as storage elements with N-type metal-oxide-semiconductor transistors for row access. New WRITE and READ operations are proposed; the proposed LUT requires no additional circuit to handle the WRITE 1 (0) operation. The proposed method requires a RESTORE pulse only for the READ 0 operation. Moreover, the WRITE operation of the proposed method requires three power lines and a RESTORE pulse only for the READ 0 operation, thus saving 25% READ time when compared with previous methods. In addition, the proposed method does not require the REFRESH pulse and does not dissipate power during stand-by mode. Extensive simulation results are presented with respect to different operational features such as normalised state parameter, pulse width and LUT size. In addition to a circuit-level evaluation, the proposed LUT scheme has also been assessed with respect to FPGA implementation. Simulation results using sequential benchmarks mapped on Spartan 4 and 5 FPGAs show that the proposed non-volatile LUT outperforms existing static random access memory cell-based LUTs in terms of performance.
This paper presents a stochastic logic-based method for quantitative risk assessment using fault tree analysis (FTA) that can take into account both objective and subjective uncertainties. In the proposed method, each fault tree gate is translated to its corresponding stochastic logic template and is then implemented on a field-programmable gate array (FPGA). Because the analysis does not utilize any transformation methods, its results are more accurate than those of methods based on transformation from possibility to probability distributions or vice versa. Experimental results for a benchmark fault tree show that this method reduces analysis time compared to the conventional hybrid uncertainty analysis method and transformation methods. The efficiency of the proposed method is demonstrated by implementation in a real steel structure project. The quantitative risk assessment is performed for incomplete penetration, one of the defects encountered in the arc welding process, and its results are compared with transformation methods. The comparison shows that the proposed hybrid uncertainty analysis method is also more accurate than the transformation-based approaches. Copyright (c) 2016 John Wiley & Sons, Ltd.
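The general idea behind stochastic logic gates can be illustrated in software. The following is a minimal sketch, not the paper's method: it assumes independent basic events with crisp probabilities (so it does not model subjective uncertainty), and the tree shape `OR(AND(A, B), C)`, the function names, and the probabilities are all hypothetical. Each probability is encoded as a random bit stream, an AND gate then multiplies probabilities, and an OR gate combines them as 1 - (1 - p)(1 - q).

```python
import random


def bitstream(p, length, rng):
    """Encode probability p as a random bit stream whose mean is about p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def and_gate(a, b):
    """Bitwise AND: for independent streams the output mean is p_a * p_b."""
    return [x & y for x, y in zip(a, b)]


def or_gate(a, b):
    """Bitwise OR: for independent streams the mean is 1 - (1-p_a)(1-p_b)."""
    return [x | y for x, y in zip(a, b)]


def estimate(stream):
    """Decode a bit stream back into a probability estimate."""
    return sum(stream) / len(stream)


def top_event_probability(p_a, p_b, p_c, length=200_000, seed=0):
    """Estimate P(top) for the hypothetical tree top = OR(AND(A, B), C)."""
    rng = random.Random(seed)
    a = bitstream(p_a, length, rng)
    b = bitstream(p_b, length, rng)
    c = bitstream(p_c, length, rng)
    return estimate(or_gate(and_gate(a, b), c))


if __name__ == "__main__":
    approx = top_event_probability(0.1, 0.2, 0.05)
    exact = 1 - (1 - 0.1 * 0.2) * (1 - 0.05)
    print(f"stochastic estimate: {approx:.4f}, closed form: {exact:.4f}")
```

Longer bit streams trade run time for accuracy, which is one reason such gate networks are attractive for parallel hardware.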
ISBN: 9781538622889 (print)
Data-intensive computations in data centers are performed by an increasingly popular programming framework named MapReduce. An advantage of this framework is that the algorithm is divided into simple tasks that enable the exploitation of its parallelism. A great variety of processing element architectures, such as shared memory systems, clusters of computers and heterogeneous systems, have accommodated applications of the MapReduce framework in order to enhance its robustness and efficiency. Field-programmable gate arrays (FPGAs) are known for implementing algorithms while providing higher parallelism compared to their software counterparts. The mapping of a MapReduce framework on specialized hardware is proposed here. The proposed FPGA architecture uses a pipeline principle in order to alleviate the need for large memory resources. The proposed system was analyzed by implementing a basic application, namely matrix multiplication.
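To make the map/reduce decomposition of matrix multiplication concrete, here is a minimal software sketch. It is only an illustration of the programming model, not the proposed FPGA architecture, and the function names are hypothetical. The map phase emits one partial product per `(i, j)` output cell, and the reduce phase sums the values that share a key. Because the map phase is a generator, pairs are consumed as they are produced rather than stored in full, which loosely mirrors the idea of pipelining to limit memory use.

```python
from collections import defaultdict


def map_phase(a, b):
    """Emit ((i, j), a[i][k] * b[k][j]) for every partial product."""
    for i, row in enumerate(a):
        for k, a_ik in enumerate(row):
            for j, b_kj in enumerate(b[k]):
                yield (i, j), a_ik * b_kj


def reduce_phase(pairs):
    """Sum all values that share the same (i, j) key."""
    sums = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
    return dict(sums)


def matmul_mapreduce(a, b):
    """Multiply matrices a (n x m) and b (m x p) given as lists of lists."""
    sums = reduce_phase(map_phase(a, b))
    rows, cols = len(a), len(b[0])
    return [[sums.get((i, j), 0) for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    print(matmul_mapreduce([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```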
ISBN: 9781509012527 (print)
The Fast Fourier Transform (FFT) is an important algorithm in the fields of science and engineering, where it is used in diverse areas such as communications, signal processing, instrumentation, image and video analysis, etc. The algorithm is essentially a fast implementation of the Discrete Fourier Transform, reducing the asymptotic complexity from the latter's O(n^2) to O(n log n). In this paper, the radix-2 decimation-in-time FFT algorithm is implemented and investigated on field-programmable gate arrays (FPGAs) and Graphics Processing Units (GPUs). The hardware description language Verilog HDL (VHDL) is used for the FPGA, while the Open Computing Language (OpenCL) is used for the GPU. Both implementations are compared with various pre-installed IP-core modules of Xilinx and MATLAB for complex input of various sample sizes. From the results, it is concluded that the FPGA shows faster performance for a large number of FFTs of small sizes. On the other hand, the GPU is more promising for a large number of FFTs of large sizes. The results also confirm that the FPGA-based implementation is faster than the built-in IP-core modules of Xilinx. A hardware synthesis for FPGA is also provided.
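As a reference point for the algorithm named above, the following is a minimal software sketch of the textbook recursive radix-2 decimation-in-time FFT. It is not the paper's FPGA or GPU implementation; the function names are hypothetical, and the input length is assumed to be a power of two. A naive O(n^2) DFT is included only for comparison.

```python
import cmath


def fft_radix2_dit(x):
    """Recursive radix-2 decimation-in-time FFT.

    Assumes len(x) is a power of two; raises ValueError otherwise.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2_dit(x[0::2])  # FFT of even-indexed samples
    odd = fft_radix2_dit(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the half-size results with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """Direct O(n^2) evaluation of the DFT definition, for comparison."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    samples = [1, 2, 3, 4, 0, 0, 0, 0]
    fast, slow = fft_radix2_dit(samples), dft_naive(samples)
    print(max(abs(f - s) for f, s in zip(fast, slow)))
```

Each recursion level performs n/2 butterflies and there are log2(n) levels, which is where the O(n log n) cost comes from.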
ISBN: 9781424492701 (print)
Due to the inherent time-varying characteristics of physiological systems, most biomedical signals (BSs) are expected to be non-stationary. Therefore, any appropriate analysis method for dealing with BSs should exhibit adjustable time-frequency (TF) resolution. The wavelet transform (WT) provides a TF representation of signals that has good frequency resolution at low frequencies and good time resolution at high frequencies, resulting in an optimized TF resolution. The discrete wavelet transform (DWT), which is used in various medical signal processing applications such as denoising and feature extraction, is a fast, discretized algorithm for the classical WT. However, the DWT has some very important drawbacks such as aliasing, lack of directionality, and shift-variance. To overcome these drawbacks, an improved discrete transform named the Dual Tree Complex Wavelet Transform (DTCWT) can be used. Nowadays, with the improvements in embedded system technology, portable real-time medical devices are frequently used for rapid diagnosis in patients. In this study, a novel hardware architecture is proposed for implementing the DTCWT algorithm in FPGAs, where it can be used as a real-time feature extraction or denoising operator for biomedical signals. In the proposed architecture, the DTCWT is implemented with only one adder and one multiplier. Additionally, considering the multi-channel outputs of biomedical data acquisition systems, this architecture is capable of running N channels in parallel.
ISBN: 9781509061143 (print)
High-precision sensor networks and localization systems require precise time and frequency synchronization. In this paper, we present a novel high-precision frequency synchronization approach for wireless network devices. It adapts the local oscillator frequency of a receiver to the frequency of a transmitter and can be integrated into existing wireless communication systems. The measurement of frequency differences as well as the frequency adjustment is realized in field programmable gate arrays (FPGAs). Using a 60 GHz wireless experimental setup, the receiver clock is aligned to the transmitter clock with a precision of 37 picoseconds.
ISBN: 9781467399869 (print)
We present an open source digital camera implemented on a field-programmable gate array (FPGA). The camera functionality is completely described in VHDL and tested on the DE2-115 educational FPGA board. Some of the current features of the camera include video mode at 30 fps, storage of taken snapshots into SDRAM memories, and grayscale and edge detection filters. The main contributions of this project include 1) the actual system-level design of the camera, tested and verified on an actual FPGA chip, and 2) the public release of the entire implementation including source code and documentation. While the proposed camera is far from being able to compete with commercial offerings, it can serve as a framework to test new research ideas related to digital camera systems, image processing, computer vision, etc., as well as an educational platform for advanced digital design with VHDL and FPGAs.
The Fast Fourier Transform (FFT) is an important algorithm in the fields of science and engineering, where it is used in diverse areas such as communications, signal processing, instrumentation, image and video analysis, etc. The algorithm is essentially a fast implementation of the Discrete Fourier Transform, reducing the asymptotic complexity from the latter's O(n^2) to O(n log n). In this paper, the radix-2 decimation-in-time FFT algorithm is implemented and investigated on field-programmable gate arrays (FPGAs) and Graphics Processing Units (GPUs). The hardware description language Verilog HDL (VHDL) is used for the FPGA, while the Open Computing Language (OpenCL) is used for the GPU. Both implementations are compared with various pre-installed IP-core modules of Xilinx and MATLAB for complex input of various sample sizes. From the results, it is concluded that the FPGA shows faster performance for a large number of FFTs of small sizes. On the other hand, the GPU is more promising for a large number of FFTs of large sizes. The results also confirm that the FPGA-based implementation is faster than the built-in IP-core modules of Xilinx. A hardware synthesis for FPGA is also provided.
This paper describes how modern field-programmable gate array (FPGA) technology can be used to build practical and efficient multiplicative finite impulse response (MFIR) filters with low-pass, high-pass, band-pass and band-stop characteristics. This paper explains how MFIR structures can be built with or without linear phase characteristics and implemented efficiently on modern FPGA architectures using fixed-point arithmetic without incurring the stability problems or limit cycles that commonly occur when using equivalent infinite impulse response structures. These properties are particularly important for applications such as tunable resonators, narrow-band rejectors and linear phase filters, which have demanding, narrow transition band requirements. The results presented in this paper indicate that MFIR filters are, for some applications, a viable alternative to existing filter structures when implemented on an FPGA.