ISBN: 9781665426053 (print)
Modular multiplication is the fundamental operation in Elliptic Curve Cryptography (ECC), and a multitude of hardware implementations have been developed so far. In this paper, a series of modifications to a high-performance radix-2 interleaved modular multiplication architecture is proposed. The design was implemented on a Virtex-7 FPGA for five prime fields recommended for ECC by the National Institute of Standards and Technology (NIST), showing a significant improvement in area-time efficiency compared with the original architecture.
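The baseline that this paper modifies is the radix-2 interleaved multiplier. Below is a minimal Python sketch of the textbook MSB-first version, assuming both operands are already reduced modulo p; it is not the authors' modified architecture. Each iteration doubles the accumulator, conditionally adds B when the current bit of A is set, and keeps the value below p with at most one subtraction after each step, which is roughly what a bit-serial datapath does with an adder and a comparator per cycle. The function name `interleaved_modmul` is hypothetical, and the NIST P-192 prime is used only as an example field.

```python
def interleaved_modmul(a: int, b: int, p: int) -> int:
    """Return a * b mod p using radix-2 interleaved (MSB-first) multiplication.

    Assumes 0 <= a, b < p. One bit of `a` is consumed per iteration, mirroring
    a bit-serial datapath with a fixed latency of p.bit_length() iterations.
    """
    if not (0 <= a < p and 0 <= b < p):
        raise ValueError("operands must already be reduced modulo p")
    acc = 0  # invariant: 0 <= acc < p at the start of every iteration
    for i in reversed(range(p.bit_length())):
        acc <<= 1              # doubling: acc < 2p
        if acc >= p:
            acc -= p           # one conditional subtraction restores acc < p
        if (a >> i) & 1:
            acc += b           # conditional addition: acc < 2p
            if acc >= p:
                acc -= p       # second conditional subtraction
    return acc


# NIST P-192 prime, p = 2^192 - 2^64 - 1 (example field only)
P192 = 2**192 - 2**64 - 1
print(interleaved_modmul(P192 - 1, P192 - 2, P192))  # (-1)(-2) mod p -> 2
```

Optimized FPGA designs may restructure this loop (for example with carry-save arithmetic or precomputed multiples of p) to shorten the critical path; those refinements are outside this sketch.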
ISBN: 9781624106095 (digital and print)
This investigation examines two strategies for reducing the computation time required by the aircraft design suite SUAVE: CPU parallelization and GPU acceleration. Results are shown for between 1 and 24 simultaneous, asynchronous computations, and for JIT compilation of code for GPU execution via a JAX-XLA-CUDA stack. CPU parallelization is shown to degrade performance, whereas GPU acceleration improves it by five orders of magnitude.
ISBN: 9781728195018 (print)
One of the most popular Brain-Computer Interface (BCI) paradigms is the classification of motor imagery (MI) tasks using electroencephalography (EEG) signals. Recent works suggest using Convolutional Neural Networks (CNNs) both to extract EEG features and to classify them in a single compact solution. Since BCIs are meant to run on embedded hardware, compact models and data reduction strategies are necessary. This work presents an EEGNet-based model that achieves classification accuracies of 83.15%, 75.74%, and 65.75% on 2-, 3-, and 4-class MI tasks in global validation on the Physionet Motor Movement/Imagery dataset, results similar to the state of the art. Taking advantage of the model's lower complexity, a preliminary FPGA processor design using fixed-point datatypes is introduced to evaluate resource consumption and latency on a low-spec system-on-chip.
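Fixed-point datatypes replace floating-point arithmetic with scaled integers. The following is a minimal sketch, assuming a signed 16-bit Q format with a chosen number of fractional bits (the abstract does not state the word length or format used): inputs and weights are quantized with saturation, multiplied as integers, accumulated in a wide register, and rescaled once at the end. The helper names `to_fixed`, `to_float`, and `fixed_dot` are hypothetical.

```python
def to_fixed(x: float, frac_bits: int, word_bits: int = 16) -> int:
    """Quantize x to a signed fixed-point integer with saturation.

    Python's round() is round-half-to-even; hardware often uses
    round-half-up or truncation instead.
    """
    q = round(x * (1 << frac_bits))
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))


def to_float(q: int, frac_bits: int) -> float:
    """Interpret a fixed-point integer as a real value."""
    return q / (1 << frac_bits)


def fixed_dot(xs: list[float], ws: list[float], frac_bits: int) -> int:
    """Dot product in fixed point; the result has `frac_bits` fractional bits."""
    acc = 0  # wide accumulator (Python ints never overflow; hardware must size it)
    for x, w in zip(xs, ws):
        # each product carries 2 * frac_bits fractional bits
        acc += to_fixed(x, frac_bits) * to_fixed(w, frac_bits)
    return acc >> frac_bits  # arithmetic (floor) shift rescales to frac_bits


xs, ws = [0.5, -0.25, 0.75], [0.1, 0.2, -0.3]
q = fixed_dot(xs, ws, frac_bits=12)
print(q, to_float(q, 12))  # -> -922 -0.22509765625 (float reference: -0.225)
```

Rescaling once after accumulation, rather than after every product, keeps the full product precision until the end, at the cost of a wider accumulator.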
ISBN: 9783030880040; 9783030880033 (print)
FPGA accelerators for CNN-based object detection have been attracting widespread attention in computer vision. For most existing FPGA accelerators, inference accuracy and speed suffer from low power efficiency and low performance density. To address this problem, we propose a software and hardware co-designed FPGA accelerator for accurate and fast object detection with high power efficiency and performance density. To develop the FPGA accelerator on CPU+FPGA heterogeneous platforms, a resource-sensitive and energy-aware FPGA accelerator framework is designed. In hardware, a hardware-sensitive neural network quantization called Dynamic Fixed-point Data Quantization (DFDQ) is proposed to improve power efficiency. In software, an algorithm-level convolution (CONV) optimization scheme is further proposed to improve performance density by parallelizing the block execution of CONV cores. To validate the proposed FPGA accelerator, a Zynq FPGA is used to build an acceleration platform for the You Only Look Once (YOLO) network. Results demonstrate that the proposed FPGA accelerator outperforms state-of-the-art methods in power efficiency and performance density. In addition, object detection speed is increased by up to 16.5 times with less than 1.5% accuracy degradation.
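Dynamic fixed point, in general, keeps one word length for all data but lets each layer (or tensor) choose its own split between integer and fractional bits according to its value range. The sketch below illustrates only that general idea; it is not the authors' DFDQ scheme, whose details the abstract does not give. It assumes 8-bit signed words and picks the largest fractional length that still represents the tensor's maximum magnitude. The functions `choose_frac_bits` and `quantize` are hypothetical.

```python
import math


def choose_frac_bits(values: list[float], word_bits: int = 8) -> int:
    """Largest fractional length such that max|v| fits in a signed word.

    Values within half an LSB of the top of the range may still
    saturate by one LSB after rounding.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return word_bits - 1
    # Integer bits (excluding sign) needed so that 2**int_bits > max_abs.
    # This can be negative for small tensors, which yields extra fractional bits.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return word_bits - 1 - int_bits


def quantize(values: list[float], frac_bits: int, word_bits: int = 8) -> list[int]:
    """Round to the chosen fixed-point grid and saturate to the word range."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    scale = 2.0 ** frac_bits  # also valid for negative frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]


# Two "layers" with very different ranges get different fractional lengths.
weights = [0.3, -0.12, 0.05]
activations = [3.2, -1.5, 0.7]
fw, fa = choose_frac_bits(weights), choose_frac_bits(activations)
print(fw, quantize(weights, fw))      # -> 8 [77, -31, 13]
print(fa, quantize(activations, fa))  # -> 5 [102, -48, 22]
```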
ISBN: 9781728128689 (print)
This study presents an application-specific integrated circuit (ASIC) diagram of Artificial Neural Networks (ANNs), with a module-based design for 32-bit floating-point operations on a field programmable gate array (FPGA). The aim is to move ANN training operations from software to hardware and to perform the calculations in the IEEE 754 single-precision floating-point format. The proposed architecture is designed with a combination of Verilog and Very High Speed Integrated Circuits Hardware Description Language (VHDL). A sigmoidal nonlinear function was used as the activation function during training, and a look-up table (LUT) was created to improve the processing efficiency of the designed circuit. The natural parallelism of the operations implemented on the FPGA was exploited: independent operations are performed in the same clock cycle, which accelerates the system. The results obtained from the FPGA were compared with those obtained from MATLAB R2016b.
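A look-up table trades a costly exponential for a memory read. The sketch below assumes a 256-entry table over [-8, 8), sampled at bin midpoints and stored as IEEE 754 single-precision values (emulated with Python's `struct` module), with saturation outside the range; the abstract does not give the authors' table size, input range, or indexing scheme. The names `SIGMOID_LUT` and `sigmoid_lut` are hypothetical.

```python
import math
import struct

X_MIN, X_MAX, ENTRIES = -8.0, 8.0, 256  # assumed input range and table size
STEP = (X_MAX - X_MIN) / ENTRIES


def to_float32(v: float) -> float:
    """Round a Python float (binary64) to IEEE 754 binary32 and back."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Each entry holds the sigmoid evaluated at the midpoint of its input bin.
SIGMOID_LUT = [to_float32(sigmoid(X_MIN + (i + 0.5) * STEP)) for i in range(ENTRIES)]


def sigmoid_lut(x: float) -> float:
    """Approximate sigmoid(x) with one table read; saturates outside [X_MIN, X_MAX)."""
    idx = int((x - X_MIN) // STEP)
    idx = max(0, min(ENTRIES - 1, idx))
    return SIGMOID_LUT[idx]


worst = max(abs(sigmoid_lut(x / 100) - sigmoid(x / 100)) for x in range(-1000, 1001))
print(f"max abs error on [-10, 10]: {worst:.4f}")
```

With midpoint sampling the in-range error is bounded by half a bin width times the sigmoid's maximum slope of 0.25, so a larger table or linear interpolation between entries would be the usual ways to tighten it.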
This paper provides a design framework for the digital controller of a power conditioning unit, comprising a rectifier and a boost converter, for a piezoelectric energy harvesting system. Complete design of digitally controlle...
Digital Pulse Processing offers multiple advantages over traditional analogue processing chains. Its disadvantage is that it produces gigabytes of data every second, and storing and processing such data rates in real time remains a challenge. Analogue solutions do not suffer from this issue; however, they offer limited flexibility and modifiability. This work highlights the advantages of Digital Pulse Processing over Analogue Pulse Processing and describes a successful implementation of a digital pulse detection and acquisition system based on field programmable gate arrays. The system processes pulses generated by a photomultiplier tube nuclear detector. Incoming signals are sampled at 1 GS/s; to enable full acquisition resolution, throughput is reduced with digital detection filters and leading-edge triggering, or with a derivative zero-crossing detector. Three different fast timing filters are adapted to high-speed real-time acquisition and compared in a simulated scenario. A trapezoidal filter is implemented in firmware alongside the detection channel for pulse height analysis. Thanks to the use of reprogrammable devices, the system remains versatile and can be adapted remotely to different needs with no additional hardware costs.
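For pulse height analysis, a trapezoidal shaper turns a step-like edge into a trapezoid whose flat top equals the step height. The following is a minimal sketch, assuming step-like (non-decaying) input, so the pole-zero correction normally needed for exponentially decaying pulses is omitted: the output is the difference of two k-sample moving sums separated by k + m samples, divided by k, which gives a rise of k samples and a flat top of m + 1 samples. The function `trapezoidal_filter` and its parameters `k` and `m` are hypothetical and do not describe the authors' firmware.

```python
def trapezoidal_filter(x: list[float], k: int, m: int) -> list[float]:
    """Trapezoidal shaper for step-like input: k-sample rise, (m + 1)-sample flat top.

    y[n] = (S(n) - S(n - k - m)) / k, where S(n) is the sum of the k samples
    ending at n; samples before the start of the record are treated as zero.
    """
    prefix = [0.0]  # prefix[i] = x[0] + ... + x[i - 1]
    for v in x:
        prefix.append(prefix[-1] + v)

    def window_sum(end: int) -> float:
        if end < 0:
            return 0.0
        return prefix[end + 1] - prefix[max(0, end - k + 1)]

    return [(window_sum(n) - window_sum(n - k - m)) / k for n in range(len(x))]


# A step of height 5 arriving at sample 10, shaped with k = 8, m = 4.
signal = [0.0] * 10 + [5.0] * 50
shaped = trapezoidal_filter(signal, k=8, m=4)
peak = max(shaped)
print(peak)                                # pulse height estimate -> 5.0
print(sum(1 for v in shaped if v == peak))  # flat-top length in samples -> 5
```

Because the two moving sums cancel for a constant input, the shaper also removes a steady baseline once the record is longer than 2k + m samples; in firmware, the prefix sums would typically be replaced by running accumulators with delay lines.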
Modern deep learning schemes have shown human-level performance in medical science. However, implementing deep learning algorithms on dedicated hardware remains challenging because modern algorithms and neuronal activation functions are generally not hardware-friendly and are resource-intensive. Researchers have recently proposed hardware-friendly activation functions that yield high throughput and high accuracy at the same time. In this context, we propose a hardware-based neural network that predicts the presence of cancer in humans with 98.23% accuracy, using the cost-efficient, highly accurate activation functions Sqish and LogSQNL. Owing to its inherently parallel components, the system classifies a given sample in a single clock cycle, i.e., 15.75 nanoseconds. Although the system is dedicated to cancer diagnosis, it can also predict the presence of many other diseases, such as heart disease, because it is reconfigurable and can be programmed to classify any sample into one of two classes. The proposed hardware system requires about 983 slice registers, 2,655 slice lookup tables, and only 1.1 kilobits of on-chip memory. It can classify about 63.5 million cancer samples per second and perform about 20 giga-operations per second. The proposed system is about 5-16 times cheaper and at least four times faster than other dedicated hardware systems that use neural networks for classification tasks.
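The throughput figures follow from the single-cycle latency. A quick check, using only the numbers quoted in the abstract above and assuming one new sample enters the pipeline every clock cycle: a 15.75 ns period corresponds to about 63.5 MHz and hence about 63.5 million classifications per second, and 20 giga-operations per second at that rate implies roughly 315 operations per classification. That last figure is derived here under the stated assumption, not reported by the authors.

```python
CLOCK_PERIOD_NS = 15.75  # one classification per clock cycle (from the abstract)
OPS_PER_SECOND = 20e9    # "about 20 giga-operations per second" (from the abstract)

samples_per_second = 1e9 / CLOCK_PERIOD_NS            # about 63.5 million
ops_per_sample = OPS_PER_SECOND / samples_per_second  # about 315, derived

print(f"{samples_per_second / 1e6:.1f} M samples/s, {ops_per_sample:.0f} ops/sample")
# -> 63.5 M samples/s, 315 ops/sample
```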
ISBN: 9781510650060; 9781510650053 (print)
Clouds play an important role in weather and climate-related investigations. In remote sensing, however, they often degrade image quality and waste storage and bandwidth resources, so detecting clouds is critical for reducing payload cost. In this paper, we propose the design of a real-time cloud detection camera for small satellite platforms based on a field programmable gate array (FPGA). Two MicroBlaze soft cores are embedded in the FPGA to accomplish the task without assistance from other chips. This makes the system highly programmable and integrated, and also lighter. We implemented the system on a Xilinx Virtex-4 FPGA. The test results show that the signal-to-noise ratio (SNR) is 128.1 at 80% of saturated exposure. We selected the Arabian Peninsula-Pakistan-West India area to evaluate cloud detection accuracy. Compared with moderate resolution imaging spectroradiometer (MODIS) cloud mask products, the false alarm rate (FAR) is less than 3%. The application of the proposed approach in a simulation and engineering system indicates its effectiveness and practicability.
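Comparing an on-board cloud mask with a MODIS reference mask reduces to counting per-pixel agreements. The abstract does not define its false alarm rate, so the sketch below assumes the common detection-theory definition FAR = FP / (FP + TN), i.e. the fraction of reference-clear pixels flagged as cloud; some papers instead use the false alarm ratio FP / (FP + TP), which the function also returns. Masks are flattened lists of booleans for simplicity, and the function name `mask_scores` is hypothetical.

```python
def mask_scores(predicted: list[bool], reference: list[bool]) -> dict[str, float]:
    """Per-pixel comparison of a cloud mask against a reference (e.g. MODIS) mask."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of pixels")
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)          # cloud in both
    fp = sum(p and not r for p, r in pairs)      # cloud flagged, reference clear
    tn = sum(not p and not r for p, r in pairs)  # clear in both
    return {
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,   # FP / (FP + TN)
        "false_alarm_ratio": fp / (fp + tp) if fp + tp else 0.0,  # FP / (FP + TP)
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


pred = [True, True, False, False, True, False, False, False]
ref = [True, False, False, False, True, False, False, True]
print(mask_scores(pred, ref))
```

The two definitions can differ considerably when cloud cover is sparse, so stating which one is used matters when comparing reported figures.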
ISBN: 9781665448642 (print)
This paper presents a new approach for the modeling and real-time simulation of an embedded hybrid power source composed of a fuel cell (FC) and a battery. The proposed model is based on a state-space-like representation of the system equations obtained from modified-augmented nodal analysis. A systematic formulation of the system equations is presented, and the solution of the nonlinear equations is discussed. The proposed model was implemented on an entry-level field programmable gate array (FPGA) and demonstrates sub-microsecond simulation time-step capability. The real-time solution is obtained by precomputing the system equations for all switch-state combinations and using the backward Euler integration scheme to solve the differential equations. A fixed-point number representation is used for speed and reduced configurable resource utilization. Our results show high fidelity of the proposed model over a wide range of simulation time-steps and switching frequencies.
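The precompute-then-step idea can be illustrated on a much smaller circuit than the FC/battery system described above. The sketch below assumes an idealized synchronous boost converter (complementary switches, no losses, no diode logic) whose state is the inductor current and the capacitor voltage. For each switch state s, backward Euler gives (I - hA_s) x[n+1] = x[n] + h b, so the inverse of (I - hA_s) and the constant input term are computed once per switch state, and the real-time loop only performs a 2x2 multiply-add. All component values and function names are illustrative assumptions, and floating point is used here where the paper uses fixed point.

```python
Matrix = tuple[tuple[float, float], tuple[float, float]]
Vector = tuple[float, float]


def precompute(vin: float, L: float, C: float, R: float, h: float) -> dict[int, tuple[Matrix, Vector]]:
    """For each switch state s, return M_s = (I - h*A_s)^-1 and c_s = M_s @ (h*vin/L, 0).

    State x = (iL, vC). s = 1: low-side switch on (inductor charges from vin);
    s = 0: high-side switch on (inductor feeds the output capacitor and load R).
    """
    table = {}
    for s in (0, 1):
        k = 1 - s                                  # 1 when the inductor feeds the output
        a11, a12 = 1.0, h * k / L                  # row 1 of I - h*A_s
        a21, a22 = -h * k / C, 1.0 + h / (R * C)   # row 2 of I - h*A_s
        det = a11 * a22 - a12 * a21
        m = ((a22 / det, -a12 / det), (-a21 / det, a11 / det))
        u = h * vin / L                            # source term enters the iL equation only
        table[s] = (m, (m[0][0] * u, m[1][0] * u))
    return table


def simulate(vin=12.0, L=100e-6, C=100e-6, R=10.0, fsw=20e3, duty=0.5, h=0.5e-6, t_end=30e-3):
    """Run the fixed-step loop with PWM switching; returns the trace and steps per period."""
    table = precompute(vin, L, C, R, h)
    steps_per_period = round(1.0 / (fsw * h))
    on_steps = round(duty * steps_per_period)
    i_l, v_c = 0.0, 0.0
    trace = []  # (switch state, iL, vC) after each step
    for n in range(round(t_end / h)):
        s = 1 if n % steps_per_period < on_steps else 0
        (m00, m01), (m10, m11) = table[s][0]
        c0, c1 = table[s][1]
        i_l, v_c = m00 * i_l + m01 * v_c + c0, m10 * i_l + m11 * v_c + c1
        trace.append((s, i_l, v_c))
    return trace, steps_per_period


trace, period = simulate()
v_off = [v for s, _, v in trace[-period:] if s == 0]
print(f"mean vC while s = 0: {sum(v_off) / len(v_off):.2f} V (ideal Vin/(1-D) = 24 V)")
```

Backward Euler preserves the inductor's volt-second balance over each period, so once the start-up transient has decayed the mean output voltage during the off-interval settles at Vin/(1-D), which makes a convenient sanity check for the precomputed matrices.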