The Pietra-Ricci index detector (PRIDe) has been recently proposed as one of the simplest techniques for centralized, data-fusion cooperative spectrum sensing, attaining robustness against time-varying signal and noise levels, constant false alarm rate, and high detection power. In this paper, we propose the design and implementation of the PRIDe detector, targeting field programmable gate array (FPGA) and application-specific integrated circuit (ASIC) solutions. Novel approaches are proposed for computing the PRIDe's test statistic, including the absolute value of complex quantities, the complex multiplier-accumulator, and the spectrum occupancy decision. The absolute value operation, which is critical to the PRIDe test statistic computational cost, applies the coordinate rotation digital computer (CORDIC) algorithm as a low latency and resource-efficient option. Register transfer level (RTL) and Monte Carlo simulations show that the resulting ultra-low latency PRIDe detector architectures attain no performance loss with respect to floating-point simulations. One of the two proposed ASIC design versions of the PRIDe sensor occupies 34.9% lower area compared to the most area-efficient sensor reported in literature, whereas the other one is $5.7\times$ faster than the fastest state-of-the-art sensor. In a nutshell, the proposed detector architecture delivers the highest area and power efficiencies, considering the scaled values of area-time product (ATP) and power-delay product (PDP) metrics, in comparison to implementations reported to date.
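As a rough illustration of why CORDIC suits the absolute-value step, the sketch below computes the magnitude of a complex sample in vectoring mode using only shift-and-add style updates. It is a floating-point model written for readability and is not the architecture proposed in the paper; the function name `cordic_magnitude` and the iteration count are assumptions made for this example.

```python
import math

def cordic_magnitude(re, im, iterations=16):
    """Approximate |re + j*im| with CORDIC in vectoring mode (illustrative model)."""
    # Fold the input into the first quadrant; the magnitude is unchanged.
    x, y = abs(re), abs(im)
    gain = 1.0
    for i in range(iterations):
        shift = 2.0 ** -i  # in hardware this is an arithmetic right shift by i
        if y >= 0:
            # Rotate clockwise to drive y toward zero.
            x, y = x + y * shift, y - x * shift
        else:
            # Rotate counter-clockwise to drive y toward zero.
            x, y = x - y * shift, y + x * shift
        # Each micro-rotation stretches the vector by sqrt(1 + 2^(-2i)).
        gain *= math.sqrt(1.0 + shift * shift)
    # Hardware would normally multiply by a precomputed constant 1/gain instead.
    return x / gain
```

In a fixed-point design the multiplications by `shift` become wired shifts and the gain correction becomes a single constant multiplication, which is where the low resource cost comes from.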
Recurrent neural networks (RNNs) are extensively employed to perform inference based on the temporal features of the input data. However, the computational workload and power consumption of RNN inference are prohibitively high in practice, which makes high-speed inference difficult in devices with tight limits on silicon resources and power supply. This paper presents an efficient inference processor for RNNs, named ROSETTA. ROSETTA supports multiple data formats, programmable for each vector operand, to achieve a wide range or high precision with a limited data size. ROSETTA consistently performs every vector operation on homogeneous processing units with a high utilization rate. Moreover, ROSETTA skips operations and reduces memory accesses by pruning the activation elements in a fine-grained manner, which yields high energy efficiency. Implemented in a low-cost 28 nm field-programmable gate array, ROSETTA exhibits a resource efficiency of 2.51 - 1.14 MOP/s/LUT and an energy efficiency of 434.01 - 113.29 GOP/s/W, while producing near-floating-point inference results. The resource and energy efficiency of ROSETTA are higher than those of the previous processor implemented in the same device by up to 206.1% and 304.0%, respectively. The functionality has been verified for several RNN models of various types in a fully integrated inference system.
Background: Accurate and fast image registration (IR) is critical during surgical interventions where the ultrasound (US) modality is used for image-guided intervention. Convolutional neural network (CNN)-based IR methods have resulted in applications that respond faster than traditional iterative IR methods. However, general-purpose processors are unable to operate at the maximum speed possible for real-time CNN algorithms. Due to its reconfigurable structure and low power consumption, the field programmable gate array (FPGA) has gained prominence for accelerating the inference phase of CNN applications. Methods: This study proposes an FPGA-based ultrasound IR CNN (FUIR-CNN) to regress three rigid registration parameters from image pairs. To speed up the estimation process, the proposed design uses fixed-point data and parallel operations obtained through unrolling and pipelining. Experiments were performed on three US datasets in real time using the xc7z020, and the xcku5p was also used during implementation. Results: The FUIR-CNN completed the inference phase 139 times faster than the software-based network, with only a negligible drop in regression performance, under a 200 MHz clock frequency. Conclusions: Comprehensive experimental results demonstrate that the proposed end-to-end FPGA-based accelerated CNN achieves negligible loss, high-speed estimation of the registration parameters, lower power consumption than the CPU, and the potential for real-time medical imaging.
In this study, an optimized UNET model is used for FPGA-based inference in the context of brain tumour segmentation using the BraTS dataset. The presented model features reduced depth and fewer filters, tailored to enhance efficiency on FPGA hardware. The implementation leverages High-Level Synthesis for Machine Learning (HLS4ML) to optimize and convert a Keras-based UNET model to Hardware Description Language (HDL) for the Kintex UltraScale (xcku085flva1517-3-e) FPGA. Resource strategy, First In First Out (FIFO) depth optimization, and precision adjustment were employed to optimize FPGA resource utilization. The resource strategy is shown to be effective, with resource utilization reaching a saturation point at a reuse factor of 1000. Following FIFO optimization, significant reductions are observed, including a 55 percent decrease in Block RAM (BRAM) usage, a 43 percent reduction in Flip-Flops (FF), and a 49 percent reduction in Look-Up Tables (LUT). In C/RTL co-simulation, the proposed FPGA-based UNET model achieves an Intersection over Union (IoU) score of 74 percent, demonstrating segmentation accuracy comparable to the original Keras model. These findings underscore the viability of the optimized UNET model for efficient brain tumour segmentation on FPGA platforms.
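For readers unfamiliar with the IoU metric quoted above, the following minimal sketch shows how it is typically computed for a pair of binary segmentation masks. The function name `iou`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions for this example and are not taken from the study.

```python
def iou(pred, target):
    """Intersection over Union of two equally sized binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1  # pixel set in both masks
            if p or t:
                union += 1  # pixel set in at least one mask
    # Assumed convention: two empty masks agree perfectly.
    return inter / union if union else 1.0
```

For instance, a prediction that shares one foreground pixel with the ground truth while the two masks together cover three pixels scores an IoU of 1/3.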
Improving real-time computational efficiency is a major research direction in Direction-Of-Arrival (DOA) estimation. In this paper, a novel computationally efficient real-valued DOA estimator is presented, in which the estimation is performed without the need for EigenValue Decomposition (EVD) and therefore avoids estimating the source number in advance. Following the comparison between the traditional MUSIC algorithm and the Capon Method, we present a general form of DOA estimation, which reveals that the construction of the noise subspace in the traditional MUSIC algorithm derives from the activation function performed on the eigenvalues. Unlike the classic subspace-based algorithm, our proposed activation-like function eliminates the reliance on subspace decomposition, thereby removing the need for source number estimation and mitigating performance degradation caused by incorrect estimations. Moreover, existing real-valued DOA algorithms would estimate both the true DOAs and their corresponding mirror DOAs, and the space-shifting property is used to eliminate the mirror DOAs. In addition, the field programmable gate array (FPGA) implementation for our proposed real-valued algorithm is developed, showing a dramatic reduction of the hardware resource consumption and computation burden compared with the complex-valued MUSIC. Experiments illustrate that our proposed algorithm is computationally more efficient, and achieves higher estimation resolution compared to the existing methods.
Unmanned Aerial Vehicles (UAVs), sometimes known as drones, have evolved from military to civilian applications, opening up novel perspectives in a variety of everyday services. The rapidly growing consumer interest in amateur drones equipped with high-end cameras compromises the everyday safety and privacy of people. In the literature, a variety of sensing techniques based on different physical phenomena have been proposed for drone detection. Among acoustic, optical, radar, and passive radiofrequency detection systems, passive radiofrequency sensing is the only one that can identify a drone even before it takes off and additionally indicate the operator's location. A spectrogram-based method is developed and optimised in terms of computing location, resulting in the possibility of sensor grid deployment over a standard Ethernet network. The detection phase involves hardware-accelerated energy sensing to extract the data frames from the background noise. Drone presence is then identified using machine learning based solely on preamble pattern recognition, which reduces the computational effort. The presented procedure is evaluated in an isolated setting employing an open-source dataset and tuned across multiple neural network architectures. Next, the complete sensor processing chain is examined in a real-life scenario. The analytical energy detector stage reaches a margin of roughly -8.7 dB in signal-to-noise ratio (SNR). With 1.1 M parameters, the proposed neural network achieves 99.93% simulation accuracy at SNRs down to -9.5 dB. Even after quantization for embedded platform implementation, the device can be used as a stand-alone early intrusion detector or as part of a distributed sensor grid.
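The energy-sensing stage mentioned above can be pictured with a very simple block-wise detector: average the squared magnitude of the samples over fixed-length blocks and flag the blocks whose energy exceeds a threshold. This is only a generic sketch of energy detection, not the hardware-accelerated detector from the paper; `block_energy_flags`, the block length, and the threshold value are hypothetical choices for illustration.

```python
def block_energy_flags(samples, block=4, threshold=1.0):
    """Flag each non-overlapping block whose mean energy exceeds the threshold."""
    flags = []
    # Trailing samples that do not fill a whole block are ignored.
    for k in range(0, len(samples) - block + 1, block):
        # abs() also handles complex (I/Q) samples.
        energy = sum(abs(s) ** 2 for s in samples[k:k + block]) / block
        flags.append(energy > threshold)
    return flags
```

Runs of consecutive flagged blocks would then mark candidate data frames to pass on to the classifier, while unflagged blocks are treated as background noise.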
A novel ergodic cellular automaton model of gene-protein network is presented. It is shown that the presented model can predict occurrences of typical nonlinear phenomena of a conventional ordinary differential equation gene-protein network model. In addition, theoretical analysis methods of the presented model are proposed. Using the analysis methods, an important advantage of the presented model is revealed: the ergodic cellular automaton is better suited to predict the occurrences of the nonlinear phenomena of the differential equation gene-protein network model compared to a regular (standard) cellular automaton. Furthermore, the presented model is implemented by a field programmable gate array and experiments validate its operations. It is then revealed that the presented model is much more hardware-efficient compared to a standard numerical integration formula of the differential equation model.
The constant false-alarm rate (CFAR) algorithm is essential for detecting targets during radar signal processing. It has been improved to accurately detect targets, especially in nonhomogeneous environments, such as multitarget or clutter edge environments. Examples include sort-based and variable index-based algorithms. However, these algorithms require large amounts of computation, making them difficult to apply in radar applications that require real-time target detection. We propose a new CFAR algorithm that determines the environment of a received signal through a new decision criterion and applies the optimal CFAR algorithm, such as the modified variable index (MVI) or automatic censored cell averaging-based ordered data variability (ACCA-ODV). Monte Carlo simulation results of the proposed CFAR algorithm showed a high detection probability of 93.8% in homogeneous and nonhomogeneous environments at an SNR of 25 dB. In addition, this paper presents the hardware design, field-programmable gate array (FPGA)-based implementation, and verification results for the practical application of the proposed algorithm. We reduced the hardware complexity by time-sharing sum and square operations and by replacing division operations with multiplication operations when calculating decision parameters. We also developed a low-complexity and high-speed sorter architecture that performs sorting for the partial data in leading and lagging windows. As a result, the implementation used 8260 LUTs and 3823 registers and took 0.6 μs to operate. A comparison with previously proposed FPGA implementations confirms that the complexity and operation speed of the proposed CFAR processor are well suited to real-time implementation.
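To make the division-to-multiplication idea concrete, the sketch below applies it to plain cell-averaging CFAR, which is simpler than the MVI and ACCA-ODV variants used in the paper. Instead of dividing the training-cell sum by the number of cells, both sides of the comparison are scaled by that count. The function name `ca_cfar_detect` and the `guard`, `train`, and `alpha` values are assumptions for this example.

```python
def ca_cfar_detect(power, guard=2, train=8, alpha=4.0):
    """Return indices of cells declared as targets by cell-averaging CFAR."""
    hits = []
    n_train = 2 * train  # training cells in the leading plus lagging windows
    for i in range(guard + train, len(power) - guard - train):
        lead = power[i - guard - train : i - guard]
        lag = power[i + guard + 1 : i + guard + train + 1]
        noise_sum = sum(lead) + sum(lag)
        # Division-free test:
        #   power[i] > alpha * noise_sum / n_train
        # is equivalent to
        #   power[i] * n_train > alpha * noise_sum
        if power[i] * n_train > alpha * noise_sum:
            hits.append(i)
    return hits
```

Because `n_train` is a constant, a hardware version can fold it into `alpha` or realise it as a shift when it is a power of two, so no divider is needed in the decision path.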
In this work, a modified histogram estimation (MHE) architecture is proposed to verify the histogram count on an FPGA platform, and the basic HE (BHE) architecture is also implemented for comparison. The proposed MHE architecture is newly developed to reduce the number of logic elements involved in the HE process. In the MHE architecture, a dual-port read-only memory (DPROM), a carry select adder based counter (CSAC), and an Optimal Bin Counter (OBC) are used to evaluate the HE count with effective accuracy. The 256-sample MHE reduces area, power, and delay by 17.62%, 15.41%, and 23.01%, respectively. Additionally, the performance of the proposed MHE is compared with four existing methods: HOG, HBS, MBPA, and DMH. The MHE architecture uses 2177 flip-flops on a Virtex-6 device, fewer than HOG and MBPA.
A novel low-complexity combined resampling, retiming and equalizing (RRE) algorithm is proposed. The RRE algorithm uses a single FIR filter for resampling, retiming and equalizing and thus lowers the complexity. In the numerical simulation, with an oversampling rate of 32/27, compared to the traditional time-domain scheme with a 15-tap CMA equalizer and the frequency-domain scheme based on a 256-point FFT, the RRE algorithm with a 15-tap RRE filter lowers the error vector magnitude (EVM) by 0.036 dB and 0.043 dB and lowers the complexity by 48.3% and 31.9%, respectively. In the offline experiment, with a received optical power of -35 dBm, the RRE algorithm with a 15-tap RRE filter lowers the EVM by 0.26 dB and 0.36 dB relative to the same two schemes, with the same complexity reductions of 48.3% and 31.9%, respectively. The RRE algorithm also enables a real-time 106.24 Gbps (26.56 GBaud) DP-QPSK coherent optical receiver based on a single FPGA chip using four 6-bit ADCs with a sampling rate of ~31.48 GSa/s. The FPGA-based receiver achieves a sensitivity of -34 dBm at a BER of 1E-3. To the best of our knowledge, this is the highest reported bit rate of a coherent receiver based on a single FPGA chip.