Deep neural networks (DNNs), an important member of the machine learning family, have been employed in a wide range of signal and image processing applications such as pattern recognition, speech recognition, language processing, and image segmentation. This paper concentrates on the design of a narrow transition-band finite impulse response (FIR) filter with the aid of a back-propagation-based deep learning approach. The proposed approach offers a unified design framework for a variety of FIR filters. The convergence behaviour of the proposed algorithm is proved analytically for the case in which the weights between adjacent layers are updated continuously. Simulation results show the frequency response characteristics of several narrow transition-band FIR filters designed with the proposed approach. The advantage of the design strategy is also established in terms of magnitude response over a number of recent state-of-the-art techniques, with a noticeable improvement in transition bandwidth compared with a few existing works. The designed filter is subsequently implemented on Altera's Cyclone IV field programmable gate array (FPGA) chip, and the hardware efficiency of the suggested design is established by comparing its hardware cost with that of many state-of-the-art FIR filters.
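The abstract does not give the network structure, so the following is only a minimal sketch of the underlying idea of fitting filter coefficients by gradient descent on a frequency-domain error. It is not the paper's DNN. It assumes a linear-phase low-pass filter whose zero-phase amplitude is A(w) = sum_k c_k cos(k w); the function names, band edges, grid size, learning rate, and step count are all hypothetical choices made for illustration.

```python
import math


def amplitude(coeffs, w):
    """Zero-phase amplitude A(w) = sum_k coeffs[k] * cos(k * w)."""
    return sum(c * math.cos(k * w) for k, c in enumerate(coeffs))


def design_lowpass(num_coeffs=11, pass_edge=0.4, stop_edge=0.6,
                   grid=64, lr=0.05, steps=500):
    """Fit cosine coefficients by plain gradient descent (illustrative values).

    pass_edge and stop_edge are fractions of pi. Grid points that fall in
    the transition band are left unconstrained.
    Returns (coeffs, initial_loss, final_loss).
    """
    freqs = [math.pi * i / (grid - 1) for i in range(grid)]
    samples = [(w, 1.0) for w in freqs if w <= pass_edge * math.pi]
    samples += [(w, 0.0) for w in freqs if w >= stop_edge * math.pi]

    def loss(coeffs):
        # Mean squared error between the amplitude and the ideal response.
        return sum((amplitude(coeffs, w) - d) ** 2
                   for w, d in samples) / len(samples)

    coeffs = [0.0] * num_coeffs
    initial = loss(coeffs)
    for _ in range(steps):
        errors = [(w, amplitude(coeffs, w) - d) for w, d in samples]
        # Gradient of the mean squared error with respect to each c_k.
        grads = [2.0 * sum(e * math.cos(k * w) for w, e in errors) / len(samples)
                 for k in range(num_coeffs)]
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs, initial, loss(coeffs)


def taps_from_coeffs(coeffs):
    """Symmetric impulse response of length 2K-1 with the same amplitude."""
    tail = [c / 2.0 for c in coeffs[1:]]
    return tail[::-1] + [coeffs[0]] + tail
```

Calling `design_lowpass()` and then `taps_from_coeffs(coeffs)` yields a 21-tap symmetric filter. Narrowing the gap between `pass_edge` and `stop_edge` models a narrower transition band, which generally needs more coefficients to reach the same error.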
Graph attention networks (GATs) exhibit outstanding performance on multiple authoritative node classification benchmarks, both transductive and inductive. The purpose of this research is to implement an FPGA-based accelerator for graph attention networks, called FPGAN, that achieves a significant improvement in performance and energy efficiency without losing accuracy compared with a PyTorch baseline. It eliminates the dependence on digital signal processors (DSPs) and large amounts of on-chip memory, and can work well even on low-end FPGA devices. We design FPGAN with software and hardware co-optimization across the full stack, from algorithm through architecture. Specifically, we compress the model to reduce its size, quantize features to perform fixed-point calculation, replace multiply-accumulate (MAC) cells with shift addition units (SAUs) to eliminate the dependence on DSPs, and design an efficient algorithm to approximate the SoftMax function. We also adjust the activation functions and fuse operations to further reduce the computation requirement. Moreover, all data is vectorized and aligned for scalable vector computation and efficient memory access. All of the above optimizations are integrated into a universal hardware pipeline for various GAT structures. We evaluate our design on an Inspur F10A board with an Intel Arria 10 GX1150 and 16 GB of DDR3 memory. Experimental results show that FPGAN achieves a 7.34 times speedup over an Nvidia Tesla V100 and 593 times over a Xeon Gold 5115 CPU while maintaining accuracy, with energy efficiency gains of 48 times and 2400 times, respectively.
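The abstract does not describe how the SAUs are built. As a rough illustration of why a shift can stand in for a multiplier, the sketch below assumes each weight is rounded to a signed power of two, so that a fixed-point product becomes a bit shift. The helper names and the rounding rule are hypothetical and are not taken from FPGAN.

```python
import math


def quantize_to_power_of_two(weight):
    """Return (sign, shift) so that sign * 2**shift approximates weight.

    Hypothetical rule: round log2(|weight|) to the nearest integer.
    A zero weight is encoded with sign 0.
    """
    if weight == 0:
        return 0, 0
    sign = 1 if weight > 0 else -1
    shift = round(math.log2(abs(weight)))
    return sign, shift


def shift_multiply(x, sign, shift):
    """Multiply integer x by sign * 2**shift using only shifts and negation."""
    if sign == 0:
        return 0
    # A negative shift is a right shift, i.e. division by a power of two.
    shifted = x << shift if shift >= 0 else x >> -shift
    return shifted if sign > 0 else -shifted


def shift_add_dot(features, weights):
    """Dot product of integer features with power-of-two quantized weights."""
    acc = 0
    for x, w in zip(features, weights):
        sign, shift = quantize_to_power_of_two(w)
        acc += shift_multiply(x, sign, shift)
    return acc
```

For weights that are already exact powers of two, such as `shift_add_dot([8, 16, -4], [0.5, -2.0, 4.0])`, the result matches the ordinary dot product; for other weights it is an approximation whose error depends on the rounding.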
Convolution has been extensively used in image processing and computer vision, including image enhancement, smoothing, and structure extraction. However, the convolution operation typically requires a significant amount of computing resources. A novel one-dimensional (1D) convolution processor with a reconfigurable architecture is implemented in this study. The processor combines a line buffer, controller units, and a reconfigurable and separable convolution module. The reconfigurable architecture and the separable convolution approach improve the flexibility and performance of the processor. The reconfigurable and separable convolution array, which is the main component of the processor, can simultaneously execute convolution operations with different kernels, with a maximum kernel size of 24 × 24. Experimental results show that the maximum frame rate of the processor is approximately 194 frames per second (fps), which exceeds the real-time requirement. Synthesis results show that the processor occupies 13.39 mm² at a 204 MHz system clock and consumes 419 mW at the maximum kernel size at a 120 MHz system clock in SMIC 0.18 μm CMOS technology. Verification experiments on field programmable gate arrays (FPGAs) demonstrate that the processor is suitable for real-time image processing applications, even for high-resolution images.
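The separable convolution approach mentioned above can be sketched in software as two 1D passes, one along rows and one along columns. The code below is a minimal illustration, not the processor's datapath: the function names are hypothetical, no padding is applied, and the kernels are applied without flipping (correlation form).

```python
def convolve_1d(signal, kernel):
    """Slide kernel over signal with no padding and no kernel flip."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def separable_convolve_2d(image, row_kernel, col_kernel):
    """Apply a separable 2D kernel as a horizontal then a vertical 1D pass."""
    # Horizontal pass over every image row.
    rows = [convolve_1d(row, row_kernel) for row in image]
    # Vertical pass: transpose, filter each column, transpose back.
    cols = [convolve_1d(list(col), col_kernel) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def direct_convolve_2d(image, kernel):
    """Reference 2D version used only to check the separable result."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]
```

When the 2D kernel is the outer product of `col_kernel` and `row_kernel`, both functions give the same output, but the separable form needs about 2k multiplications per output pixel instead of k × k. This is the usual motivation for building a 2D filter from 1D units.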
Recurrent neural networks (RNNs) are extensively employed to perform inference based on the temporal features of the input data. However, the computational workload and power consumption involved in inference are prohibitively high in practice, which makes high-speed inference difficult on devices with tight limits on available silicon resources and power supply. This paper presents an efficient inference processor for RNNs, named ROSETTA. ROSETTA supports multiple data formats, programmable for each vector operand, to achieve a wide range or high precision with a limited data size. ROSETTA consistently performs every vector operation on homogeneous processing units with a high utilization rate. Moreover, ROSETTA skips operations and reduces memory accesses to achieve high energy efficiency by pruning the activation elements in a fine-grained manner. Implemented in a low-cost 28 nm field-programmable gate array, ROSETTA exhibits resource and energy efficiency as high as 2.51 - 1.14 MOP/s/LUT and 434.01 - 113.29 GOP/s/W, respectively, while producing near-floating-point inference results. The resource and energy efficiency of ROSETTA are higher than those of the previous processor implemented in the same device by up to 206.1% and 304.0%, respectively. The functionality has been verified for several RNN models of various types in a fully integrated inference system.
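How pruning activation elements saves work can be illustrated with a matrix-vector product that skips zeroed activations. This is a sketch under assumptions: the magnitude threshold rule and the function names are hypothetical, and the abstract does not state how ROSETTA selects which elements to prune.

```python
def prune_activations(activations, threshold):
    """Zero every activation whose magnitude is below threshold (assumed rule)."""
    return [a if abs(a) >= threshold else 0.0 for a in activations]


def sparse_matvec(weights, activations):
    """Compute weights @ activations, skipping zero activations.

    Returns (result, macs), where macs counts the multiply-accumulate
    operations actually performed.
    """
    result = [0.0] * len(weights)
    macs = 0
    for col, a in enumerate(activations):
        if a == 0.0:
            # The whole weight column is skipped: no multiply, no weight fetch.
            continue
        for row in range(len(weights)):
            result[row] += weights[row][col] * a
            macs += 1
    return result, macs
```

With a 2 × 3 weight matrix and one of three activations pruned, the product needs 4 multiply-accumulates instead of 6, and the weights of the skipped column are never read.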
With the increasing number of computation nodes integrated in multi- and many-core platforms, networks-on-chip (NoCs) have emerged as a new communication medium in systems-on-chip (SoCs). HopliteRT is a recently proposed NoC design that addresses the needs of real-time systems whilst respecting the constraints of field-programmable gate array (FPGA) platforms. In this article, we: 1) introduce priority-based routing in HopliteRT; 2) change the network topology in order to improve the packets' worst-case traversal time (WCTT); 3) identify a flaw in the existing timing analysis of HopliteRT; and 4) develop a new timing analysis that is proven correct. We also show by means of experiments that the modifications of HopliteRT proposed in this article allow for at least a 2x improvement in the worst-case and average-case traversal time of high-priority packets, without impacting the quality of service of low-priority packets. The timing properties of high-priority flows are greatly improved at negligible additional hardware cost. The proposed NoC has been implemented in Verilog and synthesized for a Xilinx Virtex-7 FPGA platform.
Infrared target detection and recognition are investigated in view of the wide application requirements of airborne photoelectric systems. The proposed algorithm can be divided into three parts. First, on the basis that the target in an infrared image dominates the background in the frequency domain, this paper presents a method of candidate region detection. The detection algorithm first generates a saliency map using the discrete cosine transform and then identifies candidate regions by computing and comparing the saliency scores of different regions. Second, to extract the features of each candidate region for recognition, the paper presents a local descriptor, uses locality-constrained linear coding and a pooling operator to obtain the feature vector of the target, and then completes target recognition with a simple linear classifier. Finally, as preliminary research on the engineering application of these algorithms, the detection and recognition algorithms are ported to an embedded platform. The paper conducts experiments on six test sequences to evaluate the performance of the proposed algorithms and their computing efficiency on the embedded platform. An evaluation experiment and a comparison experiment verify the effectiveness and practicability of the proposed algorithms.
In this paper, a new tomography technique called electrical charge tomography for two-phase flow imaging is presented. The probe consists of a few pairs of electrodes which are electrically energized to generate electrical charges within the fluid under test. The intensity of these charges depends on the chemical and physical properties of the fluid, as well as on its molecular distribution. Another group of electrodes surrounding the cross section of the fluid under test is used to capture the induced electrical charges. These are then converted into an electrical signal using a highly sensitive charge amplifier. A postprocessing unit, which consists of an analog-to-digital converter followed by a field programmable gate array (FPGA) module, is then used for high-level signal processing (i.e., a dedicated dynamic thresholding algorithm) and image reconstruction. Experimental results demonstrate the capability of the system to accurately generate 2-D cross-sectional images, with an error up to 14% lower than that of an electrical capacitance tomography (ECT) probe. The other advantage of this technique over ECT is the reduced data acquisition time, since in ECT a minimum time is required for the charge and discharge of the capacitance in order to achieve acceptable accuracy. This makes the probe another attractive concept for future tomography systems targeting real-time applications.
Normally-off computing (Noff computing) using a multicontext field programmable gate array (MC-FPGA) consisting of crystalline oxide semiconductor FETs has been developed. The Noff computing discussed in this paper is a control architecture for an MC-FPGA capable of performing fine-grained power gating on each programmable logic element (PLE), whose registers include a volatile register and a nonvolatile shadow register for storing and loading the data in the volatile register. When a context switch happens, the MC-FPGA performs fine-grained control so that power is supplied only to the PLEs contributing to effective calculation. With an MC-FPGA fabricated in a hybrid process of a 1.0 μm crystalline oxide semiconductor FET on a 0.5 μm CMOS FET, it has been confirmed that the proposed Noff computing can resume the previous task when a context switches back to it, increases PLE use efficiency, and reduces the power consumption by 27.7% at an operating frequency of 20 MHz with a driving voltage of 2.5 V.
The radio frequency (RF) spectrum is a limited resource. Spectrum allotment disputes stem from this scarcity, as many radio devices are confined to a fixed frequency. One alternative is to incorporate reconfigurability within a cognitive radio platform, thereby enabling the radio to adapt to dynamic RF spectrum environments. In this way, the radio is able to actively observe the RF spectrum, orient itself to the current RF environment, decide on a mode of operation, and act accordingly, thereby sharing the spectrum and operating in a more flexible manner. This research presents a novel architecture for adapting radio operation to the current RF spectrum environment. Specifically, this research makes three contributions: (1) a framework for testing and evaluating clustering algorithms in the context of cognitive radio networks, (2) a new RF spectrum map merging technique for adaptive waveform selection, with initial integration testing on a field-programmable gate array (FPGA), and (3) a novel cognitive radio network emulation framework for testing and evaluating totally-ordered multicast as a means of inter-node communication.
In the past decades, field-programmable gate arrays (FPGAs) have proven to be an interesting physical platform for facilitating quantum information processing, particularly with the emergence of domain-specific hardware accelerators for quantum computing emulation and quantum key distillation. While conventional general-purpose hardware platforms have been used for quantum information processing, FPGAs promise deep pipeline parallelism, adaptable interfaces, and trivial support for custom-precision operations. Therefore, the time is ripe for describing recent developments in quantum computing emulators and quantum key distillation accelerators on FPGAs. In this article, we provide a comprehensive review of the state of the art in this active field, with a balance between theoretical, implementational, and technological results. Challenges and promising research opportunities are also discussed.