FPGAs have emerged as a promising platform for implementing neural networks due to their reconfigurability, parallelism, and low power consumption. Nonetheless, designing and optimizing FPGA-based neural network accelerators with register transfer level (RTL) languages is a complex and time-consuming task. High-level synthesis (HLS) tools provide a higher level of abstraction for FPGA design, enabling designers to concentrate on top-level design aspects, such as algorithms, rather than low-level hardware implementation details. The You Only Look Once (YOLO) series is among the state-of-the-art object detection networks; it is built from a range of neural network techniques, including cross-stage connections and feature extraction structures such as pyramid networks. In this paper, we propose a method for implementing the YOLOv7-tiny network on FPGAs using HLS tools. We present a comprehensive analysis of the performance and resource utilization of FPGA-based neural network accelerators. Our methods show excellent results for real-time application requirements such as latency. Specifically, our work reduces the usage of digital signal processing (DSP) units by 90% and saves up to 60% of flip-flops compared to state-of-the-art designs, while achieving competitive usage of block RAM (BRAM) and look-up tables. Additionally, the achieved design latency of 15 ms is well suited to real-time applications. We also propose a method for BRAM utilization and off-chip memory access.
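The real-time claim rests on converting per-frame latency into a frame rate. The sketch below shows that arithmetic under two assumptions not stated in the abstract: frames are processed strictly one after another (no overlap between frames, so throughput is 1/latency), and 30 fps is used as a hypothetical real-time target.

```python
def frames_per_second(latency_ms: float) -> float:
    """Throughput when each frame must finish before the next one starts."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def meets_real_time(latency_ms: float, target_fps: float = 30.0) -> bool:
    """True if one frame fits within the frame budget implied by target_fps.

    target_fps = 30 is an assumed threshold for illustration only.
    """
    return frames_per_second(latency_ms) >= target_fps


fps = frames_per_second(15.0)  # the 15 ms latency reported above
print(f"{fps:.1f} fps, real-time at 30 fps: {meets_real_time(15.0)}")
# 66.7 fps, real-time at 30 fps: True
```

If the accelerator overlaps consecutive frames (e.g., layer-level pipelining), throughput can exceed 1/latency, so this is a conservative estimate.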
ISBN (print): 9798350387261; 9798350387254
End-to-end image compression has achieved satisfactory results in recent studies. However, existing methods suffer from the high complexity of neural network computation and cannot be deployed directly on mobile devices because of their limited computing power and storage. Therefore, considering the resource and computing constraints of mobile devices, we make a trade-off in this paper between rate-distortion (R-D) performance, inference time, and model complexity. We then design a novel lightweight perceptual image compression framework to alleviate the storage and complexity burden on mobile devices. Moreover, we design a hardware-friendly deployment scheme to apply the proposed compression framework on high-end mobile devices, which achieves efficient image compression. Based on the above structures, we propose the first mobile system that achieves image compression on mobile devices. The supplementary material for our system demo is available at https://***/documents/extreme-lowbitrate-image-compression-system-mobile-deployment.
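Learned codecs commonly express the rate-distortion trade-off as a Lagrangian L = R + λ·D, where R is the bit rate (here in bits per pixel) and D is a distortion measure. The sketch below is that generic objective, not the paper's exact loss: taking D as mean squared error is an assumption, and a perceptual codec would swap in or add a perceptual term. Inference time and model complexity are governed by the architecture rather than by this loss.

```python
import numpy as np


def bits_per_pixel(total_bits: float, height: int, width: int) -> float:
    """Rate term R: compressed size normalised by the number of pixels."""
    return total_bits / (height * width)


def rate_distortion_loss(total_bits, original, reconstructed, lam):
    """Lagrangian R-D objective L = R + lam * D, with D taken as MSE here.

    Using MSE for D is an assumption for illustration.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    height, width = original.shape[:2]
    rate = bits_per_pixel(total_bits, height, width)
    distortion = float(np.mean((original - reconstructed) ** 2))
    return rate + lam * distortion


# Toy example: a 4x4 image coded with 32 bits (2 bpp) and an MSE of 1.0.
x = np.zeros((4, 4))
x_hat = np.ones((4, 4))
print(rate_distortion_loss(32, x, x_hat, lam=0.5))  # 2.0 + 0.5 * 1.0 = 2.5
```

Training one model per λ traces out an R-D curve: a larger λ penalises distortion more and pushes the codec toward higher rates and better quality.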
In this article, we investigate the spontaneity issue in facial expression sequence generation. Current leading methods in the field commonly rely on manually adjusted conditional variables to direct the model to generate a specific class of expression. We propose a neural network-based method that uses Gaussian noise to model spontaneity in the generation process, removing the need for manual control of conditional generation variables. Our model takes two sequential images as input, with additive noise, and produces the next image in the sequence. We trained two types of models: single-expression and mixed-expression. With single-expression models, unique facial movements of a certain emotion class can be generated; with mixed-expression models, fully spontaneous expression sequence generation can be achieved. We compared our method to current leading generation methods on a variety of publicly available datasets. Initial qualitative results show that our method produces visually more realistic expressions and facial action unit (AU) trajectories; initial quantitative results using image quality metrics (SSIM and NIQE) show that our generated images are of higher quality. Our approach and results are novel in the field of facial expression generation, with potential wider applications to other sequence generation tasks.
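The generation scheme described above can be read as an autoregressive loop: two frames in, one frame out, with Gaussian noise added to the inputs instead of a class label. The sketch assumes noise is added to both input frames at every step; `predict_next` is a hypothetical stand-in for the trained network, and the toy extrapolating "model" only exercises the loop.

```python
import numpy as np


def generate_sequence(predict_next, frame_a, frame_b, n_steps, noise_std, rng):
    """Autoregressively extend a two-frame seed, injecting Gaussian noise.

    predict_next is a hypothetical stand-in for the trained generator: it maps
    (previous frame, current frame) to the next frame.
    """
    frames = [np.asarray(frame_a, dtype=float), np.asarray(frame_b, dtype=float)]
    for _ in range(n_steps):
        prev, curr = frames[-2], frames[-1]
        # Additive Gaussian noise on both inputs is the only source of variation;
        # no expression label or other conditional variable is supplied.
        noisy_prev = prev + rng.normal(0.0, noise_std, size=prev.shape)
        noisy_curr = curr + rng.normal(0.0, noise_std, size=curr.shape)
        frames.append(predict_next(noisy_prev, noisy_curr))
    return frames


# Toy "model": linearly extrapolate motion from the two inputs.
def extrapolate(prev, curr):
    return curr + (curr - prev)


seq = generate_sequence(extrapolate, np.zeros((2, 2)), np.ones((2, 2)),
                        n_steps=3, noise_std=0.0, rng=np.random.default_rng(0))
print(len(seq), seq[-1][0, 0])  # 5 4.0
```

With `noise_std > 0`, different random seeds yield different continuations from the same seed frames, which is the mechanism the abstract uses to model spontaneity.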
ISBN (print): 9798350373769; 9798350373752
This paper deals with the state-space modelling of nonlinear stochastic dynamic systems. The emphasis is on the emerging area of data-augmented physics-based modelling of the state dynamics, which combines the benefits of physics-driven and data-based identified models. Because the augmented state-space models depend on the measured data, modelling the state noise properties becomes challenging. This paper proposes and validates a concept for identifying the state noise of a nonlinear data-augmented state equation using maximum likelihood and correlation-based methods. A numerical simulation of a tracking scenario shows a significant improvement in state estimation accuracy and consistency when the identified noise model is used.
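As a simplified illustration of maximum likelihood noise identification: if the state sequence were fully observed and the state noise were zero-mean, white, and Gaussian, the ML estimate of its variance Q would be the mean squared residual of the state equation. This is a deliberately reduced setting; in the paper's problem the states are generally not measured directly and the data-augmented model itself depends on data, which is where the correlation-based methods come in (not shown). The scalar model `f` (a "physics" term plus a small "learned" correction) is hypothetical.

```python
import numpy as np


def ml_state_noise_variance(states, f):
    """ML estimate of Q for x[k+1] = f(x[k]) + w[k], w[k] ~ N(0, Q).

    Assumes the state sequence is fully observed and the noise is zero-mean,
    white, and Gaussian, so the ML estimate is the mean squared residual.
    """
    states = np.asarray(states, dtype=float)
    residuals = states[1:] - f(states[:-1])
    return float(np.mean(residuals ** 2))


# Hypothetical scalar model: a linear "physics" part plus a small correction term.
def f(x):
    return 0.9 * x + 0.1 * np.sin(x)


rng = np.random.default_rng(0)
true_q = 0.04
x = np.zeros(20_000)
for k in range(len(x) - 1):
    x[k + 1] = f(x[k]) + rng.normal(0.0, np.sqrt(true_q))

q_hat = ml_state_noise_variance(x, f)
print(q_hat)  # expected to be close to true_q = 0.04
```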
ISBN (print): 9798350344868; 9798350344851
Training deep neural networks has become a common approach for addressing image restoration problems. An alternative to training a "task-specific" network for each observation model is to use pretrained deep denoisers to impose only the signal's prior within iterative algorithms, without additional training. Recently, this approach has become increasingly popular with the rise of diffusion/score-based generative models, whose core is iterative denoising. Using denoisers for general-purpose restoration requires guiding the iterations to ensure agreement of the signal with the observations. In low-noise settings, guidance based on back-projection (BP) has been shown to be a promising strategy (used recently in the context of diffusion models also under the names "pseudoinverse" or "range/null-space" guidance). However, noise in the observations reduces the gains from this approach. In this paper, we propose a novel guidance technique based on preconditioning that allows traversing from BP-based guidance to least-squares-based guidance along the restoration scheme. The proposed approach is robust to noise while still having a much simpler implementation than alternative methods (e.g., no SVD is required). We demonstrate its advantages for image deblurring and super-resolution.
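To make the BP-to-least-squares idea concrete, here is one hypothetical preconditioned data-consistency step, x + Aᵀ(AAᵀ + δI)⁻¹(y − Ax). The abstract does not give the paper's preconditioner or schedule; this regularized form is an assumption chosen because it has the two limits described above: δ = 0 gives back-projection (for A with full row rank), and large δ approaches a scaled least-squares gradient step Aᵀ(y − Ax)/δ. It needs only a linear solve, no SVD. In a denoiser-based scheme, a step like this would alternate with the denoiser.

```python
import numpy as np


def preconditioned_guidance(x, A, y, delta):
    """One data-consistency step x + A^T (A A^T + delta I)^{-1} (y - A x).

    delta = 0 gives back-projection (for A with full row rank); large delta
    approaches a (scaled) least-squares gradient step A^T (y - A x) / delta.
    Only a linear solve is needed, no SVD.
    """
    m = A.shape[0]
    residual = y - A @ x
    correction = np.linalg.solve(A @ A.T + delta * np.eye(m), residual)
    return x + A.T @ correction


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))  # toy observation model: 6 unknowns, 3 measurements
x = rng.standard_normal(6)       # e.g. the current denoiser output
y = rng.standard_normal(3)

x_bp = preconditioned_guidance(x, A, y, delta=0.0)
print(np.allclose(A @ x_bp, y))  # True: BP enforces exact consistency
```

Increasing δ over the iterations (or with the noise level) is one way such a scheme could move from exact consistency toward the more noise-tolerant least-squares behaviour.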
In this paper, we propose a face detection and recognition system based on deep learning. It can be used as an access control system that performs face detection and recognition in real time. Our goal is to achieve one-shot recognition instead of the traditional two-step methods. We use SSD as the main model for face detection and VGG-Face as the main model for face recognition. We train the deep learning models on collected datasets. Moreover, we use techniques such as data augmentation, image preprocessing, and image post-processing to train robust face detection and recognition subsystems. We use continuous frames as input to avoid false positives, so that the system does not output wrong results. A real demonstration system was built to identify the members of our laboratory. We use 1280 x 960 resolution video for experimental testing and achieve a speed of about 30 fps with GPU acceleration.
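One plausible way to combine one-shot recognition with continuous-frame input is sketched below: each detected face's embedding (e.g., from VGG-Face) is matched against a single enrolled embedding per person by cosine similarity, and an identity is accepted only if it wins in enough recent frames. This decision rule, the similarity threshold of 0.5, and `min_agree` are assumptions for illustration, not the paper's stated method; SSD detection and embedding extraction are not shown.

```python
from collections import Counter

import numpy as np


def identify(embedding, gallery, threshold=0.5):
    """Return the best-matching enrolled name by cosine similarity, or None.

    gallery maps a name to one reference embedding (one-shot enrolment).
    """
    query = np.asarray(embedding, dtype=float)
    query = query / np.linalg.norm(query)
    best_name, best_score = None, -1.0
    for name, ref in gallery.items():
        ref = np.asarray(ref, dtype=float)
        score = float(query @ (ref / np.linalg.norm(ref)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def vote_over_frames(frame_ids, min_agree):
    """Accept an identity only if it wins in at least min_agree recent frames."""
    counts = Counter(i for i in frame_ids if i is not None)
    if not counts:
        return None
    name, n = counts.most_common(1)[0]
    return name if n >= min_agree else None


# Toy 3-D "embeddings"; real VGG-Face descriptors are much higher-dimensional.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
frames = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.0, 0.0, 1.0], [0.95, 0.0, 0.05]]
ids = [identify(e, gallery) for e in frames]
print(ids, vote_over_frames(ids, min_agree=3))
# ['alice', 'alice', None, 'alice'] alice
```

A single spurious match in one frame cannot open the door on its own, which is the false-positive suppression the continuous-frame input is meant to provide.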
ISBN (print): 9798350350661; 9798350350654
This research presents a new unified dictionary training super-resolution (UDTSR) approach to single- and multi-image super-resolution that uses a bilevel optimization framework and patchwise sparse recovery. By employing interconnected dictionaries to bridge the two spaces of image patches, we ensure that a sparse representation of a low-resolution (LR) image patch can accurately reconstruct its corresponding high-resolution (HR) patch. To enable efficient stochastic gradient descent, the gradient is computed by implicit differentiation. Furthermore, by using a neural network model for rapid sparse inference and selectively processing visually essential areas, we improve performance in real-world applications almost tenfold. Also, by discovering the fundamental relationships between different data modalities, our approach overcomes the difficulty of dealing with panchromatic and multispectral images. For example, using shared and individual sparse representations, we describe a data model that can detect similarities and differences in multimodal signals. Single-image super-resolution (SISR) and multi-frame super-resolution (MFSR) have been advanced separately, with minimal research on their ideal combination. We propose a novel UDTSR analysis using an iterative shrinkage and thresholding algorithm. Our simulations of many combinations of SISR and MFSR, at scale factors such as x2, x3, and x4, confirm our theory quantitatively and qualitatively.
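The iterative shrinkage and thresholding algorithm (ISTA) named above solves the patch-level sparse coding problem min_a ½‖y − Da‖² + λ‖a‖₁ by alternating a gradient step with soft-thresholding. The sketch shows standard ISTA plus a hypothetical coupled-dictionary step (code an LR patch with `D_lr`, decode with `D_hr`) to illustrate the LR-to-HR idea; the bilevel dictionary training, implicit differentiation, and neural sparse inference from the abstract are not shown, and the dictionaries here are toy assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the 'shrinkage' in ISTA)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(D, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - D a||^2 + lam * ||a||_1 over the sparse code a."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = soft_threshold(a - step * grad, lam * step)
    return a


def reconstruct_hr_patch(D_lr, D_hr, y_lr, lam):
    """Hypothetical coupled-dictionary step: code the LR patch, decode with D_hr."""
    return D_hr @ ista(D_lr, y_lr, lam)


# With an orthonormal dictionary, ISTA reduces to soft-thresholding y.
code = ista(np.eye(3), np.array([3.0, 0.05, -2.0]), lam=0.1)
print(code)  # approximately [2.9, 0.0, -1.9]
```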
Convolutional neural network (CNN)-based single-image super-resolution (SISR) methods for RGB images have flourished rapidly. However, CNN-based SR methods for thermal images are rarely studied. The performance of existing deep SR methods is limited by the narrow receptive field of a single small convolution kernel (e.g., 3 x 3). In this paper, we propose MPRANet, a deep network for thermal image SISR that combines multi-path residual and attention blocks. Specifically, an innovative multi-path residual block, constructed from parallel depth-wise separable convolution paths with convolution kernels of different sizes, is used to extract both fine local features and large global features, effectively enhancing the capacity of MPRANet. Meanwhile, the attention block is formed by cascading channel attention and spatial attention modules to re-scale features in the channel and spatial dimensions sequentially. We also propose a Mixture of Data Augmentation (MoDA) strategy that improves MPRANet performance without increasing the computational burden. MoDA makes full use of multiple pixel-domain data augmentation methods to improve the generalization of MPRANet. Qualitative and quantitative experiments on three test datasets show that the proposed MPRANet has obvious advantages over state-of-the-art thermal and RGB image SR methods in preserving details such as edges and textures.
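The reason depth-wise separable paths make large kernels affordable is visible in a parameter count: a standard k x k convolution costs k²·C_in·C_out weights, while a depth-wise k x k convolution followed by a 1 x 1 point-wise convolution costs k²·C_in + C_in·C_out. The kernel sizes 3, 5, and 7 and the 64 channels below are hypothetical choices for illustration; biases and the layer that fuses the paths are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """k x k depth-wise conv (one filter per input channel) + 1 x 1 point-wise conv."""
    return k * k * c_in + c_in * c_out


def multi_path_params(kernel_sizes, c_in: int, c_out: int) -> int:
    """Parallel depth-wise separable paths, one per kernel size (fusion layer ignored)."""
    return sum(depthwise_separable_params(k, c_in, c_out) for k in kernel_sizes)


print(standard_conv_params(3, 64, 64))        # 36864
print(depthwise_separable_params(3, 64, 64))  # 4672
print(multi_path_params([3, 5, 7], 64, 64))   # 17600
print(standard_conv_params(7, 64, 64))        # 200704
```

Under these assumptions, three separable paths up to 7 x 7 cost fewer weights than a single standard 3 x 3 convolution's counterpart at 7 x 7 by more than a factor of ten, while covering a 7 x 7 receptive field.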
Authors:
Wang, Ruizhe; Pang, Jiaojiao; Han, Xiaole; Xiang, Min; Ning, Xiaolin
Beihang Univ, Sch Instrumentat & Optoelect Engn, Key Lab Ultraweak Magnet Field Measurement Technol, Minist Educ, Beijing 100191, Peoples R China
Beihang Univ, Hangzhou Innovat Inst, Zhejiang Prov Key Lab Ultraweak Magnet Field Space, Hangzhou 310051, Zhejiang, Peoples R China
Beihang Univ, Hangzhou Inst Natl Extremely Weak Magnet Field Inf, Hangzhou 310028, Zhejiang, Peoples R China
Shandong Univ, Inst Magnet Field Free Med & Funct Imaging, Shandong Key Lab Magnet Field Free Med & Funct Ima, Jinan, Peoples R China
Shandong Univ, Shandong Prov Clin Res Ctr Emergency & Crit Care M, Dept Emergency Med, Qilu Hosp, Jinan, Peoples R China
Shandong Univ, Natl Innovat Platform Ind Educ Integrat Med Engn I, Jinan, Peoples R China
Hefei Natl Lab, Hefei 230088, Anhui, Peoples R China
Objective: This study developed a fast and accurate automated method for magnetocardiography (MCG) classification. Approach: We propose a deformable convolutional block attention module (DCBAM)-based method for classifying coronary artery disease (CAD) using MCG. After preprocessing, the raw MCG data were segmented into individual heartbeat segments and encoded into image representations using the Hilbert curve to convert the temporal features into spatial image features. We combined DCBAM with convolutional neural networks (CNNs) for MCG classification. DCBAM incorporates a deformable convolutional architecture along with temporal and spatial attention mechanisms to capture representative and correlative features of the image-represented MCG along the temporal and spatial multichannel dimensions. We performed ablation experiments to evaluate the rationality and validity of the proposed model structure. Additionally, we performed an interpretability analysis to investigate the model's regions of interest for CAD diagnosis. Results: The proposed method achieved an average accuracy of 93.57%, precision of 94.71%, sensitivity of 92.56%, specificity of 94.68%, and average F1-score of 93.60%. Compared with existing methods, our proposed model achieved superior diagnostic classification results in MCG with fewer parameters. Significance: Integrating DCBAM with image-represented MCG establishes a novel feature extraction method that enhances the clinical utility of MCG and effectively addresses long-range dependencies and spatiotemporal inconsistencies in time-series signal analysis.
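The Hilbert-curve encoding step maps sample index d of a 1-D segment to a pixel (x, y) so that samples close in time stay close in the image. The sketch uses the standard iterative index-to-coordinate algorithm and assumes a single-channel heartbeat segment already resampled to n² samples with n a power of two; how the paper resamples segments and stacks the multiple MCG channels is not specified in the abstract.

```python
import numpy as np


def hilbert_d2xy(n, d):
    """Map index d in [0, n*n) to (x, y) on an n x n Hilbert curve (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the current sub-square
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def series_to_hilbert_image(series, n):
    """Place a length-n*n signal onto an n x n image along the Hilbert curve."""
    series = np.asarray(series, dtype=float)
    if series.size != n * n:
        raise ValueError("series length must equal n * n")
    image = np.zeros((n, n))
    for d, value in enumerate(series):
        x, y = hilbert_d2xy(n, d)
        image[y, x] = value
    return image


# Hypothetical 64-sample segment standing in for one resampled heartbeat.
segment = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
print(series_to_hilbert_image(segment, 8).shape)  # (8, 8)
```

Consecutive indices always land on neighbouring pixels, so short temporal patterns become compact 2-D patches that a CNN can pick up.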
Video surveillance still has difficulty identifying anomalies such as illegal activities and crimes, despite the development of interactive multimedia anomaly detection systems. To address this issue, this paper proposes Optimized Interpretable Generalized Additive neural Networks based Malicious Activity Detection with Video Surveillance (IGANN-MAD-VS-EOSSOA). First, the input videos are collected from the UCF-Crime and ShanghaiTech datasets. The collected videos are pre-processed with Multiple Local Particle Filtering (MLPF) to improve video quality, remove noise, and enhance image clarity. The pre-processed videos are then segmented into images using Maximum Entropy Scaled Super-pixels Segmentation (MESPS). Next, the Synchro-Transient-Extracting Transform (STET) is used to extract features such as color, texture, size, shape, and orientation. The extracted features are fed to the Interpretable Generalized Additive neural Networks (IGANN) to classify activities as Normal, Assault, Fighting, Shooting, Vandalism, Abuse, or Accident. In general, IGANN does not adopt any optimization technique to determine the optimal parameters that ensure appropriate categorization. Hence, the Elite opposite Sparrow Search Optimization Algorithm (EOSSOA) is proposed to optimize the weight parameters of IGANN for detecting malicious activity in video surveillance. The proposed IGANN-MAD-VS-EOSSOA method is implemented in Python. The proposed technique attains 26.36%, 20.69%, and 30.29% higher accuracy and 19.12%, 28.32%, and 27.84% higher precision compared with the existing methods: Video anomaly detection scheme with deep convolutional and recurrent techniques (AD-CNN-VS), Toward trustworthy human suspicious activity detection from surveillance videos with deep learning (HSAD-SV-RNN), Deep learning-based real-world object dete...
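The abstract does not spell out EOSSOA, but "elite opposite" usually refers to elite opposition-based learning, which is often added to sparrow search to keep the population diverse. The sketch shows one common formulation (an assumption, not necessarily the paper's): for each elite x, the opposite point is k·(da + db) − x, with da and db the per-dimension minimum and maximum over the elite group and k ~ U(0, 1); out-of-range coordinates are resampled inside [da, db], and the best candidates are kept greedily. The sparrow producer/scrounger updates and the IGANN training loss used as fitness are not shown; the sphere function is a toy stand-in.

```python
import numpy as np


def elite_opposition(elites, rng):
    """Generate elite opposition-based candidates (one common formulation).

    For each elite x, the opposite point is k * (da + db) - x, where da and db
    are the per-dimension min and max over the elite group and k ~ U(0, 1).
    Coordinates falling outside [da, db] are resampled uniformly inside it.
    """
    elites = np.asarray(elites, dtype=float)
    da = elites.min(axis=0)
    db = elites.max(axis=0)
    k = rng.uniform(0.0, 1.0, size=(elites.shape[0], 1))  # one k per individual
    opposite = k * (da + db) - elites
    outside = (opposite < da) | (opposite > db)
    resampled = rng.uniform(da, db, size=elites.shape)
    return np.where(outside, resampled, opposite)


def keep_best(population, fitness, n_keep):
    """Greedy selection: keep the n_keep candidates with the lowest fitness."""
    scores = np.array([fitness(p) for p in population])
    return population[np.argsort(scores)[:n_keep]]


def sphere(w):
    """Toy fitness standing in for the classifier's loss on a weight vector."""
    return float(np.sum(w ** 2))


rng = np.random.default_rng(0)
elites = rng.uniform(-1.0, 1.0, size=(5, 3))  # e.g. 5 elite weight vectors
candidates = np.vstack([elites, elite_opposition(elites, rng)])
best = keep_best(candidates, sphere, n_keep=5)
print(best.shape)  # (5, 3)
```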