ISBN (print): 9798350371918; 9798350371901
Clinical adoption of multispectral optoacoustic tomography requires improvements in the image quality available in real time, as well as a reduction in scanner cost. Deep learning approaches have recently unlocked the reconstruction of high-quality optoacoustic images in real time. However, currently used deep neural network architectures require powerful graphics processing units to infer images at sufficiently high frame rates, greatly increasing the price tag. Herein we propose EfficientDeepMB, a relatively lightweight (17M parameters) network architecture achieving high frame rates on medium-sized graphics cards with no noticeable downgrade in image quality. EfficientDeepMB is built upon DeepMB, a previously established deep learning framework for reconstructing high-quality images in real time, and upon EfficientNet, a network architecture designed to operate on mobile devices. We demonstrate the performance of EfficientDeepMB in terms of reconstruction speed and accuracy using a large and diverse dataset of in vivo optoacoustic scans. EfficientDeepMB is about three to five times faster than DeepMB: deployed on a medium-sized NVIDIA RTX A2000 Ada, EfficientDeepMB reconstructs images at speeds enabling live image feedback (59 Hz), while DeepMB fails to meet the real-time inference threshold (14 Hz). The quantitative difference between the reconstruction accuracy of EfficientDeepMB and DeepMB is marginal (data residual norms of 0.1560 vs. 0.1487, mean absolute error of 0.642 vs. 0.745). There are no perceptible qualitative differences between images inferred with the two reconstruction methods.
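The frame-rate comparison above can be illustrated with a minimal check; the 59 Hz and 14 Hz figures are taken from the abstract, while the 25 Hz live-feedback threshold is an assumed illustrative cutoff (the abstract does not state the exact value).

```python
# Sketch: does a reconstruction method sustain live image feedback?
LIVE_FEEDBACK_HZ = 25.0  # assumed minimum frame rate, for illustration only

def sustains_live_feedback(inference_time_s: float) -> bool:
    """Return True if per-frame inference is fast enough for live display."""
    return (1.0 / inference_time_s) >= LIVE_FEEDBACK_HZ

# Reported rates on an NVIDIA RTX A2000 Ada:
print(sustains_live_feedback(1 / 59))  # EfficientDeepMB, 59 Hz
print(sustains_live_feedback(1 / 14))  # DeepMB, 14 Hz
```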
Rice pest identification is essential in modern agriculture for the health of rice crops. As global rice consumption rises, yields and quality must be maintained. Various methodologies have been employed to identify pests, encompassing sensor-based technologies, deep learning, and remote sensing models. Visual inspection by professionals and farmers remains essential, but integrating technology such as satellites, IoT-based sensors, and drones enhances efficiency and accuracy. A computer vision system processes images to detect pests automatically, providing real-time data for proactive and targeted pest management. With this motive in mind, this research presents a novel farmland fertility algorithm with a deep learning-based automated rice pest detection and classification (FFADL-ARPDC) technique. The FFADL-ARPDC approach classifies rice pests from rice plant images. Before processing, FFADL-ARPDC removes noise and enhances contrast using bilateral filtering (BF). Additionally, rice crop images are processed using the NASNetLarge deep learning architecture to extract image features. The FFA is used for hyperparameter tweaking to optimise the performance of NASNetLarge, which aids in enhancing classification performance. Using an Elman recurrent neural network (ERNN), the model accurately categorises 14 types of pests. The FFADL-ARPDC approach is thoroughly evaluated on a benchmark dataset available in a public repository. With an accuracy of 97.58%, the FFADL-ARPDC model exceeds existing pest detection methods.
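The bilateral-filtering preprocessing step named above can be sketched in plain NumPy. This is a generic textbook bilateral filter, not the authors' implementation; the radius and the two sigmas are illustrative assumptions.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Edge-preserving smoothing: each output pixel is a weighted mean of its
    neighbourhood, where weights combine spatial closeness and intensity
    similarity, so noise is averaged out but sharp edges are retained."""
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(i - radius, 0), min(i + radius + 1, h)
            j0, j1 = max(j - radius, 0), min(j + radius + 1, w)
            patch = img[i0:i1, j0:j1]
            yy, xx = np.mgrid[i0:i1, j0:j1]
            spatial = np.exp(-((yy - i) ** 2 + (xx - j) ** 2) / (2 * sigma_s ** 2))
            rng = np.exp(-((patch - img[i, j]) ** 2) / (2 * sigma_r ** 2))
            wgt = spatial * rng
            out[i, j] = (wgt * patch).sum() / wgt.sum()
    return out
```

On a noisy step-edge image, the flat regions are smoothed while the step itself survives, which is the property that makes BF a common denoising choice before feature extraction.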
Inherently low spatial resolution is a well-known challenge in electrical impedance tomography (EIT) image reconstruction. Various approaches, such as the use of spatial priors and post-processing techniques, have been proposed to improve the resolution, but comparisons on a common dataset representative of clinical images have not been reported in the literature. Here, we consider a database of 81,710 simulated EIT datasets constructed from pulmonary CT scans of 89 infants. Four techniques for improved image resolution, and several combinations thereof, are proposed and compared quantitatively on 16,341 known test cases reserved from the database. The techniques include an end-to-end deep learning reconstruction approach, post-processing of real-time one-step Gauss-Newton (GN) reconstructions using machine learning, post-processing using the Schur complement method, the use of an initial guess for the one-step GN method derived from the image database, and a method that makes use of the eigenfunctions of the principal component analysis of image vectors in the database. All methods improved the error metrics relative to the Newton one-step error reconstruction method used as the basis for comparison.
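The one-step GN update mentioned above can be sketched for a linear toy forward model. The Tikhonov (identity) regularizer and the `lam` value are illustrative assumptions, not details from the paper, which uses clinically motivated regularization.

```python
import numpy as np

def one_step_gn(J, v_meas, v0, sigma0, lam=1e-2):
    """Single linearized Gauss-Newton update about the guess sigma0:
    sigma = sigma0 + (J^T J + lam*I)^(-1) J^T (v_meas - v0),
    where J is the forward-model Jacobian and v0 = F(sigma0)."""
    A = J.T @ J + lam * np.eye(J.shape[1])
    return sigma0 + np.linalg.solve(A, J.T @ (v_meas - v0))
```

The "initial guess derived from the image database" idea corresponds to choosing `sigma0` from representative images rather than a homogeneous background, which moves the linearization point closer to the truth before the single update is taken.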
As a critical load-bearing and running component of railway systems, the wheelset's operational safety fundamentally depends on precise detection and localisation of tread defects. Current deep learning-based detection methods face significant challenges in extracting discriminative edge features under small-sample conditions, leading to suboptimal defect localisation accuracy. To address these limitations, this study proposes TLEAR-Net, a novel defect detection framework integrating transfer learning with an edge-adaptive reinforcement attention mechanism. The methodology employs RetinaNet as the baseline architecture, enhanced through multi-stage domain adaptation using COCO 2017 pretraining and parameter-shared ResNet-50 backbone optimisation to bridge cross-domain feature discrepancies. An innovative edge-adaptive reinforcement (EAR) attention module is developed to selectively amplify defect boundary features through learnable gradient operators and hybrid spatial-channel attention mechanisms. Comprehensive evaluations on a proprietary data set of annotated defect samples demonstrate the framework's superior performance, achieving state-of-the-art detection accuracy (89.22% mAP) while maintaining real-time processing capability (42.45 FPS).
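The idea of amplifying boundary features through gradient operators plus attention can be sketched as follows. Fixed Sobel kernels stand in for the paper's learnable gradient operators, and the sigmoid gating is an illustrative assumption, not the actual EAR module.

```python
import numpy as np

def sobel_edges(x):
    """Gradient magnitude via fixed Sobel kernels (stand-ins for learnable ones)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(x, 1, mode="edge")
    gx, gy = np.zeros_like(x), np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)

def edge_attention(feat):
    """Reweight a (non-negative) feature map so responses near edges are
    amplified: gate = sigmoid(centered edge magnitude), out = feat*(1+gate)."""
    e = sobel_edges(feat)
    gate = 1.0 / (1.0 + np.exp(-(e - e.mean())))
    return feat * (1.0 + gate)
```

On a step-like feature map, responses at the boundary are boosted roughly twice as much as responses in flat regions, which is the qualitative behaviour the EAR module targets for defect boundaries.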
Tomato, as an essential food crop, is consumed worldwide, and at the same time it is susceptible to several diseases that reduce tomato yield. Proper diagnosis of tomato diseases is required to increase the output of tomato crops. For this purpose, this paper proposes a tomato plant disease detection algorithm based on the Pyramid Scene Parsing Network (PSPNet) and deep learning. First, the training data set is augmented to alleviate the data imbalance problem in each category, and then the augmented images are fed into the proposed Mob-PSP network for training. The proposed network utilizes the lightweight MobileNet-V2 model as the feature extraction technique while integrating the PSPNet module to enhance the network's detection performance. The aim is to effectively extract local and global features from plant disease images for plant disease detection. This study evaluated the model on the tomato subset of the public data set PlantVillage. The experimental results demonstrate that this algorithm achieves a balance between inference speed and detection accuracy, outperforming other state-of-the-art algorithms. Additionally, compared to the baseline model Inception-V3, the inference speed is improved by 10.73 frames per second, while maintaining an average accuracy of 99.69% with only 6.5M parameters.
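The PSPNet pyramid pooling idea referenced above can be sketched for a single-channel feature map. The bin sizes and nearest-neighbour upsampling are illustrative choices, not the Mob-PSP configuration (which the abstract does not specify).

```python
import numpy as np

def pyramid_pooling(feat, bins=(1, 2, 4)):
    """PSPNet-style pooling: average-pool the map onto several coarse grids
    (capturing global context), upsample each back to full resolution, and
    stack them with the original (local) features. Assumes the feature map's
    height and width are divisible by each bin size."""
    h, w = feat.shape
    pooled = [feat]
    for b in bins:
        grid = np.zeros((b, b))
        for i in range(b):
            for j in range(b):
                grid[i, j] = feat[i*h//b:(i+1)*h//b, j*w//b:(j+1)*w//b].mean()
        # nearest-neighbour upsample back to the input resolution
        pooled.append(np.repeat(np.repeat(grid, h // b, 0), w // b, 1))
    return np.stack(pooled)  # shape: (1 + len(bins), h, w)
```

The 1×1 bin channel is simply the global mean broadcast everywhere, which is how the module injects image-level context alongside the local features.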
This paper provides an insight into the development of a state-of-the-art video processing system to address limitations within Durham University's 'Encore' lecture capture solution. The aim of the research described in this paper is to digitally remove the persons presenting from the view of a whiteboard to provide students with a more effective online learning experience. This work enlists a 'human entity detection module', which uses a remodelled version of the Fast Segmentation Neural Network to perform efficient binary image segmentation, and a 'background restoration module', which introduces a novel procedure to retain only background pixels in consecutive video frames. The segmentation network is trained from the outset with a Tversky loss function on a dataset of images extracted from various Tik-Tok dance videos. The most effective training techniques are described in detail, and it is found that these produce asymptotic convergence to within 5% of the final loss in only 40 training epochs. A cross-validation study then concludes that a Tversky parameter of 0.9 is optimal for balancing recall and precision in the context of this work. Finally, it is demonstrated that the system successfully removes the human form from the view of the whiteboard in a real lecture video. Whilst the system is believed to have the potential for real-time usage, it is not possible to prove this owing to hardware limitations. In the conclusions, wider application of this work is also suggested.
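The "background restoration module" idea, retaining only background pixels across consecutive frames, can be sketched as follows. The update rule shown is a minimal interpretation of the described procedure, not the paper's actual algorithm.

```python
import numpy as np

def restore_background(frames, masks, init=0.0):
    """Accumulate a whiteboard image over time: wherever the presenter mask is
    False, refresh the stored background from the current frame; wherever it is
    True (presenter present), retain the last known background pixel."""
    bg = np.full_like(frames[0], init, dtype=float)
    for frame, mask in zip(frames, masks):
        bg = np.where(mask, bg, frame)  # mask==True marks presenter pixels
    return bg
```

Once the presenter has moved across the board, every pixel has been observed unoccluded at least once, so the full whiteboard is recovered.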
Place recognition is a critical technology in robot navigation and autonomous driving, yet it remains challenging due to inefficient point cloud computation, limited feature representation capability, and poor robustness to long-term environmental changes. We propose MVSE-Net, a feature extraction network with embedded semantic information for multi-view feature fusion. MVSE-Net converts point cloud data acquired by LiDAR in real time into global descriptors for retrieval. Processing a point cloud by projecting it onto a 2D image can greatly improve computational efficiency. We project the point cloud into a range-view (RV) image and a bird's-eye-view (BEV) image, in the forward and top views respectively. A semantic segmentation network is then used to process the RV image, and the feature extraction part of the semantic model is connected to a transformer attention module to further refine the features for the place recognition task. The point cloud containing the semantic segmentation results is then converted into a semantic BEV image, and the multi-channel BEV image is processed using a group convolutional network. Finally, the features of the two branches are fused into a global feature representation by post-fusion. Our experiments on three publicly available datasets demonstrate that MVSE-Net exhibits high recall and strong generalization in LiDAR place recognition, outperforming previous state-of-the-art methods.
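The BEV projection step described above can be sketched as a simple occupancy rasterization. The grid extent and resolution are illustrative assumptions; the real pipeline additionally fills the BEV channels with semantic labels rather than plain occupancy.

```python
import numpy as np

def points_to_bev(points, x_range=(-10, 10), y_range=(-10, 10), grid=20):
    """Project LiDAR points (N, 3) onto a top-view occupancy grid by binning
    their x/y coordinates; points outside the range are discarded."""
    bev = np.zeros((grid, grid), dtype=np.uint8)
    xs = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * grid).astype(int)
    ys = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * grid).astype(int)
    keep = (xs >= 0) & (xs < grid) & (ys >= 0) & (ys < grid)
    bev[ys[keep], xs[keep]] = 1
    return bev
```

The appeal of this projection is that an unordered point set becomes a dense 2D image, so standard convolutional networks can process it efficiently.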
Deep learning methods have shown promising results in camera-based object detection. However, their effectiveness is significantly hindered in real-world scenes with poor illumination. In contrast, the trend of fusing millimeter-wave (mmWave) radar with cameras for object detection is gaining momentum, because radar can effectively compensate for the limitations of cameras under poor illumination conditions. To this end, we propose ODSen, a lightweight, real-time, and robust fusion detection system. Specifically, ODSen presents several key advantages over existing sensor fusion methods: 1) despite fusing two sensing modalities in a deep learning-based approach, it requires only a small amount of multimodal data from new scenes to achieve real-time and robust object detection; 2) it employs a decoupled architecture that can switch between different image detectors to improve detection accuracy; 3) it performs spatiotemporal fusion of radar and camera features and employs a box refinement model to enhance computational efficiency, thereby ensuring real-time performance without compromising detection robustness. We collect a radar and camera dataset with diverse scenes on a university campus and conduct extensive experiments, verifying that the proposed ODSen outperforms state-of-the-art methods in terms of average precision (AP) while yielding low computational cost.
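The radar-camera fusion idea can be illustrated with a minimal late-fusion sketch, assuming radar returns have already been projected into the image plane. The association rule and confidence boost below are hypothetical stand-ins, not ODSen's actual spatiotemporal fusion.

```python
import numpy as np

def associate_radar_to_boxes(radar_uv, boxes):
    """Count projected radar returns falling inside each camera detection box.
    radar_uv: (N, 2) image-plane radar points; boxes: (M, 4) as (x1, y1, x2, y2)."""
    hits = np.zeros(len(boxes), dtype=int)
    for (u, v) in radar_uv:
        for k, (x1, y1, x2, y2) in enumerate(boxes):
            if x1 <= u <= x2 and y1 <= v <= y2:
                hits[k] += 1
    return hits

def refine_scores(scores, hits, boost=0.1):
    """Boost camera confidence for boxes corroborated by at least one radar return."""
    return np.clip(np.asarray(scores) + boost * (hits > 0), 0.0, 1.0)
```

The intuition matches the abstract's motivation: in poor illumination the camera score alone is unreliable, so radar corroboration raises confidence for boxes the radar also "sees".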
Real-time instance segmentation in urban environments remains a critical challenge for autonomous driving systems, where occluded objects, cluttered backgrounds, and dynamic scales demand both high accuracy and computational efficiency. Traditional methods often sacrifice precision for speed or vice versa, failing to address the dual demands of urban scene understanding. Motivated by the need to bridge this gap, we propose PSC-YOLO, a lightweight framework driven by two core design principles: (1) enhancing multi-scale feature learning to resolve occlusion ambiguities and (2) enabling real-time interaction without compromising segmentation quality. Simultaneously, inspired by the adaptability of the Segment Anything Model (SAM), we streamline its mask decoding via architectural simplification, enabling efficient pixel-level reasoning crucial for real-time urban perception. Experiments on urban road datasets demonstrate that PSC-YOLO outperforms YOLOv8n-seg by 2.0% in mask average precision while operating at 91 FPS, 4× faster than FastSAM. This work prioritizes the intrinsic requirements of urban perception systems: balancing precision for safety-critical tasks and speed for real-time decision-making, thereby advancing deployable solutions for autonomous vehicles and smart city infrastructure.
Diagnosing plant diseases is a vital issue in maintaining and developing agricultural products. These diseases manifest as changes in the tissue of different parts of plants. While previous research has only been conducted on certain plant species and specific parts, we propose a comprehensive approach that reaches high accuracy in diagnosing and classifying diseases across different plant species by examining their different parts, including leaves, fruits, tree trunks, and seeds. We extract features from different layers of pre-trained AlexNet, ResNet50, VGG16, EfficientNetB0, EfficientNetB3, and EfficientNetB7 deep models with a spatial attention module, and classify samples with an SVM classifier with an RBF kernel. In order to automate the detection of plant diseases by manned or unmanned agricultural machines, our proposed Agry requires only 0.04089 seconds for image processing and decision making in real time. Also, owing to the use of transfer learning, the cost of building the proposed model, in both time and resources, is minimized. The test results show a significant improvement compared to previous works, and in most cases the classification is performed without errors.
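The final classification stage (deep features fed to an SVM with an RBF kernel) can be sketched as follows. For self-containment, a toy kernel nearest-class-mean classifier stands in for the SVM; the `gamma` value and the 2-D toy features are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2), the kernel an RBF-SVM uses."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_nearest_mean(train_X, train_y, test_X, gamma=0.5):
    """Toy stand-in for the SVM: assign each test sample to the class with the
    highest mean RBF similarity to that class's training features."""
    classes = np.unique(train_y)
    K = rbf_kernel(test_X, train_X, gamma)
    scores = np.stack([K[:, train_y == c].mean(1) for c in classes], axis=1)
    return classes[scores.argmax(1)]
```

In the actual pipeline, `train_X` would hold feature vectors extracted from intermediate layers of the pre-trained CNNs after spatial attention, and a trained SVM would replace the nearest-mean rule.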