Dense panoptic prediction is a key ingredient in many existing applications such as autonomous driving, automated warehouses, and remote sensing. Many of these applications require fast inference over large input resolutions on affordable or even embedded hardware. We propose to achieve this goal by trading off backbone capacity for multi-scale feature extraction. In comparison with contemporaneous approaches to panoptic segmentation, the main novelties of our method are efficient scale-equivariant feature extraction, cross-scale upsampling through pyramidal fusion, and boundary-aware learning of pixel-to-instance assignment. The proposed method is very well suited to remote sensing imagery because of the huge number of pixels in typical city-wide and region-wide datasets. We present panoptic experiments on Cityscapes, Vistas, COCO, and the BSB-Aerial dataset. Our models outperform the state of the art on the BSB-Aerial dataset while processing more than a hundred 1 MPx images per second on an RTX3090 GPU with FP16 precision and TensorRT optimization.
Age-invariant face recognition has many real-world applications. Despite significant advances in this field, it is still challenging to recognize faces accurately across ages, because a person's face changes significantly over time, which leads to large intra-class variation. In this paper, we use sub-pixel interpolation over an extracted face as a pre-processing step that increases image resolution for age-invariant face recognition. We employ the Xception deep learning architecture on the CASIA-WebFace dataset for our experiments. We show that recognition accuracy improves significantly when sub-pixel interpolation is used, compared with giving images to the deep learning model without any pre-processing.
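The pre-processing idea above can be illustrated with a minimal sketch. The abstract does not say which interpolation kernel the authors use, so bilinear interpolation is swapped in here purely as an assumption; the function name `bilinear_upscale` and the integer `factor` parameter are hypothetical and chosen only for illustration.

```python
import math


def bilinear_upscale(image, factor):
    """Upscale a 2-D grayscale image (list of rows) by an integer factor.

    Bilinear interpolation is an assumed stand-in for the paper's
    sub-pixel interpolation step, which the abstract does not specify.
    """
    height, width = len(image), len(image[0])

    def source_coord(index, size):
        # Map the centre of an output pixel back to fractional (sub-pixel)
        # input coordinates and clamp to the valid image range.
        return min(max((index + 0.5) / factor - 0.5, 0.0), size - 1.0)

    output = []
    for i in range(height * factor):
        y = source_coord(i, height)
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, height - 1)
        wy = y - y0
        row = []
        for j in range(width * factor):
            x = source_coord(j, width)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, width - 1)
            wx = x - x0
            # Blend the four neighbouring input pixels.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        output.append(row)
    return output
```

In a pipeline of the kind described, a cropped face would be passed through such a function before being resized to the network's input size; whether the gain reported in the abstract depends on the specific kernel is not something this sketch can show.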
ISBN (digital): 9781837241910
High-quality badminton shuttlecock heads are made from cork discs derived from natural oak bark. These discs exhibit complex cracks and wrinkles, which makes quality screening inefficient and inconsistent. To address this, we propose a deep learning-based detection algorithm using YOLOv5. The YOLOv5 model was optimized for cork disc characteristics by introducing an attention mechanism to enhance feature representation and by designing a post-processing algorithm to improve detection accuracy. The optimized model, trained on a custom cork disc dataset using an NVIDIA RTX3080 GPU, achieved 86.7% mF1 and 81.5% mAP, outperforming other mainstream algorithms. Finally, the system uses NVIDIA's Jetson Nano edge computing device as its computational core, with the YOLOv5 model deployed and the graphical interface designed on Ubuntu 18.04. Real-time cork disc image acquisition is achieved using binocular industrial cameras and fiber optic sensors, while a uniformly rotating turntable and a diversion device are designed to facilitate the transfer of cork discs. Experimental results show that the system achieves 92.75% classification accuracy over four quality grades with an inference time of 87.6 ms, meeting the requirements for real-time cork disc quality inspection.
Purpose: Book sorting is one specific application in smart library scenarios, and sorting systems based on RFID (radio-frequency identification) technology are now widely used in most libraries. Book identification is one of the core parts of a book sorting system, and its efficiency and accuracy are critical to all libraries. In this paper, the authors propose a new image recognition method that identifies library books by combining barcode decoding with deep learning optical character recognition (OCR), and describe its application to library book identification. Design/methodology/approach: The identification process relies on recognizing images or videos of the book cover as it moves on a conveyor belt. A barcode is printed on or attached to the surface of each book. A deep learning OCR program is applied to improve recognition accuracy, especially when the barcode is blurred or faded. The proposed approach is robust, with high accuracy and good performance, even when input pictures are not high resolution and the book covers are not always vertical. Findings: The proposed method with deep learning OCR achieves the best accuracy under vertical, skewed and blurred image conditions. Originality/value: Experiments demonstrate that the accuracy of the proposed method is high in real-time tests and remains good even when the barcode is blurred. Deep learning is very effective at analyzing image content, and a corresponding family of methods has emerged for video content understanding, which can be a further advantage in intelligent library applications.
Automated quality control of pavement and concrete surfaces is essential for maintaining structural integrity and consistency in the construction and infrastructure industries. This paper presents a novel deep learning model designed for automated quality control of these surfaces during both construction and maintenance phases. The model employs per-pixel segmentation and per-image classification, integrating both local and broader context information. Additionally, we use the classification results to improve segmentation during both training and inference. We evaluated the proposed model on a publicly available dataset containing more than 7,000 images of pavement and concrete cracks. The model achieved a Dice score of 81% and an intersection-over-union of 71%, surpassing publicly available state-of-the-art methods by at least 6-7 percentage points. An ablation study confirms that leveraging classification information enhances overall segmentation performance. Furthermore, our model is computationally efficient, processing 512 × 512 images at over 30 FPS, which makes it suitable for real-time applications on medium-resolution images. Code and the corrected dataset ground truths are publicly available: https://***/vicoslab/***.
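For readers less familiar with the two reported metrics, the following is a minimal sketch of how a Dice score and an intersection-over-union are typically computed for one pair of binary masks. The function name `dice_and_iou` and the convention of returning 1.0 for two empty masks are assumptions made for illustration; the paper's own evaluation code may aggregate over images differently.

```python
def dice_and_iou(pred, target):
    """Compute (Dice, IoU) for two same-shape binary masks given as nested lists."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # crack pixel predicted and present
            elif p and not t:
                fp += 1          # predicted but absent in ground truth
            elif t and not p:
                fn += 1          # present in ground truth but missed
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / union
    return dice, iou
```

For a single mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU), so they always rank a pair of predictions the same way. Scores averaged over a dataset need not satisfy that identity exactly.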
Machine learning (ML) and deep learning (DL) have achieved great success in many tasks, including computer vision, image segmentation, natural language processing, classification, time series analysis, and predicting values from a series of variables. As artificial intelligence progresses, new techniques are being applied to areas such as optical spectroscopy and its uses in specific fields, including the agrifood industry. The performance of ML and DL techniques generally improves with the amount of data available. However, it is not always possible to obtain all the data needed to build a robust dataset. In agrifood applications in particular, dataset collection is generally constrained to specific periods. Weather conditions can also make it harder to cover the entire range of classes, which produces imbalanced datasets. To address this issue, data augmentation (DA) techniques are employed to expand the dataset by adding slightly modified copies of existing data. The result is a dataset that includes values from laboratory tests together with synthetic data based on the real data. This review presents the application of DA techniques to optical spectroscopy datasets obtained from real agrifood industry applications. The reviewed methods range from simple DA techniques, such as duplicating samples with slight changes, to more complex algorithms based on deep learning, namely generative adversarial networks (GANs) and semi-supervised generative adversarial networks (SGANs).
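The simplest technique mentioned above, duplicating samples with slight changes, can be sketched as follows. The particular perturbations (a small multiplicative scale, a baseline offset, and additive Gaussian noise), their default magnitudes, and the function names are all assumptions chosen for illustration, not values taken from the review.

```python
import random


def augment_spectrum(spectrum, rng, noise_std=0.01, max_scale=0.05, max_offset=0.02):
    """Return a slightly modified copy of one spectrum (list of intensities)."""
    scale = 1.0 + rng.uniform(-max_scale, max_scale)    # intensity scaling
    offset = rng.uniform(-max_offset, max_offset)       # baseline shift
    return [scale * value + offset + rng.gauss(0.0, noise_std) for value in spectrum]


def augment_dataset(spectra, labels, copies, seed=0):
    """Append `copies` perturbed versions of every spectrum, keeping its label."""
    rng = random.Random(seed)  # seeded so the synthetic data is reproducible
    out_spectra = [list(s) for s in spectra]
    out_labels = list(labels)
    for spectrum, label in zip(spectra, labels):
        for _ in range(copies):
            out_spectra.append(augment_spectrum(spectrum, rng))
            out_labels.append(label)
    return out_spectra, out_labels
```

To counter class imbalance, `copies` could be chosen per class so that under-represented classes receive more synthetic samples. Any such augmentation should be applied only to the training split, so that evaluation still reflects real measurements.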
Object detection in real-world scenarios is a subfield of computer vision that uses state-of-the-art deep learning algorithms and techniques to identify and locate the objects in an image or video. Latest advanceme...
Deep neural network (DNN) methods play an essential role in hyperspectral classification. However, the massive parameter counts and computing overhead of DNNs need to be reduced for deployment with limited storage and computing resources in real-time applications, especially given the high dimensionality of hyperspectral images. Applying dimension reduction (DR) is therefore a crucial pre-processing step in many studies. Still, most of them ignore feature restoration after the data have been transformed by DR. On the network side, many works still rely on sophisticated skip connections and dense feature reuse, which can lead to feature redundancy and increased computational complexity, especially when DR has been applied first. Motivated by these issues, this article proposes an efficient joint framework assisted by an embedded feature smoother (FS) and sparse skip connections (SSC). Instead of feeding DR data directly into the subsequent network, we embed a computationally cheap FS based on isotropic total variation to restore and enhance the spatial features. Furthermore, we propose an SSC 3D convolutional neural network to complete spatial-spectral feature representation and classification. The SSC is embodied in a log2n-skip connection design that concatenates feature maps in place of dense connections, pruning the number of channels and reducing the model parameters. Experimental results show that the embedded FS significantly improves classification accuracy and is superior to other processing methods. Our framework is clearly superior to other state-of-the-art deep learning-based methods in both classification performance and model size, especially when few training samples are available. Moreover, when all processing steps are counted, our framework has competitively low time consumption.
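To give a feel for why a logarithmic skip pattern is sparser than dense connectivity, here is a minimal sketch. The abstract does not spell out the exact wiring, so the rule used here (layer i concatenates the outputs of layers i - 1, i - 2, i - 4, ..., with layer 0 standing for the network input) is an assumption, and `log2_skip_sources` is a hypothetical helper name.

```python
def log2_skip_sources(layer_index):
    """Indices of earlier layers feeding `layer_index` under an assumed
    power-of-two skip rule (layer 0 denotes the network input)."""
    sources = []
    step = 1
    while step <= layer_index:
        sources.append(layer_index - step)
        step *= 2  # skip distances grow as 1, 2, 4, 8, ...
    return sources


# Layer 8 would concatenate only four earlier outputs: [7, 6, 4, 0]
example = log2_skip_sources(8)

# Connection counts for a 16-layer block under each scheme.
sparse_total = sum(len(log2_skip_sources(i)) for i in range(1, 17))
dense_total = sum(i for i in range(1, 17))  # dense: every earlier layer
```

Under this assumed rule each layer receives on the order of log2(i) inputs instead of i, so the number of concatenated channels grows far more slowly with depth, which is consistent with the parameter reduction the abstract describes.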
ISBN (digital): 9798331520526
ISBN (print): 9798331520533
Reconstructing dynamic MRI image sequences from undersampled accelerated measurements is crucial for faster, higher spatiotemporal resolution real-time imaging of cardiac motion, free-breathing motion and many other applications. Classical paradigms, such as gated cine MRI, assume periodicity, which rules out imaging of true motion. Supervised deep learning methods are fundamentally flawed because, in dynamic imaging, ground-truth fully sampled videos cannot truly be obtained. We propose an unsupervised framework that learns to reconstruct dynamic MRI sequences from undersampled measurements alone by leveraging natural geometric spatiotemporal equivariances of MRI. Dynamic Diffeomorphic Equivariant Imaging (DDEI) significantly outperforms state-of-the-art unsupervised methods such as SSDU on highly accelerated dynamic cardiac imaging. Our method is agnostic to the underlying neural network architecture and can be used to adapt the latest models and post-processing approaches. Our code and video demos are at https://***/Andrewwango/ddei.
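The notion of "undersampled measurements" can be made concrete with a small sketch of the usual single-coil forward model, in which a binary mask keeps only some k-space samples of each frame. This is not the DDEI method itself, only the measurement setting it starts from; the 8 × 8 random frame, the every-second-line mask, and the function names are assumptions for illustration.

```python
import numpy as np


def undersample(frame, mask):
    """Forward model: 2-D Fourier transform followed by a sampling mask."""
    kspace = np.fft.fft2(frame, norm="ortho")
    return kspace * mask  # unsampled k-space locations become zero


def zero_filled_recon(measurements):
    """Naive reconstruction: inverse transform with missing samples left at zero."""
    return np.fft.ifft2(measurements, norm="ortho")


rng = np.random.default_rng(0)
frame = rng.standard_normal((8, 8))      # stand-in for one image frame
mask = np.zeros((8, 8), dtype=bool)
mask[::2, :] = True                      # keep every second line (2x acceleration)

y = undersample(frame, mask)
x_zf = zero_filled_recon(y)              # aliased estimate of `frame`
```

Because half of k-space is discarded, the zero-filled estimate differs from the original frame, and many different images are consistent with the same measurements. Methods of the kind described in the abstract aim to resolve that ambiguity without access to fully sampled ground truth.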
ISBN (digital): 9798331515706
ISBN (print): 9798331515713
Because earthquake damage to urban building complexes is diverse and complex in its patterns, geometries, and spatial scales, conventional identification and assessment methods generalize poorly to real post-earthquake scenarios. Compared with time-domain signals such as acceleration, image and video data provide a new source of perceptual information for accurately assessing post-earthquake damage to urban building complexes. To realize integrated, comprehensive, and rapid identification and assessment of structural damage to urban buildings after an earthquake, this paper proposes a geometrically constrained deep learning framework for seismic damage identification and assessment of buildings based on computer vision, and systematically studies multi-scale identification and assessment across three levels: building groups, building units, and structural components. A method is proposed for fine identification of densely distributed small-target buildings and rapid assessment of their post-earthquake collapse state from high-altitude satellite remote sensing images. A semantic segmentation network for post-earthquake building cluster identification and assessment is built, the influence of the GCE loss weight coefficients on the model's segmentation performance is systematically investigated, and the geometric feature optimization performance of the GCE loss during training and its multi-level feature extraction ability are analyzed, which verifies the effectiveness and accuracy of the geometrically constrained deep learning method for multi-scale