Accurately identifying field weeds is crucial for selecting appropriate agricultural machinery and herbicides. This paper focuses on nine common weed species during the seedling stage in natural backgrounds of field v...
This work proposes a novel concept for tree and plant reconstruction by directly inferring a Lindenmayer-System (L-System) word representation from image data in an image captioning approach. We train a model end-to-e...
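For readers unfamiliar with L-System words, the following is a minimal sketch of how such a word is generated in the forward direction by repeatedly rewriting symbols. It is not the paper's model, which infers the word from an image; the function name `expand_lsystem` and the example rules are illustrative assumptions.

```python
def expand_lsystem(axiom: str, rules: dict, iterations: int) -> str:
    """Rewrite every symbol of the word in parallel, `iterations` times.

    Symbols without a rule (for example brackets or turn commands)
    are copied unchanged.
    """
    word = axiom
    for _ in range(iterations):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


# Hypothetical bracketed rule for a branching plant:
# F = grow forward, + / - = turn, [ ] = push / pop the drawing state.
PLANT_RULES = {"F": "F[+F]F[-F]F"}

if __name__ == "__main__":
    print(expand_lsystem("F", PLANT_RULES, 2))
```

A reconstruction model in the spirit of the abstract would output a string of this kind, which a turtle-graphics interpreter could then turn back into geometry.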
ISBN: (print) 9798350302493
We present a novel multi-modal data fusion technique using topological features. The method, TopFusion, leverages the flexibility of topological data analysis tools (namely persistent homology and persistence images) to map multi-modal datasets into a common feature space by forming a new multi-channel persistence image. Each channel in the image represents a view of the data from a modality-dependent filtration. We demonstrate that this topological perspective allows for more effective data reconstruction, i.e., imputation. In particular, by performing imputation in topological feature space, we outperform the same imputation techniques applied to raw data or to alternatively derived features. We show that TopFusion representations can be used as input to downstream deep learning-based computer vision models, and that doing so achieves performance comparable to other fusion methods for classification on two multi-modal datasets.
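The idea of a multi-channel persistence image can be sketched as follows. This is a simplified illustration, not the TopFusion implementation: it assumes the persistence diagrams (lists of birth/death pairs) have already been computed for each modality, evaluates the Gaussian at pixel centers instead of integrating over each pixel, and uses a linear persistence weight. The names `persistence_image` and `fuse` and all parameter defaults are assumptions.

```python
import math


def persistence_image(diagram, resolution=8, sigma=0.1, extent=(0.0, 1.0)):
    """Rasterize one persistence diagram into a resolution x resolution grid.

    Each (birth, death) pair is moved to (birth, persistence) coordinates
    and contributes a Gaussian bump weighted by its persistence.
    Rows index the persistence axis, columns index the birth axis.
    """
    lo, hi = extent
    step = (hi - lo) / resolution
    centers = [lo + (i + 0.5) * step for i in range(resolution)]
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        persistence = death - birth
        for row, y in enumerate(centers):
            for col, x in enumerate(centers):
                sq_dist = (x - birth) ** 2 + (y - persistence) ** 2
                image[row][col] += persistence * math.exp(-sq_dist / (2 * sigma ** 2))
    return image


def fuse(diagrams_by_modality, **kwargs):
    """Stack one persistence image per modality as channels of one image."""
    return [persistence_image(diagram, **kwargs) for diagram in diagrams_by_modality]


if __name__ == "__main__":
    # Hypothetical diagrams from two modalities.
    fused = fuse([[(0.1, 0.5)], [(0.2, 0.4), (0.3, 0.9)]], resolution=4)
    print(len(fused), "channels of size", len(fused[0]), "x", len(fused[0][0]))
```

Because every channel lives on the same grid regardless of the modality it came from, the stacked result can be passed to a standard image model or to an imputation routine that fills in a missing channel.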
Within the realm of image data manipulation, enhancing resolution and reconstructing facial visuals stand as pivotal methods designed to restore poorly defined images impacted by unidentified deterioration mechanisms....
Demosaicking is a critical process in the digital imaging pipeline, tasked with reconstructing full-color images from sampled data captured by R/G/B color sensors. The challenge arises from two-thirds of the pixel dat...
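To make the sampling problem concrete, the sketch below simulates a color filter array that keeps a single channel per pixel, so that only one of every three color values survives. The RGGB layout and the helper names `bayer_channel` and `mosaic` are assumptions for illustration; the abstract does not specify a filter pattern.

```python
def bayer_channel(row: int, col: int) -> str:
    """Return the channel kept at (row, col) under an assumed RGGB layout."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def mosaic(image):
    """Reduce rows of (r, g, b) tuples to rows of (channel, value) samples."""
    index = {"R": 0, "G": 1, "B": 2}
    sampled = []
    for r, row in enumerate(image):
        sampled_row = []
        for c, pixel in enumerate(row):
            channel = bayer_channel(r, c)
            sampled_row.append((channel, pixel[index[channel]]))
        sampled.append(sampled_row)
    return sampled


if __name__ == "__main__":
    # Small synthetic 4 x 4 RGB image.
    image = [[(r * 4 + c, 100 + r * 4 + c, 200 + r * 4 + c) for c in range(4)]
             for r in range(4)]
    samples = mosaic(image)
    kept = sum(len(row) for row in samples)
    total = 3 * kept
    print(f"kept {kept} of {total} color values")
```

A demosaicking method has to estimate the two discarded values at every pixel from the neighboring samples.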
ISBN: (print) 9798350359329; 9798350359312
Remote sensing plays a crucial role in various fields. However, challenges associated with acquiring high-resolution data from satellite cameras significantly limit its practical applications. The high semantic density per pixel in satellite images makes it challenging for existing methods to extract adequate geometric and semantic information from extremely low-resolution inputs for super-resolution reconstruction. This paper introduces a satellite imagery super-resolution architecture guided by ground-view images, framing the problem as neural pixel synthesis with satellite camera height as a variable factor. The approach comprises a hypergraph-based cross-view mapper module, which achieves low-order geometric registration and high-order feature fusion by capturing cross-view visual correlations, and a height-based pixel synthesizer for continuous multi-level super-resolution, conceptualized as neural rendering. Furthermore, we have developed a multi-level resolution satellite image dataset, complete with ground images from corresponding locations. Extensive experiments on diverse datasets validate the effectiveness of our proposed method in a range of application scenarios.
ISBN: (print) 9798350377712; 9798350377705
3D detection is a critical task that enables machines to identify and locate objects in three-dimensional space. It has a broad range of applications in several fields, including autonomous driving, robotics, and augmented reality. Monocular 3D detection is attractive because it requires only a single camera; however, it lacks the accuracy and robustness required for real-world applications. High-resolution LiDAR, on the other hand, can be expensive and can cause interference problems in heavy traffic because of its active transmissions. We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection. Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor. Specifically, we use only 512 points, which is just 1% of a full LiDAR frame in the KITTI dataset. Our method reconstructs a complete 3D point cloud from this limited 3D information combined with a single image. The reconstructed 3D point cloud and the corresponding image can be used by any off-the-shelf multi-modal detector for 3D object detection. By using the proposed network architecture with an off-the-shelf multi-modal 3D detector, the accuracy of 3D detection improves by 20% compared to state-of-the-art monocular detection methods and by 6% to 9% compared to the baseline multi-modal methods on the KITTI and JackRabbot datasets.
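The sparse-input setting can be illustrated with a minimal sketch that reduces a point cloud to 512 points. The abstract does not say how the 512 points are selected, so uniform random sampling is an assumption here, as are the function name `subsample` and the synthetic cloud, whose size is chosen only so that 512 points make up 1% of it.

```python
import random


def subsample(points, k=512, seed=0):
    """Return k points drawn uniformly without replacement.

    If the cloud already has k points or fewer, it is returned unchanged.
    """
    if len(points) <= k:
        return list(points)
    rng = random.Random(seed)
    return rng.sample(points, k)


if __name__ == "__main__":
    # Synthetic stand-in for a dense frame: (x, y, z) tuples.
    gen = random.Random(42)
    cloud = [(gen.uniform(-40, 40), gen.uniform(-40, 40), gen.uniform(-2, 2))
             for _ in range(51200)]
    sparse = subsample(cloud)
    print(f"{len(sparse)} of {len(cloud)} points kept "
          f"({len(sparse) / len(cloud):.0%})")
```

A reconstruction network of the kind described would take these sparse points together with the camera image and predict a dense point cloud for a downstream detector.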
ISBN: (print) 9781510671577; 9781510671560
Medical image classification plays a vital role in disease diagnosis, tumor staging, and various clinical applications. Deep learning (DL) methods have become increasingly popular for medical image classification. However, medical images have unique characteristics that pose challenges for training DL-based models, including limited annotated data, imbalanced class distributions, and large variations in lesion structures. Self-supervised learning (SSL) methods have emerged as a promising way to alleviate these issues by directly learning useful representations from large-scale unlabeled data. In this study, a new generative self-supervised learning method based on the StyleGAN generator is proposed for medical image classification. The style generator, pre-trained on large-scale unlabeled data, is integrated into the classification framework to extract style features that encapsulate essential semantic information from input images through image reconstruction. The extracted style feature serves as an auxiliary regularization term, leveraging knowledge learned from unlabeled data to support the training of the classification network and enhance model performance. To enable efficient feature fusion, a self-attention module is designed to integrate the style generator with the classification framework, dynamically focusing on the feature elements most relevant to classification performance. Additionally, a sequential training strategy is designed to train the classification model on a limited number of labeled images while leveraging large-scale unlabeled data to improve classification performance. Experimental results on a chest X-ray image dataset demonstrate superior classification performance and robustness compared to traditional DL-based methods. The effectiveness and potential of the model are also discussed.
The paper presents a workflow for the reconstruction of a SiC-SiC ceramic matrix composite (CMC) microstructure using advanced image processing techniques and deep learning. The objective of this research is to develo...
In clinical applications, medical image reconstruction is crucial for extracting complementary information and restoring image quality. However, existing deep learning-based reconstruction methods suffer from the foll...