Neural rendering approaches enable photo-realistic rendering on novel view synthesis tasks, but their per-scene optimization remains an obstacle to scalability. Recent methods introduce neural radiance field (NeRF) frameworks that generalize to unseen scenes on the fly by combining multi-view stereo with differentiable volume rendering. These generalizable NeRF methods synthesize the colors of 3D ray points by learning the consistency of image features projected from given nearby views. Since this consistency is computed in the 2D projected image space, it is vulnerable to occlusion and to local shape variation with viewing direction. To solve this problem, we present a dense depth-guided generalizable NeRF that leverages depth as the signed distance between the ray point and the object surface of the scene. We first generate dense depth maps from the sparse 3D points produced by structure from motion (SfM), an unavoidable step in obtaining camera poses. The dense depth maps are then exploited both as complementary features invariant to the sparsity of nearby views and as masks for occlusion handling. Experiments demonstrate that our approach outperforms existing generalizable NeRF methods on widely used real and synthetic datasets.
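As a rough illustration of the densification step described above (not the authors' method), the sketch below fills a dense depth map from sparse SfM point projections by nearest-neighbour interpolation; `densify_sfm_depth` and its toy inputs are hypothetical names and data.

```python
import numpy as np
from scipy.interpolate import griddata

def densify_sfm_depth(sparse_uv, sparse_depth, height, width):
    """Fill a dense depth map from sparse SfM point projections.

    sparse_uv: (N, 2) array of (row, col) pixel coordinates.
    sparse_depth: (N,) array of depths at those pixels.
    Nearest-neighbour interpolation is a deliberately simple stand-in
    for the learned densification described in the abstract.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    return griddata(sparse_uv, sparse_depth, (rows, cols), method="nearest")

# Toy example: 4 sparse depths scattered on an 8x8 image.
uv = np.array([[1, 1], [1, 6], [6, 1], [6, 6]], dtype=float)
d = np.array([2.0, 3.0, 4.0, 5.0])
dense = densify_sfm_depth(uv, d, 8, 8)
print(dense.shape)   # (8, 8)
print(dense[1, 1])   # 2.0 (known point preserved)
```

A real pipeline would use a learned or edge-aware completion; nearest-neighbour merely shows the interface (sparse points in, dense map out).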
A synthetic aperture radar (SAR) system is a notable source of information, recognized for its capability to operate day and night and in all weather conditions, making it essential for various applications. SAR image formation is a pivotal step in radar imaging, essential for transforming complex raw radar data into interpretable, usable imagery. Advances in SAR sensor design, which now yield very wide swaths, generate a massive volume of data that requires extensive processing. Traditional SAR image formation often involves resource-intensive and time-consuming postprocessing, so there is a vital need to automate the process in near real time, enabling fast responses for applications such as image classification and object detection. We present an SAR processing pipeline comprising a complex 2D autofocus SARNet followed by a CNN-based classification model. The complex 2D autofocus SARNet is employed for image formation, using an encoder-decoder architecture such as U-Net or a modified ResU-Net, while the image classification task is handled by the CNN-based classifier. This framework yields near-real-time results, specifically for quick image viewing and scene classification. Several experiments were conducted using real SAR raw data collected by the European Remote Sensing (ERS) satellite to validate the proposed pipeline. Performance was evaluated both visually and quantitatively using standard metrics such as the structural similarity index (SSIM) and the peak signal-to-noise ratio (PSNR). The experimental results demonstrate the pipeline's robustness, efficiency, reliability, and responsiveness as an integrated neural-network-based SAR processing pipeline.
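The quantitative metrics named in the abstract can be reproduced in a few lines. The sketch below is illustrative: the SSIM here is a simplified global (single-window) variant of the usual sliding-window formulation, and both function names are assumptions, not the authors' code.

```python
import numpy as np

def psnr(ref, img, data_range=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(data_range ** 2 / mse)

def ssim_global(ref, img, data_range=255.0):
    """SSIM computed from global image statistics (no sliding window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    x, y = ref.astype(float), img.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = np.full((16, 16), 100.0)
print(psnr(a, a))           # inf (identical images)
print(ssim_global(a, a))    # 1.0
```

Identical images give infinite PSNR and unit SSIM, which is a quick sanity check for any metric implementation.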
ISBN: (Print) 9798350343557
In this study, various machine learning and image analysis approaches, including template matching, HOG, SVM, Faster R-CNN, and YOLO, are examined and compared for the symbol recognition problem in color maps. Difficulties were identified concerning the shapes of the symbols, the complexity of the maps, and the placement of the symbols on the map. Observations on the success or failure of each method against these difficulties are presented. Methods based on artificial neural networks proved more successful at symbol recognition on color maps; the highest accuracy, 91%, was obtained with Faster R-CNN.
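A minimal zero-mean normalized cross-correlation matcher, the classical template-matching baseline such studies compare against, can be sketched as follows; `match_template_ncc` is an illustrative name, and the brute-force loop favors clarity over speed.

```python
import numpy as np

def match_template_ncc(image, template):
    """Return ((row, col), score) of the best zero-mean NCC match."""
    th, tw = template.shape
    t = template.astype(float) - template.mean()
    tn = np.sqrt((t ** 2).sum())
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw].astype(float)
            w = w - w.mean()
            denom = tn * np.sqrt((w ** 2).sum())
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best

# Embed the template in an empty map and recover its position.
img = np.zeros((10, 10))
tpl = np.array([[1.0, 2.0], [3.0, 4.0]])
img[4:6, 5:7] = tpl
pos, score = match_template_ncc(img, tpl)
print(pos)   # (4, 5), with score ~ 1.0
```

This baseline illustrates why template matching struggles on color maps: any rotation, scaling, or background clutter breaks the pixelwise correlation that CNN detectors like Faster R-CNN tolerate.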
Spectral unmixing is central to the analysis of hyperspectral data. Physics-based methods have become popular for this task because, with their explicit mixing models, they offer a clear interpretation. Nevertheless, because of their limited modeling capabilities, especially when analyzing real scenes with unknown, complex physical properties, these methods may not be accurate. Data-driven methods, deep learning in particular, have developed rapidly in recent years thanks to their superior capability to model complex nonlinear systems; yet simply transferring them as black boxes to unmixing may lead to low interpretability and poor generalization. To bring together the best of both worlds, recent research has focused on combining the advantages of physics-based models and data-driven methods. In this article, we present an overview of recent advances on this topic from several perspectives, including deep neural network (DNN) design, prior capturing, and loss selection. We summarize these methods within a common optimization framework and discuss ways of enhancing our understanding of them. The related source code is made publicly available at http://***/xiuheng-wang/awesome-hyperspectral-image-unmixing.
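The explicit mixing model underlying physics-based unmixers is, in its simplest linear form, a nonnegative least-squares problem per pixel. The sketch below (illustrative, not from the article's code release) recovers abundances with `scipy.optimize.nnls` and renormalizes them to sum to one.

```python
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(endmembers, pixel):
    """Nonnegative least-squares abundances for one pixel.

    endmembers: (bands, n_endmembers) spectral library.
    pixel: (bands,) observed spectrum.
    Abundances are renormalized to sum to one after solving.
    """
    a, _ = nnls(endmembers, pixel)
    s = a.sum()
    return a / s if s > 0 else a

# Two toy endmembers over three bands; the pixel is a 30/70 mixture.
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = 0.3 * E[:, 0] + 0.7 * E[:, 1]
print(unmix_pixel(E, y))   # recovers ~ [0.3, 0.7]
```

The hybrid methods surveyed in the article replace parts of this pipeline (the mixing operator, the prior, or the loss) with learned components while keeping the abundance interpretation intact.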
Conventional feature extraction methods for speech emotion recognition often suffer from unidimensionality and inadequacy in capturing the full range of emotional cues, limiting their effectiveness. To address these challenges, this paper introduces a novel network model named Multi-Modal Speech Emotion Recognition Network (MMSERNet). This model leverages the power of multimodal and multiscale feature fusion to significantly enhance the accuracy of speech emotion recognition. MMSERNet is composed of three specialized sub-networks, each dedicated to the extraction of distinct feature types: cepstral coefficients, spectrogram features, and textual features. It integrates audio features derived from Mel-frequency cepstral coefficients and Mel spectrograms with textual features obtained from word vectors, thereby creating a rich, comprehensive representation of emotional content. The fusion of these diverse feature sets facilitates a robust multimodal approach to emotion recognition. Extensive empirical evaluations of the MMSERNet model on benchmark datasets such as IEMOCAP and MELD demonstrate not only significant improvements in recognition accuracy but also an efficient use of model parameters, ensuring scalability and practical applicability.
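A crude stand-in for the fusion stage, assuming the three sub-networks emit fixed-length vectors, is per-modality L2 normalization followed by concatenation; `fuse_features` and the vector sizes (13 MFCCs, 64 spectrogram bins, 300-d word vectors) are illustrative, not MMSERNet's actual configuration.

```python
import numpy as np

def fuse_features(mfcc_feat, spec_feat, text_feat):
    """Early fusion: L2-normalize each modality, then concatenate.

    A deliberately minimal stand-in for a learned fusion layer; the
    three inputs play the roles of the cepstral, spectrogram, and
    word-vector sub-network outputs.
    """
    def l2(v):
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2(mfcc_feat), l2(spec_feat), l2(text_feat)])

fused = fuse_features(np.ones(13), np.ones(64), np.ones(300))
print(fused.shape)   # (377,)
```

Normalizing each modality before concatenation keeps one high-magnitude feature stream (e.g. the 300-d text vector) from dominating the fused representation.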
Human activity recognition (HAR) using radar technology is becoming increasingly valuable for applications in areas such as smart security systems, healthcare monitoring, and interactive computing. This study investigates the integration of convolutional neural networks (CNNs) with conventional radar signal processing methods to improve the accuracy and efficiency of HAR. Three distinct two-dimensional radar processing techniques, specifically range-fast Fourier transform (FFT)-based time-range maps, time-Doppler short-time Fourier transform (STFT) maps, and smoothed pseudo-Wigner-Ville distribution (SPWVD) maps, are evaluated in combination with four state-of-the-art CNN architectures: VGG-16, VGG-19, ResNet-50, and MobileNetV2. The study positions radar-generated maps as a form of visual data, bridging the radar signal processing and image representation domains while ensuring privacy in sensitive applications. In total, twelve CNN and preprocessing configurations are analyzed, focusing on the trade-offs between preprocessing complexity and recognition accuracy, both essential for real-time applications. Among the configurations tested, MobileNetV2 combined with STFT preprocessing offered the best balance, achieving high computational efficiency and an accuracy of 96.30%, with a spectrogram generation time of 220 ms and an inference time of 2.57 ms per sample. The comprehensive evaluation underscores the importance of interpretable visual features for resource-constrained environments, expanding the applicability of radar-based HAR systems to domains such as augmented reality, autonomous systems, and edge computing.
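The time-Doppler maps described above can be sketched with an off-the-shelf STFT; the function name and parameter choices (128-sample windows, 96-sample overlap) are illustrative, not those of the study.

```python
import numpy as np
from scipy.signal import stft

def doppler_spectrogram(iq, fs, nperseg=128, noverlap=96):
    """Time-Doppler map (dB magnitude) from a complex radar slow-time signal."""
    f, t, z = stft(iq, fs=fs, nperseg=nperseg, noverlap=noverlap,
                   return_onesided=False)  # complex input -> two-sided spectrum
    return f, t, 20 * np.log10(np.abs(z) + 1e-12)

# Synthetic target: a constant 100 Hz Doppler tone at 1 kHz sampling.
fs = 1000.0
n = np.arange(2048)
iq = np.exp(2j * np.pi * 100.0 * n / fs)
f, t, s = doppler_spectrogram(iq, fs)
peak_bin = np.argmax(s.mean(axis=1))
print(abs(f[peak_bin]))   # near 100 Hz
```

Such a map, rendered as an image, is exactly what the CNNs in the study consume; a real micro-Doppler signature would show time-varying tones rather than a single line.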
When processing text images with traditional binarization methods, the image background noise often causes the results to become blurred or leads to the loss of edge details. To solve this problem, this paper proposes...
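For reference, a standard global binarization baseline of the kind the abstract critiques is Otsu's method, sketched here in plain NumPy (illustrative, not the paper's proposed method):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's global threshold for an 8-bit grayscale image.

    Picks the threshold that maximizes the between-class variance
    of the foreground and background intensity distributions.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    hist /= hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = hist[:t].sum(), hist[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (levels[:t] * hist[:t]).sum() / w0
        m1 = (levels[t:] * hist[t:]).sum() / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Bimodal toy image: dark text-like region vs. bright background.
img = np.array([[10] * 5, [10] * 5, [200] * 5, [200] * 5], dtype=np.uint8)
t = otsu_threshold(img)
print(10 < t <= 200)   # True: the threshold separates the two modes
```

A single global threshold like this is precisely what fails under nonuniform background noise, motivating the adaptive or learning-based alternatives such papers propose.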
With the rapid urbanization process, waste management has become a significant environmental issue globally. Waste sorting, as an effective method of resource recycling and environmental protection, has gradually become a key solution to the waste pollution problem. Traditional waste classification methods rely on manual labor, which is inefficient and prone to errors, making them inadequate for modern urban waste management. In recent years, image recognition and artificial intelligence (AI)-based methods for waste classification have gained widespread attention, with deep learning techniques, particularly convolutional neural networks (CNNs), showing great potential in waste sorting. However, existing waste classification models face challenges such as imperfect network structures, insufficient training data, and poor environmental adaptability, which limit their application in complex environments. This study proposes a waste classification model based on image recognition and AI to enhance classification accuracy and efficiency. First, an improved PCANet is combined with an SDenseNet network structure to propose a new feature extraction and representation method, enhancing the model's feature learning ability. Second, a layered learning strategy, combined with the traditional backpropagation algorithm, is used to optimize the training process and improve learning efficiency. Finally, experimental results demonstrate that the proposed model significantly outperforms traditional models in classification accuracy and processing capability across various environments, providing a new solution for the advancement of waste classification technologies.
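PCANet's core idea, learning convolution filters as principal components of image patches, can be sketched as follows; `pca_filters`, the patch size, and the filter count are illustrative choices, not the paper's configuration.

```python
import numpy as np

def pca_filters(images, patch=5, n_filters=4):
    """Learn PCANet-style convolution filters as the leading principal
    components of zero-mean image patches.

    images: list of 2-D float arrays. Returns (n_filters, patch, patch).
    """
    patches = []
    for img in images:
        h, w = img.shape
        for r in range(0, h - patch + 1, patch):
            for c in range(0, w - patch + 1, patch):
                p = img[r:r + patch, c:c + patch].ravel()
                patches.append(p - p.mean())   # remove patch mean
    X = np.stack(patches)                      # (n_patches, patch*patch)
    # Right singular vectors = eigenvectors of the patch covariance,
    # ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, patch, patch)

rng = np.random.default_rng(0)
filters = pca_filters([rng.standard_normal((20, 20)) for _ in range(3)])
print(filters.shape)   # (4, 5, 5)
```

Unlike backprop-trained CNN kernels, these filters come from a closed-form decomposition, which is what makes PCANet-style front ends cheap to train on modest waste-image datasets.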
Electromagnetic imaging methods mainly utilize converted sampling, dimensional transformation, and coherent processing to obtain spatial images of targets, and they often suffer from accuracy and efficiency problems. Deep neural network (DNN)-based high-resolution imaging methods have achieved impressive results in improving resolution and reducing computational cost. However, previous works exploit only single-modality information from electromagnetic data, so their performance is limited. In this article, we propose an electromagnetic image generation network (EMIG-Net) that translates electromagnetic data of multiview 1-D range profiles (1DRPs) directly into bird's-eye-view 2-D high-resolution images under cross-modal supervision. We construct an adversarial generative framework with visual images as supervision to significantly improve imaging accuracy. Moreover, the network structure is carefully designed to optimize computational efficiency. Experiments on self-built synthetic data and experimental data from an anechoic chamber show that our network can generate high-resolution images whose visual quality is superior to that of both traditional imaging methods and DNN-based methods, while consuming less computation. Compared with the backprojection (BP) algorithm, EMIG-Net gains significant improvements in entropy (72%), peak signal-to-noise ratio (PSNR, 150%), and structural similarity (SSIM, 153%). Our work shows the broad prospects of deep learning for radar data representation and high-resolution imaging and provides a path toward electromagnetic imaging based on learning theory.
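Image entropy, one of the metrics reported above, is simply the Shannon entropy of the intensity histogram; the function name and the 256-bin choice below are illustrative.

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (bits) of an image's intensity histogram.

    In radar imaging, lower entropy usually indicates a sharper,
    better-focused image.
    """
    hist, _ = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins (0*log 0 := 0)
    return float(-(p * np.log2(p)).sum())

flat = np.zeros((8, 8))               # single intensity level
print(image_entropy(flat))            # 0.0
```

A uniform image has zero entropy, while a defocused image spreads energy over many intensity levels and scores higher, which is why the reported entropy reduction indicates better focusing.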
With the rapid development of entity recognition technology, animal recognition has gradually become essential in modern society, supporting labour-intensive agriculture and animal husbandry tasks. Serious challenges such as maintaining biodiversity can also benefit from animal identification technology. However, certain invasive recognition systems have caused permanent harm to animals, while noninvasive identification methods also exhibit drawbacks. This paper conducts a systematic literature review (SLR), presenting a comprehensive overview of various animal recognition technologies and their applications. Specifically, it examines methodologies such as deep learning, image processing, and acoustic analysis applied to different animal characteristics and identification purposes. The contribution of machine learning to animal feature extraction is highlighted, emphasising its significance for animal taxonomy and wild-species monitoring. Additionally, the review addresses the challenges and limitations of current technologies, including data scarcity, model accuracy, and computational requirements, and suggests opportunities for future research to overcome these obstacles.