Spectral unmixing is central when analyzing hyperspectral data. To accomplish this task, physics-based methods have become popular because, with their explicit mixing models, they can provide a clear interpretation. Nevertheless, because of their limited modeling capabilities, especially when analyzing real scenes with unknown, complex physical properties, these methods may not be accurate. On the other hand, data-driven methods, deep learning in particular, have developed rapidly in recent years, thanks to their superior capability in modeling complex nonlinear systems. Simply transferring these methods as black boxes to perform unmixing may lead to low interpretability and poor generalization ability. To bring together the best of both worlds, recent research efforts have focused on combining the advantages of physics-based models and data-driven methods. In this article, we present an overview of recent advances on this topic from various perspectives, including deep neural network (DNN) design, prior capturing, and loss selection. We summarize these methods within a common optimization framework and discuss ways of enhancing our understanding of these methods. The related source codes are made publicly available at http://***/xiuheng-wang/awesome-hyperspectral-image-unmixing.
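The explicit mixing model the abstract refers to is, in its simplest form, the linear mixing model: each pixel spectrum is a non-negative, sum-to-one combination of endmember spectra. A minimal sketch (all matrices and values here are synthetic placeholders, not from the article) recovers abundances with non-negative least squares, adding a weighted ones-row to softly enforce the sum-to-one constraint:

```python
import numpy as np
from scipy.optimize import nnls

# Linear mixing model y = E @ a + noise, with abundances a >= 0 and sum(a) = 1.
rng = np.random.default_rng(0)
E = rng.random((50, 3))              # 50 spectral bands, 3 endmembers (synthetic)
a_true = np.array([0.6, 0.3, 0.1])  # ground-truth abundances (sum to 1)
y = E @ a_true + 0.001 * rng.standard_normal(50)

# NNLS enforces non-negativity; the augmented row delta * ones ~ delta
# softly enforces the sum-to-one constraint.
delta = 10.0
E_aug = np.vstack([E, delta * np.ones((1, 3))])
y_aug = np.append(y, delta)
a_hat, _ = nnls(E_aug, y_aug)
print(a_hat)  # close to [0.6, 0.3, 0.1]
```

Physics-based unmixing methods build on this model; the hybrid approaches the article surveys replace or regularize parts of it with learned components.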
Human activity recognition (HAR) using radar technology is becoming increasingly valuable for applications in areas such as smart security systems, healthcare monitoring, and interactive computing. This study investigates the integration of convolutional neural networks (CNNs) with conventional radar signal processing methods to improve the accuracy and efficiency of HAR. Three distinct two-dimensional radar processing techniques, specifically range-fast Fourier transform (FFT)-based time-range maps, time-Doppler-based short-time Fourier transform (STFT) maps, and smoothed pseudo-Wigner-Ville distribution (SPWVD) maps, are evaluated in combination with four state-of-the-art CNN architectures: VGG-16, VGG-19, ResNet-50, and MobileNetV2. This study positions radar-generated maps as a form of visual data, bridging the radar signal processing and image representation domains while ensuring privacy in sensitive applications. In total, twelve CNN and preprocessing configurations are analyzed, focusing on the trade-offs between preprocessing complexity and recognition accuracy, all of which are essential for real-time applications. Among these results, MobileNetV2 combined with STFT preprocessing showed an ideal balance, achieving high computational efficiency and an accuracy rate of 96.30%, with a spectrogram generation time of 220 ms and an inference time of 2.57 ms per sample. The comprehensive evaluation underscores the importance of interpretable visual features for resource-constrained environments, expanding the applicability of radar-based HAR systems to domains such as augmented reality, autonomous systems, and edge computing.
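The time-Doppler preprocessing step can be sketched as follows (the sampling rate and chirp signal are toy assumptions standing in for real radar returns, not the study's data): an STFT turns a 1-D signal into a 2-D log-magnitude map that a CNN such as MobileNetV2 can consume as an image.

```python
import numpy as np
from scipy.signal import stft

fs = 1000.0                                      # assumed sampling rate, Hz
t = np.arange(0, 2.0, 1 / fs)
signal = np.cos(2 * np.pi * (50 + 30 * t) * t)   # toy micro-Doppler-like chirp

# STFT -> complex time-frequency matrix; log magnitude gives a spectrogram image.
f, tau, Z = stft(signal, fs=fs, nperseg=128, noverlap=96)
spectrogram_db = 20 * np.log10(np.abs(Z) + 1e-12)
print(spectrogram_db.shape)  # (frequency bins, time frames)
```

The window length (`nperseg`) and overlap control the resolution trade-off that, per the abstract, determines preprocessing cost versus recognition accuracy.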
Conventional feature extraction methods for speech emotion recognition often suffer from unidimensionality and inadequacy in capturing the full range of emotional cues, limiting their effectiveness. To address these challenges, this paper introduces a novel network model named Multi-Modal Speech Emotion Recognition Network (MMSERNet). This model leverages the power of multimodal and multiscale feature fusion to significantly enhance the accuracy of speech emotion recognition. MMSERNet is composed of three specialized sub-networks, each dedicated to the extraction of distinct feature types: cepstral coefficients, spectrogram features, and textual features. It integrates audio features derived from Mel-frequency cepstral coefficients and Mel spectrograms with textual features obtained from word vectors, thereby creating a rich, comprehensive representation of emotional content. The fusion of these diverse feature sets facilitates a robust multimodal approach to emotion recognition. Extensive empirical evaluations of the MMSERNet model on benchmark datasets such as IEMOCAP and MELD demonstrate not only significant improvements in recognition accuracy but also an efficient use of model parameters, ensuring scalability and practical applicability.
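The multimodal fusion idea can be illustrated with a late-fusion sketch (the embedding sizes, class count, and all names here are illustrative assumptions, not MMSERNet's actual architecture): each branch yields an embedding, and concatenation followed by a linear head produces emotion logits.

```python
import numpy as np

rng = np.random.default_rng(1)
mfcc_emb = rng.standard_normal(128)   # cepstral-branch embedding (placeholder)
spec_emb = rng.standard_normal(128)   # spectrogram-branch embedding (placeholder)
text_emb = rng.standard_normal(300)   # word-vector branch embedding (placeholder)

# Concatenate the three modality embeddings, then project to class logits.
fused = np.concatenate([mfcc_emb, spec_emb, text_emb])   # shape (556,)
W = rng.standard_normal((4, fused.size)) * 0.01          # 4 emotion classes
logits = W @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                     # softmax
print(probs)
```

In practice the fusion and the projection head would be trained jointly with the three sub-networks; this sketch only shows the data flow.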
ISBN: (Print) 9798350351491; 9798350351484
This work deals with the task of land use and land cover (LULC) change detection using multi-temporal multispectral remote sensing images. In the last few years, deep learning-based change detection methods have been successfully implemented for automatic LULC change detection. Although these elaborate methodologies achieve very high detection accuracy, they provide only a binary change map delineating change and non-change regions, without specifying the nature of the ground classes. In order to obtain a multiclass change detection leading to more accurate change mapping, this paper proposes a fully unsupervised three-step methodology. In the first step, a binary change map is obtained using k-MAD (kernel multivariate alteration detection) components combined with chi-squared test thresholding. In the second step, the change and non-change regions are iteratively classified with the AP (affinity propagation) clustering algorithm to obtain a multiclass non-change area and a "from-to" change area. Finally, in the third step, samples from the changed and unchanged classes are fed to a DNN (deep neural network) architecture to produce a multiclass land change map. Co-registered bi-temporal multispectral images acquired over the northeastern region of Algiers, Algeria, between 1997 and 2001 by the American LANDSAT-TM satellite are used to verify the effectiveness of the proposed scheme. The obtained "from-to" land change map, validated by means of spectral signature analysis, is more informative on the change and non-change pixels than the binary land change map.
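The first step rests on a standard property: under the no-change hypothesis, the sum of squared, standardized MAD components follows a chi-squared distribution with degrees of freedom equal to the number of bands, so a quantile of that distribution gives the binary change threshold. A minimal sketch (the MAD components here are simulated standard-normal noise, not real imagery):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n_pixels, n_bands = 10000, 6
mad = rng.standard_normal((n_pixels, n_bands))   # standardized MAD components

# Per-pixel chi-squared statistic and quantile-based change threshold.
chi2_stat = (mad ** 2).sum(axis=1)
threshold = chi2.ppf(0.99, df=n_bands)           # 99% quantile
change_mask = chi2_stat > threshold              # True = change pixel
print(change_mask.mean())  # about 0.01 for pure-noise input
```

On real bi-temporal data, genuinely changed pixels produce statistics far beyond the threshold, which is what makes the thresholded mask a usable binary change map.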
With the rapid urbanization process, waste management has become a significant environmental issue globally. Waste sorting, as an effective method of resource recycling and environmental protection, has gradually become a key solution to the waste pollution problem. Traditional waste classification methods rely on manual labor, which is inefficient and prone to errors, making them inadequate for modern urban waste management. In recent years, image recognition and artificial intelligence (AI)-based methods for waste classification have gained widespread attention, with deep learning techniques, particularly Convolutional Neural Networks (CNNs), showing great potential in waste sorting. However, existing research on waste classification models faces challenges such as imperfect network structures, insufficient training data, and poor environmental adaptability, which limit their application in complex environments. This study proposes a waste classification model based on image recognition and AI to enhance classification accuracy and efficiency. First, improved PCANet and SDenseNet network structures are combined to propose a new feature extraction and representation method, enhancing the model's feature learning ability. Secondly, a layered learning strategy, combined with the traditional backpropagation algorithm, is used to optimize the training process and improve learning efficiency. Finally, experimental results demonstrate that the proposed waste classification model significantly outperforms traditional models in classification accuracy and processing capability in various environments, providing a new solution for the advancement of waste classification technologies.
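For context, the first stage of a generic PCANet (sketched from the general PCANet idea, not this paper's improved variant; image sizes and filter counts are arbitrary) learns its convolution filters as the top principal components of mean-removed image patches:

```python
import numpy as np

rng = np.random.default_rng(3)
images = rng.random((10, 32, 32))    # toy grayscale images (placeholder data)
k = 5                                # patch size

# Collect mean-removed k x k patches from all images.
patches = []
for img in images:
    for i in range(0, 32 - k + 1, 4):
        for j in range(0, 32 - k + 1, 4):
            p = img[i:i + k, j:j + k].ravel()
            patches.append(p - p.mean())
X = np.array(patches)                # (num_patches, k*k)

# Top eigenvectors of the patch covariance become the stage-1 filter bank.
cov = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)
filters = eigvecs[:, ::-1][:, :8].T.reshape(8, k, k)  # 8 leading filters
print(filters.shape)  # (8, 5, 5)
```

The appeal for waste sorting is that these filters are learned without backpropagation, keeping the early feature extraction stage cheap.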
Despite the great success of deep neural networks in brain tumor segmentation, it is challenging to obtain sufficient annotated images due to the requirement of clinical expertise. Masked image modeling recently achieved competitive performance compared with supervised training by learning rich representations from unlabeled data. However, it is originally designed for vision transformers and its effectiveness has not been well-studied in the medical domain, usually for limited unlabeled data and small convolutional network scenarios. In this paper, we propose a self-supervised learning framework to pre-train U-Net for brain tumor segmentation. Our goal is to learn modality-specific and modality-invariant representations from multimodality magnetic resonance images. This is motivated by the fact that different modalities indicate the same organs and tissues but have various appearances. To achieve this, we design a new pretext task that reconstructs the masked patches of each modality based on the partial observation of other modalities. We evaluate our method by transfer performance on BraTS 2020 dataset. The experimental results demonstrate our method outperforms other self-supervised learning methods and improves the performance of a strong fully supervised baseline. The source codes are available at https://***/mobiletomb/IS-MIM.
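The pretext task can be sketched as follows (shapes, patch size, and masking ratio are illustrative assumptions, not the paper's settings): patches of one modality are hidden and become reconstruction targets, while the other modalities stay fully visible as context.

```python
import numpy as np

rng = np.random.default_rng(4)
volume = rng.random((4, 64, 64))   # 4 MRI modalities (e.g., T1, T1ce, T2, FLAIR)
patch = 16
masked = volume.copy()

# Randomly mask patches in modality 0; keep them as regression targets.
targets = []
for i in range(0, 64, patch):
    for j in range(0, 64, patch):
        if rng.random() < 0.5:                       # mask ~50% of patches
            targets.append(volume[0, i:i + patch, j:j + patch].copy())
            masked[0, i:i + patch, j:j + patch] = 0  # zero out in modality 0

# The network sees `masked` (modality 0 partially hidden, others intact)
# and regresses the patches stored in `targets`.
print(len(targets), "patches to reconstruct")
```

Cycling the masked modality over all four inputs is what pushes the encoder toward both modality-specific and modality-invariant features.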
A synthetic aperture radar (SAR) system is a notable source of information, recognized for its capability to operate day and night and in all weather conditions, making it essential for various applications. SAR image formation is a pivotal step in radar imaging, essential for transforming complex raw radar data into interpretable and utilizable imagery. Nowadays, advancements in SAR sensor design, resulting in very wide swaths, generate a massive volume of data, necessitating extensive processing. Traditional methods of SAR image formation often involve resource-intensive and time-consuming postprocessing. There is a vital need to automate this process in near-real-time, enabling fast responses for various applications, including image classification and object detection. We present an SAR processing pipeline comprising a complex 2D autofocus SARNet, followed by a CNN-based classification model. The complex 2D autofocus SARNet is employed for image formation, utilizing an encoder-decoder architecture, such as U-Net and a modified version of ResU-Net. Meanwhile, the image classification task is accomplished using a CNN-based classification model. This framework allows us to obtain near real-time results, specifically for quick image viewing and scene classification. Several experiments were conducted using real SAR raw data collected by the European Remote Sensing satellite to validate the proposed pipeline. The performance evaluation of the processing pipeline is conducted through visual assessment as well as quantitative assessment using standard metrics, such as the structural similarity index and the peak signal-to-noise ratio. The experimental results demonstrate the processing pipeline's robustness, efficiency, reliability, and responsiveness in providing an integrated neural network-based SAR processing pipeline.
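One of the quantitative metrics mentioned, PSNR, reduces to a one-line formula; a minimal sketch (the reference image and noise level here are synthetic, not the pipeline's data):

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((reference - test) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(5)
ref = rng.random((128, 128))
noisy = np.clip(ref + 0.01 * rng.standard_normal(ref.shape), 0, 1)
print(round(psnr(ref, noisy), 1))  # roughly 40 dB for 1% Gaussian noise
```

Structural similarity (SSIM), the other cited metric, additionally compares local luminance, contrast, and structure rather than raw per-pixel error.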
Electromagnetic imaging methods mainly utilize converted sampling, dimensional transformation, and coherent processing to obtain spatial images of targets, which often suffer from accuracy and efficiency problems. Deep neural network (DNN)-based high-resolution imaging methods have achieved impressive results in improving resolution and reducing computational costs. However, previous works exploit single modality information from electromagnetic data; thus, the performances are limited. In this article, we propose an electromagnetic image generation network (EMIG-Net), which translates electromagnetic data of multiview 1-D range profiles (1DRPs) directly into bird-view 2-D high-resolution images under cross-modal supervision. We construct an adversarial generative framework with visual images as supervision to significantly improve the imaging accuracy. Moreover, the network structure is carefully designed to optimize computational efficiency. Experiments on self-built synthetic data and experimental data in the anechoic chamber show that our network has the ability to generate high-resolution images, whose visual quality is superior to that of traditional imaging methods and DNN-based methods, while consuming less computational cost. Compared with the backprojection (BP) algorithm, the EMIG-Net gains a significant improvement in entropy (72%), peak signal-to-noise ratio (PSNR; 150%), and structural similarity (SSIM; 153%). Our work shows the broad prospects of deep learning in radar data representation and high-resolution imaging and provides a path for researching electromagnetic imaging based on learning theory.
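Image entropy, the first metric in the comparison, is commonly computed as the Shannon entropy of the intensity histogram; in radar imaging, a well-focused image concentrates energy and therefore has lower entropy. A small sketch (images are synthetic stand-ins):

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (bits) of an image's intensity histogram, values in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(6)
diffuse = rng.random((64, 64))                        # spread-out energy
focused = np.zeros((64, 64)); focused[32, 32] = 1.0   # single point target
print(image_entropy(diffuse) > image_entropy(focused))  # True
```

This is why the reported entropy improvement over backprojection is read as sharper, better-focused imagery.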
When processing text images with traditional binarization methods, the image background noise often causes the results to become blurred or leads to the loss of edge details. To solve this problem, this paper proposes...
详细信息
Fault diagnosis in rotating machinery faces significant challenges in strong noise environments. Especially under extremely high noise intensity and unknown noise types, existing methods struggle to maintain accuracy. We propose the Improved Residual Attention Convolutional Neural Network (IRA-CNN) to address the strong-noise problem. IRA-CNN integrates an interconnected multi-branch structure and a mixed attention mechanism specially designed for vibration signals. Unlike previous studies that only consider Gaussian noise and signal-to-noise ratios above -6 dB, we evaluate the model's noise robustness through extensive experiments across three datasets, three noise types, and six noise intensity levels. The results reveal that the noise type significantly impacts model performance, a factor that has often been overlooked in previous studies. IRA-CNN outperforms state-of-the-art models in both accuracy and generalization. These findings establish a highly effective solution for fault diagnosis in challenging strong noise environments.
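The noise-injection protocol behind such robustness studies can be sketched in a few lines (the vibration signal here is a toy sinusoid, and only Gaussian noise is shown; the study also covers other noise types): scale the noise so the resulting SNR matches a target in dB.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add Gaussian noise scaled to reach the target SNR (dB)."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    noise = rng.standard_normal(signal.shape) * np.sqrt(p_noise)
    return signal + noise

rng = np.random.default_rng(7)
t = np.linspace(0, 1, 2048, endpoint=False)
vibration = np.sin(2 * np.pi * 50 * t)      # toy vibration signal
noisy = add_noise(vibration, snr_db=-6.0, rng=rng)

# Verify: at -6 dB, noise power is about four times the signal power.
measured = 10 * np.log10(np.mean(vibration ** 2) / np.mean((noisy - vibration) ** 2))
print(round(measured, 1))  # close to -6 dB
```

Sweeping `snr_db` well below -6 dB and swapping the noise generator is what distinguishes this evaluation from the Gaussian-only settings of prior work.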