Few-shot learning (FSL) has been introduced to hyperspectral image (HSI) classification due to the scarcity of labeled samples. Graph neural network (GNN)-based FSL methods show excellent performance. Nevertheless, existing methods neglect that graph data with different topological structures may require different numbers of aggregation iterations, and it is difficult to extract global topological information across domains with a fixed number of layers. A reinforced graph aggregation cross-domain FSL (RGA-CFSL) method is proposed, integrating FSL and deep reinforcement learning (DRL) into a unified framework. Specifically, supervised contrastive learning with multi-metric constraints (MSCL) is designed to provide stable prototypes. Meanwhile, a DRL model is introduced into a designed global information extraction (GIE) module, which alleviates domain shift at the topological-structure level: the DRL model dynamically predicts the optimal GNN architecture required for given graph data, facilitating the extraction of global topological information. Furthermore, an inter-scale feature fusion (IFF) module is designed to capture representative distribution information in each domain and reduce domain shift at the distribution level by aggregating global topological information and local spatial–spectral information. Experimental results on four target HSI datasets demonstrate that our RGA-CFSL achieves superior performance.
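The core idea, letting an agent choose how many aggregation iterations a GNN applies, can be illustrated with a minimal sketch (not the authors' code; the module name, the shared aggregation layer, and max_layers are assumptions):

    # Minimal sketch: a policy head inspects a pooled graph summary and
    # samples a depth, so the number of aggregation iterations adapts to
    # the graph's topology. Names and sizes are illustrative.
    import torch
    import torch.nn as nn

    class AdaptiveDepthGNN(nn.Module):
        def __init__(self, dim, max_layers=4):
            super().__init__()
            self.agg = nn.Linear(dim, dim)            # one shared aggregation layer
            self.policy = nn.Linear(dim, max_layers)  # scores depths 1..max_layers

        def forward(self, x, adj):
            logits = self.policy(x.mean(dim=0))       # graph-level summary drives the choice
            depth = torch.distributions.Categorical(logits=logits).sample() + 1
            for _ in range(int(depth)):
                x = torch.relu(self.agg(adj @ x))     # one aggregation iteration
            return x, depth

    x, adj = torch.randn(5, 16), torch.eye(5)         # 5 nodes; identity stands in for a normalized adjacency
    out, depth = AdaptiveDepthGNN(16)(x, adj)

In a full reinforcement-learning setup the sampled depth would be rewarded by downstream classification performance and the policy trained with, e.g., REINFORCE; the sketch only shows the depth-prediction mechanism.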
ISBN (print): 9789811371233; 9789811371226
With the rapid development of satellite remote sensing technology, the resolution of satellite imagery is increasing, and ever more satellite data can be received on the ground. Traditional manual image interpretation methods cannot cope with such massive data volumes and cannot obtain information about objects of interest efficiently, quickly, and accurately. To address this problem, and considering that deep convolutional neural networks have achieved good results in natural-image target recognition, this paper uses the typical deep detection framework Faster R-CNN as the basic framework, applies image augmentation to improve the accuracy and generalization ability of the network model, and uses multi-resolution optical remote sensing image data to achieve automatic target recognition. The results show that the proposed method can interpret images automatically and quickly, with a recognition rate for ships and other targets exceeding 75%.
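A hedged sketch of this kind of pipeline (not the paper's code; torchvision's detector is used as a stand-in, and the class count and box values are illustrative):

    # Fine-tuning torchvision's Faster R-CNN with a simple horizontal-flip
    # augmentation, one way to realize the approach described above.
    import torch
    import torchvision
    from torchvision.transforms import functional as F

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)  # background + ship

    def augment(image, target):
        # random horizontal flip; box x-coordinates flip with the image
        if torch.rand(1) < 0.5:
            image = F.hflip(image)
            w = image.shape[-1]
            boxes = target["boxes"].clone()
            boxes[:, [0, 2]] = w - boxes[:, [2, 0]]
            target["boxes"] = boxes
        return image, target

    # one toy training step on a dummy 512x512 chip
    image = torch.rand(3, 512, 512)
    target = {"boxes": torch.tensor([[100., 100., 200., 180.]]),
              "labels": torch.tensor([1])}
    image, target = augment(image, target)
    model.train()
    loss = sum(model([image], [target]).values())     # detector returns a loss dict in train mode
    loss.backward()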
ISBN (print): 9781728130385
Most current methods for repairing old photos process them manually with image editing software such as Photoshop. The time required for manual repair is directly proportional to the degree of damage, making the process time-consuming and laborious. Therefore, this paper proposes a two-stage convolutional network to repair damaged old photos automatically. The first stage detects the damaged areas of the photo, and the second stage repairs those areas. The experimental results demonstrate that our method can successfully detect and repair photo damage.
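A minimal sketch of such a detect-then-repair pipeline (module names and layer sizes are illustrative assumptions, not the paper's architecture):

    # Stage 1 predicts a per-pixel damage mask; stage 2 inpaints the
    # masked regions from the image plus the mask.
    import torch
    import torch.nn as nn

    detector = nn.Sequential(                         # stage 1: damage probability map
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    inpainter = nn.Sequential(                        # stage 2: fill masked pixels
        nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

    photo = torch.rand(1, 3, 256, 256)
    mask = (detector(photo) > 0.5).float()            # stage 1 output, binarized
    filled = inpainter(torch.cat([photo * (1 - mask), mask], dim=1))
    result = photo * (1 - mask) + filled * mask       # keep undamaged pixels as-is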
In medical imaging, denoising is very important for image analysis and for the diagnosis and treatment of diseases. Current deep learning based denoising methods are effective but are limited by their training sample size requirements (i.e., they are not successful enough on small datasets). Using a small sample size, we design deep feed-forward denoising convolutional neural networks for medical image denoising by studying the model depth, the learning approach, and the regularization approach. More specifically, we use residual learning as the learning approach and batch normalization as regularization in the deep model. Unlike most other image denoising approaches, which directly learn the latent clean image, residual learning learns the noise from the noisy image; the denoised image is then obtained by subtracting the learned residual from the noisy input. Moreover, batch normalization is integrated with residual learning to improve model accuracy and training time. We measure the quality of the reconstructed or denoised images with the standard metrics peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) and compare our model's performance with several medical image denoising techniques. Experimental results reveal that our approach performs better than these other methods.
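The residual-learning scheme is easy to sketch (a minimal DnCNN-style model; the depth and width here are illustrative, not the paper's exact configuration):

    # The network predicts the noise; the denoised image is the noisy
    # input minus that prediction. Batch normalization follows each
    # intermediate convolution, as described above.
    import torch
    import torch.nn as nn

    def residual_denoiser(channels=1, features=64, depth=5):
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.BatchNorm2d(features), nn.ReLU()]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        return nn.Sequential(*layers)

    model = residual_denoiser()
    clean = torch.rand(4, 1, 64, 64)
    noisy = clean + 0.1 * torch.randn_like(clean)
    denoised = noisy - model(noisy)                   # subtract the learned residual
    loss = nn.functional.mse_loss(denoised, clean)    # train toward the clean target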
Natural Language Processing (NLP) applications have difficulty dealing with automatically transcribed spoken documents recorded in noisy conditions, due to high Word Error Rates (WER), or with textual documents from the Internet, such as forums or micro-blogs, due to misspelled or truncated words and poor grammatical form. To improve robustness against document errors, previously proposed methods map these noisy documents into a latent space such as Latent Dirichlet Allocation (LDA), supervised LDA, or author-topic (AT) models. Compared to LDA, the AT model considers not only the document content (words) but also the class associated with the document. In addition to these high-level representation models, an original compact representation, called c-vector, was recently introduced to avoid the tricky choice of the number of latent topics in these topic-based representations. The main drawback of the c-vector space building process is the number of sub-tasks required. Recently, we proposed both improving the performance of this compact c-vector representation of spoken documents and reducing the number of required sub-tasks, using an original framework in a robust low-dimensional feature space built from a set of AT models, called the "Latent Topic-based Subspace" (LTS). This paper goes further by comparing the original LTS-based representation with the c-vector technique, as well as with the state-of-the-art neural compression approach based on an Encoder-Decoder (Autoencoder) and the classification methods deep neural networks (DNN) and long short-term memory (LSTM), on two classification tasks using noisy documents in the form of speech conversations and textual documents from the 20-Newsgroups corpus. Results show that the original LTS representation outperforms the best previous compact representations, with a substantial gain of more than 2.1 and 3.3 points in correctly labeled documents compared to the c-vector and Autoencoder representations, respectively.
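For context, the Autoencoder baseline mentioned above can be sketched in a few lines (dimensions are illustrative, and bag-of-words inputs are an assumption):

    # Bag-of-words document vectors are compressed into a low-dimensional
    # code; after training, the code serves as the compact document
    # representation fed to a classifier.
    import torch
    import torch.nn as nn

    vocab, code_dim = 5000, 64
    encoder = nn.Sequential(nn.Linear(vocab, 256), nn.ReLU(), nn.Linear(256, code_dim))
    decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(), nn.Linear(256, vocab))

    docs = torch.rand(8, vocab)                       # 8 noisy bag-of-words documents
    codes = encoder(docs)                             # compact representation
    loss = nn.functional.mse_loss(decoder(codes), docs)  # reconstruction objective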
Given the increase in the consumption of packaging and plastic products, which generate great amounts of municipal waste, efforts are needed to recycle these materials correctly. For the process to be accessible and environmentally friendly, it is crucial to separate the different types of resins before feeding the recycling chain. This classification is usually carried out manually by cooperative employees. However, most of the time it is difficult to identify the separated plastic pieces because the symbol of the resin used is missing, and the plastic ends up with an inappropriate destination. The usual techniques, such as density and burning tests, and more refined ones require specific equipment, such as Fourier-transform infrared spectrometry (FTIR), magnetic resonance imaging (MRI), or image and color identification. Standard methods may lack flexibility, and sophisticated ones may be expensive. The present study proposes a methodology to separate the types of plastic by the sound they make when crushed, since the audio signals and sound waves differ from one resin to another. The audio signals of crumpled plastic samples of each category were therefore recorded with a smartphone to create a database. Audio features were extracted using the Mel-Frequency Cepstral Coefficients (MFCC) technique. Two types of neural networks were used to classify the coefficients: a convolutional neural network (CNN) and a Long Short-Term Memory (LSTM) recurrent network. The performance of both networks was analyzed through accuracy, loss, and the confusion matrix, as well as by inserting new data unseen during training to check whether the classes were correctly classified. The LSTM network, which works with sequential data, performed better than the CNN, reaching an accuracy of 85%. However, when inserting new data crumpled more slowly than the training data, simulating an unfavorable condition
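A hedged sketch of the feature-extraction and classification pipeline (the file name, sampling rate, and class count are assumptions):

    # MFCCs from librosa feed an LSTM classifier, mirroring the better-
    # performing of the two networks described above.
    import librosa
    import torch
    import torch.nn as nn

    def mfcc_features(path, n_mfcc=13):
        y, sr = librosa.load(path, sr=22050)          # smartphone recording
        m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return torch.tensor(m.T, dtype=torch.float32) # (time, n_mfcc)

    class PlasticClassifier(nn.Module):
        def __init__(self, n_mfcc=13, hidden=64, n_classes=4):
            super().__init__()
            self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)
        def forward(self, x):                         # x: (batch, time, n_mfcc)
            out, _ = self.lstm(x)
            return self.head(out[:, -1])              # classify from the last timestep

    # usage (the wav path is hypothetical):
    # logits = PlasticClassifier()(mfcc_features("crumple_pet_01.wav").unsqueeze(0))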
ISBN (print): 9781538662496
In this paper, we propose a state-of-the-art video denoising algorithm based on a convolutional neural network architecture. Previous neural network based approaches to video denoising have been unsuccessful, as their performance cannot compete with that of patch-based methods. Our approach, however, outperforms these patch-based competitors with significantly lower computing times. In contrast to other existing neural network denoisers, our algorithm exhibits several desirable properties, such as a small memory footprint and the ability to handle a wide range of noise levels with a single network model. The combination of its denoising performance and lower computational load makes this algorithm attractive for practical denoising applications. We compare our method with different state-of-the-art algorithms, both visually and with respect to objective quality metrics. The experiments show that our algorithm compares favorably to other state-of-the-art methods.
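One common way to obtain the "single model, wide noise range" property is to condition the network on a noise-level map, as in FFDNet-style denoisers; the sketch below shows that mechanism under illustrative assumptions (it is not the paper's architecture):

    # A constant noise-level map is concatenated to a stack of input
    # frames, telling a single network which noise level to remove.
    import torch
    import torch.nn as nn

    class NoiseMapDenoiser(nn.Module):
        def __init__(self, frames=3, features=32):
            super().__init__()
            in_ch = frames * 3 + 1                    # stacked RGB frames + noise map
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, features, 3, padding=1), nn.ReLU(),
                nn.Conv2d(features, 3, 3, padding=1))

        def forward(self, frames, sigma):
            b, _, h, w = frames.shape
            noise_map = torch.full((b, 1, h, w), sigma)
            return self.net(torch.cat([frames, noise_map], dim=1))

    clip = torch.rand(1, 9, 64, 64)                   # 3 consecutive RGB frames, stacked
    center = NoiseMapDenoiser()(clip, sigma=25 / 255.0)  # denoised central frame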
ISBN (print): 9781665414937; 9781665430579
Cervical cancer is the fourth most common gynecological malignant cancer in the world and one of the principal causes of cancer death in women. Treatment planning depends on the cancer stage (tumor size, nodal status, and local extension), which can be identified using magnetic resonance imaging. For effective cancer diagnosis and prognosis, automated segmentation methods for cervical tumors are highly desired, as they can alleviate the burden of manual segmentation. However, typical automatic segmentation methods, including deep learning methods (e.g., Convolutional Neural Networks, CNNs), may fail due to the intensity inhomogeneity, poor contrast, and noise present in medical images. As a solution, the performance of such methods can be boosted with an automated image preprocessing framework. This paper first proposes an ensemble preprocessing method to improve the performance of a CNN for cervical cancer segmentation. Specifically, we propose histogram-based, smoothing-and-sharpening-based, and morphological image processing methods. Then, we devise three CNNs with the same architecture, each trained independently on one of the three proposed preprocessed datasets. For evaluation, we use leave-one-out cross-validation, where a left-out test image passes through each CNN to output a probability segmentation map. Ultimately, by applying majority voting to the three output segmentation maps, we obtain the final label map. Our method significantly outperformed benchmark methods, with classification accuracy increasing from 74.1% for a single CNN with no preprocessing to 76.8% for majority voting over the three CNNs with different preprocessing methods.
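The majority-voting step is simple enough to sketch directly (the binarization threshold is an illustrative assumption):

    # Three CNNs, one per preprocessing variant, each output a probability
    # map; a pixel gets the tumor label when at least two of three agree.
    import torch

    def majority_vote(prob_maps, threshold=0.5):
        votes = (prob_maps > threshold).float()       # binarize each CNN's map
        return (votes.sum(dim=0) >= 2).float()        # per-pixel majority of 3

    maps = torch.rand(3, 128, 128)                    # stand-ins for the three CNN outputs
    final_label_map = majority_vote(maps)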
Edge detection for dermoscopic images has always been a crucial task for automatic lesion delineation. A skin lesion is an area of skin with abnormal growth or appearance compared to the skin surrounding it; such an abnormal, discolored area of skin warrants urgent referral and treatment. Manual diagnosis of the disease is time-consuming and not quantifiable, whereas computer-aided diagnosis (CADx) can aid expert manual delineation and make diagnosis more proficient. To advance the digital segmentation process, a deep learning-based end-to-end framework is proposed for automatic dermoscopic image segmentation. The framework is a modified U-Net that uses Group Normalization (GN) in the encoder and decoder layers, adds Attention Gates (AGs) in the skip connections to focus on fine details, and incorporates the Tversky Loss (TL) as the output loss function. GN is used instead of Batch Normalization (BN) to extract the feature maps generated by the encoding path efficiently. AGs are used to distinguish high-dimensional information from low-level, irrelevant background regions in the input image. The Tversky Index (TI)-based TL is applied to achieve a better balance between recall and precision. To further strengthen feature propagation and encourage feature reuse, atrous convolutions are applied in the bridge connecting the encoder and decoder paths of the network. The proposed model, evaluated on the ISIC 2018 image dataset, outperforms state-of-the-art segmentation methods.
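The Tversky Loss can be written down compactly (the alpha/beta weighting below is a common choice, not necessarily the paper's):

    # Weighting false positives (alpha) and false negatives (beta)
    # separately trades precision against recall; the loss is one minus
    # the Tversky Index.
    import torch

    def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
        # pred, target: (N, H, W), pred in [0, 1], target in {0, 1}
        tp = (pred * target).sum()
        fp = (pred * (1 - target)).sum()
        fn = ((1 - pred) * target).sum()
        ti = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
        return 1 - ti

    pred = torch.rand(2, 96, 96)                      # e.g. sigmoid outputs of the U-Net
    target = (torch.rand(2, 96, 96) > 0.5).float()
    loss = tversky_loss(pred, target)

With alpha = beta = 0.5 the index reduces to the Dice coefficient; setting beta > alpha penalizes missed lesion pixels more, favoring recall.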
ISBN (print): 9781665429825
Methods of estimating heart rate without sensor devices provide essential benefits in the medical field as well as in other computing applications. Smartphones are the handiest devices available to everyone today. Using videos of the fingertip captured with a smartphone camera, heart rate (HR) can be estimated with the photoplethysmography (PPG) technique, which tracks subtle color changes in the skin caused by cardiovascular activity. These color changes are invisible to the human eye but can be detected by digital cameras. The method is divided into three main steps: first, reading the video frames and processing them to obtain the PPG data; next, extracting the Blood Volume Pulse (BVP) signal; and finally, estimating the HR from the signal. In this project, the color intensity of the skin pixels is used, and filters are applied to eliminate noise and retain only the pulses of interest. The extracted signal is fed into a convolutional regression neural network that outputs the estimated HR. The results are compared with the ground-truth HR obtained from a contact PPG sensor. We obtained a Mean Absolute Error (MAE) of 7.01 beats per minute (bpm) and an error percentage of 8.3% on the test data.
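The signal-extraction steps can be sketched with classical tools (a hedged illustration: the paper's final step is a regression network, for which a spectral-peak estimate stands in here so the sketch is self-contained; the frame array and band limits are assumptions):

    # Mean green intensity per frame -> band-pass filter -> dominant
    # spectral peak as the HR estimate.
    import numpy as np
    from scipy.signal import butter, filtfilt

    fps = 30.0
    frames = np.random.rand(300, 64, 64, 3)           # stand-in fingertip video, 10 s

    signal = frames[..., 1].mean(axis=(1, 2))         # mean green channel per frame
    signal = signal - signal.mean()

    b, a = butter(3, [0.7, 4.0], btype="band", fs=fps)  # ~42-240 bpm band
    bvp = filtfilt(b, a, signal)                      # Blood Volume Pulse estimate

    freqs = np.fft.rfftfreq(len(bvp), d=1 / fps)
    spectrum = np.abs(np.fft.rfft(bvp))
    band = (freqs >= 0.7) & (freqs <= 4.0)
    hr_bpm = freqs[band][spectrum[band].argmax()] * 60
    print(f"estimated HR: {hr_bpm:.1f} bpm")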