This paper considers the problem of denoising images corrupted by non-stationary noise. Since, in practice, no a priori information on the noise is available, noise statistics must be estimated before image denoising. In this paper, a deep convolutional neural network (CNN) based method is proposed for estimating a map of local, patch-wise standard deviations of the noise (the so-called sigma-map). It achieves state-of-the-art accuracy in sigma-map estimation for non-stationary noise, as well as in noise variance estimation for additive white Gaussian noise. Extensive experiments on image denoising using estimated sigma-maps demonstrate that our method outperforms recent CNN-based blind image denoising methods by up to 6 dB in PSNR, and other state-of-the-art methods based on sigma-map estimation by up to 0.5 dB, while offering greater flexibility of use. A comparison with the ideal case, in which denoising is applied using the ground-truth sigma-map, shows that the difference in the corresponding PSNR values is within 0.1-0.2 dB for most noise levels and does not exceed 0.6 dB.
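To make the two quantities in this abstract concrete, the following minimal sketch computes a patch-wise sigma-map from a known noise field and the PSNR between two images. It is an illustration only, not the paper's CNN estimator: the function names, the non-overlapping patch layout, the patch size, and the synthetic noise levels are all assumptions introduced here.

```python
import math
import random


def patchwise_sigma_map(noise, patch):
    """Standard deviation of `noise` in each non-overlapping patch x patch block."""
    rows, cols = len(noise), len(noise[0])
    sigma_map = []
    for r in range(0, rows - patch + 1, patch):
        row_out = []
        for c in range(0, cols - patch + 1, patch):
            vals = [noise[i][j]
                    for i in range(r, r + patch)
                    for j in range(c, c + patch)]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            row_out.append(math.sqrt(var))
        sigma_map.append(row_out)
    return sigma_map


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    n = len(reference) * len(reference[0])
    mse = sum((a - b) ** 2
              for ra, rb in zip(reference, estimate)
              for a, b in zip(ra, rb)) / n
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Synthetic non-stationary noise (assumed levels): sigma = 5 in the left
# half of a 16 x 16 field and sigma = 20 in the right half.
random.seed(0)
noise = [[random.gauss(0.0, 5.0 if col < 8 else 20.0) for col in range(16)]
         for _ in range(16)]
sigma_map = patchwise_sigma_map(noise, patch=8)  # 2 x 2 map of local sigmas
```

A sigma-map obtained this way from known noise plays the role of the ground truth against which an estimator would be compared; the paper's method instead predicts the map from the noisy image alone.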
ISBN (print): 9798350349405; 9798350349399
Explainable AI (XAI) has revolutionized the field of deep learning by empowering users to have more trust in neural network models. The field of XAI allows users to probe the inner workings of these algorithms to elucidate their decision-making processes. The rise in popularity of XAI has led to the advent of different strategies for producing explanations, which only occasionally agree with one another. Thus, several objective evaluation metrics have been devised to decide which of these modules gives the best explanation for specific scenarios. The goal of the paper is twofold: (i) we employ the notions of necessity and sufficiency from the causal literature to come up with a novel explanatory technique called SHifted Adversaries using Pixel Elimination (SHAPE), which satisfies all the theoretical and mathematical criteria of being a valid explanation; (ii) we show that SHAPE is, in fact, an adversarial explanation that fools the causal metrics employed to measure the robustness and reliability of popular importance-based visual XAI methods. Our analysis shows that SHAPE outperforms popular explanatory techniques like GradCAM and GradCAM++ in these tests and is comparable to RISE, raising questions about the sanity of these metrics and the need for human involvement for an overall better evaluation.
Data-enabled predictive control (DeePC) for linear systems utilizes data matrices of recorded trajectories to directly predict new system trajectories, which is very appealing for real-life applications. In this paper, we leverage the universal approximation properties of neural networks (NNs) to develop neural DeePC algorithms for nonlinear systems. Firstly, we point out that the outputs of the last hidden layer of a deep NN implicitly construct a basis in a so-called neural (feature) space, while the output linear layer performs affine interpolation in the neural space. As such, we can train a deep NN off-line using large data sets of trajectories to learn the neural basis, and compute a suitable affine interpolation on-line using DeePC. Secondly, methods for guaranteeing consistency of neural DeePC and for reducing computational complexity are developed. Several neural DeePC formulations are illustrated on a nonlinear pendulum example. Copyright (c) 2024 The Authors.
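The idea behind the linear case, that data matrices of recorded trajectories span new trajectories, can be sketched as follows. This is a hedged illustration under stated assumptions and not the paper's neural DeePC algorithm: the scalar system, its coefficients `a` and `b`, the recorded input sequence, and the weight vector `g` are all hypothetical. In DeePC the weights would be found by solving an optimization problem; here they are fixed by hand only to show that any linear combination of Hankel columns is again a valid trajectory of the linear system.

```python
def hankel(signal, depth):
    """Hankel matrix whose columns are all length-`depth` windows of `signal`."""
    n_cols = len(signal) - depth + 1
    return [[signal[i + j] for j in range(n_cols)] for i in range(depth)]


def simulate(a, b, y0, inputs):
    """Outputs of y[k+1] = a*y[k] + b*u[k], aligned with `inputs`."""
    ys = [y0]
    for u in inputs[:-1]:
        ys.append(a * ys[-1] + b * u)
    return ys


def combine(matrix, g):
    """Matrix-vector product: the trajectory given by column weights `g`."""
    return [sum(row[j] * g[j] for j in range(len(g))) for row in matrix]


# Hypothetical scalar linear system and recorded data (assumptions).
a, b = 0.9, 0.5
u_data = [1.0, -1.0, 0.5, 2.0, -0.5, 0.0, 1.5, -2.0]
y_data = simulate(a, b, 0.0, u_data)

depth = 4
H_u, H_y = hankel(u_data, depth), hankel(y_data, depth)

# Arbitrary column weights, one per Hankel column (8 - 4 + 1 = 5 columns).
g = [0.3, -1.2, 0.0, 2.0, 0.5]
u_new, y_new = combine(H_u, g), combine(H_y, g)

# The combined trajectory still obeys the system dynamics (up to rounding).
residual = max(abs(y_new[k + 1] - (a * y_new[k] + b * u_new[k]))
               for k in range(depth - 1))
```

This superposition property holds only for linear systems, which is why the abstract turns to a learned neural basis for the nonlinear case.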
Image restoration is one of the most important computer vision tasks, aiming at recovering high-quality images from degraded or low-quality observations. Restoration methods based on convolutional neural networks (CNNs) have achieved attractive performance. However, as convolutions only take in local information, CNN-based methods are limited in modeling long-range objects and extracting global information. In addition, existing one-stage methods suffer in performance because they lack diversified receptive fields. In this paper, we propose a multi-stage cascaded transformer architecture for image restoration. Firstly, a Swin transformer based encoder relying on self-attention is used to improve the modeling ability for long-range objects and to output hierarchical multi-level semantic features. Then, a shape perceiving module is designed and embedded in the decoder to enhance the representation of irregular objects. Moreover, a multi-stage cascaded encoder-decoder architecture possessing diversified receptive fields is proposed to progressively obtain fine restoration results and thus boost the performance. We conduct extensive experiments, including image deraining, underwater image enhancement, near-infrared image colorization and low-light image enhancement. The results show that our proposed method achieves comparable or better performance than state-of-the-art methods, with lower training and inference costs. (c) 2022 Published by Elsevier B.V.
Cervical cancer is a common type of tumor that occurs in the cervix. The cervix contains millions of cells with various orientations and overlaps. Segmenting and annotating the cytoplasm and nuclei in unsegmented cell images for better classification is a laborious process. In this paper, we propose an automated computerized system to classify unsegmented cervical cell images using convolutional neural network (CNN) and vision transformer (ViT) models. A CNN automatically learns the spatial hierarchy of features, improving medical image classification. A ViT captures long-range dependencies in extensive image recognition applications with a sophisticated encoder and global self-attention mechanisms. We propose a novel cervix feature fusion (CFF) method that fuses the features of the pre-trained DenseNet201 and a vision transformer with shifted patch tokenization (SPT) and locality self-attention (LSA). This fusion helps to obtain both local and global features from the cervical cell images. The fuzzy feature selection (FFS) method is used to select discriminative features from the fused feature vector for better classification of the cell abnormalities. The proposed method uses unsegmented cervical cell images from the publicly available SIPaKMeD dataset. The proposed model achieved an accuracy of 96.13%, higher than that of the state-of-the-art methods, despite having a smaller dataset of unsegmented cervical cell images.
Single-angle plane-wave imaging has great potential for high-frame-rate ultrasound, but it suffers from difficulties such as low imaging quality and poor segmentation results. To overcome these difficulties, this article proposes an end-to-end convolutional neural network (CNN) structure that segments images from single-angle channel data. The network removes the traditional beamforming process and uses raw radio frequency (RF) data as input to directly obtain the segmented image. A special depth signal extraction module extracts the signal features at each depth and concatenates them into a feature map, which is then passed to the residual encoder and decoder to obtain the output. A simulated hypoechoic cyst dataset of 2000 and an actual industrial defect dataset of 900 were used for training separately. Good results were achieved in both simulated medical cyst segmentation and actual industrial defect segmentation. Experiments were also conducted on both datasets with phased-array sparse-element data as input, and segmentation results were obtained for both. On the whole, this work achieved better-quality segmented images with shorter processing time from single-angle plane-wave channel data using CNNs; compared with other methods, our network shows large improvements in intersection over union (IoU), F1 score, and processing time. The results also indicate that using phased-array sparse-element data as input can improve the feasibility of applying deep learning to image segmentation.
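For reference, the two segmentation metrics named in this abstract can be computed from binary masks as in the minimal sketch below. The function name, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
def iou_and_f1(pred, target):
    """Intersection over union and F1 score for two flat binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)        # true positives
    fp = sum(1 for p, t in zip(pred, target) if p and not t)    # false positives
    fn = sum(1 for p, t in zip(pred, target) if not p and t)    # false negatives
    if tp + fp + fn == 0:
        # Both masks are empty: treat as a perfect match (assumed convention).
        return 1.0, 1.0
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


# One true positive, one false positive, one false negative.
iou, f1 = iou_and_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

In this small example the IoU is 1/3 and the F1 score is 0.5, which shows that F1 is never lower than IoU on the same pair of masks.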
Multi-center cervical cytology images have various image styles due to the differences in staining and imaging techniques, which pose a significant challenge to the performance of automated cervical cancer diagnosis tools. We propose a dual-head network architecture that explicitly disentangles image features into content and style features, and applies contrastive self-supervised learning to a large number of unlabeled images, achieving enhanced generalization across various styles. We pretrain our model on 1,024,855 images cropped from 3,561 whole slide images (WSIs), and visualize the features using the t-distributed stochastic neighbor embedding (t-SNE) method, demonstrating the effectiveness of our method in distinguishing between content and style features. In the downstream task, we evaluate our model on 192,123 binary-classified images with 10 styles, and achieve the best accuracy among all methods for every style. Across the 10 different data sources, our method attained an average accuracy of 80.4%, outperforming all other comparative methods by 3% to 17%, demonstrating our method's potential to enhance the performance and robustness of automated cytology image analysis in multi-center settings.
The development of smart homes, equipped with devices connected to the Internet of Things (IoT), has opened up new possibilities to monitor and control energy consumption. In this context, non-intrusive load monitoring (NILM) techniques have emerged as a promising solution for the disaggregation of total energy consumption into the consumption of individual appliances. The classification of electrical appliances in a smart home remains a challenging task for machine learning algorithms. In the present study, we compare and evaluate the performance of two different algorithms, namely Multi-Label K-Nearest Neighbors (MLkNN) and Convolutional Neural Networks (CNN), for NILM in two different scenarios: without and with data augmentation (DAUG). Our results show how the classification results can be better interpreted by generating a scalogram image from the power consumption signal data and processing it with CNNs. The results indicate that the CNN model with the proposed data augmentation performed significantly better than the other methods, obtaining a mean F1-score of 0.484 (an improvement of +0.234). Additionally, the Friedman statistical test indicates that it is significantly different from the other methods compared. Our proposed system can potentially reduce energy waste and promote more sustainable energy use in homes and buildings by providing personalized feedback and energy savings tips.
ISBN (print): 9798350344868; 9798350344851
Unlike traditional knowledge distillation, self-knowledge distillation does not require a pre-trained teacher network. Existing methods require either additional parameters or additional memory consumption. To alleviate this problem, this paper proposes a more efficient self-knowledge distillation method, named LRMS (learning from role-model samples). In every mini-batch, LRMS selects a role-model sample for each sampled category and takes its prediction as the proxy semantic for the corresponding category. Then, predictions of the other samples are constrained to be consistent with the proxy semantics, which makes the distribution of predictions for samples within the same category more compact. Meanwhile, the regularization targets corresponding to proxy semantics are set with a higher distillation temperature to better utilize the classificatory information about the categories. Experimental results show that diverse architectures achieve improvements on four image classification datasets by using LRMS. Code is available: https://***/KAI1179/LRMS
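The consistency constraint described above might be sketched roughly as follows. This is not the authors' implementation. In particular, the abstract does not say how the role-model sample is chosen, so the rule used here (the sample with the highest softmax probability on its own label) is a hypothetical stand-in, as are the function names, the KL-divergence loss, and the temperature values.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a distillation temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def role_model_loss(batch_logits, labels, t_pred=1.0, t_target=4.0):
    """Mean KL between each sample and its category's role-model target."""
    # Pick one role model per category present in the mini-batch.
    # ASSUMPTION: the role model is the most confident sample on its label.
    role = {}
    for idx, (logits, label) in enumerate(zip(batch_logits, labels)):
        conf = softmax(logits)[label]
        if label not in role or conf > role[label][0]:
            role[label] = (conf, idx)

    total, count = 0.0, 0
    for idx, (logits, label) in enumerate(zip(batch_logits, labels)):
        role_idx = role[label][1]
        if idx == role_idx:
            continue  # the role model is not regularized against itself
        # Target uses a higher temperature than the prediction, as in the text.
        target = softmax(batch_logits[role_idx], t_target)
        pred = softmax(logits, t_pred)
        total += kl_divergence(target, pred)
        count += 1
    return total / count if count else 0.0


# Two samples of category 0 and one of category 1 (made-up logits).
loss = role_model_loss([[3.0, 0.0], [1.0, 0.5], [0.0, 2.0]], [0, 0, 1])
```

In a training loop the role-model target would typically be treated as a constant (no gradient), and this term would be added to the usual classification loss; both are assumptions about usage rather than details stated in the abstract.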
ISBN (print): 9781728198354
We study how to represent a video with implicit neural representations (INRs). Classical INR methods generally utilize MLPs to map input coordinates to output pixels, while some recent works have tried to directly reconstruct the whole image with CNNs. However, we argue that neither the pixel-wise nor the image-wise strategy is favorable for video data. Instead, we propose a patch-wise solution, PS-NeRV, which represents videos as a function of patches and the corresponding patch coordinates. It naturally inherits the advantages of image-wise methods and achieves excellent reconstruction performance with fast decoding speed. The whole method includes conventional modules, such as positional embedding, MLPs and CNNs. We also introduce AdaIN to enhance intermediate features. Extensive experiments have demonstrated its effectiveness in several video-related tasks, such as video compression and video inpainting.
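As a rough illustration of what a patch-wise input to such a model could look like, the sketch below enumerates normalized (time, row, column) patch coordinates and applies a sinusoidal positional embedding to each. The coordinate normalization, the embedding form, the `base` value, and the number of frequencies are assumptions chosen for illustration; the abstract does not specify them, and the MLP, CNN and AdaIN stages are omitted.

```python
import math


def positional_embedding(x, base=1.25, n_freqs=4):
    """Sinusoidal embedding of a scalar coordinate x in [0, 1]."""
    out = []
    for k in range(n_freqs):
        out.append(math.sin(base ** k * math.pi * x))
        out.append(math.cos(base ** k * math.pi * x))
    return out


def patch_coordinates(n_frames, grid_rows, grid_cols):
    """Normalized (t, row, col) coordinate for every patch of every frame."""
    coords = []
    for t in range(n_frames):
        for r in range(grid_rows):
            for c in range(grid_cols):
                coords.append((t / max(n_frames - 1, 1),
                               r / max(grid_rows - 1, 1),
                               c / max(grid_cols - 1, 1)))
    return coords


def embed_patch(coord):
    """Concatenate the embeddings of the three coordinate components."""
    return [v for x in coord for v in positional_embedding(x)]


# A 3-frame video split into a 2 x 2 grid of patches per frame.
coords = patch_coordinates(n_frames=3, grid_rows=2, grid_cols=2)
features = [embed_patch(c) for c in coords]  # one input vector per patch
```

Each embedded coordinate would then be decoded into one image patch, so the number of network evaluations per frame equals the number of patches in the grid rather than the number of pixels.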