ISBN (print): 9789464593617; 9798331519773
In this paper, we propose an interpretable denoising method for graph signals using regularization by denoising (RED). RED is a technique developed for image restoration that uses an efficient (and sometimes black-box) denoiser in the regularization term of the optimization problem. With RED, optimization problems can be designed that use the denoiser explicitly, and the gradient of the regularization term can easily be computed under mild conditions. We adapt RED to the denoising of graph signals beyond image processing. We show that many graph signal denoisers, including graph neural networks, theoretically or practically satisfy the conditions for RED. As a result, various high-performance graph signal denoisers can be used for regularization, which is expected to improve restoration quality. We further reveal the effectiveness of RED from a graph filter perspective. Denoising experiments on synthetic and 3D point cloud datasets show that our proposed method improves signal denoising accuracy in terms of MSE compared to existing graph signal denoising methods.
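As a concrete illustration of the RED mechanism described above, the sketch below denoises a signal on a path graph, assuming a linear, symmetric graph low-pass filter f(x) = (I + tau*L)^{-1} x as the plug-in denoiser; for such a denoiser the RED gradient reduces to x - f(x). The graph, parameter values, and choice of denoiser are our own illustrative assumptions, not the paper's.

```python
import numpy as np

def path_laplacian(n):
    # combinatorial Laplacian of a path graph with n nodes
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

def red_denoise(y, L, lam=5.0, tau=5.0, eta=0.2, iters=300):
    # minimise 0.5*||x - y||^2 + (lam/2) * x^T (x - f(x)) by gradient descent
    n = len(y)
    F = np.linalg.inv(np.eye(n) + tau * L)      # linear denoiser f(x) = F x
    x = y.copy()
    for _ in range(iters):
        grad = (x - y) + lam * (x - F @ x)      # data fidelity + RED gradient
        x -= eta * grad
    return x

rng = np.random.default_rng(0)
n = 200
clean = np.sin(np.linspace(0.0, 4.0 * np.pi, n))    # smooth graph signal
noisy = clean + 0.3 * rng.standard_normal(n)
denoised = red_denoise(noisy, path_laplacian(n))
```

Because the filter here is linear and symmetric, the RED conditions hold exactly; the paper's point is that many practical (including neural) denoisers satisfy them as well.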
Authors: Zhang, Yechuan; Zheng, Jian-Qing; Chappell, Michael
Affiliations: Univ Oxford, Inst Biomed Engn, Dept Engn Sci, Oxford OX1 3PJ, England; Univ Oxford, Kennedy Inst Rheumatol, Nuffield Dept Orthopaed Rheumatol & Musculoskeleta, Oxford OX3 7FY, England; Univ Nottingham, Sir Peter Mansfield Imaging Ctr, Sch Med, Nottingham NG7 2RD, England; Univ Nottingham, Sch Med, Mental Hlth & Clin Neurosci, Nottingham NG7 2RD, England; Univ Oxford, Wellcome Ctr Integrat Neuroimaging, Nuffield Dept Clin Neurosci, FMRIB, Oxford OX3 9DU, England
In this paper, a Variational Autoencoder (VAE) based framework is introduced to solve parameter estimation problems for non-linear forward models. In particular, we focus on applications in the field of medical imaging, where many thousands of model-based inference analyses might be required to populate a single parametric map. We adopt the concept from Variational Bayes (VB) of using an approximate representation of the posterior, and the concept from the VAE of using the latent space representation to encode the parameters of a forward model. Our work develops the idea of mapping between time-series data and latent parameters using a neural network in a variational way. A loss function that differs from the classic VAE formulation and a new sampling strategy are proposed to enable uncertainty estimation as part of the forward model inference. The VAE-based structure is evaluated using simulation experiments on a simple example and two perfusion MRI forward models. Compared with analytical VB (aVB) and Markov Chain Monte Carlo (MCMC), our VAE-based model achieves comparable accuracy and a hundredfold improvement in computational time (100 ms/image). We believe this VAE-like framework can be generalized to imaging modalities with higher complexity and can thus benefit clinical adoption where the otherwise long processing time associated with conventional inference methods is prohibitive.
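The kind of variational objective described above pairs a reconstruction term, pushed through the forward model via the reparameterisation trick, with a KL term toward the prior. A toy numerical sketch follows; the mono-exponential forward model, noise level, and all sizes are our illustrative assumptions, not the paper's perfusion models.

```python
import numpy as np

def forward_model(theta, t):
    # toy forward model: amplitude and decay rate as latent parameters
    amp, rate = theta
    return amp * np.exp(-np.abs(rate) * t)

def neg_elbo(y, t, mu, log_s, noise_var=0.01, n_samples=8, seed=0):
    rng = np.random.default_rng(seed)
    rec = 0.0
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        theta = mu + np.exp(log_s) * eps              # reparameterisation trick
        rec += np.sum((y - forward_model(theta, t)) ** 2) / (2.0 * noise_var)
    rec /= n_samples
    # KL( N(mu, s^2) || N(0, 1) ), summed over latent dimensions
    kl = 0.5 * np.sum(np.exp(2.0 * log_s) + mu ** 2 - 1.0 - 2.0 * log_s)
    return rec + kl

t = np.linspace(0.0, 1.0, 20)
y = forward_model(np.array([1.0, 2.0]), t)            # noiseless toy data
log_s = np.log(np.array([0.05, 0.05]))
good = neg_elbo(y, t, np.array([1.0, 2.0]), log_s)    # posterior near truth
bad = neg_elbo(y, t, np.array([0.2, 5.0]), log_s)     # posterior far away
```

In the paper's setting an encoder network predicts mu and log_s from the time series, amortising this optimisation across voxels; here they are supplied by hand to show the loss behaves sensibly.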
ISBN (print): 9798350302615
Regular cameras and cell phones can capture only a limited range of luminosity. In terms of quality, most images produced by such devices do not resemble the real world. Various methods, which fall under the name of High Dynamic Range (HDR) imaging, can be used to cope with this problem and produce an image with more detail. However, most methods for generating an HDR image from multi-exposure images focus only on how to combine the different exposures and do not consider choosing the best details of each image. Conversely, this research strives to detect the most visible areas of each image with the help of image segmentation. Two methods of producing the ground truth are considered, manual annotation and Otsu thresholding, and two similar neural networks are trained to segment these areas. Finally, it is shown that the neural network is able to segment the visible parts of the pictures acceptably.
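Otsu thresholding, one of the two ground-truth generators mentioned above, picks the threshold that maximises between-class variance of the intensity histogram. A minimal sketch, assuming an 8-bit intensity range (the synthetic bimodal data is ours):

```python
import numpy as np

def otsu_threshold(img):
    # exhaustive search over 8-bit thresholds for max between-class variance
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for thr in range(1, 256):
        w0, w1 = p[:thr].sum(), p[thr:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(thr) * p[:thr]).sum() / w0
        m1 = (np.arange(thr, 256) * p[thr:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2          # between-class variance
        if var > best_var:
            best_var, best_t = var, thr
    return best_t

rng = np.random.default_rng(2)
dark = rng.normal(50, 5, 500).clip(0, 255)      # under-exposed pixels
bright = rng.normal(200, 5, 500).clip(0, 255)   # well-exposed pixels
thr = otsu_threshold(np.concatenate([dark, bright]))
```

On a well-exposed/under-exposed mixture like this, the recovered threshold separates the two modes, which is what makes it usable as an automatic visibility mask.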
This work tackles the issue of noise removal from images, focusing on the well-known DCT image denoising algorithm. The latter, stemming from signal processing, has been well studied over the years. Though very simple, it is still used in crucial parts of state-of-the-art "traditional" denoising algorithms such as BM3D. For a few years, however, deep convolutional neural networks (CNNs), especially DnCNN, have outperformed their traditional counterparts, making signal processing methods less attractive. In this paper, we demonstrate that a DCT denoiser can be seen as a shallow CNN, and thereby its original linear transform can be tuned through gradient descent in a supervised manner, considerably improving its performance. This gives birth to a fully interpretable CNN called DCT2net. To deal with the remaining artifacts induced by DCT2net, an original hybrid solution between DCT and DCT2net is proposed, combining the best that these two methods can offer: DCT2net is selected to process non-stationary image patches, while DCT is optimal for piecewise smooth patches. Experiments on artificially noisy images demonstrate that a two-layer DCT2net provides results comparable to BM3D and is as fast as the DnCNN algorithm.
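The classic DCT denoiser being reinterpreted here transforms each patch, hard-thresholds the coefficients, and inverts the transform; the orthonormal matrix C below is exactly the linear layer that DCT2net would tune by gradient descent. This sketch uses non-overlapping patches and a synthetic test image for brevity, both our own simplifications.

```python
import numpy as np

def dct_matrix(n=8):
    # orthonormal DCT-II basis
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    C[0, :] /= np.sqrt(2.0)
    return C

def dct_denoise(img, sigma, patch=8):
    C = dct_matrix(patch)
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            P = img[i:i + patch, j:j + patch]
            D = C @ P @ C.T                      # forward 2-D DCT
            D[np.abs(D) < 3.0 * sigma] = 0.0     # hard threshold
            out[i:i + patch, j:j + patch] = C.T @ D @ C
    return out

rng = np.random.default_rng(1)
clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))   # piecewise-smooth image
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
den = dct_denoise(noisy, sigma=0.1)
```

The practical method additionally aggregates overlapping patches; the point here is that every step is a linear map or a pointwise nonlinearity, i.e. a shallow CNN.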
Human activity recognition (HAR) using radar technology is becoming increasingly valuable for applications in areas such as smart security systems, healthcare monitoring, and interactive computing. This study investigates the integration of convolutional neural networks (CNNs) with conventional radar signal processing methods to improve the accuracy and efficiency of HAR. Three distinct, two-dimensional radar processing techniques, specifically range-fast Fourier transform (FFT)-based time-range maps, time-Doppler-based short-time Fourier transform (STFT) maps, and smoothed pseudo-Wigner-Ville distribution (SPWVD) maps, are evaluated in combination with four state-of-the-art CNN architectures: VGG-16, VGG-19, ResNet-50, and MobileNetV2. This study positions radar-generated maps as a form of visual data, bridging the radar signal processing and image representation domains while ensuring privacy in sensitive applications. In total, twelve CNN and preprocessing configurations are analyzed, focusing on the trade-offs between preprocessing complexity and recognition accuracy, all of which are essential for real-time applications. Among these results, MobileNetV2, combined with STFT preprocessing, showed an ideal balance, achieving high computational efficiency and an accuracy rate of 96.30%, with a spectrogram generation time of 220 ms and an inference time of 2.57 ms per sample. The comprehensive evaluation underscores the importance of interpretable visual features for resource-constrained environments, expanding the applicability of radar-based HAR systems to domains such as augmented reality, autonomous systems, and edge computing.
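The time-Doppler STFT map used as CNN input above can be sketched with a windowed FFT over the slow-time signal. The window length, hop size, and the simulated single-scatterer return with a slowly rising Doppler shift are our illustrative assumptions, not the study's radar parameters.

```python
import numpy as np

def stft_map(x, win=64, hop=16):
    # magnitude STFT: rows are Doppler bins, columns are time frames
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    S = np.abs(np.fft.rfft(np.asarray(frames), axis=1))
    return S.T

fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
# instantaneous Doppler frequency rises from 100 Hz to 250 Hz
x = np.cos(2.0 * np.pi * (100.0 * t + 75.0 * t ** 2))
S = stft_map(x)
```

The resulting 2-D magnitude array is what gets normalised and fed to VGG/ResNet/MobileNet-style backbones as an image.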
Neural rendering approaches enable photo-realistic rendering on novel view synthesis tasks, while their per-scene optimization remains an issue for scalability. Recent methods introduce novel neural radiance field (NeRF) frameworks that generalize to unseen scenes on-the-fly by combining multi-view stereo with differentiable volume rendering. These generalizable NeRF methods synthesize the colors of 3D ray points by learning the consistency of image features projected from given nearby views. Since the consistency is computed in the 2D projected image space, it is vulnerable to occlusion and to local shape variation with viewing direction. To solve this problem, we present a dense depth-guided generalizable NeRF that leverages depth as the signed distance between the ray point and the object surface of the scene. We first generate dense depth maps from the sparse 3D points of structure from motion (SfM), which is an inevitable step in obtaining camera poses. Next, the dense depth maps are exploited as complementary features invariant to the sparsity of nearby views and as a mask for occlusion handling. Experiments demonstrate that our approach outperforms existing generalizable NeRF methods on widely used real and synthetic datasets.
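The signed-distance idea above can be reduced to a one-line computation per ray sample: the surface depth (from the densified SfM depth map) minus the sample's depth along the ray, with negative values flagging samples behind the surface as potentially occluded. The function, names, and constant depth value below are illustrative, not the paper's implementation.

```python
import numpy as np

def ray_features(surface_depth, sample_depths):
    # signed distance: positive in front of the surface, negative behind it
    signed = surface_depth - sample_depths
    occluded = signed < 0.0            # mask candidates for occlusion handling
    return signed, occluded

samples = np.linspace(0.5, 4.0, 8)     # sample depths along one camera ray
signed, mask = ray_features(2.0, samples)
```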
In recent years, Convolutional Neural Networks (CNNs) and Visual Transformers have shown remarkable performance in image deraining tasks. However, these state-of-the-art (SOTA) methods exhibit high computational costs in addition to excellent performance. This hinders the analytical comparison of methods and limits their practical application. We argue that the high computational cost mainly stems from the explosion in the number of parameters caused by the surge in feature dimensions. To achieve better results with fewer parameters, we reconstruct the multi-head attention mechanism and feed-forward network and propose a multi-scale hierarchical Transformer network whose width changes like a pyramid, called CPTransNet. The key idea of CPTransNet is to increase the feature dimension slowly during the feature extraction process, which avoids parameter wastage due to a feature dimension surge. CPTransNet achieves 33.25 dB PSNR on the classical image deraining dataset, exceeding the previous state of the art by 0.22 dB PSNR with only 19.4% of its computational cost.
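One way to see the parameter argument is a back-of-the-envelope count of the weight matrices in a chain of dense maps: widening slowly keeps every product of adjacent dimensions small, while a constant wide backbone pays the full width squared at every stage. The widths below are illustrative, not CPTransNet's actual dimensions.

```python
def linear_params(dims):
    # weight parameters of a chain of dense maps dims[0] -> dims[1] -> ...
    return sum(a * b for a, b in zip(dims, dims[1:]))

flat = [256] * 5                      # constant-width backbone
pyramid = [32, 64, 128, 192, 256]     # slowly widening, pyramid-like
```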
ISBN (print): 9798350343557
In this study, various machine learning and image analysis approaches, such as Template Matching, HOG, SVM, Faster RCNN, and YOLO, are examined and compared for the symbol recognition problem in color maps. Some difficulties were identified regarding the forms of the symbols, the complexity of the maps, and the placement of the symbols on the map. Observations about the success or failure of the methods against the difficulties defined in the experiments are presented. It has been observed that methods involving artificial neural networks are more successful at symbol recognition on color maps. The highest result, 91%, was obtained with Faster RCNN.
Conventional feature extraction methods for speech emotion recognition often suffer from unidimensionality and inadequacy in capturing the full range of emotional cues, limiting their effectiveness. To address these challenges, this paper introduces a novel network model named Multi-Modal Speech Emotion Recognition Network (MMSERNet). This model leverages the power of multimodal and multiscale feature fusion to significantly enhance the accuracy of speech emotion recognition. MMSERNet is composed of three specialized sub-networks, each dedicated to the extraction of distinct feature types: cepstral coefficients, spectrogram features, and textual features. It integrates audio features derived from Mel-frequency cepstral coefficients and Mel spectrograms with textual features obtained from word vectors, thereby creating a rich, comprehensive representation of emotional content. The fusion of these diverse feature sets facilitates a robust multimodal approach to emotion recognition. Extensive empirical evaluations of the MMSERNet model on benchmark datasets such as IEMOCAP and MELD demonstrate not only significant improvements in recognition accuracy but also an efficient use of model parameters, ensuring scalability and practical applicability.
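The Mel-frequency cepstral coefficients feeding one of the sub-networks above can be sketched per frame as: power spectrum, triangular Mel filterbank, log, then a DCT over the Mel channels. The filterbank size, frame length, and test tone are our assumptions, not MMSERNet's exact front end.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_mels=26, n_ceps=13):
    # windowed power spectrum of one frame
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # triangular filters on a Mel-spaced grid
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_mels + 2))
    fb = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        up = (freqs - lo) / (mid - lo)
        down = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(up, down), 0.0, None)
    logmel = np.log(fb @ spec + 1e-10)
    # DCT-II over the Mel channels yields the cepstral coefficients
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_mels)
    return basis @ logmel

fs = 16000
t = np.arange(512) / fs
coef = mfcc_frame(np.sin(2.0 * np.pi * 440.0 * t), fs)
```

Stacking such frame vectors over time gives the cepstral input stream that is fused with the Mel-spectrogram and word-vector features.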
A synthetic aperture radar (SAR) system is a notable source of information, recognized for its capability to operate day and night and in all weather conditions, making it essential for various applications. SAR image formation is a pivotal step in radar imaging, essential for transforming complex raw radar data into interpretable and utilizable imagery. Nowadays, advancements in SAR sensor design, resulting in very wide swaths, generate a massive volume of data, necessitating extensive processing. Traditional methods of SAR image formation often involve resource-intensive and time-consuming postprocessing. There is a vital need to automate this process in near-real-time, enabling fast responses for various applications, including image classification and object detection. We present an SAR processing pipeline comprising a complex 2D autofocus SARNet, followed by a CNN-based classification model. The complex 2D autofocus SARNet is employed for image formation, utilizing an encoder-decoder architecture, such as U-Net and a modified version of ResU-Net. Meanwhile, the image classification task is accomplished using a CNN-based classification model. This framework allows us to obtain near real-time results, specifically for quick image viewing and scene classification. Several experiments were conducted using real SAR raw data collected by the European Remote Sensing satellite to validate the proposed pipeline. The performance evaluation of the processing pipeline is conducted through visual assessment as well as quantitative assessment using standard metrics, such as the structural similarity index and the peak signal-to-noise ratio. The experimental results demonstrate the processing pipeline's robustness, efficiency, reliability, and responsiveness in providing an integrated neural network-based SAR processing pipeline.
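The quantitative assessment mentioned above can be reproduced in outline with a minimal PSNR helper (SSIM needs windowed statistics and is omitted here); the peak value and test arrays are our assumptions.

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    # peak signal-to-noise ratio in dB against a reference image
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8))
val = psnr(ref, ref + 0.1)   # mse = 0.01 for a constant 0.1 offset
```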