ISBN (print): 9789464593617; 9798331519773
In this paper, we propose an interpretable denoising method for graph signals using regularization by denoising (RED). RED is a technique developed for image restoration that uses an efficient (and sometimes black-box) denoiser in the regularization term of the optimization problem. With RED, optimization problems can be designed that use the denoiser explicitly, and the gradient of the regularization term can easily be computed under mild conditions. We adapt RED to the denoising of graph signals beyond image processing. We show that many graph signal denoisers, including graph neural networks, theoretically or practically satisfy the conditions for RED. As a result, various high-performance graph signal denoisers can be used for regularization, which is expected to improve restoration quality. We further reveal the effectiveness of RED from a graph filter perspective. Denoising experiments on synthetic and 3D point cloud datasets show that our proposed method improves signal denoising accuracy in terms of MSE compared to existing graph signal denoising methods.
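As a concrete illustration of the RED mechanism described above, the sketch below denoises a signal on a path graph, assuming a linear, symmetric graph low-pass filter f(x) = (I + tau*L)^{-1} x as the plug-in denoiser; for such a denoiser the RED gradient reduces to x - f(x). The graph, parameter values, and choice of denoiser are our own illustrative assumptions, not the paper's.

```python
import numpy as np

def path_laplacian(n):
    # combinatorial Laplacian of a path graph with n nodes
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

def red_denoise(y, L, lam=5.0, tau=5.0, eta=0.2, iters=300):
    # minimise 0.5*||x - y||^2 + (lam/2) * x^T (x - f(x)) by gradient descent
    n = len(y)
    F = np.linalg.inv(np.eye(n) + tau * L)      # linear denoiser f(x) = F x
    x = y.copy()
    for _ in range(iters):
        grad = (x - y) + lam * (x - F @ x)      # data fidelity + RED gradient
        x -= eta * grad
    return x

rng = np.random.default_rng(0)
n = 200
clean = np.sin(np.linspace(0.0, 4.0 * np.pi, n))    # smooth graph signal
noisy = clean + 0.3 * rng.standard_normal(n)
denoised = red_denoise(noisy, path_laplacian(n))
```

Because the filter here is linear and symmetric, the RED conditions hold exactly; the paper's point is that many practical (including neural) denoisers satisfy them as well.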
Authors: Zhang, Yechuan; Zheng, Jian-Qing; Chappell, Michael
Affiliations: Univ Oxford, Inst Biomed Engn, Dept Engn Sci, Oxford OX1 3PJ, England; Univ Oxford, Kennedy Inst Rheumatol, Nuffield Dept Orthopaed Rheumatol & Musculoskeleta, Oxford OX3 7FY, England; Univ Nottingham, Sir Peter Mansfield Imaging Ctr, Sch Med, Nottingham NG7 2RD, England; Univ Nottingham, Sch Med, Mental Hlth & Clin Neurosci, Nottingham NG7 2RD, England; Univ Oxford, Wellcome Ctr Integrat Neuroimaging, Nuffield Dept Clin Neurosci, FMRIB, Oxford OX3 9DU, England
In this paper, a Variational Autoencoder (VAE) based framework is introduced to solve parameter estimation problems for non-linear forward models. In particular, we focus on applications in the field of medical imaging, where many thousands of model-based inference analyses might be required to populate a single parametric map. We adopt the concept from Variational Bayes (VB) of using an approximate representation of the posterior, and the concept from the VAE of using the latent space representation to encode the parameters of a forward model. Our work develops the idea of mapping between time-series data and latent parameters using a neural network in a variational way. A loss function that differs from the classic VAE formulation and a new sampling strategy are proposed to enable uncertainty estimation as part of the forward model inference. The VAE-based structure is evaluated using simulation experiments on a simple example and two perfusion MRI forward models. Compared with analytical VB (aVB) and Markov Chain Monte Carlo (MCMC), our VAE-based model achieves comparable accuracy and a hundredfold improvement in computational time (100 ms/image). We believe this VAE-like framework can be generalized to imaging modalities with higher complexity and can thus benefit clinical adoption where the otherwise long processing time associated with conventional inference methods is prohibitive.
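The kind of variational objective described above pairs a reconstruction term, pushed through the forward model via the reparameterisation trick, with a KL term toward the prior. A toy numerical sketch follows; the mono-exponential forward model, noise level, and all sizes are our illustrative assumptions, not the paper's perfusion models.

```python
import numpy as np

def forward_model(theta, t):
    # toy forward model: amplitude and decay rate as latent parameters
    amp, rate = theta
    return amp * np.exp(-np.abs(rate) * t)

def neg_elbo(y, t, mu, log_s, noise_var=0.01, n_samples=8, seed=0):
    rng = np.random.default_rng(seed)
    rec = 0.0
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        theta = mu + np.exp(log_s) * eps              # reparameterisation trick
        rec += np.sum((y - forward_model(theta, t)) ** 2) / (2.0 * noise_var)
    rec /= n_samples
    # KL( N(mu, s^2) || N(0, 1) ), summed over latent dimensions
    kl = 0.5 * np.sum(np.exp(2.0 * log_s) + mu ** 2 - 1.0 - 2.0 * log_s)
    return rec + kl

t = np.linspace(0.0, 1.0, 20)
y = forward_model(np.array([1.0, 2.0]), t)            # noiseless toy data
log_s = np.log(np.array([0.05, 0.05]))
good = neg_elbo(y, t, np.array([1.0, 2.0]), log_s)    # posterior near truth
bad = neg_elbo(y, t, np.array([0.2, 5.0]), log_s)     # posterior far away
```

In the paper's setting an encoder network predicts mu and log_s from the time series, amortising this optimisation across voxels; here they are supplied by hand to show the loss behaves sensibly.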
ISBN (print): 9798350302615
Regular cameras and cell phones can capture only a limited range of luminosity. In terms of quality, most images produced by such devices do not resemble the real world. Various methods, which fall under the name of High Dynamic Range (HDR) imaging, can be used to cope with this problem and produce an image with more detail. However, most methods for generating an HDR image from multi-exposure images focus only on how to combine the different exposures and do not consider choosing the best details of each image. Conversely, this research strives to detect the most visible areas of each image with the help of image segmentation. Two methods of producing the ground truth are considered, manual annotation and Otsu thresholding, and two similar neural networks are trained to segment these areas. Finally, it is shown that the neural network is able to segment the visible parts of the pictures acceptably.
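Otsu thresholding, one of the two ground-truth generators mentioned above, picks the threshold that maximises between-class variance of the intensity histogram. A minimal sketch, assuming an 8-bit intensity range (the synthetic bimodal data is ours):

```python
import numpy as np

def otsu_threshold(img):
    # exhaustive search over 8-bit thresholds for max between-class variance
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for thr in range(1, 256):
        w0, w1 = p[:thr].sum(), p[thr:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(thr) * p[:thr]).sum() / w0
        m1 = (np.arange(thr, 256) * p[thr:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2          # between-class variance
        if var > best_var:
            best_var, best_t = var, thr
    return best_t

rng = np.random.default_rng(2)
dark = rng.normal(50, 5, 500).clip(0, 255)      # under-exposed pixels
bright = rng.normal(200, 5, 500).clip(0, 255)   # well-exposed pixels
thr = otsu_threshold(np.concatenate([dark, bright]))
```

On a well-exposed/under-exposed mixture like this, the recovered threshold separates the two modes, which is what makes it usable as an automatic visibility mask.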
This work tackles the issue of noise removal from images, focusing on the well-known DCT image denoising algorithm. The latter, stemming from signal processing, has been well studied over the years. Though very simple, it is still used in crucial parts of state-of-the-art "traditional" denoising algorithms such as BM3D. For a few years, however, deep convolutional neural networks (CNNs), especially DnCNN, have outperformed their traditional counterparts, making signal processing methods less attractive. In this paper, we demonstrate that a DCT denoiser can be seen as a shallow CNN, and thereby its original linear transform can be tuned through gradient descent in a supervised manner, considerably improving its performance. This gives birth to a fully interpretable CNN called DCT2net. To deal with the remaining artifacts induced by DCT2net, an original hybrid solution between DCT and DCT2net is proposed, combining the best that these two methods can offer: DCT2net is selected to process non-stationary image patches, while DCT is optimal for piecewise smooth patches. Experiments on artificially noisy images demonstrate that a two-layer DCT2net provides results comparable to BM3D and is as fast as the DnCNN algorithm.
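The classic DCT denoiser being reinterpreted here transforms each patch, hard-thresholds the coefficients, and inverts the transform; the orthonormal matrix C below is exactly the linear layer that DCT2net would tune by gradient descent. This sketch uses non-overlapping patches and a synthetic test image for brevity, both our own simplifications.

```python
import numpy as np

def dct_matrix(n=8):
    # orthonormal DCT-II basis
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    C[0, :] /= np.sqrt(2.0)
    return C

def dct_denoise(img, sigma, patch=8):
    C = dct_matrix(patch)
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            P = img[i:i + patch, j:j + patch]
            D = C @ P @ C.T                      # forward 2-D DCT
            D[np.abs(D) < 3.0 * sigma] = 0.0     # hard threshold
            out[i:i + patch, j:j + patch] = C.T @ D @ C
    return out

rng = np.random.default_rng(1)
clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))   # piecewise-smooth image
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
den = dct_denoise(noisy, sigma=0.1)
```

The practical method additionally aggregates overlapping patches; the point here is that every step is a linear map or a pointwise nonlinearity, i.e. a shallow CNN.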
Human activity recognition (HAR) using radar technology is becoming increasingly valuable for applications in areas such as smart security systems, healthcare monitoring, and interactive computing. This study investigates the integration of convolutional neural networks (CNNs) with conventional radar signal processing methods to improve the accuracy and efficiency of HAR. Three distinct, two-dimensional radar processing techniques, specifically range-fast Fourier transform (FFT)-based time-range maps, time-Doppler-based short-time Fourier transform (STFT) maps, and smoothed pseudo-Wigner-Ville distribution (SPWVD) maps, are evaluated in combination with four state-of-the-art CNN architectures: VGG-16, VGG-19, ResNet-50, and MobileNetV2. This study positions radar-generated maps as a form of visual data, bridging the radar signal processing and image representation domains while ensuring privacy in sensitive applications. In total, twelve CNN and preprocessing configurations are analyzed, focusing on the trade-offs between preprocessing complexity and recognition accuracy, all of which are essential for real-time applications. Among these results, MobileNetV2, combined with STFT preprocessing, showed an ideal balance, achieving high computational efficiency and an accuracy rate of 96.30%, with a spectrogram generation time of 220 ms and an inference time of 2.57 ms per sample. The comprehensive evaluation underscores the importance of interpretable visual features for resource-constrained environments, expanding the applicability of radar-based HAR systems to domains such as augmented reality, autonomous systems, and edge computing.
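The time-Doppler STFT map used as CNN input above can be sketched with a windowed FFT over the slow-time signal. The window length, hop size, and the simulated single-scatterer return with a slowly rising Doppler shift are our illustrative assumptions, not the study's radar parameters.

```python
import numpy as np

def stft_map(x, win=64, hop=16):
    # magnitude STFT: rows are Doppler bins, columns are time frames
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    S = np.abs(np.fft.rfft(np.asarray(frames), axis=1))
    return S.T

fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
# instantaneous Doppler frequency rises from 100 Hz to 250 Hz
x = np.cos(2.0 * np.pi * (100.0 * t + 75.0 * t ** 2))
S = stft_map(x)
```

The resulting 2-D magnitude array is what gets normalised and fed to VGG/ResNet/MobileNet-style backbones as an image.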
Neural rendering approaches enable photo-realistic rendering on novel view synthesis tasks, while their per-scene optimization remains an issue for scalability. Recent methods introduce novel neural radiance field (NeRF) frameworks that generalize to unseen scenes on-the-fly by combining multi-view stereo with differentiable volume rendering. These generalizable NeRF methods synthesize the colors of 3D ray points by learning the consistency of image features projected from given nearby views. Since the consistency is computed in the 2D projected image space, it is vulnerable to occlusion and to local shape variation with viewing direction. To solve this problem, we present a dense depth-guided generalizable NeRF that leverages depth as the signed distance between the ray point and the object surface of the scene. We first generate dense depth maps from the sparse 3D points of structure from motion (SfM), which is an inevitable step in obtaining camera poses. Next, the dense depth maps are exploited as complementary features invariant to the sparsity of nearby views and as a mask for occlusion handling. Experiments demonstrate that our approach outperforms existing generalizable NeRF methods on widely used real and synthetic datasets.
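The signed-distance idea above can be reduced to a one-line computation per ray sample: the surface depth (from the densified SfM depth map) minus the sample's depth along the ray, with negative values flagging samples behind the surface as potentially occluded. The function, names, and constant depth value below are illustrative, not the paper's implementation.

```python
import numpy as np

def ray_features(surface_depth, sample_depths):
    # signed distance: positive in front of the surface, negative behind it
    signed = surface_depth - sample_depths
    occluded = signed < 0.0            # mask candidates for occlusion handling
    return signed, occluded

samples = np.linspace(0.5, 4.0, 8)     # sample depths along one camera ray
signed, mask = ray_features(2.0, samples)
```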
In recent years, Convolutional Neural Networks (CNNs) and Visual Transformers have shown remarkable performance in image deraining tasks. However, these state-of-the-art (SOTA) methods exhibit high computational costs in addition to excellent performance. This hinders the analytical comparison of methods and limits their practical application. We argue that the high computational cost mainly stems from the explosion in the number of parameters caused by the surge in feature dimensions. To achieve better results with fewer parameters, we reconstruct the multi-head attention mechanism and feed-forward network and propose a multi-scale hierarchical Transformer network whose width changes like a pyramid, called CPTransNet. The key idea of CPTransNet is to increase the feature dimension slowly during the feature extraction process, which avoids parameter wastage due to a feature dimension surge. CPTransNet achieves 33.25 dB PSNR on the classical image deraining dataset, exceeding the previous state of the art by 0.22 dB PSNR with only 19.4% of its computational cost.
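One way to see the parameter argument is a back-of-the-envelope count of the weight matrices in a chain of dense maps: widening slowly keeps every product of adjacent dimensions small, while a constant wide backbone pays the full width squared at every stage. The widths below are illustrative, not CPTransNet's actual dimensions.

```python
def linear_params(dims):
    # weight parameters of a chain of dense maps dims[0] -> dims[1] -> ...
    return sum(a * b for a, b in zip(dims, dims[1:]))

flat = [256] * 5                      # constant-width backbone
pyramid = [32, 64, 128, 192, 256]     # slowly widening, pyramid-like
```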
ISBN (print): 9798350343557
In this study, various machine learning and image analysis approaches, such as Template Matching, HOG, SVM, Faster RCNN, and YOLO, are examined and compared for the symbol recognition problem in color maps. Some difficulties were identified regarding the forms of the symbols, the complexity of the maps, and the placement of the symbols on the map. Observations about the success or failure of the methods against the difficulties defined in the experiments are presented. It has been observed that methods involving artificial neural networks are more successful at symbol recognition on color maps. The highest result, 91%, was obtained with Faster RCNN.
Conventional feature extraction methods for speech emotion recognition often suffer from unidimensionality and inadequacy in capturing the full range of emotional cues, limiting their effectiveness. To address these challenges, this paper introduces a novel network model named Multi-Modal Speech Emotion Recognition Network (MMSERNet). This model leverages the power of multimodal and multiscale feature fusion to significantly enhance the accuracy of speech emotion recognition. MMSERNet is composed of three specialized sub-networks, each dedicated to the extraction of distinct feature types: cepstral coefficients, spectrogram features, and textual features. It integrates audio features derived from Mel-frequency cepstral coefficients and Mel spectrograms with textual features obtained from word vectors, thereby creating a rich, comprehensive representation of emotional content. The fusion of these diverse feature sets facilitates a robust multimodal approach to emotion recognition. Extensive empirical evaluations of the MMSERNet model on benchmark datasets such as IEMOCAP and MELD demonstrate not only significant improvements in recognition accuracy but also an efficient use of model parameters, ensuring scalability and practical applicability.
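The Mel-frequency cepstral coefficients feeding one of the sub-networks above can be sketched per frame as: power spectrum, triangular Mel filterbank, log, then a DCT over the Mel channels. The filterbank size, frame length, and test tone are our assumptions, not MMSERNet's exact front end.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_mels=26, n_ceps=13):
    # windowed power spectrum of one frame
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # triangular filters on a Mel-spaced grid
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_mels + 2))
    fb = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        up = (freqs - lo) / (mid - lo)
        down = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(up, down), 0.0, None)
    logmel = np.log(fb @ spec + 1e-10)
    # DCT-II over the Mel channels yields the cepstral coefficients
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_mels)
    return basis @ logmel

fs = 16000
t = np.arange(512) / fs
coef = mfcc_frame(np.sin(2.0 * np.pi * 440.0 * t), fs)
```

Stacking such frame vectors over time gives the cepstral input stream that is fused with the Mel-spectrogram and word-vector features.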
A synthetic aperture radar (SAR) system is a notable source of information, recognized for its capability to operate day and night and in all weather conditions, making it essential for various applications. SAR image formation is a pivotal step in radar imaging, essential for transforming complex raw radar data into interpretable and utilizable imagery. Nowadays, advancements in SAR sensor design, resulting in very wide swaths, generate a massive volume of data, necessitating extensive processing. Traditional methods of SAR image formation often involve resource-intensive and time-consuming postprocessing. There is a vital need to automate this process in near-real-time, enabling fast responses for various applications, including image classification and object detection. We present an SAR processing pipeline comprising a complex 2D autofocus SARNet, followed by a CNN-based classification model. The complex 2D autofocus SARNet is employed for image formation, utilizing an encoder-decoder architecture, such as U-Net and a modified version of ResU-Net. Meanwhile, the image classification task is accomplished using a CNN-based classification model. This framework allows us to obtain near real-time results, specifically for quick image viewing and scene classification. Several experiments were conducted using real SAR raw data collected by the European Remote Sensing satellite to validate the proposed pipeline. The performance evaluation of the processing pipeline is conducted through visual assessment as well as quantitative assessment using standard metrics, such as the structural similarity index and the peak signal-to-noise ratio. The experimental results demonstrate the processing pipeline's robustness, efficiency, reliability, and responsiveness in providing an integrated neural network-based SAR processing pipeline.
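The quantitative assessment mentioned above can be reproduced in outline with a minimal PSNR helper (SSIM needs windowed statistics and is omitted here); the peak value and test arrays are our assumptions.

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    # peak signal-to-noise ratio in dB against a reference image
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8))
val = psnr(ref, ref + 0.1)   # mse = 0.01 for a constant 0.1 offset
```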