This paper presents a study on an automated system for image classification based on the fusion of various deep learning methods. The study explores how to build an ensemble of different Convolutional Neural Network (CNN) models and transformer topologies, fine-tuned on several datasets, to leverage their diversity. The research question addressed in this work is whether different optimization algorithms can help in developing robust and efficient machine learning systems for classification in different domains. To that end, we introduce novel Adam variants. We employ these new approaches, coupled with several CNN topologies, to build an ensemble of classifiers that outperforms both other Adam-based methods and stochastic gradient descent. Additionally, the study combines the ensemble of CNNs with an ensemble of transformers based on different topologies, such as DeiT, ViT, Swin, and CoaT. To the best of our knowledge, this is the first work to carry out an in-depth study of a set of transformers and convolutional neural networks on a large collection of small/medium-sized images. The experiments performed on several datasets demonstrate that combining such different models results in a substantial performance improvement on all tested problems. All resources are available at https://***/LorisNanni.
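A common way to fuse such heterogeneous classifiers is score-level fusion, i.e. averaging the softmax outputs of each model before taking the argmax. The following is a minimal sketch of that idea; the averaging ("sum rule") fusion and the stand-in probabilities are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of score-level fusion for an ensemble of heterogeneous
# classifiers (CNNs and transformers). The sum-rule fusion below is an
# illustrative assumption, not necessarily the paper's exact combination rule.
import numpy as np

def fuse_predictions(prob_list):
    """Average the softmax probability matrices (n_samples x n_classes)
    produced by each ensemble member and return the fused class labels."""
    fused = np.mean(np.stack(prob_list, axis=0), axis=0)
    return fused.argmax(axis=1)

# Example with random stand-in probabilities for 3 models, 5 samples, 4 classes.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(4), size=5) for _ in range(3)]
print(fuse_predictions(probs))
```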
As an emerging paradigm for signal acquisition and reconstruction, compressive sensing (CS) achieves high-speed sampling and compression jointly and has found its way into many applications. With the fast growth of deep learning in computer vision, various methods of applying neural networks (NNs) to CS imaging tasks have been proposed. One category of them, the deep unrolling network, is inspired by the physical sampling model and combines the merits of both optimization-model-driven and data-driven methods, becoming the mainstream of this field. In this review article, we first revisit the inverse imaging model and the optimization algorithms encountered in CS research, and then present recent representative developments of CS networks, which are grouped into deep physics-free and physics-inspired approaches according to how they use the sampling matrix and measurement information. Following this, we analyze the conceptual connections and relationships among the existing methods and present our perspectives on recent advances and trends for future research.
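For context, the CS measurement model is typically y = Φx (plus noise), and deep unrolling networks map the iterations of an optimization algorithm such as ISTA onto learnable network stages. Below is a minimal NumPy sketch of one plain ISTA iteration; the sampling matrix, step size, and threshold are illustrative choices, not taken from any specific reviewed method.

```python
# Minimal sketch of the iterative step that deep unrolling networks (e.g.
# ISTA-style CS reconstruction) turn into learned stages. Phi, the step size
# rho, and the threshold lam are illustrative assumptions.
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ista_step(x, y, Phi, rho=0.1, lam=0.01):
    # Gradient step on the data-fidelity term ||Phi x - y||^2 ...
    r = x - rho * Phi.T @ (Phi @ x - y)
    # ... followed by a proximal (soft-thresholding) step enforcing sparsity.
    return soft_threshold(r, lam)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 256)) / np.sqrt(64)    # sampling matrix
x_true = np.zeros(256)
x_true[rng.choice(256, 8, replace=False)] = 1.0        # sparse signal
y = Phi @ x_true                                        # compressed measurements
x = np.zeros(256)
for _ in range(200):
    x = ista_step(x, y, Phi)
```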
While neural rendering approaches facilitate photo-realistic rendering in novel view synthesis tasks, the challenge of high-resolution rendering persists due to the substantial costs associated with acquiring and training data. Recently, several studies have been proposed that render high-resolution scenes by either super-sampling points or using reference images, aiming to restore details in low-resolution (LR) images. However, super-sampling is computationally expensive, and methods with reference images require high-resolution (HR) images at inference time. In this letter, we propose a novel super-resolution (SR) neural radiance field (NeRF) framework for high-fidelity novel view synthesis. To recover high-fidelity HR images from the captured LR images, we learn a mapping function that transforms LR rendered images into the Fourier space, restores the missing high-frequency details, and renders HR images at a higher resolution. Experiments demonstrate that our results are quantitatively and qualitatively better than those of existing SR methods in novel view synthesis. By visualizing the estimated dominant frequency components, we provide visual interpretations of the performance improvement.
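To make the Fourier-space idea concrete, the sketch below shows one simple way high-frequency content can be replaced in the spectrum of an upsampled LR rendering before transforming back; the radial cutoff and the "predicted" spectrum (here an identity stand-in) are assumptions for illustration, not the paper's learned mapping.

```python
# Minimal sketch of Fourier-domain detail restoration: an upsampled LR
# rendering is FFT-transformed and its high-frequency band is replaced by a
# predicted spectrum. The cutoff radius and the identity stand-in predictor
# are illustrative assumptions.
import numpy as np

def restore_high_freq(lr_up, predicted_spectrum, cutoff=0.25):
    h, w = lr_up.shape
    spec = np.fft.fftshift(np.fft.fft2(lr_up))
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    high_mask = radius > cutoff                      # frequencies to replace
    spec[high_mask] = predicted_spectrum[high_mask]
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))

lr_up = np.random.rand(64, 64)                       # stand-in upsampled render
restored = restore_high_freq(lr_up, np.fft.fftshift(np.fft.fft2(lr_up)))
```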
Single Image Super-Resolution (SISR) is a complex restoration task that recovers a high-resolution (HR) image from its degraded low-resolution (LR) form. SISR is used in many applications, such as microscopic image analysis, medical imaging, security and surveillance, astronomical observation, hyperspectral imaging, and text image super-resolution. Convolutional Neural Networks (CNNs) are the most widely used technique for solving Super-Resolution (SR) problems. This paper presents a review of CNN-based SISR methods. The SISR CNN models are analyzed based on their design and their performance on the benchmark datasets Set5, Set14, BSD100, and Urban100. Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) are used for quantitative analysis. The ESRGAN model shows the best results on all benchmark datasets and reconstructs images with good visual quality at large upscaling factors, achieving a PSNR of 27.03 dB and an SSIM of 0.8153 on the Urban100 dataset for the x4 upscaling factor. The models are further analyzed on the basis of loss function, scalability, processing time, and number of parameters. The framework and implementation setup of SISR CNN models are also discussed. The perceptual loss function can boost network performance by increasing the visual quality of the reconstructed images and has therefore emerged as a new research trend in recent years. It is also observed that there is tremendous growth in the field of blind or unsupervised SISR, and research has shifted toward developing reference-free performance evaluation metrics for unsupervised SISR.
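For reference, the PSNR used in these comparisons is 10·log10(MAX² / MSE). A minimal NumPy version is sketched below, assuming images scaled to [0, 1]; SSIM is available in libraries such as scikit-image (skimage.metrics.structural_similarity).

```python
# PSNR as used for the quantitative comparisons: 10 * log10(MAX^2 / MSE).
# Minimal NumPy sketch assuming images in [0, 1].
import numpy as np

def psnr(reference, reconstructed, max_val=1.0):
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.random.rand(128, 128)
rec = np.clip(ref + np.random.normal(0, 0.05, ref.shape), 0, 1)
print(f"PSNR: {psnr(ref, rec):.2f} dB")
```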
JPEG is the predominant image format across social networks, serving as a prime cover medium for image steganography. However, previous deep learning models for JPEG steganalysis rely heavily on domain expertise and tedious trial-and-error design. In this paper, we propose a two-stage neural architecture search scheme for JPEG steganalysis based on an Elastic Supernet with Dynamic Training (ESDT). The method constructs a weight-nesting supernet whose largest subnetwork is pretrained on ImageNet (a large-scale visual database widely used for pretraining deep learning models) and then fine-tuned for JPEG steganalysis. Based on this pretrained network, we aim to enhance the model's performance on the downstream task while reducing reliance on domain knowledge. A progressive shrinking strategy is introduced during supernet training to accommodate elastic kernel sizes, depths, and widths. In the final stage, we use a performance predictor to identify the optimal subnetwork within the refined supernet. Extensive experiments showcase the method's superiority over state-of-the-art JPEG steganalysis methods, achieving lower computational cost and better generalization.
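The sketch below illustrates the general flavor of progressive shrinking: the set of elastic choices (kernel size, depth, width) is widened stage by stage, and each training step samples one subnetwork configuration from the currently active sets. The concrete choice lists and stage schedule are assumptions for illustration, not the ESDT search space.

```python
# Minimal sketch of progressive shrinking during supernet training: elastic
# choices are enabled stage by stage and a subnetwork is sampled from the
# currently active sets. Choice lists and schedule are illustrative assumptions.
import random

STAGES = [
    {"kernel": [7],       "depth": [4],       "width": [1.0]},              # largest only
    {"kernel": [7, 5, 3], "depth": [4],       "width": [1.0]},              # elastic kernel
    {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [1.0]},              # + elastic depth
    {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [1.0, 0.75, 0.5]},   # + elastic width
]

def sample_subnet(stage, num_blocks=5):
    space = STAGES[stage]
    return [{"kernel": random.choice(space["kernel"]),
             "depth": random.choice(space["depth"]),
             "width": random.choice(space["width"])} for _ in range(num_blocks)]

for stage in range(len(STAGES)):
    config = sample_subnet(stage)     # one subnetwork per training step, in practice
    print(stage, config[0])
```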
Image denoising aims to restore a clean image from a noisy one. Traditional methods that use convolutional neural networks (CNNs) for denoising are trained on pairs of noisy and clean images to learn the transformation from a noisy image to a clean one. However, acquiring such image pairs in real-world scenarios is challenging. Hence, numerous self-supervised denoising techniques have been developed that do not require clean images for training. This study demonstrates that a straightforward loss design, concentrating on variance, can effectively train a standard CNN denoiser in a self-supervised fashion. A novel theoretical framework is introduced for training a basic CNN denoising model with three constraints: mean, variance, and augmentation. The variance constraint is crucial because it prevents the trained model from converging to trivial solutions such as the identity or zero mapping. This theory provides valuable insights for the development of new self-supervised denoising methods. Furthermore, a method that applies this theory to the proposed dual networks is developed, consisting of two standard CNN models that predict the clean image and the noise, respectively. This approach increases model capacity during training while minimizing computational cost during inference. The method exemplifies the implementation of the variance constraint and introduces a data constraint for the dual networks. Notably, the proposed method only assumes additive white noise, irrespective of the noise distribution. This minimal assumption enhances the model's robustness against noise with complex or unknown distributions in real-world distorted images. Experimental results indicate that the proposed Noise2Variance method performs well on the peak signal-to-noise ratio and structural similarity metrics compared to existing self-supervised denoising techniques. Visual comparison of the results further substantiates the efficacy of the proposed method.
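The sketch below shows one illustrative way such constraints could be combined for dual networks that predict the clean image and the noise: a data constraint (clean + noise reconstructs the input), a zero-mean constraint on the predicted noise, and a variance constraint that keeps the noise estimate from collapsing to zero (the trivial identity mapping). The specific loss terms and the target noise variance are assumptions made for the sketch, not the paper's exact formulation.

```python
# Illustrative sketch (PyTorch) of dual networks trained with data, mean, and
# variance constraints. Loss terms and the target variance sigma2 are
# assumptions, not the Noise2Variance paper's precise loss.
import torch
import torch.nn as nn

def simple_cnn():
    return nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 1, 3, padding=1))

clean_net, noise_net = simple_cnn(), simple_cnn()

def loss_fn(noisy, sigma2=0.01):
    clean_hat = clean_net(noisy)
    noise_hat = noise_net(noisy)
    data_term = ((clean_hat + noise_hat - noisy) ** 2).mean()   # data constraint
    mean_term = noise_hat.mean() ** 2                           # zero-mean noise
    var_term = (noise_hat.var() - sigma2) ** 2                  # avoid var -> 0
    return data_term + mean_term + var_term

noisy = torch.rand(4, 1, 64, 64) + 0.1 * torch.randn(4, 1, 64, 64)
loss = loss_fn(noisy)
loss.backward()
```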
ISBN (print): 9798350350920
Convolutional Neural Networks (CNNs) exhibit exceptional performance in the image processing domain. The acceleration of convolutions for CNNs has consistently been a focal point of machine learning hardware accelerators. However, with the continuous development of CNNs, the design costs and engineering workloads of hardware accelerators have increased significantly. To enhance accelerator performance while reducing time-related expenses, a series of optimal design parameters must be determined during the early stages of accelerator design. To achieve this objective, the concept of design space exploration (DSE) for CNN accelerators has been proposed. However, as neural networks become increasingly complex, the demands on DSE methods have also grown, rendering existing methods unable to meet the real-time requirements of accelerators or to discover the optimal design. In this paper, we introduce a DSE framework based on the Genetic Simulated Annealing (GSA) algorithm. The proposed framework autonomously generates hardware design parameters, such as parallelism degrees, based on the resource constraints and the CNN model. Our method is evaluated on two typical CNN accelerators. Experimental results show that our method largely improves DSE efficiency, reducing the exploration time by up to 73.7x compared to existing DSE methods.
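As a rough illustration of the search strategy, the sketch below runs a genetic algorithm whose offspring are accepted via a simulated-annealing criterion, exploring two parallelism degrees under a resource budget. The latency and resource models, parameter names (Tm, Tn), and schedule are simplified stand-ins, not the paper's cost model.

```python
# Minimal sketch of DSE with a genetic algorithm plus simulated-annealing
# acceptance. The toy latency model, DSP budget, and parameters Tm/Tn are
# illustrative assumptions, not the paper's framework.
import math, random

CHOICES = [1, 2, 4, 8, 16, 32, 64]

def latency(tm, tn):                       # toy cost: fewer cycles with more parallelism
    return 1e6 / (tm * tn) + 0.5 * (tm + tn)

def feasible(tm, tn, dsp_budget=512):
    return tm * tn * 5 <= dsp_budget       # toy DSP usage model

def gsa(generations=50, pop_size=16, temp=100.0, cooling=0.95):
    pop = [(random.choice(CHOICES), random.choice(CHOICES)) for _ in range(pop_size)]
    pop = [p for p in pop if feasible(*p)] or [(1, 1)]
    best = min(pop, key=lambda p: latency(*p))
    for _ in range(generations):
        parents = random.sample(pop, 2) if len(pop) >= 2 else pop * 2
        child = (random.choice([parents[0][0], parents[1][0]]),
                 random.choice([parents[0][1], parents[1][1]]))   # crossover
        if random.random() < 0.3:                                  # mutation
            child = (random.choice(CHOICES), child[1])
        if feasible(*child):
            delta = latency(*child) - latency(*best)
            if delta < 0 or random.random() < math.exp(-delta / temp):
                pop.append(child)                                  # SA-style acceptance
                best = min(best, child, key=lambda p: latency(*p))
        temp *= cooling
    return best

print(gsa())
```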
ISBN (print): 9798350344868; 9798350344851
Demosaicing and denoising of RAW images are crucial steps in the image signal processing pipeline of modern digital cameras. As only a third of the color information required to produce a digital image is captured by the camera sensor, the process of demosaicing is inherently ill-posed. The presence of noise further exacerbates this problem. Performing these two steps sequentially may distort the content of the captured RAW images and accumulate errors from one step to the next. Recent deep-neural-network-based approaches have shown the effectiveness of joint demosaicing and denoising in mitigating such challenges. However, these methods typically require a large number of training samples and do not generalize well to different noise types and intensities. In this paper, we propose a novel joint demosaicing and denoising method, dubbed JDD-DoubleDIP, which operates directly on a single RAW image without requiring any training data. We validate the effectiveness of our method on two popular datasets, Kodak and McMaster, with various noise types and intensities. The experimental results show that our method consistently outperforms the compared methods in terms of PSNR, SSIM, and qualitative visual perception.
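The general single-image, training-free idea follows the Deep Image Prior: a small network is fitted to one observation so that the Bayer-sampled version of its RGB output matches the noisy mosaic, with early stopping acting as the regularizer. The tiny network, RGGB mask, and loop below are simplified assumptions for illustration, not the JDD-DoubleDIP architecture itself.

```python
# Illustrative sketch (PyTorch) of a Deep-Image-Prior-style loop for joint
# demosaicing and denoising on a single RAW mosaic. Network and mask are
# simplified stand-ins, not the paper's double-DIP design.
import torch
import torch.nn as nn

def bayer_mask(h, w):                       # RGGB pattern, shape (3, h, w)
    m = torch.zeros(3, h, w)
    m[0, 0::2, 0::2] = 1                    # R
    m[1, 0::2, 1::2] = 1                    # G
    m[1, 1::2, 0::2] = 1                    # G
    m[2, 1::2, 1::2] = 1                    # B
    return m

h, w = 64, 64
mask = bayer_mask(h, w).unsqueeze(0)
raw = torch.rand(1, 3, h, w) * mask         # stand-in noisy RAW mosaic

net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 3, 3, padding=1))
z = torch.randn(1, 3, h, w)                 # fixed random input (the "prior")
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):                     # early stopping provides denoising
    opt.zero_grad()
    rgb = net(z)
    loss = ((rgb * mask - raw) ** 2).mean()
    loss.backward()
    opt.step()
```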
Effective crack detection is vital for pavement safety and durability. In recent years, deep learning methods have achieved promising results in automated crack detection. However, advanced large-scale convolutional neural networks (CNNs) often rely on numerous trainable parameters for deep feature extraction, making them computationally expensive; their complexity renders them impractical for deployment on small Internet of Things devices. In this study, we introduce a novel model specifically designed for pavement crack detection, named the Multi-Scale and Detail-Attention-based Crack Classification Model. It adopts a novel multi-scale dual-branch structure for effective feature extraction, focusing on improving the model's ability to perceive local and global information at different semantic scales, and uses a decoupled attention mechanism to attend more effectively to key information. In addition, we introduce a Stem Block to reduce the feature representation dimension, making the model more lightweight. We tested the proposed model on two standard datasets; the experimental results indicate that our model uses only 0.41 M parameters while maintaining a crack detection accuracy exceeding 99%. Compared to existing CNN models, our model outperforms current methods in terms of both complexity and detection accuracy. These results demonstrate that the proposed model offers superior performance for pavement crack detection, making it highly suitable for practical applications.
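The sketch below illustrates the general shape of such a lightweight design: a stem block that reduces spatial resolution early, followed by a dual-branch block whose small-kernel path captures local detail and whose large-kernel path captures wider context. Channel counts, kernel sizes, and the concatenation-based fusion are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch (PyTorch) of a stem block plus a multi-scale dual-branch
# block for binary crack classification. Sizes and fusion are illustrative.
import torch
import torch.nn as nn

class StemBlock(nn.Module):
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))                          # 4x spatial reduction overall

    def forward(self, x):
        return self.conv(x)

class DualBranchBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.local_branch = nn.Conv2d(ch, ch, 3, padding=1)    # fine local detail
        self.global_branch = nn.Conv2d(ch, ch, 7, padding=3)   # wider context
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.local_branch(x), self.global_branch(x)], dim=1))

model = nn.Sequential(StemBlock(), DualBranchBlock(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))
print(model(torch.rand(1, 3, 224, 224)).shape)        # crack / no-crack logits
```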
Object detection in unfavourable weather conditions presents significant challenges due to reduced visibility, increased noise, and frequent occlusions, limiting the effectiveness of conventional methods. This paper introduces a novel hybrid model combining Convolutional Neural Networks (CNNs) with Diffusion Neural Networks (Diffusion NNs) to address these issues. The proposed model synergistically integrates the feature extraction strengths of CNNs with the robust generative modeling capabilities of Diffusion NNs, enabling enhanced object detection under challenging environmental conditions. The hybrid architecture leverages CNNs to efficiently capture spatial and contextual features, while Diffusion NNs improve robustness by generating refined representations in noisy and incomplete scenarios. This approach is evaluated against state-of-the-art deep learning techniques, including YOLOv5, Faster R-CNN, and Vision Transformers. The proposed model achieves 91.8% accuracy, outperforming existing architectures, and also exhibits superior robustness (89.3%) and computational efficiency (70 FPS), making it a promising solution for real-time applications. These findings highlight the potential of generative enhancements for improving object detection reliability, particularly in adverse conditions. This paper contributes to the growing field of hybrid neural network architectures and their practical implementation for challenging computer vision tasks.
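One plausible reading of such a hybrid is sketched below: a CNN backbone extracts feature maps, a small denoising network refines them over a few diffusion-style perturb-and-denoise steps, and a detection head operates on the refined features. The backbone, number of refinement steps, and noise schedule are assumptions made for this sketch, not the paper's published architecture.

```python
# Illustrative sketch (PyTorch) of a CNN backbone with diffusion-style
# feature refinement before a detection head. All components are stand-ins.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
refiner = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(128, 128, 3, padding=1))
head = nn.Conv2d(128, 5, 1)              # e.g. 4 box coords + 1 objectness per cell

def forward(image, steps=4, noise_scale=0.1):
    feats = backbone(image)
    for _ in range(steps):               # diffusion-style refinement: perturb, then denoise
        noisy = feats + noise_scale * torch.randn_like(feats)
        feats = feats + refiner(noisy)   # residual denoising update
        noise_scale *= 0.5               # simple decreasing noise schedule
    return head(feats)

out = forward(torch.rand(1, 3, 256, 256))
print(out.shape)
```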