This paper presents a study on an automated system for image classification based on the fusion of various deep learning methods. The study explores how to build an ensemble of different Convolutional Neural Network (CNN) models and transformer topologies, fine-tuned on several datasets, to leverage their diversity. The research question addressed in this work is whether different optimization algorithms can help in developing robust and efficient machine learning systems for classification in different domains. To that end, we introduce novel Adam variants. We employ these new approaches, coupled with several CNN topologies, to build an ensemble of classifiers that outperforms both other Adam-based methods and stochastic gradient descent. Additionally, the study combines the ensemble of CNNs with an ensemble of transformers based on different topologies, such as DeiT, ViT, Swin, and CoaT. To the best of our knowledge, this is the first work to carry out an in-depth study of a set of transformers and convolutional neural networks on a large collection of small/medium-sized images. The experiments performed on several datasets demonstrate that combining such different models results in a substantial performance improvement on all tested problems. All resources are available at https://***/LorisNanni.
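A common way to fuse such heterogeneous classifiers is score-level fusion, i.e. averaging the softmax outputs of each model before taking the argmax. The following is a minimal sketch of that idea; the averaging ("sum rule") fusion and the stand-in probabilities are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of score-level fusion for an ensemble of heterogeneous
# classifiers (CNNs and transformers). The sum-rule fusion below is an
# illustrative assumption, not necessarily the paper's exact combination rule.
import numpy as np

def fuse_predictions(prob_list):
    """Average the softmax probability matrices (n_samples x n_classes)
    produced by each ensemble member and return the fused class labels."""
    fused = np.mean(np.stack(prob_list, axis=0), axis=0)
    return fused.argmax(axis=1)

# Example with random stand-in probabilities for 3 models, 5 samples, 4 classes.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(4), size=5) for _ in range(3)]
print(fuse_predictions(probs))
```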
As an emerging paradigm for signal acquisition and reconstruction, compressive sensing (CS) achieves high-speed sampling and compression jointly and has found its way into many applications. With the fast growth of deep learning in computer vision, various methods of applying neural networks (NNs) to CS imaging tasks have been proposed. One category of them, the deep unrolling network, is inspired by the physical sampling model and combines the merits of both optimization-model-driven and data-driven methods, becoming the mainstream of this field. In this review article, we first revisit the inverse imaging model and the optimization algorithms encountered in CS research, and then present recent representative developments of CS networks, which are grouped into deep physics-free and physics-inspired approaches according to how they use the sampling matrix and measurement information. Following this, we analyze the conceptual connections and relationships among the existing methods and present our perspectives on recent advances and trends for future research.
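For context, the CS measurement model is typically y = Φx (plus noise), and deep unrolling networks map the iterations of an optimization algorithm such as ISTA onto learnable network stages. Below is a minimal NumPy sketch of one plain ISTA iteration; the sampling matrix, step size, and threshold are illustrative choices, not taken from any specific reviewed method.

```python
# Minimal sketch of the iterative step that deep unrolling networks (e.g.
# ISTA-style CS reconstruction) turn into learned stages. Phi, the step size
# rho, and the threshold lam are illustrative assumptions.
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ista_step(x, y, Phi, rho=0.1, lam=0.01):
    # Gradient step on the data-fidelity term ||Phi x - y||^2 ...
    r = x - rho * Phi.T @ (Phi @ x - y)
    # ... followed by a proximal (soft-thresholding) step enforcing sparsity.
    return soft_threshold(r, lam)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 256)) / np.sqrt(64)    # sampling matrix
x_true = np.zeros(256)
x_true[rng.choice(256, 8, replace=False)] = 1.0        # sparse signal
y = Phi @ x_true                                        # compressed measurements
x = np.zeros(256)
for _ in range(200):
    x = ista_step(x, y, Phi)
```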
While neural rendering approaches facilitate photo-realistic rendering in novel view synthesis tasks, the challenge of high-resolution rendering persists due to the substantial costs associated with acquiring and training data. Recently, several studies have been proposed that render high-resolution scenes by either super-sampling points or using reference images, aiming to restore details in low-resolution (LR) images. However, super-sampling is computationally expensive, and methods with reference images require high-resolution (HR) images at inference time. In this letter, we propose a novel super-resolution (SR) neural radiance field (NeRF) framework for high-fidelity novel view synthesis. To recover high-fidelity HR images from the captured LR images, we learn a mapping function that transforms LR rendered images into the Fourier space, restores the missing high-frequency details, and renders HR images at a higher resolution. Experiments demonstrate that our results are quantitatively and qualitatively better than those of existing SR methods in novel view synthesis. By visualizing the estimated dominant frequency components, we provide visual interpretations of the performance improvement.
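To make the Fourier-space idea concrete, the sketch below shows one simple way high-frequency content can be replaced in the spectrum of an upsampled LR rendering before transforming back; the radial cutoff and the "predicted" spectrum (here an identity stand-in) are assumptions for illustration, not the paper's learned mapping.

```python
# Minimal sketch of Fourier-domain detail restoration: an upsampled LR
# rendering is FFT-transformed and its high-frequency band is replaced by a
# predicted spectrum. The cutoff radius and the identity stand-in predictor
# are illustrative assumptions.
import numpy as np

def restore_high_freq(lr_up, predicted_spectrum, cutoff=0.25):
    h, w = lr_up.shape
    spec = np.fft.fftshift(np.fft.fft2(lr_up))
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    high_mask = radius > cutoff                      # frequencies to replace
    spec[high_mask] = predicted_spectrum[high_mask]
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))

lr_up = np.random.rand(64, 64)                       # stand-in upsampled render
restored = restore_high_freq(lr_up, np.fft.fftshift(np.fft.fft2(lr_up)))
```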
Single Image Super-Resolution (SISR) is a complex restoration task that recovers a high-resolution (HR) image from its degraded low-resolution (LR) form. SISR is used in many applications, such as microscopic image analysis, medical imaging, security and surveillance, astronomical observation, hyperspectral imaging, and text image super-resolution. Convolutional Neural Networks (CNNs) are the most widely used technique for solving Super-Resolution (SR) problems. This paper presents a review of CNN-based SISR methods. The SISR CNN models are analyzed based on their design and their performance on the benchmark datasets Set5, Set14, BSD100, and Urban100. Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) are used for quantitative analysis. The ESRGAN model shows the best results on all benchmark datasets and reconstructs images with good visual quality at large upscaling factors, achieving a PSNR of 27.03 dB and an SSIM of 0.8153 on the Urban100 dataset for the x4 upscaling factor. The models are further analyzed on the basis of loss function, scalability, processing time, and number of parameters. The framework and implementation setup of SISR CNN models are also discussed. The perceptual loss function can boost network performance by increasing the visual quality of the reconstructed images and has therefore emerged as a new research trend in recent years. It is also observed that there is tremendous growth in the field of blind or unsupervised SISR, and research has shifted toward developing reference-free performance evaluation metrics for unsupervised SISR.
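For reference, the PSNR used in these comparisons is 10·log10(MAX² / MSE). A minimal NumPy version is sketched below, assuming images scaled to [0, 1]; SSIM is available in libraries such as scikit-image (skimage.metrics.structural_similarity).

```python
# PSNR as used for the quantitative comparisons: 10 * log10(MAX^2 / MSE).
# Minimal NumPy sketch assuming images in [0, 1].
import numpy as np

def psnr(reference, reconstructed, max_val=1.0):
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.random.rand(128, 128)
rec = np.clip(ref + np.random.normal(0, 0.05, ref.shape), 0, 1)
print(f"PSNR: {psnr(ref, rec):.2f} dB")
```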
JPEG is the predominant image format across social networks, serving as a prime cover medium for image steganography. However, previous deep learning models for JPEG steganalysis rely heavily on domain expertise and tedious trial-and-error design. In this paper, we propose a two-stage neural architecture search scheme for JPEG steganalysis based on an Elastic Supernet with Dynamic Training (ESDT). The method constructs a weight-nesting supernet whose largest subnetwork is pretrained on ImageNet (a large-scale visual database widely used for pretraining deep learning models) and then fine-tuned for JPEG steganalysis. Based on this pretrained network, we aim to enhance the model's performance on the downstream task while reducing reliance on domain knowledge. A progressive shrinking strategy is introduced during supernet training to accommodate elastic kernel sizes, depths, and widths. In the final stage, we use a performance predictor to identify the optimal subnetwork within the refined supernet. Extensive experiments showcase the method's superiority over state-of-the-art JPEG steganalysis methods, achieving lower computational cost and better generalization.
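The sketch below illustrates the general flavor of progressive shrinking: the set of elastic choices (kernel size, depth, width) is widened stage by stage, and each training step samples one subnetwork configuration from the currently active sets. The concrete choice lists and stage schedule are assumptions for illustration, not the ESDT search space.

```python
# Minimal sketch of progressive shrinking during supernet training: elastic
# choices are enabled stage by stage and a subnetwork is sampled from the
# currently active sets. Choice lists and schedule are illustrative assumptions.
import random

STAGES = [
    {"kernel": [7],       "depth": [4],       "width": [1.0]},              # largest only
    {"kernel": [7, 5, 3], "depth": [4],       "width": [1.0]},              # elastic kernel
    {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [1.0]},              # + elastic depth
    {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [1.0, 0.75, 0.5]},   # + elastic width
]

def sample_subnet(stage, num_blocks=5):
    space = STAGES[stage]
    return [{"kernel": random.choice(space["kernel"]),
             "depth": random.choice(space["depth"]),
             "width": random.choice(space["width"])} for _ in range(num_blocks)]

for stage in range(len(STAGES)):
    config = sample_subnet(stage)     # one subnetwork per training step, in practice
    print(stage, config[0])
```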
Image denoising aims to restore a clean image from a noisy one. Traditional methods that use convolutional neural networks (CNNs) for denoising are trained on pairs of noisy and clean images to learn the transformation from a noisy image to a clean one. However, acquiring such image pairs in real-world scenarios is challenging. Hence, numerous self-supervised denoising techniques have been developed that do not require clean images for training. This study demonstrates that a straightforward loss design, concentrating on variance, can effectively train a standard CNN denoiser in a self-supervised fashion. A novel theoretical framework is introduced for training a basic CNN denoising model with three constraints: mean, variance, and augmentation. The variance constraint is crucial because it prevents the trained model from converging to trivial solutions such as the identity or zero mapping. This theory provides valuable insights for the development of new self-supervised denoising methods. Furthermore, a method that applies this theory to the proposed dual networks is developed, consisting of two standard CNN models that predict the clean image and the noise, respectively. This approach increases model capacity during training while minimizing computational cost during inference. The method exemplifies the implementation of the variance constraint and introduces a data constraint for the dual networks. Notably, the proposed method only assumes additive white noise, irrespective of the noise distribution. This minimal assumption enhances the model's robustness against noise with complex or unknown distributions in real-world distorted images. Experimental results indicate that the proposed Noise2Variance method performs well on the peak signal-to-noise ratio and structural similarity metrics compared to existing self-supervised denoising techniques. Visual comparison of the results further substantiates the efficacy of the proposed method.
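The sketch below shows one illustrative way such constraints could be combined for dual networks that predict the clean image and the noise: a data constraint (clean + noise reconstructs the input), a zero-mean constraint on the predicted noise, and a variance constraint that keeps the noise estimate from collapsing to zero (the trivial identity mapping). The specific loss terms and the target noise variance are assumptions made for the sketch, not the paper's exact formulation.

```python
# Illustrative sketch (PyTorch) of dual networks trained with data, mean, and
# variance constraints. Loss terms and the target variance sigma2 are
# assumptions, not the Noise2Variance paper's precise loss.
import torch
import torch.nn as nn

def simple_cnn():
    return nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 1, 3, padding=1))

clean_net, noise_net = simple_cnn(), simple_cnn()

def loss_fn(noisy, sigma2=0.01):
    clean_hat = clean_net(noisy)
    noise_hat = noise_net(noisy)
    data_term = ((clean_hat + noise_hat - noisy) ** 2).mean()   # data constraint
    mean_term = noise_hat.mean() ** 2                           # zero-mean noise
    var_term = (noise_hat.var() - sigma2) ** 2                  # avoid var -> 0
    return data_term + mean_term + var_term

noisy = torch.rand(4, 1, 64, 64) + 0.1 * torch.randn(4, 1, 64, 64)
loss = loss_fn(noisy)
loss.backward()
```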
ISBN (print): 9798350350920
Convolutional Neural Networks (CNNs) exhibit exceptional performance in the image processing domain. The acceleration of convolutions for CNNs has consistently been a focal point of machine learning hardware accelerators. However, with the continuous development of CNNs, the design costs and engineering workloads of hardware accelerators have increased significantly. To enhance accelerator performance while reducing time-related expenses, a series of optimal design parameters must be determined during the early stages of accelerator design. To achieve this objective, the concept of design space exploration (DSE) for CNN accelerators has been proposed. However, as neural networks become increasingly complex, the demands on DSE methods have also grown, rendering existing methods unable to meet the real-time requirements of accelerators or to discover the optimal design. In this paper, we introduce a DSE framework based on the Genetic Simulated Annealing (GSA) algorithm. The proposed framework autonomously generates hardware design parameters, such as parallelism degrees, based on the resource constraints and the CNN model. Our method is evaluated on two typical CNN accelerators. Experimental results show that our method largely improves DSE efficiency, reducing the exploration time by up to 73.7x compared to existing DSE methods.
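As a rough illustration of the search strategy, the sketch below runs a genetic algorithm whose offspring are accepted via a simulated-annealing criterion, exploring two parallelism degrees under a resource budget. The latency and resource models, parameter names (Tm, Tn), and schedule are simplified stand-ins, not the paper's cost model.

```python
# Minimal sketch of DSE with a genetic algorithm plus simulated-annealing
# acceptance. The toy latency model, DSP budget, and parameters Tm/Tn are
# illustrative assumptions, not the paper's framework.
import math, random

CHOICES = [1, 2, 4, 8, 16, 32, 64]

def latency(tm, tn):                       # toy cost: fewer cycles with more parallelism
    return 1e6 / (tm * tn) + 0.5 * (tm + tn)

def feasible(tm, tn, dsp_budget=512):
    return tm * tn * 5 <= dsp_budget       # toy DSP usage model

def gsa(generations=50, pop_size=16, temp=100.0, cooling=0.95):
    pop = [(random.choice(CHOICES), random.choice(CHOICES)) for _ in range(pop_size)]
    pop = [p for p in pop if feasible(*p)] or [(1, 1)]
    best = min(pop, key=lambda p: latency(*p))
    for _ in range(generations):
        parents = random.sample(pop, 2) if len(pop) >= 2 else pop * 2
        child = (random.choice([parents[0][0], parents[1][0]]),
                 random.choice([parents[0][1], parents[1][1]]))   # crossover
        if random.random() < 0.3:                                  # mutation
            child = (random.choice(CHOICES), child[1])
        if feasible(*child):
            delta = latency(*child) - latency(*best)
            if delta < 0 or random.random() < math.exp(-delta / temp):
                pop.append(child)                                  # SA-style acceptance
                best = min(best, child, key=lambda p: latency(*p))
        temp *= cooling
    return best

print(gsa())
```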
ISBN (print): 9798350344868; 9798350344851
Demosaicing and denoising of RAW images are crucial steps in the image signal processing pipeline of modern digital cameras. As only a third of the color information required to produce a digital image is captured by the camera sensor, the process of demosaicing is inherently ill-posed. The presence of noise further exacerbates this problem. Performing these two steps sequentially may distort the content of the captured RAW images and accumulate errors from one step to the next. Recent deep-neural-network-based approaches have shown the effectiveness of joint demosaicing and denoising in mitigating such challenges. However, these methods typically require a large number of training samples and do not generalize well to different noise types and intensities. In this paper, we propose a novel joint demosaicing and denoising method, dubbed JDD-DoubleDIP, which operates directly on a single RAW image without requiring any training data. We validate the effectiveness of our method on two popular datasets, Kodak and McMaster, with various noise types and intensities. The experimental results show that our method consistently outperforms the compared methods in terms of PSNR, SSIM, and qualitative visual perception.
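The general single-image, training-free idea follows the Deep Image Prior: a small network is fitted to one observation so that the Bayer-sampled version of its RGB output matches the noisy mosaic, with early stopping acting as the regularizer. The tiny network, RGGB mask, and loop below are simplified assumptions for illustration, not the JDD-DoubleDIP architecture itself.

```python
# Illustrative sketch (PyTorch) of a Deep-Image-Prior-style loop for joint
# demosaicing and denoising on a single RAW mosaic. Network and mask are
# simplified stand-ins, not the paper's double-DIP design.
import torch
import torch.nn as nn

def bayer_mask(h, w):                       # RGGB pattern, shape (3, h, w)
    m = torch.zeros(3, h, w)
    m[0, 0::2, 0::2] = 1                    # R
    m[1, 0::2, 1::2] = 1                    # G
    m[1, 1::2, 0::2] = 1                    # G
    m[2, 1::2, 1::2] = 1                    # B
    return m

h, w = 64, 64
mask = bayer_mask(h, w).unsqueeze(0)
raw = torch.rand(1, 3, h, w) * mask         # stand-in noisy RAW mosaic

net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 3, 3, padding=1))
z = torch.randn(1, 3, h, w)                 # fixed random input (the "prior")
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):                     # early stopping provides denoising
    opt.zero_grad()
    rgb = net(z)
    loss = ((rgb * mask - raw) ** 2).mean()
    loss.backward()
    opt.step()
```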
Effective crack detection is vital for pavement safety and durability. In recent years, deep learning methods have achieved promising results in automated crack detection. However, advanced large-scale convolutional neural networks (CNNs) often rely on numerous trainable parameters for deep feature extraction, making them computationally expensive; their complexity renders them impractical for deployment on small Internet of Things devices. In this study, we introduce a novel model specifically designed for pavement crack detection, named the Multi-Scale and Detail-Attention-based Crack Classification Model. It adopts a novel multi-scale dual-branch structure for effective feature extraction, focusing on improving the model's ability to perceive local and global information at different semantic scales, and uses a decoupled attention mechanism to attend more effectively to key information. In addition, we introduce a Stem Block to reduce the feature representation dimension, making the model more lightweight. We tested the proposed model on two standard datasets; the experimental results indicate that our model uses only 0.41 M parameters while maintaining a crack detection accuracy exceeding 99%. Compared to existing CNN models, our model outperforms current methods in terms of both complexity and detection accuracy. These results demonstrate that the proposed model offers superior performance for pavement crack detection, making it highly suitable for practical applications.
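The sketch below illustrates the general shape of such a lightweight design: a stem block that reduces spatial resolution early, followed by a dual-branch block whose small-kernel path captures local detail and whose large-kernel path captures wider context. Channel counts, kernel sizes, and the concatenation-based fusion are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch (PyTorch) of a stem block plus a multi-scale dual-branch
# block for binary crack classification. Sizes and fusion are illustrative.
import torch
import torch.nn as nn

class StemBlock(nn.Module):
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))                          # 4x spatial reduction overall

    def forward(self, x):
        return self.conv(x)

class DualBranchBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.local_branch = nn.Conv2d(ch, ch, 3, padding=1)    # fine local detail
        self.global_branch = nn.Conv2d(ch, ch, 7, padding=3)   # wider context
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.local_branch(x), self.global_branch(x)], dim=1))

model = nn.Sequential(StemBlock(), DualBranchBlock(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))
print(model(torch.rand(1, 3, 224, 224)).shape)        # crack / no-crack logits
```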
Object detection in unfavourable weather conditions presents significant challenges due to reduced visibility, increased noise, and frequent occlusions, limiting the effectiveness of conventional methods. This paper introduces a novel hybrid model combining Convolutional Neural Networks (CNNs) with Diffusion Neural Networks (Diffusion NNs) to address these issues. The proposed model synergistically integrates the feature extraction strengths of CNNs with the robust generative modeling capabilities of Diffusion NNs, enabling enhanced object detection under challenging environmental conditions. The hybrid architecture leverages CNNs to efficiently capture spatial and contextual features, while Diffusion NNs improve robustness by generating refined representations in noisy and incomplete scenarios. This approach is evaluated against state-of-the-art deep learning techniques, including YOLOv5, Faster R-CNN, and Vision Transformers. The proposed model achieves 91.8% accuracy, outperforming existing architectures, and also exhibits superior robustness (89.3%) and computational efficiency (70 FPS), making it a promising solution for real-time applications. These findings highlight the potential of generative enhancements for improving object detection reliability, particularly in adverse conditions. This paper contributes to the growing field of hybrid neural network architectures and their practical implementation for challenging computer vision tasks.
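One plausible reading of such a hybrid is sketched below: a CNN backbone extracts feature maps, a small denoising network refines them over a few diffusion-style perturb-and-denoise steps, and a detection head operates on the refined features. The backbone, number of refinement steps, and noise schedule are assumptions made for this sketch, not the paper's published architecture.

```python
# Illustrative sketch (PyTorch) of a CNN backbone with diffusion-style
# feature refinement before a detection head. All components are stand-ins.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
refiner = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(128, 128, 3, padding=1))
head = nn.Conv2d(128, 5, 1)              # e.g. 4 box coords + 1 objectness per cell

def forward(image, steps=4, noise_scale=0.1):
    feats = backbone(image)
    for _ in range(steps):               # diffusion-style refinement: perturb, then denoise
        noisy = feats + noise_scale * torch.randn_like(feats)
        feats = feats + refiner(noisy)   # residual denoising update
        noise_scale *= 0.5               # simple decreasing noise schedule
    return head(feats)

out = forward(torch.rand(1, 3, 256, 256))
print(out.shape)
```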