ISBN (print): 9789464593617; 9798331519773
Deep-learning (DL)-based image deconvolution (ID) has exhibited remarkable recovery performance, surpassing traditional linear methods. However, traditional ID approaches rely on analytical properties of the point spread function (PSF), such as specific spectrum properties or a small condition number of the convolution matrix, to achieve high recovery performance, whereas DL techniques lack quantifiable metrics for evaluating PSF suitability for DL-assisted recovery. Aiming to enhance deconvolution quality, we propose a metric that employs a non-linear approach to learn the invertibility of an arbitrary PSF using a neural network by mapping it to a unit impulse. A lower discrepancy between the mapped PSF and a unit impulse indicates a higher likelihood of successful inversion by a DL network. Our findings reveal that this metric correlates with high recovery performance in DL and traditional methods, thereby serving as an effective regularizer in deconvolution tasks. This approach has lower computational complexity than conventional condition number assessments and is differentiable. These properties allow its application in designing diffractive optical elements through end-to-end (E2E) optimization, achieving invertible PSFs and outperforming the E2E baseline framework.
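For context, the conventional condition number assessment that the abstract contrasts with can be sketched as follows. This is not the proposed learned metric. It is a minimal illustration that assumes circular (periodic) boundary conditions, under which the convolution matrix is block-circulant and its singular values are the magnitudes of the DFT of the zero-padded PSF. The function name `circular_condition_number` and the example kernels are hypothetical.

```python
import numpy as np

def circular_condition_number(psf, shape):
    """Condition number of the circular-convolution operator defined by `psf`.

    Assumes periodic boundaries, so the singular values of the convolution
    matrix equal the magnitudes of the 2D DFT of the zero-padded kernel.
    """
    otf = np.fft.fft2(psf, s=shape)   # zero-pads the PSF to `shape`
    magnitude = np.abs(otf)
    smallest = magnitude.min()
    # An exact spectral zero means the operator is singular.
    return np.inf if smallest == 0 else magnitude.max() / smallest

# A unit impulse is perfectly invertible; a box blur is much harder to invert.
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
box = np.full((3, 3), 1.0 / 9.0)

cond_delta = circular_condition_number(delta, (8, 8))
cond_box = circular_condition_number(box, (8, 8))
print(cond_delta, cond_box)
```

A PSF whose spectrum is flat (like the impulse) yields a condition number close to 1, while attenuated frequencies inflate it.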
High-resolution (HR) medical images can provide rich details, which are important for discovering subtle lesions to make diagnoses. Convolutional neural networks (CNNs) are widely used in this field, but struggle to model long-range dependencies. Although transformer-based methods have improved in this respect, they require large quantities of data. Unfortunately, large quantities of low-resolution (LR) and HR medical image pairs may not always be available. In addition, most medical image super-resolution (SR) methods are deterministic, while the degradation in real scenarios is stochastic. To address these problems, we introduce a probabilistic degradation model that combines natural and medical images for training. This design alleviates the problem of insufficient medical image pairs and learns the degradation process of the natural scene. In addition, we propose a new medical image SR model that consists of CNNs and the Swin Transformer structure to extract both local and global semantic features. Moreover, to reduce the computational burden, the spherical locality-sensitive hashing (SLSH) module is employed in the nonlocal attention (NLA) mechanism to form the ENLA module. This design enables the proposed Sparse Swin Transformer (SSFormer) model to generate HR medical images without extensive training images. Experiments on diverse datasets (natural images and medical images) demonstrate that the proposed method is robust and effective, qualitatively and quantitatively outperforming other medical image SR methods. Code is available at https://***/codehxj/SSFormer. © 2023 Elsevier Ltd. All rights reserved.
The effectiveness of positioning techniques that utilize the receiver signal strength (RSS) is highly dependent on the instability of the received signal strength indicator (RSSI). Up to now, there is no strategy that...
ISBN (print): 9798350344868; 9798350344851
Joint low-light enhancement and deblurring is a challenging imaging inverse problem that estimates clean images from photographs corrupted by both low-light and blurring artifacts. To address this task, we propose FELI, a Fast and physically Enriched deep neural network for joint Low-light enhancement and Image deblurring. In a departure from recently proposed end-to-end networks, FELI employs a learnable Decomposer during training, based on Retinex theory, that helps with low-light scene recovery. FELI's encoded features are further enriched by an input reconstruction task that is cognizant of the blur model, leading to effective deblurring. We introduce a new customized contrastive regularization (CCR) term that pulls the restored clean image closer to the ground truth while pushing it far away from both the input and the reconstructed input. Experiments performed on challenging synthetic and real-world datasets demonstrate that FELI outperforms state-of-the-art methods at a lower computational cost.
ISBN (print): 9798350344868; 9798350344851
High Dynamic Range (HDR) images can be reconstructed from multiple Low Dynamic Range (LDR) images using existing deep neural network (DNN) techniques. Despite notable advancements, DNN-based methods still exhibit ghosting artifacts when handling LDR images with saturation and significant motion. Recently, diffusion models (DMs) have been introduced in HDR imaging, showcasing promising performance, especially in achieving visually perceptible results. However, DMs typically require numerous inference iterations to recover the clean image from Gaussian noise, demanding substantial computational resources. Additionally, a DM learns only a probability distribution of the added noise in each step and neglects image space constraints on HDR images, which limits its performance on distortion-based metrics. To tackle these challenges, we propose an efficient network that integrates DM modules into existing regression-based models, providing reliable content reconstruction for HDR while avoiding limitations in distortion-based metrics.
Learning spatial information, modeling temporal dynamics, and capturing channel relationships are all important for action recognition in videos. In this work, an attention-based multi-feature aggregation (AMA) module that encodes the above features in a unified module is proposed. It contains a spatial-temporal aggregation (STA) structure and a channel excitation (CE) structure. STA mainly employs two convolutions to model spatial and temporal features, respectively. The matrix multiplication in STA can capture long-range dependencies. The CE learns the importance of each channel, so as to bias the allocation of available resources toward the informative features. The AMA module is simple yet efficient, and it can be inserted into a standard ResNet architecture without any modification. In this way, the representation of the network can be enhanced. We equip ResNet-50 with the AMA module to build an effective AMA Net with limited extra computation cost, only 1.002 times that of ResNet-50. Extensive experiments indicate that AMA Net outperforms the state-of-the-art methods on UCF101 and HMDB51, with accuracy 6.2% and 10.0% higher than the baseline, respectively. In short, AMA Net achieves the high accuracy of 3D convolutional neural networks while maintaining the complexity of 2D convolutional neural networks.
Computer vision plays a crucial role in current technological development, understanding a scene from the properties of 2D images. This line of research is valuable in sports applications, where making technical decisions from observation alone can be challenging. This work aims to develop a computer-vision-based system for analyzing tennis games. The implemented method captures videos during the game through cameras installed on the court. Machine learning methods and morphological operations are applied to the images to locate the ball, the court lines, and the players. In addition, the algorithm determines the moment the ball bounces during the game and analyzes whether the bounce occurred inside or outside the court. These data are available to players and judges through an Android application, so that all processed data can be accessed quickly from mobile devices. From the results obtained, the system demonstrated robustness and reliability.
The detection and localization of anomalous objects in video sequences remain a challenging task in video analysis. Recent years have witnessed a surge in deep learning approaches, especially with recurrent neural networks (RNNs). However, RNNs have limitations that vision transformers (ViTs) can address. We propose a novel solution that leverages ViTs, which have recently achieved remarkable success in various computer vision tasks. Our approach involves a two-step process. First, we utilize a pre-trained ViT model to generate an intermediate representation containing an attention map, highlighting areas critical for anomaly detection. In the second step, this attention map is concatenated with the original video frame, creating a richer representation that guides the U-Net model towards anomaly-prone regions. This enriched data is then fed into a U-Net model for precise localization of the anomalous objects. The model achieved a mean Intersection over Union (IoU) of 0.70, indicating a strong overlap between the predicted bounding boxes and the ground truth annotations. In the field of anomaly detection, a higher IoU score signifies better performance. Moreover, the pixel accuracy of 0.99 demonstrates a high level of precision in classifying individual pixels. Concerning localization accuracy, we conducted a comparison of our method with other approaches. The results obtained show that our method outperforms most of the previous methods and achieves a very competitive performance in terms of localization accuracy.
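The two reported metrics can be illustrated with a short sketch. It assumes the predictions and annotations are available as binary masks of equal shape, which the abstract does not state explicitly. The function names `mask_iou` and `pixel_accuracy` and the toy masks are hypothetical.

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection over Union between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the annotation."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    return float((pred == target).mean())

# Toy 4x4 frame: the prediction covers 4 of the 6 annotated pixels.
target = np.zeros((4, 4), dtype=bool)
target[1:3, 1:4] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True

print(mask_iou(pred, target))        # 4 / 6
print(pixel_accuracy(pred, target))  # 14 / 16
```

In this toy case the pixel accuracy (0.875) is higher than the IoU (about 0.667) because correctly labeled background pixels count toward accuracy but not toward IoU, which is why the two metrics are usually read together when anomalies occupy a small part of the frame.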
ISBN (print): 9798350344868; 9798350344851
Training deep neural networks has become a common approach for addressing image restoration problems. An alternative to training a "task-specific" network for each observation model is to use pretrained deep denoisers for imposing only the signal's prior within iterative algorithms, without additional training. Recently, this approach has become increasingly popular with the rise of diffusion/score-based generative models, whose core is iterative denoising. Using denoisers for general-purpose restoration requires guiding the iterations to ensure agreement of the signal with the observations. In low-noise settings, guidance that is based on back-projection (BP) has been shown to be a promising strategy (used recently in the context of diffusion models also under the names "pseudoinverse" or "range/null-space" guidance). However, the presence of noise in the observations hinders the gains from this approach. In this paper, we propose a novel guidance technique, based on preconditioning, that allows traversing from BP-based guidance to least-squares-based guidance along the restoration scheme. The proposed approach is robust to noise while still having a much simpler implementation than alternative methods (e.g., no SVD is required). We demonstrate its advantages for image deblurring and super-resolution.
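The difference between BP-based and least-squares-based guidance can be sketched for a small linear observation model y = A x. The abstract does not specify the preconditioner, so the third function below is an assumption chosen for illustration: a Tikhonov-regularized inverse whose parameter `eta` moves the direction from the BP direction (small `eta`) toward a scaled least-squares gradient (large `eta`). All names and the random test problem are hypothetical.

```python
import numpy as np

def ls_direction(A, x, y):
    """Gradient of 0.5 * ||A x - y||^2: the least-squares guidance direction."""
    return A.T @ (A @ x - y)

def bp_direction(A, x, y):
    """Back-projection direction: pseudoinverse applied to the residual."""
    return np.linalg.pinv(A) @ (A @ x - y)

def regularized_direction(A, x, y, eta):
    """Illustrative interpolation (an assumption, not the paper's scheme).

    Computes A^T (A A^T + eta I)^{-1} (A x - y). For eta -> 0 and full row
    rank A this approaches the BP direction; for large eta it approaches
    (1 / eta) times the least-squares gradient.
    """
    residual = A @ x - y
    m = A.shape[0]
    return A.T @ np.linalg.solve(A @ A.T + eta * np.eye(m), residual)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))   # wide matrix: fewer observations than unknowns
x = rng.standard_normal(6)        # current estimate
y = rng.standard_normal(3)        # observations

# One full BP step makes the estimate consistent with the observations.
x_bp = x - bp_direction(A, x, y)
print(np.allclose(A @ x_bp, y))
```

Note that this sketch uses a linear solve rather than an SVD-based update for the interpolated direction; `np.linalg.pinv` appears only to define the BP reference.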
Algorithms for multi-signal detection using image processing are investigated. Approaches based on digital image processing, as well as on the use of neural networks and deep learning are considered. A comparative ana...