ISBN: 9781728185514 (print)
Pixel recovery with deep learning has been shown to be very effective for a variety of low-level vision tasks such as image super-resolution, denoising, and deblurring. Most existing works operate in the spatial domain, and few exploit the transform domain for image restoration. In this paper, we present a transform-domain approach to image deblocking using a deep neural network called DCTResNet. Our application is compressed video motion deblurring, where the input video frame has blocking artifacts that make the deblurring task very challenging. Specifically, we use a block-wise Discrete Cosine Transform (DCT) to decompose the image into its low- and high-frequency sub-band images and exploit the strong sub-band-specific features for more effective deblocking. Since JPEG also uses the DCT for image compression, using DCT sub-band images for deblocking helps the network learn the JPEG compression prior and correct the blocking artifacts effectively. Our experimental results show that DCTResNet achieves more favorable PSNR and SSIM than other state-of-the-art (SOTA) methods while being significantly faster at inference.
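The block-wise DCT decomposition into sub-band images mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 8x8 block size (chosen to match JPEG), the orthonormal DCT-II scaling, and the names `dct_matrix`, `dct2_block`, and `dct_subbands` are all assumptions made for illustration. Each sub-band image collects one coefficient position (u, v) from every block, so sub-band (0, 0) holds the low-frequency (DC) content and higher indices hold higher frequencies.

```python
import math

N = 8  # block size; assumed here to match JPEG's 8x8 blocks


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: entry [u][x] is basis function u at sample x."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def dct2_block(block, basis):
    """2-D DCT of one n x n block, computed as basis * block * basis^T."""
    n = len(basis)
    tmp = [[sum(basis[u][x] * block[x][y] for x in range(n)) for y in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][y] * basis[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]


def dct_subbands(image, n=N):
    """Return subbands[u][v][i][j]: coefficient (u, v) of the block at
    block-row i, block-column j. Image sides must be multiples of n."""
    h, w = len(image), len(image[0])
    assert h % n == 0 and w % n == 0
    basis = dct_matrix(n)
    bh, bw = h // n, w // n
    subbands = [[[[0.0] * bw for _ in range(bh)] for _ in range(n)]
                for _ in range(n)]
    for i in range(bh):
        for j in range(bw):
            block = [row[j * n:(j + 1) * n] for row in image[i * n:(i + 1) * n]]
            coeffs = dct2_block(block, basis)
            for u in range(n):
                for v in range(n):
                    subbands[u][v][i][j] = coeffs[u][v]
    return subbands
```

For a 16x16 image this yields 64 sub-band images of size 2x2; a network could then process these sub-bands as input channels, although how DCTResNet consumes them is not specified in the abstract.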
The recently approved digital still image standard known as JPEG2000 promises to be an excellent image and video format for use with a large range of applications. For adoption of the standard to take place in the consumer marketplace, implementations supporting real-time encoding and decoding of popular image and video formats must be created. It is well known that the major bottleneck of a JPEG2000 system is the bit/context modeling and arithmetic coding tasks (also known as tier-1 coding). This paper discusses a hardware implementation of a tier-1 coder that exploits the available parallelism. The proposed technique is approximately 50% faster than the best technique described in the literature (1).
ISBN: 9780819469946 (print)
For effective noise removal prior to video processing, the noise power or noise variance of an input video sequence needs to be determined accurately, which is in practice a very difficult task. This paper presents an accurate noise variance estimation algorithm based on motion compensation between two adjacent noisy pictures. First, motion estimation is performed for each block in a picture, and the residue variance of the best motion-compensated block is calculated. Then, a noise variance estimate of the picture is obtained by adaptively averaging and properly scaling the variances close to the best variance. The simulation results show that the proposed noise estimation algorithm is very accurate and stable irrespective of noise level.
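The two steps described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: the full-search matching on residue variance, the block size, the search range, the fixed `tolerance` factor standing in for the "adaptive" averaging, and the 0.5 scale are all assumptions. The 0.5 factor reflects that the difference of two pictures carrying independent noise of variance sigma^2 has variance 2*sigma^2; the paper's actual scaling is not given in the abstract.

```python
def residue_variance(cur, ref, top, left, dy, dx, size):
    """Variance of (current block - displaced reference block)."""
    diffs = [cur[top + y][left + x] - ref[top + dy + y][left + dx + x]
             for y in range(size) for x in range(size)]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)


def estimate_noise_variance(cur, ref, size=8, search=2, tolerance=1.5):
    """Estimate noise variance from two adjacent noisy pictures.

    Step 1: for each block, keep the smallest residue variance over all
            displacements within +/- search pixels (hypothetical criterion).
    Step 2: average the per-block variances that lie within `tolerance`
            times the overall best one, then halve the result.
    """
    h, w = len(cur), len(cur[0])
    best_per_block = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            candidates = []
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    inside = (0 <= top + dy and top + dy + size <= h and
                              0 <= left + dx and left + dx + size <= w)
                    if inside:
                        candidates.append(
                            residue_variance(cur, ref, top, left, dy, dx, size))
            best_per_block.append(min(candidates))
    best = min(best_per_block)
    close = [v for v in best_per_block if v <= tolerance * best + 1e-12]
    return 0.5 * sum(close) / len(close)
```

Restricting the average to variances near the best one is meant to exclude blocks where motion compensation failed, since their residue contains image structure rather than noise alone.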
ISBN: 9781665475921 (print)
Spatial frequency analysis and transforms serve a central role in most engineered image and video lossy codecs, but are rarely employed in neural network (NN)-based approaches. We propose a novel NN-based image coding framework that utilizes forward wavelet transforms to decompose the input signal by spatial frequency. Our encoder generates separate bitstreams for each latent representation of low and high frequencies. This enables our decoder to selectively decode bitstreams in a quality-scalable manner. Hence, the decoder can produce an enhanced image by using an enhancement bitstream in addition to the base bitstream. Furthermore, our method is able to enhance only a specific region of interest (ROI) by using a corresponding part of the enhancement latent representation. Our experiments demonstrate that the proposed method shows competitive rate-distortion performance compared to several non-scalable image codecs. We also showcase the effectiveness of our two-level quality scalability, as well as its practicality in ROI quality enhancement.
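To make the idea of decomposing an input by spatial frequency concrete, here is a minimal one-level 2-D wavelet split. The choice of the Haar wavelet and the names `haar_decompose` and `haar_reconstruct` are assumptions for illustration; the abstract does not state which wavelet the framework uses, and the learned encoder, latents, and bitstreams are not modeled here.

```python
def haar_decompose(image):
    """One-level 2-D Haar transform of an image with even side lengths.

    Returns (low, detail_h, detail_v, detail_d), each half the size of the
    input: `low` is the low-frequency band, the others are detail bands.
    """
    h, w = len(image), len(image[0])
    assert h % 2 == 0 and w % 2 == 0
    low, detail_h, detail_v, detail_d = (
        [[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(h // 2):
        for j in range(w // 2):
            a, b = image[2 * i][2 * j], image[2 * i][2 * j + 1]
            c, d = image[2 * i + 1][2 * j], image[2 * i + 1][2 * j + 1]
            low[i][j] = (a + b + c + d) / 2.0
            detail_h[i][j] = (a - b + c - d) / 2.0  # change across columns
            detail_v[i][j] = (a + b - c - d) / 2.0  # change across rows
            detail_d[i][j] = (a - b - c + d) / 2.0  # diagonal change
    return low, detail_h, detail_v, detail_d


def haar_reconstruct(low, detail_h, detail_v, detail_d):
    """Inverse of haar_decompose."""
    h2, w2 = len(low), len(low[0])
    image = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            l, dh = low[i][j], detail_h[i][j]
            dv, dd = detail_v[i][j], detail_d[i][j]
            image[2 * i][2 * j] = (l + dh + dv + dd) / 2.0
            image[2 * i][2 * j + 1] = (l - dh + dv - dd) / 2.0
            image[2 * i + 1][2 * j] = (l + dh - dv - dd) / 2.0
            image[2 * i + 1][2 * j + 1] = (l - dh - dv + dd) / 2.0
    return image
```

Reconstructing from `low` alone (with the detail bands set to zero) gives a coarse image, loosely analogous to decoding only a base bitstream; supplying the detail bands restores the full image, loosely analogous to adding an enhancement bitstream.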
ISBN: 9781728185514 (print)
The exponential increase of digital data and the limited capacity of current storage devices have made clear the need to explore new storage solutions. Thanks to its biological properties, DNA has proven to be a potential candidate for this task, allowing the storage of information at a high density for hundreds or even thousands of years. With the release of nanopore sequencing technologies, DNA data storage is one step closer to becoming a reality. Many works have proposed solutions for the simulation of this sequencing step, aiming to ease the development of algorithms addressing nanopore-sequenced reads. However, these simulators target the sequencing of complete genomes, whose characteristics differ from those of synthetic DNA. This work presents a nanopore sequencing simulator targeting synthetic DNA in the context of DNA data storage.
ISBN: 9781728185514 (print)
Compressed image quality assessment (IQA) is a crucial part of a wide range of image services such as storage and transmission. Because of differences in bit rate and compression method, compressed images usually have different levels of quality. Mainstream full-reference (FR) metrics are effective at predicting the quality of compressed images at coarse-grained levels; however, they may perform poorly when the quality differences between compressed images are subtle. To improve the Quality of Experience (QoE) and provide useful guidance for compression algorithms, we propose an FR-IQA metric for fine-grained compressed images, which estimates image quality by analyzing differences in structure and texture. Our metric is mainly validated on the fine-grained compression IQA (FGIQA) database and is also tested on other commonly used compression IQA databases. The experimental results show that our metric outperforms mainstream FR-IQA metrics on the fine-grained compression IQA database and obtains competitive performance on the coarse-grained compression IQA databases.
ISBN: 9798331529543; 9798331529550 (print)
Recent advancements in learned image compression methods have demonstrated superior rate-distortion performance and remarkable potential compared to traditional compression techniques. However, the core operation of quantization, inherent to lossy image compression, introduces errors that can degrade the quality of the reconstructed image. To address this challenge, we propose a novel Quantization Error Compensator (QEC), which leverages spatial context within latent representations and hyperprior information to effectively mitigate the impact of quantization error. Moreover, we propose a tailored quantization error optimization training strategy to further improve rate-distortion performance. Notably, QEC serves as a lightweight, plug-and-play module, offering high flexibility and seamless integration into various learned image compression methods. Extensive experimental results consistently demonstrate significant coding efficiency improvements achievable by incorporating the proposed QEC into state-of-the-art methods, with a slight increase in runtime.
ISBN: 9781728185514 (print)
Learning-based image deraining methods have achieved remarkable success in the past few decades. Currently, most deraining architectures are developed by human experts, which is a laborious and error-prone process. In this paper, we present a study on employing neural architecture search (NAS) to automatically design deraining architectures, dubbed AutoDerain. Specifically, we first propose a U-shaped deraining architecture, which mainly consists of residual squeeze-and-excitation blocks (RSEBs). Then, we define a search space in which we search for the convolution types and the use of the squeeze-and-excitation block. Considering that differentiable architecture search is memory-intensive, we propose a memory-efficient differentiable architecture search scheme (MDARTS). In light of the success of training binary neural networks, MDARTS optimizes architecture parameters through the proximal gradient, which consumes only the same GPU memory as training a single deraining model. Experimental results demonstrate that the architecture designed by MDARTS is superior to manually designed derainers.
ISBN: 9781728185514 (print)
In stereo image super-resolution (SR), it is equally important to utilize intra-view and cross-view information. However, most existing methods focus only on exploring cross-view information and neglect the full mining of intra-view information, which limits their reconstruction performance. Since single image SR (SISR) methods are powerful in exploiting intra-view information, we propose to introduce a knowledge distillation strategy to transfer the knowledge of a SISR network (teacher network) to a stereo image SR network (student network). With the help of the teacher network, the student network can easily learn more intra-view information. Specifically, we propose pixel-wise distillation as the implementation method, which not only improves the intra-view information extraction ability of the student network but also ensures the effective learning of cross-view information. Moreover, we propose a lightweight student network named Adaptive Residual Feature Aggregation network (ARFAnet). Its main unit, the ARFA module, can aggregate informative residual features and produce more representative features for image reconstruction. Experimental results demonstrate that our teacher-student network achieves state-of-the-art performance on all benchmark datasets.
ISBN: 9781728180687 (print)
In this paper, we propose an optimized model based on the visual attention mechanism (VAM) for no-reference stereoscopic image quality assessment (SIQA). A CNN model is designed based on a dual attention mechanism (DAM), which includes a channel attention mechanism and a spatial attention mechanism. The channel attention mechanism gives high weight to features that contribute strongly to the final quality score and low weight to features that contribute little. The spatial attention mechanism considers the inner regions of a feature and assigns different weights to different areas according to their importance within the feature. In addition, a data selection strategy is designed for the CNN model. According to the VAM, visual saliency is applied to guide data selection, and a certain proportion of salient patches are employed to fine-tune the network. The same operation is performed on the test set, which removes data redundancy and improves algorithm performance. Experimental results on two public databases show that the proposed model is superior to state-of-the-art SIQA methods. Cross-database validation shows the high generalization ability and effectiveness of our model.