ISBN: (print) 9798350367331; 9798350367348
Lossy compression is widely used for video compression, but it often introduces compression artifacts that degrade the visual quality of compressed videos. Consequently, numerous deep learning-based methods have been developed to post-process compressed videos. However, previous post-processing models often encounter difficulties when there is a domain gap between the training and test datasets. Test-time optimization (TTO), a technique that fine-tunes the model during the test stage, has been considered an effective solution to the domain gap problem. In this paper, we introduce a novel TTO method specialized for compression artifacts reduction. Specifically, we propose using image pairs available on the decoder side, i.e., the images before and after the adaptive loop filtering of the Versatile Video Coding (VVC) standard, as the input and target of TTO, so that the post-processing model can be adapted to the characteristics of the test data. Experimental results on several baseline models and test datasets demonstrate the effectiveness of the proposed method in post-processing compressed videos.
The newest video coding standard, Versatile Video Coding (VVC), adopts a quad-tree (QT) plus multi-type tree (QTMT) block partition structure and improves the compression performance by about 30% to 50%, compared with t...
ISBN: (print) 9781665475921
Compared to RGB images, which are non-linearly mapped from RAW data by the Image Signal Processor (ISP), RAW data are linear to scene radiance and contain more native information, making them better suited to modeling in many vision tasks. This work proposes to enhance low-light images in the RAW domain via a cross-scale framework using paired Fast Fourier Convolution (FFC) and Transformer, driving the network to characterize images effectively. The framework has three scales that abstract low-level, mid-level, and high-level representations of input images. We embed paired FFC and Transformer blocks at each scale to extract and aggregate spatial-spectral information. Specifically, by transforming features from the spatial domain into the spectral domain with FFC, pixel correlations can be exploited both locally and globally, generating representative features for the input image. The Transformer, using a multi-head self-attention mechanism, is then applied to aggregate and embed important features. Experimental results demonstrate that our method significantly outperforms state-of-the-art low-light enhancement methods on both full-reference metrics, including PSNR, MPSNR, and SSIM, and no-reference metrics, such as NIMA. The results of the proposed method are also more visually pleasing than those of other methods.
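To make the spatial-to-spectral step concrete, the following is a rough 1D sketch of the idea behind a spectral branch: transform, apply a per-frequency weight, transform back. It is an illustration under strong assumptions, not the FFC block used in the paper: a plain DFT on a short list replaces the 2D FFT on feature maps, and the function name `spectral_branch` and the weights are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a list of numbers."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_branch(signal, freq_weights):
    """Scale each frequency bin, then return to the spatial domain.

    A single weight in the spectral domain touches every spatial position,
    which is why a spectral operation has a global receptive field.
    """
    spectrum = dft(signal)
    mixed = [w * s for w, s in zip(freq_weights, spectrum)]
    return [value.real for value in idft(mixed)]


signal = [1.0, 2.0, 3.0, 6.0]
unchanged = spectral_branch(signal, [1, 1, 1, 1])  # identity weights
dc_only = spectral_branch(signal, [1, 0, 0, 0])    # keep only the mean
```

Keeping only the zero-frequency bin replaces every sample with the global mean, a small demonstration that one spectral weight affects all positions at once.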
ISBN: (print) 9781665475921
Computer vision tasks suffer from the high cost of collecting large amounts of labeled data. Few-shot learning (FSL) is a dominant approach to this problem because it learns novel categories from only a few training samples. In FSL tasks, meta-learning and metric learning have achieved impressive results. However, performance is still limited by the large intra-class variance and small inter-class distance caused by the limited number of samples. To solve this problem, we propose a new method that integrates meta-learning and metric learning techniques. Specifically, we first propose a feature representation (FR) module to construct representative support class prototypes and query features. Then, we design a bias loss to minimize the bias between support and query samples. Furthermore, we design an intra-class loss to minimize the distance between the query class prototype and each query sample. We denote this model as ML-FDA and validate it on standard few-shot classification benchmark datasets (miniImageNet, CIFAR-FS, FC100). The results show that our method improves on other methods of the same paradigm and achieves the best performance on most benchmarks. The ablation study and visualization analysis also demonstrate the effectiveness of our method.
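The prototype and intra-class ideas mentioned above can be sketched in a few lines. This is a generic prototype-based formulation, assumed for illustration only; the abstract does not give the exact definitions used in ML-FDA, and the embeddings, function names, and the choice of squared Euclidean distance are all hypothetical.

```python
def prototype(embeddings):
    """Class prototype: the element-wise mean of a class's embeddings."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest."""
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))


def intra_class_loss(queries_by_class):
    """Mean squared distance between each query sample and the prototype
    of its own query class (an assumed form of an intra-class loss)."""
    total, count = 0.0, 0
    for samples in queries_by_class.values():
        proto = prototype(samples)
        for sample in samples:
            total += sq_dist(sample, proto)
            count += 1
    return total / count


# Made-up 2-way, 2-shot support set with 2D embeddings.
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = {label: prototype(embs) for label, embs in support.items()}
predicted = classify([0.7, 0.3], prototypes)
loss = intra_class_loss({"cat": [[1.0, 0.0], [0.0, 0.0]]})
```

Minimizing a loss of this shape pulls query samples of a class toward a common point, which targets the large intra-class variance the abstract names as a limitation.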
ISBN: (print) 9781665475921
Proliferative Diabetic Retinopathy (PDR) is a serious retinal disease threatening diabetic patients. Intense retinal neovascularization in the retinal image is the most important clinical symptom of PDR, leading to visual distortion if not controlled. Accurate and timely detection of neovascularization from retinal images allows patients to receive adequate treatment and avoid further vision loss. In this work, we propose an automatic retinal neovascularization segmentation model based on an improved Pyramid Scene Parsing Network (PSP-Net). To improve the accuracy of the model, we introduce a proposed channel attention module into it. The network is evaluated on color fundus images collected in practice. Evaluation results show that the network is superior to FCN, SegNet, U-Net and PSP-Net in accuracy and sensitivity. The model achieves accuracy, sensitivity, specificity, precision and Jaccard similarity scores of 0.9832, 0.9265, 0.9897, 0.9116 and 0.8501, respectively. Extensive experimental results indicate that the network model improves segmentation accuracy and can relieve the workload of doctors, making it worthy of further clinical promotion.
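For readers unfamiliar with the five reported scores, the sketch below computes them from a binary prediction mask and a ground-truth mask using their standard confusion-matrix definitions. The masks are made-up toy data and the function name is hypothetical. The code assumes both classes occur in the masks, so no denominator is zero.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for binary masks given as flat lists of 0/1."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # vessel found
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # background kept
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false alarm
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # missed vessel
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "jaccard": tp / (tp + fp + fn),
    }


# Toy 8-pixel masks (1 = neovascularization, 0 = background).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
scores = segmentation_metrics(pred, truth)
```

The Jaccard score ignores true negatives, so it is usually the lowest of the five when the target structure covers a small part of the image, consistent with the ordering of the values reported above.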
ISBN: (print) 9781665475921
In this paper, we propose a high-frequency guided CNN for video compression artifacts reduction. In the proposed method, the high-frequency component of the Y channel is extracted and used to guide the quality enhancement of all Y, U, and V channels, because the high-frequency component contains the edge and contour information of the objects in the image, which is of vital importance to both subjective and objective quality. The proposed method consists of two modules: the high-frequency guidance module and the quality enhancement module. The high-frequency guidance module uses multiple octave convolutions to extract the high-frequency component of the Y channel and then fuses it into the features of the Y, U, and V channels. In the quality enhancement module, multiple CNN residual blocks are used to enhance the quality of the Y, U, and V channels. The proposed method was integrated into both HM-16.22 and VTM-16.0. The results on the JVET test sequences under the All Intra configuration show the effectiveness of the proposed method. Compared with HEVC, the proposed method achieves average BD-rate reductions of -12.3%, -22.7% and -23.5% for the Y, U and V channels, respectively. Compared with VVC, the average BD-rate reductions are -6.7%, -12.3% and -13.2%, correspondingly.
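The claim that the high-frequency component carries edge information can be seen in a simple decomposition. The sketch below is not the octave-convolution module from the paper: it swaps in a fixed box blur as the low-pass filter and takes the residual as the high-frequency part, on a made-up row of luma samples. All names are hypothetical.

```python
def box_blur(signal, radius=1):
    """Low-pass filter: average over a sliding window, clipped at the borders."""
    n = len(signal)
    blurred = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def split_frequencies(signal, radius=1):
    """Return (low, high) where high is the residual signal - low."""
    low = box_blur(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


# One row of Y-channel samples with a single edge between 10 and 50.
luma_row = [10.0, 10.0, 10.0, 50.0, 50.0, 50.0]
low, high = split_frequencies(luma_row)
```

In this toy row the residual is zero in the flat regions and non-zero only at the two samples next to the edge, which is the kind of signal a guidance branch could pass on to the other channels.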
This paper explores the potential of a learned two-layer B-frame codec, known as TLZMC. TLZMC is one of the few early attempts that deviate from the hybrid-based coding architecture by skipping motion coding. With TLZ...
ISBN: (print) 9781665475921
In this paper, we propose a convolutional neural network (CNN)-based post-processing filter for video compression with multi-scale feature representation. The discrete wavelet transform (DWT) decomposes an image into multi-frequency and multi-directional sub-bands, and can capture artifacts caused by video compression through multi-scale feature representation. Thus, we combine DWT with CNN and construct two sub-networks: a step-like sub-band network (SLSB) and a mixed enhancement network (ME). SLSB takes the wavelet sub-bands as input and feeds them into the Res2Net group (R2NG) from high frequency to low frequency. R2NG consists of Res2Net modules and adopts spatial and channel attention to adaptively enhance features. We combine the high-frequency sub-band output with the low-frequency sub-band in R2NG to capture multi-scale features. ME uses mixed convolution, composed of dilated convolution and standard convolution, as the basic block to expand the receptive field without the blind spots of dilated convolution and to further improve the reconstruction quality. Experimental results demonstrate that the proposed CNN filter achieves average BD-rate reductions of 2.13%, 2.63%, 2.99%, 4.8%, 3.72% and 4.5% over the VTM 11.0-NNVC anchor for the Y channel on classes A1, A2, B, C, D and E of the common test conditions (CTC) in AI, RA and LDP configurations, respectively.
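The sub-band decomposition that feeds SLSB can be illustrated with one level of a 2D wavelet transform. The abstract does not say which wavelet is used, so the sketch below assumes the Haar wavelet as the simplest case. The function name and the sub-band keys are hypothetical, and sub-band naming conventions differ between libraries.

```python
def haar_dwt2(image):
    """One level of an orthonormal 2D Haar transform.

    image is a list of rows with even height and width. Each 2x2 block
    produces one coefficient in each of the four sub-bands.
    """
    bands = {"approx": [], "horiz_diff": [], "vert_diff": [], "diag_diff": []}
    for r in range(0, len(image), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(image[0]), 2):
            tl, tr = image[r][c], image[r][c + 1]          # top-left, top-right
            bl, br = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            rows["approx"].append((tl + tr + bl + br) / 2)      # low frequency
            rows["horiz_diff"].append((tl - tr + bl - br) / 2)  # left minus right
            rows["vert_diff"].append((tl + tr - bl - br) / 2)   # top minus bottom
            rows["diag_diff"].append((tl - tr - bl + br) / 2)   # diagonal detail
        for name in bands:
            bands[name].append(rows[name])
    return bands


# Toy 4x4 image with alternating columns, i.e. only horizontal variation.
image = [[1.0, 5.0, 1.0, 5.0] for _ in range(4)]
bands = haar_dwt2(image)

# The orthonormal scaling preserves total energy across the sub-bands.
energy_in = sum(v * v for row in image for v in row)
energy_out = sum(v * v for band in bands.values() for row in band for v in row)
```

For this input, only the approximation band and the left-minus-right band are non-zero, showing how the transform separates content by frequency and direction before any network processes it.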
Near-infrared (NIR) imaging can acquire more details and textures with less noise in low-light environments compared to RGB. As a result, it has been widely used in low-light vision scenarios such as CCTV, autonomous ...
ISBN: (print) 9781665475921
Magnetic Resonance Imaging (MRI) is widely used for medical diagnosis, staging and follow-up of disease. However, MRI images may contain artifacts from causes such as patient movement or machine distortion, which may be unintentionally introduced during medical image acquisition, processing, etc. These artifacts may reduce the effectiveness of diagnosis or even cause false diagnosis. To solve this problem, we propose a general medical image quality assessment (MIQA) methodology, including subjective MIQA procedures and objective MIQA algorithms. In this paper, we apply this methodology to MRI images because of their widespread use in practical applications. We first establish a magnetic resonance imaging quality assessment (MRIQA) database, which contains 3809 MRI images. Then a subjective image quality assessment experiment is conducted by expert doctors according to the diagnostic value of these images, which splits all MRI images into 1285 low-quality images and 2524 high-quality images. We then conduct a baseline deep learning experiment and propose an attention-based MIQANet model to automatically separate MRI images into high quality and low quality based on their diagnostic value. Our proposed method achieves a quality assessment accuracy of 96.59%. The constructed MRIQA database and proposed MIQA model will be made publicly available to further promote medical IQA research.