ISBN: 9781728185514 (print)
In recent years, deep learning has made significant progress in many areas. However, unlike fields such as image recognition, where millions of labeled samples are available, the image quality assessment (IQA) field has only several thousand labeled images for deep learning, which heavily hinders the development and application of IQA. To tackle this problem, we propose an error self-learning semi-supervised method for no-reference (NR) IQA (ESSIQA) based on deep learning. We employ an advanced full-reference (FR) IQA method to expand the databases and supervise the training of the network. In addition, the network outputs for the expanded images are used as proxy labels, replacing the errors between subjective scores and objective scores, to achieve error self-learning. Two weights for error backpropagation are designed to reduce the impact of inaccurate outputs. The experimental results show that the proposed method achieves competitive results.
ISBN: 9781665475921 (print)
With the rapid development of multi-sensor fusion technology in various industrial fields, many composite images closely related to human life have been produced. To meet the rapidly growing needs of various image-based applications, we have established the first multi-source composite image (MSCI) database for image quality assessment (IQA). Our MSCI database contains 80 reference images and 1600 distorted images, generated by four advanced compression standards with five distortion levels. In particular, these five distortion levels are determined based on the first five just noticeable difference (JND) levels. Moreover, we verify the IQA performance of some representative methods on our MSCI database. The experimental results show that the performance of the existing methods on the MSCI database needs to be further improved.
ISBN: 9781728185514 (print)
Increasing the spatial resolution and frame rate of a video simultaneously has attracted attention in recent years. Current one-stage space-time video super-resolution (STVSR) methods struggle with large motion and complex scenes, and are time-consuming and memory-intensive. We propose an efficient STVSR framework that correctly handles complicated scenes involving occlusion and large motion and generates results with clearer texture. On the REDS dataset, our method outperforms all existing one-stage methods. Our method is lightweight and can generate 720p frames at 16 fps on an NVIDIA GTX 1080 Ti GPU.
ISBN: 9781728180687 (print)
Pixel-wise image quality assessment (IQA) algorithms, such as mean square error (MSE), mean absolute error (MAE) and peak signal-to-noise ratio (PSNR), correlate well with perceptual quality when the images share the same distortion type, but not when the images have different distortion types, which is inconsistent with the human visual system (HVS). Although a large number of metrics based on image error have been proposed, difficulties and limitations remain. To solve this problem, a full-reference image quality assessment (FR-IQA) method based on MAE is proposed in this paper. The metric divides the image error map (the difference between the distorted image and the reference image) into a smooth region and a texture-edge region, calculates the mean value of each, and then gives them different weights to account for the masking effect. The key innovation of this paper is a distortion significance measurement, a visual quality coefficient that effectively indicates the influence of different distortion types on perceptual quality and unifies them with the HVS. The segmented image error maps are weighted by the distortion significance coefficient. Experimental results on the four largest benchmark databases show that most of the distortions are successfully evaluated and that the results are consistent with the HVS.
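The region-splitting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' metric: the gradient-based segmentation, the threshold `grad_threshold`, and the weights `w_smooth` and `w_texture` are hypothetical choices made only for illustration, and the distortion significance coefficient is omitted because the abstract does not define it. Images are plain lists of rows of grayscale values.

```python
def mae(ref, dist):
    """Plain mean absolute error between two equal-sized grayscale images."""
    total = sum(abs(r - d)
                for row_r, row_d in zip(ref, dist)
                for r, d in zip(row_r, row_d))
    return total / (len(ref) * len(ref[0]))


def gradient_magnitude(img, y, x):
    """L1 central-difference gradient at (y, x), with clamped borders."""
    h, w = len(img), len(img[0])
    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
    return abs(gx) + abs(gy)


def region_weighted_mae(ref, dist, grad_threshold=20,
                        w_smooth=0.7, w_texture=0.3):
    """Weighted sum of the mean absolute errors of two regions.

    Pixels are assigned to the smooth or texture-edge region by thresholding
    the gradient of the reference image (an assumed segmentation rule).
    The texture-edge weight is lower because errors there are partly masked.
    """
    smooth_err, texture_err = [], []
    for y in range(len(ref)):
        for x in range(len(ref[0])):
            err = abs(ref[y][x] - dist[y][x])
            if gradient_magnitude(ref, y, x) < grad_threshold:
                smooth_err.append(err)
            else:
                texture_err.append(err)
    mean_smooth = sum(smooth_err) / len(smooth_err) if smooth_err else 0.0
    mean_texture = sum(texture_err) / len(texture_err) if texture_err else 0.0
    return w_smooth * mean_smooth + w_texture * mean_texture
```

With these assumed weights, the same error magnitude scores worse when it falls on flat areas than when it falls on edges, whereas plain MAE cannot tell the two cases apart.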
ISBN: 9781665475921 (print)
Recently, deep learning-based video compression algorithms have achieved competitive performance in Bjøntegaard delta (BD) rate, especially those adopting super-resolution networks as post-processing modules in downsampling-based video compression (DBC) frameworks. However, limited by the non-differentiable nature of traditional codecs, DBC frameworks mainly focus on improving the super-resolution modules while neglecting the optimization of the downscaling modules. In practical application scenarios, it is crucial to improve video compression performance without requiring any modification to the decoder client. We propose a context-aware processing network (CPN) that is compatible with standard codecs and adds no computational burden to the client, and that preserves critical information and essential structures during downscaling. The proposed CPN works as a precoder cascaded with standard codecs to improve compression performance on the server before encoding and transmission. In addition, a surrogate codec is employed to simulate the degradation process of the standard codecs and backpropagate the gradient to optimize the CPN. Experimental results show that the proposed method outperforms the latest pre-processing networks and achieves competitive performance compared with the latest DBC frameworks.
ISBN: 9781665475921 (print)
The VCIP 2022 challenge "Tire pattern image classification based on lightweight network" aims to design lightweight networks that correctly classify tire surface tread patterns and indentation images with less overhead. To this end, we present a novel lightweight tire tread classification network. Concretely, we adopt the ShuffleNet-V2-x0.5 network as our backbone. To reduce the computational complexity, we introduce Space-To-Depth and Anti-Alias Downsampling modules to pre-process the input image. Moreover, to enhance the classification ability of our model, we adopt a knowledge distillation strategy with a Vision Transformer as the teacher network. To ensure the robustness of our model, we pre-train it on ImageNet and fine-tune it on the training set of the challenge. Experiments on the challenge dataset demonstrate that our model achieves superior performance, with 99.00% classification accuracy, 25.51M FLOPs, and 0.20M parameters.
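As a rough illustration of the Space-To-Depth pre-processing mentioned above, the sketch below rearranges each `block` x `block` spatial patch into the channel dimension, so the spatial size shrinks while every pixel value is kept. The function name, the default block size, and the channel ordering (row-major over the patch) are assumptions; the abstract does not specify the configuration the authors used.

```python
def space_to_depth(image, block=2):
    """Rearrange an H x W x C image (nested lists) into
    (H / block) x (W / block) x (C * block * block).

    Channel order within each output pixel is row-major over the patch,
    which is an assumed convention.
    """
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for i in range(0, h, block):
        out_row = []
        for j in range(0, w, block):
            channels = []
            for di in range(block):
                for dj in range(block):
                    # Append all channels of one pixel in the patch.
                    channels.extend(image[i + di][j + dj])
            out_row.append(channels)
        out.append(out_row)
    return out
```

For example, a 4 x 4 RGB input with `block=2` becomes a 2 x 2 output with 12 channels, so later convolutions operate on a quarter of the spatial positions.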
ISBN: 9798350343557 (print)
Automod is an artificial intelligence content moderation system that detects similarities and inconsistencies in user-generated visual content (images and videos). With the similarity module installed, labor savings of 15% were achieved, and the nonconformity detection models reached F1 scores of 90% and higher. More than 100,000 images can be evaluated daily, and the system was load-tested. Similarly, keyframes obtained from at least 65,000 videos that can be evaluated daily were passed through the nonconformity models, and a load test was applied.
ISBN: 9781728185514 (print)
Learning-based image codecs produce compression artifacts that differ from the blocking and blurring degradation introduced by conventional image codecs such as JPEG, JPEG 2000 and HEIC. In this paper, a crowdsourcing-based subjective quality evaluation procedure is used to benchmark a representative set of end-to-end deep learning-based image codecs submitted to the MMSP'2020 Grand Challenge on Learning-Based Image Coding and the JPEG AI Call for Evidence. For the first time, a double stimulus methodology with a continuous quality scale is applied to evaluate this type of image codec. The subjective experiment is one of the largest ever reported, including more than 240 pair comparisons evaluated by 118 naive subjects. The results of benchmarking learning-based image coding solutions against conventional codecs are organized in a dataset of differential mean opinion scores, along with the stimuli, and made publicly available.
ISBN: 9781665475921 (print)
Currently, action recognition is predominantly performed on video data processed by CNNs. We investigate whether the representation process of CNNs can also be leveraged for multimodal action recognition by incorporating image-based audio representations of actions in a task. To this end, we propose the Multimodal Audio-image and Video Action Recognizer (MAiVAR), a CNN-based audio-image to video fusion model that accounts for video and audio modalities to achieve superior action recognition performance. MAiVAR extracts meaningful image representations of audio and fuses them with the video representation to achieve better performance than either modality individually on a large-scale action recognition dataset.
Super-resolution enhancement algorithms are used to estimate a high-resolution video still (HRVS) from several low-resolution frames, provided that objects within the digital image sequence move with subpixel increments. A Bayesian multiframe enhancement algorithm is presented to compute an HRVS using the spatial information present within each frame as well as the temporal information present due to object motion between frames. However, the required subpixel-resolution motion vectors must be estimated from low-resolution and noisy video frames, resulting in an inaccurate motion field which can adversely impact the quality of the enhanced image. Several subpixel motion estimation techniques are incorporated into the Bayesian multiframe enhancement algorithm to determine their efficacy in the presence of global data transformations between frames (i.e., camera pan, rotation, tilt, and zoom) and independent object motion. Visual and quantitative comparisons of the resulting high-resolution video stills computed from two video frames and the corresponding estimated motion fields show that the eight-parameter projective motion model is appropriate for global scene changes, while block matching and Horn-Schunck optical flow estimation each have their own advantages and disadvantages when used to estimate independent object motion. (C) 1998 Academic Press.
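Block matching, one of the motion estimators compared above, can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). This is a generic integer-pixel version written for illustration only; the subpixel refinement that the abstract relies on is not shown, and the function names, block size, and search range are assumptions.

```python
def sad(prev, curr, y, x, dy, dx, size):
    """Sum of absolute differences between the block at (y, x) in `curr`
    and the block displaced by (dy, dx) in `prev`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(curr[y + r][x + c] - prev[y + dy + r][x + dx + c])
    return total


def block_match(prev, curr, y, x, size=2, search=2):
    """Full-search block matching at integer-pixel accuracy.

    Returns ((dy, dx), cost), where (dy, dx) points from the block in
    `curr` to its best match in `prev`. Candidates that fall outside the
    frame are skipped; the first candidate with the lowest cost wins ties.
    """
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= y + dy and y + dy + size <= h and
                      0 <= x + dx and x + dx + size <= w)
            if not inside:
                continue
            cost = sad(prev, curr, y, x, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost
```

Running `block_match` once per block yields a coarse motion field for independently moving objects; a global model such as the eight-parameter projective transform would instead be fitted to the whole frame.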