ISBN: 9781728185514 (print)
Image-to-image translation tasks, which have been widely investigated with generative adversarial networks (GANs), aim to map an image from the source domain to the target domain. The translated image can be inversely mapped to a reconstructed source image. However, existing GAN-based schemes cannot accomplish reversible translation. To remedy this drawback, this paper proposes a nearly reversible image-to-image translation scheme in which the reconstructed source image is approximately distortion-free compared with the corresponding source image. The proposed scheme jointly considers inter-frame coding and embedding. First, we organize the GAN-generated reconstructed source image and the source image into a pseudo video. Then, the bitstream obtained by inter-frame coding is reversibly embedded in the translated image for nearly lossless source image reconstruction. Extensive experimental results and analysis demonstrate that the proposed scheme achieves a high level of performance in image quality and security.
ISBN: 9781728185514 (print)
In recent years, deep learning has achieved significant progress in many areas. However, unlike other research fields with millions of labeled samples, such as image recognition, only several thousand labeled images are available for deep learning in the image quality assessment (IQA) field, which heavily hinders the development and application of IQA. To tackle this problem, we propose an error self-learning semi-supervised method for no-reference (NR) IQA (ESSIQA), which is based on deep learning. We employ an advanced full-reference (FR) IQA method to expand the databases and supervise the training of the network. In addition, the network outputs for the expanded images are used as proxy labels, replacing the errors between subjective scores and objective scores, to achieve error self-learning. Two weights for error backpropagation are designed to reduce the impact of inaccurate outputs. The experimental results show that the proposed method achieves comparable performance.
ISBN: 9798331529543; 9798331529550 (print)
The emergence of Foundation Vision-Language Models (VLMs) has ignited a surge of research in the computer vision field due to their robust baseline performance. Inspired by this, we propose the Anchoring Vision-Language Network (AnViL-Net), which integrates a vision language model for the challenging task of Weakly-Supervised Group Activity Recognition (WSGAR). Our network effectively incorporates VLMs into WSGAR, addressing the challenges posed by dynamic actor motions and domain-specific activity classes. AnViL-Net leverages highly generalized VLM vision features as anchors for extracting visual features. Additionally, semantically meaningful VLM language features serve as anchors for inferring the semantic relationships between actors and their activities. We demonstrate the effectiveness of AnViL-Net on multiple group activity datasets, achieving competitive state-of-the-art results.
ISBN: 9781665475921 (print)
With the rapid development of multi-sensor fusion technology in various industrial fields, many composite images closely related to human life have been produced. To meet the rapidly growing needs of various image-based applications, we have established the first multi-source composite image (MSCI) database for image quality assessment (IQA). Our MSCI database contains 80 reference images and 1600 distorted images, generated by four advanced compression standards with five distortion levels. In particular, these five distortion levels are determined based on the first five just noticeable difference (JND) levels. Moreover, we verify the IQA performance of some representative methods on our MSCI database. The experimental results show that the performance of the existing methods on the MSCI database needs to be further improved.
ISBN: 9781728172064 (print)
Visual complexity can be defined as the difficulty of describing an image or as the number of details in the image. Studying visual complexity is important for understanding the human visual system and for developing systems for human use. In this study, a deep learning model developed for object identification was used in a transfer learning setting to develop two models that predict human visual complexity judgments. The model trained to make predictions within the same domain is called the within-domain model, while the model trained to make predictions between different domains is called the cross-domain model. After the training phase, the within-domain model can predict which of two images is more complex with 95% accuracy, and the cross-domain model with 78% accuracy. Visual complexity scores of images that are not in the test set were calculated from the models' pairwise image complexity comparisons. It was found that the correlation between real visual complexity scores and calculated scores was strong, even for object (0.77) and scene (0.83) categories.
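The abstract does not say how pairwise comparisons are turned into per-image scores. As a minimal sketch of one possible aggregation, assuming a simple win-fraction rule (an illustrative choice, not necessarily the authors' procedure), each image can be scored by the share of comparisons in which it is judged the more complex one. The function name and the `more_complex` callback are hypothetical stand-ins for a trained comparison model.

```python
def scores_from_comparisons(n_images, more_complex):
    """Score each image by the fraction of pairwise comparisons it wins.

    more_complex(i, j) is a hypothetical callback that returns True when
    image i is judged more complex than image j (e.g. by a trained model).
    """
    wins = [0] * n_images
    for i in range(n_images):
        for j in range(i + 1, n_images):
            if more_complex(i, j):
                wins[i] += 1
            else:
                wins[j] += 1
    # Each image takes part in n_images - 1 comparisons.
    denom = max(n_images - 1, 1)
    return [w / denom for w in wins]


# Toy usage: a comparator backed by made-up "true" complexity values.
true_complexity = [0.2, 0.9, 0.5]
scores = scores_from_comparisons(
    3, lambda i, j: true_complexity[i] > true_complexity[j]
)
```

Scores produced this way could then be correlated with human ratings, as the abstract reports for the object and scene categories.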
ISBN: 9781728185514 (print)
Increasing the spatial resolution and frame rate of a video simultaneously has attracted attention in recent years. Current one-stage space-time video super-resolution (STVSR) methods struggle with large motion and complex scenes, and are time-consuming and memory-intensive. We propose an efficient STVSR framework that can correctly handle complicated scenes, such as those with occlusion and large motion, and generate results with clearer texture. On the REDS dataset, our method outperforms all existing one-stage methods. Our method is lightweight and can generate 720p frames at 16 fps on an NVIDIA GTX 1080 Ti GPU.
ISBN: 9781728180687 (print)
Pixel-wise image quality assessment (IQA) metrics, such as mean square error (MSE), mean absolute error (MAE) and peak signal-to-noise ratio (PSNR), correlate well with perceptual quality when dealing with images sharing the same distortion type, but not when processing images with different distortion types, which is inconsistent with the human visual system (HVS). Although a large number of metrics based on image error have been proposed, difficulties and limitations remain. To solve this problem, a full-reference image quality assessment (FR-IQA) method based on MAE is proposed in this paper. The metric divides the image error map (the difference between the distorted image and the reference image) into a smooth region and a texture-edge region, calculates the mean value of each, and then assigns them different weights according to the masking effect. The key innovation of this paper is a distortion significance measurement, a visual quality coefficient that can effectively indicate the influence of different distortion types on perceptual quality and unify them with the HVS. The segmented image error maps are weighted by the distortion significance coefficient. The experimental results on the four largest benchmark databases show that most of the distortions are successfully evaluated and that the results are consistent with the HVS.
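For reference, the three pixel-wise baselines named in the abstract can be sketched as follows. This is a minimal illustration of the standard definitions only, not the proposed weighted metric; the function names and the default peak value of 255 (8-bit images) are assumptions made for the example.

```python
import numpy as np


def mse(ref, dist):
    """Mean square error between a reference and a distorted image."""
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    return float(np.mean((ref - dist) ** 2))


def mae(ref, dist):
    """Mean absolute error between a reference and a distorted image."""
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    return float(np.mean(np.abs(ref - dist)))


def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    err = mse(ref, dist)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / err))


# Toy usage: a 2x2 reference and a copy with one pixel changed by 2.
ref = np.zeros((2, 2))
dist = np.array([[0.0, 0.0], [0.0, 2.0]])
```

All three depend only on the per-pixel difference, so they cannot tell where in the image an error occurs, which is the limitation the region-based weighting in the abstract is meant to address.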
ISBN: 0819424358 (print)
As 2-D image communication systems come into wide use, 3-D imaging technology, which enhances the realism of visual communication, is increasingly considered a promising next-generation medium that can revolutionize information systems. To date, 3-D image communication has not been discussed at a comprehensive level because several promising 3-D display technologies are still making rapid progress. Given this situation, this paper introduces the concept of the "Integrated 3-D Visual Communication System". The key feature of this new concept is a display-independent, neutral representation of visual data. The flexibility of this concept will promote the progress of 3-D image communication systems before 3-D display technology reaches maturity. For this purpose, a ray-based approach is examined in this paper. In the present representation method, the whole ray data is treated uniformly as a set of orthogonal views of the scene objects. The advantage of this approach is that it allows the synthesis of any perspective view by gathering appropriate ray data from the set of orthogonal views, independently of any geometric representation. A real-time progressive transmission method has also been examined. The experimental results show how the present representation method could be applied to a next-generation 3-D image communication system.
ISBN: 9781665475921 (print)
Recently, deep learning-based video compression algorithms have achieved competitive performance in Bjontegaard delta (BD) rate, especially those adopting super-resolution networks as post-processing modules in downsampling-based video compression (DBC) frameworks. However, limited by the non-differentiable nature of traditional codecs, DBC frameworks mainly focus on improving the performance of super-resolution modules while neglecting the optimization of downscaling modules. In practical application scenarios, it is crucial to improve video compression performance without introducing additional modifications to the decoder client. We propose a context-aware processing network (CPN) that is compatible with standard codecs and introduces no computational burden to the client, and that preserves critical information and essential structures during downscaling. The proposed CPN works as a precoder cascaded with standard codecs to improve compression performance on the server before encoding and transmission. In addition, a surrogate codec is employed to simulate the degradation process of the standard codecs and to backpropagate the gradient to optimize the CPN. Experimental results show that the proposed method outperforms the latest pre-processing networks and achieves performance comparable to the latest DBC frameworks.
ISBN: 0819424358 (print)
The Asynchronous Transfer Mode (ATM) appears to be the standard protocol for image and video transmission. There are virtually no bandwidth limitations, nor is the size of the operating area restricted. However, the main problem lies in the unsecured transmission when native ATM applications are implemented. This has led to a new way of encoding images in which the redundancy (classically managed by the network protocol) is generated within the codec. In this paper, we present the Mojette transform, which generates the redundancy at the higher level of the coder in order to safely transmit image data. Block and wavelet implementations associated with the Mojette transform are presented and compared, not only from the coder point of view but also with respect to the source and channel characteristics. For this specific case we also present the asynchronous Mojette reconstruction. An adapted object-oriented model has been developed accordingly.
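The abstract does not define the Mojette transform. As a rough sketch, assuming the commonly used Dirac-Mojette definition (a discrete Radon-type projection in which pixel (k, l) contributes to bin b = -q*k + p*l for an integer direction (p, q)), a single projection might be computed as below. The function name, the sign convention, and the shift of bin indices to start at zero are assumptions of this example, not details taken from the paper.

```python
import numpy as np


def mojette_projection(image, p, q):
    """Project a 2-D integer array along the discrete direction (p, q).

    Assumed convention: pixel at column k, row l is added to bin
    b = -q*k + p*l, with bins shifted so the smallest index is 0.
    """
    image = np.asarray(image, dtype=np.int64)
    rows, cols = image.shape
    l_idx, k_idx = np.indices((rows, cols))
    bins = -q * k_idx + p * l_idx
    bins = bins - bins.min()
    n_bins = (cols - 1) * abs(q) + (rows - 1) * abs(p) + 1
    proj = np.zeros(n_bins, dtype=np.int64)
    # Unbuffered accumulation: several pixels can fall into the same bin.
    np.add.at(proj, bins.ravel(), image.ravel())
    return proj


# Toy usage: three projections of a 2x2 block.
block = np.array([[1, 2], [3, 4]])
proj_rows = mojette_projection(block, 1, 0)
proj_cols = mojette_projection(block, 0, 1)
proj_diag = mojette_projection(block, 1, 1)
```

Every projection sums to the same total as the image, and a set of projections along several directions carries overlapping information about the same pixels. This overlap is the kind of coder-level redundancy the abstract describes, although the reconstruction step itself is not sketched here.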