ISBN: 9781728185514 (print)
Recent advances in sensor technology and the wide deployment of visual sensors have led to a new application in which images are compressed not mainly for pixel recovery and human consumption, but for communication to cloud-side machine vision tasks such as classification, identification, detection, and tracking. This opens up new research directions for learning-based compression that directly optimizes the loss function of the vision task and therefore achieves better compression performance than recovering pixels first and then performing the vision task. In this work, we develop a learning-based compression scheme that learns a compact feature representation and appropriate bitstreams for the task of visual object detection. A Variational Auto-Encoder (VAE) framework is adopted for learning the compact representation, while a bridge network is trained to drive the detection loss function. Simulation results demonstrate that this approach achieves a new state of the art in task-driven compression efficiency compared with pixel recovery approaches, including both learning-based and handcrafted solutions.
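The abstract does not state the training objective, so the following is only a minimal sketch of the generic rate-plus-task form often used in learned compression: an estimated bitrate added to a weighted task loss. The function names, the weight `lam`, and the bits-per-pixel normalization are all assumptions made for illustration, not the authors' formulation.

```python
import math


def estimated_bits(symbol_probs):
    """Ideal code length in bits, given the probability an entropy model
    assigns to each symbol that is actually coded."""
    return -sum(math.log2(p) for p in symbol_probs)


def rate_task_loss(symbol_probs, detection_loss, num_pixels, lam=1.0):
    """Hypothetical joint objective: bits per pixel plus a weighted task loss.

    lam (assumed hyperparameter) trades bitrate against detection quality.
    """
    bits_per_pixel = estimated_bits(symbol_probs) / num_pixels
    return bits_per_pixel + lam * detection_loss


# Toy values: three coded symbols, a 10-pixel "image", a detection loss of 0.3.
loss = rate_task_loss([0.5, 0.25, 0.25], detection_loss=0.3, num_pixels=10, lam=2.0)
```

Raising `lam` in this sketch favors detection accuracy at the cost of more bits; lowering it favors a smaller bitstream.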
ISBN: 9781728185514 (print)
In stereo image super-resolution (SR), it is equally important to utilize intra-view and cross-view information. However, most existing methods focus only on exploring cross-view information and neglect the full mining of intra-view information, which limits their reconstruction performance. Since single image SR (SISR) methods are powerful in exploiting intra-view information, we propose to introduce a knowledge distillation strategy to transfer the knowledge of an SISR network (teacher network) to a stereo image SR network (student network). With the help of the teacher network, the student network can easily learn more intra-view information. Specifically, we propose pixel-wise distillation as the implementation method, which not only improves the intra-view information extraction ability of the student network but also ensures the effective learning of cross-view information. Moreover, we propose a lightweight student network named Adaptive Residual Feature Aggregation network (ARFAnet). Its main unit, the ARFA module, can aggregate informative residual features and produce more representative features for image reconstruction. Experimental results demonstrate that our teacher-student network achieves state-of-the-art performance on all benchmark datasets.
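Pixel-wise distillation can be illustrated with a small sketch in which the student's output is compared pixel by pixel against both the ground truth and the teacher's output. The abstract does not specify the distance measure or the weighting, so the L1 distance, the weight `alpha`, and the function names below are assumptions.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equally sized 2-D images (lists of rows)."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += abs(p - t)
            count += 1
    return total / count


def distillation_loss(student_sr, teacher_sr, ground_truth, alpha=0.5):
    """Reconstruction term plus a pixel-wise term pulling the student
    toward the teacher's output. alpha is a hypothetical weight."""
    reconstruction = l1_loss(student_sr, ground_truth)
    distillation = l1_loss(student_sr, teacher_sr)
    return reconstruction + alpha * distillation


# Toy 2x2 images standing in for super-resolved outputs.
ground_truth = [[0.0, 1.0], [1.0, 0.0]]
teacher_out = [[0.1, 0.9], [0.9, 0.1]]
student_out = [[0.2, 0.8], [0.8, 0.2]]
loss = distillation_loss(student_out, teacher_out, ground_truth)
```

In a real training loop the teacher's weights would typically be frozen and only the student updated; that detail is likewise an assumption here.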
Authors:
Gazzeh, Soulayma; Douik, Ali
ENISo, University of Sousse, Laboratory of Networked Objects Control and Communications Systems, Sousse, Tunisia
Visual perception and road scene understanding are critical components of autonomous driving (AD). Detecting objects and predicting surrounding road user behavior are important tasks for robust and safe driving system...
ISBN: 9783031243660; 9783031243677 (print)
The structure of the animation market continues to be optimized, and the economic value of animation continues to grow. Building character images is among the most important parts of producing a film or TV series. Animated characters present their image through personality, distinguishing traits, psychological activity, actions, and other aspects. These features are innovative elements in film and television that serve the development of the plot and make it more visual. This paper studies the application of Internet of Things artificial intelligence and virtual reality technology to image reconstruction in film and television character shaping, and sets out the related research content. The test confirmed that Internet of Things artificial intelligence and virtual reality technology performed excellently in image reconstruction for film and television character shaping.
It is of great significance to classify garbage through deep learning and image processing technology to realize garbage recycling and resource reuse. We propose a garbage classification system based on improved Faste...
ISBN: 9781728185514 (print)
With the emergence of various machine-to-machine and machine-to-human tasks with deep learning, the amount of deep feature data is increasing. Deep product quantization is widely applied in deep feature retrieval tasks and has achieved good accuracy. However, it is not primarily aimed at compression, and its output is a fixed-length quantization index, which is not suitable for subsequent compression. In this paper, we propose an entropy-based deep product quantization algorithm for deep feature compression. First, it introduces entropy into the hard and soft quantization strategies, which can adapt to the codebook optimization and codeword determination operations in the training and testing processes, respectively. Second, entropy-related loss functions are designed to adjust the distribution of the quantization indices so that it suits the subsequent entropy coding module. Experimental results on retrieval tasks show that the proposed method can generally be combined with deep product quantization and its extended schemes and can achieve better compression performance under near-lossless conditions.
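To make the role of entropy concrete, the sketch below contrasts hard and soft codeword assignment for a scalar feature and measures the empirical entropy of a stream of quantization indices. It is an illustration under assumptions, not the paper's algorithm: the scalar codebook, the softmax over negative squared distances, the `temperature` parameter, and all function names are hypothetical.

```python
import math
from collections import Counter


def hard_assign(x, codebook):
    """Index of the nearest codeword (hard quantization)."""
    return min(range(len(codebook)), key=lambda k: (x - codebook[k]) ** 2)


def soft_assign(x, codebook, temperature=1.0):
    """Softmax over negative squared distances (soft quantization).
    temperature is an assumed smoothing parameter."""
    logits = [-((x - c) ** 2) / temperature for c in codebook]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def index_entropy(indices):
    """Empirical Shannon entropy, in bits per index, of a sequence of indices."""
    counts = Counter(indices)
    n = len(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


uniform_indices = [0, 1, 2, 3] * 4       # every codeword equally likely
skewed_indices = [0] * 13 + [1, 2, 3]    # one codeword dominates
```

A fixed-length code spends 2 bits on every index of a 4-entry codebook regardless of the distribution, whereas an entropy coder can approach `index_entropy(skewed_indices)`, which is lower. This is why shaping the index distribution matters for the subsequent entropy coding stage.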
In this paper, we introduce the work required to design the model for insulation ladder identification technology, including the construction of the model training environment, the preparation of data...
In computer vision, unsupervised domain adaptation (UDA) is an approach to transferring knowledge from a label-rich source domain to a fully-unlabeled target domain. Conventional UDA approaches have two problems. The ...
ISBN: 9798331524227 (digital); 9798331524234 (print)
Understanding and generating natural language descriptions from images is a fundamental challenge in vision-language tasks within artificial intelligence. This paper introduces a novel image captioning framework that integrates scene graph generation to improve the semantic richness of generated captions. The proposed method employs the Relation Transformer (RelTR) model to extract structural representations from visual scenes in the form of subject-predicate-object triplets. A transformer-based captioning model then utilizes these structured scene graphs to produce fluent and contextually accurate captions. Experimental evaluations on the Visual Genome dataset demonstrate that our approach yields superior semantic coherence and captioning accuracy compared to traditional image-to-text models. The incorporation of relational scene understanding results in captions that are more contextually informed and descriptive.
ISBN: 9781728185514 (print)
We have witnessed the rapid development of learned image compression (LIC). The latest LIC models have outperformed almost all traditional image compression standards in terms of rate-distortion (RD) performance. However, the time complexity of LIC models is still underexplored, which limits their practical application in industry. Even with GPU acceleration, LIC models still struggle with long coding times, especially on the decoder side. In this paper, we analyze and test a few prevailing and representative LIC models and compare their complexity with that of traditional codecs, including H.265/HEVC intra and H.266/VVC intra. We provide a comprehensive analysis of every module in the LIC models and investigate how bitrate changes affect coding time. We observe that the time complexity bottleneck lies mainly in entropy coding and context modelling. Although this paper pays more attention to experimental statistics, our analysis reveals some insights for further acceleration of LIC models, such as model modification for parallel computing, model pruning, and a more parallel context model.