ISBN (print): 9781728185514
For most machine learning systems, overfitting is an undesired behavior. However, overfitting a model to a test image or a video at inference time is a favorable and effective technique to improve the coding efficiency of learning-based image and video codecs. At the encoding stage, one or more neural networks that are part of the codec are finetuned using the input image or video to achieve better coding performance. The encoder encodes the input content into a content bitstream. If the finetuned neural network is also part of the decoder, the encoder signals the weight update of the finetuned model to the decoder along with the content bitstream. At the decoding stage, the decoder first updates its neural network model according to the received weight update and then proceeds with decoding the content bitstream. Since a neural network contains a large number of parameters, compressing the weight update is critical to reducing the bitrate overhead. In this paper, we propose learning-based methods to find the parameters that are most important to overfit, in terms of rate-distortion performance. Based on simple distribution models for the variables in the weight update, we derive two objective functions. By optimizing the proposed objective functions, importance scores for the parameters can be calculated and the important parameters can be determined. Our experiments on a lossless image compression codec show that the proposed method significantly outperforms a prior-art method in which the overfitted parameters were selected heuristically. Furthermore, our technique improves the compression performance of the state-of-the-art lossless image compression codec by 0.1 bit per pixel.
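The parameter-selection idea above can be sketched as follows. This is a minimal illustration, not the paper's method: the importance score here is a simple update-magnitude proxy, whereas the paper derives scores from rate-distortion objective functions; all names are illustrative.

```python
import numpy as np

def select_important_updates(base_weights, finetuned_weights, keep_ratio=0.05):
    """Keep only the highest-importance entries of a weight update.

    Importance is approximated here by update magnitude; the paper
    instead derives scores from rate-distortion objective functions.
    """
    delta = finetuned_weights - base_weights
    scores = np.abs(delta)                    # placeholder importance score
    k = max(1, int(keep_ratio * delta.size))
    threshold = np.partition(scores.ravel(), -k)[-k]
    mask = scores >= threshold                # parameters worth signalling
    return np.where(mask, delta, 0.0), mask

rng = np.random.default_rng(0)
base = rng.normal(size=1000)                      # pretrained weights
tuned = base + rng.normal(scale=0.01, size=1000)  # after overfitting
sparse_delta, mask = select_important_updates(base, tuned, keep_ratio=0.05)
```

Only the sparse, masked update needs to be compressed and signalled to the decoder, which is what keeps the bitrate overhead small.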
ISBN (print): 9781728185514
This paper analyzes the benefits of extending CRC-based error correction (CRC-EC) to handle more errors in the context of error-prone wireless networks. In the literature, CRC-EC has been used to correct up to 3 binary errors per packet. We first present a theoretical analysis of the CRC-EC candidate list while increasing the number of errors considered. We then analyze the candidate list reduction resulting from subsequent checksum validation and video decoding steps. Simulations conducted on two wireless networks show that the network considered has a huge impact on CRC-EC performance. Over a Bluetooth low energy (BLE) channel with Eb/No=8 dB, an average PSNR improvement of 4.4 dB on videos is achieved when CRC-EC corrects up to 5, rather than 3 errors per packet.
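How a CRC-EC candidate list is built, and why it grows with the number of errors considered, can be sketched as below. This is a toy illustration assuming a CRC-8 and brute-force enumeration of bit-flip patterns; the polynomial and function names are not the paper's setup.

```python
from itertools import combinations

def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 (polynomial x^8 + x^2 + x + 1), for illustration only."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def crc_ec_candidates(packet: bytes, max_errors: int = 3):
    """Enumerate every bit-flip pattern of weight <= max_errors that makes the
    trailing CRC byte check pass -- the candidate list that subsequent
    checksum validation and video decoding steps must then prune."""
    nbits = len(packet) * 8
    candidates = []
    for weight in range(1, max_errors + 1):
        for positions in combinations(range(nbits), weight):
            trial = bytearray(packet)
            for p in positions:
                trial[p // 8] ^= 0x80 >> (p % 8)   # flip bit p (MSB first)
            if crc8(bytes(trial[:-1])) == trial[-1]:
                candidates.append(positions)
    return candidates

payload = b"hi!"
packet = payload + bytes([crc8(payload)])
corrupted = bytearray(packet)
for p in (3, 20):                                  # inject two bit errors
    corrupted[p // 8] ^= 0x80 >> (p % 8)
candidates = crc_ec_candidates(bytes(corrupted), max_errors=2)
```

The true error pattern (3, 20) appears in the candidate list, alongside other patterns that also satisfy the CRC; increasing `max_errors` enlarges this list combinatorially, which is why the later list-reduction stages analyzed in the paper matter.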
In many current videos it is necessary to include hidden information. This can be done using steganography. Steganography is based on the limited capabilities of human senses, which is why people are not able to ...
ISBN (print): 9781728185514
Learned image compression (LIC) has demonstrated good performance both for reconstruction-quality-driven tasks (e.g. PSNR, MS-SSIM) and for machine vision tasks such as image understanding. However, most LIC frameworks operate in the pixel domain, which requires a full decoding process. In this paper, we develop a learned compressed-domain framework for machine vision tasks. 1) By sending the compressed latent representation directly to the task network, the decoding computation can be eliminated, reducing complexity. 2) By sorting the latent channels by entropy, only selected channels are transmitted to the task network, which reduces the bitrate. As a result, compared with traditional pixel-domain methods, we reduce multiply-add operations (MACs) by about 1/3 and inference time by about 1/5 while keeping the same accuracy. Moreover, the proposed channel selection yields up to 6.8% bitrate savings.
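The entropy-based channel selection step can be sketched as follows, assuming integer-quantized latents (as in typical LIC codecs) and an empirical per-channel entropy estimate; all names and the synthetic latent are illustrative, not the paper's model.

```python
import numpy as np

def channel_entropy(channel):
    """Empirical entropy (bits/symbol) of an integer-quantized latent channel."""
    _, counts = np.unique(np.round(channel).astype(int), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def select_channels(latent, keep=0.5):
    """Rank latent channels by entropy and keep the highest-entropy fraction;
    low-entropy channels contribute few bits to the bitstream and are
    dropped before transmission to the task network."""
    entropies = np.array([channel_entropy(ch) for ch in latent])
    order = np.argsort(entropies)[::-1]
    kept = np.sort(order[: max(1, int(keep * len(latent)))])
    return kept, entropies

rng = np.random.default_rng(1)
latent = np.concatenate([
    rng.normal(scale=5.0, size=(4, 16, 16)),    # informative channels
    rng.normal(scale=0.01, size=(4, 16, 16)),   # near-constant channels
])
kept, entropies = select_channels(latent, keep=0.5)
```

Near-constant channels quantize to a single symbol (entropy ≈ 0), so the selector keeps the four informative channels and the bitstream only needs to carry those.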
ISBN (print): 9781665435536
In visual inspection, quality assurance is difficult because results vary with the skill and fatigue of the inspector. Recently, visual inspection methods based on image processing with deep learning have been proposed. When using deep learning, the dataset used is important. In this paper, we describe a method for detecting painting defects using image processing, automatically generating data for deep learning, and using these data for classification with deep learning.
ISBN (print): 9781728185514
Advances in cameras and web technology have made it easy to capture and share large amounts of face videos with an unknown audience for uncontrollable purposes. This raises increasing concerns about unwanted identity-relevant computer vision systems invading the characters' privacy. Previous de-identification methods rely on designing novel neural networks and processing face videos frame by frame, which ignores the redundancy and continuity of the data. Besides, these techniques cannot balance privacy and utility well, and per-frame processing easily causes flicker. In this paper, we present deep motion flow, which can create remarkable de-identified face videos with a good privacy-utility tradeoff. It calculates the relative dense motion flow between every two adjacent original frames and runs high-quality image anonymization only on the first frame. The de-identified video is then obtained from the anonymous first frame via the relative dense motion flow. Extensive experiments demonstrate the effectiveness of our proposed de-identification method.
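The propagation idea, anonymize once, then carry the result forward with dense flow, can be sketched as below. This assumes the anonymization function and the flow fields are supplied externally (e.g. by an off-the-shelf optical-flow network), and nearest-neighbour backward warping is a simplification of the paper's method.

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Backward-warp a frame with a dense flow field (nearest neighbour):
    output[y, x] = frame[y + flow_y, x + flow_x], clipped to the image."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def deidentify_video(frames, flows, anonymize):
    """Anonymize only the first frame, then propagate the anonymous content
    through the remaining frames using the inter-frame dense flow."""
    out = [anonymize(frames[0])]
    for flow in flows:              # flows[i]: flow from frame i to frame i+1
        out.append(warp_with_flow(out[-1], flow))
    return out

frame = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0                  # uniform one-pixel horizontal motion
anonymized = deidentify_video([frame, frame], [flow], anonymize=lambda f: f * 0)
```

Because every later frame is derived from the same anonymized first frame, the identity never re-enters the video and the motion stays temporally consistent, which is what avoids the per-frame flicker.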
ISBN (digital): 9798331529543
ISBN (print): 9798331529550
This study investigates the practical performance of neural-network post-filters standardized in ITU-T H.274. We implement neural-network models on a Field-Programmable Gate Array (FPGA), allowing real-time processing of 4K 60fps encoded videos transmitted via 12G-SDI. Experimental results suggest that a minor bitrate increase for the transmission of the neural-network model weights can enhance the quality of the videos encoded by Versatile Video Coding (VVC).
Video coding, a process of compressing and decompressing digital video content, has traditionally been optimized for human visual systems by reducing its size while maintaining the human perceptual quality. However, w...
ISBN (print): 9781728185514
With the development of airplane platforms, aerial image classification plays an important role in a wide range of remote sensing applications. Most aerial image datasets are very limited in size compared with other computer vision datasets. Unlike many works that use data augmentation to address this problem, we adopt a novel strategy, called label splitting, to deal with limited samples. Specifically, while each sample keeps its original semantic label, we assign it a new appearance label via unsupervised clustering. Then an optimized triplet-loss learning scheme is applied to distill domain-specific knowledge. This is achieved through binary tree forest partitioning and a triplet selection and optimization scheme that controls triplet quality. Simulation results on the NWPU, UCM and AID datasets demonstrate that the proposed solution achieves state-of-the-art performance in aerial image classification.
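The label-splitting step can be sketched as follows, using plain k-means with farthest-point initialization as the unsupervised clustering; the paper's exact clustering, the binary tree forest partitioning, and the triplet scheme are not reproduced, and all names are illustrative.

```python
import numpy as np

def split_labels(features, labels, sub_k=2, iters=20):
    """Label splitting: within each semantic class, cluster samples by
    appearance (plain k-means, farthest-point init) and give each sample
    an extra appearance label, usable later for triplet selection."""
    sub_labels = np.zeros(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        x = features[idx].astype(float)
        centers = [x[0]]                          # deterministic init
        while len(centers) < sub_k:
            d = np.min([np.linalg.norm(x - m, axis=1) for m in centers], axis=0)
            centers.append(x[d.argmax()])         # farthest point so far
        centers = np.stack(centers)
        for _ in range(iters):                    # standard Lloyd iterations
            assign = np.linalg.norm(x[:, None] - centers[None], axis=2).argmin(axis=1)
            for k in range(sub_k):
                if (assign == k).any():
                    centers[k] = x[assign == k].mean(axis=0)
        sub_labels[idx] = assign
    return sub_labels

# one semantic class containing two distinct appearance modes
features = np.array([[0, 0], [0.1, 0], [0, 0.1],
                     [10, 10], [10.1, 10], [10, 10.1]])
labels = np.zeros(6, dtype=int)
appearance = split_labels(features, labels, sub_k=2)
```

Each sample then carries a pair (semantic label, appearance label), giving the triplet-loss stage finer-grained positives and negatives without requiring any extra annotated data.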
The article focuses on the problem of image processing using a discrete data structure for tone images. The problem of converting a continuous image into discrete form is considered. Two procedures are described: sel...