ISBN: 9781728185514 (print)
This paper analyzes the benefits of extending CRC-based error correction (CRC-EC) to handle more errors in the context of error-prone wireless networks. In the literature, CRC-EC has been used to correct up to 3 binary errors per packet. We first present a theoretical analysis of the CRC-EC candidate list as the number of errors considered increases. We then analyze how subsequent checksum validation and video decoding steps reduce the candidate list. Simulations conducted on two wireless networks show that the network considered strongly affects CRC-EC performance. Over a Bluetooth Low Energy (BLE) channel with Eb/N0 = 8 dB, an average PSNR improvement of 4.4 dB on videos is achieved when CRC-EC corrects up to 5 rather than 3 errors per packet.
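The notion of a CRC-EC candidate list can be illustrated with a small brute-force sketch. It is not the method analyzed in the paper: it assumes a toy CRC-8 (polynomial 0x07, zero initial value) instead of the CRCs used by real wireless links, a frame made of a payload followed by one CRC byte, and exhaustive enumeration of bit flips. The names `crc8`, `flip_bits` and `candidate_list` are hypothetical.

```python
from itertools import combinations


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with a zero initial value (toy parameters, assumed)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def flip_bits(frame: bytes, positions) -> bytes:
    """Return a copy of frame with the given bit positions inverted (MSB first)."""
    out = bytearray(frame)
    for p in positions:
        out[p // 8] ^= 0x80 >> (p % 8)
    return bytes(out)


def candidate_list(received: bytes, max_errors: int) -> list:
    """All frames within max_errors bit flips of `received` that pass the CRC check.

    The frame layout is payload followed by a single CRC byte (an assumption).
    """
    n_bits = len(received) * 8
    candidates = []
    for k in range(max_errors + 1):
        for positions in combinations(range(n_bits), k):
            frame = flip_bits(received, positions)
            if crc8(frame[:-1]) == frame[-1]:
                candidates.append(frame)
    return candidates


if __name__ == "__main__":
    payload = b"\x12\x34\x56"
    sent = payload + bytes([crc8(payload)])
    received = flip_bits(sent, [3, 17])  # two channel errors
    for max_errors in (1, 2, 3):
        found = candidate_list(received, max_errors)
        print(max_errors, len(found), sent in found)
```

Allowing more errors can only add candidates, which is why later validation steps such as checksum verification and video decoding are needed to narrow the list down to a single frame.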
In many current videos it is necessary to include hidden information. This can be done using steganography. Steganography is based on the limited capabilities of human senses, which is why people are not able to ...
ISBN: 9781728185514 (print)
Learned image compression (LIC) has shown strong performance both on reconstruction-quality-driven tasks (e.g. PSNR, MS-SSIM) and on machine vision tasks such as image understanding. However, most LIC frameworks operate in the pixel domain, which requires a decoding process. In this paper, we develop a learned compressed-domain framework for machine vision tasks. 1) By sending the compressed latent representation directly to the task network, the decoding computation can be eliminated, reducing complexity. 2) By sorting the latent channels by entropy, only selected channels are transmitted to the task network, which reduces the bitrate. As a result, compared with traditional pixel-domain methods, we reduce multiply-accumulate operations (MACs) by about 1/3 and inference time by about 1/5 while keeping the same accuracy. Moreover, the proposed channel selection yields a bitrate saving of up to 6.8%.
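The entropy-based channel selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: entropy is estimated empirically from the quantized symbols of each channel (a learned codec would more likely use its entropy model), the highest-entropy channels are assumed to be the ones kept, and `channel_entropy` and `select_channels` are hypothetical names.

```python
import math
from collections import Counter


def channel_entropy(symbols) -> float:
    """Empirical Shannon entropy, in bits per symbol, of one quantized channel."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_channels(latent, keep: int) -> list:
    """Indices of the `keep` highest-entropy channels, returned in original order.

    `latent` is a list of channels, each a flat list of quantized integers.
    Keeping the highest-entropy channels is an assumption made for this sketch.
    """
    ranked = sorted(
        range(len(latent)),
        key=lambda i: channel_entropy(latent[i]),
        reverse=True,
    )
    return sorted(ranked[:keep])


if __name__ == "__main__":
    latent = [
        [0, 0, 0, 0],  # constant channel, carries no information
        [0, 1, 0, 1],
        [0, 1, 2, 3],
        [1, 1, 1, 0],
    ]
    print(select_channels(latent, keep=2))
```

Only the selected channels would then be entropy-coded and passed to the task network, so the bitrate depends on how many channels are kept.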
ISBN: 9781665435536 (print)
In visual inspection, quality assurance is difficult because results vary with the skill and fatigue of the inspector. Recently, visual inspection methods based on image processing with deep learning have been proposed. When using deep learning, the choice of dataset is important. In this paper, we describe a method that detects painting defects using image processing, automatically generates data for deep learning, and uses these data for deep-learning-based classification.
ISBN: 9781728185514 (print)
Advances in cameras and web technology have made it easy to capture large amounts of face video and share it with unknown audiences for uncontrollable purposes. This raises increasing concerns about identity-relevant computer vision devices invading the subjects' privacy. Previous de-identification methods rely on designing novel neural networks and processing face videos frame by frame, which ignores the redundancy and continuity of video data. Moreover, these techniques do not balance privacy and utility well, and per-frame evaluation easily causes flicker. In this paper, we present deep motion flow, which creates de-identified face videos with a good privacy-utility tradeoff. It calculates the relative dense motion flow between every two adjacent original frames and runs high-quality image anonymization only on the first frame. The de-identified video is then obtained from the anonymized first frame via the relative dense motion flow. Extensive experiments demonstrate the effectiveness of the proposed de-identification method.
ISBN: 9798331529543 (digital)
ISBN: 9798331529550 (print)
This study investigates the practical performance of neural-network post-filters standardized in ITU-T H.274. We implement neural-network models on a Field-Programmable Gate Array (FPGA), allowing real-time processing of 4K 60fps encoded videos transmitted via 12G-SDI. Experimental results suggest that a minor bitrate increase for the transmission of the neural-network model weights can enhance the quality of the videos encoded by Versatile Video Coding (VVC).
Video coding, a process of compressing and decompressing digital video content, has traditionally been optimized for human visual systems by reducing its size while maintaining the human perceptual quality. However, w...
ISBN: 9781728185514 (print)
With the development of airplane platforms, aerial image classification plays an important role in a wide range of remote sensing applications. Most aerial image datasets contain very few samples compared with other computer vision datasets. Unlike many works that use data augmentation to solve this problem, we adopt a novel strategy, called label splitting, to deal with limited samples. Specifically, in addition to its original semantic label, each sample is assigned a new appearance label via unsupervised clustering. Then an optimized triplet loss learning is applied to distill domain-specific knowledge. This is achieved through a binary tree forest partitioning and a triplet selection and optimization scheme that controls triplet quality. Simulation results on the NWPU, UCM and AID datasets demonstrate that the proposed solution achieves state-of-the-art performance in aerial image classification.
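For readers unfamiliar with triplet loss, the standard margin-based form is sketched below. This is the generic textbook formulation with Euclidean distance, not the optimized variant or the triplet selection scheme of the paper; the function names and the default margin are assumptions made for illustration.

```python
import math


def euclidean(u, v) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (the default margin is an assumed value).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    same_class = [0.0, 1.0]
    other_class = [3.0, 4.0]
    print(triplet_loss(anchor, same_class, other_class))
    print(triplet_loss(anchor, other_class, same_class))
```

In a label-splitting setting, the semantic and appearance labels would decide which samples may serve as positives and negatives; that selection logic is deliberately left out here because the abstract does not specify it.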
The article focuses on the problem of image processing using a discrete data structure for tone images. The problem of converting a continuous image into discrete form is considered. Two procedures are described: sel...
ISBN: 9781728185514 (print)
Events in videos usually contain a variety of factors: objects, environments, actions, and their interaction relations. As mid-level semantics, these factors can bridge the gap between the event categories and the video clips. In this paper, we present a novel video event recognition method that uses graph convolution networks to represent and reason about the logical relations among these inner factors. Considering that different kinds of events may focus on different factors, we use transformer networks to extract spatial-temporal features, drawing on the attention mechanism to adaptively assign weights to the key factors of concern. Although transformers generally rely more on large datasets, we show the effectiveness of applying a 2D convolution backbone before the transformers. We train and test our framework on the challenging video event recognition dataset UCF-Crime and conduct ablation studies. The experimental results show that our method achieves state-of-the-art performance, outperforming previous leading models by a significant margin in recognition accuracy.