ISBN: 9798350387261; 9798350387254 (print)
Coding for machines is an emerging research topic that aims to compress visual signals for machine analysis. Most existing research considers supervised feature compression, i.e., compressing features for a particular task and dataset. This paper explores a more flexible scenario in which the machine vision task (e.g., the classes to predict) is unknown during training and encoding and is decided only at decoding time. To achieve this goal, we compress self-supervised learning (SSL) features, which are not tied to a particular dataset and can be used for various tasks without re-training. Empirical studies are provided to analyze and derive an SSL feature compression system. We show that linear transform coding, despite its simplicity, achieves comparable or better rate-accuracy performance for SSL features than more advanced techniques. Although SSL feature compression performs slightly worse than its supervised counterpart, it generalizes well to out-of-distribution datasets. We highlight the use cases of various feature compression schemes and provide insights for future work. The code is made publicly available(1).
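As a rough illustration of what linear transform coding of feature vectors can look like, here is a minimal sketch. It is not the paper's implementation: the PCA basis, the uniform quantizer, the empirical-entropy rate estimate, the synthetic stand-in for SSL features, and all function names are assumptions made for this example.

```python
import numpy as np


def fit_linear_transform(features, num_components):
    """Fit an orthonormal (PCA) basis on training features of shape (n, d)."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:num_components]


def encode(features, mean, basis, step):
    """Project onto the basis, then uniformly quantize to integer symbols."""
    coeffs = (features - mean) @ basis.T
    return np.round(coeffs / step).astype(np.int64)


def decode(symbols, mean, basis, step):
    """Dequantize the symbols and map them back to feature space."""
    return (symbols * step) @ basis + mean


def empirical_bits_per_vector(symbols):
    """Estimate the rate as the sum of per-dimension empirical entropies."""
    total = 0.0
    for column in symbols.T:
        _, counts = np.unique(column, return_counts=True)
        p = counts / counts.sum()
        total += float(-(p * np.log2(p)).sum())
    return total


# Hypothetical stand-in for SSL features: 512 vectors of dimension 64
# that lie in an 8-dimensional subspace.
rng = np.random.default_rng(0)
features = rng.normal(size=(512, 8)) @ rng.normal(size=(8, 64))

step = 0.5
mean, basis = fit_linear_transform(features, num_components=8)
symbols = encode(features, mean, basis, step)
recon = decode(symbols, mean, basis, step)

rate = empirical_bits_per_vector(symbols)
max_error = np.linalg.norm(recon - features, axis=1).max()
print(f"estimated rate: {rate:.1f} bits/vector, max L2 error: {max_error:.3f}")
```

In this sketch the quantization step size `step` trades rate against reconstruction error; in a rate-accuracy study, the reconstructed features would be passed to a downstream classifier instead of being scored by L2 error.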
ISBN: 9798350300673 (print)
Lossy image coding standards such as JPEG and MPEG have successfully achieved high compression rates for human consumption of multimedia data. However, with the increasing prevalence of IoT devices, drones, and self-driving cars, machines rather than humans are processing a growing share of captured visual content. Consequently, it is crucial to pursue an efficient compressed representation that caters not only to human vision but also to image processing and machine vision tasks. Drawing inspiration from the efficient coding hypothesis in biological systems and the modeling of the sensory cortex in neuroscience, we repurpose the compressed latent representation to prioritize semantic relevance while preserving perceptual distance. Our proposed method, Compressed Perceptual Image Patch Similarity (CPIPS), can be derived at minimal cost from a learned neural codec and computed significantly faster than DNN-based perceptual metrics such as LPIPS and DISTS.