ISBN (print): 9781479961399
The proceedings contain 139 papers. The topics discussed include: studying the added value of computational saliency in objective image quality assessment; foveation-based image quality assessment; a novel objective quality assessment method for perceptual video coding in conversational scenarios; optimized spatial and temporal resolution based on subjective quality estimation without encoding; blind image quality assessment based on a new feature of nature scene statistics; image transformation using limited reference with application to photo-sketch synthesis; a novel metric for efficient video shot boundary detection; hybrid modeling of natural image in wavelet domain; discriminative multi-modality non-negative sparse graph model for action recognition; and intrinsic flexibility exploiting for scalable video streaming over multi-channel wireless networks.
ISBN (print): 9798331529543
The proceedings contain 139 papers. The topics discussed include: cross-device image saliency detection: database and comparative analysis; performance evaluation of feature detectors and descriptors with close-range solar panel images; inter Submesh border information coding with skip mode in V-DMC; advancements in Lenslet video coding: insights from MPEG LVC; advanced learning-based inter prediction for future video coding; packed regions information SEI message; content-adaptive rate-quality curve prediction model in media processing system; deep reinforcement learning-based camera autofocus with Gaussian process regression; and frame similarity-based screen content video quality enhancement via adaptive long short-term fusion.
ISBN (print): 9798350359855
The proceedings contain 153 papers. The topics discussed include: towards efficient learned image coding for machines via saliency-driven rate allocation; transformer-based spatial-temporal feature lifting for 3D hand mesh reconstruction; accuracy improvement of depth map estimation from multi-view images using NeRF; learning end-to-end depth maps compression with conditional quality-controllable autoencoder; tangent space sampling of video sequence with locally structured unitary network; efficient lightweight attention based learned image compression; a method for multi-linear TV channels streaming based on non-uniform tiled structure; and subspace learning machine with soft partitioning (SLM/SP): methodology and performance benchmarking.
ISBN (print): 9781665475921
The proceedings contain 113 papers. The topics discussed include: visual analysis motivated super-resolution model for image reconstruction; hierarchical reinforcement learning based video semantic coding for segmentation; distinguishing computer-generated images from photographic images: a texture-aware deep learning-based method; high-speed scene reconstruction from low-light spike streams; one shot object detection via hierarchical adaptive alignment; reduced reference quality assessment for point cloud compression; a fast and effective framework for camera calibration in sport videos; dynamic mesh commonality modeling using the cuboidal partitioning; CNN-based post-processing filter for video compression with multi-scale feature representation; history-parameter-based affine model inheritance; robust dynamic background modeling for foreground estimation; space and level cooperation framework for pathological cancer grading; and semantic attribute guided image aesthetics assessment.
ISBN (print): 9781728180670
The proceedings contain 138 papers. The topics discussed include: the enhancement of underexposed images with blurred reflectance; geodesic disparity compensation for inter-view prediction in VR180; two recent advances on normalization methods for deep neural network optimization; sparse representation-based intra prediction for lossless/near lossless video coding; recent advances in end-to-end learned image and video compression; FishUI: interactive fisheye distortion visualization; orthogonal features fusion network for anomaly detection; 4D-DCT hardware architecture for JPEG Pleno light field coding; and deep blind video quality assessment for user generated videos.
ISBN (print): 9798331529543; 9798331529550
This paper focuses on the Referring Image Segmentation (RIS) task, which aims to segment objects from an image based on a given language description and has significant potential in practical applications such as food safety detection. Recent work using the attention mechanism for cross-modal interaction has made substantial progress. However, current methods tend to lack explicit interaction design principles to serve as guidelines, leading to inadequate cross-modal comprehension. Additionally, most previous works use a single-modal mask decoder for prediction, losing the advantage of full cross-modal alignment. To address these challenges, we present a Fully Aligned Network (FAN) that follows four cross-modal interaction principles. Guided by these principles, our FAN achieves state-of-the-art performance on the prevalent RIS benchmarks (RefCOCO, RefCOCO+, G-Ref) with a simple architecture.
ISBN (print): 9798331529543; 9798331529550
Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality by using image captions as sub-information. This paper demonstrates that using a large multi-modal model (LMM), it is possible to generate captions and compress them within a single model. We also propose a novel semantic-perceptual-oriented fine-tuning method applicable to any LIC network, resulting in a 41.58% improvement in LPIPS BD-rate compared to existing methods. Our implementation and pre-trained weights are available at https://***/tokkiwa/imageTextCoding.
ISBN (print): 9798331529543; 9798331529550
Most approaches in learned image compression follow the transform coding scheme. The characteristics of latent variables transformed from images significantly influence the performance of codecs. In this paper, we present visual analyses of the latent features of learned image compression and find that the latent variables are spread over a wide range, which may lead to complex entropy coding processes. To address this, we introduce a Deviation Control (DC) method, which applies a constraint loss to the latent features and the entropy parameter μ. Training with the DC loss, we obtain latent features with smaller values of coding symbols and σ, effectively reducing entropy coding complexity. Our experimental results show that the plug-and-play DC loss reduces entropy coding time by 30-40% and improves compression performance.
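The abstract does not give the form of the constraint loss, so the following is only an illustrative sketch of one way a deviation-style penalty could be written. It assumes a hinge-squared penalty on values whose magnitude exceeds a threshold, applied to both the latent features and the predicted means. The names `deviation_penalty` and `dc_loss`, and the `threshold` and `weight` parameters, are hypothetical and not taken from the paper.

```python
from typing import Sequence


def deviation_penalty(values: Sequence[float], threshold: float) -> float:
    """Mean squared excess of |v| over the threshold (assumed penalty form)."""
    if not values:
        return 0.0
    return sum(max(abs(v) - threshold, 0.0) ** 2 for v in values) / len(values)


def dc_loss(
    latents: Sequence[float],
    mus: Sequence[float],
    threshold: float = 4.0,
    weight: float = 0.01,
) -> float:
    """Hypothetical deviation-control term added to a rate-distortion loss.

    Values inside [-threshold, threshold] cost nothing; values outside are
    penalized quadratically, which discourages widely spread latents and means.
    """
    return weight * (
        deviation_penalty(latents, threshold) + deviation_penalty(mus, threshold)
    )


# Latents and means inside the allowed range contribute no penalty.
inside = dc_loss([0.0, 1.0, -3.5], [0.5, -2.0])
# A latent of 6.0 exceeds the threshold by 2.0 and is penalized.
outside = dc_loss([6.0, 0.0], [-5.0], threshold=4.0, weight=1.0)
```

In a real codec this term would be computed on tensors inside the training loop and added to the usual rate and distortion terms; the scalar version here only shows the shape of the penalty.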
ISBN (print): 9798331529543; 9798331529550
The increasing demand for high-quality, real-time visual communication and growing user expectations, coupled with limited network resources, necessitate novel approaches to semantic image communication. This paper presents a method to enhance semantic image communication that combines a novel lossy semantic encoding approach with spatially adaptive semantic image synthesis models. By developing a model-agnostic training augmentation strategy, our approach substantially reduces susceptibility to distortion introduced during encoding, effectively eliminating the need for lossless semantic encoding. Comprehensive evaluation across two spatially adaptive conditioning methods and three popular datasets indicates that this approach enhances semantic image communication in very low bit-rate regimes.
ISBN (print): 9798331529543; 9798331529550
Quanta image sensors are a novel paradigm in image sensor technology. Their direct application to quanta image sensors-based imaging systems is challenging because a bit-plane image is a set of binary images. In this paper, we introduce spatiotemporal priors based on the intensity invariance and smoothness characteristics of the motion vector. Specifically, we model the observation that, when the image sequences are aligned with the correct motion vector, the spatiotemporal structure becomes more consistent. Moreover, the spatial smoothness prior is incorporated by applying a smoothing filter to the evaluation metrics of the motion vector candidates. The experimental results show that the proposed method is more effective than conventional methods.
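The idea of scoring motion vector candidates by temporal consistency and then smoothing the scores spatially can be illustrated with a toy example. This is a sketch under strong assumptions, not the authors' method: it uses noise-free one-dimensional binary frames, integer per-frame shifts as candidates, a transition count along each trajectory as the evaluation metric, and a box filter as the smoothing filter. All function names are hypothetical.

```python
from typing import List, Sequence


def trajectory_cost(frames: Sequence[Sequence[int]], x: int, v: int) -> int:
    """Count bit changes along the trajectory x + v*t (assumed consistency metric)."""
    width = len(frames[0])
    # Clamp positions that leave the frame to the nearest border pixel.
    bits = [frames[t][min(max(x + v * t, 0), width - 1)] for t in range(len(frames))]
    return sum(abs(a - b) for a, b in zip(bits, bits[1:]))


def box_smooth(costs: Sequence[float], radius: int) -> List[float]:
    """Average each cost with its spatial neighbours (smoothness prior)."""
    out = []
    for x in range(len(costs)):
        window = costs[max(0, x - radius): x + radius + 1]
        out.append(sum(window) / len(window))
    return out


def estimate_motion(
    frames: Sequence[Sequence[int]], candidates: Sequence[int], radius: int = 1
) -> List[int]:
    """Pick, per pixel, the candidate with the lowest smoothed cost."""
    width = len(frames[0])
    smoothed = {
        v: box_smooth([trajectory_cost(frames, x, v) for x in range(width)], radius)
        for v in candidates
    }
    return [min(candidates, key=lambda v: smoothed[v][x]) for x in range(width)]


# Toy sequence: a binary edge that moves one pixel to the right per frame.
frames = [[1 if x >= 3 + t else 0 for x in range(10)] for t in range(4)]
motion = estimate_motion(frames, candidates=[-1, 0, 1])
```

In this toy sequence the pixels around the moving edge (indices 2 to 6) select the candidate `1`. In flat regions every candidate has zero cost, so the choice there is arbitrary, which is one reason a spatial smoothness prior helps.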