ISBN: (print) 9798350359855
The proceedings contain 153 papers. The topics discussed include: towards efficient learned image coding for machines via saliency-driven rate allocation; transformer-based spatial-temporal feature lifting for 3D hand mesh reconstruction; accuracy improvement of depth map estimation from multi-view images using NeRF; learning end-to-end depth maps compression with conditional quality-controllable autoencoder; tangent space sampling of video sequence with locally structured unitary network; efficient lightweight attention based learned image compression; a method for multi-linear TV channels streaming based on non-uniform tiled structure; and subspace learning machine with soft partitioning (SLM/SP): methodology and performance benchmarking.
ISBN: (print) 9798331529543
The proceedings contain 139 papers. The topics discussed include: cross-device image saliency detection: database and comparative analysis; performance evaluation of feature detectors and descriptors with close-range solar panel images; inter Submesh border information coding with skip mode in V-DMC; advancements in Lenslet video coding: insights from MPEG LVC; advanced learning-based inter prediction for future video coding; packed regions information SEI message; content-adaptive rate-quality curve prediction model in media processing system; deep reinforcement learning-based camera autofocus with Gaussian process regression; and frame similarity-based screen content video quality enhancement via adaptive long short-term fusion.
ISBN: (print) 9781665475921
The proceedings contain 113 papers. The topics discussed include: visual analysis motivated super-resolution model for image reconstruction; hierarchical reinforcement learning based video semantic coding for segmentation; distinguishing computer-generated images from photographic images: a texture-aware deep learning-based method; high-speed scene reconstruction from low-light spike streams; one shot object detection via hierarchical adaptive alignment; reduced reference quality assessment for point cloud compression; a fast and effective framework for camera calibration in sport videos; dynamic mesh commonality modeling using the cuboidal partitioning; CNN-based post-processing filter for video compression with multi-scale feature representation; history-parameter-based affine model inheritance; robust dynamic background modeling for foreground estimation; space and level cooperation framework for pathological cancer grading; and semantic attribute guided image aesthetics assessment.
ISBN: (print) 9781728180670
The proceedings contain 138 papers. The topics discussed include: the enhancement of underexposed images with blurred reflectance; geodesic disparity compensation for inter-view prediction in VR180; two recent advances on normalization methods for deep neural network optimization; sparse representation-based intra prediction for lossless/near lossless video coding; recent advances in end-to-end learned image and video compression; FishUI: interactive fisheye distortion visualization; orthogonal features fusion network for anomaly detection; 4D-DCT hardware architecture for JPEG Pleno light field coding; and deep blind video quality assessment for user generated videos.
ISBN: (print) 9781728137230
The proceedings contain 134 papers. The topics discussed include: improving person re-identification performance using body mask via cross-learning strategy; privacy-preserving fall detection with deep learning on mmWave radar signal; stereoscopic image quality assessment weighted guidance by disparity map using convolutional neural network; depthwise separable convolutional neural network for image forensics; low resolution recognition of aerial images; fast QTMT partition decision algorithm in VVC intra coding based on variance and gradient; and adaptive CU Split decision with pooling-variable CNN for VVC intra encoding.
ISBN: (print) 9798331529543; 9798331529550
This paper focuses on the Referring Image Segmentation (RIS) task, which aims to segment objects from an image based on a given language description and has significant potential in practical applications such as food safety detection. Recent advances using the attention mechanism for cross-modal interaction have achieved excellent progress. However, current methods tend to lack explicit principles of interaction design as guidelines, leading to inadequate cross-modal comprehension. Additionally, most previous works use a single-modal mask decoder for prediction, losing the advantage of full cross-modal alignment. To address these challenges, we present a Fully Aligned Network (FAN) that follows four cross-modal interaction principles. Guided by these principles, our FAN achieves state-of-the-art performance on the prevalent RIS benchmarks (RefCOCO, RefCOCO+, G-Ref) with a simple architecture.
ISBN: (print) 9798331529543; 9798331529550
In this paper we introduce a novel approach to better utilize the intra block copy (IBC) prediction tool in encoding lenslet light field video (LFV) captured using plenoptic 2.0 cameras. Although the IBC tool has been recognized as promising for encoding LFV content, it was originally designed for encoding conventional video, which limits its effectiveness and suggests that slight modifications could better suit the properties of LFV content. Observing the inherently large amount of repetitive image patterns due to the microlens array (MLA) structure of plenoptic cameras, we suggest several techniques to enhance the IBC coding tool itself so that it encodes LFV content more efficiently. Our experimental results demonstrate that the proposed method significantly enhances IBC coding performance when encoding LFV content while also reducing encoding time.
ISBN: (print) 9798331529543; 9798331529550
Plenoptic cameras are light field capturing devices able to acquire large amounts of angular and spatial information. The lenslet video produced by such cameras presents on each frame a distinctive hexagonal pattern of micro-images. Due to the particular structure of lenslet images, traditional video codecs perform poorly on lenslet video. Previous works have proposed a preprocessing scheme that cuts and realigns the micro-images on each lenslet frame. While effective, this method introduces high-frequency components into the processed image. In this paper, we propose an additional step to the aforementioned scheme by applying an invertible smoothing transform. We evaluate the enhanced scheme on lenslet video sequences captured with single-focused and multi-focused plenoptic cameras. On average, the enhanced scheme achieves a 9.85% bitrate reduction compared to the existing scheme.
ISBN: (print) 9798331529543; 9798331529550
To achieve efficient compression for both human vision and machine perception, scalable coding methods have been proposed in recent years. However, existing methods do not fully eliminate the redundancy between features corresponding to different tasks, resulting in suboptimal coding performance. In this paper, we propose a frequency-aware hierarchical image compression framework designed for humans and machines. Specifically, we investigate task relationships from a frequency perspective, utilizing only high-frequency (HF) information for machine vision tasks and leveraging both HF and low-frequency (LF) features for image reconstruction. In addition, a residual block embedded octave convolution module is designed to enhance the information interaction between HF features and LF features. A dual-frequency channel-wise entropy model is also applied to exploit the correlation between different tasks, thereby improving multi-task performance. The experiments show that the proposed method offers -69.3% to -75.3% coding gains on machine vision tasks compared to the relevant benchmarks, and -19.1% gains over a state-of-the-art scalable image codec in terms of image reconstruction quality.
ISBN: (print) 9781538644584
The proceedings contain 125 papers. The topics discussed include: two-stream federated learning: reduce the communication costs; a new update strategy for blocks with low correlation in 3-D recursive search; eye movement pattern modeling and visual comfort viewing S3D images; motion trajectory based spatial-temporal degradation measurement for video quality assessment; two-pass rate control for constant quality in high efficiency video coding; adaptive motion vector prediction for omnidirectional video; generative adversarial network-based frame extrapolation for video coding; a CNN-based in-loop filter with CU classification for HEVC; synthesizing 3D acoustic-articulatory mapping trajectories: predicting articulatory movements by long-term recurrent convolutional neural network; analysis of smoothed LHE methods for processing images with optical illusions; and deep network with spatial and channel attention for person re-identification.