ISBN (digital): 9781728180687
ISBN (print): 9781728180694
We propose to improve neural network-based compression artifact reduction by transmitting side information for the neural network. The side information consists of artifact descriptors obtained by analyzing the original and compressed images in the encoder. In the decoder, the received descriptors are used as additional input to a conditional post-processing neural network. To reduce the transmission overhead, the entire model is optimized under a rate-distortion constraint via end-to-end learning. Experimental results show that introducing the side information greatly improves the artifact-reduction capability of the post-processing neural network and improves the rate-distortion performance.
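As a rough illustration of what optimizing under a rate-distortion constraint means, the sketch below evaluates a Lagrangian cost of the form J = D + λR for a restored image and the bits spent on side information. The function name `rate_distortion_loss`, the use of mean squared error as distortion, bits per pixel as rate, and the value of `lam` are all assumptions made for this example; the abstract does not specify the actual loss.

```python
import numpy as np

def rate_distortion_loss(original, restored, side_info_bits, lam=0.01):
    """Hypothetical Lagrangian cost J = D + lam * R (not the paper's exact loss)."""
    # Distortion: mean squared error between original and restored images.
    distortion = float(np.mean((original - restored) ** 2))
    # Rate: side-information bits normalized to bits per pixel.
    rate = side_info_bits / original.size
    return distortion + lam * rate, distortion, rate

original = np.zeros((8, 8))
restored = np.full((8, 8), 0.5)  # uniform restoration error of 0.5
loss, distortion, rate = rate_distortion_loss(original, restored, side_info_bits=64)
print(loss, distortion, rate)
```

A larger `lam` penalizes side-information bits more heavily, which in an end-to-end setting would push the encoder toward more compact descriptors.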
ISBN (digital): 9781728180687
ISBN (print): 9781728180694
In video-based point cloud compression (V-PCC), an occupancy map video indicates whether a 2-D pixel corresponds to a valid 3-D point. In the current design of V-PCC, the occupancy map video is compressed losslessly with High Efficiency Video Coding (HEVC). However, the coding tools in HEVC are designed for natural images and are therefore poorly suited to occupancy maps. In this paper, we present a novel quadtree-based scheme for lossless occupancy map coding. In this scheme, the occupancy map is first divided into coding tree units (CTUs). Each CTU is then divided recursively into coding units (CUs) using a quadtree. The quadtree partition terminates when one of three conditions is satisfied. First, all the pixels in the CU have the same value. Second, the pixels in the CU take only two values and can be separated by a continuous edge whose endpoints lie on the boundary of the CU; this edge is then coded using a chain code. Third, the CU reaches the minimum size. The scheme simplifies the block partitioning of HEVC and replaces its coding tools with simpler yet more effective ones. Experimental results show a significant reduction in bit rate and complexity compared with the occupancy map coding scheme in V-PCC. In addition, the scheme is also very efficient for compressing semantic maps.
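The recursive partition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it covers only the first and third termination conditions (uniform CU, minimum size) and deliberately omits the second (edge-separable CUs coded with a chain code). The function name `quadtree_partition`, the 16x16 CTU, and `min_size=4` are assumptions chosen for the example.

```python
import numpy as np

def quadtree_partition(block, x0=0, y0=0, min_size=4):
    """Return leaf CUs as (x, y, size, kind) for a square binary block."""
    size = block.shape[0]
    # Condition 1: every pixel in the CU has the same value.
    if block.min() == block.max():
        return [(x0, y0, size, "uniform")]
    # Condition 3: the CU has reached the minimum size.
    if size <= min_size:
        return [(x0, y0, size, "min_size")]
    # Condition 2 (edge-separable CU with chain-coded edge) is omitted here.
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            sub = block[dy:dy + half, dx:dx + half]
            leaves += quadtree_partition(sub, x0 + dx, y0 + dy, min_size)
    return leaves

# Hypothetical 16x16 occupancy-map CTU.
ctu = np.zeros((16, 16), dtype=np.uint8)
ctu[:8, :8] = 1    # top-left quadrant occupied
ctu[12:, 12:] = 1  # bottom-right 4x4 corner occupied
leaves = quadtree_partition(ctu)
for leaf in leaves:
    print(leaf)
```

In a full coder, each leaf would then be signalled with its split flags and, for uniform CUs, a single occupancy value; that entropy-coding stage is outside the scope of this sketch.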
Semantic segmentation is a fundamental task in indoor scene understanding. Most previous supervised approaches rely on densely annotated image data sets. Due to the limited number of images with segmentation labels, t...
Authors: Liu, Sen; Zhao, Shuxin; Pang, Yingxue; Chen, Zhibo
CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China
There are plenty of human-machine joint decision-making scenarios in real-world applications, such as driving assistance, suspect identification, and medical diagnosis. Existing algorithms propose that machine shou...
ISBN (digital): 9781728171685
ISBN (print): 9781728171692
We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited to using only the single previous frame as reference. Our method instead uses multiple previous frames as references. In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one. With multiple reference frames and the associated MV fields, our network can generate a more accurate prediction of the current frame, yielding a smaller residual. Multiple reference frames also help generate an MV prediction, which reduces the coding cost of the MV field. We use two deep auto-encoders to compress the residual and the MV, respectively. To compensate for the compression error of the auto-encoders, we further design an MV refinement network and a residual refinement network, which also make use of the multiple reference frames. All the modules in our scheme are jointly optimized through a single rate-distortion loss function. We use a step-by-step training strategy to optimize the entire scheme. Experimental results show that the proposed method outperforms existing learned video compression methods in low-latency mode. Our method also performs better than H.265 in both PSNR and MS-SSIM. Our code and models are publicly available.
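To make the relationship between the MV field, the prediction, and the residual concrete, the sketch below performs classical motion compensation on a toy frame. It is only an illustration of the underlying idea: the scheme in the abstract uses learned networks and multiple reference frames, whereas this example uses a single reference frame and an integer per-pixel MV field. The function name `warp_with_mv` and the toy data are assumptions.

```python
import numpy as np

def warp_with_mv(reference, mv):
    """Backward-warp `reference` with an integer per-pixel MV field (dy, dx)."""
    h, w = reference.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Each output pixel fetches from its displaced position, clamped to the frame.
    src_y = np.clip(ys + mv[..., 0], 0, h - 1)
    src_x = np.clip(xs + mv[..., 1], 0, w - 1)
    return reference[src_y, src_x]

previous = np.arange(16, dtype=np.float64).reshape(4, 4)
current = np.roll(previous, shift=1, axis=1)  # content moves one pixel right

mv = np.zeros((4, 4, 2), dtype=np.int64)
mv[..., 1] = -1  # every pixel fetches from one column to the left

prediction = warp_with_mv(previous, mv)
residual = current - prediction  # what would remain to be coded
print(residual)
```

The residual is zero wherever the motion model explains the frame and non-zero only in the first column, where no valid reference pixel exists. A more accurate prediction leaves less residual to code, which is the motivation the abstract gives for using several reference frames.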
Video stitching remains a challenging problem in computer vision. In this paper, we propose a novel edge-guided method to stitch multiple videos that have small overlapped regions. Our algorithm consists of three step...
Automatic medical image segmentation has wide applications in disease diagnosis. However, it is much more challenging than natural optical image segmentation because of the high resolution of medical images and the correspondingly huge computation cost. The sliding window is a commonly used technique for whole slide image (WSI) segmentation, but the main drawback of sliding-window methods is the lack of global contextual information for supervision. In this paper, we propose a dual-input attention network (denoted as DA-RefineNet) for WSI segmentation, in which both local fine-grained information and global coarse information can be efficiently utilized. Comparative experiments are conducted to evaluate the effectiveness of the proposed method, and the results show that it achieves better performance on WSI segmentation than methods relying on a single input.
ISBN (digital): 9781728163956
ISBN (print): 9781728163963
In this paper, we consider a novel image coding paradigm, termed semantically scalable coding. In the new paradigm, the coded bitstream serves multiple different semantic analysis tasks, and different tasks require different semantic granularities of the image. Thus, the bitstream is designed to be scalable in the sense that progressive decoding of the bitstream provides coarse-to-fine semantic granularities. As a concrete example, we consider the tasks of coarse-grained and fine-grained image classification. We present a method to compress the multiple deep feature maps that are intermediate representations of an image passing through a trained deep network. The deep-layer feature maps can serve coarse-grained image classification, while the shallow-layer feature maps can serve fine-grained image classification. Experimental results demonstrate the feasibility of the proposed method, as well as the advantage of the semantically scalable coding paradigm.
As vehicular communication and networking technologies continue to advance, infrastructure-based roadside perception emerges as a pivotal tool for connected automated vehicle (CAV) applications. Due to their elevated ...
With the support of the Beidou short message service terminal, the space science satellite can transmit data around the clock, including astronomical alert data such as gravitational waves and gamma-ray bursts, and satellit...