Video stitching remains a challenging problem in computer vision. In this paper, we propose a novel edge-guided method to stitch multiple videos that have small overlapped regions. Our algorithm consists of three step...
ISBN: 9781665468893 (print)
Supraglacial lakes play an important role in ice sheet dynamics, mass balance, and sea level rise. It is therefore important to extract supraglacial lakes and obtain their spatial-temporal distribution and change. This study provides an automatic extraction model for supraglacial lakes, using Synthetic Aperture Radar (SAR) imagery and deep learning. First, 19580 Sentinel-1 SAR image patches from eight typical areas are selected for manual labeling. Second, a GPU-based U-Net model is trained to recognize supraglacial lakes, and the results are evaluated at different sites. Finally, the trained model is used to perform the supraglacial lake extraction. In addition, ArcticDEM is introduced to remove shadow confusion at the margin of the ice sheet. The global-local threshold segmentation method is used to extract supraglacial lakes from Sentinel-2 MSI imagery, which serves as a comparative analysis and information supplement for the SAR-based results. The results show that: (1) The U-Net network selected in this paper is suitable for processing small-sample SAR data and multi-modal feature extraction. GPU parallel processing achieves rapid extraction from massive data and reduces time cost. (2) The Dice coefficient of the trained model reaches 0.98, so the model can be used for effective extraction of supraglacial lakes. (3) Compared with the results of optical image extraction, the proposed algorithm can identify lakes in areas covered by snow or thin ice, which truly reflects the temporal and spatial distribution characteristics of supraglacial lakes.
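The Dice coefficient reported above measures the overlap between a predicted lake mask and a manually labeled one, Dice = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `dice_coefficient`, the nested-list mask format, and the toy masks are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice overlap of two binary masks given as nested lists of 0/1 (assumed format)."""
    flat_pred = [p for row in pred for p in row]
    flat_truth = [t for row in truth for t in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as lake in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    # Total lake pixels in each mask, added together.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_truth if t)
    # Two empty masks agree perfectly; define Dice as 1.0 in that case.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks (hypothetical data): 3 lake pixels each, 2 of them shared.
predicted = [[0, 1, 1],
             [0, 1, 0]]
labeled = [[0, 1, 0],
           [0, 1, 1]]
score = dice_coefficient(predicted, labeled)
```

A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no lake pixels.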
ISBN: 9781728180687 (digital)
ISBN: 9781728180694 (print)
In video-based point cloud compression (V-PCC), an occupancy map video is used to indicate whether a 2-D pixel corresponds to a valid 3-D point. In the current design of V-PCC, the occupancy map video is compressed losslessly with High Efficiency Video Coding (HEVC). However, the coding tools in HEVC are designed specifically for natural images and are thus unsuitable for the occupancy map. In this paper, we present a novel quadtree-based scheme for lossless occupancy map coding. In this scheme, the occupancy map is first divided into several coding tree units (CTUs). Each CTU is then divided into coding units (CUs) recursively using a quadtree. The quadtree partition terminates when one of three conditions is satisfied. First, all the pixels in the CU have the same value. Second, the pixels in the CU have only two distinct values, and they can be separated by a continuous edge whose endpoints lie on the sides of the CU; the continuous edge is then coded using a chain code. Third, the CU reaches the minimum size. This scheme simplifies the block partitioning design of HEVC and introduces simpler yet more effective coding tools. Experimental results show a significant reduction in bit-rate and complexity compared with the occupancy map coding scheme in V-PCC. In addition, the scheme is also very efficient at compressing semantic maps.
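The recursive partition described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function name `quadtree_partition`, the nested-list map format, the square power-of-two block size, and the `min_size` default are all hypothetical. The second stopping condition (two values separable by a continuous edge) is left as an optional caller-supplied predicate, `edge_separable`, because the abstract does not say how that test or the chain coding is performed.

```python
def quadtree_partition(block, x=0, y=0, size=None, min_size=4, edge_separable=None):
    """Split a square occupancy block into leaf CUs.

    Returns a list of (x, y, size, reason) tuples, where reason names the
    stopping condition that ended the recursion for that CU.
    """
    if size is None:
        size = len(block)  # assumes a square block with a power-of-two side
    values = {block[y + j][x + i] for j in range(size) for i in range(size)}
    # Condition 1: every pixel in the CU has the same value.
    if len(values) == 1:
        return [(x, y, size, "uniform")]
    # Condition 2 (hypothetical hook): two values split by one continuous edge.
    if edge_separable is not None and len(values) == 2 and edge_separable(block, x, y, size):
        return [(x, y, size, "edge")]
    # Condition 3: the CU has reached the minimum size.
    if size <= min_size:
        return [(x, y, size, "min_size")]
    # Otherwise split into four quadrants and recurse.
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_partition(block, x + dx, y + dy, half, min_size, edge_separable)
    return leaves


# Toy 8x8 occupancy map (hypothetical data): an occupied 4x4 corner plus one stray pixel.
occupancy = [[1 if (r < 4 and c < 4) else 0 for c in range(8)] for r in range(8)]
occupancy[7][7] = 1
leaves = quadtree_partition(occupancy, min_size=2)
```

In this toy map the three uniform 4x4 quadrants stop immediately, while the quadrant containing the stray pixel is split further until the minimum size is reached.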
Semantic segmentation is a fundamental task in indoor scene understanding. Most previous supervised approaches rely on densely annotated image data sets. Due to the limited amount of images with segmentation labels, t...
ISBN: 9781728163956 (digital)
ISBN: 9781728163963 (print)
In this paper, we consider a novel image coding paradigm, termed semantically scalable coding. In the new paradigm, the coded bitstream serves multiple semantic analysis tasks, and different tasks require different semantic granularities of the image. The bitstream is therefore designed to be scalable in the sense that progressive decoding of the bitstream provides coarse-to-fine semantic granularities. As a concrete example, we consider the task of coarse-grained and fine-grained image classification. We present a method to compress the multiple deep feature maps that are intermediate representations of an image passing through a trained deep network. The deep-layer feature maps can serve coarse-grained image classification, while the shallow-layer feature maps can serve fine-grained image classification. Experimental results demonstrate the feasibility of the proposed method, as well as the advantage of the semantically scalable coding paradigm.
Authors: Liu, Sen; Zhao, Shuxin; Pang, Yingxue; Chen, Zhibo
CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China
There are plenty of human-machine joint decision-making scenarios in real-world applications, such as driving assistance, suspect identification, medical diagnosis, etc. Existing algorithms propose that machine shou...
ISBN: 9781728180687 (digital)
ISBN: 9781728180694 (print)
We propose to improve neural network-based compression artifact reduction by transmitting side information for the neural network. The side information consists of artifact descriptors that are obtained by analyzing the original and compressed images in the encoder. In the decoder, the received descriptors are used as additional input to a well-designed conditional post-processing neural network. To reduce the transmission overhead, the entire model is optimized under the rate-distortion constraint via end-to-end learning. Experimental results show that introducing the side information greatly improves the ability of the post-processing neural network, and improves the rate-distortion performance.
ISBN: 9781728171685 (digital)
ISBN: 9781728171692 (print)
We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited to using only the previous frame as reference. Our method instead uses multiple previous frames as references. In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one. With multiple reference frames and the associated multiple MV fields, our designed network can generate a more accurate prediction of the current frame, yielding a smaller residual. Multiple reference frames also help generate an MV prediction, which reduces the coding cost of the MV field. We use two deep auto-encoders to compress the residual and the MV, respectively. To compensate for the compression error of the auto-encoders, we further design an MV refinement network and a residual refinement network, which also make use of the multiple reference frames. All the modules in our scheme are jointly optimized through a single rate-distortion loss function. We use a step-by-step training strategy to optimize the entire scheme. Experimental results show that the proposed method outperforms existing learned video compression methods in low-latency mode. Our method also performs better than H.265 in both PSNR and MS-SSIM. Our code and models are publicly available.
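To make the single rate-distortion objective and the PSNR metric mentioned above concrete, here is a minimal sketch. The abstract does not give the exact loss, so the form used here (distortion weighted by a multiplier `lam`, plus the bit costs of the MV and the residual) is an assumption based on the common Lagrangian formulation; the function names, the flat-list frame format, and all numeric values are hypothetical.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two frames given as flat lists of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("frames must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(original, reconstructed)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def rd_loss(distortion, rate_mv, rate_residual, lam):
    """Assumed joint objective: lam * D + R_mv + R_residual (rates in bits per pixel)."""
    return lam * distortion + rate_mv + rate_residual


# Hypothetical 4-pixel frame and its reconstruction.
frame = [100, 120, 140, 160]
decoded = [101, 119, 142, 158]
distortion = mse(frame, decoded)
quality_db = psnr(frame, decoded)
loss = rd_loss(distortion, rate_mv=0.02, rate_residual=0.10, lam=0.01)
```

Because the MV rate, the residual rate, and the distortion all enter one scalar, gradients from that scalar reach every module, which is what joint optimization requires; a larger `lam` favors quality over bit-rate.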
This article takes Liuxi River Basin in Guangzhou as the research object, selects Landsat remote sensing data of 1993, 2000, 2011 and 2018 as data source, to access the five indices, including vegetation coverage, DEM...