ISBN (digital): 9781728123455
ISBN (print): 9781728123462
Semantic segmentation is a fundamental task in indoor scene understanding. Most previous supervised approaches rely on densely annotated image datasets. Because few images come with segmentation labels, the performance of existing networks is greatly limited. In this paper, we exploit temporal correlation in video frames to improve the performance and robustness of segmentation networks. Two effective learning strategies are proposed to propagate the information from a few labeled frames to their immediate neighbor frames. First, we scale up the training dataset for supervised semantic segmentation networks by generating pseudo ground truth for neighboring frames from a labeled frame using a filtered homography transformation. Furthermore, we introduce a self-supervised loss function to ensure temporal consistency between the segmentation results of adjacent frames. The experimental results demonstrate that our proposed method outperforms state-of-the-art techniques for semantic segmentation on the NYU-Depth V2 dataset.
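As a rough illustration of the label-propagation idea in this abstract, the sketch below warps a label map into a neighboring frame with a given 3x3 homography, using nearest-neighbor sampling so that class ids stay discrete. The function name `warp_labels`, the `ignore` value, and the inverse-mapping convention are assumptions made for illustration. The abstract does not describe how the homography is estimated or filtered, so neither step is reproduced here.

```python
def warp_labels(labels, H_inv, ignore=255):
    """Warp an integer label map into a neighboring frame.

    labels: list of rows of class ids for the labeled frame.
    H_inv:  3x3 homography (nested lists) that maps a pixel (x, y) of the
            neighboring frame back into the labeled frame (assumed given).
    ignore: value written where no valid source pixel exists
            (hypothetical choice, not taken from the paper).
    """
    h, w = len(labels), len(labels[0])
    out = [[ignore] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Apply the homography in homogeneous coordinates.
            sx = H_inv[0][0] * x + H_inv[0][1] * y + H_inv[0][2]
            sy = H_inv[1][0] * x + H_inv[1][1] * y + H_inv[1][2]
            sw = H_inv[2][0] * x + H_inv[2][1] * y + H_inv[2][2]
            if sw == 0:
                continue  # point maps to infinity; leave as ignore
            # Nearest-neighbor sampling: labels must not be interpolated.
            u, v = round(sx / sw), round(sy / sw)
            if 0 <= u < w and 0 <= v < h:
                out[y][x] = labels[v][u]
    return out


# Usage: a pure one-pixel horizontal shift between the two frames.
labels = [[1, 2, 3],
          [4, 5, 6]]
shift_x = [[1, 0, 1],
           [0, 1, 0],
           [0, 0, 1]]
pseudo_gt = warp_labels(labels, shift_x)
```

Pixels that fall outside the labeled frame receive the `ignore` value, so a training loss can skip them rather than learn from undefined labels.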
The following topics are dealt with: video coding; data compression; image coding; convolutional neural nets; decoding; learning (artificial intelligence); motion compensation; video codecs; image reconstruction; filtering theory.
Photo-realistic point cloud capture and transmission are the fundamental enablers for immersive visual communication. The coding process of dynamic point clouds, especially video-based point cloud compression (V-PCC...
Synthetic aperture radar (SAR) has a good ability to detect the microwave scattering characteristics of the target and has a good capability of slant range Doppler positioning. Using multi-view SAR images in combinati...
Automatic color enhancement aims to adaptively adjust photos to expected styles and tones. For current learned methods in this field, global harmonious perception and local details are hard to be well-considered i...
Light field image quality assessment (LF-IQA) plays a significant role in guiding the acquisition, processing and application of light field (LF) content. The LF can be represented as a 4-D signal, and its quality...
High-resolution SAR has a large transmitting bandwidth and a wide synthetic aperture. How to understand and exploit the variation of SAR scattering characteristics with angle and frequency is a topic worth studying. This article establishes a coherence matrix of sub-band and sub-aperture SAR images and analyzes its ability to classify scattering mechanisms. Experiments are conducted using TerraSAR-X high-resolution data of different scenarios, and some meaningful results are obtained, which may support the analysis and application of high-resolution SAR data.
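The abstract does not say how its coherence matrix is built. One common construction, shown here only as a hedged sketch, stacks the N sub-look (sub-band or sub-aperture) complex values of each pixel into a vector k and averages the outer product k k^H over a local window. The function name `coherence_matrix` and the explicit `window` argument are assumptions for illustration, not the authors' formulation.

```python
def coherence_matrix(sublooks, window):
    """Sample coherence matrix T = <k k^H> averaged over a set of pixels.

    sublooks: list of N sub-look images, each a list of rows of complex
              values (e.g. sub-band or sub-aperture SAR images).
    window:   list of (row, col) pixel positions to average over.
    Returns an N x N Hermitian matrix as nested lists of complex numbers.
    """
    n = len(sublooks)
    T = [[0j] * n for _ in range(n)]
    for r, c in window:
        # Scattering vector of this pixel across all sub-looks.
        k = [img[r][c] for img in sublooks]
        for i in range(n):
            for j in range(n):
                T[i][j] += k[i] * k[j].conjugate()
    m = len(window)
    return [[t / m for t in row] for row in T]


# Usage: two 1x2 sub-look images averaged over both pixels.
look_a = [[1 + 0j, 0 + 1j]]
look_b = [[1 + 0j, 1 + 0j]]
T = coherence_matrix([look_a, look_b], [(0, 0), (0, 1)])
```

The diagonal entries hold the mean power of each sub-look, while the off-diagonal entries describe how strongly two sub-looks are correlated, which is the kind of quantity a scattering-mechanism classifier could use.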
This paper proposes an extended version of our previous work, MS-CC, for change detection between optical and SAR images. The proposed method introduces a cooperative multitemporal segmentation whose merging process treats the heterogeneity of SAR and optical images as parallel information, so that the multitemporal information from each source is fully used without the sources interfering with each other. A change detection strategy based on compound classification is then applied to the segmentation results to obtain multi-scale change detection maps. Experimental validation is conducted with GaoFen-3 and Google Earth data.
The past decade has witnessed the great success of deep learning technology in many disciplines, especially in computer vision and image processing. However, deep learning-based video coding remains in its infancy. This paper reviews representative works on using deep learning for image/video coding, which has been an actively developing research area since 2015. We divide the related works into two categories: new coding schemes that are built primarily upon deep networks (deep schemes), and deep network-based coding tools (deep tools) that are used within traditional coding schemes or together with traditional coding tools. For deep schemes, pixel probability modeling and auto-encoders are the two approaches, which can be viewed as predictive coding and transform coding schemes, respectively. For deep tools, several techniques have been proposed that use deep learning to perform intra-picture prediction, inter-picture prediction, cross-channel prediction, probability distribution prediction, transform, post- or in-loop filtering, down- and up-sampling, as well as encoding optimizations. According to the newest reports, deep schemes have achieved comparable or even higher compression efficiency than state-of-the-art traditional schemes, such as the High Efficiency Video Coding (HEVC) based scheme, for image coding; deep tools have demonstrated compression capability beyond HEVC for video coding. However, deep schemes have not yet reached the current level of HEVC for video coding, and deep tools remain largely unexplored in many aspects, including the tradeoff between compression efficiency and encoding/decoding complexity, the optimization for perceptual naturalness or semantic quality, the speciality and universality, the federated design of multiple deep tools, and so on. In the hope of advocating the research of deep learning-based video coding, we present a case study of our developed prototype video codec, namely Deep Learning Vi
Segmentation of multiple anatomical structures is of great importance in medical image analysis. In this study, we proposed a W-net to simultaneously segment both the optic disc (OD) and the exudates in retinal images...