Search Results - Inner Mongolia University Library

IEEE Visual Communications and Image Processing (VCIP)

Authors: Haichuan Ma, Dong Liu, Feng Wu; CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (digital) 9781728180687

ISBN: (print) 9781728180694

We propose to improve neural network-based compression artifact reduction by transmitting side information for the neural network. The side information consists of artifact descriptors that are obtained by analyzing the original and compressed images in the encoder. In the decoder, the received descriptors are used as additional input to a well-designed conditional post-processing neural network. To reduce the transmission overhead, the entire model is optimized under the rate-distortion constraint via end-to-end learning. Experimental results show that introducing the side information greatly improves the ability of the post-processing neural network, and improves the rate-distortion performance.

Keywords: Image coding; Neural networks; Decoding; Training; Feature extraction; Computational modeling; Transform coding

LEARNED SCALABLE IMAGE COMPRESSION WITH BIDIRECTIONAL CONTEXT DISENTANGLEMENT NETWORK

arXiv, 2018

Authors: Zhang, Zhizheng; Chen, Zhibo; Lin, Jianxin; Li, Weiping; CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

In this paper, we propose a learned scalable/progressive image compression scheme based on deep neural networks (DNN), named Bidirectional Context Disentanglement Network (BCD-Net). For learning hierarchical representations, we first adopt bit-plane decomposition to decompose the information coarsely before the deep-learning-based transformation. However, the information carried by different bit-planes is not only unequal in entropy but also of different importance for reconstruction. We thus take the hidden features corresponding to different bit-planes as the context and design a network topology with bidirectional flows to disentangle the contextual information for more effective compressed representations. Our proposed scheme enables us to obtain compressed codes with scalable rates via one-pass encoding-decoding. Experimental results demonstrate that our proposed model outperforms the state-of-the-art DNN-based scalable image compression methods in both PSNR and MS-SSIM metrics. In addition, our proposed model achieves better performance in the MS-SSIM metric than conventional scalable image codecs. The effectiveness of our technical components is also verified through sufficient ablation experiments. Copyright © 2018, The Authors. All rights reserved.

Keywords: Deep neural networks
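The bit-plane decomposition mentioned in the abstract above can be illustrated with a minimal sketch. It covers only the generic decomposition step on 8-bit values, not BCD-Net itself; the function names `to_bit_planes` and `from_bit_planes` and the sample row are assumptions made for illustration.

```python
def to_bit_planes(pixels, bits=8):
    """Split integer pixel values into bit-planes, most significant first."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits - 1, -1, -1)]


def from_bit_planes(planes, keep=None):
    """Rebuild pixels from the first `keep` (most significant) planes."""
    bits = len(planes)
    keep = bits if keep is None else keep
    pixels = [0] * len(planes[0])
    for i, plane in enumerate(planes[:keep]):
        weight = 1 << (bits - 1 - i)  # value contributed by a set bit in this plane
        pixels = [p + weight * bit for p, bit in zip(pixels, plane)]
    return pixels


row = [200, 13, 97, 255]                # hypothetical 8-bit pixel row
planes = to_bit_planes(row)
print(from_bit_planes(planes))          # all 8 planes: exact reconstruction
print(from_bit_planes(planes, keep=3))  # top 3 planes only: coarse approximation
```

Keeping only the leading planes gives a coarse reconstruction that improves as more planes are added, which is why the high-order planes matter more for reconstruction than the low-order ones.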

Hierarchical quadtree-based flexible block ordering in HEVC intra coding

IEEE Visual Communications and Image Processing (VCIP)

Authors: Lei Guo, Dong Liu, Li Li, Feng Wu; CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (print) 9781509053179

In all of the existing block-based image and video coding standards, blocks are processed in a fixed scan order. As a result, in HEVC intra coding, intra prediction is always based on the top and/or left neighboring reconstructed pixels, which yields less accurate prediction for blocks where the spatial correlation is not along the top-left-to-bottom-right direction. To obtain better intra prediction, we propose to flexibly determine the coding order of blocks in HEVC intra coding. Complying with the hierarchical quadtree structure in HEVC, our flexible block ordering (FBO) technique recursively decides the coding order of the four sub-blocks when splitting one block. Moreover, we propose new methods to perform interpolation/extrapolation for intra prediction so as to fully utilize the neighboring reconstructed pixels, which are not always the top/left ones. Experimental results show that our proposed FBO technique achieves on average a 2.9% BD-rate reduction compared to the HEVC baseline.

Keywords: Encoding; High efficiency video coding; Image reconstruction; Correlation; Standards; Interpolation; Copper
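The recursive per-split ordering described in the abstract above can be sketched as a quadtree traversal in which every split node carries its own permutation of the four sub-blocks. This is only an illustration of the traversal: the tree layout, the names `coding_order` and `OFFSETS`, and the example tree are assumptions, and how an encoder would actually choose each order is not shown.

```python
# Sub-block indices inside a split block, as (dx, dy) offsets in units of
# half the block size: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
OFFSETS = [(0, 0), (1, 0), (0, 1), (1, 1)]


def coding_order(node, x=0, y=0, size=64):
    """List leaf blocks as (x, y, size) in the order they would be coded.

    A leaf is None; a split node is (order, children), where `order` is a
    permutation of 0..3 chosen for that split. This layout is hypothetical.
    """
    if node is None:
        return [(x, y, size)]
    order, children = node
    half = size // 2
    blocks = []
    for idx in order:
        dx, dy = OFFSETS[idx]
        blocks += coding_order(children[idx], x + dx * half, y + dy * half, half)
    return blocks


Z_ORDER = (0, 1, 2, 3)  # the fixed scan order
# A 16x16 block split once and coded bottom-right first; its top-left 8x8
# sub-block is split again and keeps the fixed order.
tree = ((3, 1, 2, 0), [(Z_ORDER, [None] * 4), None, None, None])
for block in coding_order(tree, size=16):
    print(block)
```

With the fixed order at every node the traversal reduces to the usual Z-scan, so the flexible scheme in this sketch is a strict generalization of the fixed one.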

Hybrid transform for HEVC-based lossless coding

IEEE International Symposium on Circuits and Systems (ISCAS)

Authors: Fangdong Chen, Jinlei Zhang, Houqiang Li; CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (print) 9781479934331

High Efficiency Video Coding (HEVC) with the transform bypass mode is simple but inefficient for lossless coding. For this reason, we propose a novel transform to further eliminate the redundancy between residues of different blocks in intra prediction. Depending on the intra prediction mode, the proposed transform adapts to exploit the correlations of residues formed by different modes. In order to accurately obtain the parameters of the transform matrix, an approach similar to the Wiener filtering method is adopted. Experimental results show that, on top of the lossless coding mode in HEVC, our method achieves a 7.4% bit-rate reduction on average for the All Intra Main configuration. Compared with other representative algorithms, our proposal still shows an improvement in the compression ratio, without substantial increases in computational complexity in the encoder or decoder.

Keywords: Encoding; Transforms; Video coding; Redundancy; Prediction algorithms; Proposals; Standards

Reinforced Bit Allocation under Task-Driven Semantic Distortion Metrics

arXiv, 2019

Authors: Shi, Jun; Chen, Zhibo; CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

Rapidly growing intelligent applications require optimized bit allocation in image/video coding to support specific task-driven scenarios such as detection, classification, and segmentation. Some learning-based frameworks have been proposed for this purpose due to their inherent end-to-end optimization mechanisms. However, it is still quite challenging to integrate these task-driven metrics seamlessly into the traditional hybrid coding framework. To the best of our knowledge, this paper is the first work that tries to solve this challenge with a reinforcement learning (RL) approach. Specifically, we formulate the bit allocation problem as a Markovian Decision Process (MDP) and train RL agents to automatically decide the quantization parameter (QP) of each coding tree unit (CTU) for HEVC intra coding, according to the task-driven semantic distortion metrics. This bit allocation scheme can maximize the semantic-level fidelity of the task, such as classification accuracy, while minimizing the bit-rate. We also employ gradient class activation map (Grad-CAM) and Mask R-CNN tools to extract task-related importance maps to help the agents make decisions. Extensive experimental results demonstrate the superior performance of our approach, which achieves 43.1% to 73.2% bit-rate saving over the HEVC anchor under equivalent task-related distortions. Copyright © 2019, The Authors. All rights reserved.

Keywords: Reinforcement learning

Deep Grammatical Multi-classifier for Continuous Sign Language Recognition

IEEE International Conference on Multimedia Big Data (BigMM)

Authors: Chengcheng Wei, Wengang Zhou, Junfu Pu, Houqiang Li; CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

In this paper, we propose a novel deep architecture with multiple classifiers for continuous sign language recognition. Representing the sign video with a 3D convolutional residual network and a bidirectional LSTM, we formulate continuous sign language recognition as a grammatical-rule-based classification problem. We first split a text sentence of sign language into isolated words and n-grams, where an n-gram is a sequence of n consecutive words in a sentence. Then, we propose a word-independent classifiers (WIC) module and an n-gram classifier (NGC) module to identify the words and n-grams in a sentence, respectively. A greedy decoding algorithm is employed to integrate the words and n-grams into the sentence based on the confidence scores provided by both modules. Our method is evaluated on a Chinese continuous sign language recognition benchmark, and the experimental results demonstrate its effectiveness and superiority.

Keywords: Videos; Assistive technology; Gesture recognition; Feature extraction; Task analysis; Decoding; Cats
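The sentence-splitting step in the abstract above (isolated words plus n-grams of consecutive words) can be shown with a short sketch. The whitespace tokenization, the helper name `ngrams`, and the example sentence are assumptions for illustration; the classifier modules and the greedy decoding are not covered.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "he wants a cup of tea"  # hypothetical gloss sentence
words = sentence.split()            # isolated words (whitespace split is an assumption)
print(words)
print(ngrams(words, 2))             # all 2-grams of consecutive words
```

A sentence of m words yields m - n + 1 n-grams, and none when m is smaller than n.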

QUERY DIFFICULTY ESTIMATION VIA PSEUDO RELEVANCE FEEDBACK FOR IMAGE SEARCH

IEEE International Conference on Multimedia and Expo

Authors: Qianghuai Jia, Xinmei Tian, Tao Mei; CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China; Microsoft Research

ISBN: (print) 9781479947607

Query difficulty estimation (QDE) attempts to automatically predict the performance of the search results returned for a given query. QDE has been widely investigated in text document retrieval for many years. However, few works have explored it in image retrieval. State-of-the-art QDE methods in image retrieval mainly investigate the statistical characteristics (coherence, robustness, etc.) of the returned images to derive a value that indicates the degree of query difficulty. To the best of our knowledge, little research has been done to directly estimate the real retrieval performance of the search results, such as average precision, instead of only an indicator. In this paper, we propose a novel query difficulty estimation approach which automatically estimates the average precision of the image search results. Specifically, we first select a set of query-relevant and query-irrelevant images for each query via pseudo relevance feedback. Then an efficient and effective voting scheme is proposed to estimate the relevance label of each image in the search results. Based on the images' relevance labels, the average precision of the search results returned for the given query is derived. The experimental results on a benchmark image search dataset demonstrate the effectiveness of the proposed method.

Keywords: Query difficulty estimation (QDE); Image retrieval; Average precision (AP); Pseudo relevance feedback (PRF); Voting scheme
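The last step in the abstract above, deriving average precision from per-image relevance labels, can be sketched as follows. It uses the common definition of AP normalized by the number of relevant items found in the ranked list; the exact normalization or truncation used in the paper is not stated here, and the function name and the sample labels are assumptions.

```python
def average_precision(labels):
    """Average precision of a ranked list of binary relevance labels.

    Normalized by the number of relevant items in the list (an assumption;
    other normalizers exist).
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


estimated_labels = [1, 0, 1, 1, 0]  # hypothetical labels, best-ranked image first
print(round(average_precision(estimated_labels), 4))
```

Because only the positions of the relevant items enter the sum, errors in the estimated labels near the top of the ranking change the result more than errors near the bottom.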

Quality Assessment of Stereoscopic 360-degree Images from Multi-viewports

Picture Coding Symposium, PCS

Authors: Jiahua Xu, Ziyuan Luo, Wei Zhou, Wenyuan Zhang, Zhibo Chen; CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

Objective quality assessment of stereoscopic panoramic images has become a challenging problem owing to the rapid growth of 360-degree content. Unlike traditional 2D image quality assessment (IQA), 3D omnidirectional IQA involves more complex aspects, especially an unlimited field of view (FoV) and extra depth perception, which make it difficult to evaluate the quality of experience (QoE) of 3D omnidirectional images. In this paper, we propose a multi-viewport based full-reference stereo 360 IQA model. Because viewports change freely when browsing in a head-mounted display, our proposed approach processes the image inside the FoV rather than the projected one, such as the equirectangular projection (ERP). In addition, since overall QoE depends on both image quality and depth perception, we utilize features estimated from the difference map between the left and right views, which can reflect disparity. The depth perception features along with binocular image qualities are employed to further predict the overall QoE of 3D 360 images. The experimental results on our public Stereoscopic OmnidirectionaL Image quality assessment Database (SOLID) show that the proposed method achieves a significant improvement over some well-known IQA metrics and can accurately reflect the overall QoE of perceived images.

Generative Adversarial Network-Based Frame Extrapolation for Video Coding

IEEE Visual Communications and Image Processing (VCIP)

Authors: Jianping Lin, Dong Liu, Houqiang Li, Feng Wu; CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (print) 9781538644591; 9781538644584

Motion estimation and motion compensation are fundamental in video coding to remove the temporal redundancy between video frames. Current video coding schemes usually adopt block-based motion estimation and compensation using simple translational or affine motion models, which cannot efficiently characterize complex motions in natural video signals. In this paper, we propose a frame extrapolation method for motion estimation and compensation. Specifically, based on several previous frames, our method directly extrapolates the current frame using a trained deep network model. The deep network we adopt is a redesigned Video Coding oriented LAplacian Pyramid of Generative Adversarial Networks (VC-LAPGAN). The extrapolated frame is then used as an additional reference frame. Experimental results show that the VC-LAPGAN is capable of estimating and compensating for complex motions, and of extrapolating frames with high visual quality. Using the VC-LAPGAN, our method achieves on average a 2.0% BD-rate reduction compared with High Efficiency Video Coding (HEVC) under the low-delay P configuration.

Keywords: Video coding; Motion estimation; Training; Computational modeling; Extrapolation; Laplace equations; Convolutional codes

Improving Semantic Segmentation via Label Propagation and Temporal Consistency

IEEE International Conference on Signal, Information and Data Processing (ICSIDP)

Authors: Feiyu Qin, Lumeng Cao, Xuejin Chen; CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (digital) 9781728123455

ISBN: (print) 9781728123462

Semantic segmentation is a fundamental task in indoor scene understanding. Most previous supervised approaches rely on densely annotated image data sets. Due to the limited number of images with segmentation labels, the performance of existing networks is greatly limited. In this paper, we exploit temporal correlation in video frames to improve the performance and robustness of segmentation networks. Two effective learning strategies are proposed to propagate the information from a few labeled frames to their immediate neighbor frames. First, we scale up the training dataset for supervised semantic segmentation networks by generating pseudo ground truth for neighboring frames from a labeled frame using filtered homography transformation. Furthermore, we introduce a self-supervised loss function to ensure temporal consistency between the segmentation results of adjacent frames. The experimental results demonstrate that our proposed method outperforms state-of-the-art techniques for semantic segmentation on the NYU-Depth V2 dataset.
