We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited to using only the single previous frame as the reference. Our method introduces the use of the previous multiple ...
ISBN:
(print) 9781538644591; 9781538644584
One key challenge in learning-based image compression is that adaptive bit allocation is crucial for compression effectiveness but can hardly be trained into a neural network. In this work, we present an end-to-end trainable image compression framework, named Multi-scale Progressive Network (MPN), to achieve spatially variant bit allocation and rate control under the guidance of a novel learnable just noticeable distortion (JND) map. Specifically, MPN's encoder achieves multi-scale feature representation through a three-branched structure. Each branch employs an independent feature extraction strategy for a specific receptive field, and the branches are merged progressively under the guidance of the corresponding learnable JND maps generated by our proposed Bit-Allocation sub-Network (BAN). This makes MPN focus on the areas that attract the human visual system (HVS) and preserve more image texture during compression. Finally, a hybrid objective function is introduced to further improve MPN's efficiency and mimic the discriminative characteristics of the HVS. Experiments show that MPN significantly outperforms traditional JPEG, JPEG 2000, and several state-of-the-art learning-based methods in terms of the multi-scale structural similarity (MS-SSIM) index, and produces much better visual results with rich textures, sharp edges, and fewer artifacts.
ISBN:
(print) 9781538644591; 9781538644584
Inspired by the progress of image and video super-resolution (SR) achieved by convolutional neural networks (CNN), we propose a CNN-based residue SR method for video coding. Different from previous works that operate in the pixel domain, i.e. down- and up-sampling of the image or video frame, we propose to perform down- and up-sampling in the residue domain. Specifically, for each block, we perform motion estimation and compensation to obtain the residual signal at the original resolution, then we down-sample the residue and compress it at low resolution, and perform residue SR using a trained CNN model. We design a new CNN for residue SR with the help of the motion compensated prediction signal. We integrate the residue SR method into the High Efficiency Video Coding (HEVC) scheme, with mode decision at the coding tree unit level. Experimental results show that our method achieves on average 4.0% and 2.8% BD-rate reduction under low-delay P and low-delay B configurations, respectively.
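The residue-domain pipeline described in this abstract can be sketched in a few lines. In this minimal numpy illustration, the trained CNN SR step is replaced by nearest-neighbour up-sampling for simplicity, and all helper names are hypothetical, not the paper's actual implementation:

```python
import numpy as np

def downsample2x(res):
    """2x down-sampling of a residue block by averaging 2x2 cells."""
    h, w = res.shape
    return res.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(res_lo):
    """2x up-sampling of the decoded low-resolution residue. In the paper
    this step is a trained CNN that also uses the motion-compensated
    prediction; nearest-neighbour repetition stands in for it here."""
    return np.repeat(np.repeat(res_lo, 2, axis=0), 2, axis=1)

def code_block(orig, pred):
    """Residue-domain pipeline: residue -> down-sample -> (codec) -> SR.
    The actual low-resolution compression step is omitted (lossless here)."""
    residue = orig - pred            # motion-compensated residue
    res_lo = downsample2x(residue)   # coded at low resolution
    res_hi = upsample2x(res_lo)      # residue super-resolution
    return pred + res_hi             # reconstructed block
```

For a block whose residue is smooth (e.g. constant), this round trip is nearly lossless, which is why the scheme decides per coding tree unit whether the residue-SR mode pays off.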
ISBN:
(print) 9781509003556
Visual Quality Assessment of 3D/stereoscopic video (3D VQA) is significant for both quality monitoring and optimization of existing 3D video services. In this paper, we build a 3D video database based on the latest 3D-HEVC video coding standard, to investigate the relationship among video quality, depth quality, and overall quality of experience (QoE) of 3D/stereoscopic video. We also analyze the pivotal factors affecting video and depth quality. Moreover, we develop a No-Reference 3D-HEVC bitstream-level objective video quality assessment model, which utilizes key features extracted from the 3D video bitstreams to assess the perceived quality of the stereoscopic video. The model is verified to be effective on our database as compared with widely used 2D Full-Reference quality metrics as well as a state-of-the-art 3D FR pixel-level video quality metric.
In this paper, a SAR image simulation code for 3D complex targets, named CASpatch, is introduced. This code is based on the high-frequency technique of shooting and bouncing rays (SBR). The original purpose to design the...
ISBN:
(print) 9781479983407
Dynamic Adaptive Streaming over HTTP (DASH) enables bitrate adaptation through different representations of the same content. It is common to encode random access point (RAP) pictures at segment boundaries to support representation switching. As an open group of pictures (GOP) results in a temporary discontinuity of the video playback, due to the inability to decode some pictures when switching representations, closed GOP prediction structures are normally used in DASH. This paper proposes two similar methods for using the open GOP prediction structure in DASH representations while maintaining the full picture rate during representation switching. The first method is enabled with straightforward changes in the decoding of the High Efficiency Video Coding (H.265/HEVC) standard, whereas the second method utilizes the adaptive resolution change feature of the scalable (SHVC) extension of H.265/HEVC. Experiments show that the proposed methods outperform the use of closed GOPs by 5.6% on average in terms of Bjontegaard delta bitrate (BD-rate).
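Several abstracts on this page report gains as Bjontegaard delta bitrate. The standard way to compute it from rate-PSNR points of an anchor and a test codec is a cubic fit of log-rate as a function of PSNR, integrated over the overlapping quality range; a common numpy sketch (not taken from any of these papers) is:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta bitrate: average percentage rate change of the
    test codec versus the anchor at equal quality. Negative = bit saving."""
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    # cubic fit of log-rate as a function of PSNR for each codec
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # integrate over the overlapping PSNR interval
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyint(p_a)
    int_t = np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    # back from log domain to a percentage rate difference
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0
```

For instance, a test codec that needs exactly 90% of the anchor's rate at every quality point yields a BD-rate of -10%.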
ISBN:
(digital) 9781728180687
ISBN:
(print) 9781728180694
In the hybrid video coding framework, transform is adopted to exploit the dependency within the input signal. In this paper, we propose a deep learning-based nonlinear transform for intra coding. Specifically, we incorporate the directional information into the residual domain. Then, a convolutional neural network model is designed to achieve better decorrelation and energy compaction than the conventional discrete cosine transform. This work has two main contributions. First, we propose to use the intra prediction signal to reduce the directionality in the residual. Second, we present a novel loss function to characterize the efficiency of the transform during the training. To evaluate the compression performance of the proposed transform, we implement it into the High Efficiency Video Coding reference software. Experimental results demonstrate that the proposed method achieves up to 1.79% BD-rate reduction for natural videos.
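The abstract does not state the novel loss function, but a rate-distortion style objective is a common choice when training a transform to beat the DCT on decorrelation and energy compaction. This hypothetical numpy sketch, which is not the paper's actual loss, combines reconstruction MSE with an L1 sparsity term on the coefficients as a proxy for coding cost:

```python
import numpy as np

def transform_loss(coeffs, recon, target, lam=0.01):
    """Hypothetical training objective for a learned transform:
    distortion (MSE of the reconstructed residual block) plus an L1
    penalty on the transform coefficients, which rewards energy
    compaction into few significant coefficients."""
    distortion = np.mean((recon - target) ** 2)
    rate_proxy = np.mean(np.abs(coeffs))
    return distortion + lam * rate_proxy
```

The trade-off weight `lam` is an assumption; in practice it would be swept to trace out an operational rate-distortion curve.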
ISBN:
(print) 9781509060689
Vehicle re-identification (re-id) plays an important role in the automatic analysis of the drastically increasing volume of urban surveillance video. Similar to other image retrieval problems, vehicle re-id suffers from difficulties caused by various vehicle poses, diversified illumination, and complicated environments. Triplet-wise training of a convolutional neural network (CNN) has been studied to address these challenges, where the CNN automates feature extraction from images, and the training adopts triplets of (query, positive example, negative example) to capture the relative similarity between them and learn representative features. Traditional triplet-wise training is weakly constrained and thus fails to achieve satisfactory results. We propose to improve triplet-wise training in two aspects: first, a stronger constraint, namely a classification-oriented loss, is augmented with the original triplet loss; second, a new triplet sampling method based on pairwise images is designed. Our experimental results demonstrate the effectiveness of the proposed methods, which outperform the state-of-the-art on two vehicle re-id datasets derived from real-world urban surveillance videos.
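The augmented objective, a triplet loss plus a classification-oriented term, can be sketched for a single triplet as follows. This is a minimal numpy version; the paper's exact margin and weighting are not given in the abstract, so `margin` and `weight` are assumptions:

```python
import numpy as np

def combined_loss(anchor, pos, neg, logits, label, margin=1.0, weight=1.0):
    """Triplet loss with an added classification (cross-entropy) term,
    acting as a stronger constraint than the triplet loss alone.
    anchor/pos/neg are embedding vectors; logits are the anchor's
    class scores and label its ground-truth identity index."""
    d_pos = np.sum((anchor - pos) ** 2)   # anchor-positive distance
    d_neg = np.sum((anchor - neg) ** 2)   # anchor-negative distance
    triplet = max(0.0, d_pos - d_neg + margin)
    # numerically stable softmax cross-entropy on the class logits
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]
    return triplet + weight * ce
```

A well-separated triplet with a confidently correct classification drives the loss toward zero, while a violated triplet (positive farther than negative) is penalized by the margin term.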
Light field, as a new data representation format in multimedia, has the ability to capture both intensity and direction of light rays. However, the additional angular information also brings a large volume of data. Cl...
ISBN:
(digital) 9781728180687
ISBN:
(print) 9781728180694
In recent years, deep learning has achieved promising success in multimedia quality assessment, especially image quality assessment (IQA). However, since videos exhibit more complex temporal characteristics, very little work has been done on video quality assessment (VQA) by exploiting powerful deep convolutional neural networks (DCNNs). In this paper, we propose an efficient VQA method named Deep SpatioTemporal video Quality assessor (DeepSTQ) to predict the perceptual quality of various distorted videos in a no-reference manner. In the proposed DeepSTQ, we first extract local and global spatiotemporal features using pre-trained deep learning models, without fine-tuning or training from scratch. The composited features consider distorted video frames as well as frame difference maps from both global and local views. Then, feature aggregation is conducted by a regression model to predict the perceptual video quality. Finally, experimental results demonstrate that our proposed DeepSTQ outperforms state-of-the-art quality assessment algorithms.
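The extract-then-regress pipeline described in this abstract can be illustrated with simple global statistics of frames and frame-difference maps standing in for the pre-trained DCNN features. All helper names here are hypothetical, and least-squares regression stands in for whatever regressor the paper actually uses:

```python
import numpy as np

def spatiotemporal_features(frames):
    """Global spatiotemporal statistics for a video (T x H x W array):
    mean/std of the frames and of the absolute frame-difference maps,
    a crude stand-in for pooled DCNN features."""
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))   # temporal variation maps
    return np.array([frames.mean(), frames.std(),
                     diffs.mean(), diffs.std()])

def fit_quality_regressor(feature_rows, mos):
    """Least-squares regression (with bias) mapping feature vectors to
    subjective quality scores."""
    X = np.hstack([feature_rows, np.ones((len(feature_rows), 1))])
    w, *_ = np.linalg.lstsq(X, np.asarray(mos, dtype=float), rcond=None)
    return w

def predict_quality(w, features):
    """Predicted perceptual quality for one feature vector."""
    return float(np.append(features, 1.0) @ w)
```

With real DCNN features in place of the statistics, the same aggregate-then-regress structure gives a no-reference predictor.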