Search Results - Inner Mongolia University Library

Deep learning-based video coding: A review and a case study

arXiv, 2019

Authors: Liu, Dong; Li, Yue; Lin, Jianping; Li, Houqiang; Wu, Feng. Affiliation: CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China

The past decade has witnessed the great success of deep learning technology in many disciplines, especially in computer vision and image processing. However, deep learning-based video coding remains in its infancy. This paper reviews representative works on using deep learning for image/video coding, which has been an actively developing research area since 2015. We divide the related works into two categories: new coding schemes that are built primarily upon deep networks (deep schemes), and deep network-based coding tools (deep tools) that are to be used within traditional coding schemes or together with traditional coding tools. For deep schemes, pixel probability modeling and auto-encoders are the two approaches, which can be viewed as a predictive coding scheme and a transform coding scheme, respectively. For deep tools, several techniques have been proposed that use deep learning to perform intra-picture prediction, inter-picture prediction, cross-channel prediction, probability distribution prediction, transform, post- or in-loop filtering, down- and up-sampling, as well as encoding optimizations. According to the newest reports, deep schemes have achieved comparable or even higher compression efficiency than state-of-the-art traditional schemes, such as the High Efficiency Video Coding (HEVC) based scheme, for image coding; deep tools have demonstrated compression capability beyond HEVC for video coding. However, deep schemes have not yet reached the current level of HEVC for video coding, and deep tools remain largely unexplored in many aspects, including the tradeoff between compression efficiency and encoding/decoding complexity, the optimization for perceptual naturalness or semantic quality, the speciality and universality, the federated design of multiple deep tools, and so on. In the hope of advocating research on deep learning-based video coding, we present a case study of our developed prototype video codec, namely Deep Learning Vi

Keywords: Video signal processing

Multi-dimensional Parameter Estimation in RIS-aided MU-MIMO Channels

arXiv, 2025

Authors: Mo, Linlin; Song, Yi; Saggese, Fabio; Lu, Xinhua; Wang, Zhongyong; Popovski, Petar. Affiliations: Academy for Electronic Information Discipline Studies, Nanyang Institute of Technology, Nanyang 473000, China; Key Laboratory of Grain Information Processing and Control, Ministry of Education, Henan Engineering Research Center of Grain Condition Intelligent Detection and Application, Henan University of Technology, Zhengzhou 450001, China; School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; Department of Electronic Systems, Aalborg University, Aalborg 9220, Denmark

We address the channel estimation problem in reconfigurable intelligent surface (RIS) aided broadband systems by proposing a dual-structure and multi-dimensional transformations (DS-MDT) algorithm. The proposed approach leverages the dual-structure features of the channel parameters to assist users experiencing weaker channel conditions, thereby enhancing estimation performance. Moreover, given that the channel parameters are distributed across multiple dimensions of the received tensor, the proposed algorithm employs multi-dimensional transformations to effectively isolate and extract distinct parameters. The numerical results demonstrate that the proposed algorithm reduces the normalized mean square error (NMSE) by up to 10 dB while maintaining lower complexity compared to state-of-the-art methods. © 2025, CC BY.

Keywords: Channel estimation
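
The abstract above reports its gain as an NMSE reduction in dB. As a reading aid, here is a minimal sketch of how NMSE in dB is commonly computed for a channel estimate. The function name `nmse_db` and the sample vectors are hypothetical, and this is the standard textbook definition, not necessarily the exact evaluation procedure used by the authors.

```python
import math

def nmse_db(h_true, h_est):
    """Normalized mean square error in dB (standard definition, assumed here):
    10 * log10( ||h_est - h_true||^2 / ||h_true||^2 ).
    Inputs are equal-length sequences of real or complex channel coefficients."""
    err_power = sum(abs(e - t) ** 2 for t, e in zip(h_true, h_est))
    ref_power = sum(abs(t) ** 2 for t in h_true)
    return 10.0 * math.log10(err_power / ref_power)

# Hypothetical example: a 10% amplitude error on a single coefficient.
h = [2.0 + 0.0j, 0.0 + 0.0j]
h_hat = [2.2 + 0.0j, 0.0 + 0.0j]
print(round(nmse_db(h, h_hat), 2))
```

On this scale, a reduction of 10 dB corresponds to a tenfold reduction in the normalized error power.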

A STUDY ON THE FREQUENCY AND AZIMUTH COHERENCE OF HIGH-RESOLUTION SAR IMAGE

IEEE International Geoscience and Remote Sensing Symposium

Authors: Wenji Xing; Xiaolan Qiu; Chibiao Ding. Affiliation: The Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing, China

High-resolution SAR has a large transmitting bandwidth and a wide synthetic aperture. How to understand and exploit the variation of SAR scattering characteristics with angle and frequency is a topic worth studying. This article establishes a coherence matrix of sub-band and sub-aperture SAR images and analyzes its ability to classify scattering mechanisms. Experiments are conducted using TerraSAR-X high-resolution data of different scenarios, and some meaningful results are obtained, which may provide some support for the analysis and application of high-resolution SAR data.

Keywords: Coherence; Entropy; Azimuth; Radar polarimetry; Synthetic aperture radar; Image resolution
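
The abstract above builds a coherence matrix from sub-band and sub-aperture images. As an illustration only, the sketch below computes the usual sample coherence magnitude between two complex image patches, which is the kind of quantity such a matrix would normally contain. The function name `coherence` and the sample values are hypothetical, and the abstract does not specify the authors' actual estimator or windowing.

```python
import math

def coherence(a, b):
    """Sample coherence magnitude between two complex patches (assumed form):
    |sum(a * conj(b))| / sqrt(sum|a|^2 * sum|b|^2), a value in [0, 1]."""
    cross = sum(x * y.conjugate() for x, y in zip(a, b))
    power = math.sqrt(sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b))
    return abs(cross) / power

# Hypothetical pixels from two sub-band looks of the same patch.
look_1 = [1 + 1j, 2 - 1j]
look_2 = [x * 1j for x in look_1]  # same patch with a constant phase offset
print(coherence(look_1, look_2))   # close to 1: a pure phase offset keeps coherence
```

Evaluating this for every pair of sub-band and sub-aperture images would fill one such matrix per patch.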

Learned fast HEVC intra coding

arXiv, 2019

Authors: Chen, Zhibo; Shi, Jun; Li, Weiping. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China

In High Efficiency Video Coding (HEVC), excellent rate-distortion (RD) performance is achieved in part by having a flexible quadtree coding unit (CU) partition and a large number of intra-prediction modes. This excellent RD performance is achieved at the expense of much higher computational complexity. In this paper, we propose a learned fast HEVC intra coding (LFHI) framework that takes into account the comprehensive factors of fast intra coding to reach an improved configurable tradeoff between coding performance and computational complexity. First, we design a low-complexity shallow asymmetric-kernel CNN (AK-CNN) to efficiently extract the local directional texture features of each block for both fast CU partition and fast intra-mode decision. Second, we introduce the concept of the minimum number of RDO candidates (MNRC) into fast mode decision, which utilizes AK-CNN to predict the minimum number of best candidates for RDO calculation to further reduce the computation of intra-mode selection. Third, an evolution optimized threshold decision (EOTD) scheme is designed to achieve configurable complexity-efficiency tradeoffs. Finally, we propose an interpolation-based prediction scheme that allows our framework to be generalized to all quantization parameters (QPs) without the need for training the network on each QP. The experimental results demonstrate that the LFHI framework has a high degree of parallelism and achieves a much better complexity-efficiency tradeoff, achieving up to 75.2% intra-mode encoding complexity reduction with negligible rate-distortion performance degradation, superior to the existing fast intra-coding schemes. Copyright © 2019, The Authors. All rights reserved.

Keywords: Signal distortion

A COOPERATIVE MULTITEMPORAL SEGMENTATION METHOD FOR SAR AND OPTICAL IMAGES CHANGE DETECTION

IEEE International Geoscience and Remote Sensing Symposium

Authors: Ling Wan; Yuming Xiang; Hongjian You. Affiliation: Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing, China

This paper proposes an extended version of our previous work, MS-CC, to achieve change detection between optical and SAR images. The proposed method introduces a cooperative multitemporal segmentation, whose merging process considers the heterogeneity of SAR and optical images as parallel information, ensuring that the multitemporal information can be fully utilized without the two sources interfering with each other. Then, the change detection strategy based on compound classification is carried out on the segmentation results, yielding multi-scale change detection maps. Experimental validation is conducted with GaoFen-3 and Google Earth data.

Keywords: Image segmentation; Optical imaging; Radar polarimetry; Optical sensors; Compounds; Interference; Stacking

RC-CNN: Representation-Consistent Convolutional Neural Networks for Achieving Transformation Invariance

IEEE International Conference on Systems, Man, and Cybernetics

Authors: Jun Gu; Anfeng He; Xinmei Tian. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, Anhui, China

Convolutional neural networks (CNNs) are powerful and have achieved state-of-the-art performance in many visual recognition tasks. Despite their impressive performance, CNNs are still unable to remain invariant when spatial transformations are applied to images. Herein, we propose representation-consistent neural networks to solve this problem. By introducing consistency losses between the representations of transformed images in different layers, the recognition performance on transformed images is significantly improved. The model not only learns to map the transformed images to the pre-defined labels, but each layer also learns to generate invariant representations when the input images are transformed. All the characteristics of transformation invariance are embedded in the model, which means that no extra parameters or computations are introduced in the well-trained model. Comparative experiments demonstrate the superiority of our model when learning invariance to rotation, translation, and scaling on large-scale image recognition and retrieval tasks.

Keywords: Computational modeling; Feature extraction; Training; Image recognition; Data models; Task analysis; Kernel
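
The abstract above describes adding a consistency loss between layer representations of original and transformed images. A minimal sketch of that idea follows, assuming a mean-squared-difference penalty averaged over layers and a fixed weighting factor. The names `consistency_loss`, `total_loss`, and `weight` are hypothetical, and the abstract does not state the exact loss form used in RC-CNN.

```python
def consistency_loss(feats_orig, feats_trans):
    """Average over layers of the mean squared difference between the
    representation of an image and that of its transformed copy.
    Each argument is a list of per-layer feature vectors (lists of floats)."""
    per_layer = []
    for f_o, f_t in zip(feats_orig, feats_trans):
        per_layer.append(sum((x - y) ** 2 for x, y in zip(f_o, f_t)) / len(f_o))
    return sum(per_layer) / len(per_layer)

def total_loss(cls_loss, feats_orig, feats_trans, weight=0.1):
    """Classification loss plus a weighted consistency term (weight is assumed)."""
    return cls_loss + weight * consistency_loss(feats_orig, feats_trans)

# Hypothetical two-layer features for an image and its rotated copy.
orig = [[0.0, 0.0], [1.0, 1.0]]
rotated = [[1.0, 1.0], [1.0, 1.0]]
print(total_loss(1.0, orig, rotated))
```

Because the penalty only shapes training, it adds no parameters or computation at inference time, which matches the claim in the abstract.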

Quality assessment of stereoscopic 360-degree images from multi-viewports

arXiv, 2019

Authors: Xu, Jiahua; Luo, Ziyuan; Zhou, Wei; Zhang, Wenyuan; Chen, Zhibo. Affiliation: CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China

Objective quality assessment of stereoscopic panoramic images has become a challenging problem owing to the rapid growth of 360-degree content. Unlike traditional 2D image quality assessment (IQA), 3D omnidirectional IQA involves more complex aspects, especially the unlimited field of view (FoV) and extra depth perception, which make it difficult to evaluate the quality of experience (QoE) of 3D omnidirectional images. In this paper, we propose a multi-viewport based full-reference stereo 360 IQA model. Because viewports change freely when browsing in a head-mounted display, our proposed approach processes the image inside the FoV rather than the projected one, such as the equirectangular projection (ERP). In addition, since overall QoE depends on both image quality and depth perception, we utilize features estimated from the difference map between the left and right views, which can reflect disparity. The depth perception features, along with binocular image qualities, are employed to further predict the overall QoE of 3D 360 images. The experimental results on our public Stereoscopic OmnidirectionaL Image quality assessment Database (SOLID) show that the proposed method achieves a significant improvement over some well-known IQA metrics and can accurately reflect the overall QoE of perceived images. Copyright © 2019, The Authors. All rights reserved.

Keywords: Quality of service

Two-stream action recognition-oriented video super-resolution

arXiv, 2019

Authors: Zhang, Haochen; Liu, Dong; Xiong, Zhiwei. Affiliation: CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China

We study the video super-resolution (SR) problem for facilitating video analytics tasks, e.g. action recognition, rather than for visual quality. The popular action recognition methods based on convolutional networks, exemplified by two-stream networks, are not directly applicable to video of low spatial resolution. This can be remedied by performing video SR prior to recognition, which motivates us to improve the SR procedure for recognition accuracy. Tailored for two-stream action recognition networks, we propose two video SR methods for the spatial and temporal streams respectively. On the one hand, we observe that regions with action are more important to recognition, and we propose an optical-flow guided weighted mean-squared-error loss for our spatial-oriented SR (SoSR) network to emphasize the reconstruction of moving objects. On the other hand, we observe that existing video SR methods incur temporal discontinuity between frames, which also worsens the recognition accuracy, and we propose a siamese network for our temporal-oriented SR (ToSR) training that emphasizes the temporal continuity between consecutive frames. We perform experiments using two state-of-the-art action recognition networks and two well-known datasets: UCF101 and HMDB51. Results demonstrate the effectiveness of our proposed SoSR and ToSR in improving recognition accuracy. Copyright © 2019, The Authors. All rights reserved.

Keywords: Optical resolving power
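
The abstract above mentions an optical-flow guided weighted mean-squared-error loss that emphasizes moving regions. The sketch below shows one plausible form of such a loss, assuming per-pixel weights of `1 + alpha * |flow| / max|flow|` and normalization by the weight sum. The function name, the weighting formula, and `alpha` are all assumptions made for illustration; the abstract does not give the actual SoSR weighting.

```python
import math

def flow_weighted_mse(sr, hr, flow, alpha=1.0):
    """Weighted MSE between super-resolved (sr) and ground-truth (hr) pixels.
    flow holds one (u, v) optical-flow vector per pixel; pixels with larger
    motion magnitude receive larger weights (assumed weighting scheme)."""
    mags = [math.hypot(u, v) for u, v in flow]
    peak = max(mags) or 1.0  # avoid division by zero for a static frame
    weights = [1.0 + alpha * m / peak for m in mags]
    weighted_err = sum(w * (s - h) ** 2 for w, s, h in zip(weights, sr, hr))
    return weighted_err / sum(weights)

# Hypothetical two-pixel frame: the first pixel moves, the second is static.
flow = [(3.0, 4.0), (0.0, 0.0)]
err_on_moving = flow_weighted_mse([1.0, 0.0], [0.0, 0.0], flow)
err_on_static = flow_weighted_mse([0.0, 1.0], [0.0, 0.0], flow)
print(err_on_moving > err_on_static)
```

With this weighting, the same pixel error costs more when it falls on a moving region, which pushes the network toward reconstructing moving objects more faithfully.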

ON THE USE OF CNN FOR AUTOMATED QUALITY ASSESSMENT OF GF-3 POLARIMETRIC DATA

IEEE International Geoscience and Remote Sensing Symposium

Authors: Songtao Shangguan; Xiaolan Qiu; Bin Lei. Affiliation: The Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing, China

To meet the need for quality assessment of massive GF-3 polarimetric data, a method based on common distribution targets has been proposed by Sha Jiang. However, it requires manual selection of woodland areas and cannot be performed automatically. In this paper, an automated quality assessment method for GF-3 full-polarization SAR data is implemented using a classic convolutional neural network (VGG-16). The network is pre-trained on Radarsat-2 PolSAR data and then trained on selected typical GF-3 scenes. It is expected to learn the features of targets that satisfy azimuthal symmetry and backscatter reciprocity, and thereby carry out the quality assessment work. Data from several typical GF-3 strips are used to test the method. Experiments show that the network can predict the plots of targets from a new scene under the interference of polarimetric distortion and noise. Moreover, the quality assessment results from the network are consistent with the manual assessment results, which shows the effectiveness of the method.

Keywords: Quality assessment; Distortion; Data integrity; Strips; Manuals; Forestry; Radar polarimetry