Search results - Inner Mongolia University Library

A generalization theory based on independent and task-identically distributed assumption

arXiv, 2019

Authors: Zheng, Guanhua; Sang, Jitao; Li, Houqiang; Yu, Jian; Xu, Changsheng. Affiliations: University of Science and Technology of China; School of Computer and Information Technology, Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China; Chinese Academy of Sciences Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Hefei 230026, China; National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China; University of Chinese Academy of Sciences

Existing generalization theories analyze generalization performance mainly in terms of model complexity and the training process. Because the widely used IID assumption ignores task properties, these theories fail to explain many generalization phenomena or to guide practical learning tasks. In this paper, we propose a new Independent and Task-Identically Distributed (ITID) assumption that incorporates task properties into the data-generating process. The generalization bound derived under the ITID assumption identifies the significance of hypothesis invariance in guaranteeing generalization performance. Based on the new bound, we introduce a practical invariance enhancement algorithm from the perspective of modifying data distributions. Finally, we verify the algorithm and theorems on image classification tasks with both toy and real-world datasets. The experimental results demonstrate the reasonableness of the ITID assumption and the effectiveness of the new generalization theory in improving practical generalization performance. Copyright © 2019, The Authors. All rights reserved.

Keywords: Classification (of information)

Learned Scalable Image Compression with Bidirectional Context Disentanglement Network

arXiv, 2018

Authors: Zhang, Zhizheng; Chen, Zhibo; Lin, Jianxin; Li, Weiping. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

In this paper, we propose a learned scalable/progressive image compression scheme based on deep neural networks (DNN), named Bidirectional Context Disentanglement Network (BCD-Net). To learn hierarchical representations, we first adopt bit-plane decomposition to decompose the information coarsely before the deep-learning-based transformation. However, the information carried by different bit-planes is not only unequal in entropy but also of different importance for reconstruction. We therefore take the hidden features corresponding to different bit-planes as the context and design a network topology with bidirectional flows to disentangle the contextual information for more effective compressed representations. The proposed scheme yields compressed codes at scalable rates via one-pass encoding and decoding. Experimental results demonstrate that the proposed model outperforms state-of-the-art DNN-based scalable image compression methods in both PSNR and MS-SSIM. In addition, the proposed model achieves better MS-SSIM performance than conventional scalable image codecs. The effectiveness of our technical components is also verified through ablation experiments. Copyright © 2018, The Authors. All rights reserved.

Keywords: Deep neural networks
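
The bit-plane decomposition that the abstract above names as the first step of BCD-Net can be illustrated on its own. The following is a minimal Python sketch of generic bit-plane decomposition for 8-bit pixel values, not the authors' implementation; the function names `bit_planes` and `reconstruct` and the sample pixel values are hypothetical, and the deep-network stages of BCD-Net are not shown.

```python
def bit_planes(pixels, bits=8):
    """Split unsigned integer pixel values into bit-planes, most significant first."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits - 1, -1, -1)]


def reconstruct(planes, keep=None):
    """Rebuild pixel values from the first `keep` planes (all planes if None)."""
    bits = len(planes)
    keep = bits if keep is None else keep
    values = [0] * len(planes[0])
    for i in range(keep):
        # Plane i carries the bit with weight 2^(bits - 1 - i).
        weight = 1 << (bits - 1 - i)
        values = [v + weight * bit for v, bit in zip(values, planes[i])]
    return values


# Hypothetical sample of 8-bit pixel values.
pixels = [0, 37, 128, 200, 255]
planes = bit_planes(pixels)
coarse = reconstruct(planes, keep=3)  # only the three most significant planes
exact = reconstruct(planes)           # all eight planes
```

Keeping only the leading planes gives a coarse approximation of the pixels, while all planes together restore them exactly. This is the sense in which different bit-planes are of different importance for reconstruction.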

Multiscale Progressive Image Compression Network Guided by Learnable Just Noticeable Distortion

IEEE Visual Communications and Image Processing (VCIP)

Authors: Xin Jin; Runchun Ye; Zhibo Chen. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (print) 9781538644591; 9781538644584

One key challenge for learning-based image compression is that adaptive bit allocation is crucial for compression effectiveness but can hardly be trained into a neural network. In this work, we present an end-to-end trainable image compression framework, named Multi-scale Progressive Network (MPN), that achieves spatially variant bit allocation and rate control under the guidance of a novel learnable just noticeable distortion (JND) map. Specifically, MPN's encoder achieves multi-scale feature representation through a three-branch structure. Each branch employs an independent feature extraction strategy for its specific receptive field, and the branches are merged progressively under the guidance of the corresponding learnable JND maps generated by our proposed Bit-Allocation sub-Network (BAN). This makes MPN focus on the areas that attract the human visual system (HVS) and preserve more image texture during compression. Finally, a hybrid objective function is introduced to make MPN more efficient and to mimic the discriminative characteristics of the HVS. Experiments show that MPN significantly outperforms traditional JPEG, JPEG 2000, and a few state-of-the-art learning-based methods on the multi-scale structural similarity (MS-SSIM) index, and can produce much better visual results with rich textures, sharp edges, and fewer artifacts.

Keywords: Image coding; Feature extraction; Transform coding; Distortion; Image reconstruction; Bit rate; Visualization

A TOPSAR Calibration Method for Processing System of GF3 Next Generation

Asian and Pacific Conference on Synthetic Aperture Radar (APSAR)

Authors: Di Yin; Bing Han; Jili Sun; Aiping Chen; Liangbo Zhao; Xinzhe Yuan; Lihua Zhong; Yuxin Hu. Affiliations: University of Chinese Academy of Sciences, Beijing, China; Institute of Electronics, Chinese Academy of Sciences, Beijing, China; Beijing Institute of Telemetry and Telecommunications Technology, Beijing, China; Chinese Academy of Space Technology, Beijing, China; National Satellite Ocean Application Service, State Oceanic Administration, Beijing, China; Key Laboratory of Technology in Geo-Spatial Information Processing and Application Systems, Institute of Electronics, Chinese Academy of Sciences, Beijing, China

ISBN: (digital) 9781728129129

ISBN: (print) 9781728129136

TOPSAR is an earth-imaging technique that provides wide-swath coverage. This paper introduces a TOPSAR focusing and calibration experiment based on TOPSAR data acquired by Gaofen-3 (GF3). We first derive the processor calibration factors under the requirement that signal energy remain invariant. We then analyze the impact of antenna electronic steering on TOPSAR products. Finally, we propose calibration methods for the processor and for electronic steering, intended for the TOPSAR-mode processing system of a SAR satellite, the next generation of GF3.

Keywords: none listed

Generative Adversarial Network-Based Frame Extrapolation for Video Coding

IEEE Visual Communications and Image Processing (VCIP)

Authors: Jianping Lin; Dong Liu; Houqiang Li; Feng Wu. Affiliation: CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (print) 9781538644591; 9781538644584

Motion estimation and motion compensation are fundamental in video coding for removing the temporal redundancy between video frames. Current video coding schemes usually adopt block-based motion estimation and compensation using simple translational or affine motion models, which cannot efficiently characterize complex motions in natural video signals. In this paper, we propose a frame extrapolation method for motion estimation and compensation. Specifically, based on several previous frames, our method directly extrapolates the current frame using a trained deep network model. The deep network we adopt is a redesigned Video Coding oriented LAplacian Pyramid of Generative Adversarial Networks (VC-LAPGAN). The extrapolated frame is then used as an additional reference frame. Experimental results show that the VC-LAPGAN is capable of estimating and compensating for complex motions, and of extrapolating frames with high visual quality. Using the VC-LAPGAN, our method achieves on average a 2.0% BD-rate reduction relative to High Efficiency Video Coding (HEVC) under the low-delay P configuration.

Keywords: Video coding; Motion estimation; Training; Computational modeling; Extrapolation; Laplace equations; Convolutional codes

Learning based Facial Image Compression with Semantic Fidelity Metric

arXiv, 2018

Authors: Chen, Zhibo; He, Tianyu. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

Surveillance and security scenarios usually require highly efficient facial image compression schemes for face recognition and identification. However, both traditional general image codecs and special facial image compression schemes only heuristically refine the codec separately according to a face verification accuracy metric. We propose a Learning based Facial Image Compression (LFIC) framework with a novel Regionally Adaptive Pooling (RAP) module whose parameters can be automatically optimized according to gradient feedback from an integrated hybrid semantic fidelity metric, including a successful exploration of applying a Generative Adversarial Network (GAN) directly as a metric in an image compression scheme. The experimental results verify the framework's efficiency by demonstrating bitrate savings of 71.41%, 48.28% and 52.67% over JPEG2000, WebP and neural network-based codecs, respectively, under the same face verification accuracy distortion metric. We also evaluate LFIC's superior performance gain compared with the latest specific facial image codecs. Visual experiments also offer some interesting insight into how LFIC can automatically capture the information in critical areas based on semantic distortion metrics for optimized compression, which is quite different from the heuristic optimization in traditional image compression algorithms. Copyright © 2018, The Authors. All rights reserved.

Keywords: Image compression

End-to-End Facial Image Compression with Integrated Semantic Distortion Metric

IEEE Visual Communications and Image Processing (VCIP)

Authors: Tianyu He; Zhibo Chen. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (print) 9781538644591; 9781538644584

Highly efficient facial image compression is broadly required and challenging in surveillance and security scenarios, yet both traditional general image codecs and special facial image compression schemes only heuristically refine the codec separately according to a face verification accuracy metric. We propose an End-to-End Facial Image Compression (E2EFIC) framework with a novel variable block size Regionally Adaptive Pooling (RAP) module whose parameters can be automatically optimized according to gradient feedback from an integrated semantic distortion metric, including a successful exploration of applying a Generative Adversarial Network (GAN) directly as a metric in an image compression scheme. The experimental results verify the framework's efficiency by demonstrating bitrate savings of 71.41%, 48.28% and 52.67% over JPEG2000, WebP and neural network-based codecs, respectively, under the same face verification accuracy distortion metric. We also evaluate E2EFIC's superior performance gain compared with the latest specific facial image codecs.

Keywords: Image coding; Distortion; Semantics; Face; Bit rate; Codecs

A CNN-Based In-Loop Filter with CU Classification for HEVC

IEEE Visual Communications and Image Processing (VCIP)

Authors: Yuanying Dai; Dong Liu; Zheng-Jun Zha; Feng Wu. Affiliation: CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (print) 9781538644591; 9781538644584

Lossy compression of images and video yields visually annoying artifacts including blocking, blurring, and ringing, especially at low bit rates. In-loop filtering techniques can reduce these artifacts, improve quality, and achieve coding gain accordingly. In this paper, we present a convolutional neural network (CNN) based in-loop filter for High Efficiency Video Coding (HEVC). First, we design a new CNN structure for artifact reduction, namely VRCNN-ext, that is composed of multiple Variable-filter-size Residue-learning blocks. VRCNN-ext is trained on natural images as well as their compressed versions at different quality levels. Second, we investigate a new in-loop filter based on the trained VRCNN-ext models. Specifically, we observe that using VRCNN-ext directly on the inter pictures is not effective. To solve this problem, we further train a classifier to decide whether to use VRCNN-ext for each coding unit (CU). The classifier makes its decision based on the compressed information, thus avoiding the overhead bits needed to switch the CNN-based filter on or off at the CU level. Experimental results show that our scheme achieves significant bit savings over the HEVC anchor, leading to on average 9.2%, 9.6% and 7.4% BD-rate reduction on the HEVC test sequences under all-intra, low-delay B and random-access configurations, respectively.

Keywords: Decoding; Encoding; Training; Image coding; Feature extraction; Copper; Video coding

An end-to-end foreground-aware network for person re-identification

arXiv, 2019

Authors: Liu, Yiheng; Zhou, Wengang; Liu, Jianzhuang; Qi, Guojun; Tian, Qi; Li, Houqiang. Affiliations: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China; Noah's Ark Lab, Huawei Technologies Company Limited, Shenzhen 518129, China; Huawei Cloud EI Product Department, Cloud & AI, Huawei Technologies

Person re-identification is the crucial task of identifying pedestrians of interest across multiple surveillance camera views. For person re-identification, a pedestrian is usually represented with features extracted from a rectangular image region that inevitably contains the scene background, which makes it ambiguous to distinguish different pedestrians and degrades accuracy. We therefore propose an end-to-end foreground-aware network that separates the foreground from the background by learning a soft mask for person re-identification. In our method, in addition to the pedestrian ID as supervision for the foreground, we introduce the camera ID of each pedestrian image for background modeling. The foreground branch and the background branch are optimized collaboratively. With the proposed target attention loss, the pedestrian features extracted from the foreground branch become less sensitive to backgrounds, which greatly reduces the negative impact of changing backgrounds on pedestrian matching across different camera views. Notably, in contrast to existing methods, our approach does not require an additional dataset to train a human landmark detector or a segmentation model for locating the background regions. Experiments conducted on three challenging datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, demonstrate the effectiveness of our approach. Copyright © 2019, The Authors. All rights reserved.

Keywords: Cameras