Search Results - Inner Mongolia University Library

IEEE International Conference on Multimedia and Expo (ICME)

Authors: Yiheng Zhang, Dong Liu, Zheng-Jun Zha (CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

ISBN: (print) 9781509060689

Vehicle re-identification (re-id) plays an important role in the automatic analysis of the drastically increasing volume of urban surveillance video. Like other image retrieval problems, vehicle re-id suffers from difficulties caused by varied vehicle poses, diverse illumination, and complicated environments. Triplet-wise training of convolutional neural networks (CNNs) has been studied to address these challenges: the CNN automates feature extraction from images, and the training uses triplets of (query, positive example, negative example) to capture the relative similarity between them and learn representative features. Traditional triplet-wise training is weakly constrained and thus fails to achieve satisfactory results. We propose to improve triplet-wise training in two respects: first, a stronger constraint, namely a classification-oriented loss, is added to the original triplet loss; second, a new triplet sampling method based on pairwise images is designed. Our experimental results demonstrate the effectiveness of the proposed methods, which outperform the state of the art on two vehicle re-id datasets derived from real-world urban surveillance videos.

Keywords: Training; Feature extraction; Streaming media; Videos; Sampling methods; Licenses; Cameras
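
The combination of a triplet loss with a classification-oriented loss described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: squared Euclidean distance, a hinge margin, softmax cross-entropy as the classification term, and a simple weighted sum (the names `margin` and `weight` are hypothetical).

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(query, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative is farther from the
    query than the positive by at least `margin`."""
    gap = squared_distance(query, positive) - squared_distance(query, negative)
    return max(0.0, gap + margin)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def combined_loss(query, positive, negative, logits, label,
                  margin=1.0, weight=1.0):
    """Assumed combination: triplet term plus weighted classification term."""
    return (triplet_loss(query, positive, negative, margin)
            + weight * cross_entropy(logits, label))
```

The triplet term only constrains relative distances within one triplet, whereas the classification term ties every feature to an identity label, which is one way to read the "stronger constraint" mentioned in the abstract.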

Fast genetic multi-operator image retargeting

IEEE Visual Communications and Image Processing (VCIP)

Authors: Lingling Zhu, Zhibo Chen (CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

ISBN: (print) 9781509053179

Content-aware image retargeting has attracted substantial research interest. However, there is still no method that can, in a relatively short time, adequately preserve important image content and structure without introducing conspicuous visible deformation. To address this problem, we propose a Fast Genetic Multi-operator (FGM) method that integrates multiple retargeting operators. To improve efficiency, the FGM method uses Genetic Algorithms (GAs) to reach the optimal operator ratio, with saliency and the Gray-Level Co-occurrence Matrix (GLCM) as its energy function. The FGM method not only preserves salient content and structure well, but also greatly reduces the computational complexity. Experimental results demonstrate that our method outperforms state-of-the-art image retargeting methods.

Keywords: Genetic algorithms; Biological cells; Genetics; Distortion; Computational complexity; Visualization

Efficient Integer-Arithmetic-Only Convolutional Networks with Bounded ReLU

IEEE International Symposium on Circuits and Systems (ISCAS)

Authors: Hengrui Zhao, Dong Liu, Houqiang Li (CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

To facilitate large-scale deployment of convolutional networks, integer-arithmetic-only inference has been demonstrated to be effective: it not only reduces computational cost but also ensures cross-platform consistency. However, previous studies on integer networks usually report a decline in inference accuracy, given the same number of parameters as floating-point-number (FPN) networks. In this paper, we propose to finetune and quantize a well-trained FPN convolutional network to obtain an integer convolutional network. Our key idea is to adjust the upper bound of a bounded rectified linear unit (ReLU), which replaces the normal ReLU and effectively controls the dynamic range of activations. Based on the tradeoff between the learning ability and the quantization error of networks, we manage to preserve full accuracy after quantization and obtain efficient integer networks. Our experiments on ResNet for image classification demonstrate that our 8-bit integer networks achieve state-of-the-art performance compared with Google's TensorFlow and NVIDIA's TensorRT. Moreover, we experiment on VDSR for image super-resolution and on VRCNN for compression artifact reduction, both of which are regression tasks that natively require high inference accuracy. Besides matching the performance of the corresponding FPN networks, our integer networks have only 1/4 of the memory cost and run 2× faster on GPUs.

Keywords: Upper bound; Quantization (signal); Image coding; Superresolution; Dynamic range; Task analysis; Image classification
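
The role of the bounded ReLU's upper bound in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: uniform quantization to unsigned `bits`-bit integers over the range [0, upper], with hypothetical helper names `quantize` and `dequantize`.

```python
def bounded_relu(x, upper):
    """ReLU clipped to the range [0, upper]."""
    return min(max(x, 0.0), upper)


def quantize(x, upper, bits=8):
    """Map an activation to an unsigned integer code.

    Assumes a uniform grid over [0, upper]; returns the integer code
    and the step size (scale) needed to map it back.
    """
    levels = (1 << bits) - 1          # 255 for 8 bits
    code = round(bounded_relu(x, upper) * levels / upper)
    return code, upper / levels


def dequantize(code, scale):
    """Recover the approximate real-valued activation."""
    return code * scale
```

A smaller `upper` gives a finer step and thus a smaller rounding error for values inside the range, but clips more activations; a larger `upper` clips less but rounds more coarsely. This is one way to picture the tradeoff the abstract refers to when it speaks of adjusting the upper bound.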

3D-HEVC visual quality assessment: Database and bitstream model

International Workshop on Quality of Multimedia Experience (QoMEX)

Authors: Wei Zhou, Ning Liao, Zhibo Chen, Weiping Li (CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

ISBN: (print) 9781509003556

Visual Quality Assessment of 3D/stereoscopic video (3D VQA) is significant for both quality monitoring and optimization of the existing 3D video services. In this paper, we build a 3D video database based on the latest 3D-HEVC video coding standard, to investigate the relationship among video quality, depth quality, and overall quality of experience (QoE) of 3D/stereoscopic video. We also analyze the pivotal factors to the video and depth qualities. Moreover, we develop a No-Reference 3D-HEVC bitstream-level objective video quality assessment model, which utilizes the key features extracted from the 3D video bitstreams to assess the perceived quality of the stereoscopic video. The model is verified to be effective on our database as compared with widely used 2D Full-Reference quality metrics as well as a state-of-the-art 3D FR pixel-level video quality metric.

Keywords: Quality assessment; Three-dimensional displays; Video recording; Databases; Stereo image processing; Visualization; Cameras

SDM: Semantic Distortion Measurement for Video Encryption

International Conference on Automatic Face and Gesture Recognition

Authors: Yongquan Hu, Wei Zhou, Shuxin Zhao, Zhibo Chen, Weiping Li (CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

Semantic information is important in video encryption. However, existing image quality assessment (IQA) methods, such as the peak signal to noise ratio (PSNR), are still widely applied to measure the encryption security. Generally, these traditional IQA methods aim to evaluate the image quality from the perspective of visual signal rather than semantic information. In this paper, we propose a novel semantic-level full-reference image quality assessment (FR-IQA) method named Semantic Distortion Measurement (SDM) to measure the degree of semantic distortion for video encryption. Then, based on a semantic saliency dataset, we verify that the proposed SDM method outperforms state-of-the-art algorithms. Furthermore, we construct a Region Of Semantic Saliency (ROSS) video encryption system to demonstrate the effectiveness of our proposed SDM method in the practical application.

Keywords: Semantics; Encryption; Distortion; Distortion measurement; Visualization; Object segmentation
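
For context on the signal-level baseline that the abstract above contrasts with SDM, the standard PSNR definition is sketched below. It operates on flat lists of pixel values and assumes 8-bit images (`peak=255`). It is not part of the SDM method, which the abstract does not specify in detail.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    sequences of pixel values."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Because PSNR depends only on per-pixel differences, it cannot tell whether the distorted pixels belong to a semantically important region, which is the limitation the abstract points to.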

Reinforced Bit Allocation under Task-Driven Semantic Distortion Metrics

arXiv, 2019

Authors: Shi, Jun; Chen, Zhibo (CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

Rapidly growing intelligent applications require optimized bit allocation in image/video coding to support specific task-driven scenarios such as detection, classification, and segmentation. Some learning-based frameworks have been proposed for this purpose because of their inherent end-to-end optimization mechanisms. However, it is still quite challenging to integrate these task-driven metrics seamlessly into the traditional hybrid coding framework. To the best of our knowledge, this paper is the first work to address this challenge with a reinforcement learning (RL) approach. Specifically, we formulate the bit allocation problem as a Markov decision process (MDP) and train RL agents to automatically decide the quantization parameter (QP) of each coding tree unit (CTU) for HEVC intra coding, according to the task-driven semantic distortion metrics. This bit allocation scheme can maximize the semantic-level fidelity of the task, such as classification accuracy, while minimizing the bit-rate. We also employ gradient-weighted class activation mapping (Grad-CAM) and Mask R-CNN tools to extract task-related importance maps to help the agents make decisions. Extensive experimental results demonstrate the superior performance of our approach, which achieves 43.1% to 73.2% bit-rate saving over the HEVC anchor under equivalent task-related distortions.

Keywords: Reinforcement learning

Improving Compression Artifact Reduction via End-to-End Learning of Side Information

IEEE Visual Communications and Image Processing (VCIP)

Authors: Haichuan Ma, Dong Liu, Feng Wu (CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

ISBN: (digital) 9781728180687

ISBN: (print) 9781728180694

We propose to improve neural network-based compression artifact reduction by transmitting side information for the neural network. The side information consists of artifact descriptors that are obtained by analyzing the original and compressed images in the encoder. In the decoder, the received descriptors are used as additional input to a well-designed conditional post-processing neural network. To reduce the transmission overhead, the entire model is optimized under the rate-distortion constraint via end-to-end learning. Experimental results show that introducing the side information greatly improves the ability of the post-processing neural network, and improves the rate-distortion performance.

Keywords: Image coding; Neural networks; Decoding; Training; Feature extraction; Computational modeling; Transform coding

Learned Scalable Image Compression with Bidirectional Context Disentanglement Network

arXiv, 2018

Authors: Zhang, Zhizheng; Chen, Zhibo; Lin, Jianxin; Li, Weiping (CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

In this paper, we propose a learned scalable/progressive image compression scheme based on deep neural networks (DNNs), named Bidirectional Context Disentanglement Network (BCD-Net). To learn hierarchical representations, we first adopt bit-plane decomposition to decompose the information coarsely before the deep-learning-based transformation. However, the information carried by different bit-planes is not only unequal in entropy but also of different importance for reconstruction. We thus take the hidden features corresponding to different bit-planes as the context and design a network topology with bidirectional flows to disentangle the contextual information for more effective compressed representations. Our proposed scheme enables us to obtain compressed codes with scalable rates via one-pass encoding and decoding. Experimental results demonstrate that our proposed model outperforms state-of-the-art DNN-based scalable image compression methods in both PSNR and MS-SSIM metrics. In addition, our proposed model achieves better performance in the MS-SSIM metric than conventional scalable image codecs. The effectiveness of our technical components is also verified through ablation experiments.

Keywords: Deep neural networks
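
The bit-plane decomposition mentioned in the abstract above can be illustrated with a small sketch. It assumes unsigned 8-bit pixel values held in a flat list and shows only the generic decomposition step; the learned transform and the bidirectional network are not modeled, and the function names are hypothetical.

```python
def bit_planes(pixels, bits=8):
    """Split pixel values into binary planes, most significant first."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits - 1, -1, -1)]


def reconstruct(planes, keep=None):
    """Rebuild pixel values from the first `keep` planes.

    Using fewer planes gives a coarser reconstruction, which is what
    makes the representation scalable (progressive).
    """
    bits = len(planes)
    keep = bits if keep is None else keep
    out = [0] * len(planes[0])
    for i, plane in enumerate(planes[:keep]):
        weight = 1 << (bits - 1 - i)   # place value of this plane
        for j, bit in enumerate(plane):
            out[j] += bit * weight
    return out
```

For example, keeping only the four most significant planes of the value 200 (binary 11001000) yields 192, which reflects the abstract's point that the planes differ in their importance for reconstruction.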

Global Homography Motion Compensation for Versatile Video Coding

IEEE Visual Communications and Image Processing (VCIP)

Authors: Yao Li, Zhuoyuan Li, Li Li, Dong Liu, Houqiang Li (CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

ISBN: (print) 9781665475938

In Versatile Video Coding (VVC), local affine motion compensation (LAMC) is adopted to handle complex motions, such as rotation and zooming. However, using LAMC to handle global motion is inefficient for two reasons. First, LAMC may incur extra bit cost for the affine motion model parameters. Second, the precision of LAMC is restricted by the MV precision of the control points. Therefore, in this paper, we propose a global homography motion compensation (GHMC) framework to better characterize global motion. For each coding block, an extra mode is added to perform motion compensation based on an 8-parameter global homography motion model. In addition, an extrapolation scheme is designed to derive the parameters from reference frames, saving the bit cost of signaling them. The proposed framework is implemented in the VVC reference software VTM-6.0. Experimental results show that, on average, 0.69% and 0.66% BD-rate reductions are achieved under the Low Delay P and Low Delay B configurations, respectively, for sequences with rich, complex global motion.

Keywords: Video coding; Extrapolation; Adaptation models; Costs; Image coding; Visual communication; Motion compensation
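
An 8-parameter homography motion model, as named in the abstract above, maps each pixel position through a projective transform. The sketch below uses the common parametrization in which the ninth matrix entry is fixed to 1. The abstract does not state which parametrization GHMC uses, so the parameter order and the function name `warp_point` are assumptions.

```python
def warp_point(x, y, h):
    """Map (x, y) through a homography given as eight parameters
    (h11, h12, h13, h21, h22, h23, h31, h32), with h33 fixed to 1."""
    h11, h12, h13, h21, h22, h23, h31, h32 = h
    w = h31 * x + h32 * y + 1.0        # projective denominator
    if w == 0:
        raise ZeroDivisionError("point maps to infinity")
    return ((h11 * x + h12 * y + h13) / w,
            (h21 * x + h22 * y + h23) / w)
```

With `h31 = h32 = 0` the denominator is 1 and the model reduces to a 6-parameter affine transform; the two extra parameters are what allow perspective effects in the global motion.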

Deeply Exploit Depth Information for Object Detection

IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Authors: Saihui Hou, Zilei Wang, Feng Wu (CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China)

This paper addresses the issue of how to coordinate depth with RGB more effectively in order to boost the performance of RGB-D object detection. In particular, we investigate two primary ideas under the CNN model: property derivation and property fusion. First, we propose that depth can be used not only as extra information besides RGB but also to derive more visual properties for comprehensively describing the objects of interest. A two-stage learning framework consisting of property derivation and fusion is therefore constructed. Here the properties can be derived either from the provided color/depth or from their pairs (e.g., the geometry contour adopted in this paper). Second, we explore how to fuse different properties in feature learning, which, under the CNN model, boils down to the layer at which the properties should be fused. The analysis shows that different semantic properties should be learned separately and combined before being passed to the final classifier. Such a detection approach is in accordance with the mechanism of the primary visual cortex (V1) in the brain. We experimentally evaluate the proposed method on a challenging dataset and achieve state-of-the-art performance.

Keywords: Object detection; Visualization; Image color analysis; Feature extraction; Geometry; Gravity; Computer vision
