Search Results - Inner Mongolia University Library

arXiv, 2018

Authors: Qian, Kun; Zhou, Jun; Xiong, Fengchao; Zhou, Huixin; Du, Juan. Lab of Optoelectronic Imaging and Image Processing, Xidian University, Xi’an, China; School of Information and Communication Technology, Griffith University, Brisbane, Australia; College of Computer Science, Zhejiang University, Hangzhou, China

Target tracking in hyperspectral videos is a new research topic. In this paper, a novel method based on a convolutional network and the Kernelized Correlation Filter (KCF) framework is presented for tracking objects of interest in hyperspectral videos. We extract a set of normalized three-dimensional cubes from the target region as fixed convolution filters which contain spectral information surrounding a target. The feature maps generated by convolutional operations are combined to form a three-dimensional representation of an object, thereby providing effective encoding of local spectral-spatial information. We show that a simple two-layer convolutional network is sufficient to learn robust representations without the need for offline training with a large dataset. In the tracking step, KCF is adopted to distinguish targets from the neighboring environment. Experimental results demonstrate that the proposed method performs well on sample hyperspectral videos, and outperforms several state-of-the-art methods tested on grayscale and color videos in the same scene. Copyright © 2018, The Authors. All rights reserved.

Keywords: Target tracking

Automatic exudate detection in color fundus images

13th International Forum of Digital TV and Wireless Multimedia communication, IFTC 2016

Authors: Qi, Fucong; Li, Guo; Zheng, Shibao. Institute of Image Communication and Network Engineering, Shanghai Key Labs of Digital Media Processing and Transmission, Shanghai Jiao Tong University, Shanghai 200240, China

ISBN: (print) 9789811042102

Diabetic retinopathy is a major cause of blindness in the working-age population, and exudates are considered the most significant characteristic of diabetic retinopathy. Therefore, automatic exudate detection is beneficial to large-scale diabetic retinopathy screening. In this paper, an automatic approach for detecting exudates in color fundus images is presented and discussed, based on a thresholding technique and Kirsch’s edge detection. In addition, a color space conversion step (from RGB to YIQ) is used to improve detection performance. The method is evaluated on a public dataset of fundus images from various ethnic groups. We obtain an average sensitivity of 75.17% and an average specificity of 97.98%, which outperforms the baseline method and validates the effectiveness of the proposed method. © Springer Nature Singapore Pte Ltd. 2017.

Keywords: Edge detection
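
Illustrative note (not taken from the paper): the abstract above names three ingredients, an RGB-to-YIQ conversion, Kirsch edge detection and thresholding. The sketch below shows one way these could fit together on a tiny image. Using only the Y channel, the threshold value, and the function names are assumptions made for illustration; the authors' actual pipeline and parameters are not given in this record.

```python
import colorsys

# Border positions of a 3x3 window, listed clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
# Kirsch weights along that ring: three consecutive 5s, then five -3s.
BASE = [5, 5, 5, -3, -3, -3, -3, -3]


def kirsch_kernels():
    """Return the eight Kirsch compass kernels as 3x3 nested lists."""
    kernels = []
    for shift in range(8):
        k = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]  # centre weight stays 0
        for i, (r, c) in enumerate(RING):
            k[r][c] = BASE[(i - shift) % 8]
        kernels.append(k)
    return kernels


def luminance(rgb_image):
    """Map an RGB image (floats in [0, 1]) to the Y channel of YIQ (assumed choice)."""
    return [[colorsys.rgb_to_yiq(*px)[0] for px in row] for row in rgb_image]


def kirsch_edges(gray, threshold):
    """Binary edge map: 1 where the maximum Kirsch response exceeds threshold.

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    h, w = len(gray), len(gray[0])
    kernels = kirsch_kernels()
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            best = max(
                sum(k[r][c] * gray[y + r - 1][x + c - 1]
                    for r in range(3) for c in range(3))
                for k in kernels
            )
            out[y][x] = 1 if best > threshold else 0
    return out


# Tiny example: three dark columns followed by two bright columns.
image = [[(0.0, 0.0, 0.0)] * 3 + [(1.0, 1.0, 1.0)] * 2 for _ in range(5)]
edges = kirsch_edges(luminance(image), threshold=10)
```

Each Kirsch kernel sums to zero, so flat regions give no response and only intensity transitions survive the threshold.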

An effective crowd property analysis system for video surveillance application

13th International Forum of Digital TV and Wireless Multimedia communication, IFTC 2016

Authors: Yang, Shuying; Yang, Hua; Li, Jijia; Zhu, Ji. Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Digital Media Processing and Transmission, Shanghai, China

ISBN: (print) 9789811042102

For public security, an intelligent video surveillance system that can analyze large-scale crowd scenes has become an urgent need. In this paper, we propose a system that integrates multiple crowd properties, including stationary and dynamic features, local and global characteristics, and historical statistics analysis, in a unified framework. Specifically, our system consists of four modules. The crowd density module describes the global density level and local density distribution with a sparse spatial-temporal local binary pattern. The crowd segmentation module presents both global crowd grouping and local moving directions based on spatial-temporal dynamics. In the crowd saliency module, salient regions are detected to raise alarms on abnormal behaviors. Finally, in order to analyze the historical features of the video stream, a historical statistics analysis module is introduced. Experiments on different crowd datasets show that our system is robust and feasible, and satisfies the requirements of video surveillance applications. © Springer Nature Singapore Pte Ltd. 2017.

Keywords: Local binary pattern

A time-efficient video stabilization algorithm based on Block Matching in a restricted search space

2017 IEEE International Conference on Real-Time Computing and Robotics, RCAR 2017

Authors: Joseph, Kevin; Raj, Alex Noel Joseph; Fan, Zhun; Vidhyapathi, C.M. School of Electronics Engineering, VIT University, India; Key Lab of Digital Signal and Image Processing of Guangdong Province, Shantou University, China

ISBN: (print) 9781538620342

In this paper, we study various video stabilization techniques and develop an algorithm which can perform video stabilization under strict time constraints. To do this, an optimized version of block matching in a restricted search space is utilized to minimize the use of computational resources. We also develop an experimental setup to do real-time video stabilization under various vibrating conditions. In this study, we have also compared our algorithm with an existing stabilization algorithm and looked at how the two techniques perform under different circumstances. © 2017 IEEE.

Keywords: Motion compensation
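
Illustrative note (not taken from the paper): block matching in a restricted search space can be sketched as an exhaustive search over a small window of candidate displacements. The sum-of-absolute-differences cost, the fixed `radius`, and the function names below are assumptions; the record does not say which cost or restriction the authors use.

```python
def sad(prev, curr, top, left, size, dy, dx):
    """Sum of absolute differences between a block in prev and the block
    displaced by (dy, dx) in curr."""
    return sum(
        abs(prev[top + r][left + c] - curr[top + r + dy][left + c + dx])
        for r in range(size) for c in range(size)
    )


def match_block(prev, curr, top, left, size, radius):
    """Return the (dy, dx) within +/-radius that minimises the SAD cost."""
    h, w = len(curr), len(curr[0])
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that would read outside the current frame.
            if not (0 <= top + dy and top + dy + size <= h
                    and 0 <= left + dx and left + dx + size <= w):
                continue
            cost = sad(prev, curr, top, left, size, dy, dx)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


# Synthetic frames: curr is prev shifted down by 1 row and right by 2 columns.
prev = [[10 * r + c for c in range(8)] for r in range(8)]
curr = [[prev[r - 1][c - 2] if r >= 1 and c >= 2 else 0 for c in range(8)]
        for r in range(8)]
motion = match_block(prev, curr, top=2, left=2, size=3, radius=2)
```

The number of cost evaluations grows with the square of `radius`, which is why keeping the search window small matters under strict time constraints.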

Controlling Expressivity using Input Codes in Neural Network based TTS

Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia)

Authors: Xiaolian Zhu; Lei Xie; Xiao Chen; Xiaoyan Lou; Xuan Zhu; Xingjun Tan. Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi’an; Hebei University of Economics and Business, Shijiazhuang, China; Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi’an, China; Language Computing Lab, Samsung R&D Institute of China, Beijing, China

ISBN: (print) 9781538653128

This paper presents a study on the use of input codes in neural network acoustic modeling for expressive TTS. Specifically, we use different kinds of input codes, augmented with the linguistic features, as the input of a BLSTM-based acoustic model, to control the expressivity of the synthesized speech. The input codes, in one-hot representation, include a dialogue code, a sentiment code and a sentence position code. The dialogue code indicates whether the text is dialogue or narration in an audiobook story. The sentiment code is obtained from a sentiment analysis tool, which labels each sentence as positive, negative or neutral. The sentence position code indicates the position of the sentence in the paragraph. We believe these codes are highly related to the expressiveness of the audiobook speech. Experiments on the data from the Blizzard Challenge 2017 demonstrate the effectiveness of the use of input codes in the neural network approach for expressive TTS.

Keywords: Neural networks; Linguistics; Acoustics; Hidden Markov models; Speech synthesis; Adaptation models; Speech coding
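
Illustrative note (not taken from the paper): the abstract describes one-hot input codes appended to linguistic features before they enter the acoustic model. A minimal sketch of that concatenation follows. The category lists, in particular the three position buckets, and the function names are hypothetical; the record does not specify the code sizes used by the authors.

```python
# Hypothetical category inventories for the three input codes.
DIALOGUE = ["narration", "dialogue"]
SENTIMENT = ["negative", "neutral", "positive"]
POSITION = ["first", "middle", "last"]  # assumed bucketing of sentence position


def one_hot(index, size):
    """Return a list of length size with a single 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def augment(linguistic, dialogue, sentiment, position):
    """Append the three one-hot input codes to a linguistic feature vector."""
    return (
        list(linguistic)
        + one_hot(DIALOGUE.index(dialogue), len(DIALOGUE))
        + one_hot(SENTIMENT.index(sentiment), len(SENTIMENT))
        + one_hot(POSITION.index(position), len(POSITION))
    )


# Two placeholder linguistic features followed by 2 + 3 + 3 code dimensions.
frame = augment([0.2, 0.7], "dialogue", "positive", "first")
```

At synthesis time, changing one of the three labels changes only its block of the input vector, which is what makes the codes usable as a control signal.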

Multimodal measurement of depression using deep learning models

7th Annual Workshop on Audio/Visual Emotion Challenge, AVEC 2017

Authors: Yang, Le; Pei, Ercheng; Jiang, Dongmei; Oveneke, Meshia Cédric; Xia, Xiaohan; Sahli, Hichem. Shaanxi Key Lab on Speech and Image Information Processing, 127 Youyi Xilu, Xi'an 710072, China; Pleinlaan 2, Brussels 1050, Belgium

ISBN: (print) 9781450355025

This paper addresses multi-modal depression analysis. We propose a multi-modal fusion framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. Our framework considers audio, video and text streams. For each modality, handcrafted feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information, then the learned features are fed to a DNN to predict the PHQ-8 scores. For multi-modal fusion, the estimated PHQ-8 scores from the three modalities are integrated in a DNN to obtain the final PHQ-8 score. Moreover, in this work, we propose new feature descriptors for text and video. For the text descriptors, we select the participant's answers to the questions associated with psychoanalytic aspects of depression, such as sleep disorder, and make use of the Paragraph Vector (PV) to learn the distributed representations of these sentences. For the video descriptors, we propose a new global descriptor, the Histogram of Displacement Range (HDR), calculated directly from the facial landmarks to measure their displacements and speed. Experiments have been carried out on the AVEC2017 depression sub-challenge dataset. The results show that the proposed depression recognition framework obtains very promising accuracy, with a root mean square error (RMSE) of 4.653 and a mean absolute error (MAE) of 3.980 on the development set, and an RMSE of 5.974 and an MAE of 5.163 on the test set. © 2017 Association for Computing Machinery.

Keywords: Deep neural networks

RGB-D based action recognition with light-weight 3D convolutional networks

arXiv, 2018

Authors: Zhang, Haokui; Li, Ying; Wang, Peng; Liu, Yu; Shen, Chunhua. Shaanxi Provincial Key Lab of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China; School of Computer Science, University of Adelaide, Adelaide, SA 5005, Australia

Unlike RGB videos, depth data in RGB-D videos provide key information complementary to tristimulus visual data, which could potentially improve accuracy for action recognition. However, most existing action recognition models use RGB videos alone, which limits their performance. Additionally, the state-of-the-art action recognition models, namely 3D convolutional neural networks (3D-CNNs), contain a very large number of parameters and suffer from computational inefficiency. In this paper, we propose a series of lightweight 3D architectures for action recognition based on RGB-D data. Compared with conventional 3D-CNN models, the proposed lightweight 3D-CNNs have considerably fewer parameters and lower computation cost, while achieving favorable recognition performance. Experimental results on two public benchmark datasets show that our models can approximate or outperform the state-of-the-art approaches. Specifically, on the NTU RGB+D (NTU) dataset, we achieve 93.2% and 97.6% for cross-subject and cross-view measurement, and on the Northwestern-UCLA Multiview Action 3D (N-UCLA) dataset, we achieve 95.5% cross-view accuracy. Copyright © 2018, The Authors. All rights reserved.

Keywords: Convolution

Coupled convolutional neural network with adaptive response function learning for unsupervised hyperspectral super-resolution

arXiv, 2020

Authors: Zheng, Ke; Gao, Lianru; Liao, Wenzhi; Hong, Danfeng; Zhang, Bing; Cui, Ximin; Chanussot, Jocelyn. Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; College of Geoscience and Surveying Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China; Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; Mol 2400, Belgium; Image Processing and Interpretation, IMEC Research Group, Ghent University, Ghent 9000, Belgium; Weling 82234, Germany; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China; College of Geoscience and Surveying Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China; Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble F-38000, France; Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, which hampers many applications of the imagery. Hyperspectral super-resolution refers to fusing HSI and multispectral imagery (MSI) to generate an image with both high spatial and high spectral resolution. Recently, several new methods have been proposed to solve this fusion problem, and most of these methods assume that the Point Spread Function (PSF) and Spectral Response Function (SRF) are known in advance. However, in practice, this information is often limited or unavailable. In this work, we propose HyCoNet, an unsupervised deep-learning-based fusion method that can solve the problems in HSI-MSI fusion without prior PSF and SRF information. HyCoNet consists of three coupled autoencoder nets in which the HSI and MSI are unmixed into endmembers and abundances based on the linear unmixing model. Two special convolutional layers are designed to act as a bridge that coordinates the three autoencoder nets, and the PSF and SRF parameters are learned adaptively in these two convolutional layers during training. Furthermore, driven by the joint loss function, the proposed method is straightforward and easily implemented in an end-to-end training manner. The experiments performed in the study demonstrate that the proposed method performs well and produces robust results for different datasets and arbitrary PSFs and SRFs. Copyright © 2020, The Authors. All rights reserved.

Keywords: Spectroscopy

Eye Movement Pattern Modeling and Visual Comfort Viewing S3D images

IEEE Visual communications and image processing (VCIP)

Authors: Chi Zhang; Jun Zhou; Xiao Gu; Shouchen Zhu; Alan C. Bovik. Institute of Image Communication & Network Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Lab of Digital Media Processing & Transmissions, Shanghai Jiao Tong University, China; Shanghai Yanan High School, Shanghai, China; Department of Electrical and Computer Engineering, The University of Texas at Austin, USA

ISBN: (print) 9781538644591; 9781538644584

Stereoscopic-3D (S3D) displays are widely used but present problems related to visual discomfort. One aspect of this issue is the movement of the gaze point within different depth fields. Here we aim to analyze the relationship between eye movement patterns and the visual comfort experienced when viewing S3D images. Rather than simply labeling eye movement data according to categories such as gaze, saccade and so on, we deploy a nonparametric Bayesian method to analyze and cluster several eye movement patterns, and to relate them to visual comfort. The results are relevant to the prediction of visual comfort in S3D images by automatic algorithms.

Keywords: Visualization; Hidden Markov models; Bayes methods; Brain modeling; Graphical models; Three-dimensional displays

A Novel LOS Rate Estimation Method Based on images for Strap-down Inertial Guidance

Journal of Physics: Conference Series, 2020, Vol. 1570, No. 1

Authors: Zheng Xu; Haibo Luo; Bin Hui; Zheng Chang. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China, Tel.: +86-024-2397-0757; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China; University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Science, Shenyang 110016, China; The Key Lab of Image Understanding and Computer Vision, Shenyang 110016, China

With the development of technology, precision-guided weapons are becoming more and more important in modern warfare. In order to deploy our recent guidance system on medium and small guided weapons, we propose a method to obtain the line-of-sight (LOS) rate by combining information from both a camera and a gyroscope. Specifically, we first calculate the body LOS angle by transforming the image pixel coordinate system into the image physical coordinate system according to the camera's internal parameters; we then subtract the missile motion information contained in the measurement signal of the seeker, and finally deduce the LOS rate. Compared with traditional gimballed seekers, our camera-based strap-down seekers significantly reduce cost and the influence of the external environment on the platform.
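
Illustrative note (not taken from the paper): the first step in the abstract, going from pixel coordinates to a body-frame LOS angle using the camera's internal parameters, can be sketched with a standard pinhole model. The parameter names (`fx`, `fy`, `cx`, `cy`), the axis convention (x to the right, y downward in the image) and the function name are assumptions; the subsequent gyroscope compensation is not shown because its sign conventions are not given in this record.

```python
import math


def body_los_angles(u, v, fx, fy, cx, cy):
    """Convert a pixel (u, v) to body-frame LOS angles in radians.

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point,
    as in the usual pinhole camera model (assumed here).
    """
    # Normalised image-plane coordinates at unit distance from the camera.
    x = (u - cx) / fx
    y = (v - cy) / fy
    azimuth = math.atan2(x, 1.0)                   # horizontal angle off boresight
    elevation = math.atan2(y, math.hypot(1.0, x))  # vertical angle, image-y direction
    return azimuth, elevation


# A target one focal length to the right of the principal point sits 45 degrees off axis.
az, el = body_los_angles(u=820, v=240, fx=500, fy=500, cx=320, cy=240)
```

Differencing such angles over successive frames gives a body-frame angular rate, from which the abstract's method removes the missile's own motion as measured by the gyroscope.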
