Target tracking in hyperspectral videos is a new research topic. In this paper, a novel method based on a convolutional network and the Kernelized Correlation Filter (KCF) framework is presented for tracking objects of inte...
Diabetic retinopathy is a major cause of blindness in the working-age population, and exudates are considered the most significant characteristic of diabetic retinopathy. Therefore, automatic exudate detection is benefici...
For public security, an intelligent video surveillance system that can analyze large-scale crowd scenes has become an urgent need. In this paper, we propose a system that integrates multiple crowd properties, includin...
In this paper, we study various video stabilization techniques and develop an algorithm which can perform video stabilization under strict time constraints. To do this, an optimized version of block matching in a rest...
ISBN: 9781538653128 (print)
This paper presents a study on the use of input codes in neural network acoustic modeling for expressive TTS. Specifically, we use different kinds of input codes, combined with the linguistic features, as the input to a BLSTM-based acoustic model in order to control the expressivity of the synthesized speech. The input codes, in one-hot representation, include a dialogue code, a sentiment code, and a sentence position code. The dialogue code indicates whether the text is dialogue or narration in an audiobook story. The sentiment code is obtained from a sentiment analysis tool, which labels each sentence as positive, negative, or neutral. The sentence position code indicates the position of the sentence in the paragraph. We believe these codes are highly related to the expressiveness of audiobook speech. Experiments on data from the Blizzard Challenge 2017 demonstrate the effectiveness of input codes in the neural network approach to expressive TTS.
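The input-code idea can be illustrated with a minimal sketch, which is not the authors' implementation. The label inventories (in particular the three coarse position bins), the function names, and the choice to repeat the sentence-level codes on every frame are assumptions made for illustration; the abstract only states that the codes are one-hot and are combined with the linguistic features.

```python
# Hypothetical label inventories; the abstract names the three codes
# but does not give their exact sizes or categories.
DIALOGUE = ["narration", "dialogue"]
SENTIMENT = ["negative", "neutral", "positive"]
POSITION = ["first", "middle", "last"]  # assumed coarse position bins


def one_hot(label, inventory):
    """Return a one-hot list for `label` within `inventory`."""
    vec = [0.0] * len(inventory)
    vec[inventory.index(label)] = 1.0
    return vec


def augment_features(frames, dialogue, sentiment, position):
    """Append the sentence-level input codes to every frame of linguistic features.

    frames: list of per-frame linguistic feature lists for one sentence.
    """
    codes = (
        one_hot(dialogue, DIALOGUE)
        + one_hot(sentiment, SENTIMENT)
        + one_hot(position, POSITION)
    )
    return [list(frame) + codes for frame in frames]


# Two toy frames with two linguistic features each.
frames = [[0.1, 0.2], [0.3, 0.4]]
augmented = augment_features(frames, "dialogue", "positive", "first")
print(len(augmented[0]))  # 2 linguistic features + 8 code dimensions
```

The augmented frames would then be fed to the acoustic model in place of the plain linguistic features.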
This paper addresses multi-modal depression analysis. We propose a multi-modal fusion framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. Our framework considers audio,...
Unlike RGB videos, RGB-D videos include depth data that provide key complementary information to the tristimulus visual data, which could potentially improve the accuracy of action recognition. However, most of the...
Due to the limitations of hyperspectral imaging systems, hyperspectral imagery (HSI) often suffers from poor spatial resolution, thus hampering many applications of the imagery. Hyperspectral super-resolution refers t...
ISBN: 9781538644591; 9781538644584 (print)
Stereoscopic-3D (S3D) displays are widely used but can cause visual discomfort for viewers. One aspect of this issue is the movement of the gaze point within different depth fields. Here we aim to analyze the relationship between eye movement patterns and the visual comfort experienced when viewing S3D images. Rather than simply labeling eye movement data according to categories such as gaze, saccade and so on, we deploy a nonparametric Bayesian method to analyze and cluster several eye movement patterns and to relate them to visual comfort. The results are relevant to the automatic prediction of visual comfort in S3D images.
With the development of technology, precision-guided weapons are becoming increasingly important in modern warfare. To deploy our recent guidance system on small and medium guided weapons, we propose a method to obtain the line-of-sight (LOS) rate by combining information from a camera and a gyroscope. Specifically, we first calculate the body LOS angle by transforming the image pixel coordinate system into the image physical coordinate system according to the camera's internal parameters; we then subtract the missile motion information contained in the seeker's measurement signal, and finally deduce the LOS rate. Compared with traditional gimballed seekers, our camera-based strap-down seekers significantly reduce cost and the influence of the external environment on the platform.
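The geometry described above can be sketched under simplifying assumptions; this is not the authors' algorithm. The sketch assumes an ideal pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` (hypothetical names), an image `v` axis that grows downward, and a small-angle, single-axis model in which the inertial LOS angle equals the body attitude angle plus the body LOS angle. Sign conventions and the finite-difference rate estimate are also assumptions.

```python
import math


def pixel_to_body_los(u, v, fx, fy, cx, cy):
    """Convert a pixel (u, v) to body LOS angles (azimuth, elevation) in radians.

    Assumes an ideal pinhole camera aligned with the body axes.
    """
    x = (u - cx) / fx  # normalized image-plane coordinate, horizontal
    y = (v - cy) / fy  # normalized image-plane coordinate, vertical
    azimuth = math.atan2(x, 1.0)
    elevation = math.atan2(-y, math.hypot(x, 1.0))  # image v grows downward
    return azimuth, elevation


def los_rate_single_axis(eps_prev, eps_curr, dt, body_rate):
    """Estimate the inertial LOS rate on one axis.

    Assumed model: inertial LOS angle = body attitude angle + body LOS angle,
    so the rate is the finite-difference body LOS rate plus the gyro-measured
    body rate, which removes the body motion embedded in the camera measurement.
    """
    return (eps_curr - eps_prev) / dt + body_rate


# A target 100 pixels right of the principal point with fx = 100 lies at 45 degrees azimuth.
az, el = pixel_to_body_los(420, 240, fx=100.0, fy=100.0, cx=320.0, cy=240.0)
print(math.degrees(az), math.degrees(el))
```

A real strap-down seeker would need a full three-axis rotation and filtering of the differentiated signal; the single-axis form is only meant to show where the camera and gyroscope terms enter.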