The parametric Bayesian Feature Enhancement (BFE) and a data-driven Denoising Autoencoder (DA) both bring performance gains in severe single-channel speech recognition conditions. The first can be adjusted to different...
Recently, state-of-the-art recognition accuracies for pose-invariant face recognition have been achieved by using 2D-Warping methods in a nearest-neighbor framework. However, the main drawback of these methods is their high computational complexity. In this paper we address this issue. We use a simple and fast method to get a rough estimate of a 2D-Warping. This estimate can then be used to apply an image-dependent warp range to the 2D-Warping algorithm, to limit the possible poses, or to preselect the most likely classes. With this method we are able to significantly reduce the runtime of a recently proposed 2D-Warping algorithm without sacrificing recognition accuracy.
In the tandem approach, the output of a neural network (NN) serves as input features to a Gaussian mixture model (GMM), aiming to improve the emission probability estimates. As has been shown in our previous work, a GMM with a pooled covariance matrix can be integrated into a neural network framework as a softmax layer with hidden variables, which allows for joint estimation of both neural network and Gaussian mixture parameters. Here, this approach is extended to include speaker adaptive training (SAT) by introducing a speaker-dependent neural network layer. Error backpropagation beyond this speaker-dependent layer simultaneously realizes the adaptive training of the Gaussian parameters and the optimization of the bottleneck (BN) tandem features of the underlying acoustic model. In this study, after initialization by constrained maximum likelihood linear regression (CMLLR), the speaker-dependent layer itself is kept constant during the joint training. Experiments show that the deeper backpropagation through the speaker-dependent layer is necessary for improved recognition performance. The speaker-adaptively and jointly trained BN-GMM yields a 5% relative improvement over a very strong speaker-independent hybrid baseline on the Quaero English broadcast news and conversations task and on the 300-hour Switchboard task.
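The link between a pooled-covariance GMM and a softmax layer with hidden variables can be sketched with a standard identity. The notation below is an assumption for illustration and is not taken from the abstract: x is the feature vector, s a state, i a mixture component, c_{si} the mixture weights, \mu_{si} the means, and \Sigma the pooled covariance.

```latex
% Assumed model: p(x \mid s) = \sum_i c_{si}\, \mathcal{N}(x \mid \mu_{si}, \Sigma), with state prior p(s).
% Because \Sigma is shared, the term -\tfrac{1}{2} x^\top \Sigma^{-1} x and the Gaussian
% normalization constant are identical for all (s, i) and cancel in the posterior:
\begin{align}
  p(s \mid x)
    &= \frac{p(s) \sum_i c_{si}\, \mathcal{N}(x \mid \mu_{si}, \Sigma)}
            {\sum_{s'} p(s') \sum_{i'} c_{s'i'}\, \mathcal{N}(x \mid \mu_{s'i'}, \Sigma)} \\
    &= \frac{\sum_i \exp\!\left(w_{si}^\top x + b_{si}\right)}
            {\sum_{s'} \sum_{i'} \exp\!\left(w_{s'i'}^\top x + b_{s'i'}\right)},
\end{align}
% with
\begin{align}
  w_{si} &= \Sigma^{-1} \mu_{si}, \\
  b_{si} &= \log\!\left(p(s)\, c_{si}\right) - \tfrac{1}{2}\, \mu_{si}^\top \Sigma^{-1} \mu_{si}.
\end{align}
```

Under these assumptions the posterior is a softmax over the pairs (s, i), summed over the hidden component index i, so the weights and biases can be trained by backpropagation together with the layers below.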
In this paper we present our approach to extracting profile information from anonymized tweets for the author profiling task at PAN 2015 [10]. In particular, we explore the versatility of random forest classifiers for predicting gender and age group, and of random forest regression for scoring important aspects of a user's personality. Furthermore, we propose a set of features tailored to this task, based on characteristics of tweets. Our approach relies in particular on features previously proposed for sentiment analysis tasks.
ISBN: 9781479919611 (print)
Social circle detection is a special case of community detection in social networks that is currently attracting growing interest in the research community. In this paper, we propose a two-step technique with emphasis on mapping the data by Restricted Boltzmann Machines (RBMs). Social circles are subsequently inferred by k-means over the preprocessed data. We define different vector representations from both structural egonet information and user profile features, and perform a set of tests to tune the parameters of the RBMs. We study and compare the performance on the ego-Facebook dataset of social circles from the Stanford Large Network Dataset Collection. We compare our results with several different baselines.
Multiple classifier systems are used to improve baseline results using different strategies. Bagging by design improves standard bagging by the minimization of intersection between the different ensembles. This work proposes the use of design bagging for continuous handwriting recognition. The design is performed using a multi-objective particle swarm optimizer. Hidden Markov Models and Long Short-Term Memory Recurrent Neural Networks are used to validate the proposed design. Experiments on English and French handwriting recognition with different setups show significant improvements.
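The quantity that design bagging tries to reduce can be illustrated with a small sketch. This is not the paper's multi-objective particle swarm design. It only compares the mean pairwise overlap of standard bootstrap bags with that of an extreme, fully disjoint design. All function names are hypothetical, and Jaccard overlap is an assumed measure of intersection.

```python
import random
from itertools import combinations


def bootstrap_bags(n_items, n_bags, seed=0):
    """Standard bagging: each bag samples n_items indices with replacement."""
    rng = random.Random(seed)
    return [set(rng.choices(range(n_items), k=n_items)) for _ in range(n_bags)]


def disjoint_bags(n_items, n_bags):
    """Extreme hand-made design (assumption): partition the indices, so no two bags intersect."""
    return [set(range(i, n_items, n_bags)) for i in range(n_bags)]


def mean_pairwise_overlap(bags):
    """Average Jaccard overlap |A & B| / |A | B| over all pairs of bags."""
    pairs = list(combinations(bags, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    standard = bootstrap_bags(n_items=100, n_bags=5)
    designed = disjoint_bags(n_items=100, n_bags=5)
    print("standard bagging overlap:", round(mean_pairwise_overlap(standard), 3))
    print("disjoint design overlap: ", mean_pairwise_overlap(designed))
```

A practical design sits between these two extremes, since fully disjoint bags shrink the training data available to each classifier. Balancing such competing goals is presumably why the abstract describes a multi-objective optimizer.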
Transcription of historical documents is an important task for libraries that want to make their holdings available. In recent years, the use of Handwritten Text Recognition has allowed paleographers to speed up the manual transcription process, since they can correct a draft transcription instead of starting from scratch. An alternative is to obtain the draft transcription by dictating the contents to an Automatic Speech Recognition system. When both sources (image and speech) are available, a multimodal combination is possible, and an iterative process can be used to refine the final hypothesis. In this work, a multimodal combination based on confusion networks is presented. Results on two data sets of different difficulty levels show that the proposed technique provides similar or better draft transcriptions than a previously proposed approach, allowing for a faster transcription process.
Text line segmentation is the process by which text lines in a document image are localized and extracted. It is an important step in off-line Handwritten Text Recognition (HTR) given that the input of these systems i...
ISBN: 9781467369985 (print)
In this work, multiple hierarchical language modeling strategies for a zero OOV rate large vocabulary continuous speech recognition system are investigated. In our previously proposed hierarchical approach, a full-word language model and a context-independent character-level LM (CLM) are used directly during search. The novelty of this work is to jointly model the character-level prior and the pronunciation probabilities, to introduce across-word context into the character-level LM, and to properly normalize the character-level LM using prefix-tree based normalization for the hierarchical approach. Significant reductions in terms of word error rate (WER) on the best full-word Quaero Polish LVCSR system are reported.
ISBN: 9782951740884 (print)
This paper introduces the RWTH-PHOENIX-Weather 2014, a video-based, large vocabulary, German sign language corpus which has been extended over the last two years, tripling the size of the original corpus. The corpus contains weather forecasts simultaneously interpreted into sign language, which were recorded from German public TV and manually annotated using glosses on the sentence level, together with semi-automatically transcribed spoken German extracted from the videos using the open-source speech recognition system RASR. Spatial annotations of the signers' hands as well as shape and orientation annotations of the dominant hand have been added for more than 40k and 10k video frames, respectively, creating one of the largest corpora allowing for quantitative evaluation of object tracking algorithms. Further, over 2k signs have been annotated using the SignWriting annotation system, focusing on the shape, orientation, and movement of both hands as well as their spatial contacts. Finally, extended recognition and translation setups are defined, and baseline results are presented.