The SHREC'13 Track: Retrieval of Objects Captured with Low-Cost Depth-Sensing Cameras is a first attempt at evaluating the effectiveness of 3D shape retrieval algorithms in low fidelity model databases, such as th...
详细信息
With a few exceptions, extensions to latent Dirichlet allocation (LDA) have focused on the distribution over topics for each document. Much less attention has been given to the underlying structure of the topics thems...
详细信息
This paper introduces the theory of factor analysis of the mixture of Auto-Associative Neural Networks (AANNs) with application in speaker verification. First, we formulate the problem of learning a low-dimensional su...
详细信息
The impact of Short Utterances in Speaker Recognition is of significant importance. Despite the advancements in short utterance speaker recognition (SUSR), text dependence and the role of phonemes in carrying speaker ...
详细信息
Short Utterance Speaker Recognition (SUSR) is an important area of speaker recognition when only small amount of speech data is available for testing and training. We list the most commonly used state-of-the-art metho...
详细信息
We present a new approach of using Auto-Associative Neural Networks (AANNs) in the conventional GMM speaker verification framework with i-vector feature extraction and PLDA modeling. In this technique, an i-vector fea...
详细信息
We introduce a new approach to training multilayer perceptrons (MLPs) for large vocabulary continuous speech recognition (LVCSR) in new languages which have only few hours of annotated in-domain training data (for exa...
详细信息
We introduce a new approach to training multilayer perceptrons (MLPs) for large vocabulary continuous speech recognition (LVCSR) in new languages which have only few hours of annotated in-domain training data (for example, 1 hour of data). In our approach, large amounts of annotated out-of-domain data from multiple languages are used to train multilingual MLP systems without dealing with the different phoneme sets for these languages. Features extracted from these MLP systems are used to train LVCSR systems in the low-resource language similar to the Tandem approach. In our experiments, the proposed features provide a relative improvement of about 30% in an low-resource LVCSR setting with only one hour of training data.
In the real world, natural conversational speech is an amalgam of speech segments, silences and environmental/ background and channel effects. Labeling the different regions of an acoustic signal according to their in...
详细信息
In the real world, natural conversational speech is an amalgam of speech segments, silences and environmental/ background and channel effects. Labeling the different regions of an acoustic signal according to their information levels would greatly benefit all automatic speechprocessing tasks. In the current work, we propose a novel segmentation approach based on a perception-based measure of speech intelligibility. Unlike segmentation approaches based on various forms of voice-activity detection (VAD), the proposed parsing approach exploits higher-level perceptual information about signal intelligibility levels. This labeling information is integrated into a novel multilevel framework for automatic speaker recognition task. The system processes the input acoustic signal along independent streams reflecting various levels of intelligibility and then fusing the decision scores from the multiple steams according to their intelligibility contribution. Our results show that the proposed system achieves significant improvements over standard baseline and VAD-based approaches, and attains a performance similar to the one obtained with oracle speech segmentation information.
In this paper, we propose to evaluate the quality of emotional speech synthesis by means of an automatic emotion identification system. We test this approach using five different parametric speech synthesis systems, r...
详细信息
ISBN:
(纸本)9787560848693
In this paper, we propose to evaluate the quality of emotional speech synthesis by means of an automatic emotion identification system. We test this approach using five different parametric speech synthesis systems, ranging from plain non-emotional synthesis to full re-synthesis of pre-recorded speech. We compare the results achieved with the automatic system to those of human perception tests. While preliminary, our results indicate that automatic emotion identification can be used to assess the quality of emotional speech synthesis, potentially replacing time consuming and expensive human perception tests.
For better performance in multilayer or hierarchical classification of handwritten text, appropriate grouping of similar symbols is very important. Here we aim to develop a reliable grouping schema for the similar loo...
详细信息
For better performance in multilayer or hierarchical classification of handwritten text, appropriate grouping of similar symbols is very important. Here we aim to develop a reliable grouping schema for the similar looking basic characters, numerals and vowel modifiers of Bangla language. We experimented with thickened and thinned segmented handwritten text to compare which type of image is better for which group. For classification we chose Support Vector Machine (SVM) as it outperforms other classifiers in this field. We used both “one against one” and “one against all” strategies for multiclass SVM and compared their performance.
暂无评论