For better performance in multilayer or hierarchical classification of handwritten text, appropriate grouping of similar symbols is very important. Here we aim to develop a reliable grouping scheme for the similar-looking basic characters, numerals, and vowel modifiers of the Bangla language. We experimented with thickened and thinned segmented handwritten text to determine which type of image works better for each group. For classification we chose the Support Vector Machine (SVM), as it outperforms other classifiers in this field. We used both the “one against one” and “one against all” strategies for multiclass SVM and compared their performance.
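The two multiclass strategies named in this abstract differ in how many binary classifiers they train and in how the binary decisions are combined. The following is a minimal sketch of that combination logic only, not the authors' implementation: the function names are hypothetical, and the binary SVM outputs are assumed to be supplied by the caller.

```python
from itertools import combinations


def one_vs_one_pairs(classes):
    """List every unordered class pair; one binary classifier is trained per pair."""
    return list(combinations(classes, 2))


def one_vs_one_predict(pair_winners):
    """Majority vote over pairwise decisions.

    pair_winners maps a class pair (a, b) to the label its binary classifier chose.
    Ties are broken by label order, which is an arbitrary choice for this sketch.
    """
    votes = {}
    for winner in pair_winners.values():
        votes[winner] = votes.get(winner, 0) + 1
    return max(sorted(votes), key=lambda label: votes[label])


def one_vs_all_predict(scores):
    """Pick the class whose 'this class vs. the rest' classifier scores highest.

    scores maps each label to the decision value of its binary classifier.
    """
    return max(sorted(scores), key=lambda label: scores[label])
```

For k classes, "one against one" needs k(k-1)/2 binary classifiers while "one against all" needs k, so ten numerals would require 45 and 10 classifiers respectively.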
Short Utterance Speaker Recognition (SUSR) is an important area of speaker recognition that addresses cases where only a small amount of speech data is available for training and testing. We list the most commonly used state-of-the-art methods of speaker recognition and discuss the significance of prosodic speaker recognition. We then present a short survey of SUSR, highlighting the methodologies used to recognize speakers from short utterances. We also identify future research directions in SUSR which, together with modern technologies and ongoing research in prosodic speaker recognition, can lead to better results in speaker recognition.
Short utterances have a significant impact on speaker recognition. Despite the advancements in Short Utterance Speaker Recognition (SUSR), text dependence and the role of phonemes in carrying speaker information need further investigation. This paper presents a novel method of using vowel categories for SUSR. We define Vowel Categories (VCs) for the Chinese and English languages. After phonemes are recognized and extracted, the obtained vowels are divided into VCs, which are then used to develop a Universal Background VC Model (UBVCM) for each VC. A conventional GMM-UBM system is used for training and testing. The proposed categories give minimum EERs of 13.76%, 14.03% and 16.18% for utterances of 3, 2 and 1 second, respectively. Experimental results show that in text-dependent SUSR, significant speaker-specific information is present at the phoneme level. Because phonemes share similar properties, accurate speech recognition is not required; rather, phoneme categories can be used effectively for SUSR. It is also shown that vowels contain a large amount of speaker information, which remains undisturbed when VCs are employed.
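The results in these abstracts are reported as equal error rates (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of how such a figure can be estimated from verification scores, here is a minimal sketch; the function name and the threshold sweep over observed scores are assumptions of this example, not the evaluation code behind the reported numbers.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False rejection: a genuine (target) trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer
```

With few trials the two error curves may not cross exactly, which is why the sketch averages them at the closest point; larger evaluations typically interpolate instead.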
In Short Utterance Speaker Recognition (SUSR), the role of complete speech units such as syllables in carrying speaker information needs further investigation. This paper presents a novel method of using syllable categories for SUSR. We define Syllable Categories (SCs) based on the syllable structure of the Chinese language. Syllables in speech are segmented into SCs, which are then used to develop a Universal Background SC Model for each SC. A conventional GMM-UBM system is used for training and testing. The proposed categories give average EERs of 17.79%, 19.35% and 21.65% for test utterance lengths of 3, 2 and 1 second, respectively. Experimental results show that in text-dependent SUSR, significant speaker-specific information is present at the syllable level, where prosodic idiosyncrasies can be utilized. This information can be used in SUSR by exploiting similarities in the consonants and vowels of a syllable, so that SCs can be used effectively.
In this paper, we present a supervised learning method to seek out answers to the most frequently asked descriptive questions: reason, method, and definition questions. Most of the previous systems for question answer...
Text-Dependent Speaker Recognition (TDSR) is widely used nowadays. Short-term features such as Mel-Frequency Cepstral Coefficients (MFCC) have been the dominant features in traditional Dynamic Time Warping (DTW) based TDSR systems. Short-term features capture local temporal dynamics well but represent the overall statistical characteristics of a sentence poorly. Functional Data Analysis (FDA) has been shown to offer a significant advantage in exploring the statistical information of data, so this paper proposes a long-term feature extraction method based on MFCC and FDA theory. The extraction procedure consists of the following steps: first, FDA theory is applied after MFCC feature extraction; second, to compress redundant information, a new feature based on Functional Principal Component Analysis (FPCA) is generated; third, the distance between training features and test features is calculated for use in the recognition procedure. Experimental results show that the new features extracted with the proposed method, combined with the cosine similarity measure, perform better than the existing MFCC plus DTW method.
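The comparison in this abstract contrasts two ways of scoring a test utterance against enrollment data: DTW aligns two variable-length sequences of short-term frames, while cosine similarity compares two fixed-length feature vectors. The sketch below shows the textbook form of both measures under simple assumptions (frames are plain lists of floats, Euclidean frame distance, no DTW path constraints); it does not reproduce the FDA or FPCA feature extraction, whose details the abstract does not give.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two fixed-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best alignment between two frame sequences.

    Each sequence is a list of equal-dimension frames (for example MFCC vectors).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

DTW costs grow with the product of the two sequence lengths, whereas a cosine score on a compressed long-term feature is a single pass over one vector pair, which is one practical reason to prefer a fixed-length representation.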
The past few years have seen an increasing interest in using Amazon's Mechanical Turk for purposes of collecting data and performing annotation tasks. One such task is the mass evaluation of system output in a var...
Delimiting the most informative voice segments of an acoustic signal is often a crucial initial step for any speech processing system. In the current work, we propose a novel segmentation approach based on a perceptio...
Performance degradation over time is a generally acknowledged phenomenon in speaker recognition, and it is widely assumed that speaker models should be updated from time to time to remain representative. However, such updating is costly, user-unfriendly, and sometimes unrealistic, which hinders practical application of the technology. From a pattern recognition point of view, the time-varying issue in speaker recognition calls for features that are speaker-specific and as stable as possible across sessions recorded at different times. Therefore, after searching for and analyzing the most stable parts of the feature space, a discrimination-emphasized Mel-frequency warping method is proposed. In the implementation, each frequency band is assigned a discrimination score that takes into account both speaker and session information, and Mel-frequency warping is applied during feature extraction to emphasize bands with higher scores. Experimental results on a time-varying voiceprint database show that this method not only improves speaker recognition performance, with an EER reduction of 19.1%, but also alleviates the performance degradation caused by time variation, with a reduction of 8.9%.
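For context, the warping being modified here is the standard mel scale, which spaces filter bands narrowly at low frequencies and widely at high ones. The sketch below shows only that conventional baseline, using the common 2595 log10(1 + f/700) formula; how the discrimination scores reshape the warping is not specified in the abstract, so it is not modeled, and the helper names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale (common 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_low, f_high, n_bands):
    """Edge frequencies in Hz for n_bands filters spaced evenly on the mel scale.

    Returns n_bands + 2 points, since each triangular filter spans three edges.
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + k * step) for k in range(n_bands + 2)]
```

A score-driven variant would presumably allocate finer resolution to bands with higher discrimination scores instead of following this fixed curve, but that is an interpretation of the abstract rather than a documented detail.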
Many factors contribute to performance degradation in a deployed speaker recognition system. A mismatch in the target speaker's speaking style between training and testing is an important one. When a client enrolls in a system, it is natural for him or her to speak spontaneously. However, it is difficult to maintain the same speaking style throughout the test phases. In view of this, this paper uses a database with multiple speaking styles to propose the concept of a Speaking-style-Dependent Background Model (SDBM). The SDBM-based system trains speaker models that reflect speaking style, aiming to alleviate the speaking-style mismatch between training and testing. Experimental results show that the EER can be reduced by 35.40%.