In Short Utterance Speaker Recognition (SUSR), the role of complete speech units such as syllables in carrying speaker information needs further investigation. This paper presents a novel method of using syllable categories for SUSR. We define Syllable Categories (SCs) based on the syllable structure of the Chinese language. Syllables in speech are segmented into SCs, which are then used to develop a Universal Background SC Model for each SC. A conventional GMM-UBM system is used for training and testing. The proposed categories give average EERs of 17.79%, 19.35%, and 21.65% for test utterance lengths of 3, 2, and 1 second, respectively. Experimental results show that in text-dependent SUSR, significant speaker-specific information is present at the syllable level, where prosodic idiosyncrasies can be utilized. This information can be used in SUSR by exploiting similarities in the consonants and vowels of a syllable, so that SCs can be used effectively.
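For readers unfamiliar with the EER figures quoted above, the following is a minimal sketch of how an equal error rate can be estimated from verification scores. The function name, the threshold sweep, and the toy scores are assumptions made for illustration; the abstract does not describe the authors' scoring code.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from two lists of scores.

    genuine  -- scores of trials where the claimed speaker is correct
    impostor -- scores of trials where the claimed speaker is wrong
    A higher score is assumed to mean "more likely the same speaker".
    """
    best_gap, eer = float("inf"), 1.0
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is taken where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical toy scores, not data from the paper.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))
```

With few trials the estimate is coarse, because the two error rates can only change in steps of one trial.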
Text-Dependent Speaker Recognition (TDSR) is widely used nowadays. Short-term features such as Mel-Frequency Cepstral Coefficients (MFCC) have been the dominant features in traditional Dynamic Time Warping (DTW) based TDSR systems. Short-term features capture local temporal dynamics well but represent the overall statistical characteristics of a sentence poorly. Functional Data Analysis (FDA) has been shown to offer a significant advantage in exploring the statistical information of data, so in this paper a long-term feature extraction method based on MFCC and FDA theory is proposed. The extraction procedure consists of the following steps: first, FDA theory is applied after MFCC feature extraction; second, to compress redundant information, a new feature based on Functional Principal Component Analysis (FPCA) is generated; third, the distance between training features and test features is calculated for use in the recognition procedure. Compared with the existing MFCC plus DTW method, experimental results show that the new features extracted with the proposed method, combined with the cosine similarity measure, demonstrate better performance.
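The two comparison measures named in this abstract can be sketched as follows. This is an illustrative sketch only: it assumes plain Python lists as inputs, uses one-dimensional sequences for DTW instead of MFCC frame vectors, and does not implement the FDA or FPCA steps.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two fixed-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dtw_distance(x, y):
    """Classic dynamic time warping cost between two 1-D sequences.

    The sequences may differ in length; the local cost is the
    absolute difference between aligned samples.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical toy inputs: a sequence and a time-stretched copy of it.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

DTW compares variable-length frame sequences directly, whereas cosine similarity needs one fixed-length vector per utterance, which is the kind of representation a long-term feature provides.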
ISBN: 9781622765034 (print)
Many linguistic and textual processes involve transduction of strings. We show how to learn a stochastic transducer from an unorganized collection of strings (rather than string pairs). The role of the transducer is to organize the collection. Our generative model explains similarities among the strings by supposing that some strings in the collection were not generated ab initio, but were instead derived by transduction from other, "similar" strings in the collection. Our variational EM learning algorithm alternately reestimates this phylogeny and the transducer parameters. The final learned transducer can quickly link any test name into the final phylogeny, thereby locating variants of the test name. We find that our method can effectively find name variants in a corpus of web strings used to refer to persons in Wikipedia, improving over standard untrained distances such as Jaro-Winkler and Levenshtein distance.
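As a point of reference for the untrained baselines mentioned above, here is a minimal sketch of Levenshtein distance and of using it to rank candidate name variants. The helper `closest_variants` and the example names are hypothetical; this is the baseline distance, not the learned transducer the abstract describes.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def closest_variants(name, collection, k=3):
    """Hypothetical helper: the k strings nearest to name by edit distance."""
    return sorted(collection, key=lambda s: levenshtein(name, s))[:k]


# Hypothetical example strings, not taken from the paper's corpus.
names = ["Jon Smith", "John Smith", "Joan Smythe", "Maria Garcia"]
print(closest_variants("John Smith", names, k=2))
```

An untrained distance of this kind weights every edit equally, which is the limitation a transducer with learned edit probabilities is meant to address.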
Authors: Ondrej Bojar, Dekai Wu
Affiliations: Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics; HKUST, Human Language Technology Center, Department of Computer Science and Engineering, Hong Kong University of Science and Technology
ISBN: 9781627483452 (print)
HMEANT (Lo and Wu, 2011a) is a manual MT evaluation technique that focuses on the predicate-argument structure of the sentence. We relate HMEANT to an established linguistic theory, highlighting the possibilities of reusing existing knowledge and resources for interpreting and automating HMEANT. We apply HMEANT to a new language, Czech, by evaluating a set of English-to-Czech MT systems. HMEANT proves to correlate with manual rankings at the sentence level better than a range of automatic metrics. However, the main contribution of this paper is the identification of several issues with HMEANT annotation and our proposal on how to resolve them.
ISBN: 9781467314886 (print)
Many analysis techniques are currently available to identify the signaling pathways significantly impacted in a given condition. All these approaches calculate a p-value that aims to quantify the significance of the involvement of a given pathway in the condition under study. These p-values were thought to be related to the likelihood of their respective pathways being involved in the given condition, and to be independent. Here we show that this is not true: many pathways are not independent and can considerably affect each other's p-values through a phenomenon we refer to as "cross-talk." Thus, the significance of a given pathway in a given experiment has to be interpreted in the context of the other pathways that appear to be significant. Using real data, we show that in some cases pathways with significant classical p-values are not biologically meaningful, and that some biologically meaningful pathways with insignificant p-values become significant when the cross-talk effects of other pathways are removed. We show that this phenomenon is related to the number of genes shared between different pathways and affects the most widely used methods for pathway analysis, and we propose an analysis technique that is able to correct the over-enrichment significance of a pathway when the cross-talk effects of other pathways are removed.
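The dependence between pathway p-values can be illustrated with a classical over-representation test. The sketch below assumes a one-sided hypergeometric test and a deliberately crude "correction" that simply drops genes shared with another pathway; all gene names, pathway definitions, and function names are hypothetical, and this is not the correction technique proposed in the paper.

```python
from math import comb


def enrichment_p_value(universe, pathway, de_total, de_in_pathway):
    """One-sided hypergeometric p-value P(X >= de_in_pathway).

    universe      -- number of genes measured
    pathway       -- number of those genes that belong to the pathway
    de_total      -- number of differentially expressed (DE) genes
    de_in_pathway -- number of DE genes that fall in the pathway
    """
    upper = min(pathway, de_total)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, de_total - k)
        for k in range(de_in_pathway, upper + 1)
    )
    return tail / comb(universe, de_total)


def pathway_p(pathway_genes, de_genes, universe_genes):
    """Convenience wrapper working on sets of gene identifiers."""
    members = pathway_genes & universe_genes
    return enrichment_p_value(
        len(universe_genes), len(members),
        len(de_genes & universe_genes), len(members & de_genes),
    )


# Hypothetical toy data: 20 genes, 5 of them differentially expressed.
universe = {f"g{i}" for i in range(20)}
de = {"g0", "g1", "g2", "g3", "g4"}
pathway_a = {"g0", "g1", "g2", "g3", "g5"}
pathway_b = {"g0", "g1", "g2", "g10", "g11", "g12"}

p_b = pathway_p(pathway_b, de, universe)
# Crude illustration of cross-talk: every DE gene in pathway B is also
# in pathway A, so B's apparent enrichment vanishes once they are removed.
p_b_without_a = pathway_p(pathway_b - pathway_a, de, universe)
print(p_b, p_b_without_a)
```

In this toy case pathway B looks moderately enriched only because of genes it shares with pathway A, which mirrors the dependence the abstract describes.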
The recursive least squares (RLS) algorithm is well known and has been widely used for many years. Most analyses of RLS have assumed statistical properties of the data or the noise process, but recent robust H∞ analyses have been used to bound the ratio of the performance of the algorithm to the total noise. In this paper, we provide an additive analysis bounding the difference between performance and noise. Our analysis provides additional convergence guarantees in general, and particular benefits for structured input data. We illustrate the analysis using human speech and white noise.
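For context, the standard RLS recursion that such analyses study can be sketched as below. The forgetting factor `lam`, the initialization constant `delta`, and the synthetic noise-free data are assumptions chosen for illustration; the sketch shows the textbook update only, not the additive bound derived in the paper.

```python
import math


def rls(samples, dim, lam=1.0, delta=1000.0):
    """Textbook recursive least squares.

    samples -- iterable of (x, d) pairs: input vector and desired output
    dim     -- length of each input vector
    lam     -- forgetting factor (1.0 means no forgetting)
    delta   -- scale of the initial inverse-correlation matrix
    Returns the final weight vector.
    """
    w = [0.0] * dim
    # P approximates the inverse of the input correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for x, d in samples:
        Px = [sum(P[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        denom = lam + sum(x[i] * Px[i] for i in range(dim))
        gain = [v / denom for v in Px]
        # A priori error: prediction made before seeing d.
        err = d - sum(w[i] * x[i] for i in range(dim))
        w = [w[i] + gain[i] * err for i in range(dim)]
        # P stays symmetric, so x^T P equals Px transposed.
        P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(dim)]
             for i in range(dim)]
    return w


# Hypothetical noise-free data generated by the weights [2, -3].
true_w = [2.0, -3.0]
data = []
for k in range(50):
    x = [math.sin(0.7 * k), math.cos(1.3 * k)]
    data.append((x, true_w[0] * x[0] + true_w[1] * x[1]))

print(rls(data, dim=2))
```

With `lam=1.0` the recursion matches a lightly regularized batch least-squares solution, so on noise-free data the estimate should land very close to the generating weights.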
We present an analysis of music modeling and recognition techniques in the context of mobile music matching, substantially improving on the techniques presented in [1]. We accomplish this by adapting the features specifically to this task, and by introducing new modeling techniques that enable using a corpus of noisy and channel-distorted data to improve mobile music recognition quality. We report the results of an extensive empirical investigation of the system's robustness under realistic channel effects and distortions. We show an improvement of recognition accuracy by explicit duration modeling of music phonemes and by integrating the expected noise environment into the training process. Finally, we propose the use of frame-to-phoneme alignment for high-level structure analysis of polyphonic music.
ISBN: 9781467314886 (print)
We suggest a multivariate genotype-phenotype association test for functional magnetic resonance imaging (fMRI) data. The method uses a voxel selection and ranking scheme based on iterative adaptive Lasso for defining a functional region of interest. A classifier-based test is used to assess the significance of potential associations between the differential activity pattern within that region and a set of candidate genetic variants. We applied the method to a small sample dataset from an ongoing imaging genetics study and identified a significant genetic association to a stimulus-locked imaging phenotype.