Accurate unsupervised learning of phonemes of a language directly from speech is demonstrated via an algorithm for joint unsupervised learning of the topology and parameters of a hidden Markov model (HMM); states and s...
Knowing the degree of antonymy between words has widespread applications in natural language processing. Manually-created lexicons have limited coverage and do not include most semantically contrasting word pairs. We ...
Authors: Lei Xie, Guangsen Wang
Audio, Speech and Language Processing Group, School of Computer Science, Northwestern Polytechnical University, Xi'an, China
ISBN: 9781424429424 (print)
This paper presents a two-stage multi-feature integration approach for unsupervised speaker change detection in real-time news broadcasting. We integrate MFCC and LSP features (i.e., a perceptual feature plus an articulatory feature) in the metric-based potential speaker change detection stage to collect as many speaker boundary candidates as possible. We adopt a weighted Bayesian information criterion (BIC) to integrate boundary decisions from MFCC and LSP features in the speaker boundary confirmation stage. This multi-feature integration strategy makes use of the complementarity between perceptual features and articulatory features to achieve a performance gain. Speaker change detection experiments show that the multi-feature integration approach significantly outperforms the individual features, with relative improvements of 26% over the LSP-only approach and 6% over the MFCC-only approach.
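The boundary confirmation step can be illustrated with the standard delta-BIC test for a hypothesised change point. The sketch below is an assumption-laden illustration, not the paper's method: the abstract does not give the exact form of the weighted BIC, so `fused_delta_bic` and its weight `w_mfcc` are hypothetical, and the feature matrices are assumed to be NumPy arrays with one frame per row.

```python
import numpy as np

def delta_bic(frames, split, penalty_weight=1.0):
    """Standard delta-BIC for a candidate change point at row index `split`.

    A positive value favours modelling the two sides with separate
    full-covariance Gaussians, i.e. it supports a speaker change.
    """
    n, d = frames.shape
    left, right = frames[:split], frames[split:]

    def logdet_cov(x):
        # A small ridge keeps the covariance matrix well conditioned.
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    gain = 0.5 * (n * logdet_cov(frames)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))
    # Penalty for the extra mean vector and covariance matrix.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty

def fused_delta_bic(mfcc, lsp, split, w_mfcc=0.5):
    """Hypothetical fusion: a weighted sum of the per-feature delta-BIC scores."""
    return w_mfcc * delta_bic(mfcc, split) + (1.0 - w_mfcc) * delta_bic(lsp, split)
```

A candidate boundary would be kept when the fused score is positive; the weight and the penalty factor would need tuning on held-out data.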
Authors: Yulian Yang, Lei Xie
Audio, Speech and Language Processing Group, School of Computer Science, Northwestern Polytechnical University, Xi'an, China
ISBN: 9781424429424 (print)
This paper proposes to perform latent semantic analysis (LSA) on character/syllable n-gram sequences of automatic speech recognition (ASR) transcripts, namely subword LSA, as an extension of our previous work on subword text tiling for automatic story segmentation of Chinese broadcast news. LSA represents the 'meaning' of a lexical term by a feature vector conveying the term's relations with other terms. We apply subword LSA vectors to the measurement of the inter-sentence lexical score in text-tiling-based story segmentation. Subword n-grams are robust to speech recognition errors, especially out-of-vocabulary (OOV) words, in lexical matching on Chinese ASR transcripts. This work combines the concept-matching merit of LSA with the robustness of subwords. Experimental results on the TDT2 Mandarin corpus show that subword-LSA-based text tiling can effectively improve story segmentation performance. Character-bigram-LSA-based text tiling achieves the best F1-measure of 0.6598, with a relative improvement of 17.4% over conventional word-based text tiling and 6.5% over our previous syllable-bigram-based text tiling.
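As a rough illustration of the idea, the following sketch builds a character-bigram count matrix, reduces it with a truncated SVD, and scores adjacent sentences by cosine similarity in the latent space. It is a minimal sketch under stated assumptions, not the authors' system: the function names, the raw-count weighting, and the choice of `rank` are all hypothetical, and a low similarity is simply treated as a candidate story boundary.

```python
import numpy as np

def char_ngrams(text, n=2):
    """Overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]

def lsa_sentence_vectors(sentences, n=2, rank=2):
    """Project each sentence into a rank-limited LSA space (one row per sentence)."""
    grams = [char_ngrams(s, n) for s in sentences]
    vocab = {g: i for i, g in enumerate(sorted({g for gs in grams for g in gs}))}
    counts = np.zeros((len(vocab), len(sentences)))
    for j, gs in enumerate(grams):
        for g in gs:
            counts[vocab[g], j] += 1.0
    # Truncated SVD of the n-gram-by-sentence matrix.
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(rank, len(s))
    return (s[:k, None] * vt[:k]).T

def adjacent_similarity(vectors):
    """Cosine similarity between each pair of neighbouring sentence vectors."""
    sims = []
    for a, b in zip(vectors[:-1], vectors[1:]):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(float(a @ b / denom) if denom > 0 else 0.0)
    return sims
```

Because the n-grams are taken over characters, the same code applies unchanged to Chinese text, where each character is one symbol.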
System combination is a technique that has been shown to yield significant gains in speech recognition and machine translation. Most combination schemes perform an alignment between different system outputs in order to produce lattices (or confusion networks), from which a composite hypothesis is chosen, possibly with the help of a large language model. The benefit of this approach is two-fold: (i) whenever many systems agree with each other on a set of words, the combination output contains these words with high confidence; and (ii) whenever the systems disagree, the language model resolves the ambiguity based on the (probably correct) agreed-upon context. The case of machine translation system combination is more challenging because of the different word orders of the translations: the alignment has to incorporate computationally expensive movements of word blocks. In this paper, we show how one can combine translation outputs efficiently, extending the incremental alignment procedure of (A-V.I. Rosti et al., 2008). A comparison between different system combination design choices is performed on an Arabic speech translation task.
This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to decompose the inner products involved in...
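One common way to decompose inner products for this kind of computation is to let every term contribute its weight product to each pair of documents that share it. The sketch below simulates that map and reduce flow in plain Python; it is an illustrative assumption rather than the paper's algorithm, since the abstract is truncated here, and the function name, whitespace tokenisation, and raw term-frequency weights are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def pairwise_similarity(docs):
    """Inner-product similarity for every document pair sharing at least one term.

    `docs` maps a document id to its text.
    """
    # Map phase: emit one (term, (doc_id, weight)) posting per distinct term.
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            postings[term].append((doc_id, tf))

    # Reduce phase: each term adds its partial product to every pair in its
    # postings list; summing the partial products gives the inner product.
    scores = defaultdict(int)
    for plist in postings.values():
        for (a, wa), (b, wb) in combinations(sorted(plist), 2):
            scores[(a, b)] += wa * wb
    return dict(scores)
```

Pairs with no shared term never appear in the output, which is what makes the term-wise decomposition attractive for sparse collections.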
For concatenative speech synthesis based on non-uniform unit selection, the key to improving synthesis quality is the careful design of measurement criteria with respect to the units adopted. Building on our previous hierarchical non-uniform unit selection framework [1], this paper proposes two measures for selecting optimal non-uniform units during the search at different layers: inter-syllable pitch control and spectral distance by phonetic context. These measures are used as components of our cost function, especially for boundaries in front of syllables starting with voiceless consonants. Experiments show that the resulting system outperforms our previous system.
String matching is a fundamental issue in computer science. This paper presents a lightweight string matching algorithm for short pattern matching, in which fewer than 20 keywords are often involved in the pattern set. The new algorithm makes use of condensed hash tables and computes the shift distance after each test by observing the character that immediately follows the test window. Experiments show that the new algorithm improves execution speed and reduces memory requirements. This algorithm is suitable for applications with small pattern sets (i.e., containing up to 30 keywords), particularly on embedded devices.
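The shift rule described above resembles Sunday's Quick Search heuristic, which can be extended to a small pattern set. The sketch below shows that general technique, not the paper's algorithm: a plain Python `dict` stands in for the condensed hash tables, whose layout the abstract does not describe, and `build_tables` and `find_all` are hypothetical names. Patterns are assumed to be non-empty.

```python
def build_tables(patterns):
    """Precompute a prefix lookup and a shift table for a small pattern set."""
    m = min(len(p) for p in patterns)      # test window = shortest pattern length
    by_prefix = {}                         # first m characters -> candidate patterns
    shift = {}                             # character after the window -> safe shift
    for p in patterns:
        by_prefix.setdefault(p[:m], []).append(p)
        for j in range(m):
            # If the character just past the window equals p[j], the window may
            # advance by at most m - j without skipping a match of p.
            shift[p[j]] = min(shift.get(p[j], m + 1), m - j)
    return m, by_prefix, shift

def find_all(text, patterns):
    """Return (position, pattern) for every occurrence of any pattern in text."""
    m, by_prefix, shift = build_tables(patterns)
    hits = []
    pos, n = 0, len(text)
    while pos + m <= n:
        for p in by_prefix.get(text[pos:pos + m], ()):
            if text.startswith(p, pos):
                hits.append((pos, p))
        if pos + m == n:
            break
        # A character that occurs in no pattern prefix allows a jump of m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return hits
```

For example, `find_all("she sells seashells", ["she", "sea", "shell"])` reports "she" at position 0, "sea" at position 10, and both "she" and "shell" at position 13.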