Chinese abbreviations are widely used in modern Chinese texts. Compared with English abbreviations (which are mostly acronyms and truncations), the formation of Chinese abbreviations is much more complex. Due to the r...
详细信息
We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method th...
详细信息
We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method that leads to good models using a fraction of the training data. We carry out systematic experiments on several benchmark tests for Chinese to English translation using a hierarchical phrase-based machine translation system, and show that a discriminative language model significantly improves upon a state-of-the-art baseline. The experiments also highlight the benefits of our data selection method.
We present a novel method for discovering and modeling the relationship between informal Chinese expressions (including colloquialisms and instant-messaging slang) and their formal equivalents. Specifically, we propos...
详细信息
We present a novel algorithm for the acquisition of multilingual lexical taxonomies (including hyponymy/hypernymy, meronymy and taxonomic cousinhood), from monolingual corpora with minimal supervision in the form of s...
详细信息
We propose a novel method of exploiting prosodic breaks in language modeling for automatic speech recognition (ASR) based on the random forest language model (RFLM), which is a collection of randomized decision tree l...
详细信息
ISBN:
(纸本)9780616220030
We propose a novel method of exploiting prosodic breaks in language modeling for automatic speech recognition (ASR) based on the random forest language model (RFLM), which is a collection of randomized decision tree language models and can potentially ask any questions about the history in order to predict the future. We demonstrate how questions about prosodic breaks can be easily incorporated into the RFLM and present two language models which treat prosodic breaks as observable and hidden variables, respectively. Meanwhile, we show empirically that a finer grained prosodic break is needed for language modeling. Experimental results showed that given prosodic breaks, we were able to reduce the LM perplexity by a significant margin, suggesting a prosodic N-best rescoring approach for ASR.
Knowing the degree of antonymy between words has widespread applications in natural languageprocessing. Manually-created lexicons have limited coverage and do not include most semantically contrasting word pairs. We ...
详细信息
System combination is a technique which has been shown to yield significant gains in speech recognition and machine translation. Most combination schemes perform an alignment between different system outputs in order ...
详细信息
System combination is a technique which has been shown to yield significant gains in speech recognition and machine translation. Most combination schemes perform an alignment between different system outputs in order to produce lattices (or confusion networks), from which a composite hypothesis is chosen, possibly with the help of a large language model. The benefit of this approach is two-fold: (i) whenever many systems agree with each other on a set of words, the combination output contains these words with high confidence; and (ii) whenever the systems disagree, the language model resolves the ambiguity based on the (probably correct) agreed upon context. The case of machine translation system combination is more challenging because of the different word orders of the translations: the alignment has to incorporate computationally expensive movements of word blocks. In this paper, we show how one can combine translation outputs efficiently, extending the incremental alignment procedure of (A-V.I. Rosti et al., 2008). A comparison between different system combination design choices is performed on an Arabic speech translation task.
This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to decompose the inner products involved in...
详细信息
This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to decompose the inner products involved in...
详细信息
For concatenative speech synthesis based on non-uniform unit selection, the key to improve the synthetic quality is the careful designing of measuring criteria respect to the units adopted. With our previous hierarchi...
详细信息
For concatenative speech synthesis based on non-uniform unit selection, the key to improve the synthetic quality is the careful designing of measuring criteria respect to the units adopted. With our previous hierarchical non-uniform unit selection framework [1], two measurements for selecting optimal non-uniform units during searching at different layers are proposed in this paper, including inter-syllable pitch control and spectra distance by phonetic context. These measures are used as components of our cost function, especially for boundaries in front of syllables starting with voiceless consonants. Experiment shows it outperforms our previous system.
暂无评论