New words could benefit many NLP tasks such as sentence chunking and sentiment analysis. However, automatic new word extraction is a challenging task because new words usually have no fixed language pattern, and even ...
详细信息
New words could benefit many NLP tasks such as sentence chunking and sentiment analysis. However, automatic new word extraction is a challenging task because new words usually have no fixed language pattern, and even appear with the new meanings of existing words. To tackle these problems, this paper proposes a novel method to extract new words. It not only considers domain specificity, but also combines with multiple statistical language knowledge. First, we perform a filtering algorithm to obtain a candidate list of new words. Then, we employ the statistical language knowledge to extract the top ranked new words. Experimental results show that our proposed method is able to extract a large number of new words both in Chinese and English corpus, and notably outperforms the state-of-the-art methods. Moreover, we also demonstrate our method increases the accuracy of Chinese word segmentation by 10% on corpus containing new words.
End-to-end learning treats the entire system as a whole adaptable black box, which, if sufficient data are available, may learn a system that works very well for the target task. This principle has recently been appli...
详细信息
This paper describes the submission of Johns Hopkins University for the shared translation task of ACL 2016 First Conference on Machine Translation (WMT 2016). We set up phrase-based, hierarchical phrase-based and syn...
详细信息
We address the problem of unknown words, also known as out of vocabulary (OOV) words, in machine translation of low resource languages. Our technique comprises a combination of methods, inspired by the common OOV type...
详细信息
We present an interactive translation prediction method based on neural machine translation. Even with the same translation quality of the underlying machine translation systems, the neural prediction method yields mu...
详细信息
For practical automatic speaker verification (ASV) systems, replay attack poses a true risk. By replaying a pre-recorded speech signal of the genuine speaker, ASV systems tend to be easily fooled. An effective replay ...
详细信息
In recent studies, it has shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model. By en...
详细信息
Past work in relation extraction has focused on binary relations in single sentences. Recent NLP inroads in high-value domains have sparked interest in the more general setting of extracting n-ary relations that span ...
详细信息
Automatic speech recognition is more and more widely and effectively used. Nevertheless, in some automatic speech analysis tasks the state of the art is surprisingly poor. One of these is "diarization", the ...
详细信息
ISBN:
(纸本)9781538646595
Automatic speech recognition is more and more widely and effectively used. Nevertheless, in some automatic speech analysis tasks the state of the art is surprisingly poor. One of these is "diarization", the task of determining who spoke when. Diarization is key to processing meeting audio and clinical interviews, extended recordings such as police body cam or child language acquisition data, and any other speech data involving multiple speakers whose voices are not cleanly separated into individual channels. Overlapping speech, environmental noise and suboptimal recording techniques make the problem harder. During the JSALT Summer Workshop at CMU in 2017, an international team of researchers worked on several aspects of this problem, including calibration of the state of the art, detection of overlaps, enhancement of noisy recordings, and classification of shorter speech segments. This paper sketches the workshop's results, and announces plans for a "Diarization Challenge" to encourage further progress.
Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, the phonetic information has been largely overlooked by most of existing neural LID m...
详细信息
ISBN:
(纸本)9781538633342
Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, the phonetic information has been largely overlooked by most of existing neural LID models, although this information has been used in the conventional phonetic LID systems with a great success. We present a phone- aware neural LID architecture, which is a deep LSTM-RNN LID system but accepts output from an RNN-based ASR system. By utilizing the phonetic knowledge, the LID performance can be significantly improved. Interestingly, even if the test language is not involved in the ASR training, the phonetic knowledge still presents a large contribution. Our experiments conducted on four languages within the Babel corpus demonstrated that the phone-aware approach is highly effective.
暂无评论