Automatic speech recognition is more and more widely and effectively used. Nevertheless, in some automatic speech analysis tasks the state of the art is surprisingly poor. One of these is "diarization", the ...
详细信息
ISBN:
(纸本)9781538646595
Automatic speech recognition is more and more widely and effectively used. Nevertheless, in some automatic speech analysis tasks the state of the art is surprisingly poor. One of these is "diarization", the task of determining who spoke when. Diarization is key to processing meeting audio and clinical interviews, extended recordings such as police body cam or child language acquisition data, and any other speech data involving multiple speakers whose voices are not cleanly separated into individual channels. Overlapping speech, environmental noise and suboptimal recording techniques make the problem harder. During the JSALT Summer Workshop at CMU in 2017, an international team of researchers worked on several aspects of this problem, including calibration of the state of the art, detection of overlaps, enhancement of noisy recordings, and classification of shorter speech segments. This paper sketches the workshop's results, and announces plans for a "Diarization Challenge" to encourage further progress.
Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, the phonetic information has been largely overlooked by most of existing neural LID m...
详细信息
ISBN:
(纸本)9781538633342
Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, the phonetic information has been largely overlooked by most of existing neural LID models, although this information has been used in the conventional phonetic LID systems with a great success. We present a phone- aware neural LID architecture, which is a deep LSTM-RNN LID system but accepts output from an RNN-based ASR system. By utilizing the phonetic knowledge, the LID performance can be significantly improved. Interestingly, even if the test language is not involved in the ASR training, the phonetic knowledge still presents a large contribution. Our experiments conducted on four languages within the Babel corpus demonstrated that the phone-aware approach is highly effective.
This work describes our submission to the WMT16 Bilingual Document Alignment task. We show that a very simple distance metric, namely Cosine distance of tf/idf weighted document vectors provides a quick and reliable w...
详细信息
We present a simple, prepackaged solution to generating paraphrases of English sentences. We use the Paraphrase Database (PPDB) for monolingual sentence rewriting and provide machine translation language packs: Prepac...
详细信息
Opinion targets extraction of Chinese microblogs plays an important role in opinion mining. There has been a significant progress in this area recently, especially the method based on conditional random field (CRF)....
详细信息
Opinion targets extraction of Chinese microblogs plays an important role in opinion mining. There has been a significant progress in this area recently, especially the method based on conditional random field (CRF). However, this method only takes lexicon-related features into consideration and does not excavate the implied syntactic and semantic knowledge. We propose a novel approach which incorporates domain lexicon with groups of syntactical and semantic features. The approach acquires domain lexicon through a novel way which explores syntactic and semantic information through Part- of-speech, dependency structure, phrase structure, semantic role and semantic similarity based on word embedding. And then we combine the domain lexicon with opinion targets extracted from CRF with groups of features for opinion targets extraction. Experimental results on COAE2014 dataset show the outperformance of the approach compared with other well-known methods on the task of opinion targets extraction.
Statistical Machine Translation (SMT) of highly inflected, low-resource languages suffers from the problem of low bitext availability, which is exacerbated by large inflectional paradigms. When translating into Englis...
详细信息
Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID metho...
详细信息
Canonical morphological segmentation aims to divide words into a sequence of standardized segments. In this work, we propose a character-based neural encoder-decoder model for this task. Additionally, we extend our mo...
详细信息
We present a model of morphological segmentation that jointly learns to segment and restore orthographic changes, e.g., funniest ⟼ fun-y-est. We term this form of analysis canonical segmentation and contrast it with t...
详细信息
A novel speaker segmentation approach based on deep neural network is proposed and investigated. This approach uses deep speaker vectors (d-vectors) to represent speaker characteristics and to find speaker change poin...
详细信息
ISBN:
(纸本)9781509041183
A novel speaker segmentation approach based on deep neural network is proposed and investigated. This approach uses deep speaker vectors (d-vectors) to represent speaker characteristics and to find speaker change points. The d-vector is a kind of frame-level speaker discriminative feature, whose discriminative training process corresponds to the goal of discriminating a speaker change point from a single speaker speech segment in a short time window. Following the traditional metric-based segmentation, each analysis window contains two sub-windows and is shifting along the audio stream to detect speaker change points, where the speaker characteristics are represented by the means of deep speaker vectors for all frames in each window. Experimental investigations conducted in fast speaker change scenarios show that the proposed method can detect speaker change points more quickly and more effectively than the commonly used segmentation methods.
暂无评论