Search results - Inner Mongolia University Library

Deep speaker verification: Do we need end to end?

arXiv, 2017

Authors: Wang, Dong; Li, Lantian; Tang, Zhiyuan; Zheng, Thomas Fang. Affiliations: Center for Speech and Language Technologies; Research Institute of Information Technology; Department of Computer Science and Technology; Tsinghua University, China.

End-to-end learning treats the entire system as a whole adaptable black box, which, if sufficient data are available, may learn a system that works very well for the target task. This principle has recently been applied in several prototype studies on speaker verification (SV), where the feature learning and classifier are learned together with an objective function that is consistent with the evaluation metric. An opposite approach to end-to-end is feature learning, which first trains a feature learning model and then constructs a back-end classifier separately to perform SV. Recently, both approaches achieved significant performance gains on SV, mainly attributed to the smart utilization of deep neural networks. However, the two approaches have not been carefully compared, and their respective advantages have not been well discussed. In this paper, we compare the end-to-end and feature learning approaches on a text-independent SV task. Our experiments on a dataset sampled from the Fisher database and involving 5,000 speakers demonstrated that the feature learning approach outperformed the end-to-end approach. This is strong support for the feature learning approach, at least with data and computation resources similar to ours. Copyright © 2017, The Authors. All rights reserved.

Keywords: speech recognition

The JHU Machine Translation Systems for WMT 2016

1st Conference on Machine Translation, WMT 2016, held at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016

Authors: Ding, Shuoyang; Duh, Kevin; Khayrallah, Huda; Koehn, Philipp; Post, Matt. Affiliations: Center for Language and Speech Processing; Human Language Technology Center of Excellence; Department of Computer Science; Johns Hopkins University, Baltimore, MD, United States.

ISBN: 9781945626104 (print)

This paper describes the submission of Johns Hopkins University for the shared translation task of ACL 2016 First Conference on Machine Translation (WMT 2016). We set up phrase-based, hierarchical phrase-based and syntax-based systems for all 12 language pairs of this year's evaluation campaign. Novel research directions we investigated include: neural probabilistic language models, bilingual neural network language models, morphological segmentation, and the attention-based neural machine translation model as a reranking feature. © 2016 Association for Computational Linguistics.

Keywords: Neural machine translation

Translation of Unknown Words in Low Resource Languages

12th Conference of the Association for Machine Translation in the Americas, AMTA 2016

Authors: Gujral, Biman; Khayrallah, Huda; Koehn, Philipp. Affiliations: Department of Computer Science; Center for Language and Speech Processing; Johns Hopkins University, Baltimore, MD 21218, United States.

We address the problem of unknown words, also known as out of vocabulary (OOV) words, in machine translation of low resource languages. Our technique comprises a combination of methods, inspired by the common OOV types observed. We also design evaluation techniques for measuring coverage of OOVs achieved and integrate the new translation candidates in a Statistical Machine Translation (SMT) system. Experimental results on Hindi and Uzbek show that our system achieves a good coverage of OOV words. We show that our methods produced correct candidates for 50% of Hindi OOVs and 30% of Uzbek OOVs, in scenarios that have 1 and 3 OOVs per sentence. This offers a potential for improvement of translation quality for languages that have limited parallel data available for training. © 2016 The Authors.

Keywords: Machine translation

Neural Interactive Translation Prediction

12th Conference of the Association for Machine Translation in the Americas, AMTA 2016

Authors: Knowles, Rebecca; Koehn, Philipp. Affiliations: Department of Computer Science; Center for Language and Speech Processing; Johns Hopkins University, Baltimore, MD 21218, United States.

We present an interactive translation prediction method based on neural machine translation. Even with the same translation quality of the underlying machine translation systems, the neural prediction method yields much higher word prediction accuracy (61.6% vs. 43.3%) than the traditional method based on search graphs, mainly due to better recovery from errors. We also develop efficient means to enable practical deployment. © 2016 The Authors.

Keywords: Neural machine translation

A study on replay attack and anti-spoofing for automatic speaker verification

arXiv, 2017

Authors: Li, Lantian; Chen, Yixiang; Wang, Dong; Zheng, Thomas Fang. Affiliations: Center for Speech and Language Technologies; Research Institute of Information Technology; Department of Computer Science and Technology; Tsinghua University, Beijing 100084, China.

For practical automatic speaker verification (ASV) systems, replay attacks pose a real risk: ASV systems tend to be easily fooled by replaying a pre-recorded speech signal of the genuine speaker. An effective replay detection method is therefore highly desirable. In this study, we investigate a major difficulty in replay detection: the over-fitting problem caused by variability factors in the speech signal. An F-ratio probing tool is proposed and three variability factors are investigated using this tool: speaker identity, speech content, and playback and recording device. The analysis shows that device is the most influential factor and contributes the highest over-fitting risk. A frequency warping approach is studied to alleviate the over-fitting problem, as verified on the ASVspoof 2017 database. Copyright © 2017, The Authors. All rights reserved.
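The abstract above does not define its F-ratio probing tool, so the following is only a minimal sketch of the classical F-ratio for one scalar feature: the variance of the group means divided by the mean within-group variance. The function name `f_ratio`, the grouping dictionaries, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def f_ratio(groups):
    """Classical F-ratio of a scalar feature.

    `groups` maps a factor value (e.g. a device id) to the list of
    feature values observed under it. This is an assumed, textbook
    definition, not necessarily the one used in the paper.
    """
    # Between-group variance: how far the group means spread apart.
    means = [fmean(values) for values in groups.values()]
    between = pvariance(means)
    # Within-group variance: average spread inside each group.
    within = fmean([pvariance(values) for values in groups.values()])
    return between / within


# The same six toy feature values, grouped by two different factors.
by_device = {"dev_a": [1.0, 1.2, 0.8], "dev_b": [3.0, 3.2, 2.8]}
by_speaker = {"spk_1": [1.0, 3.0, 0.8], "spk_2": [1.2, 3.2, 2.8]}

device_f = f_ratio(by_device)
speaker_f = f_ratio(by_speaker)
```

In this toy data the feature separates devices far better than speakers, so `device_f` is much larger than `speaker_f`. A factor with a high F-ratio is one that a detector could latch onto, which is the kind of over-fitting risk the abstract attributes to the device factor.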

Keywords: Risk assessment

Full-info training for deep speaker feature learning

arXiv, 2017

Authors: Li, Lantian; Tang, Zhiyuan; Wang, Dong; Zheng, Thomas Fang. Affiliations: Center for Speech and Language Technologies; Research Institute of Information Technology; Department of Computer Science and Technology; Tsinghua University, Beijing 100084, China.

Recent studies have shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional and time-delay deep neural network (CT-DNN) model. By training the model to discriminate the speakers in the training data, frame-level speaker features can be derived from the last hidden layer. In spite of its good performance, a potential problem of the present model is that it involves a parametric classifier, i.e., the last affine layer, which may consume some discriminative knowledge, thus leading to an 'information leak' for the feature learning. This paper presents a full-info training approach that discards the parametric classifier and forces all the discriminative knowledge to be learned by the feature net. Our experiments on the Fisher database demonstrate that this new training scheme can produce more coherent features, leading to consistent and notable performance improvement on the speaker verification task. Copyright © 2017, The Authors. All rights reserved.

Keywords: Deep neural networks

Cross-sentence N-ary relation extraction with graph LSTMs

arXiv, 2017

Authors: Peng, Nanyun; Poon, Hoifung; Quirk, Chris; Toutanova, Kristina; Yih, Wen-Tau. Affiliations: Center for Language and Speech Processing, Computer Science Department, Johns Hopkins University, Baltimore, MD, United States; Microsoft Research, Redmond, WA, United States; Google Research, Seattle, WA, United States.

Past work in relation extraction has focused on binary relations in single sentences. Recent NLP inroads in high-value domains have sparked interest in the more general setting of extracting n-ary relations that span multiple sentences. In this paper, we explore a general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction. The graph formulation provides a unified way of exploring different LSTM approaches and incorporating various intra-sentential and inter-sentential dependencies, such as sequential, syntactic, and discourse relations. A robust contextual representation is learned for the entities, which serves as input to the relation classifier. This simplifies handling of relations with arbitrary arity, and enables multi-task learning with related relations. We evaluate this framework in two important precision medicine settings, demonstrating its effectiveness with both conventional supervised learning and distant supervision. Cross-sentence extraction produced larger knowledge bases, and multi-task learning significantly improved extraction accuracy. A thorough analysis of various LSTM approaches yielded useful insight into the impact of linguistic analysis on extraction accuracy. Copyright © 2017, The Authors. All rights reserved.

Keywords: Long short-term memory

Enhancement and Analysis of Conversational Speech: JSALT 2017

IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: Neville Ryant; Elika Bergelson; Kenneth Church; Alejandrina Cristia; Jun Du; Sriram Ganapathy; Sanjeev Khudanpur; Diana Kowalski; Mahesh Krishnamoorthy; Rajat Kulshreshta; Mark Liberman; Yu-Ding Lu; Matthew Maciejewski; Florian Metze; Jan Profant; Lei Sun; Yu Tsao; Zhou Yu. Affiliations: Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA; Department of Psychology and Neuroscience, Duke University, Durham, NC, USA; IBM, Yorktown Heights, NY, USA; Laboratoire de Sciences Cognitives et Psycholinguistique, ENS, Paris, France; University of Science and Technology of China, Hefei, China; Electrical Engineering Department, Indian Institute of Science, Bangalore, India; Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, USA; University of Illinois at Urbana-Champaign, Champaign, IL, USA; Apple, Cupertino, CA, USA; Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan; Brno University of Technology, Brno, Czech Republic; Department of Computer Science, University of California Davis, Davis, CA, USA.

ISBN: 9781538646595 (print)

Automatic speech recognition is more and more widely and effectively used. Nevertheless, in some automatic speech analysis tasks the state of the art is surprisingly poor. One of these is "diarization", the task of determining who spoke when. Diarization is key to processing meeting audio and clinical interviews, extended recordings such as police body cam or child language acquisition data, and any other speech data involving multiple speakers whose voices are not cleanly separated into individual channels. Overlapping speech, environmental noise and suboptimal recording techniques make the problem harder. During the JSALT Summer Workshop at CMU in 2017, an international team of researchers worked on several aspects of this problem, including calibration of the state of the art, detection of overlaps, enhancement of noisy recordings, and classification of shorter speech segments. This paper sketches the workshop's results, and announces plans for a "Diarization Challenge" to encourage further progress.

Keywords: diarization; overlap detection; speech enhancement; automatic speech recognition; speech recognition; speech; state of the art; monuron; recordings

Phone-aware neural language identification

Oriental COCOSDA International Conference on Speech Database and Assessments

Authors: Zhiyuan Tang; Dong Wang; Yixiang Chen; Ying Shi; Lantian Li. Affiliations: Center for Speech and Language Technologies, RIIT, Tsinghua University; Tsinghua National Laboratory for Information Science and Technology, Tsinghua University; Department of Computer Science, Tsinghua University.

ISBN: 9781538633342 (print)

Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, phonetic information has been largely overlooked by most existing neural LID models, although this information has been used in conventional phonetic LID systems with great success. We present a phone-aware neural LID architecture, which is a deep LSTM-RNN LID system that accepts output from an RNN-based ASR system. By utilizing the phonetic knowledge, the LID performance can be significantly improved. Interestingly, even if the test language is not involved in the ASR training, the phonetic knowledge still makes a large contribution. Our experiments conducted on four languages within the Babel corpus demonstrated that the phone-aware approach is highly effective.

Keywords: Phonetics; Training; Computational modeling; Databases; Acoustics; Standardization; Integrated circuit modeling