Search results - Inner Mongolia University Library

Alignment-Based Neural Machine Translation

1st Conference on Machine Translation, WMT 2016, held at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016

Authors: Alkhouli, Tamer; Bretschner, Gabriel; Peter, Jan-Thorsten; Hethnawi, Mohammed; Guta, Andreas; Ney, Hermann. Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany

ISBN: (print) 9781945626104

Neural machine translation (NMT) has emerged recently as a promising statistical machine translation approach. In NMT, neural networks (NN) are directly used to produce translations, without relying on a pre-existing translation framework. In this work, we take a step towards bridging the gap between conventional word alignment models and NMT. We follow the hidden Markov model (HMM) approach that separates the alignment and lexical models. We propose a neural alignment model and combine it with a lexical neural model in a log-linear framework. The models are used in a standalone word-based decoder that explicitly hypothesizes alignments during search. We demonstrate that our system outperforms attention-based NMT on two tasks: IWSLT 2013 German→English and BOLT Chinese→English. We also show promising results for re-aligning the training data using neural models. © 2016 Association for Computational Linguistics

Keywords: Neural machine translation

A resource-light method for cross-lingual semantic textual similarity

arXiv, 2018

Authors: Glavaš, Goran; Franco-Salvador, Marc; Ponzetto, Simone P.; Rosso, Paolo. Data and Web Science Group, School of Business Informatics and Mathematics, University of Mannheim, B6 26, Mannheim DE-68159, Germany; Symanto Research, Pretzfelder Strasse 15, Nuremberg DE-90425, Germany; Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino de Vera s/n, Valencia ES-46022, Spain

Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for predicting cross-lingual semantic similarity of short texts, however, make use of tools and resources (e.g., machine translation systems, syntactic parsers or named entity recognition) that for many languages (or language pairs) do not exist. In contrast, we propose an unsupervised and a very resource-light approach for measuring semantic similarity between texts in different languages. To operate in the bilingual (or multilingual) space, we project continuous word vectors (i.e., word embeddings) from one language to the vector space of the other language via the linear translation model. We then align words according to the similarity of their vectors in the bilingual embedding space and investigate different unsupervised measures of semantic similarity exploiting bilingual embeddings and word alignments. Requiring only a limited-size set of word translation pairs between the languages, the proposed approach is applicable to virtually any pair of languages for which there exists a sufficiently large corpus, required to learn monolingual word embeddings. Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach reaches performance close to that of supervised and resource-intensive methods, displaying stability across different language pairs. Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance comparable to those of complex resource-intensive state-of-the-art models for the respective tasks. Copyright © 2018, The Authors. All rights reserved.
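
The projection-and-alignment step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the projection matrix `W`, the toy two-dimensional vectors, and the averaged best-match score are hypothetical stand-ins (the abstract indicates that the mapping is derived from a limited set of word translation pairs and that several similarity measures are investigated).

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def project(vec, matrix):
    """Map a source-language vector into the target space (matrix times vec)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def align(src, tgt, matrix):
    """For each source word, pick the most similar target word after projection."""
    pairs = {}
    for s_word, s_vec in src.items():
        proj = project(s_vec, matrix)
        best = max(tgt, key=lambda t: cosine(proj, tgt[t]))
        pairs[s_word] = (best, cosine(proj, tgt[best]))
    return pairs

def similarity(src, tgt, matrix):
    """Hypothetical text-level score: mean similarity of the aligned pairs."""
    pairs = align(src, tgt, matrix)
    return sum(score for _, score in pairs.values()) / len(pairs)

# Toy data (assumed for illustration only): W swaps the two dimensions.
W = [[0.0, 1.0], [1.0, 0.0]]
SRC = {"hund": [0.0, 1.0], "katze": [1.0, 0.0]}
TGT = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}

alignment = align(SRC, TGT, W)
score = similarity(SRC, TGT, W)
```

With this toy data, `hund` aligns to `dog` and `katze` to `cat`; real embeddings would of course give scores below 1.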

Keywords: Semantics

A Comparative Study on Vocabulary Reduction for Phrase Table Smoothing

1st Conference on Machine Translation, WMT 2016, held at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016

Authors: Kim, Yunsu; Guta, Andreas; Wuebker, Joern; Ney, Hermann. Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany; Lilt Inc

ISBN: (print) 9781945626104

This work systematically analyzes the smoothing effect of vocabulary reduction for phrase translation models. We extensively compare various word-level vocabularies to show that the performance of smoothing is not significantly affected by the choice of vocabulary. This result provides empirical evidence that the standard phrase translation model is extremely sparse. Our experiments also reveal that vocabulary reduction is more effective for smoothing large-scale phrase tables. © 2016 Association for Computational Linguistics

Keywords: Translation (languages)

CharacTER: Translation Edit Rate on Character Level

1st Conference on Machine Translation, WMT 2016, held at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016

Authors: Wang, Weiyue; Peter, Jan-Thorsten; Rosendahl, Hendrik; Ney, Hermann. Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen 52056, Germany

ISBN: (print) 9781945626104

Recently, the capability of character-level evaluation measures for machine translation output has been confirmed by several metrics. This work proposes translation edit rate on character level (CharacTER), which calculates the character level edit distance while performing the shift edit on word level. The novel metric shows high system-level correlation with human rankings, especially for morphologically rich languages. It outperforms the strong CHRF by up to 7% correlation on different metric tasks. In addition, we apply the hypothesis sentence length for normalizing the edit distance in CharacTER, which also provides significant improvements compared to using the reference sentence length. © 2016 Association for Computational Linguistics.
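
The normalization choice mentioned in this abstract can be sketched as follows. This is a deliberately simplified illustration, not the CharacTER metric itself: it computes only a plain character-level Levenshtein distance and omits the word-level shift edits; counting spaces as characters is also an assumption. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        curr = [i]
        for j, item_b in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                        # deletion
                            curr[j - 1] + 1,                    # insertion
                            prev[j - 1] + (item_a != item_b)))  # substitution
        prev = curr
    return prev[-1]

def char_edit_rate(hypothesis, reference):
    """Character edit distance divided by the hypothesis length."""
    if not hypothesis:
        raise ValueError("hypothesis must be non-empty")
    return edit_distance(hypothesis, reference) / len(hypothesis)

rate = char_edit_rate("kitten", "sitting")  # 3 edits over 6 hypothesis characters
```

Dividing by `len(reference)` instead would give the reference-length normalization that the abstract compares against.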

Keywords: Computational linguistics

Interactive-predictive translation based on multiple word-segments

19th Annual Conference of the European Association for Machine Translation, EAMT 2016

Authors: Domingo, Miguel; Peris, Álvaro; Casacuberta, Francisco. Pattern Recognition and Human Language Technology Research Center, Camino de Vera s/n, Valencia 46022, Spain

Current machine translation systems require human revision to produce high-quality translations. This is achieved through a post-editing process or by means of an interactive human-computer collaboration. Most protocols belonging to the last scenario follow a left-to-right strategy, where the prefix of the translation is iteratively increased by successive validations and corrections made by the user. In this work, we propose a new interactive protocol which allows the user to validate all correct word sequences in the translation generated by the system, breaking the left-to-right barrier. We evaluated our proposal through simulated experiments, obtaining large reductions of the human effort. © 2016, European Association for Machine Translation. All rights reserved.

Keywords: Machine translation

The RWTH Aachen University English-Romanian Machine Translation System for WMT 2016

1st Conference on Machine Translation, WMT 2016, held at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016

Authors: Peter, Jan-Thorsten; Alkhouli, Tamer; Guta, Andreas; Ney, Hermann. Human Language Technology and Pattern Recognition Group, Computer Science Department, RWTH Aachen University, Aachen D-52056, Germany

ISBN: (print) 9781945626104

This paper describes the statistical machine translation system developed at RWTH Aachen University for the English→Romanian translation task of the ACL 2016 First Conference on Machine Translation (WMT 2016). We combined three different state-of-the-art systems in a system combination: a phrase-based system, a hierarchical phrase-based system and an attention-based neural machine translation system. The phrase-based and the hierarchical phrase-based systems make use of a language model trained on all available data, a language model trained on the bilingual data and a word class language model. In addition, we utilized a recurrent neural network language model and a bidirectional recurrent neural network translation model for reranking the output of both systems. The attention-based neural machine translation system was trained using all bilingual data together with the back-translated data from the News Crawl 2015 corpora. © 2016 Association for Computational Linguistics.

Keywords: Neural machine translation

Robust online multi-channel speech recognition

12. ITG-Fachtagung Sprachkommunikation - 12th ITG Conference on Speech Communication

Authors: Kitza, Markus; Zeyer, Albert; Schlüter, Ralf; Heymann, Jahn; Haeb-Umbach, Reinhold. Human Language Technology and Pattern Recognition, RWTH Aachen, Aachen 52074, Germany; Department of Communications Engineering, Paderborn University, Paderborn 33098, Germany

ISBN: (print) 9783800742752

In this paper we present a system for robust online far-field multi-channel speech recognition with minimal assumptions on microphone configuration and target location. We employ an online-enabled Generalized Eigenvalue (GEV) beamformer and a Long Short-Term Memory (LSTM) network to robustly calculate the signal statistics necessary for the beamforming operation in the front-end. After multiple channels have been condensed to one, a Bidirectional Long Short-Term Memory (BLSTM) acoustic model is applied on a running window of input speech. This enables online decoding in combination with the beamforming front-end. To assess the performance of the system we test it on the real evaluation set of the CHiME 3 data where we achieve a Word Error Rate (WER) of 10.4 %. © 2016 VDE VERLAG GMBH.

Keywords: Speech recognition

The CTC and its intriguing "blank" label: Comparative study of neural network training methods for handwriting recognition

Conference en Recherche d'Informations et Applications, CORIA 2016, 13th French Information Retrieval Conference, CIFED 2016, et Colloque International Francophone sur l'Ecrit et le Document - Conference on Information Retrieval and its Applications, CORIA 2016, 13th French Information Retrieval Conference, CIFED 2016, and International French-Speaking Colloquium on Writing and Documentation

Authors: Bluche, Théodore; Kermorvant, Christopher; Ney, Hermann; Louradour, Jérôme. A2iA SA, Paris, France; Teklia SASU, Paris, France; Human Language Technology and Pattern Recognition, RWTH Aachen, Germany

In recent years, Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) trained with the Connectionist Temporal Classification (CTC) objective won many international handwriting recognition evaluations. The CTC algorithm is based on a forward-backward procedure, avoiding the need to segment the input before training. The network outputs are character labels and a special non-character label. On the other hand, in the hybrid Neural Network / Hidden Markov Models (NN/HMM) framework, networks are trained with framewise criteria to predict state labels. In this paper, we show that CTC training is close to forward-backward training of NN/HMMs, and can be extended to more standard HMM topologies. We apply this method to Multi-Layer Perceptrons (MLPs), and investigate the properties of CTC, especially the role of the special label.
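
The role of the special non-character ("blank") label can be illustrated with the standard CTC collapsing rule: merge consecutive repeated labels, then drop the blanks. The sketch below shows only this mapping from a frame-level label path to an output string, not the training procedure studied in the paper; using `-` as the blank symbol is an assumption made for readability.

```python
from itertools import groupby

BLANK = "-"  # assumed symbol for the special non-character label

def ctc_collapse(path, blank=BLANK):
    """Merge runs of identical labels, then remove the blank label."""
    return "".join(label for label, _ in groupby(path) if label != blank)

with_blank = ctc_collapse("hh-e-ll-lo")   # blank separates the two l's
without_blank = ctc_collapse("hheelloo")  # repeated l's merge into one
```

The first path collapses to `hello` and the second to `helo`: without a blank between them, two identical adjacent characters cannot be told apart from one character spanning several frames.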

Keywords: Character recognition

A multimodal crowdsourcing framework for transcribing historical handwritten documents

16th ACM Symposium on Document Engineering, DocEng 2016

Authors: Granell, Emilio; Martínez-Hinarejos, Carlos D. Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Camino Vera s/n, 46022 Valencia, Spain

ISBN: (print) 9781450344388

Transcription of handwritten historical documents is one of the main topics in document analysis systems, due to cultural reasons. State-of-the-art handwritten text recognition systems make it possible to speed up the transcription task. Currently, this automatic transcription is far from perfect, and human expert revision is required in order to obtain the actual transcription. In this context, crowdsourcing emerged as a powerful tool for massive transcription at a relatively low cost, since the supervision effort of professional transcribers may be dramatically reduced. However, current transcription crowdsourcing platforms are mainly limited to the use of non-mobile devices, since the use of keyboards in mobile devices is not friendly enough for most users. This work presents the alternative of using speech dictation of handwritten text lines as a transcription source in a crowdsourcing platform. The experiments explore how an initial handwritten text recognition hypothesis can be improved by using the contribution of speech recognition from several speakers, providing as a final result a better hypothesis to be amended by a professional transcriber with less effort.

Keywords: Speech recognition

Bridging the Native Language and Language Variety Identification Tasks

Procedia Computer Science, 2017, vol. 112, pp. 1554-1561

Authors: Marc Franco-Salvador; Greg Kondrak; Paolo Rosso. Symanto Research, 90425 Nuremberg, Germany; Pattern Recognition and Human Language Technology (PRHLT) Research Center, Universitat Politècnica de València, 46022 Valencia, Spain; Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada

The objective of Native Language Identification is to determine the native language of the author of a text that he or she wrote in another language. By contrast, Language Variety Identification aims at classifying texts representing different varieties of a single language. We postulate that both tasks may be reduced to a single objective, which is to identify the language variety of the text. We design a general approach that combines string kernels and word embeddings, which capture different characteristics of texts. The results of our experiments show that the approach achieves excellent results on both tasks, without any task-specific adaptations.

Keywords: Native Language Identification; Language Variety Identification; String Kernels; Word Embeddings; Classifier Combination
