ISBN: 9781605585123 (print)
Determining similar objects is a fundamental operation both in data mining tasks such as clustering and in query-driven object retrieval. By definition of similarity search, query objects can only be imprecise descriptions of what users are looking for in a database, and even high-quality similarity measures can only be approximations of the users' notion of similarity. To overcome these shortcomings, iterative query refinement systems have been proposed. They utilize user feedback regarding the relevance of intermediate results to adapt the query object and/or the similarity measure. We propose an optimization-based relevance feedback approach for adaptable distance measures, focusing on the Earth Mover's Distance. Our experiments show that the technique enables quicker iterative database exploration. Copyright 2009 ACM.
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h a...
We present a Minimum Bayes Risk (MBR) decoder for statistical machine translation. The approach aims to minimize the expected loss of translation errors with regard to the BLEU score. We show that MBR decoding on N-be...
In phrase-based statistical machine translation, the phrase-table requires a large amount of memory. We will present an efficient representation with two key properties: on-demand loading and a prefix tree structure f...
ISBN: 9781450344388 (print)
Transcription of handwritten historical documents is one of the main topics in document analysis systems, due to cultural reasons. State-of-the-art handwritten text recognition systems can speed up the transcription task. Currently, this automatic transcription is far from perfect, and revision by a human expert is required to obtain the actual transcription. In this context, crowdsourcing has emerged as a powerful tool for massive transcription at a relatively low cost, since the supervision effort of professional transcribers may be dramatically reduced. However, current transcription crowdsourcing platforms are mainly limited to non-mobile devices, since keyboards on mobile devices are not friendly enough for most users. This work presents the alternative of using speech dictation of handwritten text lines as a transcription source in a crowdsourcing platform. The experiments explore how an initial handwritten text recognition hypothesis can be improved by using the contribution of speech recognition from several speakers, yielding a better final hypothesis that a professional transcriber can amend with less effort.
We present discriminative reordering models for phrase-based statistical machine translation. The models are trained using the maximum entropy principle. We use several types of features: based on words, based on word...
Current neural translation networks are based on an effective attention mechanism that can be considered as an implicit probabilistic notion of alignment. Such architectures do not guarantee a high quality alignment, ...
This paper describes an efficient method to extract large n-best lists from a word graph produced by a statistical machine translation system. The extraction is based on the k shortest paths algorithm which is efficie...
We give an overview of the RWTH phrase-based statistical machine translation system that was used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2006. The system was ra...
The RWTH system for the IWSLT 2007 evaluation is a combination of several statistical machine translation systems. The combination includes phrase-based models, an n-gram translation model and a hierarchical phrase mod...