Analyzing the evolutionary history of the design of object-oriented software is an important and difficult task in which matching algorithms play a fundamental role. In this paper, we investigate the applicability of an error-correcting graph matching (ECGM) algorithm to object-oriented software evolution. By means of a case study, we report evidence of ECGM applicability in studying the evolution of the Mozilla class diagram. We collected 144 Mozilla snapshots over the past six years, reverse-engineered class diagrams, and recovered traceability links between subsequent class diagrams. Our algorithm allows us to identify evolving classes that maintain a stable structure of relations (associations, inheritances, and aggregations) with other classes and thus likely constitute the backbone of Mozilla.
ISBN: 9781937284190 (print)
In this paper, we address statistical machine translation of public conference talks. Modeling the style of this genre can be very challenging given the shortage of available in-domain training data. We investigate the use of a hybrid LM, in which infrequent words are mapped into classes. Hybrid LMs are used to complement word-based LMs with statistics about the language style of the talks. Extensive experiments comparing different settings of the hybrid LM are reported on publicly available benchmarks based on TED talks, from Arabic to English and from English to French. The proposed models are shown to exploit in-domain data better than conventional word-based LMs for the target language modeling component of a phrase-based statistical machine translation system.
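The mapping step behind a hybrid LM can be illustrated with a minimal sketch. The abstract does not say which classes are used or how "infrequent" is defined, so the `word_class` dictionary, the `min_count` threshold, and the `<UNK>` fallback below are all assumptions made for illustration only.

```python
from collections import Counter


def to_hybrid(sentences, word_class, min_count=2):
    """Replace infrequent tokens by a class label; keep frequent tokens as words.

    word_class and min_count are hypothetical inputs, not taken from the paper.
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    return [
        [tok if counts[tok] >= min_count else word_class.get(tok, "<UNK>")
         for tok in sent]
        for sent in sentences
    ]


# Toy in-domain corpus: "we" and "love" occur twice, the nouns only once.
corpus = [
    ["we", "love", "ideas"],
    ["we", "love", "quarks"],
]
classes = {"ideas": "NOUN", "quarks": "NOUN"}  # assumed class inventory
hybrid = to_hybrid(corpus, classes)
print(hybrid)
```

An n-gram LM trained on the resulting mixed word/class stream sees both toy sentences as the same sequence, which is one way such a model could capture stylistic patterns from little in-domain data.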
ISBN: 9781622765003 (print)
We propose a real-time machine translation system that allows users to select a news category and to translate the related live news articles from Arabic, Czech, Danish, Farsi, French, German, Italian, Polish, Portuguese, Spanish and Turkish into English. The Moses-based system was optimised for the news domain and differs from other available systems in four ways: (1) News items are automatically categorised on the source side, before translation; (2) Named entity translation is optimised by recognising and extracting named entities on the source side and by re-inserting their translations in the target language, making use of a separate entity repository; (3) News titles are translated with a separate translation system which is optimised for the specific style of news titles; (4) The system was optimised for speed in order to cope with the large volume of daily news articles.
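Point (2) can be illustrated with a small sketch. The abstract does not describe the mechanism, so the placeholder-masking approach, the `entity_repo` dictionary, and the `toy_translate` stand-in for the MT engine are all assumptions introduced here, not the system's actual implementation.

```python
def translate_with_entities(sentence, entity_repo, translate):
    """Mask known entities, translate the rest, then re-insert entity translations.

    entity_repo maps source-language entity strings to target-language forms
    (a hypothetical stand-in for the separate entity repository).
    """
    placeholders = {}
    masked = sentence
    # Longest names first, so a short name inside a longer one is not masked early.
    for i, name in enumerate(sorted(entity_repo, key=len, reverse=True)):
        if name in masked:
            token = f"__ENT{i}__"
            masked = masked.replace(name, token)
            placeholders[token] = entity_repo[name]
    output = translate(masked)
    for token, target_name in placeholders.items():
        output = output.replace(token, target_name)
    return output


def toy_translate(text):
    """Stand-in for the MT engine: word lookup that passes unknown tokens through."""
    lexicon = {"besucht": "visits"}
    return " ".join(lexicon.get(tok, tok) for tok in text.split())


repo = {"Angela Merkel": "Angela Merkel", "Rom": "Rome"}
result = translate_with_entities("Angela Merkel besucht Rom", repo, toy_translate)
print(result)
```

Masking keeps the translation engine from splitting or mistranslating names, at the cost of relying on the repository for every entity it recognises.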
ISBN: 9781937284190 (print)
This work proposes to adapt an existing general SMT model for the task of translating queries that are subsequently going to be used to retrieve information from a target language collection. In the scenario that we focus on, access to the document collection itself is not available and changes to the IR model are not possible. We propose two ways to achieve the adaptation effect, both of which are aimed at tuning parameter weights on a set of parallel queries. The first approach is via a standard tuning procedure optimizing for BLEU score, and the second is via a reranking approach optimizing for MAP score. We also extend the second approach by using syntax-based features. Our experiments show improvements of 1-2.5 in terms of MAP score over retrieval with the non-adapted translation. We show that these improvements are due to both the adaptation and the syntax-based features for the query translation task.
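For readers unfamiliar with the retrieval metric, the following sketch computes mean average precision (MAP) using its standard definition. The document identifiers and relevance sets are invented toy data, and nothing here reproduces the paper's reranker.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k over the ranks k at which a relevant document appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Toy data: two queries with hypothetical document ids.
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d4", "d5"], {"d5"}),              # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```

A reranker optimized for MAP would prefer, among candidate translations of a query, those whose retrieved lists score higher under `average_precision`.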
ISBN: 9781622764709 (print)
The processing of the parallel corpus plays a crucial role in improving the overall performance of phrase-based statistical machine translation (PB-SMT) systems. In this paper, we study the automatic alignment of different kinds of chunks, which boosts both word alignment and machine translation quality. Single-tokenization of noun-noun MWEs, phrasal prepositions (source side only), and reduplicated phrases (target side only), together with the alignment of named entities and complex predicates, provides the best SMT model for bootstrapping. Automatic bootstrapping on the alignment of various chunks yields significant gains over the previous best English-Bengali PB-SMT system. The source chunks are translated into the target language using the PB-SMT system, and the translated chunks are compared with the original target chunks. The aligned chunks increase the size of the parallel corpus. The process is run in a bootstrapping manner until all the source chunks have been aligned with the target chunks or no new chunk alignment is identified. The proposed system achieves significant improvements (2.25 BLEU points over the best system and 8.63 BLEU points absolute over the baseline system, a 98.74% relative improvement over the baseline) on an English-Bengali translation task.
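The reported figures can be cross-checked with a short calculation. The abstract gives only the gains, not the scores themselves, so the values below are implied estimates under the assumption that relative improvement equals absolute gain divided by the baseline score, and that "the best system" refers to the previous best system.

```python
absolute_gain = 8.63        # BLEU points over the baseline (from the abstract)
relative_gain = 0.9874      # 98.74% relative improvement (from the abstract)
gain_over_best = 2.25       # BLEU points over the previous best system (from the abstract)

# Assumption: relative_gain = absolute_gain / baseline_bleu
baseline_bleu = absolute_gain / relative_gain
improved_bleu = baseline_bleu + absolute_gain
previous_best_bleu = improved_bleu - gain_over_best

print(f"implied baseline BLEU:      {baseline_bleu:.2f}")
print(f"implied proposed BLEU:      {improved_bleu:.2f}")
print(f"implied previous-best BLEU: {previous_best_bleu:.2f}")
```

Under these assumptions the baseline comes out at roughly 8.74 BLEU and the proposed system at roughly 17.37 BLEU, so the two reported gains are mutually consistent.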
ISBN: 9781937284190 (print)
We describe a novel method that extracts paraphrases from a bitext, for both the source and target languages. In order to reduce the search space, we decompose the phrase-table into sub-phrase-tables and construct separate clusters for source and target phrases. We convert the clusters into graphs, add smoothing/syntactic-information-carrier vertices, and compute the similarity between phrases with a random walk-based measure, the commute time. The resulting phrase-paraphrase probabilities are built upon the conversion of the commute times into artificial co-occurrence counts with a novel technique. The co-occurrence count distribution belongs to the power-law family.
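The commute time between two vertices is the expected number of random-walk steps to go from one to the other and back. The sketch below computes it on a small undirected weighted graph by solving the hitting-time equations exactly. The toy graph and function names are assumptions for illustration; the paper's graph construction and its conversion of commute times into co-occurrence counts are not reproduced here.

```python
from fractions import Fraction


def hitting_times(weights, target):
    """Expected steps for a random walk to first reach `target` from every node.

    weights: {node: {neighbour: edge_weight}} for a connected undirected graph.
    Solves h(u) = 1 + sum_v P(u, v) * h(v) with h(target) = 0.
    """
    nodes = [n for n in weights if n != target]
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    # Augmented matrix of the linear system (last column is the right-hand side).
    rows = [[Fraction(0)] * (size + 1) for _ in range(size)]
    for u in nodes:
        row = rows[index[u]]
        degree = Fraction(sum(weights[u].values()))
        row[index[u]] += 1
        row[size] = Fraction(1)
        for v, w in weights[u].items():
            if v != target:
                row[index[v]] -= Fraction(w) / degree
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(size):
        pivot = next(i for i in range(col, size) if rows[i][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for i in range(size):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[col])]
    result = {n: rows[index[n]][size] for n in nodes}
    result[target] = Fraction(0)
    return result


def commute_time(weights, a, b):
    """Expected steps to walk from a to b and back to a."""
    return hitting_times(weights, b)[a] + hitting_times(weights, a)[b]


# Toy path graph a - b - c with unit edge weights.
graph = {"a": {"b": 1}, "b": {"a": 1, "c": 1}, "c": {"b": 1}}
print(commute_time(graph, "a", "c"))  # 8
print(commute_time(graph, "a", "b"))  # 4
```

A smaller commute time indicates that two vertices are more tightly connected, which is the sense in which it can serve as a similarity measure between phrases.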