Best language recognition performance is commonly obtained by fusing the scores of several heterogeneous systems. Regardless of the fusion approach, it is assumed that different systems may contribute complementary information, either because they are developed on different datasets, or because they use different features or different modeling approaches. Most authors apply fusion as a last resort for improving performance based on an existing set of systems. Though relative performance gains decrease as larger sets of systems are considered, best performance is usually attained by fusing all the available systems, which may lead to high computational costs. In this paper, we aim to discover which technologies combine best through fusion and to analyse the factors (data, features, modeling methodologies, etc.) that may explain such good performance. Results are presented and discussed for a number of systems provided by the participating sites and the organizing team of the Albayzin 2010 Language Recognition Evaluation. We hope the conclusions of this work help research groups make better decisions in developing language recognition technology.
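The evaluation's actual fusion back-end is not reproduced here; as a point of reference only, score-level fusion is commonly implemented as a weighted linear combination of per-system score vectors, with the weights (and an optional offset) trained by multiclass logistic regression on development data. A minimal sketch under that assumption, with hypothetical names and data:

```python
import numpy as np

def fuse_scores(system_scores, weights, offset=0.0):
    """Weighted linear score-level fusion.

    system_scores : list of (n_trials, n_languages) arrays, one per subsystem.
    weights       : one scalar weight per subsystem (e.g. trained by
                    multiclass logistic regression on a development set).
    offset        : optional bias added to the fused scores.
    """
    fused = np.zeros_like(system_scores[0], dtype=float)
    for w, scores in zip(weights, system_scores):
        fused += w * scores
    return fused + offset

# Hypothetical example: three subsystems, four trials, two target languages.
rng = np.random.default_rng(0)
subsystem_scores = [rng.normal(size=(4, 2)) for _ in range(3)]
print(fuse_scores(subsystem_scores, weights=[0.5, 0.3, 0.2]))
```

Fusing a smaller, well-chosen subset of subsystems with such a combiner is exactly the trade-off the paper examines: each additional subsystem adds computational cost while its marginal contribution to the fused score shrinks.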
Chinese word segmentation is an active area in Chinese language processing, though it suffers from ongoing disagreement about what precisely constitutes a word in Chinese. Based on a corpus-based segmentation standard, we launched t...
ISBN (print): 9783642003813
The Monge-Elkan method is a general text string comparison method based on an internal character-based similarity measure (e.g. edit distance) combined with a token-level (i.e. word-level) similarity measure. We propose a generalization of this method based on the notion of the generalized arithmetic mean instead of the simple average used in the original Monge-Elkan expression. The experiments carried out with 12 well-known name-matching data sets show that the proposed approach outperforms the original Monge-Elkan method when character-based measures are used to compare tokens.
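Concretely, the original Monge-Elkan score averages, for each token of the first string, its best character-level match in the second string; the proposed generalization replaces that simple average with a generalized (power) mean of exponent m. A minimal sketch, using difflib's ratio as a stand-in for the internal character-based measure (the measures used in the paper, such as edit distance, can be substituted):

```python
from difflib import SequenceMatcher

def char_sim(a, b):
    """Internal character-based similarity in [0, 1]
    (here: difflib's ratio, used only as an illustrative stand-in)."""
    return SequenceMatcher(None, a, b).ratio()

def monge_elkan(tokens_a, tokens_b, m=1.0, sim=char_sim):
    """Generalized Monge-Elkan similarity between two token sequences.

    For each token of tokens_a, take its best similarity against tokens_b,
    then combine these maxima with the generalized (power) mean of exponent m.
    """
    total = 0.0
    for a in tokens_a:
        best = max(sim(a, b) for b in tokens_b)
        total += best ** m
    return (total / len(tokens_a)) ** (1.0 / m)

# Hypothetical name-matching example.
print(monge_elkan("juan c lopez".split(), "lopez, juan carlos".split(), m=2.0))
```

With m = 1 this reduces to the original arithmetic-mean formulation; larger m gives more weight to the best-matching token pairs.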
For most English words, dictionaries give various senses: e.g., "bank" can stand for a financial institution, shore, set, etc. Automatic selection of the sense intended in a given text has crucial importance...
The paper presents a method of automatic enrichment of a very large dictionary of word combinations. The method is based on results of automatic syntactic analysis (parsing) of sentences. The dependency formalism is u...
This paper describes a system dedicated to online handwritten sentence recognition. The prototype is made up of two basic processors. The first controls data acquisition, pen-tip trace segmentation and letter identification. The second aims at identifying and correcting the word candidates by integrating syntactic and lexical information. Sentences are parsed to list the grammatical classes of each incorrect candidate; a lexical query then searches for words in a lexicon according to those grammatical classes. A final decision is made using a string comparison algorithm. Tests of the complete system are reported at the end for a typical writer-dependent application.
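The abstract does not spell out the final decision step; one plausible reading is that each misrecognized candidate is replaced by the lexicon word of a grammatically admissible class that minimizes a string-comparison cost such as Levenshtein distance. A hedged sketch of that step, with a hypothetical lexicon and helper names:

```python
def edit_distance(s, t):
    """Standard Levenshtein distance via dynamic programming (two rows)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def correct_word(candidate, lexicon, allowed_classes):
    """Pick the lexicon word of an allowed grammatical class that is
    closest (by edit distance) to the misrecognized candidate."""
    pool = [word for word, cls in lexicon if cls in allowed_classes]
    return min(pool, key=lambda word: edit_distance(candidate, word))

# Hypothetical lexicon of (word, grammatical class) pairs.
lexicon = [("write", "verb"), ("white", "adjective"), ("wrote", "verb")]
print(correct_word("writte", lexicon, allowed_classes={"verb"}))
```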
ISBN (digital): 9783031021787
ISBN (print): 9783031010507
Opportunity and Curiosity find similar rocks on Mars. One can generally understand this statement if one knows that Opportunity and Curiosity are instances of the class of Mars rovers, and recognizes that, as signalled by the word on, rocks are located on Mars. Two mental operations contribute to understanding: recognize how entities/concepts mentioned in a text interact and recall already known facts (which often themselves consist of relations between entities/concepts). Concept interactions one identifies in the text can be added to the repository of known facts, and aid the processing of future texts. The amassed knowledge can assist many advanced language-processing tasks, including summarization, question answering and machine translation. Semantic relations are the connections we perceive between things which interact. The book explores two, now intertwined, threads in semantic relations: how they are expressed in texts and what role they play in knowledge repositories. A historical perspective takes us back more than 2000 years to their beginnings, and then to developments much closer to our time: various attempts at producing lists of semantic relations, necessary and sufficient to express the interaction between entities/concepts. A look at relations outside context, then in general texts, and then in texts in specialized domains, has gradually brought new insights, and led to essential adjustments in how the relations are seen. At the same time, datasets which encompass these phenomena have become available. They started small, then grew somewhat, then became truly large. The large resources are inevitably noisy because they are constructed automatically. The available corpora—to be analyzed, or used to gather relational evidence—have also grown, and some systems now operate at the Web scale. The learning of semantic relations has proceeded in parallel, in adherence to supervised, unsupervised or distantly supervised paradigms. Detailed analyses of annota