In this paper, we propose a simple and intuitive yet linguistically and practically motivated method for English-Chinese name transliteration generation. Our system is essentially a syllable-based Maximum Matching sys...
In this survey we overview graph-based clustering and its applications in computational linguistics. We summarize graph-based clustering as a five-part story: hypothesis, modeling, measure, algorithm and evaluation. W...
Graph-based methods that are en vogue in the social network analysis area, such as centrality models, have recently been applied to linguistic knowledge bases, including unsupervised Word Sense Disambiguation. Althoug...
Author:
Biemann, Chris
475 Brannan St Ste. 330 San Francisco CA 94107 United States
This paper examines the influence of features based on clusters of co-occurrences for supervised Word Sense Disambiguation and Lexical Substitution. Co-occurrence cluster features are derived from clustering the local ...
This work extends the study of Germann et al. (2010) in investigating the lexical organization of verbs. In particular, we look at the influence of frequency on the process of lexical acquisition and use. We examine d...
ISBN:
(Print) 9781586039752
These proceedings contain the final versions of the papers presented at the 7th International Workshop on Finite-State Methods and Natural Language Processing (FSMNLP), held in Ispra, Italy, on September 11-12, 2008. The aim of the FSMNLP workshops is to bring together members of the research and industrial community working on finite-state based models in language technology, computational linguistics, web mining, linguistics and cognitive science on the one hand, and on related theory and methods in fields such as computer science and mathematics on the other. The workshop series is thus a forum for researchers and practitioners working on applications as well as on theoretical and implementation aspects. The special theme of FSMNLP 2008 was high-performance finite-state devices in large-scale natural language text processing systems and applications. The papers in this publication cover a range of NLP applications, including machine learning and translation, logic, computational phonology, morphology and semantics, data mining, information extraction and disambiguation, as well as programming, optimization and compression of finite-state networks. The applied methods include weighted algorithms, kernels and tree automata. In addition, relevant aspects of software engineering, standardization and European funding programs are discussed.
We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to part-of-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar n-gr...
The proceedings contain 13 papers. The topics discussed include: lexisla: a legislative information retrieval system; mOCRA: mobile OCR application; enterprise 2.0: plagiarism detection and opinion analysis; NLP techniques & the Internet: searching for opinions and automatic sentiments analysis; naturalOpinions: NLP-based opinion extraction in user-generated content; Babxel: multilingual search; mobile augmented information system; trust based recommendations for social media; approximate retrieval of postal addresses; personalized health information system; natural language processing interactive multimodal systems; an approach on improving search engines through social content recommendation; and towards natural language interaction.
Traditionally, popular synonym acquisition methods are based on the distributional hypothesis, and a metric such as the Jaccard coefficient is used to evaluate the similarity between the contexts of words to obtain synonyms for a query. On the other hand, when one tries to compile and clean a thesaurus, one often already has a modest number of synonym relations at hand. Could something be done with a half-built thesaurus alone? We propose the use of spectral methods and discuss their relation to other network-based algorithms in natural language processing (NLP), such as PageRank and bootstrapping. Since compiling a thesaurus is very laborious, we believe that adding the proposed method to the toolkit of thesaurus constructors would significantly ease this task.
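The traditional baseline mentioned in this abstract, ranking synonym candidates by the Jaccard coefficient of their context sets, can be sketched as follows. This is only an illustration of that baseline, not the spectral method the authors propose; the toy context sets and the helper names `jaccard` and `rank_candidates` are hypothetical and do not come from the paper.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two context sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty context sets: define the similarity as 0.0 by convention.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy data: each word maps to the set of context words
# it was observed with. Real systems would extract these from a corpus.
contexts = {
    "car": {"drive", "road", "engine", "park"},
    "automobile": {"drive", "road", "engine", "dealer"},
    "banana": {"eat", "peel", "fruit"},
}


def rank_candidates(query, contexts):
    """Rank all other words by context similarity to the query word."""
    query_contexts = contexts[query]
    scores = [
        (word, jaccard(query_contexts, ctx))
        for word, ctx in contexts.items()
        if word != query
    ]
    # Highest similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for word, score in rank_candidates("car", contexts):
        print(f"{word}\t{score:.2f}")
```

Under the distributional hypothesis, words sharing many contexts score close to 1 and become synonym candidates for the query, while words with disjoint contexts score 0.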
Purpose - This paper sets out to discuss the use of information extraction (IE), a natural language processing (NLP) technique, to assist "rich" semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic-aware "rich" indexing of diverse natural language resources with properties capable of satisfying information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project. Design/methodology/approach - The paper proposes use of the English Heritage extension (CRM-EH) of the standard core ontology in cultural heritage, CIDOC CRM, and exploitation of domain thesauri resources for driving and enhancing an Ontology-Oriented Information Extraction process. The process of semantic indexing is based on a rule-based Information Extraction technique, which is facilitated by the General Architecture for Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules. Findings - Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic-aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms. Originality/value - The value of the paper lies in the semantic indexing of 535 unpublished online documents, often referred to as "Grey Literature", from the Archaeology Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts *** Appellation and *** Object.