ISBN: 9788988678206 (print)
The prevalent use of human language in computer systems and in language-oriented document processing, especially on the web, has created a need for mechanisms that support the development of natural language processing. In this paper, a survey of natural language processing laboratories is conducted and a framework for developing such a laboratory is proposed. To this end, the specifications, structures and activities of existing natural language processing laboratories were reviewed, yielding valuable results. A framework was then established that presents the requirements of a laboratory in a structured way and provides a clear plan for NLP laboratories. Finally, in line with the national requirements of Iran and the characteristics of the Persian language, a natural language processing laboratory is designed within the discussed framework. The activities of this laboratory include basic tasks such as creating linguistic resources (corpora, bilingual corpora, lexicons, etc.) and test datasets, developing an evaluation framework for natural language processing systems, and developing basic tools (parser, POS tagger, etc.); research tasks such as defining research areas for universities; and services such as application development, natural language processing system evaluation, and the granting of appropriate certificates.
The definition and specification of an infrastructure or framework for GIS web services are given to provide a solution to the issue of format and information diversity. The need for a definition of this kind is due to t...
ISBN: 9782951740860 (print)
Many natural language processing tasks, including information extraction, question answering and recognizing textual entailment, require analysis of the polarity, focus of polarity, tense, aspect, mood and source of the event mentions in a text, in addition to analysis of its predicate-argument structure. We refer to modality, polarity and other associated information as extended modality. In this paper, we propose a new annotation scheme for representing the extended modality of event mentions in a sentence. Our extended modality consists of the following seven components: Source, Time, Conditional, Primary modality type, Actuality, Evaluation and Focus. We reviewed the literature on extended modality in linguistics and natural language processing (NLP) and defined appropriate labels for each component. In the proposed annotation scheme, the extended modality information of an event mention is summarized at the core predicate of the event mention for immediate use in NLP applications. We also report on the current progress of our manual annotation of a Japanese corpus of about 50,000 event mentions, showing a reasonably high rate of inter-annotator agreement.
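As a rough illustration of the seven-component scheme, an annotation attached to a core predicate could be represented as a simple record. This is only a sketch: the abstract does not give the label inventories or the agreement measure used, so the label strings, the `predicate` field, and the per-component observed-agreement function below are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExtendedModality:
    """One annotation record attached to the core predicate of an event mention."""
    predicate: str              # hypothetical field: surface form of the core predicate
    source: str
    time: str
    conditional: str
    primary_modality_type: str
    actuality: str
    evaluation: str
    focus: str


def component_agreement(annotator_a, annotator_b):
    """Observed agreement per component: share of paired records with equal labels.

    Assumption: the two lists are aligned so that index i refers to the same
    event mention for both annotators.
    """
    if not annotator_a or len(annotator_a) != len(annotator_b):
        raise ValueError("annotation lists must be non-empty and of equal length")
    components = [f.name for f in fields(ExtendedModality) if f.name != "predicate"]
    total = len(annotator_a)
    return {
        name: sum(getattr(a, name) == getattr(b, name)
                  for a, b in zip(annotator_a, annotator_b)) / total
        for name in components
    }


# Hypothetical label values, used only to exercise the code.
base = dict(source="writer", time="non-future", conditional="none",
            primary_modality_type="assertion", evaluation="neutral", focus="none")
ann_a = [ExtendedModality(predicate="arrive", actuality="certain", **base),
         ExtendedModality(predicate="leave", actuality="certain", **base)]
ann_b = [ExtendedModality(predicate="arrive", actuality="certain", **base),
         ExtendedModality(predicate="leave", actuality="probable", **base)]
agreement = component_agreement(ann_a, ann_b)
```

Keeping all seven components on one record per predicate mirrors the idea of summarizing the information at the core predicate; a chance-corrected measure could replace the raw ratio if needed.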
On web pages, reviews are written in natural language as unstructured free text. Online product reviews are considered a significant informative resource that is useful for both potential customers...
The stress testing of AI-based systems differs from the approach taken for more traditional web services, both in terms of the design of test cases and the metrics used to measure quality. The expected variability in ...
ISBN: 9783642147692 (print)
With the explosive growth of the World Wide Web, large-scale comparable corpora have become more abundant and accessible than parallel corpora. From the point of view of Cross-Language Information Retrieval, the limitation of translation resources, together with the ambiguity arising from failure to translate query terms, is largely responsible for large drops in effectiveness below monolingual performance. Therefore, strategies for bilingual terminology extraction from comparable texts must be given more attention in order to enrich existing bilingual lexicons and thesauri and to enhance Cross-Language Information Retrieval. In the present paper, we focus on the enhancement of Cross-Language Information Retrieval using a two-stage corpus-based translation model that includes bi-directional extraction of bilingual terminology from comparable corpora and selection of the best translation alternatives on the basis of their morphological knowledge. The impact of comparable corpora on the performance of the Cross-Language Information Retrieval process is evaluated in this study, and the results indicate that the effect is clearly positive, especially when using the linear combination with bilingual dictionaries and the Japanese-English language pair.
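The linear combination with bilingual dictionaries mentioned above can be pictured as a weighted sum of two scores per translation candidate. The sketch below is a minimal illustration under stated assumptions: the abstract does not specify the scoring functions or the weighting, so the function name `combine_translation_scores`, the `weight` parameter, and the example scores are all hypothetical.

```python
def combine_translation_scores(corpus_scores, dictionary_scores, weight=0.5):
    """Rank translation candidates by a linear combination of two score sources.

    corpus_scores:     candidate -> score from a comparable-corpus model (assumed)
    dictionary_scores: candidate -> score from a bilingual dictionary (assumed)
    weight:            share given to the corpus-based score, in [0, 1]
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    candidates = set(corpus_scores) | set(dictionary_scores)
    combined = {
        c: weight * corpus_scores.get(c, 0.0)
           + (1.0 - weight) * dictionary_scores.get(c, 0.0)
        for c in candidates
    }
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores for the translation candidates of one query term.
corpus = {"bank": 0.6, "shore": 0.4}
dictionary = {"bank": 1.0}
ranked = combine_translation_scores(corpus, dictionary, weight=0.5)
```

A candidate missing from one source simply contributes a score of zero from that source, so dictionary-only and corpus-only translations can still be ranked together.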
ISBN: 9780769542416 (print)
Standards for systems and software lifecycle processes have become rather popular in the last decade. Being expressed in natural language, their requirements, or clauses, are exposed to the risk of ambiguity, vagueness and subjectivity, even when the safety of people and the environment is the Standard's main concern. The paper addresses some aspects of this problem and presents an experimental approach to determining and evaluating a set of properties of the clauses that capture the quality of their expression. The approach adopts a rather intuitive quality model for the English language and includes the use of a tool for sentence processing. Results of a descriptive analysis of some well-known, safety-related Standards for different software application domains are shown and discussed.
ISBN: 2951740867 (print)
This paper presents a comparison of three computational approaches to selectional preferences: (i) an intuitive distributional approach that uses second-order co-occurrence of predicates and complement properties; (ii) an EM-based clustering approach that models the strengths of predicate-noun relationships by latent semantic clusters; and (iii) an extension of the latent semantic clusters by incorporating the MDL principle into the EM training, thus explicitly modelling the predicate-noun selectional preferences by WordNet classes. We describe various experiments on German data and two evaluations, and demonstrate that the simple distributional model outperforms the more complex cluster-based models in most cases, but does not itself always beat the powerful frequency baseline.
ISBN: 2951740867 (print)
In the Low Countries, a major reference corpus for written Dutch is currently being built. In this paper, we discuss the interplay between data acquisition and data processing during the creation of the SoNaR Corpus. Based on recent developments in traditional corpus compiling and new web harvesting approaches, SoNaR is designed to contain 500 million words, balanced over 36 text types including both traditional and new media texts. Besides its balanced design, every text sample included in SoNaR will have its IPR issues settled to the largest extent possible. This data collection task presents many challenges because every decision taken at the level of text acquisition has ramifications for the level of processing and the general usability of the corpus later on. As far as the traditional text types are concerned, each text brings its own processing requirements and issues. For new media texts (SMS, chat) the problem is even more complex: issues such as anonymity, recognizability and citation rights all present problems that have to be tackled one way or another. The solutions may actually lead to the creation of two corpora: a gigaword SoNaR, IPR-cleared for research purposes, and a smaller (of commissioned size), more privacy-compliant SoNaR, IPR-cleared for commercial purposes as well.
Web Services Description Language (WSDL) provides a structured way to standardize the description of web services, exploiting XML for the exchange of structured information. Nevertheless, XML supports little interoperabi...