ISBN: 9782951740860 (print)
After a brief overview of the elements of modern grid computing, a number of common use cases of natural language processing tasks running on the grid are presented, notably corpus annotation with morpho-syntactic tagging (a 600+ million-word corpus in one day), n-gram statistics processing of a corpus, and web-accessible services, with annotation and term extraction as examples. Implementation considerations and common problems of using the grid for this type of task are laid out. Finally, a simple action plan is given for evolving the infrastructure created for these experiments into a fully functional Human Language Technology grid Virtual Organization, with the goal of making the power of the European grid infrastructure available to the linguistic community.
ISBN: 2951740867 (print)
The proceedings contain 641 papers. The topics discussed include: using linear interpolation and weighted reordering hypotheses in the Moses system; creating a reusable English-Chinese parallel corpus for bilingual dictionary construction; test suite design for ontology concept recognition systems; studying word sketches for Russian; a Python toolkit for universal transliteration; FOLKER: an annotation tool for efficient transcription of natural, multi-party interaction; FreeLing 2.1: five years of open-source language processing tools; a bilingual dictionary Mexican Sign Language-Spanish/Spanish-Mexican Sign Language; a general methodology for equipping ontologies with time; comment extraction from blog posts and its applications to opinion mining; the web library of Babel: evaluating genre collections; a general method for creating a bilingual transliteration dictionary; the Sign Linguistics Corpora Network: towards standards for signed language resources; an annotated dataset for extracting definitions and hypernyms from the web; an evolving eScience environment for research data in linguistics; classifying action items for semantic email; automatic acquisition of parallel corpora from websites with dynamic content; and united we stand: improving sentiment analysis by joining machine learning and rule-based methods.
An architecture description language (ADL) provides a linguistic approach to representing software architecture. Usually, a new ADL has to be developed for a particular domain. For some domains it is difficult to develop the ADLs, ...
In the past decade, the massive growth of the Internet has brought huge changes in the way humans live their daily lives; however, the biggest concern with the rapid growth of digital information is how to efficiently manage and...
Nowadays, novel applications, such as personalized e-commerce services, call for cooperation across enterprise boundaries. Service-Oriented Architecture (SOA) offers a way to build loosely coupled distributed appl...
China's current financial supervision model is one of isolated supervision: each financial supervision institution has its own financial supervision information system, which may lead to many problems, such as lacking c...
This study examines how the Latent Dirichlet Allocation (LDA) model, combined with natural language processing techniques, can be used to identify hot topics from free-text customer reviews. To verify the validity of th...
In this paper we present a relation extraction system which uses a combination of pattern-based, structure-based and statistical approaches. This system uses raw texts and Wikipedia articles to learn conceptual relati...
In recent years, due to the popularity of the Internet, more and more people use online dictionaries instead of traditional dictionaries. However, the functions of the existing online Mandarin-Taiwan dialects dictiona...
ISBN: 9782951740860 (print)
The Text Analysis Conference (TAC) is a series of Natural Language Processing evaluation workshops organized by the National Institute of Standards and Technology. The Knowledge Base Population (KBP) track at TAC 2009, a hybrid descendant of the TREC Question Answering track and the Automated Content Extraction (ACE) evaluation program, is designed to support development of systems that are capable of automatically populating a knowledge base with information about entities mined from unstructured text. An important component of the KBP evaluation is the Entity Linking task, where systems must accurately associate text mentions of unknown Person (PER), Organization (ORG), and Geopolitical (GPE) names to entries in a knowledge base. The Linguistic Data Consortium (LDC) at the University of Pennsylvania creates and distributes linguistic resources including data, annotations, system assessment, tools and specifications for the TAC KBP evaluations. This paper describes the 2009 resource creation efforts, with particular focus on the selection and development of named entity mentions for the Entity Linking task evaluation.