This paper presents an overview of the Topic-based Chinese Message Polarity Classification task in the SIGHAN 2015 bake-off. Topic-based message polarity classification plays an important role in sentiment analysis, information e...
ISBN (print): 9781509019335
Recognizing characters inscribed on stone is an essential part of our work to reveal information about our ancestors. Conventional Optical Character Recognition is not adopted because the rock surface becomes the texture of the image, which poses challenges in recovering the text. In this paper, we propose a new feature called the Positional Distance Metric, which is independent of any language script, to address the various issues that occur in images of stone inscriptions. Image-based and Zone-based Normalized Positional Distance Metric (INPDM/ZNPDM) features are then computed and combined with structural and regional features to yield a better recognition rate. We repeat this procedure for all the training images and store the resulting features in a list. A Nearest Neighbor classifier is used for subsequent classification and recognition of characters. Experiments performed on 350 characters show a significant recognition rate.
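The final classification step can be illustrated with a minimal nearest-neighbor sketch. The abstract does not define how the Positional Distance Metric is computed, so the feature vectors below are made-up placeholders standing in for the combined INPDM/ZNPDM, structural, and regional features; the function name, labels, and the choice of Euclidean distance are assumptions for illustration only.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the training vector closest to `query`.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption; the abstract does not say which distance is used.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical feature list populated from training images.
# The numbers are placeholders, not real INPDM/ZNPDM values.
train = [
    ((0.10, 0.80, 0.30), "char_A"),
    ((0.90, 0.20, 0.50), "char_B"),
    ((0.40, 0.40, 0.90), "char_C"),
]

# Classify an unseen character by its (placeholder) feature vector.
print(nearest_neighbor(train, (0.85, 0.25, 0.45)))
```

A linear scan like this is adequate for a few hundred training characters; larger collections would usually call for an index structure.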
ISBN (print): 9789897581649
With the amount of textual data available on the web, new methodologies for knowledge extraction are emerging. Some original methods allow users to combine different types of data in order to extract relevant information. In this context, we present the core manipulations performed on textual documents and their preparation for extracting spatial information compatible with that contained in satellite images. The term footprint is defined and its extraction is performed. In this paper, we describe the general process and some experiments conducted in the ANIMITEX project, which aims to match the information coming from texts with that of satellite images.
In this study, an automatic classification method based on the sentiment polarity of text is proposed. This method uses two sentiment dictionaries from different sources: the Chinese sentiment dictionary CSWN that int...
ISBN (print): 9781509019335
Six million children below the age of five perish from preventable causes every year. This happens because of insufficient information or misinformation about the benefits of immunization. In this paper, we concentrate on constructing a complete vaccine information ontology system using the Web Ontology Language (OWL). The ontology provides overall information regarding vaccination and helps people learn about the importance of immunization. It was constructed with Protégé 4.3 and consists of 28 Vaccine Information Statements collected from the CDC website. Salient and pertinent information from each document was recorded and knowledge triples were extracted. Using the collection of knowledge triples, the meta-level conceptualization of the vaccine information ontology was developed. This immunization ontology covers classes, data properties, object properties, individuals, and restrictions on relationships between classes, along with other descriptions. Finally, the HermiT reasoner was used to check for inconsistencies.
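The step from knowledge triples to ontology elements can be sketched roughly as follows. This is plain Python rather than OWL or Protégé output, and every triple, predicate name, and value below is an invented placeholder rather than content from the CDC statements; the `is_a` convention and the `build_ontology` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; all names and
# values are placeholders, not facts taken from the source documents.
triples = [
    ("VaccineA", "is_a", "Vaccine"),
    ("DiseaseX", "is_a", "Disease"),
    ("VaccineA", "prevents", "DiseaseX"),
    ("VaccineA", "dose_count", 3),
]

def build_ontology(triples):
    """Group triples into classes, object properties, and data properties."""
    classes = defaultdict(set)       # class name -> individuals
    object_props = defaultdict(set)  # property -> (individual, individual)
    data_props = defaultdict(set)    # property -> (individual, literal)

    # First pass: "is_a" triples assign individuals to classes.
    for subj, pred, obj in triples:
        if pred == "is_a":
            classes[obj].add(subj)
    individuals = {i for members in classes.values() for i in members}

    # Second pass: a property linking two individuals is an object
    # property; anything else is treated as a data property.
    for subj, pred, obj in triples:
        if pred == "is_a":
            continue
        if obj in individuals:
            object_props[pred].add((subj, obj))
        else:
            data_props[pred].add((subj, obj))
    return classes, object_props, data_props

classes, object_props, data_props = build_ontology(triples)
print(dict(classes))
print(dict(object_props))
print(dict(data_props))
```

Consistency checking of the kind a reasoner such as HermiT performs is outside the scope of this sketch.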
We add an interpretable semantics to the paraphrase database (PPDB). To date, the relationship between phrase pairs in the database has been weakly defined as approximately equivalent. We show that these pairs represe...
The detection and correction of erroneous Chinese characters is an important problem in many applications. This paper proposes an automatic method for correcting erroneous Chinese characters. The method is divided int...
ISBN (print): 9781467383028
Understanding the lexical characteristics of clinical documents is the foundation of the sublanguage-based Medical Language Processing (MLP) approach. However, few studies have focused on the lexical characteristics of Chinese clinical documents. In this study, a lexical characteristics analysis at both the syntactic and semantic levels was conducted on a clinical corpus containing 3,500 clinical documents generated during daily practice. The analysis was based on the automatic tagging results of a lexicon-based part-of-speech (POS) and semantic tagging method. The medical lexicon contains 237,291 entries annotated with both semantic and syntactic classes. The normalized frequencies of different terms and of syntactic and semantic classes were calculated and visualized. The major contributions of this paper are a wide-coverage Chinese medical semantic lexicon and a description of the lexical characteristics of Chinese clinical documents. Both will lay a good foundation for sublanguage-based MLP studies in China.
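As a rough illustration of the frequency calculation, the sketch below computes a normalized frequency per class from tagged tokens. The abstract does not state the normalization used, so "occurrences per `per` tokens" is an assumption, and the tokens, tag names, and `normalized_frequency` helper are hypothetical.

```python
from collections import Counter

def normalized_frequency(tagged_tokens, per=1000):
    """Return occurrences of each tag per `per` tokens.

    `tagged_tokens` is a list of (token, tag) pairs. Normalizing by
    total token count is an assumption, not the paper's stated formula.
    """
    total = len(tagged_tokens)
    if total == 0:
        return {}
    counts = Counter(tag for _, tag in tagged_tokens)
    return {tag: count * per / total for tag, count in counts.items()}

# Hypothetical tagger output: placeholder tokens with made-up POS tags.
tagged = [
    ("patient", "noun"),
    ("reports", "verb"),
    ("fever", "noun"),
    ("three", "numeral"),
    ("days", "noun"),
]

print(normalized_frequency(tagged, per=100))
```

The same function applies to semantic classes by passing (token, semantic class) pairs instead of POS tags.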
ISBN (print): 9783319240695; 9783319240688
The amount of data published on the web continues to grow. Keyword-based search, used by most search engines, is a common way of retrieving information on the web. However, keyword-based search may return a huge amount of valueless information. This problem can be solved by a Question Answering System (QAS, QA system). One of the challenging tasks for available QA systems is to understand natural language questions correctly and deduce their precise meaning in order to retrieve accurate responses. The significant role of QA systems and their increasing number may make it difficult to select the most suitable one. The general aim of this paper is to provide a knowledge-based approach to QA system selection. It should ensure knowledge systematization and help users find a proper solution that meets their needs.
Named entity recognition (NER) plays an important role in the NLP literature. Traditional methods tend to employ a large annotated corpus to achieve high performance. Different from many semi-supervised learning m...