Semantic information retrieval has been a topic of great interest in recent years. Its purpose is to improve the effectiveness of information retrieval methods by exploiting the semantic information in documents and q...
This paper describes the topic-based Chinese message polarity classification system submitted by LCYS-TEAM at SIGHAN8-Task2. The system mainly includes two parts: 1) a graph-based ranking model integrating local and g...
Named entity recognition (NER) plays an important role in the NLP literature. Traditional methods tend to rely on large annotated corpora to achieve high performance. Unlike many semi-supervised learning m...
ISBN: 9781467375818 (print)
This paper addresses automatic topic identification of noisy Arabic texts. Several works in this field use statistical and machine learning approaches for different text categories. Unfortunately, most of the proposed methods are effective only on clean, long texts. In this work, we use an in-house dataset of noisy Arabic texts collected from several Arabic discussion forums covering 6 topics. We propose a graph approach called LIGA for the topic identification task; this approach was first introduced for language identification. We also propose two extensions to enhance LIGA's performance. Experiments on the Arabic dataset show promising results, reaching about 98% accuracy.
The BI (比)-structure, which highlights a contrasting characteristic between two items, is the key comparative sentence structure in Chinese. In this paper, we explore the methods of extracting the 6 constituents of the...
ISBN: 9788026309741 (print)
The proceedings contain 14 papers. The topics discussed include: style & identity recognition; Slavonic corpus for stylometry research; the initial study of term vector generation methods for news summarization; towards automatic finding of word sense changes in time; converting the corpus query language to the natural language; concurrent processing of text corpus queries; bilingual terminology extraction in Sketch Engine; corpus based extraction of hypernyms; software and data for corpus pattern analysis; semantic regularity of derivational relations; AST: new tool for logical analysis of sentences based on transparent intensional logic; annotation of multi-word expressions in Czech texts; TIL as hyperintensional logic for natural language analysis; and generating Czech iambic verse.
This paper provides an overview of the Sheffield University submission to the WMT15 Translation Task for the Finnish-English language pair. The submitted translations were created from a system built using the CDEC d...
In this paper, we present our contribution to the INEX 2015 Social Book Search Track. This track aims to exploit social information (user reviews, ratings, etc.) from the LibraryThing and Amazon collections. We used traditional information retrieval models, namely InL2 and the Sequential Dependence Model (SDM), and tested their combination. We integrated tools from natural language processing (NLP) and approaches based on graph analysis to improve recommendation performance.
Constructing standard and computable clinical diagnostic criteria is an important and challenging research area in the clinical informatics community. In this study, we present our framework and methods for representing c...
Scoring short-answer questions has disadvantages: grading may take a long time, and scoring consistency may be an issue. To alleviate these disadvantages, automated scoring systems are widely used in America or ...