News describe real-world events of varying granularity, and recognition of internal structure of events is important for automated reasoning over events. We propose an approach for constructing coherent event hierarch...
In online discussions, users often back up their stance with arguments. Their arguments are often vague, implicit, and poorly worded, yet they provide valuable insights into reasons underpinning users' opinions. I...
详细信息
作者:
Šnajder, JanUniversity of Zagreb
Faculty of Electrical Engineering and Computing Text Analysis and Knowledge Engineering Lab. Unska 3 Zagreb10000 Croatia
knowledge about derivational morphology has been proven useful for a number of natural language processing (NLP) tasks. We describe the construction and evaluation of ***, a large-coverage morphological resource for C...
详细信息
ISBN:
(纸本)9782951740884
knowledge about derivational morphology has been proven useful for a number of natural language processing (NLP) tasks. We describe the construction and evaluation of ***, a large-coverage morphological resource for Croatian. *** groups 100k lemmas from web corpus hrWaC into 56k clusters of derivationally related lemmas, so-called derivational families. We focus on suffixal derivation between and within nouns, verbs, and adjectives. We propose two approaches: an unsupervised approach and a knowledge-based approach based on a hand-crafted morphology model but without using any additional lexico-semantic resources. The resource acquisition procedure consists of three steps: corpus preprocessing, acquisition of an inflectional lexicon, and the induction of derivational families. We describe an evaluation methodology based on manually constructed derivational families from which we sample and annotate pairs of lemmas. We evaluate *** on the so-obtained sample, and show that the knowledge-based version attains good clustering quality of 81.2% precision, 76.5% recall, and 78.8% F1-score. As with similar resources for other languages, we expect *** to be useful for a number of NLP tasks.
In news stories, event mentions denote real-world events of different spatial and temporal granularity. Narratives in news stories typically describe some real-world event of coarse spatial and temporal granularity al...
详细信息
ISBN:
(纸本)9782951740884
In news stories, event mentions denote real-world events of different spatial and temporal granularity. Narratives in news stories typically describe some real-world event of coarse spatial and temporal granularity along with its subevents. In this work, we present HiEve, a corpus for recognizing relations of spatiotemporal containment between events. In HiEve, the narratives are represented as hierarchies of events based on relations of spatiotemporal containment (i.e., superevent-subevent relations). We describe the process of manual annotation of HiEve. Furthermore, we build a supervised classifier for recognizing spatiotemporal containment between events to serve as a baseline for future research. Preliminary experimental results are encouraging, with classifier performance reaching 58% F1-score, only 11% less than the inter-annotator agreement.
In this paper we present CroNER, a named entity recognition and classification system for Croatian language based on supervised sequence labeling with conditional random fields (CRF). We use a rich set of lexical and ...
详细信息
In this paper we present CroNER, a named entity recognition and classification system for Croatian language based on supervised sequence labeling with conditional random fields (CRF). We use a rich set of lexical and gazetteer-based features and different methods for enforcing document-level label consistency. Extensive evaluation shows that our method achieves state-of-the-art results (MUC F1 90.73%, Exact F1 87.42%) when compared to existing NERC systems for Croatian and other Slavic languages.
We describe GPKEX, a keyphrase extraction method based on genetic programming. We represent keyphrase scoring measures as syntax trees and evolve them to produce rankings for keyphrase candidates extracted from text. ...
详细信息
Identifying news stories that discuss the same real-world events is important for news tracking and retrieval. Most existing approaches rely on the traditional vector space model. We propose an approach for recognizin...
详细信息
We propose a simple, yet effective, pipeline architecture for document classification. The task we intend to solve is to classify large and content-wise heterogeneous document streams on a layered nine-category system...
详细信息
ISBN:
(纸本)2951740816
We propose a simple, yet effective, pipeline architecture for document classification. The task we intend to solve is to classify large and content-wise heterogeneous document streams on a layered nine-category system, which distinguishes medical from non-medical texts and sorts medical texts into various subgenres. While the document classification problem is often dealt with using computationally powerful and, hence, costly classifiers (e.g., Bayesian ones), we have gathered empirical evidence that a much simpler approach based on n-gram-statistics achieves a comparable level of classification performance.
We describe the structure of a German-language corpus which contains a variety of medical text genres. Clinical documents (discharge summaries, pathology, histology and surgery reports) are distinguished from non-clin...
ISBN:
(纸本)2951740816
We describe the structure of a German-language corpus which contains a variety of medical text genres. Clinical documents (discharge summaries, pathology, histology and surgery reports) are distinguished from non-clinical ones (textbook articles and consumer health care documents from aWeb portal). After introducing a medical extension of the general-language STTS tagset which accounts for unique features of the medical sublanguage encountered in these documents, we discuss some of the quantitative properties of the annotations (e.g., distribution patterns of part-of-speech tags).
Modeling complex compositional objects in the field of the life sciences requires to solve intricate ontological problems, especially those related to parts of a whole, space and location. For concrete physical entiti...
详细信息
暂无评论