ISBN: 9781622765034 (print)
Generating coherent discourse is an important aspect of natural language generation. Our aim is to learn factors that constitute coherent discourse from data, with a focus on how to realize predicate-argument structures in a model that exceeds the sentence level. We present an important subtask for this overall goal, in which we align predicates across comparable texts, admitting partial argument structure correspondence. The contribution of this work is two-fold. First, we construct a large corpus resource of comparable texts, including an evaluation set with manual predicate alignments. Second, we present a novel approach for aligning predicates across comparable texts using graph-based clustering with Mincuts. Our method significantly outperforms other alignment techniques when applied to this novel alignment task, by a margin of at least 6.5 percentage points in F1 score.
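As a rough illustration of graph-based clustering with minimum cuts, the sketch below recursively splits a weighted similarity graph over predicates until every cluster is small enough. It is not the authors' implementation: the use of the Stoer-Wagner algorithm, the stop criterion (`max_size`), the node names and the similarity weights are all assumptions made for this example.

```python
def stoer_wagner(nodes, sim):
    """Global minimum cut of an undirected weighted graph (Stoer-Wagner).

    nodes: list of hashable node ids.
    sim:   dict mapping (u, v) pairs to non-negative edge weights (assumed format).
    Returns (cut_value, set_of_nodes_on_one_side).
    """
    # Dense symmetric weight table restricted to the given nodes.
    w = {u: {v: 0.0 for v in nodes if v != u} for u in nodes}
    for (u, v), x in sim.items():
        if u in w and v in w and u != v:
            w[u][v] += x
            w[v][u] += x
    groups = {n: [n] for n in nodes}   # original nodes merged into each super-node
    active = list(nodes)
    best_value, best_side = float("inf"), None
    while len(active) > 1:
        # Minimum cut phase: repeatedly add the most tightly connected node.
        start = active[0]
        added = [start]
        conn = {v: w[start][v] for v in active if v != start}
        cut_of_phase = 0.0
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            added.append(nxt)
            for v in conn:
                conn[v] += w[nxt][v]
        s, t = added[-2], added[-1]
        if cut_of_phase < best_value:
            best_value, best_side = cut_of_phase, list(groups[t])
        # Merge the last node t into the second-to-last node s.
        groups[s].extend(groups[t])
        for v in active:
            if v not in (s, t):
                w[s][v] += w[t][v]
                w[v][s] = w[s][v]
        active.remove(t)
    return best_value, set(best_side)


def mincut_clusters(nodes, sim, max_size=2):
    """Recursively bisect along minimum cuts until clusters have at most max_size nodes."""
    nodes = list(nodes)
    if len(nodes) <= max(1, max_size):
        return [set(nodes)]
    _, side = stoer_wagner(nodes, sim)
    inside = [n for n in nodes if n in side]
    rest = [n for n in nodes if n not in side]
    return mincut_clusters(inside, sim, max_size) + mincut_clusters(rest, sim, max_size)


# Hypothetical predicates from two comparable texts (a* and b*) with made-up similarities.
predicates = ["a1", "a2", "b1", "b2"]
similarity = {("a1", "b1"): 0.9, ("a2", "b2"): 0.8, ("a1", "b2"): 0.1, ("a2", "b1"): 0.2}
clusters = mincut_clusters(predicates, similarity, max_size=2)
```

In this toy graph the weakest cut separates the two strongly linked pairs, so each resulting cluster can be read as one candidate alignment. A real system would derive the edge weights from similarity measures over predicates and their arguments.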
ISBN: 9781622765034 (print)
Graph-based dependency parsers suffer from the sheer number of higher-order edges they need to (a) score and (b) consider during optimization. Here we show that when working with LP relaxations, large fractions of these edges can be pruned before they are fully scored, without any loss of optimality guarantees and, hence, accuracy. This is achieved by iteratively parsing with a subset of higher-order edges, adding higher-order edges that may improve the score of the current solution, and adding higher-order edges that are implied by the current best first-order edges. This amounts to delayed column and row generation in the LP relaxation and is guaranteed to provide the optimal LP solution. For second-order grandparent models, our method considers, or scores, no more than 6-13% of the second-order edges of the full model. This yields up to an eightfold parsing speedup, while providing the same empirical accuracy and certificates of optimality as working with the full LP relaxation. We also provide a tighter LP formulation for grandparent models that leads to a smaller integrality gap and higher speed.
In this paper we propose KOIOS++, which automatically processes natural language queries provided by handwritten input. The system integrates several recent achievements in the areas of handwriting recognition, natural language processing, information retrieval, and human-computer interaction. It uses a knowledge base described by the Resource Description Framework (RDF). Our generic approach first generates a lexicon as background information for the handwritten text recognition. After recognizing a handwritten query, several output hypotheses are sent to a natural language processing system in order to generate a structured query (SPARQL query). Subsequently, the query is applied to the given knowledge base and a result graph visualizes the retrieved information. At all stages, the user can easily adjust the intermediate results if there is any undesired outcome. The system is implemented as a web service and therefore works for handwritten input on digital paper as well as for input on pen-enabled interactive surfaces. Furthermore, we build on the generic RDF representation of semantic knowledge, which is also used by the linked open data (LOD) initiative. As such, our system works well in various scenarios. We have implemented prototypes for querying company knowledge bases, DBpedia, the DBLP computer science bibliography, and a knowledge base of DAS 2012.
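The abstract does not say how the structured query is assembled, so the following is only a minimal sketch of that step, assuming the language-processing stage has already reduced a recognized question to triple patterns. The function `build_sparql`, its parameters and the example question are hypothetical; the `dbo:` and `dbr:` prefixes are the usual DBpedia namespaces.

```python
def build_sparql(select_vars, triples, prefixes=None, limit=None):
    """Assemble a SPARQL SELECT query string from triple patterns.

    select_vars: variable names without the leading '?'.
    triples:     (subject, predicate, object) strings, already in SPARQL syntax.
    prefixes:    optional dict mapping prefix name to namespace IRI.
    limit:       optional maximum number of result rows.
    """
    lines = []
    for name, iri in (prefixes or {}).items():
        lines.append(f"PREFIX {name}: <{iri}>")
    lines.append("SELECT " + " ".join(f"?{v}" for v in select_vars))
    lines.append("WHERE {")
    for s, p, o in triples:
        lines.append(f"  {s} {p} {o} .")
    lines.append("}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Hypothetical recognized question: "Which books did Jules Verne write?"
query = build_sparql(
    select_vars=["book"],
    triples=[("?book", "dbo:author", "dbr:Jules_Verne")],
    prefixes={
        "dbo": "http://dbpedia.org/ontology/",
        "dbr": "http://dbpedia.org/resource/",
    },
    limit=10,
)
print(query)
```

Keeping the query as explicit triple patterns until the last moment would make it straightforward to show the intermediate result to the user for correction, which matches the adjustable pipeline the abstract describes.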
Word Sense Induction (WSI) is an unsupervised approach for learning the multiple senses of a word. Graph-based approaches to WSI frequently represent word co-occurrence as a graph and use the statistical properties of...
Millstream systems are a non-hierarchical model of natural language. We describe an incremental method for building Millstream configurations while reading a sentence. This method is based on a lexicon associating wor...
ISBN: 1937284131 (print)
The proceedings contain 13 papers. The topics discussed include: structured databases of named entities from Bayesian nonparametrics; unsupervised cross-lingual lexical substitution; reducing the size of the representation for the uDOP-estimate; evaluating unsupervised learning for natural language processing tasks; unsupervised language-independent name translation mining from Wikipedia infoboxes; Twitter polarity classification with label propagation over lexical links and the follower graph; unsupervised concept annotation using latent Dirichlet allocation and segmental methods; and unsupervised alignment for segmental-based language understanding.
In this paper, we present GrawlTCQ, a new bootstrapping algorithm for building specialized terminology, corpora and queries, based on a graph model. We model links between documents, terms and queries, and use a rando...
We propose the use of a nonparametric Bayesian model, the Hierarchical Dirichlet Process (HDP), for the task of Word Sense Induction. Results are shown through comparison against Latent Dirichlet Allocation (LDA), a p...
Usually unsupervised dependency parsing tries to optimize the probability of a corpus by modifying the dependency model that was presumably used to generate the corpus. In this article we explore a different view in w...
ISBN: 9781932432916 (print)
The proceedings contain 25 papers. The topics discussed include: not all links are equal: exploiting dependency types for the extraction of protein-protein interactions from text; unsupervised entailment detection between dependency graph fragments; learning phenotype mapping for integrating large genetic data; EVEX: a PubMed-scale resource for homology-based generalization of text mining predictions; fast and simple semantic class assignment for biomedical text; the role of information extraction in the design of a document triage application for biocuration; medical entity recognition: a comparison of semantic and statistical methods; automatic acquisition of huge training data for bio-medical named entity recognition; and building frame-based corpus on the basis of ontological domain knowledge.