ISBN: (print) 193243254X
In this study we used bipartite spectral graph partitioning to simultaneously cluster varieties and sound correspondences in Dutch dialect data. While clustering geographical varieties with respect to their pronunciation is not new, the simultaneous identification of the sound correspondences giving rise to the geographical clustering presents a novel opportunity in dialectometry. Earlier methods aggregated sound differences and clustered on the basis of aggregate differences. The determination of the significant sound correspondences which co-varied with cluster membership was carried out on a post hoc basis. Bipartite spectral graph clustering simultaneously seeks groups of individual sound correspondences which are associated, even while seeking groups of sites which share sound correspondences. We show that the application of this method results in clear and sensible geographical groupings and discuss the concomitant sound correspondences.
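The idea of clustering sites and sound correspondences at the same time can be illustrated with a minimal sketch. It assumes a small nonnegative matrix whose rows are varieties and whose columns are sound correspondences, and it only performs a two-way split by the sign of the second singular vectors of the degree-normalized matrix. The function name, the toy matrix, and the sign-based cut are illustrative assumptions, not the procedure or the data used in the study.

```python
import numpy as np

def bipartite_spectral_bipartition(A):
    """Split rows and columns of a nonnegative matrix into two co-clusters.

    Rows stand for varieties and columns for sound correspondences
    (hypothetical toy setup). Assumes every row and column has a nonzero sum.
    """
    A = np.asarray(A, dtype=float)
    d_rows = A.sum(axis=1)  # degree of each row vertex
    d_cols = A.sum(axis=0)  # degree of each column vertex

    # Degree-normalized matrix D_rows^{-1/2} A D_cols^{-1/2}
    A_n = A / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]

    # The first singular vectors are trivial; the second pair carries the split.
    U, _, Vt = np.linalg.svd(A_n, full_matrices=False)
    u2 = U[:, 1] / np.sqrt(d_rows)
    v2 = Vt[1, :] / np.sqrt(d_cols)

    # Simplification: cut at zero instead of running k-means on the values.
    row_labels = (u2 > 0).astype(int)
    col_labels = (v2 > 0).astype(int)
    return row_labels, col_labels

# Toy data: varieties 0-1 mostly share correspondences 0-1,
# varieties 2-3 share correspondences 2-3, with one cross link.
toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
rows, cols = bipartite_spectral_bipartition(toy)
print("variety labels:", rows)
print("correspondence labels:", cols)
```

Because the sign of a singular vector is arbitrary, the label values 0 and 1 may swap between runs or platforms; only the grouping is meaningful. Rows and columns that receive the same label form one co-cluster.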
This paper describes the system designed by the Baidu PGL Team, which achieved first place in the TextGraphs 2020 Shared Task. The task focuses on generating explanations for elementary science questions. Given a q...
ISBN: (print) 9781950737901
Commonsense reasoning aims to empower machines with the human ability to make presumptions about ordinary situations in our daily life. In this paper, we propose a textual inference framework for answering commonsense questions, which effectively utilizes external, structured commonsense knowledge graphs to perform explainable inferences. The framework first grounds a question-answer pair from the semantic space to the knowledge-based symbolic space as a schema graph, a related sub-graph of external knowledge graphs. It represents schema graphs with a novel knowledge-aware graph network module named KagNet, and finally scores answers with graph representations. Our model is based on graph convolutional networks and LSTMs, with a hierarchical path-based attention mechanism. The intermediate attention scores make the model transparent and interpretable, and thus yield trustworthy inferences. Using ConceptNet as the only external resource for BERT-based models, we achieved state-of-the-art performance on CommonsenseQA, a large-scale dataset for commonsense reasoning. We open-source our code to the community for future research in knowledge-aware commonsense reasoning.
Word embeddings are high-dimensional vector representations of words and are thus difficult to interpret. In order to deal with this, we introduce an unsupervised, parameter-free method for creating a hierarchical grap...
Generating domain-specific content such as legal clauses based on minimal user-provided information can be of significant benefit in automating legal contract generation. In this paper, we propose a controllable graph...
Similarity measures for text have historically been an important tool for solving information retrieval problems. In this paper we consider extended similarity metrics for documents and other objects embedded in graph...
We introduce a stochastic graph-based method for computing relative importance of textual units for natural language processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques, including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.
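The thresholded variant described above can be sketched as a damped power iteration over a binarized similarity graph. This is a minimal illustration under stated assumptions: the function name, the threshold of 0.1, the damping value of 0.15, and the toy similarity matrix are all hypothetical choices, and the sketch takes a precomputed cosine-similarity matrix rather than raw sentences.

```python
import numpy as np

def lexrank_threshold(similarity, threshold=0.1, damping=0.15,
                      tol=1e-8, max_iter=1000):
    """Score sentences by eigenvector centrality on a thresholded graph.

    `similarity` is a symmetric matrix of pairwise cosine similarities.
    The threshold and damping defaults are illustrative assumptions.
    """
    S = np.asarray(similarity, dtype=float)
    n = S.shape[0]

    # Keep an edge only where similarity exceeds the threshold.
    adj = (S > threshold).astype(float)
    np.fill_diagonal(adj, 1.0)  # self-loops keep every row sum positive

    # Row-normalize to obtain a stochastic transition matrix.
    P = adj / adj.sum(axis=1, keepdims=True)

    # Damped power iteration, started from the uniform distribution.
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = damping / n + (1.0 - damping) * (P.T @ scores)
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores

# Toy example: sentence 0 is similar to every other sentence,
# while sentences 1-3 are similar only to sentence 0.
toy_similarity = np.array([
    [1.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
])
toy_scores = lexrank_threshold(toy_similarity)
print(toy_scores)
```

In the toy graph the hub sentence receives the highest score, which is the behavior a centrality-based summarizer relies on when picking salient sentences. A continuous variant would skip the binarization step and row-normalize the similarity values directly.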
ISBN: (print) 1937284131
The proceedings contain 13 papers. The topics discussed include: structured databases of named entities from Bayesian nonparametrics; unsupervised cross-lingual lexical substitution; reducing the size of the representation for the uDOP-estimate; evaluating unsupervised learning for natural language processing tasks; unsupervised language-independent name translation mining from Wikipedia infoboxes; Twitter polarity classification with label propagation over lexical links and the follower graph; unsupervised concept annotation using latent Dirichlet allocation and segmental methods; and unsupervised alignment for segmental-based language understanding.
This work is on a previously formalized semantic evaluation task of spatial role labeling (SpRL) that aims at extraction of formal spatial meaning from text. Here, we report the results of initial efforts towards expl...
Graph-based techniques have gained traction for representing and analyzing data in various natural language processing (NLP) tasks. Knowledge graph-based language representation models have shown promising results in ...