In this work, we propose a graph-based approach to computing similarities between words in an unsupervised manner, and take advantage of heterogeneous feature types in the process. The approach is based on the creatio...
Pre-trained models like Bidirectional Encoder Representations from Transformers (BERT) have recently made a big leap forward in natural language processing (NLP) tasks. However, there are still some shortcomings in t...
Text adventure games, in which players must make sense of the world through text descriptions and declare actions through text descriptions, provide a stepping stone toward grounding action in language. Prior work has...
In this paper, we explore strategies to evaluate models for the task of research paper novelty detection: given all papers released on a given date, which of them discuss new ideas and influence future research? We...
ISBN: (print) 9781950737901
We introduce a novel scheme for parsing a piece of text into its Abstract Meaning Representation (AMR): Graph Spanning based Parsing (GSP). A distinguishing characteristic of GSP is that it constructs a parse graph incrementally in a top-down fashion. Starting from the root, at each step, a new node and its connections to existing nodes are jointly predicted. The output graph spans the nodes by their distance to the root, following the intuition of first grasping the main ideas and then digging into more details. This core-semantic-first principle emphasizes capturing the main ideas of a sentence, which is of great interest. We evaluate our model on the latest AMR sembank and achieve state-of-the-art performance in the sense that no heuristic graph re-categorization is adopted. More importantly, the experiments show that our parser is especially good at obtaining the core semantics.
ISBN: (print) 193243254X
The first task of statistical computational linguistics, or any other type of data-driven processing of language, is the extraction of counts and distributions of phenomena. This is much more difficult for the type of complex structured data found in treebanks and in corpora with sophisticated annotation than for tokenized texts. Recent developments in data mining, particularly in the extraction of frequent subtrees from treebanks, offer some solutions. We have applied a modified version of the TreeMiner algorithm to a small treebank and present some promising results.
ISBN: (print) 193243254X
We present ongoing work on a scalable, distributed implementation of over 200 million individual language models, each capturing a single user's dialect in a given language (multilingual users have several models). These have a variety of practical applications, ranging from spam detection to speech recognition and dialectometrical methods on the social graph. Users should be able to view any content in their language (even if it is spoken by a small population), and to browse our site with an appropriately translated interface (automatically generated for locales with little crowd-sourced community effort).
Review quality is determined by identifying the relevance of a review to a submission (the article or paper the review was written for). We identify relevance in terms of the semantic and syntactic similarities betwee...
We learn graph-based similarity measures for the task of extracting word synonyms from a corpus of parsed text. A constrained graph walk variant that has been successfully applied in the past in similar settings is sh...
Document indexing and representation of term-document relations are very important for document clustering and retrieval. In this paper, we combine a graph-based dimensionality reduction method with a corpus-based ass...