We consider the task of named entity recognition for Chinese social media. The long line of work in Chinese NER has focused on formal domains, and NER for social media has been largely restricted to English. We presen...
We describe the neural-network training framework used in the Kaldi speech recognition toolkit, which is geared towards training DNNs with large amounts of training data using multiple GPU-equipped or multi-core machi...
In 2015 NIST coordinated the first language recognition evaluation (LRE) that used i-vectors as input, with the goals of attracting researchers outside of the speech processing community to tackle the language recogni...
Automated geolocation of social media messages can benefit a variety of downstream applications. However, these geolocation systems are typically evaluated without attention to how changes in time impact geolocation. ...
Multiview LSA (MVLSA) is a generalization of Latent Semantic Analysis (LSA) that supports the fusion of arbitrary views of data and relies on Generalized Canonical Correlation Analysis (GCCA). We present an algorithm ...
In many applications of machine listening it is useful to know how well an automatic speech recognition system will do before the actual recognition is performed. In this study we investigate different performance mea...
The lack of demographic information available when conducting passive analysis of social media content can make it difficult to compare results to traditional survey results. We present DEMOGRAPHER, a tool that predi...
ISBN: (print) 9781941643396
The automatic induction of scripts (Schank and Abelson, 1977) has been the focus of many recent works. In this paper, we employ a variety of these methods to learn Schank and Abelson's canonical restaurant script, using a novel dataset of restaurant narratives we have compiled from a website called "Dinners from Hell." Our models learn narrative chains, script-like structures that we evaluate with the "narrative cloze" task (Chambers and Jurafsky, 2008).
Compositional embedding models build a representation for a linguistic structure based on its component word embeddings. While recent work has combined these word embeddings with hand-crafted features for improved per...
We increase the lexical coverage of FrameNet through automatic paraphrasing. We use crowdsourcing to manually filter out bad paraphrases in order to ensure a high-precision resource. Our expanded FrameNet contains an ...