ISBN: (print) 193243254X
We present ongoing work on a scalable, distributed implementation of over 200 million individual language models, each capturing a single user's dialect in a given language (multilingual users have several models). These models have a variety of practical applications, ranging from spam detection to speech recognition and dialectometrical methods on the social graph. Users should be able to view any content in their language (even if it is spoken by a small population), and to browse our site with an appropriately translated interface (automatically generated for locales with little crowd-sourced community effort).
Using the technique of "semantic mirroring", a graph is obtained that represents words and their translations from a parallel corpus or a bilingual lexicon. The connectedness of the graph holds information ab...
This paper presents a system that performs skill extraction from text documents. It outputs a list of professional skills that are relevant to a given input text. We argue that the system can be practical for hiring a...
On a scientific concept hierarchy, a parent concept may have a few attributes, each of which has multiple values being a group of child concepts. We call these attributes facets: classification has a few facets such a...
One research goal in Second Language Acquisition (SLA) is to formulate and test hypotheses about errors and the environments in which they are made, a process which often involves substantial effort; large amounts of d...
Graph-based methods that are en vogue in the social network analysis area, such as centrality models, have recently been applied to linguistic knowledge bases, including unsupervised Word Sense Disambiguation. Althoug...
We introduce Chinese Whispers, a randomized graph-clustering algorithm, which is time-linear in the number of edges. After a detailed definition of the algorithm and a discussion of its strengths and weaknesses, the p...
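The abstract above is truncated, so the following is only a minimal sketch of the general Chinese Whispers idea, not the authors' implementation. Every node starts in its own class, and nodes are then visited in random order, each adopting the class with the highest total edge weight among its neighbours. The function name `chinese_whispers`, the `(u, v, weight)` edge format, the `iterations` and `seed` parameters, and the rule of keeping the current label on a tie are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph given as (u, v, weight) triples.

    Returns a dict mapping each node to a class label (a node name).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Build a symmetric adjacency map: node -> {neighbour: weight}.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = w
        neighbors[v][u] = w

    # Every node starts in its own class.
    labels = {node: node for node in neighbors}
    nodes = list(neighbors)

    for _ in range(iterations):
        rng.shuffle(nodes)  # randomized processing order
        changed = False
        for node in nodes:
            # Sum edge weights per class over the node's neighbourhood.
            scores = defaultdict(float)
            for nb, w in neighbors[node].items():
                scores[labels[nb]] += w
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            # Assumption: keep the current label when it ties for best;
            # otherwise pick one of the best classes at random.
            if labels[node] in candidates:
                continue
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # no label moved in this pass
    return labels


# Two disconnected triangles should end up as two classes.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
         ("x", "y", 1.0), ("y", "z", 1.0), ("x", "z", 1.0)]
labels = chinese_whispers(edges)
print(len(set(labels.values())))
```

Each pass looks at every edge twice (once from each endpoint), which is consistent with the time-linear behaviour in the number of edges that the abstract mentions, provided the number of passes is bounded.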
Knowledge graphs (KGs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge graphs are typically incomplete, it...
The rising growth of fake news and misleading information through online media outlets demands an automatic method for detecting such news articles. Of the few limited works which differentiate between trusted vs othe...
Word Sense Induction (WSI) is an unsupervised approach for learning the multiple senses of a word. Graph-based approaches to WSI frequently represent word co-occurrence as a graph and use the statistical properties of...