The number of research publications in various disciplines is growing exponentially. Researchers and scientists are increasingly finding themselves in the position of having to quickly understand large amounts of tech...
Sentiment analysis often relies on a semantic orientation lexicon of positive and negative words. A number of approaches have been proposed for creating such lexicons, but they tend to be computationally expensive, an...
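To show how a semantic orientation lexicon is typically consumed (not how this paper builds one), here is a minimal sketch that scores text by counting lexicon hits. The tiny `POSITIVE`/`NEGATIVE` word sets and the function name `lexicon_polarity` are hypothetical illustrations, and the count-difference scoring rule is an assumption chosen for simplicity.

```python
import re

def lexicon_polarity(text, positive, negative):
    """Return (#positive tokens) - (#negative tokens); > 0 suggests positive sentiment."""
    # Lowercase and keep simple alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos - neg

# Hypothetical toy lexicon; a real lexicon would contain thousands of entries.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "terrible", "boring"}

print(lexicon_polarity("The acting was good, but the ending was terrible and boring.",
                       POSITIVE, NEGATIVE))  # -1
```

The quality of such a scorer depends almost entirely on lexicon coverage, which is why cheaper ways of building large lexicons matter.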
In previous work, we presented several distributional approaches to anomaly detection for a speech activity detector by training a model on purely nominal data and estimating the divergence between it and other input. Here, we reformulate the problem in an unsupervised framework and allow for anomalous contamination of the training data. After noting the instability of Gaussian mixture models (GMMs) in this context, we focus on non-parametric methods using regularly binned histograms. While the performance of the log-likelihood baseline suffered as the amount of contamination increased, many of the distributional approaches were not affected. We found that the L1 distance, the χ² statistic, and information-theoretic divergences consistently outperformed the other methods across a variety of contamination levels and test segment lengths.
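The distributional scores named above can be sketched for one-dimensional features binned into equal-width histograms: a larger distance between the training histogram and a test segment's histogram indicates a more anomalous segment. This is a minimal illustration under assumptions: the feature is a scalar, the bin range and count are arbitrary, the χ² shown is one common symmetric form, and KL divergence (with a small hypothetical `eps` smoothing) stands in for the unspecified "information-theoretic divergences".

```python
import math

def histogram(samples, lo, hi, n_bins):
    """Regularly binned histogram over [lo, hi), normalized to sum to 1.

    Assumes at least one sample; out-of-range values are clamped to the end bins.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        i = min(max(math.floor((x - lo) / width), 0), n_bins - 1)
        counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chi2_statistic(p, q):
    # Symmetric chi-squared form; bins empty in both histograms are skipped.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def kl_divergence(p, q, eps=1e-10):
    # eps keeps the log finite when q has an empty bin where p does not.
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)

# Usage: compare a (hypothetical) nominal training histogram with a test segment.
train = histogram([0.1, 0.2, 0.6, 0.9], 0.0, 1.0, 4)   # [0.5, 0.0, 0.25, 0.25]
test = histogram([0.1, 0.3, 0.4, 0.6], 0.0, 1.0, 4)    # [0.25, 0.5, 0.25, 0.0]
for name, score in [("L1", l1_distance), ("chi2", chi2_statistic), ("KL", kl_divergence)]:
    print(name, round(score(train, test), 4))
```

Because each score compares whole distributions rather than per-frame likelihoods, a few contaminating frames in the training data shift the histogram only slightly, which is consistent with the robustness reported above.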
ISBN (print): 9781605587721
Floor control is a scheme people use to organize speaking turns in multi-party conversations. Identifying floor control shifts is important for understanding a conversation's structure and would help make human-computer interaction systems more natural. Although people use both verbal and nonverbal cues to manage floor control shifts, most previous investigations of speaking turn prediction have used only audio cues, e.g., lexical and prosodic cues. In this paper, we present a statistical model that automatically detects floor control shifts using both verbal and nonverbal cues. Our experimental results show that combining verbal and nonverbal cues yields more accurate detection. Copyright 2009 ACM.
We describe a set of techniques for Arabic cross-document coreference resolution. We compare a baseline system of exact mention string-matching to ones that include local mention context information as well as informa...
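The exact-string-matching baseline can be sketched as grouping every mention, across all documents, by its surface string, so that each distinct string becomes one cross-document entity. The function name and the `doc_id -> mentions` input layout are hypothetical; the transliterated names are toy data, not drawn from the paper.

```python
from collections import defaultdict

def exact_match_coreference(documents):
    """Cluster mentions across documents by exact surface string.

    documents: mapping of doc_id -> list of mention strings.
    Returns mention string -> sorted list of (doc_id, mention_index) pairs;
    each key is treated as one cross-document entity.
    """
    clusters = defaultdict(list)
    for doc_id, mentions in documents.items():
        for idx, mention in enumerate(mentions):
            clusters[mention].append((doc_id, idx))
    return {m: sorted(locs) for m, locs in clusters.items()}

docs = {"d1": ["Mubarak", "Cairo"], "d2": ["Mubarak", "Hosni Mubarak"]}
for entity, locations in exact_match_coreference(docs).items():
    print(entity, locations)
```

Note that "Mubarak" and "Hosni Mubarak" end up in separate clusters: this is the kind of miss that motivates adding local mention context and other information on top of the string-matching baseline.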
Morpho Challenge 2008 hosted an extrinsic evaluation of morphological analysis that explored whether unsupervised morphology induction could benefit information retrieval. This paper presents results for alternative methods of word normalization using test sets from the Cross-language Evaluation Forum (CLEF) ad hoc collections. Preliminary results for the Morpho Challenge 2008 evaluation are consistent with these data. We found that: (1) rule-based stemming is effective in less morphologically complex languages; (2) alternative stemming methods, such as unsupervised learning of morphemes and least common n-gram stemming, are helpful; and (3) full character n-gram indexing is the most effective form of tokenization in more morphologically complex languages.
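The two n-gram techniques in findings (2) and (3) can be sketched as follows: character n-gram tokenization splits each word into overlapping n-grams, and least common n-gram stemming replaces a word with the single n-gram of it that has the lowest document frequency in the collection. This is a sketch under stated assumptions, not the paper's configuration: n = 4, `_` as a word-boundary marker, n-grams kept within words (full n-gram indexing may also span word boundaries), and alphabetical tie-breaking are all illustrative choices, and the function names are hypothetical.

```python
from collections import Counter

def char_ngrams(word, n=4):
    """Overlapping character n-grams, with '_' marking word boundaries (assumed)."""
    padded = f"_{word}_"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_document_frequency(documents, n=4):
    """Number of documents in which each n-gram occurs at least once."""
    df = Counter()
    for doc in documents:
        df.update({g for w in doc.lower().split() for g in char_ngrams(w, n)})
    return df

def least_common_ngram_stem(word, df, n=4):
    """Map a word to its rarest n-gram; ties broken alphabetically (assumed)."""
    return min(char_ngrams(word, n), key=lambda g: (df[g], g))

corpus = ["walking walked", "walking talks", "walked home"]
df = ngram_document_frequency(corpus)
print(char_ngrams("walking"))
print(least_common_ngram_stem("walking", df))
```

The rare n-gram acts as a cheap, language-independent stand-in for a morpheme: it tends to avoid common affixes such as `ing_` only when those affixes are frequent enough in the collection, which is why the approach scales to languages without hand-built stemmers.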
This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to decompose the inner products involved in...
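The abstract is cut off before it explains the decomposition, so the following is only a hedged sketch of one standard way to split document inner products across MapReduce: an indexing job builds a postings list per term, and a second job emits one partial product w₁·w₂ for every pair of documents sharing a term, which a reducer sums per pair. `run_mapreduce` is a hypothetical in-memory stand-in for a real framework, not Hadoop's API, and the term weights are toy values.

```python
from collections import defaultdict
from itertools import combinations

def index_map(doc_id, term_weights):
    # Job 1 map: emit (term, (doc_id, weight)) for every term in a document.
    for term, w in term_weights.items():
        yield term, (doc_id, w)

def pairs_map(term, postings):
    # Job 2 map: each pair of documents sharing `term` contributes w1 * w2.
    for (d1, w1), (d2, w2) in combinations(sorted(postings), 2):
        yield (d1, d2), w1 * w2

def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for a MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    return {k: reducer(k, vs) for k, vs in groups.items()}

def pairwise_similarity(docs):
    postings = run_mapreduce(docs.items(), index_map, lambda term, vs: vs)
    return run_mapreduce(postings.items(), pairs_map, lambda pair, vs: sum(vs))

docs = {"d1": {"a": 1, "b": 2}, "d2": {"a": 3, "c": 1}, "d3": {"b": 1, "c": 2}}
print(pairwise_similarity(docs))
```

Each mapper only sees one postings list, so no single worker needs the full document-term matrix; document pairs that share no term never appear in the output, which corresponds to an inner product of zero.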
Knowing the degree of antonymy between words has widespread applications in natural language processing. Manually created lexicons have limited coverage and do not include most semantically contrasting word pairs. We present a new automatic and empirical measure of antonymy that combines corpus statistics with the structure of a published thesaurus. The approach is evaluated on a set of closest-opposite questions, obtaining a precision of over 80%. Along the way, we discuss what humans consider antonymous and how antonymy manifests itself in utterances.
This paper describes a computational approach to resolving the true referent of a named mention of a person in the body of an email. A generative model of mention generation is used to guide mention resolution. Result...