Anomaly detection is a longstanding problem with many applications in signal processing. We consider anomaly detection on graphs, a subject that has not previously been treated in such depth. Our approach is inspired largely by previous work in which anomaly detection in an acoustic signal is accomplished by comparing the distribution of localized measurements to that obtained from a non-anomalous signal. In a similar spirit, we compare distributions of vertex invariants to those obtained from non-anomalous graphs. Specifically, we consider homogeneous Erdős-Rényi random graphs (where each pair of vertices is connected independently with equal probability p) to be non-anomalous, and compare them to four classes of heterogeneous alternatives (where a subset of the vertices is connected according to a different process). Our contributions are (1) a novel method of incorporating information from vertex invariants for anomaly detection on graphs, (2) a principled approach to fusing information from an arbitrary number of such statistics, and (3) evaluation on several types of anomalous graphs. We demonstrate performance superior to available state-of-the-art approaches on the specific type of anomaly for which they are optimized, and further demonstrate superior generalization to an entire class of anomalies.
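The setup described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the choice of vertex degree as the invariant, the maximum-degree z-score, and the names `sample_graph`, `max_degree_zscore`, `anomalous`, and `q` are all assumptions made for illustration. It samples a homogeneous Erdős-Rényi graph and a heterogeneous alternative in which a subset of vertices is connected with a different probability, then standardizes one statistic against the homogeneous null.

```python
import random
from itertools import combinations


def sample_graph(n, p, anomalous=(), q=None, seed=0):
    """Sample adjacency sets for G(n, p).

    Assumption for illustration: pairs with both endpoints in `anomalous`
    are connected with probability q instead of p.
    """
    rng = random.Random(seed)
    anomalous = set(anomalous)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        inside = q is not None and u in anomalous and v in anomalous
        prob = q if inside else p
        if rng.random() < prob:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degrees(adj):
    """Vertex invariant used in this sketch: the degree of each vertex."""
    return [len(neighbors) for neighbors in adj.values()]


def max_degree_zscore(adj, p):
    """Standardize the maximum degree against the Binomial(n - 1, p) null."""
    n = len(adj)
    mean = (n - 1) * p
    std = ((n - 1) * p * (1 - p)) ** 0.5
    return (max(degrees(adj)) - mean) / std


# Homogeneous graph and a heterogeneous alternative with a dense subset.
null_graph = sample_graph(200, 0.05, seed=1)
alt_graph = sample_graph(200, 0.05, anomalous=range(20), q=0.9, seed=1)

print("null score:", round(max_degree_zscore(null_graph, 0.05), 2))
print("alternative score:", round(max_degree_zscore(alt_graph, 0.05), 2))
```

A single statistic such as this one is only a stand-in; the abstract describes fusing an arbitrary number of vertex-invariant statistics, which this sketch does not attempt.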
We present several novel minimally-supervised models for detecting latent attributes of social media users, with a focus on ethnicity and gender. Previous work on ethnicity detection has used coarse-grained widely sep...
Inverse Document Frequency (IDF) is an important quantity in many applications, including Information Retrieval. IDF is defined in terms of document frequency, df(w), the number of documents that mention w at least o...
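As a rough illustration of the quantities named in this abstract, the sketch below computes df(w) and an IDF value over a toy collection. The formula IDF(w) = -log2(df(w) / D), with D the number of documents, is one common convention and is an assumption here, since the abstract is truncated before stating it; the whitespace tokenization and the function names are likewise hypothetical.

```python
import math


def document_frequency(docs, word):
    """df(w): the number of documents that contain `word`."""
    return sum(1 for doc in docs if word in doc.split())


def idf(docs, word):
    """Assumed convention: IDF(w) = -log2(df(w) / D), D = number of documents."""
    df = document_frequency(docs, word)
    if df == 0:
        raise ValueError(f"{word!r} does not occur in any document")
    return -math.log2(df / len(docs))


# Toy collection, invented for illustration only.
docs = ["the cat sat", "the dog ran", "a cat and a dog", "the end"]

for w in ("the", "cat", "end"):
    print(w, document_frequency(docs, w), round(idf(docs, w), 3))
```

Words that occur in fewer documents receive a larger IDF under this convention, which is why the quantity is used to weight terms in retrieval.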
JHU was awarded a long-term contract in January 2007 to establish and operate a Human Language Technology Center of Excellence (HLTCOE) near the JHU Homewood campus. The HLTCOE's research focused on advanced technology for automatically analyzing a wide range of speech, text, and document image data in multiple languages. The focus of the technical program was on automatic population of knowledge bases from text, proof-of-concept experiments for robust speech technology, and stream characterization from content. These projects addressed key issues in extracting information from large sources of text and speech.
In this paper we give an introduction to using Amazon's Mechanical Turk crowdsourcing platform for the purpose of collecting data for human language technologies. We survey the papers published in the NAACL-2010 W...
This paper introduces the sparse auto-associative neural network (SAANN) in which the internal hidden layer output is forced to be sparse. This is achieved by adding a sparse regularization term to the original recons...
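The idea of adding a sparsity term to a reconstruction objective can be sketched as follows. This is not the SAANN formulation itself: the squared reconstruction error, the L1 penalty on the hidden activations, the tanh nonlinearity, and the names `sparse_autoencoder_loss` and `lam` are all assumptions, because the abstract is truncated before giving the details.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sparse_autoencoder_loss(x, W_enc, W_dec, lam):
    """Return (loss, hidden) for a one-hidden-layer auto-associative network.

    Assumed objective: squared reconstruction error plus lam times the
    L1 norm of the hidden layer output.
    """
    hidden = [math.tanh(a) for a in matvec(W_enc, x)]
    x_hat = matvec(W_dec, hidden)
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(h) for h in hidden)
    return reconstruction + lam * sparsity, hidden


# Toy weights, invented for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.0]

plain_loss, _ = sparse_autoencoder_loss(x, identity, identity, lam=0.0)
sparse_loss, hidden = sparse_autoencoder_loss(x, identity, identity, lam=0.5)
print(plain_loss, sparse_loss, hidden)
```

Minimizing an objective of this shape trades reconstruction accuracy against the number and size of active hidden units; the weight `lam` controls that trade-off.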
Building machine translation (MT) test sets is a relatively expensive task. As MT becomes increasingly desired for more and more language pairs and more and more domains, it becomes necessary to build test sets for ea...
There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and information retrieval. Many of these boxes...
The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues such as non-uniform variations in entity names, mention ambiguity, and entities absent from a knowledge base. We present a state-of-the-art system for entity disambiguation that not only addresses these challenges but also scales to knowledge bases with several million entries while using very few resources. Further, our approach achieves performance of up to 95% on entities mentioned in newswire and 80% on a public test set that was designed to include challenging queries.