ISBN: 9781479962273 (print)
Discovering illicit behaviour in complex, heterogeneous data is a daunting problem. One of the challenges in the VAST 2014 competition involves identifying, for local law enforcement, which employees are involved and where law enforcement should concentrate its efforts. One way to handle this problem is to represent the data as a graph. In this paper, we present a graph-based anomaly detection approach for discovering suspicious employees and geographic locations.
An important area of data mining is anomaly detection, particularly for fraud. However, little work has been done on detecting anomalies in data that is represented as a graph. In this paper we present graph-based approaches to uncovering anomalies in domains where the anomalies consist of unexpected entity/relationship alterations that closely resemble non-anomalous behavior. We have developed three algorithms that together cover all three types of possible graph changes: label modifications, vertex/edge insertions, and vertex/edge deletions. Each algorithm focuses on one of these anomaly types and uses the minimum description length principle to first discover the normative pattern. Once the common pattern is known, each algorithm then uses a different approach to discover its particular anomaly type. We validate all three approaches using synthetic data, verifying that, on graphs and anomalies of varying sizes, each algorithm detects the anomalies with very high detection rates and minimal false positives. We then further validate the algorithms using real-world cargo data with actual fraud scenarios injected into the data set, detecting them with 100% accuracy and no false positives. Each of these algorithms demonstrates the usefulness of examining a graph-based representation of data for the purposes of detecting fraud.
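The two-stage idea in the abstract above (find the normative pattern first, then look for small deviations from it) can be illustrated with a minimal Python sketch. This is not the authors' algorithms. It assumes vertices are already aligned by a shared id, so no subgraph matching is needed, and it replaces a real minimum description length encoding with a crude count of vertices and edges. The cargo-style labels and all function names (`make_graph`, `normative_pattern`, `classify`) are hypothetical.

```python
from collections import Counter

def make_graph(vertices, edges):
    """Hashable toy graph: {vertex id: label} plus (src, label, dst) edges."""
    return (frozenset(vertices.items()), frozenset(edges))

def size(graph):
    """Crude stand-in for description length: vertex count plus edge count."""
    vertices, edges = graph
    return len(vertices) + len(edges)

def normative_pattern(instances):
    """Pick the structure whose reuse shrinks the data set the most."""
    counts = Counter(instances)
    total = sum(size(g) for g in instances)

    def compressed_size(pattern):
        n = counts[pattern]
        # Store the pattern once, one pointer per exact occurrence,
        # and every non-matching instance verbatim.
        return size(pattern) + n + (total - n * size(pattern))

    return min(counts, key=compressed_size)

def classify(pattern, instance):
    """List how an instance deviates from the pattern, by change type."""
    p_vertices, p_edges = dict(pattern[0]), pattern[1]
    i_vertices, i_edges = dict(instance[0]), instance[1]
    findings = []
    for v in sorted(p_vertices.keys() & i_vertices.keys()):
        if p_vertices[v] != i_vertices[v]:
            findings.append(("modification", v))
    for v in sorted(i_vertices.keys() - p_vertices.keys()):
        findings.append(("insertion", v))
    for v in sorted(p_vertices.keys() - i_vertices.keys()):
        findings.append(("deletion", v))
    # Simplification: a changed edge label shows up as one insertion
    # plus one deletion rather than as a modification.
    for e in sorted(i_edges - p_edges):
        findings.append(("insertion", e))
    for e in sorted(p_edges - i_edges):
        findings.append(("deletion", e))
    return findings

# Hypothetical cargo-style shipments: shipper -> container -> port.
base_vertices = {"s": "shipper", "c": "container", "p": "port"}
base_edges = [("s", "ships", "c"), ("c", "arrives_at", "p")]
normal = make_graph(base_vertices, base_edges)
modified = make_graph({**base_vertices, "c": "empty_container"}, base_edges)
inserted = make_graph({**base_vertices, "x": "broker"},
                      base_edges + [("x", "pays", "s")])
deleted = make_graph({"s": "shipper", "c": "container"},
                     [("s", "ships", "c")])

data = [normal] * 5 + [modified, inserted, deleted]
pattern = normative_pattern(data)
for graph in (modified, inserted, deleted):
    print(classify(pattern, graph))
```

Exact matching keeps the sketch short. A realistic version would need inexact subgraph matching and a proper bit-level encoding, since the anomalies described above are ones that closely resemble the normative pattern.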