ISBN: 9781665423694 (print)
Poisson tensor factorization (PTF) is an important method for analyzing patterns and relationships in multiway count data. In this work, we consider several algorithms for computing a low-rank PTF of tensors with sparse count data values via maximum likelihood estimation. Such an approach reduces to solving a nonlinear, non-convex optimization problem, which can leverage considerable parallel computation due to the structure of the problem. However, since the maximum likelihood estimator corresponds to the global minimizer of this optimization problem, it is important to consider how effective methods are both at leveraging this inherent parallelism and at computing a good approximation to the global minimizer. We present comparisons of multiple methods for PTF that illustrate the tradeoffs between computational efficiency and accuracy in computing the maximum likelihood estimator. We present results using synthetic and real-world data tensors to demonstrate some of the challenges in choosing a method for a given tensor.
ISBN: 9781728188003 (print)
As the attack surfaces of large enterprise networks grow, anomaly detection systems based on statistical user behavior analysis play a crucial role in identifying malicious activities. Previous work has shown that link prediction algorithms based on non-negative matrix factorization learn highly accurate predictive models of user actions. However, most statistical link prediction models have been constructed on bipartite graphs and fail to capture the nuanced, multi-faceted details of a user's activity profile. This paper establishes a new benchmark for red team event detection on the Los Alamos National Laboratory Unified Host and Network Dataset by applying a tensor factorization model that exploits the multi-dimensional and sparse structure of user authentication logs. We show that learning patterns of normal activity across multiple dimensions in one unified statistical framework yields improved detection of penetration testing events. We further show operational value by developing fusion methods that can identify anomalous users, source devices, and destination devices in the network.
ISBN: 9781450336642 (print)
We present a Bayesian tensor factorization model for inferring latent group structures from dynamic pairwise interaction patterns. For decades, political scientists have collected and analyzed records of the form "country i took action a toward country j at time t" (known as dyadic events) in order to form and test theories of international relations. We represent these event data as a tensor of counts and develop Bayesian Poisson tensor factorization to infer a low-dimensional, interpretable representation of their salient patterns. We demonstrate that our model's predictive performance is better than that of standard non-negative tensor factorization methods. We also provide a comparison of our variational updates to their maximum likelihood counterparts. In doing so, we identify a better way to form point estimates of the latent factors than that typically used in Bayesian Poisson matrix factorization. Finally, we showcase our model as an exploratory analysis tool for political scientists. We show that the inferred latent factor matrices capture interpretable multilateral relations that both conform to and inform our knowledge of international affairs.
Distinguishing malicious anomalous activities from unusual but benign activities is a fundamental challenge for cyber defenders. Prior studies have shown that statistical user behavior analysis yields accurate detections by learning behavior profiles from observed user activity. These unsupervised models are able to generalize to unseen types of attacks by detecting deviations from normal behavior without knowledge of specific attack signatures. However, approaches proposed to date based on probabilistic matrix factorization are limited by the information conveyed in a two-dimensional space. Non-negative tensor factorization, by contrast, is a powerful unsupervised machine learning method that naturally models multi-dimensional data, capturing complex and multi-faceted details of behavior profiles. Our new unsupervised statistical anomaly detection methodology matches or surpasses state-of-the-art supervised learning baselines across several challenging and diverse cyber application areas, including detection of compromised user credentials, botnets, spam e-mails, and fraudulent credit card transactions.
Tensor decomposition is a fundamental unsupervised machine learning method in data science, with applications including network analysis and sensor data processing. This work develops a generalized canonical polyadic (GCP) low-rank tensor decomposition that allows loss functions other than squared error. For instance, we can use logistic loss or Kullback-Leibler divergence, enabling tensor decomposition for binary or count data. We present a variety of statistically motivated loss functions for various scenarios. We provide a generalized framework for computing gradients and handling missing data that enables the use of standard optimization methods for fitting the model. We demonstrate the flexibility of the GCP decomposition on several real-world examples including interactions in a social network, neural activity in a mouse, and monthly rainfall measurements in India.
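As a rough illustration of what "other loss functions besides squared error" can mean here, the objective can be written as a sum of elementwise losses over the observed entries. The notation below (observed index set \Omega, data entries x_i, model entries m_i, factor vectors a_r^{(k)}) is assumed for this sketch and is not taken from the abstract; the three losses are the standard Gaussian, Poisson (Kullback-Leibler up to terms constant in m), and Bernoulli odds-link forms.

```latex
\begin{align*}
  \min_{\mathcal{M}} \; F(\mathcal{M};\mathcal{X})
    &= \sum_{i \in \Omega} f(x_i, m_i),
  \qquad
  \mathcal{M} = \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(d)}, \\
  \text{squared error:} \quad f(x,m) &= (x - m)^2, \\
  \text{Poisson (count data, } m > 0\text{):} \quad f(x,m) &= m - x \log m, \\
  \text{Bernoulli, odds link (binary data, } m > 0\text{):} \quad f(x,m) &= \log(1 + m) - x \log m .
\end{align*}
```

Restricting the sum to \Omega is one simple way to express missing data: unobserved entries contribute nothing to the objective or to its gradient.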
Tensors have found application in a variety of fields, ranging from chemometrics to signal processing and beyond. In this paper, we consider the problem of multilinear modeling of sparse count data. Our goal is to develop a descriptive tensor factorization model of such data, along with appropriate algorithms and theory. To do so, we propose that the random variation is best described via a Poisson distribution, which better describes the zeros observed in the data as compared to the typical assumption of a Gaussian distribution. Under a Poisson assumption, we fit a model to observed data using the negative log-likelihood score. We present a new algorithm for Poisson tensor factorization called CANDECOMP-PARAFAC alternating Poisson regression (CP-APR) that is based on a majorization-minimization approach. It can be shown that CP-APR is a generalization of the Lee-Seung multiplicative updates. We show how to prevent the algorithm from converging to non-KKT points and prove convergence of CP-APR under mild conditions. We also explain how to implement CP-APR for large-scale sparse tensors and present results on several data sets, both real and simulated.
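To make the Poisson negative log-likelihood and the multiplicative updates concrete, here is a minimal NumPy sketch for a small dense three-way tensor. It implements plain Lee-Seung-style multiplicative updates (the special case the abstract says CP-APR generalizes), not CP-APR itself: it has no inner iterations, no safeguards against non-KKT points, and no sparse-tensor handling. The function names, the dense storage, and the `eps` guard are all assumptions made for illustration.

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Kronecker product: row (j*K + k) holds B[j, :] * C[k, :].
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def unfold(X, mode):
    # Mode-n unfolding; remaining modes are flattened in C order,
    # which matches the row ordering produced by khatri_rao above.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def poisson_nll(X, factors, eps=1e-12):
    # Negative Poisson log-likelihood, dropping the constant sum(log(x!)).
    A, B, C = factors
    M = np.einsum('ir,jr,kr->ijk', A, B, C)
    return float(np.sum(M - X * np.log(M + eps)))

def multiplicative_ptf(X, rank, n_iter=50, seed=0, eps=1e-12):
    # Hypothetical helper: fit a nonnegative rank-`rank` CP model to a dense
    # 3-way count tensor by cycling one multiplicative update per mode.
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, rank)) + 0.1 for n in X.shape]
    history = [poisson_nll(X, factors)]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            Pi = khatri_rao(others[0], others[1])   # (product of other dims, rank)
            Xn = unfold(X, mode)                    # data, unfolded along `mode`
            Mn = factors[mode] @ Pi.T               # current model, same layout
            # KL/Poisson multiplicative update for Xn ~ factors[mode] @ Pi.T
            factors[mode] = factors[mode] * ((Xn / (Mn + eps)) @ Pi) / (Pi.sum(axis=0) + eps)
        history.append(poisson_nll(X, factors))
    return factors, history
```

Calling `multiplicative_ptf(X, rank=2)` on a nonnegative count array `X` of shape `(I, J, K)` returns the three factor matrices and the objective value after each sweep. The updates keep every factor entry nonnegative by construction, which is why no explicit projection step appears.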