Search results - Inner Mongolia University Library

Scaling up Bayesian variational inference using distributed computing clusters

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2017, vol. 88, pp. 435-451

Authors: Masegosa, Andres R.; Martinez, Ana M.; Langseth, Helge; Nielsen, Thomas D.; Salmeron, Antonio; Ramos-Lopez, Dario; Madsen, Anders L. Affiliations: Norwegian Univ Sci & Technol Dept Comp & Informat Sci Trondheim Norway; Aalborg Univ Dept Comp Sci Aalborg Denmark; Univ Almeria Dept Math Almeria Spain; HUGIN EXPERT AS Aalborg Denmark

In this paper we present an approach for scaling up Bayesian learning using variational methods by exploiting distributed computing clusters managed by modern big data processing tools like Apache Spark or Apache Flink, which efficiently support iterative map reduce operations. Our approach is defined as a distributed projected natural gradient ascent algorithm, has excellent convergence properties, and covers a wide range of conjugate exponential family models. We evaluate the proposed algorithm on three real world datasets from different domains (the Pubmed abstracts dataset, a GPS trajectory dataset, and a financial dataset) and using several models (LDA, factor analysis, mixture of Gaussians and linear regression models). Our approach compares favorably to stochastic variational inference and streaming variational Bayes, two of the main current proposals for scaling up variational methods. For the scalability analysis, we evaluate our approach over a network with more than one billion nodes and approx. 75% latent variables using a computer cluster with 128 processing units (AWS). The proposed methods are released as part of an open-source toolbox for scalable probabilistic machine learning (http://***) Masegosa et al. (2017) [29]. (C) 2017 Published by Elsevier Inc.

Keywords: probabilistic graphical models; Conjugate exponential family; Scalable Bayesian learning; Variational inference; Apache Flink

Order priors for Bayesian network discovery with an application to malware phylogeny

STATISTICAL ANALYSIS AND DATA MINING, 2017, vol. 10, no. 5, pp. 343-358

Authors: Oyen, Diane; Anderson, Blake; Sentz, Kari; Anderson-Cook, Christine. Affiliations: Los Alamos Natl Lab Los Alamos NM USA; Cisco Syst Inc Durham NC USA

Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure with a combination of human knowledge about the partial ordering of variables and statistical inference of conditional dependencies from observed data. Our approach leverages complementary information from human knowledge and inference from observed data to produce networks that reflect human beliefs about the system as well as to fit the observed data. Applying prior beliefs about partial orderings of variables is an approach distinctly different from existing methods that incorporate prior beliefs about direct dependencies (or edges) in a Bayesian network. We provide an efficient implementation of the partial-order prior in a Bayesian structure discovery learning algorithm, as well as an edge prior, showing that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as well as the edge prior, even though order priors are more general. Our primary motivation is in characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that are missed by other algorithms.

Keywords: Bayesian networks; cyber security; malware; probabilistic graphical models

Context- and bias-free probabilistic mission impact assessment

COMPUTERS & SECURITY, 2017, vol. 65, pp. 166-186

Authors: Motzek, Alexander; Moeller, Ralf. Affiliation: Univ Lubeck Inst Informat Syst Ratzeburger Allee 160 D-23562 Lubeck Germany

Assessing and understanding the impact of scattered and widespread events onto a mission is a pertinacious problem. Current approaches attempting to solve mission impact assessment employ score-based algorithms leading to spurious results. We identify a fourfold problem with score-based algorithms: (1) score-based algorithms enforce deep training of experts to employed frameworks for specification (non-context-free), (2) require reference results for interpreting obtained results (non-bias-free), (3) require assessments outside of an expert's expertise (non-local), and (4) require validation of end-results against ground truth. This paper provides a formal, mathematical model for bias- and context-free mission impact assessment. Based on a probabilistic model we reduce mission impact assessment to a well-understood mathematical problem based on definitions from local expertise and allow for a validation at data level. This is useful for areas and applications where qualitative assessments are required, such as assessments in critical infrastructures or military contexts. (C) 2016 The Authors. Published by Elsevier Ltd.

Keywords: Mission impact; probabilistic graphical models; Impact assessment; Critical infrastructure; Data validation; Bayesian networks; Vulnerability assessment

VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization

BMC BIOINFORMATICS, 2017, vol. 18, no. 1, pp. 1-18

Authors: Bolgar, Bence; Antal, Peter. Affiliation: Budapest Univ Technol & Econ Dept Measurement & Informat Syst Magyar Tudosok Krt 2 H-1117 Budapest Hungary

Background: Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information are also vital for reaching top performance. Method: We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions. Results: VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purp

Keywords: Drug-target interaction prediction; Matrix factorization; Multiple kernel learning; Variational Bayes; probabilistic graphical models

Identifying prescription patterns with a topic model of diseases and medications

JOURNAL OF BIOMEDICAL INFORMATICS, 2017, vol. 75, pp. 35-47

Authors: Park, Sungrae; Choi, Doosup; Kim, Minki; Cha, Wonchul; Kim, Chuhyun; Moon, Il-Chul. Affiliations: Korea Adv Inst Sci & Technol Dept Ind & Syst Engn Daejeon South Korea; Korea Adv Inst Sci & Technol Coll Business Seoul South Korea; Samsung Med Ctr Dept Emergency Med Seoul South Korea; Inje Univ Dept Emergency Med Coll Med Seoul South Korea; SeoulPaik Hosp Seoul South Korea

Wide variance exists among individuals and institutions for treating patients with medicine. This paper analyzes prescription patterns using a topic model with more than four million prescriptions. Specifically, we propose the disease-medicine pattern model (DMPM) to extract patterns from a large collection of insurance data by considering disease codes joined with prescribed medicines. We analyzed insurance prescription data from 2011 with DMPM and found prescription patterns that could not be identified by traditional simple disease classification, such as the International Classification of Diseases (ICD). We analyzed the identified prescription patterns from multiple aspects. First, we found that our model better explains unseen prescriptions than other probabilistic models. Second, we analyzed the similarities of the extracted patterns to identify their characteristics. Third, we compared the identified patterns from DMPM to the known disease categorization, ICD. This comparison showed what additional information can be provided by the data-oriented bottom-up patterns in contrast to the knowledge-based top-down categorization. The comparison results showed that the bottom-up categorization allowed for the identification of (1) diverse treatment options for the same disease symptoms, and (2) diverse disease cases sharing the same prescription options. Additionally, the extracted bottom-up patterns revealed treatment differences based on basic patient information better than the top-down categorization. We conclude that this data-oriented analysis will be an effective alternative method for analyzing the complex interwoven disease-prescription relationship. (C) 2017 Elsevier Inc. All rights reserved.

Keywords: Topic modeling; probabilistic graphical models; Medical information

Efficient Attack Graph Analysis through Approximate Inference

ACM TRANSACTIONS ON PRIVACY AND SECURITY, 2017, vol. 20, no. 3, pp. 10.1-10.1

Authors: Munoz-Gonzalez, Luis; Sgandurra, Daniele; Paudice, Andrea; Lupu, Emil C. Affiliations: Imperial Coll London Dept Comp 180 Queens Gate London SW7 2AZ England; Royal Holloway Univ London Informat Secur Grp Egham TW20 0EX Surrey England

Attack graphs provide compact representations of the attack paths an attacker can follow to compromise network resources from the analysis of network vulnerabilities and topology. These representations are a powerful tool for security risk assessment. Bayesian inference on attack graphs enables the estimation of the risk of compromise to the system's components given their vulnerabilities and interconnections and accounts for multi-step attacks spreading through the system. While static analysis considers the risk posture at rest, dynamic analysis also accounts for evidence of compromise, for example, from Security Information and Event Management software or forensic investigation. However, in this context, exact Bayesian inference techniques do not scale well. In this article, we show how Loopy Belief Propagation, an approximate inference technique, can be applied to attack graphs and that it scales linearly in the number of nodes for both static and dynamic analysis, making such analyses viable for larger networks. We experiment with different topologies and network clustering on synthetic Bayesian attack graphs with thousands of nodes to show that the algorithm's accuracy is acceptable and that it converges to a stable solution. We compare sequential and parallel versions of Loopy Belief Propagation with exact inference techniques for both static and dynamic analysis, showing the advantages and gains of approximate inference techniques when scaling to larger attack graphs.

Keywords: Bayesian networks; probabilistic graphical models; approximate inference
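
Illustrative note (not part of the catalog record): the abstract above contrasts Loopy Belief Propagation with exact inference. The following is a minimal sketch of generic sum-product loopy BP, assuming a small pairwise model over binary variables rather than the directed attack graphs used in the article. The function names `loopy_bp` and `exact_marginals` and the example potentials are hypothetical. Each iteration updates one message per edge direction, which is where the roughly linear per-iteration cost comes from; the brute-force routine enumerates all joint states and is only feasible for tiny graphs.

```python
from itertools import product


def loopy_bp(unary, pairwise, edges, iters=50):
    """Sum-product loopy BP on a pairwise model with binary variables.

    unary[i][x]            potential of node i in state x (0 or 1)
    pairwise[(i, j)][x][y] potential of edge (i, j) with states x, y
    """
    nodes = list(unary)
    neighbors = {i: [] for i in nodes}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, xi, xj):
        # Edge potentials are stored once per undirected edge.
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # msgs[(i, j)][xj] is the message from node i to node j.
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = [1.0, 1.0]
        msgs[(j, i)] = [1.0, 1.0]

    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    term = unary[i][xi] * psi(i, j, xi, xj)
                    for k in neighbors[i]:
                        if k != j:
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        msgs = new  # synchronous (parallel) update of all messages

    beliefs = {}
    for i in nodes:
        b = [unary[i][x] for x in (0, 1)]
        for k in neighbors[i]:
            b = [b[x] * msgs[(k, i)][x] for x in (0, 1)]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


def exact_marginals(unary, pairwise, edges):
    """Brute-force marginals by enumerating all 2^n joint states."""
    nodes = list(unary)
    totals = {i: [0.0, 0.0] for i in nodes}
    for states in product((0, 1), repeat=len(nodes)):
        assign = dict(zip(nodes, states))
        w = 1.0
        for i in nodes:
            w *= unary[i][assign[i]]
        for i, j in edges:
            w *= pairwise[(i, j)][assign[i]][assign[j]]
        for i in nodes:
            totals[i][assign[i]] += w
    return {i: [v / sum(t) for v in t] for i, t in totals.items()}


# Hypothetical three-node chain; BP is exact on trees, approximate on loops.
chain_unary = {"a": [0.7, 0.3], "b": [0.5, 0.5], "c": [0.2, 0.8]}
chain_edges = [("a", "b"), ("b", "c")]
chain_pairwise = {e: [[2.0, 1.0], [1.0, 2.0]] for e in chain_edges}
print(loopy_bp(chain_unary, chain_pairwise, chain_edges))
print(exact_marginals(chain_unary, chain_pairwise, chain_edges))
```

On the chain the two printed results should agree, since belief propagation is exact on tree-structured graphs; adding an edge between "a" and "c" creates a loop, where the BP beliefs become approximations.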

Spike and slab biclustering

PATTERN RECOGNITION, 2017, vol. 72, pp. 186-195

Authors: Denitto, M.; Bicego, M.; Farinelli, A.; Figueiredo, M. A. T. Affiliations: Univ Verona Str Le Grazie 15Ca Vignal 2 Verona Italy; Univ Lisbon Inst Telecomunicacoes Ave Rovisco Pais 1 Lisbon Portugal; Univ Lisbon Inst Super Tecn Ave Rovisco Pais 1 Lisbon Portugal

Biclustering refers to the problem of simultaneously clustering the rows and columns of a given data matrix, with the goal of obtaining submatrices where the selected rows present a coherent behaviour in the selected columns, and vice-versa. To face this intrinsically difficult problem, we propose a novel generative model, where biclustering is approached from a sparse low-rank matrix factorization perspective. The main idea is to design a probabilistic model describing the factorization of a given data matrix in two other matrices, from which information about rows and columns belonging to the sought for biclusters can be obtained. One crucial ingredient in the proposed model is the use of a spike and slab sparsity inducing prior, thus we term the approach spike and slab biclustering (SSBi). To estimate the parameters of the SSBi model, we propose an expectation-maximization (EM) algorithm, termed SSBiEM, which solves a low-rank factorization problem at each iteration, using a recently proposed augmented Lagrangian algorithm. Experiments with both synthetic and real data show that the SSBi approach compares favorably with the state-of-the-art. (C) 2017 Elsevier Ltd. All rights reserved.

Keywords: Biclustering; Spike and slab; probabilistic graphical models; Expectation-maximization

Latent tree models for hierarchical topic detection

ARTIFICIAL INTELLIGENCE, 2017, vol. 250, pp. 105-124

Authors: Chen, Peixian; Zhang, Nevin L.; Liu, Tengfei; Poon, Leonard K. M.; Chen, Zhourong; Khawar, Farhan. Affiliations: Hong Kong Univ Sci & Technol Dept Comp Sci & Engn Hong Kong Hong Kong Peoples R China; Ant Financial Serv Grp Shanghai Peoples R China; Educ Univ Hong Kong Dept Math & Informat Technol Hong Kong Hong Kong Peoples R China

We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables that represent word co-occurrence patterns or co-occurrences of such patterns. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. In comparison with LDA-based methods, a key advantage of the new method is that it represents co-occurrence patterns explicitly using model structures. Extensive empirical results show that the new method significantly outperforms the LDA-based methods in terms of model quality and meaningfulness of topics and topic hierarchies. (C) 2017 Elsevier B.V. All rights reserved.

Keywords: probabilistic graphical models; Text analysis; Hierarchical latent tree analysis; Hierarchical topic detection

Scaling Up Markov Logic Probabilistic Inference for Social Graphs

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, vol. 29, no. 2, pp. 433-445

Authors: Chen, Haiquan; Ku, Wei-Shinn; Wang, Haixun; Tang, Liang; Sun, Min-Te. Affiliations: Valdosta State Univ Dept Comp Sci Valdosta GA 31698 USA; Auburn Univ Dept Comp Sci & Software Engn Auburn AL 36849 USA; Google Res 1600 Amphitheater Pkwy Mountain View CA 94043 USA; Natl Cent Univ Dept Comp Sci & Informat Engn Taoyuan 32001 Taiwan

Link prediction is a fundamental problem in social network analysis. Although the link prediction problem is not new, the challenge of how to exploit various existing network information, such as network structure data and node attribute data, to enable AI-style knowledge inference for large social networks still remains unsolved. In this paper, we design and implement a scalable framework that treats link prediction as knowledge reasoning using Markov Logic Networks (MLNs). Differing from other probabilistic graphical models, MLNs allow undirected relationships with cycles and long-range (non-adjacent) dependency, which are essential and abound in social networks. In our framework, the prior knowledge is captured as the structure dependency (such as friendship) and the attribute dependency (such as social communities) in terms of inference rules, associated with uncertainty represented as probabilities. Next, we employ the random walk to discover the inference subgraph, on which probabilistic inference is performed, so that the required computation and storage cost can be significantly reduced without much sacrifice of the inference accuracy. Our extensive experiments with real-world datasets verify the superiority of our proposed approaches over two baseline methods and show that our approaches are able to provide a tunable tradeoff between inference accuracy and efficiency.

Keywords: Social network analysis; graph pruning; probabilistic graphical models; Markov logic network
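
Illustrative note (not part of the catalog record): the abstract above describes using a random walk to discover a smaller inference subgraph before running probabilistic inference. The sketch below shows one simple way such pruning could look, assuming an undirected graph stored as an adjacency dictionary. The function name `random_walk_subgraph` and the parameters `num_walks` and `walk_len` are hypothetical and are not taken from the article; they stand in for the kind of knob that would trade subgraph size (cost) against coverage (accuracy).

```python
import random


def random_walk_subgraph(adj, seeds, num_walks=20, walk_len=4, seed=0):
    """Collect nodes reached by short random walks and induce a subgraph.

    adj   : dict mapping each node to a list of neighbor nodes
    seeds : nodes of interest, e.g. the endpoints of a candidate link
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    visited = set(seeds)
    for start in seeds:
        for _ in range(num_walks):
            node = start
            for _ in range(walk_len):
                nbrs = adj.get(node, [])
                if not nbrs:
                    break  # dead end: stop this walk early
                node = rng.choice(nbrs)
                visited.add(node)
    # Keep only edges whose endpoints were both visited.
    return {u: [v for v in adj.get(u, []) if v in visited] for u in visited}


# Hypothetical path graph 0 - 1 - 2 - ... - 9.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
small = random_walk_subgraph(path, seeds=[0], walk_len=3)
print(sorted(small))
```

Longer or more numerous walks reach more of the graph, so inference on the induced subgraph costs more but discards less context.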

Using Directed Acyclic Graphs in Epidemiological Research in Psychosis: An Analysis of the Role of Bullying in Psychosis

SCHIZOPHRENIA BULLETIN, 2017, vol. 43, no. 6, pp. 1273-1279

Authors: Moffa, Giusi; Catone, Gennaro; Kuipers, Jack; Kuipers, Elizabeth; Freeman, Daniel; Marwaha, Steven; Lennox, Belinda R.; Broome, Matthew R.; Bebbington, Paul. Affiliations: UCL Div Psychiat 67-73 Riding House St London W1T 7NF England; Univ Hosp Basel Inst Clin Epidemiol & Biostat Basel Switzerland; Univ Basel Basel Switzerland; Univ Naples SUN Dept Mental & Phys Hlth & Prevent Med Naples Italy; Suor Orsola Benicasa Univ Fac Educ Sci Naples Italy; Swiss Fed Inst Technol D BSSE Basel Switzerland; Kings Coll London Inst Psychiat Psychol & Neurosci Dept Psychol London England; South London & Maudsley NHS Fdn Trust Biomed Res Ctr Beckenham Kent England; Univ Oxford Warneford Hosp Dept Psychiat Oxford England; Univ Warwick Warwick Med Sch Div Mental Hlth & Wellbeing Coventry W Midlands England; Oxford Hlth NHS Fdn Trust Warneford Hosp Oxford England

Modern psychiatric epidemiology researches complex interactions between multiple variables in large datasets. This creates difficulties for causal inference. We argue for the use of probabilistic models represented by directed acyclic graphs (DAGs). These capture the dependence structure of multiple variables and, used appropriately, allow more robust conclusions about the direction of causation. We analyzed British national survey data to assess putative mediators of the association between bullying victimization and persecutory ideation. We compared results using DAGs and the Karlson-Holm-Breen (KHB) logistic regression commands in STATA. We analyzed data from the 2007 English National Survey of Psychiatric Morbidity, using the equivalent 2000 survey in an instant replication. Additional details of methods and results are provided in the supplementary material. DAG analysis revealed a richer structure of relationships than could be inferred using the KHB logistic regression commands. Thus, bullying had direct effects on worry, persecutory ideation, mood instability, and drug use. Depression, sleep and anxiety lay downstream, and therefore did not mediate the link between bullying and persecutory ideation. Mediation by worry and mood instability could not be definitively ascertained. Bullying led to hallucinations indirectly, via persecutory ideation and depression. DAG analysis of the 2000 dataset suggested the technique generates stable results. While causality cannot be fully determined from cross-sectional data, DAGs indicate the relationships providing the best fit. They thereby advance investigation of the complex interactions seen in psychiatry, including the mechanisms underpinning psychiatric symptoms. It may consequently be used to optimize the choice of intervention targets.

Keywords: probabilistic graphical models; directed acyclic graphs; mediation; bullying; persecutory ideation; psychosis; worry; depression; anxiety
