Learning the graphical structure of Bayesian networks is key to describing data generating mechanisms in many complex applications and it poses considerable computational challenges. Observational data can only identi...
详细信息
Learning the graphical structure of Bayesian networks is key to describing data generating mechanisms in many complex applications and it poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem. Under certain assumptions, the popular pc algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships holding in the variable distribution. Here, we propose the dual pc algorithm, a novel scheme to carry out the CI tests within the pc algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions we can efficiently supplement partial correlation tests at each step with those of complementary (or dual) conditioning sets. The multiple CI tests of the dual pc algorithm proceed by first considering marginal and full-order CI relationships and progressively moving to central-order ones. Simulation studies show that the dual pc algorithm outperforms the classic pc algorithm both in terms of run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity.
The pc algorithm infers causal relations using conditional independence tests that require a pre-specified Type I alpha level. pc is however unsupervised, so we cannot tune alpha using traditional cross-validation. We...
详细信息
The pc algorithm infers causal relations using conditional independence tests that require a pre-specified Type I alpha level. pc is however unsupervised, so we cannot tune alpha using traditional cross-validation. We therefore propose Autopc, a fast procedure that optimizes alpha directly for a user chosen metric. We in particular force pc to double check its output by executing a second run on the recovered graph. We choose the final output as the one which maximizes stability between the two runs. Autopc consistently outperforms the state of the art across multiple metrics. (C) 2021 Elsevier B.V. All rights reserved.
Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications but poses considerable computational challenges. Observational data can only identify ...
详细信息
Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications but poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem. Under certain assumptions, the popular pc algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships holding in the variable distribution. The dual pc algorithm is a novel scheme to carry out the CI tests within the pc algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions we can also perform tests on partial correlations of complementary (or dual) conditioning sets. The multiple CI tests of the dual pc algorithm proceed by first considering marginal and fullorder CI relationships and progressively moving to central-order ones. Simulation studies show that the dual pc algorithm outperforms the classic pc algorithm both in terms of run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity. Additionally, we show that the dual pc algorithm applies for Gaussian copula models, and demonstrate its performance in that setting.& COPY;2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons .org /licenses /by /4 .0/).
The main goal in many fields in the empirical sciences is to discover causal relationships among a set of variables from observational data. pc algorithm is one of the promising solutions to learn underlying causal st...
详细信息
The main goal in many fields in the empirical sciences is to discover causal relationships among a set of variables from observational data. pc algorithm is one of the promising solutions to learn underlying causal structure by performing a number of conditional independence tests. In this paper, we propose a novel GPU-based parallel algorithm, called cupc, to execute an order-independent version of pc. The proposed solution has two variants, cupc-E and cupc-S, which parallelize pc in two different ways for multivariate normal distribution. Experimental results show the scalability of the proposed algorithms with respect to the number of variables, the number of samples, and different graph densities. For instance, in one of the most challenging datasets, the runtime is reduced from more than 11 hours to about 4 seconds. On average, cupc-E and cupc-S achieve 500X and 1300X speedup, respectively, compared to serial implementation on CPU.
Discovering causal relationships from observational data is a crucial problem and it has applications in many research areas. The pc algorithm is the state-of-the-art constraint based method for causal discovery. Howe...
详细信息
Discovering causal relationships from observational data is a crucial problem and it has applications in many research areas. The pc algorithm is the state-of-the-art constraint based method for causal discovery. However, runtime of the pc algorithm, in the worst-case, is exponential to the number of nodes (variables), and thus it is inefficient when being applied to high dimensional data, e.g., gene expression datasets. On another note, the advancement of computer hardware in the last decade has resulted in the widespread availability of multi-core personal computers. There is a significant motivation for designing a parallelized pc algorithm that is suitable for personal computers and does not require end users' parallel computing knowledge beyond their competency in using the pc algorithm. In this paper, we develop parallel-pc, a fast and memory efficient pc algorithm using the parallel computing technique. We apply our method to a range of synthetic and real-world high dimensional datasets. Experimental results on a dataset from the DREAM 5 challenge show that the original pc algorithm could not produce any results after running more than 24 hours;meanwhile, our parallel-pc algorithm managed to finish within around 12 hours with a 4-core CPU computer, and less than six hours with a 8-core CPU computer. Furthermore, we integrate parallel-pc into a causal inference method for inferring miRNA-mRNA regulatory relationships. The experimental results show that parallel-pc helps improve both the efficiency and accuracy of the causal inference algorithm.
Many causal discovery algorithms infer graphical structure from observational data. The pc algorithm in particular estimates a completed partially directed acyclic graph (CPDAG), or an acyclic graph containing directe...
详细信息
Many causal discovery algorithms infer graphical structure from observational data. The pc algorithm in particular estimates a completed partially directed acyclic graph (CPDAG), or an acyclic graph containing directed edges identifiable with conditional independence testing. However, few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this article, we introduce pc with p-values (pc-p), a fast algorithm that robustly computes edge-specific p-values and then estimates and controls the FDR across the edges. pc-p specifically uses the p-values returned by many conditional independence (CI) tests to upper bound the p-values of more complex edge-specific hypothesis tests. The algorithm then estimates and controls the FDR using the bounded p-values and the Benjamini-Yekutieli FDR procedure. Modifications to the original pc algorithm also help pc-p accurately compute the upper bounds despite non-zero Type II error rates. Experiments show that pc-p yields more accurate FDR estimation and control across the edges in a variety of CPDAGs compared to alternative methods.
Bayesian network structure learning is the basis of parameter learning and Bayesian inference. However, it is a NP-hard problem to find the optimal structure of Bayesian networks because the computational complexity i...
详细信息
Bayesian network structure learning is the basis of parameter learning and Bayesian inference. However, it is a NP-hard problem to find the optimal structure of Bayesian networks because the computational complexity increases exponentially with the increasing number of nodes. Hence, numerous algorithms have been proposed to obtain feasible solutions, while almost all of them are of certain limits. In this paper, we adopt a heuristic algorithm to learn the structure of Bayesian networks, and this algorithm can provide a reasonable solution to combine the pc and Particle Swarm Optimization (PSO) algorithms. Moreover, we consider structure priors to improve the performance of our pc-PSO algorithm. Meanwhile, we utilize a new mutation operator called Uniform Mutation by Addition and Deletion (UMAD) and a crossover operator called Uniform Crossover. Experiments on different networks show that the approach proposed in this paper has achieved better Bayesian Information Criterion (BIC) scores than other algorithms.
Guarantee of the mining of the data of extricating unforeseen information from unpleasantly huge databases. Ways are created to make link comments since gigantic data sets. These demonstrate the quality of relationshi...
详细信息
ISBN:
(纸本)9781538605141
Guarantee of the mining of the data of extricating unforeseen information from unpleasantly huge databases. Ways are created to make link comments since gigantic data sets. These demonstrate the quality of relationship of two or a more data attributes. From numerous points of view, the enthusiasm for association decides is that they give the guarantee (or hallucination) of causative, or at least, prophetic connections. Regardless of whether it is frequently same that any association rules straight out a causative relationship must be inspected. The term causality is assuming on basic part in basic essential by giving a premise to picking what is probably going to head a coveted outcome.
This paper describes a parallel version of the pc algorithm for learning the structure of a Bayesian network from data. The pc algorithm is a constraint-based algorithm consisting of five steps where the first step is...
详细信息
ISBN:
(纸本)9783319245980;9783319245973
This paper describes a parallel version of the pc algorithm for learning the structure of a Bayesian network from data. The pc algorithm is a constraint-based algorithm consisting of five steps where the first step is to perform a set of (conditional) independence tests while the remaining four steps relate to identifying the structure of the Bayesian network using the results of the (conditional) independence tests. In this paper, we describe a new approach to parallelisation of the (conditional) independence testing as experiments illustrate that this is by far the most time consuming step. The proposed parallel pc algorithm is evaluated on data sets generated at random from five different real-world Bayesian networks. The results demonstrate that significant time performance improvements are possible using the proposed algorithm.
The refractive index profile (RIP) of the optical fiber determines the transmission performance and application scenario. In this article, a phase-correct quantitative phase microscopy (pc-QPM) method is proposed to m...
详细信息
The refractive index profile (RIP) of the optical fiber determines the transmission performance and application scenario. In this article, a phase-correct quantitative phase microscopy (pc-QPM) method is proposed to measure the RIP of fibers with high accuracy and nonfiber destructiveness. The pc algorithm is introduced to compensate the error of light phase caused by the inclined matching oil glass vessel (MOGV). A 3-D refractive index measurement (3D-RIM) configuration is proposed and established based on pc-QPM. A standard fiber is applied and verifies the effectiveness of the pc-QPMRIM with a high accuracy of 1 x 10(-4). By the multipoint test in the longitude, the 3D-RIP characterization is achieved. Then, three commercial multimode fibers are tested. Results show that the measured value fits well with the data provided by the manufacturer. The constructed pc-QPM-RIM enables to achieve 3D-RIP measurement in the transverse and longitudinal directions.
暂无评论