Background: Detection of differential methylation between biological samples is an important task in bisulfite-seq data analysis. Several studies have attempted de novo detection of differentially methylated regions (DMRs) using hidden Markov models (HMMs). However, there is room for improvement in the design of HMMs, especially in the emission functions that evaluate the likelihood of differential methylation at each cytosine site. Results: We describe a new HMM for DMR detection from bisulfite-seq data. Our method uses emission functions that combine binomial models for aligned read counts with beta mixtures that incorporate genome-wide methylation level distributions. We also develop unsupervised learning algorithms to adjust the parameters of the beta-binomial models depending on the differential methylation type (up, down, or not changed). In experiments on both simulated and real datasets, the new HMM improves DMR detection accuracy compared with the HMMs in our previous study. Furthermore, our method achieves better accuracy than other methods based on Fisher's exact test and methylation level smoothing. Conclusions: Our method enables accurate DMR detection from bisulfite-seq data. The implementation of our method is named ComMet and is distributed as part of the Bisulfighter package, which is available at http://***/bisulfighter.
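As a rough illustration of the kind of emission term described in the abstract above, the sketch below computes a beta-binomial log-likelihood for the methylated read count at a single cytosine: a binomial model for the counts, with the methylation level integrated over a beta distribution. The function names, the use of a single beta component rather than a mixture, and the example parameters are assumptions made for illustration; this is not the ComMet implementation.

```python
import math

def log_beta(a, b):
    """Log of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_logpmf(k, n, a, b):
    """log P(k methylated reads out of n), methylation level ~ Beta(a, b).

    k, n: methylated and total aligned read counts at one site.
    a, b: hypothetical beta shape parameters (assumed, not from the paper).
    """
    if not 0 <= k <= n:
        return float("-inf")
    # log of the binomial coefficient C(n, k)
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)

# Example: likelihood of 8 methylated reads out of 10 under two priors.
high = beta_binomial_logpmf(8, 10, a=8.0, b=2.0)  # prior favouring high methylation
low = beta_binomial_logpmf(8, 10, a=2.0, b=8.0)   # prior favouring low methylation
```

In an HMM of this type, per-site terms like these would be evaluated for each sample and combined according to the hidden state (up, down, or not changed); how ComMet combines them is not specified in the abstract.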
Nowadays, O2O commercial platforms are playing a crucial role in our daily purchases. However, some people are trying to manipulate the online market maliciously by opinion spamming, a kind of web fraud behavior like ...
ISBN: (digital) 9798331539603
ISBN: (print) 9798331539610
Within observational Data Science workloads, Berkson's paradox can lead to false causal inferences. One of the prominent quasi-experimental methods to mitigate this selection bias is Propensity Score Matching (PSM). An approach called Neural PSM (NPSM) was developed to overcome the drawbacks of conventional regression-based PSM, including its limited flexibility in modeling high-dimensional data and non-linear relationships, which can cause imperfect covariate balance. In this study, a three-layer Deep Neural Network was designed to estimate propensity scores and balance the control and treatment groups of the Groupon dataset. An unsupervised k-Nearest Neighbor algorithm then helped the model to efficiently detect and cluster similar matching points. For the five salient features presented, NPSM achieved lower differences in Cohen's d effect size: 0.313 for coupon duration, 0.017 for promotion length, 0.425 for quantity sold, -0.199 for limited supply, and 0.395 for Facebook likes. While these results mostly outperformed Linear Regression (LR) and Random Forest (RF) models, further evaluation is needed to verify the true effectiveness of NPSM in mitigating Berkson's paradox in broader e-commerce contexts.
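The two quantities this abstract relies on, matching on propensity scores and Cohen's d as a balance measure, can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the propensity scores are taken as given (the neural network that estimates them is not shown), the matching is a greedy 1-nearest-neighbor scheme without replacement rather than the study's k-Nearest Neighbor procedure, and Cohen's d uses the pooled sample standard deviation. All function names are hypothetical.

```python
import math

def cohen_d(treated, control):
    """Cohen's d between two groups, using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    m1 = sum(treated) / n1
    m2 = sum(control) / n2
    v1 = sum((x - m1) ** 2 for x in treated) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in control) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def match_nearest(treated_scores, control_scores):
    """Greedy 1-to-1 matching on propensity score, without replacement.

    Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, score in enumerate(treated_scores):
        if not available:
            break
        # Closest unused control unit; ties are broken by the lower index.
        j = min(available, key=lambda c: (abs(control_scores[c] - score), c))
        available.remove(j)
        pairs.append((i, j))
    return pairs

# Example with made-up propensity scores.
pairs = match_nearest([0.2, 0.8], [0.75, 0.25, 0.5])
```

After matching, Cohen's d would be recomputed for each covariate on the matched treated and control units; values closer to zero indicate better balance.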
ISBN: (print) 9783319483566
Biomass is an important phenotypic trait in plant growth analysis. In this study, we established and compared 8 models for measuring aboveground biomass of 402 rice varieties. Partial least squares (PLS) regression and all subsets regression (ASR) were carried out to determine the effective ***, 6 models were developed based on support vector regression (SVR). The kernel function used in this study was the radial basis function (RBF). Three different optimization methods, Genetic Algorithm (GA), K-fold Cross Validation (K-CV), and Particle Swarm Optimization (PSO), were applied to optimize the penalty error C and RBF c. We also compared the SVR models with models based on PLS regression and ASR. The results showed that the model combining ASR, GA optimization, and SVR outperformed the other models, with a coefficient of determination (R²) of 0.85 for the 268 varieties in the training set and 0.79 for the 134 varieties in the testing set. This paper extends the application of SVR and intelligent algorithms to the measurement of cereal biomass and has the potential to improve the accuracy of biomass measurement for different varieties.
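For readers unfamiliar with the two quantities named in this abstract, the sketch below shows a radial basis function kernel and the coefficient of determination in plain Python. It assumes the common definitions k(x, y) = exp(-gamma * ||x - y||²) and R² = 1 - SS_res / SS_tot; the abstract does not state which definition of the coefficient the authors used, and the parameter name `gamma` is an assumption.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Example with made-up biomass values (arbitrary units).
score = r_squared([10.0, 12.0, 14.0], [10.5, 11.5, 14.0])
```

A hyperparameter search such as GA, K-CV, or PSO would repeatedly fit an SVR for candidate values of C and the kernel parameter and keep the pair with the best validation score; that search loop is not shown here.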
The eukaryotic genome is a highly compacted nucleoprotein complex organized in a hierarchical structure based on nucleosomes. The detailed organization of this structure remains unknown. In the present work we developed algorithms for geometry modeling of the supernucleosomal chromatin structure and for computing distance distribution functions and small-angle neutron scattering (SANS) spectra of the genome-scale (~10^6 nucleosomes) chromatin structure at residue resolution. Our physical nucleosome model was based on the mononucleosome crystal structure. A nucleosome was assumed to be rigid within a local coordinate system. Interface parameters between nucleosomes can be set for each nucleosome independently. Pair distance distributions were computed with Monte Carlo simulation. SANS spectra were calculated by Fourier transformation of the weighted distance distribution; the concentration of heavy water in the solvent and the probability of H/D exchange were taken into account. Two main modes of supernucleosomal structure generation were used. In the free generation mode, all interface parameters were chosen randomly, but nucleosome self-intersections were not allowed. The second generation mode (generation in volume) enabled spherical or cubical wall restrictions. The calculated SANS spectra for a number of our models were in general agreement with available experimental data. © 2011 American Institute of Physics. [doi: 10.1063/1.3661987]
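The two computational steps named in this abstract, Monte Carlo estimation of a pair distance distribution and its transformation into a scattering curve, can be sketched as follows. This is a simplified illustration and not the authors' algorithm: the points are arbitrary 3D coordinates rather than a chromatin model, all scatterers carry equal weight (no heavy-water contrast or H/D exchange weighting), and the transform is the standard orientation-averaged form I(q) = Σ P(r) sin(qr)/(qr) evaluated at bin centres. All names and parameters are assumptions.

```python
import math
import random

def pair_distance_histogram(points, n_samples, bin_width, rng):
    """Estimate the pair distance distribution by sampling random distinct pairs.

    Returns a dict mapping bin index -> number of sampled pairs in that bin.
    """
    hist = {}
    for _ in range(n_samples):
        i, j = rng.sample(range(len(points)), 2)
        r = math.dist(points[i], points[j])
        b = int(r / bin_width)
        hist[b] = hist.get(b, 0) + 1
    return hist

def scattering_intensity(hist, bin_width, q):
    """Normalized I(q) = sum over bins of P(r) * sin(q r) / (q r)."""
    total = sum(hist.values())
    intensity = 0.0
    for b, count in hist.items():
        r = (b + 0.5) * bin_width  # bin centre
        x = q * r
        sinc = 1.0 if x == 0 else math.sin(x) / x
        intensity += count / total * sinc
    return intensity

# Example: 50 random points in a unit cube, 1000 sampled pairs.
rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(50)]
hist = pair_distance_histogram(points, 1000, 0.05, rng)
curve = [scattering_intensity(hist, 0.05, q) for q in (0.0, 1.0, 5.0, 10.0)]
```

Sampling pairs instead of enumerating them is what makes the approach feasible at genome scale, where the number of pairs grows quadratically with the number of scatterers.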
Early exit, as an effective method to accelerate pre-trained language models, has recently attracted much attention in the field of natural language processing. However, existing early exit methods are only suitable f...
Foreword: This volume of Journal of Physics: Conference Series is dedicated to the scientific contributions presented during the 6th International Workshop on New Computational Methods for Inverse Problems, NCMIP 2016 (http://***/NCMIP ***). This workshop took place at Ecole Normale Supérieure de Cachan, on May 20, 2016. The prior editions of NCMIP also took place in Cachan, France, first within the scope of the ValueTools conference, in May 2011, and then at the initiative of Institut Farman, in May 2012, May 2013, May 2014 and May 2015. The New Computational Methods for Inverse Problems (NCMIP) workshop focused on recent advances in the resolution of inverse problems. Inverse problems appear in numerous scientific areas such as geophysics, biological and medical imaging, material and structure characterization, electrical, mechanical and civil engineering, and finance. The resolution of inverse problems consists in estimating the parameters of the observed system or structure from data collected by an instrumental sensing or imaging device. Its success first requires the collection of relevant observation data. It also requires accurate models describing the physical interactions between the instrumental device and the observed system, as well as the intrinsic properties of the solution itself. Finally, it requires the design of robust, accurate and efficient inversion algorithms. Advanced sensor arrays and imaging devices provide high-rate and high-volume data; in this context, the efficient resolution of the inverse problem requires the joint development of new models and inversion methods, taking computational and implementation aspects into account. During this one-day workshop, researchers had the opportunity to bring to light and share new techniques and results in the field of inverse problems. The topics of the workshop were: algorithms and computational aspects of inversion, Bayesian estimation, Kernel method