An expectation-maximization algorithm for maximum-likelihood refinement of electron-microscopy images is presented that is based on fitting mixtures of multivariate t-distributions. The novel algorithm has intrinsic characteristics for providing robustness against atypical observations in the data, which is illustrated using an experimental test set with artificially generated outliers. Tests on experimental data revealed only minor differences in two-dimensional classifications, while three-dimensional classification with the new algorithm gave stronger elongation factor G density in the corresponding class of a structurally heterogeneous ribosome data set than the conventional algorithm for Gaussian mixtures.
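The robustness mechanism described above is easiest to see in the simplest case. This is a minimal sketch, assuming a single univariate Student-t component with fixed degrees of freedom `nu`. The paper fits mixtures of multivariate t-distributions to images, and this sketch does not attempt that. The E-step gives each observation a weight that shrinks as its residual grows, so an atypical observation barely moves the M-step estimates. The function name is hypothetical.

```python
def fit_t_location_scale(xs, nu=4.0, iters=200, tol=1e-10):
    """EM fit of a univariate Student-t location/scale model with fixed
    degrees of freedom nu. Returns (mu, sigma2, weights)."""
    n = len(xs)
    mu = sum(xs) / n                                   # start from the Gaussian fit
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    weights = [1.0] * n
    for _ in range(iters):
        # E-step: expected latent precision of each point. Points far from mu,
        # relative to the current scale, receive small weights.
        weights = [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in xs]
        # M-step: weighted mean and weighted variance.
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        new_sigma2 = sum(w * (x - new_mu) ** 2 for w, x in zip(weights, xs)) / n
        done = abs(new_mu - mu) < tol and abs(new_sigma2 - sigma2) < tol
        mu, sigma2 = new_mu, new_sigma2
        if done:
            break
    return mu, sigma2, weights


data = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.3, 50.0]  # last value: artificial outlier
mu, sigma2, weights = fit_t_location_scale(data)
print(mu, weights[-1])
```

The plain (Gaussian) mean of this toy list is about 5.6. The t-based estimate stays near the bulk of the data because the outlier's weight collapses toward zero.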
Incomplete data are quite common in biomedical and other types of research, especially in longitudinal studies. During the last three decades, a vast amount of work has been done in the area. This has led, on the one hand, to a rich taxonomy of missing-data concepts, issues, and methods and, on the other hand, to a variety of data-analytic tools. Elements of the taxonomy include missing-data patterns, mechanisms, and modeling frameworks; inferential paradigms; and sensitivity-analysis frameworks. These are described in detail. A variety of concrete modeling devices is presented. To make matters concrete, two case studies are considered. The first concerns quality of life among breast cancer patients, while the second examines data from the Muscatine children's obesity study.
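The sketch below illustrates the first taxonomy element (missing-data patterns) in the longitudinal setting. It labels each subject's record as complete, dropout (monotone), or intermittent (non-monotone). The function names and the use of `None` for a missing value are assumptions for illustration, and the records are toy values. The missingness mechanism cannot be inferred from the pattern alone.

```python
def classify_pattern(row):
    """Classify one subject's longitudinal record (None = missing value)."""
    observed = [v is not None for v in row]
    if all(observed):
        return "complete"
    first_gap = observed.index(False)
    if not any(observed[first_gap:]):
        return "dropout"        # monotone: nothing observed after the first gap
    return "intermittent"       # non-monotone: a value reappears after a gap


def is_monotone(rows):
    """A data set has a monotone pattern if no subject is intermittent."""
    return all(classify_pattern(r) != "intermittent" for r in rows)


records = [[5.1, 4.8, 4.9], [5.0, None, None], [4.7, None, 4.6]]  # toy records
print([classify_pattern(r) for r in records], is_monotone(records))
```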
Malaria is an infectious disease that is caused by a group of parasites of the genus Plasmodium. Characterizing the association between polymorphisms in the parasite genome and measured traits in an infected human host may provide insight into disease aetiology and ultimately inform new strategies for improved treatment and prevention. This, however, presents an analytic challenge since individuals are often multiply infected with a variable and unknown number of genetically diverse parasitic strains. In addition, data on the alignment of nucleotides on a single chromosome, which is commonly referred to as haplotypic phase, is not generally observed. An expectation-maximization algorithm for estimating and testing associations between haplotypes and quantitative traits has been described for diploid (human) populations. We extend this method to account for both the uncertainty in haplotypic phase and the variable and unknown number of infections in the malaria setting. Further extensions are described for the human immunodeficiency virus quasi-species setting. A simulation study is presented to characterize performance of the method. Application of this approach to data arising from a cross-sectional study of n=126 multiply infected children in Uganda reveals some interesting associations requiring further investigation.
Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a data set of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic data sets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12-16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6-1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.
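The sketch below shows why the two length models behave so differently. It compares the zeta (power-law) probability mass function, P(L = k) = k^(-a) / ζ(a), with a geometric one. Two choices here are illustrative assumptions: matching the models on P(L = 1), and taking a = 1.65 (within the 1.6-1.7 range above). The pair HMM and the EM fitting are not shown. ζ(a) is approximated by a partial sum plus an Euler-Maclaurin tail correction.

```python
def zeta(a, n_terms=1000):
    """Riemann zeta(a) for a > 1: partial sum plus an Euler-Maclaurin tail."""
    if a <= 1:
        raise ValueError("zeta(a) diverges for a <= 1")
    partial = sum(k ** -a for k in range(1, n_terms))
    n = float(n_terms)
    tail = n ** (1 - a) / (a - 1) + 0.5 * n ** -a + a * n ** (-a - 1) / 12.0
    return partial + tail


def zeta_pmf(k, a):
    """Power-law (zeta) length model: P(L = k) = k^-a / zeta(a), k >= 1."""
    return k ** -a / zeta(a)


def geometric_pmf(k, p):
    """Geometric length model: P(L = k) = (1 - p)^(k - 1) * p, k >= 1."""
    return (1 - p) ** (k - 1) * p


a = 1.65                      # inside the 1.6-1.7 range reported above
p = zeta_pmf(1, a)            # choose p so both models agree on P(L = 1)
for k in (1, 5, 20, 50):
    print(k, zeta_pmf(k, a), geometric_pmf(k, p))
```

Under this matching, the geometric model assigns far less probability to long indels than the power law. By k = 50 the two differ by many orders of magnitude.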
Author:
Ohashi, Jun (Univ Tsukuba)
Doctoral Program Life Syst Med Sci, Grad Sch Comprehens Human Sci, Tsukuba, Ibaraki 305-8575, Japan
The association between a copy number variant (CNV) and susceptibility to disease has drawn much attention. In this study, a case-control association test for a CNV locus with multiple alleles is proposed for detecting a single CNV allele associated with a disease. In the association test, CNV allele frequencies are estimated separately for cases and controls using an expectation-maximization (EM) algorithm, and a χ² value is calculated for each CNV allele to compare its estimated frequency between the two groups. A permutation procedure is used to obtain an empirical P-value for each CNV allele and to control the global type I error rate. The statistical power of the proposed association test was evaluated by computer simulation under several parameter settings. The results revealed that statistical power differed markedly among CNV alleles with different copy numbers: higher power could be achieved for a susceptible allele with the lowest or highest copy number than for alleles with intermediate copy numbers. Journal of Human Genetics (2009) 54, 169-173; doi: 10.1038/jhg.2009.8; published online 6 February 2009
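The frequency-estimation step can be sketched as follows. The sketch assumes Hardy-Weinberg equilibrium and that each individual's measured total copy number is the sum of the copy numbers of its two CNV alleles. The function names are hypothetical. The per-allele statistic is a standard 1-df chi-square on a 2x2 table of allele counts, and the paper's exact computation may differ. The permutation step is only described, not implemented.

```python
def em_cnv_frequencies(totals, copy_numbers, iters=1000, tol=1e-12):
    """EM estimate of CNV allele frequencies from diploid total copy numbers.

    totals: observed total copy number per individual (sum of two alleles).
    copy_numbers: copy number carried by each candidate CNV allele.
    Assumes Hardy-Weinberg equilibrium (a simplifying assumption)."""
    k = len(copy_numbers)
    n = len(totals)
    freqs = [1.0 / k] * k
    for _ in range(iters):
        expected = [0.0] * k
        for t in totals:
            # Ordered allele pairs (a, b) that could produce the observed total.
            pairs = [(a, b) for a in range(k) for b in range(k)
                     if copy_numbers[a] + copy_numbers[b] == t]
            if not pairs:
                raise ValueError(f"total {t} cannot be formed from {copy_numbers}")
            probs = [freqs[a] * freqs[b] for a, b in pairs]
            z = sum(probs)
            # E-step: share this individual's two allele copies among the pairs.
            for (a, b), pr in zip(pairs, probs):
                expected[a] += pr / z
                expected[b] += pr / z
        # M-step: allele frequency = expected allele count / (2n).
        new = [e / (2.0 * n) for e in expected]
        shift = max(abs(u - v) for u, v in zip(new, freqs))
        freqs = new
        if shift < tol:
            break
    return freqs


def chi2_one_allele(f_case, n_case, f_ctrl, n_ctrl):
    """1-df chi-square for one allele vs all others (2x2 table of allele counts)."""
    a, c = f_case * 2 * n_case, f_ctrl * 2 * n_ctrl    # copies of this allele
    b, d = 2 * n_case - a, 2 * n_ctrl - c              # copies of other alleles
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom
```

One way to obtain the empirical P-values is to shuffle case/control labels and rerun both functions. The observed chi-square values (or their maximum, for a global error rate) are then compared with the permuted ones.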
ISBN: 9781424449033 (print)
This paper proposes a Gamma-based state space model to predict engineering asset life when multiple degradation indicators are involved and the failure thresholds on these indicators are uncertain. Monte Carlo-based parameter estimation and model inference algorithms are developed for the proposed model. A case study using real industrial data compares the performance of the proposed model with that of the commonly used proportional hazards model (PHM). The results show that the Gamma-based state space model is more appropriate when failure data are insufficient.
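A Monte Carlo treatment of an uncertain failure threshold can be illustrated with a single degradation indicator. This is a minimal sketch, assuming the indicator follows a stationary gamma process (independent gamma-distributed increments) and that the threshold is drawn from a known distribution. The paper's multi-indicator state space model and its Monte Carlo parameter estimation are not reproduced. All function names and parameter values are hypothetical.

```python
import random


def simulate_failure_time(rng, shape_rate, scale, sample_threshold,
                          dt=1.0, max_steps=100_000):
    """One degradation path of a stationary gamma process, observed every dt.
    Increments ~ Gamma(shape_rate * dt, scale); failure occurs when cumulative
    degradation first reaches a threshold drawn from sample_threshold(rng)."""
    threshold = sample_threshold(rng)
    level = 0.0
    for step in range(1, max_steps + 1):
        level += rng.gammavariate(shape_rate * dt, scale)
        if level >= threshold:
            return step * dt
    return float("inf")  # no failure within the simulated horizon


def mean_life(n_paths, seed=0, **kwargs):
    """Monte Carlo estimate of the expected failure time."""
    rng = random.Random(seed)
    lives = [simulate_failure_time(rng, **kwargs) for _ in range(n_paths)]
    return sum(lives) / n_paths


# Hypothetical parameters: mean wear of 1 unit per time step, with a failure
# threshold that is uncertain and uniform on [8, 12].
print(mean_life(2000, shape_rate=2.0, scale=0.5,
                sample_threshold=lambda r: r.uniform(8.0, 12.0)))
```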
ISBN: 9781605585116 (print)
In the context of network traffic analysis, we address the problem of estimating the tail index of the flow (or, more generally, of any group) size distribution from the observation of a sampled population of packets (individuals). We give an exhaustive bibliography of the existing methods and show the relations between them. The main contribution of this work is a new method to estimate the tail index from sampled data, based on solving the maximum-likelihood problem. To assess the performance of our method, we present a full performance evaluation based on numerical simulations and on a recently acquired real Internet traffic trace.
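For reference, below is the classical maximum-likelihood tail-index estimator applied to fully observed sizes, assuming a continuous Pareto tail above a known `x_min`. It does not correct for packet sampling, which is exactly the gap the proposed method addresses. Read it as the unsampled baseline, not as the paper's estimator. The data are synthetic.

```python
import math
import random


def pareto_tail_mle(sizes, x_min):
    """Maximum-likelihood tail index for a continuous Pareto tail:
    alpha_hat = n / sum(log(x / x_min)) over sizes >= x_min."""
    tail = [x for x in sizes if x >= x_min]
    return len(tail) / sum(math.log(x / x_min) for x in tail)


rng = random.Random(1)
flows = [rng.paretovariate(1.5) for _ in range(20_000)]  # synthetic, x_min = 1
print(pareto_tail_mle(flows, 1.0))
```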
ISBN: 9783642104664 (print)
Digital image forensics has become a very important research topic. This paper proposes a method to detect digital image forgery by (1) computing the interpolation coefficients of the image using the expectation-maximization (EM) algorithm, (2) generating the probability map, (3) obtaining the frequency spectrum of the probability map, and (4) determining whether the image has been tampered with based on the periodicity characteristics of the spectrum. The experimental results show that our approach is effective in detecting three different image forgeries: (a) airbrush or brush strokes, (b) different blurring filters, and (c) composite images taken from different cameras.
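The four steps above can be illustrated on a 1-D signal. The sketch is a simplified variant of EM-based resampling detection. It assumes each sample is either a linear combination of its two neighbours (model M1) or an outlier from a uniform density (model M2). The two-neighbour window, equal priors, the σ floor, and all function names are illustrative assumptions, not the paper's 2-D implementation.

```python
import cmath
import math
import random


def em_interpolation_weights(y, iters=50, sigma_floor=1e-3):
    """Simplified 1-D EM for detecting interpolated samples.

    M1: y[i] = a1*y[i-1] + a2*y[i+1] + Gaussian noise (sample is interpolated).
    M2: y[i] drawn from a uniform density over the signal range (outlier).
    Returns P(M1 | y[i]) for the interior samples i = 1 .. len(y) - 2."""
    prev, cur, nxt = y[:-2], y[1:-1], y[2:]
    p0 = 1.0 / (max(y) - min(y))          # uniform density under M2
    w = [1.0] * len(cur)                   # first M-step is unweighted
    for _ in range(iters):
        # M-step: weighted least squares for (a1, a2) via 2x2 normal equations.
        s11 = sum(wi * p * p for wi, p in zip(w, prev))
        s22 = sum(wi * q * q for wi, q in zip(w, nxt))
        s12 = sum(wi * p * q for wi, p, q in zip(w, prev, nxt))
        b1 = sum(wi * p * c for wi, p, c in zip(w, prev, cur))
        b2 = sum(wi * q * c for wi, q, c in zip(w, nxt, cur))
        det = s11 * s22 - s12 * s12
        a1 = (b1 * s22 - b2 * s12) / det
        a2 = (s11 * b2 - s12 * b1) / det
        resid = [c - a1 * p - a2 * q for p, c, q in zip(prev, cur, nxt)]
        var = sum(wi * r * r for wi, r in zip(w, resid)) / sum(w)
        sigma = max(math.sqrt(var), sigma_floor)
        # E-step: posterior probability of M1 (equal priors assumed).
        norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
        g = [norm * math.exp(-r * r / (2.0 * sigma * sigma)) for r in resid]
        w = [gi / (gi + p0) for gi in g]
    return w


def dft_magnitude(values):
    """|DFT| of the mean-removed sequence (direct O(n^2) sum)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered)))
            for k in range(n)]


def upsample_linear_2x(x):
    """Toy 'resampled' signal: insert the average between neighbours."""
    y = []
    for a, b in zip(x, x[1:]):
        y.extend([a, (a + b) / 2.0])
    y.append(x[-1])
    return y


rng = random.Random(0)
y = upsample_linear_2x([rng.gauss(0.0, 1.0) for _ in range(100)])
prob_map = em_interpolation_weights(y)     # step (2): probability map
spectrum = dft_magnitude(prob_map)         # step (3): its spectrum
peak = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
print(len(prob_map), peak)
```

On a signal upsampled by linear interpolation, every other sample is an exact average of its neighbours. The probability map therefore alternates, and its spectrum peaks near the middle (Nyquist) bin. That periodicity is the kind of cue step (4) looks for.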
In this article we propose a robust probabilistic multivariate calibration (RPMC) model in an attempt to identify linear relationships between two sets of observed variables contaminated with outliers. Instead of the Gaussian assumptions that predominate in classical statistical models, RPMC is built on the multivariate Student t-distribution for the noise and latent variables. Thus RPMC diminishes the effect of outlying data points by regulating the thickness of the distribution tails. RPMC is essentially a robustified version of the recently proposed supervised probabilistic principal component analysis (SPPCA). We show that RPMC encompasses probabilistic principal component analysis and SPPCA as limiting cases. We also derive an efficient EM algorithm for parameter estimation in RPMC. Based on a probabilistic description of the latent variables, we present a procedure for detecting outliers. Experimental results on both simulated examples and real-life data sets demonstrate the effectiveness and robustness of the proposed approach.
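The way t-distributed noise regulates the influence of outliers can be illustrated with a much simpler relative of RPMC: a straight-line regression with Student-t errors, fitted by EM (equivalently, iteratively reweighted least squares). This is a swapped-in illustration with one input, one output, and no latent variables, not the RPMC model. The fixed `nu = 4`, the 0.1 weight cutoff for flagging outliers, and the toy data are arbitrary assumptions.

```python
import math


def t_regression(xs, ys, nu=4.0, iters=300):
    """Straight-line fit y = intercept + slope * x with Student-t errors,
    estimated by EM. Returns (intercept, slope, weights)."""
    n = len(xs)
    w = [1.0] * n                      # first M-step is ordinary least squares
    for _ in range(iters):
        # M-step: weighted least squares for intercept and slope, then scale.
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx
        intercept = my - slope * mx
        resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
        sigma2 = max(sum(wi * r * r for wi, r in zip(w, resid)) / n, 1e-12)
        # E-step: latent weights; a small weight marks a likely outlier.
        w = [(nu + 1.0) / (nu + r * r / sigma2) for r in resid]
    return intercept, slope, w


xs = list(range(20))
ys = [1.0 + 2.0 * x + 0.1 * math.sin(3.0 * x) for x in xs]   # toy data
ys[13] += 30.0                                              # artificial outlier
intercept, slope, weights = t_regression(xs, ys)
outliers = [i for i, wi in enumerate(weights) if wi < 0.1]  # hypothetical cutoff
print(intercept, slope, outliers)
```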