Metagenomic research uses sequencing technologies to investigate the genetic biodiversity of microbiomes present in various ecosystems or animal tissues. The composition of a microbial community is highly associated with the environment in which the organisms exist. As large numbers of short sequencing reads of microorganism genomes are obtained, accurately estimating the abundance of microorganisms within a metagenomic sample is becoming an increasing challenge in bioinformatics. In this paper, we describe a hierarchical taxonomy tree-based mixture model (HTTMM) for estimating the abundance of taxa within a microbial community by incorporating the structure of the taxonomy tree. In this model, genome-specific short reads and homologous short reads among genomes can be distinguished and represented by leaf and intermediate nodes in the taxonomy tree, respectively. We adopt an expectation-maximization algorithm to solve this model. Using simulated and real-world data, we demonstrate that the proposed method is superior to both the flat mixture model and lowest common ancestor-based methods. Moreover, this model can reveal previously unaddressed homologous genomes.
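As a point of reference for the baseline mentioned in this abstract, the following is a minimal sketch of EM for a flat mixture model, not the HTTMM itself. It assumes that the probability of each read under each genome is already known; the function name `em_abundance` and the `read_likelihoods` matrix are hypothetical and do not come from the paper.

```python
import numpy as np


def em_abundance(read_likelihoods, n_iter=200, tol=1e-10):
    """Estimate mixing proportions of a flat mixture by EM.

    read_likelihoods[i, j] is the (assumed known) probability of
    read i under genome j. Returns one proportion per genome.
    """
    lik = np.asarray(read_likelihoods, dtype=float)
    n_genomes = lik.shape[1]
    pi = np.full(n_genomes, 1.0 / n_genomes)  # uniform starting point
    for _ in range(n_iter):
        # E-step: responsibility of genome j for read i.
        weighted = lik * pi
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average responsibilities.
        new_pi = resp.mean(axis=0)
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Three reads unique to genome 0 and one read unique to genome 1.
proportions = em_abundance([[1, 0], [1, 0], [1, 0], [0, 1]])
```

In a flat model of this kind, a read shared by several genomes is split among them according to the current proportions. The abstract describes the tree-based model as assigning such homologous reads to intermediate taxonomy nodes instead.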
Recently, Yu, Lu, and Tian (2013) introduced a combination questionnaire model to investigate the association between one sensitive binary variable and another non-sensitive binary variable. However, in practice, we sometimes need to assess the association between one totally sensitive binary variable (e.g., the number of sex partners being <= 3 or >3, the annual income being <=$25,000 or >$25,000, and so on) and one non-sensitive binary variable (e.g., good or poor health status, with or without cervical cancer, and so on). Although we could directly adopt the four-category parallel model (Liu & Tian, 2013), the information contained in the non-sensitive binary variable cannot be utilized in the design. Intuitively, such information can be used to enhance the degree of privacy protection so that more respondents will not face the sensitive question. The objective of this paper is to propose a new survey design (called Type II combination questionnaire model, which consists of a four-category parallel questionnaire and a supplemental direct questionnaire) and to develop corresponding statistical methods for analyzing sensitive data collected by this technique. Likelihood-based methods including maximum likelihood estimates, asymptotic and bootstrap confidence intervals of parameters of interest are derived. A likelihood ratio test is provided to test the association between the two binary random variables. Bayesian methods are also presented. Simulation studies are performed and a cervical cancer data set in Atlanta is used to illustrate the proposed methods. (C) 2015 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
This paper is concerned with the problem of parameter estimation for nonlinear Wiener systems in the stochastic framework. Because the expectation-maximization (EM) algorithm is well suited to incomplete data, it is applied to estimate the parameters of nonlinear Wiener models with randomly missing outputs. By means of the EM approach, the parameters and the missing outputs can be estimated simultaneously. To obtain the noise-free output of the linear subsystem of the Wiener model, the auxiliary model identification idea is adopted. The simulation results indicate the effectiveness of the proposed approach for identifying a class of nonlinear Wiener models.
In this paper, we propose a new approach for a block-based lossless image compression using finite mixture models and adaptive arithmetic coding. Conventional arithmetic encoders encode and decode images sample-by-sample in raster scan order. In addition, conventional arithmetic coding models provide the probability distribution for whole source symbols to be compressed or transmitted, including static and adaptive models. However, in the proposed scheme, an image is divided into non-overlapping blocks and then each block is encoded separately by using arithmetic coding. The proposed model provides a probability distribution for each block which is modeled by a mixture of non-parametric distributions by exploiting the high correlation between neighboring blocks. The expectation-maximization algorithm is used to find the maximum likelihood mixture parameters in order to maximize the arithmetic coding compression efficiency. The results of comparative experiments show that we provide significant improvements over the state-of-the-art lossless image compression standards and algorithms. In addition, experimental results show that the proposed compression algorithm beats JPEG-LS by 9.7 % when switching between pixel and prediction error domains.
For the linear modeling of the multivariable aero-engine system, a multivariable maximum likelihood (ML) estimation method that accounts for the coupling between parameters is studied. An improved expectation-maximization (EM) algorithm integrated with a genetic algorithm (GA) is proposed and applied to ML identification in the frequency domain. The amplitude, harmonic, and phase vectors of the odd-odd multi-sine excitation signal are designed and optimized. With the proposed method, multivariable linear models of the aero-engine at different operating states in the flight envelope are established from a nonlinear component-level model. The precision is demonstrated through simulations compared against the nonlinear model.
In this research, a dynamic linear spatio-temporal model (DLSTM) was developed and evaluated for monthly streamflow forecasting. For parameter estimation, a coupled expectation-maximization (EM) algorithm and Kalman filter were adopted. This combination enables the model to estimate the state vector and parameters concurrently. Different forecast scenarios, including various combinations of upstream stations, were considered for downstream station streamflow forecasting. Several statistical criteria and nonparametric and visual tests were used for model evaluation. Results indicated that the spatio-temporal model performed acceptably in almost all scenarios. The dynamic model was able to capitalize on coupled spatial and temporal information provided that there is spatial connectivity in the studied hydrometric station network. Moreover, the threshold level method was used for model evaluation in drought and wet periods. Results indicated that, in the validation phase, the model was able to forecast the drought duration and volume deficit/over threshold, although volume deficit/over threshold could not be accurately simulated.
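To illustrate the state-estimation half of the coupling described in this abstract, here is a minimal sketch of a linear Kalman filter with known parameters. It is not the paper's DLSTM: the EM step that would re-estimate the parameters is omitted, and the function name `kalman_filter` and the matrix names `F`, `H`, `Q`, `R` are assumptions following common textbook notation.

```python
import numpy as np


def kalman_filter(ys, F, H, Q, R, x0, P0):
    """Return filtered state means for a linear Gaussian state-space model.

    State:       x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q)
    Observation: y_t = H x_t + v_t,      v_t ~ N(0, R)
    """
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P0, dtype=float)
    means = []
    for y in ys:
        # Predict: propagate the mean and covariance one step ahead.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: correct the prediction with the new observation.
        S = H @ P @ H.T + R             # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
        x = x + K @ (y - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        means.append(x)
    return np.array(means)
```

In an EM scheme of the kind the abstract mentions, a filter like this (usually followed by a smoother) would supply the expected states in the E-step, and the M-step would then update quantities such as `F`, `Q`, and `R`.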
We consider a partially observable degrading system subject to condition monitoring and random failure. The system's condition is categorized into one of three states: a healthy state, a warning state, and a failure state. Only the failure state is observable. While the system is operational, vector data that is stochastically related to the system state is obtained through condition monitoring at regular sampling epochs. The state process evolution follows a hidden semi-Markov model (HSMM), and the Erlang distribution is used for modeling the system's sojourn time in each of its operational states. The expectation-maximization (EM) algorithm is applied to estimate the state and observation parameters of the HSMM. Explicit formulas for several important quantities for the system residual life estimation, such as the conditional reliability function and the mean residual life, are derived in terms of the posterior probability that the system is in the warning state. Numerical examples are presented to demonstrate the applicability of the estimation procedure and failure prediction method. Comparison results with hidden Markov modeling are provided to illustrate the effectiveness of the proposed model. (c) 2015 Wiley Periodicals, Inc. Naval Research Logistics 62: 190-205, 2015
To model binomial data with large frequencies of both zeros and right-endpoints, Deng and Zhang (in press) recently extended the zero-inflated binomial distribution to an endpoint-inflated binomial (EIB) distribution. Although they proposed the EIB mixed regression model, the major goal of Deng and Zhang (2015) is just to develop score tests for testing whether endpoint-inflation exists. However, the distributional properties of the EIB have not been explored, and other statistical inference methods for parameters of interest were not developed. In this paper, we first construct six different but equivalent stochastic representations for the EIB random variable and then extensively study the important distributional properties. Maximum likelihood estimates of parameters are obtained by both the Fisher scoring and expectation-maximization algorithms in the model without covariates. Bootstrap confidence intervals of parameters are also provided. Generalized and Fixed EIB regression models are proposed and the corresponding computational procedures are introduced. A real data set is analyzed and simulations are conducted to evaluate the performance of the proposed methods. All technical details are put in a supplemental document (see Appendix A). (C) 2015 Elsevier B.V. All rights reserved.
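The EM fit for the covariate-free case can be sketched as follows. This is only an illustration under an assumed parameterization, not necessarily the one used in the paper: a three-component mixture of a point mass at 0 (weight `phi0`), a point mass at the right endpoint `m` (weight `phi1`), and a Binomial(m, p) component (weight `phi2`). The function name `em_eib` and the starting values are hypothetical.

```python
import numpy as np


def em_eib(y, m, n_iter=500, tol=1e-10):
    """EM for an assumed three-component endpoint-inflated binomial.

    Returns (phi0, phi1, phi2, p), where phi2 = 1 - phi0 - phi1.
    """
    y = np.asarray(y, dtype=float)
    phi0, phi1, p = 0.1, 0.1, 0.5  # arbitrary interior starting values
    for _ in range(n_iter):
        phi2 = 1.0 - phi0 - phi1
        # E-step: probability that an observed endpoint came from the
        # point mass rather than from the binomial component.
        w0 = np.where(y == 0, phi0 / (phi0 + phi2 * (1.0 - p) ** m), 0.0)
        w1 = np.where(y == m, phi1 / (phi1 + phi2 * p ** m), 0.0)
        wb = 1.0 - w0 - w1  # responsibility of the binomial component
        # M-step: weighted proportions and weighted success rate.
        new = np.array([w0.mean(), w1.mean(), (wb * y).sum() / (m * wb.sum())])
        converged = np.max(np.abs(new - [phi0, phi1, p])) < tol
        phi0, phi1, p = new
        if converged:
            break
    return phi0, phi1, 1.0 - phi0 - phi1, p
```

Observations strictly between 0 and `m` can only come from the binomial component, so their responsibilities are fixed at one and only the endpoint observations need to be apportioned in the E-step.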
This article presents frequentist inference of accelerated life test data of series systems with independent log-normal component lifetimes. The means of the component log-lifetimes are assumed to depend on the stress variables through a linear stress translation function that can accommodate the standard stress translation functions in the literature. An expectation-maximization algorithm is developed to obtain the maximum likelihood estimates of model parameters. The maximum likelihood estimates are then further refined by bootstrap, which is also used to make inferences about the component and system reliability metrics at usage stresses. The developed methodology is illustrated by analyzing both a real and a simulated dataset. A simulation study is also carried out to judge the effectiveness of the bootstrap. It is found that in this model, application of bootstrap results in significant improvement over the simple maximum likelihood estimates.
Nonlinear degradation trajectories are encountered frequently, and not all of them evolve homogeneously in practical systems. To take nonlinearity, heterogeneity, and the entire historical degradation data into account, we propose a nonlinear heterogeneous Wiener process model with an adaptive drift to characterize degradation trajectories. A state-space based method is employed to delineate our model. Due to the introduction of the adaptive drift, it is difficult to directly apply Kalman filter methods to update the distribution of the estimated degradation drift. To address this issue, we develop an online filtering algorithm based on Bayes' theorem. The expectation-maximization (EM) algorithm, together with a novel Bayes'-theorem-based smoother, is adopted to estimate the unknown parameters in our model. Moreover, the distribution of the predicted remaining useful life (RUL), incorporating the complete distribution of the estimated degradation drift, is obtained analytically. Finally, a simulation and a case study are provided to validate the proposed approach.