To analyze univariate truncated normal data, in this paper we stochastically represent the normal random variable as a mixture of a truncated normal random variable and its complementary random variable. This stochastic representation is new and appears here for the first time in the literature. From it, we derive important distributional properties of the truncated normal distribution and develop two new expectation-maximization (EM) algorithms to calculate the maximum likelihood estimates of the parameters of interest for Type I data (without and with covariates) and for Type II/III data. Bootstrap confidence intervals for the parameters are provided for small sample sizes. To evaluate the performance of the proposed methods, our simulation studies first compare estimation results obtained with and without the unobserved data counts, and then investigate how the number of unobserved observations affects the estimates. The plasma ferritin concentration data collected by the Australian Institute of Sport and the blood fat content data are used to illustrate the proposed methods and to compare the truncated normal distribution with the half normal, folded normal, and folded normal slash distributions via the Akaike information criterion and the Bayesian information criterion.
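As a rough numerical illustration of the stochastic representation described above, the Python sketch below mixes a truncated normal on [a, b] (with weight p = P(a <= X <= b)) with its complementary truncation and checks that the mixture is distributed as the original normal. The interval endpoints, parameter values, and sample size are arbitrary choices for the demonstration, not values from the paper.

```python
# Numerical check: a normal random variable equals, in distribution, a mixture
# of its truncation to [a, b] (weight p) and the complementary truncation to
# R \ [a, b] (weight 1 - p). All constants below are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, a, b, n = 1.0, 2.0, -0.5, 2.5, 200_000

alpha, beta = (a - mu) / sigma, (b - mu) / sigma
p = stats.norm.cdf(beta) - stats.norm.cdf(alpha)   # P(a <= X <= b)

# Component 1: truncated normal on [a, b].
inside = stats.truncnorm.rvs(alpha, beta, loc=mu, scale=sigma,
                             size=n, random_state=rng)

# Component 2: complementary truncation, sampled by inverse CDF on the
# probability mass lying outside [a, b].
u = rng.uniform(size=n)
mass_left = stats.norm.cdf(alpha)                  # P(X < a)
v = u * (1 - p)
q = np.where(v < mass_left, v, v + p)              # skip the mass inside [a, b]
outside = mu + sigma * stats.norm.ppf(q)

# Mix with weight p and compare against N(mu, sigma^2).
pick = rng.uniform(size=n) < p
mixture = np.where(pick, inside, outside)
print(stats.kstest(mixture, "norm", args=(mu, sigma)))
```

The Kolmogorov-Smirnov p-value should be large, consistent with the mixture recovering N(mu, sigma^2) exactly.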
Restricted versions of the cointegrated vector autoregression are usually estimated using switching algorithms. These algorithms alternate between two sets of variables but can be slow to converge. Acceleration methods are proposed that combine simplicity and effectiveness. These methods also outperform existing proposals in some applications of the expectation-maximization method and parallel factor analysis.
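To illustrate how such acceleration works, the sketch below applies one widely used scheme, the Varadhan-Roland squared extrapolation (SQUAREM), to a toy EM fixed-point map for the mixing weight of a two-component Gaussian mixture. This is not the paper's proposal for the cointegrated VAR; the fixed-point map, step-length rule, and stabilizing safeguards shown are standard choices.

```python
# SQUAREM-style acceleration of a scalar EM fixed-point iteration.
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 700)])

def em_step(pi):
    """One EM update for the weight of the N(3,1) component in a known mixture."""
    f0 = np.exp(-0.5 * x**2)
    f1 = np.exp(-0.5 * (x - 3) ** 2)
    resp = pi * f1 / ((1 - pi) * f0 + pi * f1)   # posterior P(component 1 | x)
    return resp.mean()

def run(step, pi=0.5, tol=1e-10, max_iter=10_000):
    for it in range(1, max_iter + 1):
        new = step(pi)
        if abs(new - pi) < tol:
            return new, it
        pi = new
    return pi, max_iter

def squarem_step(pi):
    p1 = em_step(pi)
    p2 = em_step(p1)
    r, v = p1 - pi, p2 - 2 * p1 + pi
    if v == 0:
        return p2
    alpha = min(-abs(r) / abs(v), -1.0)          # clipped steplength for stability
    cand = pi - 2 * alpha * r + alpha**2 * v
    return em_step(np.clip(cand, 1e-6, 1 - 1e-6))  # stabilizing EM step

print("plain EM:   ", run(em_step))
print("accelerated:", run(squarem_step))
```

Each accelerated iteration costs three EM evaluations, so the iteration counts printed should be compared with that in mind.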
Maintaining the desired interface level between the top froth layer and the liquid layer plays an important role in achieving high recovery of products in oil sands and related process industries. Because varying throughputs and downstream disturbances tend to change the interface level over time, it is an important indicator of process behavior. In this paper, we propose an approach based on a Gaussian mixture model and Markov random field (MRF) unsupervised image segmentation to achieve real-time, accurate measurement of the interface. The image processing problem is solved as a maximum a posteriori (MAP) estimation problem employing the MRF framework, and the parameters are estimated using the EM algorithm. The proposed approach is validated using images captured from laboratory-scale equipment designed to simulate the industrial PSV interface.
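A minimal sketch of the pixel-labelling idea follows: intensities of a synthetic two-layer image are clustered with a two-component Gaussian mixture fitted by EM, and a crude neighbourhood majority vote stands in for the full MRF/MAP smoothing step. The synthetic image, the 3x3 vote, and the interface rule are illustrative stand-ins, not the paper's MRF energy or PSV data.

```python
# GMM-based segmentation of a synthetic froth/liquid image, followed by a
# cheap neighbour-vote smoothing as a surrogate for MRF/MAP label smoothing.
import numpy as np
from scipy import ndimage
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
h, w = 80, 120
# Bright top layer above row 35, dark layer below, plus Gaussian noise.
img = np.where(np.arange(h)[:, None] < 35, 0.8, 0.2) + rng.normal(0, 0.15, (h, w))

gmm = GaussianMixture(n_components=2, random_state=0).fit(img.reshape(-1, 1))
comp_bright = np.argmax(gmm.means_.ravel())        # component with higher mean
labels = (gmm.predict(img.reshape(-1, 1)) == comp_bright).astype(int).reshape(h, w)

# One smoothing pass: relabel each pixel by the majority of its 3x3 window.
votes = ndimage.uniform_filter(labels.astype(float), size=3)
smooth = (votes > 0.5).astype(int)

# Interface level: first row where the bright label stops dominating.
interface_row = np.argmax(smooth.mean(axis=1) < 0.5)
print("estimated interface row:", interface_row)
```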
A spatial lattice model for binary data is constructed from two spatial scales linked through conditional probabilities. A coarse grid of lattice locations is specified, and all remaining locations (which we call the background) capture fine-scale spatial dependence. Binary data on the coarse grid are modelled with an autologistic distribution, conditional on the binary process on the background. The background behaviour is captured through a hidden Gaussian process after a logit transformation on its Bernoulli success probabilities. The likelihood is then the product of the (conditional) autologistic probability distribution and the hidden Gaussian-Bernoulli process. The parameters of the new model come from both spatial scales. A series of simulations illustrates the spatial-dependence properties of the model and likelihood-based methods are used to estimate its parameters. Presence-absence data of corn borers in the roots of corn plants are used to illustrate how the model is fitted.
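For intuition about the coarse-scale component, here is a hedged sketch that simulates a single-scale autologistic model on a square lattice by Gibbs sampling; the paper's two-scale construction with a hidden Gaussian background is richer than this illustration. The grid size, the parameter values alpha and beta, and the free-boundary convention are arbitrary.

```python
# Gibbs sampling from a plain autologistic model on an n x n binary lattice:
# P(y[i,j] = 1 | neighbours) = logistic(alpha + beta * sum of 4 neighbours).
import numpy as np

rng = np.random.default_rng(3)
n, alpha, beta, sweeps = 40, -0.3, 0.8, 100
y = rng.integers(0, 2, (n, n))

def neighbour_sum(y, i, j):
    # Four-nearest-neighbour sum with free boundaries.
    s = 0
    if i > 0:     s += y[i - 1, j]
    if i < n - 1: s += y[i + 1, j]
    if j > 0:     s += y[i, j - 1]
    if j < n - 1: s += y[i, j + 1]
    return s

for _ in range(sweeps):
    for i in range(n):
        for j in range(n):
            eta = alpha + beta * neighbour_sum(y, i, j)
            y[i, j] = rng.uniform() < 1 / (1 + np.exp(-eta))

print("proportion of 1s:", y.mean())
```

Positive beta produces the clumped presence-absence patterns that motivate models of this kind.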
To obtain reliable fish biomass estimates by acoustic methods, it is essential to filter out the signals from unwanted scatterers (e.g. zooplankton). When acoustic data are collected at more than one frequency, methods that exploit the differences in reflectivity of scatterers can be used to separate targets. These methods cannot be applied to historical data, nor to recent data collected on board fishing vessels employed as scientific platforms, where only one transducer is available. Instead, a volume backscattering strength (S-v) threshold is set to separate fish from plankton, both for echogram visualisation and, more importantly, during echo-integration. While empirical methods exist for selecting a threshold, the choice often depends on the subjective decision of the user. A threshold of -47 dB was empirically established in 2008 at the beginning of a series of surveys conducted by Mexico's National Fisheries Institute to assess the biomass of Pacific sardine in the Gulf of California; until a 120 kHz transducer was installed in 2012, only data collected at 38 kHz are available. Here, we propose a probabilistic procedure to estimate an optimal S-v threshold using the Expectation-Maximisation algorithm to fit a mixture of Gaussian distributions to S-v data sampled from schools associated with small pelagic fish and their surrounding echoes. The optimal threshold is given by the Bayes decision function for classifying an S-v value into one of the two groups. The procedure was implemented in the R language environment. The optimal threshold found for the 38 kHz data was -59.4 dB, more than 12 dB lower than the currently used value. This difference prompts the need to revise the acoustic biomass estimates of small pelagics in the Gulf of California.
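The thresholding step can be sketched as follows: fit a two-component Gaussian mixture to S-v samples (in dB) by EM, then take the Bayes decision boundary, i.e. the S-v value at which the two weighted component densities are equal, obtained by solving a quadratic. The simulated component parameters below are illustrative rather than survey values, and the sketch is in Python even though the paper's implementation is in R.

```python
# Two-component Gaussian mixture on simulated Sv data, with the Bayes
# decision boundary computed by solving w1*N(x;m1,v1) = w2*N(x;m2,v2).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
sv = np.concatenate([rng.normal(-70, 4, 4000),    # plankton-like echoes
                     rng.normal(-50, 5, 1000)])   # fish-school-like echoes

gmm = GaussianMixture(n_components=2, random_state=0).fit(sv.reshape(-1, 1))
m1, m2 = gmm.means_.ravel()
v1, v2 = gmm.covariances_.ravel()
w1, w2 = gmm.weights_

# Equating the weighted normal densities and taking logs yields a quadratic
# a*x^2 + b*x + c = 0 in the threshold x.
a = 1 / v2 - 1 / v1
b = 2 * (m1 / v1 - m2 / v2)
c = m2**2 / v2 - m1**2 / v1 + np.log((w1**2 * v2) / (w2**2 * v1))
roots = np.roots([a, b, c]).real
threshold = roots[(roots > min(m1, m2)) & (roots < max(m1, m2))]
print("Bayes Sv threshold (dB):", threshold)
```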
With reference to real data on cataract surgery, we discuss the problem of zero-inflated circular-circular regression, in which both the covariate and the response are circular random variables and a large proportion of the responses are zeros. A regression model is proposed, and an estimation procedure for its parameters is discussed. Some relevant test procedures are also suggested. Simulation studies and a real data analysis are performed to illustrate the applicability of the model.
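As a concrete, hedged sketch of what such a likelihood can look like, the code below combines a point mass at angle zero (probability p) with a von Mises error around a Downs-Mardia-type circular-circular link mu(x) = mu0 + 2*atan(omega*tan((x - nu)/2)). Both the link and the error distribution are common choices in circular regression and are assumptions here, not necessarily the paper's exact model.

```python
# Maximum likelihood for a zero-inflated circular-circular regression sketch:
# with probability p the response is exactly 0; otherwise it is von Mises
# around a Downs-Mardia-type link of the circular covariate.
import numpy as np
from scipy.stats import vonmises
from scipy.optimize import minimize

def negloglik(theta, x, y):
    p = 1 / (1 + np.exp(-theta[0]))              # zero-inflation probability
    mu0, nu, omega, kappa = theta[1], theta[2], theta[3], np.exp(theta[4])
    mu = mu0 + 2 * np.arctan(omega * np.tan((x - nu) / 2))
    ll = np.where(y == 0,
                  np.log(p),
                  np.log1p(-p) + vonmises.logpdf(y, kappa, loc=mu))
    return -ll.sum()

# Simulated data under the assumed model, for illustration only.
rng = np.random.default_rng(5)
x = rng.uniform(-np.pi, np.pi, 500)
mu = 0.5 + 2 * np.arctan(1.2 * np.tan((x - 0.3) / 2))
y = np.where(rng.uniform(size=500) < 0.25, 0.0,
             vonmises.rvs(4.0, loc=mu, random_state=rng))

fit = minimize(negloglik, x0=np.zeros(5), args=(x, y),
               method="Nelder-Mead", options={"maxiter": 5000})
print(fit.x)
```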
Discrete data in the form of proportions with overdispersion and zero inflation can arise in toxicology and similar fields. In regression analysis of such data, a further practical complication is that some responses may be missing. In this paper, we develop an estimation procedure for the parameters of a zero-inflated overdispersed binomial model in the presence of missing responses under three different missing-data mechanisms. A weighted expectation-maximization (EM) algorithm is used for maximum likelihood estimation of the parameters involved. Extensive simulations are conducted to study the properties of the estimates in terms of their average, relative bias, variance, mean squared error, and coverage probability. The simulations show much superior properties for the estimates obtained using the weighted EM algorithm. Some illustrative examples and a discussion are given.
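A simplified sketch of the EM machinery involved is given below for a zero-inflated binomial model with fully observed responses: the E-step computes the posterior probability that each observed zero is structural, and the M-step performs weighted updates. Missingness weights, overdispersion, and covariates, which are central to the paper, are deliberately omitted.

```python
# EM for a zero-inflated binomial: with probability omega the response is a
# structural zero; otherwise it is Binomial(m, pi).
import numpy as np

rng = np.random.default_rng(6)
m, n = 10, 2000                                 # cluster size, sample size
structural = rng.uniform(size=n) < 0.3          # true zero-inflation prob 0.3
y = np.where(structural, 0, rng.binomial(m, 0.4, size=n))

omega, pi = 0.5, 0.5                            # starting values
for _ in range(500):
    # E-step: posterior probability that an observed zero is structural.
    p0 = (1 - pi) ** m
    z = np.where(y == 0, omega / (omega + (1 - omega) * p0), 0.0)
    # M-step: weighted updates of the mixing and binomial parameters.
    omega_new = z.mean()
    pi_new = ((1 - z) * y).sum() / ((1 - z) * m).sum()
    if max(abs(omega_new - omega), abs(pi_new - pi)) < 1e-10:
        break
    omega, pi = omega_new, pi_new

print(f"omega = {omega:.3f}, pi = {pi:.3f}")    # ~0.3 and ~0.4 expected
```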
We propose a novel multivariate model for analyzing hybrid traits and identifying genetic factors for comorbid conditions. Comorbidity is a common phenomenon in mental health in which an individual suffers from multiple disorders simultaneously. For example, in the Study of Addiction: Genetics and Environment (SAGE), alcohol and nicotine addiction were recorded through multiple assessments that we refer to as hybrid traits. Statistical inference for studying the genetic basis of hybrid traits has not been well developed. Recent rank-based methods have been utilized for conducting association analyses of hybrid traits but do not inform the strength or direction of effects. To overcome this limitation, a parametric modeling framework is imperative. Although such parametric frameworks have been proposed in theory, they are neither well developed nor extensively used in practice, owing to their reliance on complicated likelihood functions with high computational complexity. Many existing parametric frameworks instead use pseudo-likelihoods to reduce the computational burden. Here, we develop a model-fitting algorithm for the full likelihood. Our extensive simulation studies demonstrate that inference based on the full likelihood controls the type I error rate, gains power, and improves effect size estimation compared with several existing methods for hybrid models. These advantages remain even if the distribution of the latent variables is misspecified. After analyzing the SAGE data, we identify three genetic variants (rs7672861, rs958331, rs879330) that are significantly associated with the comorbidity of alcohol and nicotine addiction at the chromosome-wide level. Moreover, our approach has greater power in this analysis than several existing methods for hybrid traits. Although the analysis of the SAGE data motivated us to develop the model, it can be broadly applied to analyze any hybrid responses.
The goal of this paper is to address the problem of nonlinear regression with outliers, possibly in high dimension, without specifying the form of the link function and under a parametric approach. Nonlinearity is handled via an underlying mixture of affine regressions. Each regression is encoded in a joint multivariate Student distribution on the responses and covariates. This joint modeling allows the use of an inverse regression strategy to handle the high dimensionality of the data, while the heavy tails of the Student distribution limit contamination by outlying data. The possibility of adding a number of factor-like latent variables to the model further reduces its sensitivity to noise and model misspecification. The mixture model setting has the advantage of providing a natural inference procedure via an EM algorithm. The tractability and flexibility of the algorithm are illustrated in simulations and on real high-dimensional data, with good performance that compares favorably with other existing methods.
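One ingredient of this construction can be sketched compactly: EM for a single linear regression with Student-t errors, where the E-step weights (nu + 1)/(nu + r^2) automatically down-weight outlying residuals. The mixture structure, inverse regression, and factor-like latent variables of the full model are not reproduced, and nu is held fixed here for simplicity.

```python
# EM (scale-mixture-of-normals) for linear regression with Student-t errors:
# the E-step produces per-observation weights; the M-step is weighted LS.
import numpy as np

rng = np.random.default_rng(7)
n, nu = 300, 3.0
x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x + rng.standard_t(nu, n)       # heavy-tailed noise
X = np.column_stack([np.ones(n), x])

beta, sigma2 = np.zeros(2), 1.0
for _ in range(200):
    r2 = (y - X @ beta) ** 2 / sigma2
    w = (nu + 1) / (nu + r2)                    # E-step: latent scale weights
    beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    sigma2 = (w * (y - X @ beta_new) ** 2).mean()
    if np.allclose(beta_new, beta, atol=1e-10):
        break
    beta = beta_new

print("beta:", beta, "sigma2:", sigma2)
```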
Deciding the number of clusters k is one of the most difficult problems in cluster analysis. For this purpose, complexity-penalized likelihood approaches have been introduced in model-based clustering, such as the well-known Bayesian information criterion and the integrated complete likelihood criteria. However, the classification/mixture likelihoods considered in these approaches are unbounded without constraints on the cluster scatter matrices. Such constraints also prevent traditional EM and CEM algorithms from being trapped in (spurious) local maxima. Controlling the maximal ratio between the eigenvalues of the scatter matrices so that it is smaller than a fixed constant c >= 1 is a sensible way of setting such constraints. A new penalized likelihood criterion is proposed that takes into account the higher model complexity that a higher value of c entails. Based on this criterion, a novel and fully automated procedure is provided, leading to a small ranked list of optimal (k, c) couples. A new plot called "car-bike," which provides a concise summary of the solutions, is introduced. The performance of the procedure is assessed both in empirical examples and through a simulation study as a function of cluster overlap. Supplementary materials for the article are available online.
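The eigenvalue-ratio constraint itself can be sketched for a single scatter matrix: clamp its eigenvalues into an interval [m, c*m] and choose the truncation point m that minimizes the induced Gaussian likelihood loss sum_j(log d_m(j) + d_j/d_m(j)). The candidate search below follows the standard pattern-by-pattern argument; inside an actual constrained EM/CEM iteration this device would be applied across all cluster scatter matrices with cluster-size weights.

```python
# Enforce max(eig)/min(eig) <= c on a scatter matrix by optimally truncating
# its eigenvalues into [m, c*m].
import numpy as np

def constrain(S, c):
    d, U = np.linalg.eigh(S)                    # eigenvalues/vectors of S
    if d.min() > 0 and d.max() / d.min() <= c:
        return S                                # constraint already satisfied
    # On each interval of m where the clamping pattern is constant, the loss
    # sum(log(clip(d, m, c*m)) + d / clip(d, m, c*m)) is minimized at a
    # weighted mean; collect these candidates (projected back) and keep best.
    edges = np.concatenate([[0.0], np.unique(np.concatenate([d, d / c]))])
    uppers = np.append(edges[1:], edges[-1] * 2 + 1.0)
    cands = []
    for lo, hi in zip(edges, uppers):
        mid = (lo + hi) / 2
        low, high = d < mid, d > c * mid
        k = low.sum() + high.sum()
        if k == 0:
            continue
        m_star = (d[low].sum() + d[high].sum() / c) / k
        if m_star > 0:
            cands.append(float(np.clip(m_star, lo, hi)))

    def loss(m):
        dm = np.clip(d, m, c * m)
        return np.sum(np.log(dm) + d / dm)

    best = min(cands, key=loss)
    dm = np.clip(d, best, c * best)
    return (U * dm) @ U.T                       # U diag(dm) U^T

S = np.diag([100.0, 1.0, 0.01])                 # eigenvalue ratio 10^4
print(np.linalg.eigvalsh(constrain(S, c=16.0))) # ratio now at most 16
```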