Lynch Syndrome (LS) families harbor mutated mismatch repair genes, which predispose them to specific types of cancer. Because individuals within LS families can experience multiple cancers over their lifetime, we developed a progressive three-state model to estimate the disease risk from a healthy state (state 0) to a first cancer (state 1) and then to a second cancer (state 2). Ascertainment correction of the likelihood was made to adjust for complex sampling designs, with carrier probabilities for family members with missing genotype information estimated using their family's observed genotype and phenotype information in a one-step expectation-maximization algorithm. A sandwich variance estimator was employed to overcome possible model misspecification. The main objective of this paper is to estimate the disease risk (penetrance) for age at a second cancer after someone has experienced a first cancer that is also associated with a mutated gene. Simulation study results indicate that our approach generally provides unbiased risk estimates and low root mean squared errors across different family study designs, proportions of missing genotypes, and risk heterogeneities. An application to 12 large LS families from Newfoundland demonstrates that the risk for a second cancer was substantial and that the age at a first colorectal cancer significantly impacted the age at any subsequent LS cancer. This study provides new insights for developing more effective management of mutation carriers in LS families by providing more accurate multiple cancer risk estimates. Copyright (c) 2013 John Wiley & Sons, Ltd.
Generalized linear models are used to describe the dependence of data on explanatory variables when the binary outcome is subject to misclassification. Both probit and t-link regressions for misclassified binary data are proposed under a Bayesian methodology. Computational difficulties are avoided by using data augmentation: a framework with two types of latent variables is exploited to derive efficient Gibbs sampling and expectation-maximization algorithms. Moreover, this formulation yields the probit model as a particular case of the t-link model. Simulation examples illustrate the model performance in comparison with standard methods that do not consider misclassification. To show the potential of the proposed approaches, a real data problem arising in the study of hearing loss caused by exposure to occupational noise is analysed.
Principal component analysis (PCA) is a widely used statistical technique for determining subscales in questionnaire data. As with any other statistical technique, missing data may complicate both its execution and its interpretation. In this study, six methods for dealing with missing data in the context of PCA are reviewed and compared: listwise deletion (LD), pairwise deletion, the missing data passive approach, regularized PCA, the expectation-maximization algorithm, and multiple imputation. Simulations show that, except for LD, all methods give about equally good results for realistic percentages of missing data. Therefore, the choice of a procedure can be based on ease of application or simply on the availability of a technique.
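To make the difference between two of the compared methods concrete, the following is a minimal sketch, not the study's own code, of PCA on a correlation matrix computed under listwise deletion and under pairwise deletion. The synthetic one-factor data, the 10% missingness rate, and the function names are assumptions chosen only for illustration.

```python
import numpy as np

def listwise_corr(X):
    """Correlation matrix from rows with no missing values (listwise deletion)."""
    complete = X[~np.isnan(X).any(axis=1)]
    return np.corrcoef(complete, rowvar=False)

def pairwise_corr(X):
    """Correlation matrix where each entry uses every row observed for that pair."""
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            R[i, j] = R[j, i] = np.corrcoef(X[both, i], X[both, j])[0, 1]
    return R

def principal_components(R):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Synthetic questionnaire-like data (assumption): 4 items driven by one factor.
rng = np.random.default_rng(0)
n, p = 200, 4
factor = rng.normal(size=(n, 1))
X = factor + 0.5 * rng.normal(size=(n, p))

# Delete about 10% of the cells completely at random (assumption).
X_missing = X.copy()
X_missing[rng.random((n, p)) < 0.10] = np.nan

R_ld = listwise_corr(X_missing)
R_pd = pairwise_corr(X_missing)
vals_ld, _ = principal_components(R_ld)
vals_pd, _ = principal_components(R_pd)
print("listwise eigenvalues:", np.round(vals_ld, 3))
print("pairwise eigenvalues:", np.round(vals_pd, 3))
```

Listwise deletion discards every respondent with at least one missing item, so it works with fewer rows than pairwise deletion, which is one plausible reason it is singled out in the abstract. A pairwise correlation matrix is not guaranteed to be positive semidefinite in general, which a fuller treatment would need to check.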
Although change-point analysis methods for longitudinal data have been developed, it is often of interest to detect multiple change points in longitudinal data. In this paper, we propose a linear mixed effects modeling framework for identifying multiple change points in longitudinal Gaussian data. Specifically, we develop a novel statistical and computational framework that integrates the expectation-maximization and the dynamic programming algorithms. We conduct a comprehensive simulation study to demonstrate the performance of our method. We illustrate our method with an analysis of data from a trial evaluating a behavioral intervention for the control of type I diabetes in adolescents with HbA1c as the longitudinal response variable. Copyright (c) 2013 John Wiley & Sons, Ltd.
We consider nonparametric maximum-likelihood estimation of a log-concave density in the case of interval-censored, right-censored and binned data. We allow for the possibility of a subprobability density with an additional mass at +infinity, which is estimated simultaneously. The existence of the estimator is proved under mild conditions and various theoretical aspects are given, such as certain shape and consistency properties. An EM algorithm is proposed for the approximate computation of the estimator and its performance is illustrated in two examples.
The article studies non-Gaussian extensions of a recently discovered link between certain Gaussian random fields, expressed as solutions to stochastic partial differential equations (SPDEs), and Gaussian Markov random fields. The focus is on non-Gaussian random fields with Matérn covariance functions, and in particular, we show how the SPDE formulation of a Laplace moving-average model can be used to obtain an efficient simulation method as well as an accurate parameter estimation technique for the model. This should be seen as a demonstration of how these techniques can be used, and generalizations to more general SPDEs are readily available.
We used light and scanning electron microscope analyses to quantify morphometric features (valve length, width, stria density, lineola density and valve curvature) from the observation of valves representing Seminavis pusilla. Cluster analysis based on Gaussian mixture models and the expectation-maximization algorithm was used for delineating two species, Seminavis pusilla sensu stricto and Seminavis lata (Krammer) Rioual comb. et stat. nov. By comparison with S. pusilla, S. lata is characterized by wider valves and lower stria density. The two species also have markedly different ecology. S. pusilla is most abundant in the most saline lakes of the dataset, while S. lata is most abundant in the less saline lakes. Our results indicate that combining the two species into S. pusilla sensu lato would lead to a loss of ecological information and a decrease in the performance of transfer functions developed for quantitative reconstruction of past salinity from fossil diatom assemblages in sediment cores.
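As an illustration of the clustering idea named above, here is a minimal sketch of a two-component Gaussian mixture fitted by expectation-maximization. It is not the authors' analysis: it uses a single feature instead of the five morphometric features, and the "valve width" values are synthetic numbers invented for the example. The function name `fit_gmm_1d` is likewise hypothetical.

```python
import numpy as np

def fit_gmm_1d(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns mixing weights, means, standard deviations, and the
    responsibilities computed in the last E-step.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # Initialise the two means from the lower and upper halves of the data.
    mu = np.array([x[x <= med].mean(), x[x > med].mean()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        z = (x[:, None] - mu) / sigma
        dens = pi * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: weighted updates of weights, means and standard deviations.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Log-likelihood under the parameters used in this E-step.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, sigma, resp

# Synthetic "valve widths" (assumption): two groups with different means.
rng = np.random.default_rng(1)
narrow = rng.normal(loc=4.0, scale=0.4, size=150)
wide = rng.normal(loc=6.0, scale=0.4, size=150)
widths = np.concatenate([narrow, wide])
true_labels = np.concatenate([np.zeros(150, dtype=int), np.ones(150, dtype=int)])

pi, mu, sigma, resp = fit_gmm_1d(widths)
labels = resp.argmax(axis=1)  # hard assignment to the most probable component
accuracy = (labels == true_labels).mean()
print("weights:", np.round(pi, 3), "means:", np.round(mu, 3))
```

Each specimen receives a posterior probability of belonging to each component, and the hard assignment simply picks the larger one. A multivariate version would replace the scalar variances with covariance matrices.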
The variance of the maximum penalized likelihood estimate obtained through the EM algorithm has not been explored in detail. We provide a simple and intuitive new representation for the variance that can be computed from the EM algorithm directly. For pedagogical purposes, we illustrate the new formula with two examples where analytical solutions are possible.
ISBN (digital): 9783319066929
ISBN (print): 9783319066929; 9783319066912
I review a class of models for longitudinal data, showing how it may be applied in a meaningful way to the analysis of data collected by administering a series of items intended for educational or psychological measurement. In this class of models, the unobserved individual characteristics of interest are represented by a sequence of discrete latent variables, which follows a Markov chain. Inferential problems involved in the application of these models are discussed, considering in particular maximum likelihood estimation based on the expectation-maximization algorithm, model selection, and hypothesis testing. Most of these problems are common to hidden Markov models for time-series data. The approach is illustrated by different applications in education and psychology.
ISBN (print): 9781479971299
The idea of using a data-driven phoneme confusion matrix (PCM) to enhance speech recognition and retrieval performance is not new to the speech community. Although empirical results show various degrees of improvement from introducing a PCM, the underlying data-driven processes described in most papers are rather ad hoc and lack rigorous statistical justification. In this paper we focus on the statistical aspects of PCM generation, and we propose and justify a novel expectation-maximization based algorithm for data-driven PCM generation. We evaluate the performance of the generated PCMs in the context of low-resource spoken term detection, with a primary focus on out-of-vocabulary keywords.