We describe applications of computational algebra to statistical problems of parameter identifiability, sufficiency, and estimation. The methods work for a family of statistical models that includes Poisson and binomial examples in network tomography.
We introduce a new clustering method aimed at fitting clusters with different scatters and weights. It is designed to handle a proportion α of contaminating data, which guarantees the robustness of the method. As a characteristic feature, restrictions are introduced on the ratio between the maximum and the minimum eigenvalues of the groups' scatter matrices. This makes the problem well defined and guarantees the consistency of the sample solutions to the population ones. The method covers a wide range of clustering approaches depending on the strength of the chosen restrictions. Our proposal includes an algorithm for approximately solving the sample problem.
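The eigenvalue-ratio restriction in the abstract above can be illustrated with a minimal sketch. It assumes two-dimensional data, that each group's scatter matrix is symmetric and given as a triple (a, b, d) standing for [[a, b], [b, d]], and that the ratio is taken over the eigenvalues of all groups pooled together. The function names and the bound `c` are hypothetical and do not come from the paper.

```python
import math


def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius


def eigenvalue_ratio(scatters):
    """Largest eigenvalue over smallest eigenvalue, pooled across all groups."""
    eigs = [lam for (a, b, d) in scatters for lam in eigenvalues_sym_2x2(a, b, d)]
    return max(eigs) / min(eigs)


def satisfies_restriction(scatters, c):
    """True if the pooled eigenvalue ratio does not exceed the bound c (assumed c >= 1)."""
    return eigenvalue_ratio(scatters) <= c


# Two groups: the identity scatter and a more elongated diagonal scatter.
groups = [(1.0, 0.0, 1.0), (4.0, 0.0, 2.0)]
print(eigenvalue_ratio(groups))
```

Under these assumptions, a bound close to 1 forces nearly spherical clusters of similar size, while a large bound leaves the scatters almost unrestricted, which is one way to read the abstract's remark about the strength of the chosen restrictions.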
Consider a lattice of locations in one dimension at which data are observed. We model the data as a random hierarchical process. The hidden process is assumed to have a (prior) distribution that is derived from a two-state Markov chain. The states correspond to the mean values (high and low) of the observed data. Conditional on the states, the observations are modelled, for example, as independent Gaussian random variables with identical variances. In this model, there are four free parameters: the Gaussian variance, the high and low mean values, and the transition probability in the Markov chain. A parametric empirical Bayes approach requires estimation of these four parameters from the marginal (unconditional) distribution of the data, and we use the EM algorithm to do this. From the posterior of the hidden process, we use simulated annealing to find the maximum a posteriori (MAP) estimate. Using a Gibbs sampler, we also obtain the maximum marginal posterior probability (MMPP) estimate of the hidden process. We use these methods to determine where change-points occur in spatial transects through grassland vegetation, a problem of considerable interest to plant ecologists.
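The four-parameter model in the abstract above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the chain is taken to be symmetric, with a single switching probability `p_switch`, and to start from the uniform distribution over the two states. The marginal log-likelihood, which is the quantity an EM fit would increase, is computed with the standard scaled forward recursion. All names are hypothetical.

```python
import math
import random


def simulate(n, mu_low, mu_high, sigma2, p_switch, seed=0):
    """Draw hidden states (0 = low, 1 = high) and Gaussian observations."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    means = (mu_low, mu_high)
    state = rng.randrange(2)  # assumed uniform initial state
    states, obs = [], []
    for _ in range(n):
        states.append(state)
        obs.append(rng.gauss(means[state], sd))
        if rng.random() < p_switch:  # assumed symmetric switching
            state = 1 - state
    return states, obs


def log_likelihood(obs, mu_low, mu_high, sigma2, p_switch):
    """Marginal log-likelihood of the observations via the scaled forward recursion."""
    means = (mu_low, mu_high)

    def dens(y, m):
        return math.exp(-(y - m) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

    alpha = [0.5 * dens(obs[0], m) for m in means]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            prev = alpha
            alpha = [
                (prev[s] * (1 - p_switch) + prev[1 - s] * p_switch) * dens(obs[t], means[s])
                for s in range(2)
            ]
        norm = alpha[0] + alpha[1]
        total += math.log(norm)  # accumulate the log of each scaling constant
        alpha = [a / norm for a in alpha]
    return total


states, obs = simulate(200, -2.0, 2.0, 1.0, 0.05, seed=1)
print(log_likelihood(obs, -2.0, 2.0, 1.0, 0.05))
```

The sketch covers only the likelihood evaluation. The simulated annealing and Gibbs sampling steps mentioned in the abstract are not shown.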
Advances in DNA sequencing technologies have enabled genotyping of complex genetic regions exhibiting copy number variation and high allelic diversity, yet it is impossible to derive exact genotypes in all cases, often resulting in ambiguous genotype calls, that is, partially missing data. An example of such a gene region is the killer-cell immunoglobulin-like receptor (KIR) genes. These genes are of special interest in the context of allogeneic hematopoietic stem cell transplantation. For such complex gene regions, current haplotype reconstruction methods are not feasible as they cannot cope with the complexity of the data. We present an expectation-maximization (EM) algorithm to estimate haplotype frequencies (HTFs) that deals with the missing data components and takes into account linkage disequilibrium (LD) between genes. To cope with the exponential increase in the number of haplotypes as genes are added, we add three components to a standard EM algorithm implementation. First, reconstruction is performed iteratively, adding one gene at a time. Second, after each step, haplotypes with frequencies below a threshold are collapsed into a rare haplotype group. Third, the HTF of the rare haplotype group is profiled in subsequent iterations to improve estimates. A simulation study evaluates the effect of combining information from multiple genes on the estimates of these frequencies. We show that estimated HTFs are approximately unbiased. Our simulation study shows that the EM algorithm is able to combine information from multiple genes when LD is high, whereas increased ambiguity levels increase bias. Linear regression models based on this EM algorithm show that a large number of haplotypes can be problematic for unbiased effect size estimation and that models need to be sparse. In a real data analysis of KIR genotypes, we compare HTFs to those obtained in an independent study. Our new EM-algorithm-based method is the first to account for the full genetic architecture of compl
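For orientation, the following is a minimal sketch of the generic textbook EM algorithm for haplotype frequencies with ambiguous genotypes, that is, the kind of standard implementation the abstract above says it extends. It does not include the paper's three additions (gene-by-gene reconstruction, collapsing of rare haplotypes, profiling). It assumes Hardy-Weinberg proportions and that each individual is represented by the list of unordered haplotype pairs compatible with the observed genotype. The data layout and function name are hypothetical.

```python
def em_haplotype_frequencies(genotypes, n_iter=100):
    """Estimate haplotype frequencies from lists of compatible haplotype pairs."""
    haplotypes = sorted({h for pairs in genotypes for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haplotypes}
        for pairs in genotypes:
            # E-step: posterior weight of each compatible pair under Hardy-Weinberg.
            weights = [
                freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs
            ]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq


# Five unambiguous homozygotes and one double heterozygote with two possible phasings.
data = [[("AB", "AB")]] * 5 + [[("AB", "ab"), ("Ab", "aB")]]
print(em_haplotype_frequencies(data))
```

In this toy data set the common haplotype "AB" pulls the ambiguous individual towards the ("AB", "ab") phasing, which mirrors the abstract's point that information can be borrowed across loci when LD is high.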
The analysis of data generated by animal habitat selection studies, by family studies of genetic diseases, or by longitudinal follow-up of households often involves fitting a mixed conditional logistic regression model to longitudinal data composed of clusters of matched case-control strata. The estimation of model parameters by maximum likelihood is especially difficult when the number of cases per stratum is greater than one. In this case, the denominator of each cluster contribution to the conditional likelihood involves a complex high-dimensional integral, which leads to convergence problems in the numerical maximization. In this article we show how these computational complexities can be bypassed using a global two-step analysis for nonlinear mixed effects models. The first step estimates the cluster-specific parameters and can be achieved with standard statistical methods and software based on maximum likelihood for independent data. The second step uses the EM algorithm in conjunction with conditional restricted maximum likelihood to estimate the population parameters. We use simulations to demonstrate that the method works well when the analysis is based on a large number of strata per cluster, as in many ecological studies. We apply the proposed two-step approach to evaluate habitat selection by pairs of bison roaming freely in their natural environment. This article has supplementary material online.
The finite mixture model is an example of a non-regular parametric family, and most classical asymptotic results cannot be directly applied. In particular, the asymptotic properties of likelihood ratio statistics for testing for the number of subpopulations are complicated and difficult to establish. One approach that has been found to simplify the asymptotic results while preserving the power of the test is to modify the likelihood function by incorporating a penalty term to avoid boundary problems. The asymptotic properties and the use of likelihood ratio results are even more difficult when an unknown structural parameter is involved in the model. In this paper, we study an application of the modified likelihood approach to finite normal mixture models with a common and unknown variance in the mixing components and consider a test of the hypothesis of a homogeneous model versus a mixture of two or more components. We show that the $\chi^2_2$ distribution is a stochastic lower bound to the limiting distribution of the likelihood ratio statistic. This same distribution is also shown to be a stochastic upper bound to the limiting distribution of the modified likelihood ratio statistic. A small simulation study suggests that both bounds are relatively tight and practically useful. An example from genetics is used to illustrate the technique. © 2004 Elsevier B.V. All rights reserved.
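As a reading aid for the two bounds in the abstract above, recall that the chi-square distribution with two degrees of freedom has a closed-form tail probability. The symbols $R$ and $M$ below are introduced here only for illustration and denote the limiting distributions of the ordinary and the modified likelihood ratio statistics.

```latex
\Pr\left(\chi^2_2 > x\right) = e^{-x/2}, \qquad x \ge 0,
\qquad\text{so that}\qquad
\Pr(M > x) \;\le\; e^{-x/2} \;\le\; \Pr(R > x).
```

If the stated bounds hold, then $e^{-x/2}$ evaluated at an observed modified statistic $x$ is a conservative asymptotic p-value, whereas the same quantity evaluated at the ordinary statistic is only a lower bound on its asymptotic p-value.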
We present a novel frailty model for clustered survival data. In particular, we consider the Birnbaum-Saunders (BS) distribution for the frailty terms, with a new parameterization expressed directly in terms of the variance of the frailty distribution. Among other things, this allows the estimated frailty terms to be compared with those of traditional models, such as the gamma frailty model. Some mathematical properties of the new model are studied, including the conditional distribution of frailties among the survivors, the frailty of individuals dying at time t, and Kendall's $\tau$ measure. Furthermore, an explicit form for the derivatives of the Laplace transform of the BS distribution is found using Faà di Bruno's formula. Parametric, non-parametric and semiparametric versions of the BS frailty model are studied. We use a simple Expectation-Maximization (EM) algorithm to estimate the model parameters and evaluate its performance under different censoring proportions by a Monte Carlo simulation study. We also show that the BS frailty model is competitive with the gamma and weighted Lindley frailty models under misspecification. We illustrate our methodology using real data.
The existence of items not susceptible to the event of interest is of both theoretical and practical importance. Although researchers may provide, for example, biological, medical, or sociological evidence for the presence of such (cured) items, statistical models that perform well whether or not a cured proportion exists frequently offer a necessary flexibility. This work introduces a new reparameterization of a flexible family of cure models, which includes among its special cases not only the most studied cure models (such as the mixture, bounded cumulative hazard, and negative binomial cure models) but also classical survival models (i.e., without cured items). One of the main properties of the proposed family, apart from its computationally tractable closed form, is that the case of zero cured proportion is not found at the boundary of the parameter space, as typically happens in other families. A simulation study examines the finite-sample performance of the suggested methodology, focusing on estimation via the EM algorithm and on model discrimination by means of the likelihood ratio test and the Akaike information criterion; for illustrative purposes, an analysis of two real-life datasets (on recidivism and cutaneous melanoma) is also carried out.
Conventional multiple testing procedures often assume hypotheses for different features are exchangeable. However, in many scientific applications, additional covariate information regarding the patterns of signals and nulls is available. In this article, we introduce an FDR control procedure for large-scale inference problems that can incorporate covariate information. We develop a fast algorithm to implement the proposed procedure and prove its asymptotic validity even when the underlying likelihood ratio model is misspecified and the p-values are weakly dependent (e.g., strong mixing). Extensive simulations are conducted to study the finite sample performance of the proposed method, and we demonstrate that the new approach improves over the state-of-the-art approaches by being flexible, robust, powerful, and computationally efficient. Finally, we apply the method to several omics datasets arising from genomics studies with the aim of identifying omics features associated with some clinical and biological phenotypes. We show that the method is overall the most powerful among competing methods, especially when the signal is sparse. The proposed covariate adaptive multiple testing procedure is implemented in the R package CAMT. Supplementary materials for this article are available online.
We develop several methods for estimating the treatment effect difference defined as the overall log-odds ratio of favourable response in a multicentre clinical trial comparing two treatments with binary response. A simulation study compares the bias and mean squared error of the point estimates and the exact coverage probabilities of confidence intervals obtained distributions.