A model is developed for chronic diseases with an indolent phase that is followed by a phase with more active disease resulting in progression and damage. The time scales for the intensity functions for the active phase are more naturally based on the time since the start of the active phase, corresponding to a semi-Markov formulation. This two-phase model enables one to fit a separate regression model for the duration of the indolent phase and intensity-based models for the more active second phase. In cohort studies for which the disease status is only known at a series of clinical assessment times, transition times are interval-censored, which means the time origin for phase II is interval-censored. Weakly parametric models with piecewise constant baseline hazard and rate functions are specified, and an expectation-maximization algorithm is described for model fitting. Simulation studies examining the performance of the proposed model show good performance under maximum likelihood and two-stage estimation. An application to data from the motivating study of disease progression in psoriatic arthritis illustrates the procedure and identifies new human leukocyte antigens associated with the duration of the indolent phase. Copyright (C) 2017 John Wiley & Sons, Ltd.
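As a rough illustration of the "piecewise constant baseline hazard" mentioned above, the sketch below computes the cumulative hazard and survival function for a hazard that is constant between cut points. It is not the authors' code: the function names, the cut points, and the rates are hypothetical, and the sketch ignores covariates, interval censoring, and the semi-Markov structure.

```python
import math

def cumulative_hazard(t, cuts, rates):
    """Cumulative hazard at time t for a piecewise constant hazard.

    cuts  : increasing interior cut points, e.g. [1.0, 3.0] (hypothetical)
    rates : hazard value on each interval; needs len(cuts) + 1 entries
    """
    if len(rates) != len(cuts) + 1:
        raise ValueError("need one rate per interval")
    total = 0.0
    lower = 0.0
    # The last interval is open-ended, so its upper limit is infinity.
    for upper, rate in zip(list(cuts) + [math.inf], rates):
        if t <= lower:
            break
        # Add hazard * time spent in this interval up to t.
        total += rate * (min(t, upper) - lower)
        lower = upper
    return total

def survival(t, cuts, rates):
    """Survival probability S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(t, cuts, rates))
```

For example, with cut points `[1.0, 3.0]` and rates `[0.5, 1.0, 2.0]`, the cumulative hazard at `t = 2.0` is `0.5 * 1 + 1.0 * 1 = 1.5`.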
With the increasing availability of large prospective disease registries, scientists studying the course of chronic conditions often have access to multiple data sources, with each source generated based on its own entry conditions. The different entry conditions of the various registries may be explicitly based on the response process of interest, in which case the statistical analysis must recognize the unique truncation schemes. Moreover, intermittent assessment of individuals in the registries can lead to interval-censored times of interest. We consider the problem of selecting important prognostic biomarkers from a large set of candidates when the event times of interest are truncated and right- or interval-censored. Methods for penalized regression are adapted to handle truncation via a Turnbull-type complete data likelihood. An expectation-maximization algorithm is described which is empirically shown to perform well. Inverse probability weights are used to adjust for the selection bias when assessing predictive accuracy based on individuals whose event status is known at a time of interest. Application to the motivating study of the development of psoriatic arthritis in patients with psoriasis in both the psoriasis cohort and the psoriatic arthritis cohort illustrates the procedure.
Cure rate models, or long-term survival models, play an important role in survival analysis and some other applied fields. In this article, by assuming a Conway-Maxwell-Poisson distribution under a competing cause scenario, we study a flexible cure rate model in which the lifetimes of non-cured individuals are described by a Cox proportional hazards model with a Weibull hazard as the baseline function. Inference is then developed for right-censored data by the maximum likelihood method, using an expectation-maximization algorithm and a profile likelihood approach for the estimation of the dispersion parameter of the Conway-Maxwell-Poisson distribution. An extensive simulation study is performed under different scenarios, including various censoring proportions, sample sizes, and lifetime parameters, in order to evaluate the performance of the proposed inferential method. Discrimination among some common cure rate models is then carried out using likelihood-based and information-based criteria. Finally, for illustrative purposes, the proposed model and associated inferential procedure are applied to a cutaneous melanoma dataset.
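To make the competing-cause idea concrete, the sketch below assumes the usual formulation in which the number of competing causes M follows a Conway-Maxwell-Poisson distribution with rate `eta` and dispersion `phi`, and an individual is cured when M = 0, so the cure rate is 1/Z(eta, phi). The abstract does not spell this out, so treat the setup, the function names, and the truncation rule for the series as assumptions.

```python
import math

def com_poisson_normalizer(eta, phi, tol=1e-12, max_terms=10_000):
    """Approximate Z(eta, phi) = sum_{j>=0} eta**j / (j!)**phi by a truncated series."""
    if eta <= 0 or phi < 0:
        raise ValueError("require eta > 0 and phi >= 0")
    if phi == 0 and eta >= 1:
        raise ValueError("series diverges for phi = 0 and eta >= 1")
    total = 0.0
    log_term = 0.0  # log of the j = 0 term, which equals 1
    for j in range(max_terms):
        term = math.exp(log_term)
        total += term
        # Stop once the current term is negligible relative to the running sum.
        if j > 0 and term < tol * total:
            break
        # Move from term j to term j + 1 on the log scale.
        log_term += math.log(eta) - phi * math.log(j + 1)
    return total

def cure_rate(eta, phi):
    """P(M = 0) under the assumed COM-Poisson model for the number of causes."""
    return 1.0 / com_poisson_normalizer(eta, phi)
```

With `phi = 1` the distribution reduces to the Poisson, so `cure_rate(eta, 1.0)` should agree with `exp(-eta)`; with `phi = 0` and `eta < 1` it reduces to the geometric case with cure rate `1 - eta`.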
HLA haplotype frequencies are of use in a variety of settings. Such data are typically derived either from family pedigree data by targeted typing or from statistical analysis of large population-specific genotype samples. As established tools for the latter approach lacked the ability to handle the amount, ambiguity, and inhomogeneity found in genotype data in hematopoietic stem cell donor registries, we developed Hapl-o-Mat to alleviate these specific shortcomings.
ISBN: 9789897582424 (print)
A novel algorithm is presented for macro model estimation of the dynamics of traffic flow in a junction having multiple input lanes for each turning direction. The proposed algorithm jointly estimates the states describing the traffic flow under different traffic conditions, together with the model parameters and the uncertainties of the measurement and process noise. The expectation-maximization methodology is used with a sliding window over time in order to obtain quasi real-time estimation.
In this paper, estimation of the unknown parameters of an inverted exponentiated Pareto distribution is considered under progressive Type-II censoring. Maximum likelihood estimates are obtained from the expectation-maximization algorithm. We also compute the observed Fisher information matrix. In the sequel, asymptotic and bootstrap-p intervals are constructed. Bayes estimates are derived using the importance sampling procedure with respect to symmetric and asymmetric loss functions. Highest posterior density intervals of the unknown parameters are constructed as well. The problem of one- and two-sample prediction is discussed in a Bayesian framework. Optimal plans are obtained with respect to two information measure criteria. We assess the behavior of the suggested estimation and prediction methods using a simulation study. A real dataset is also analyzed for illustration purposes. Finally, we present some concluding remarks.
Background: Haplotypes are important in anti-malarial drug resistance because genes encoding drug resistance may accumulate mutations at several codons in the same gene, each mutation increasing the level of drug resistance and, possibly, reducing the metabolic costs of previous mutations. Patients often have two or more haplotypes in their blood sample, which may make it impossible to identify exactly which haplotypes they carry, and hence to measure the type and frequency of resistant haplotypes in the malaria population. Results: This study presents two novel statistical methods, an expectation-maximization (EM) algorithm and a Markov chain Monte Carlo (MCMC) algorithm, to investigate this issue. The performance of the algorithms is evaluated on simulated datasets consisting of patient blood samples characterized by their multiplicity of infection (MOI) and malaria genotype. The datasets are generated using different resistance allele frequencies (RAF) at each single nucleotide polymorphism (SNP) and different limits of detection (LoD) of the SNPs and the MOI. The EM and the MCMC algorithms are validated and appear more accurate, faster, and slightly less affected by the LoD of the SNPs and the MOI than previous related statistical approaches. Conclusions: The EM and the MCMC algorithms perform well when analysing malaria genetic data obtained from infected human blood samples. The results are robust to genotyping errors caused by LoDs, and the algorithms function well even in the absence of MOI data on individual patients.
Background: Knowledge of HLA haplotypes is helpful in many settings, such as disease association studies, population genetics, and hematopoietic stem cell transplantation. Regarding the recruitment of unrelated hematopoietic stem cell donors, HLA haplotype frequencies of specific populations are used to optimize both donor searches for individual patients and strategic donor registry planning. However, the estimation of haplotype frequencies from HLA genotyping data is challenged by the large amount of genotype data, the complex HLA nomenclature, and the heterogeneous and ambiguous nature of typing records. Results: To meet these challenges, we have developed the open-source software Hapl-o-Mat. It estimates haplotype frequencies from population data including an arbitrary number of loci using an expectation-maximization algorithm. Its key features are the processing of different HLA typing resolutions within a given population sample and the handling of ambiguities recorded via multiple allele codes or genotype list strings. Implemented in C++, Hapl-o-Mat facilitates efficient haplotype frequency estimation from large amounts of genotype data. We demonstrate its accuracy and performance on the basis of artificial and real genotype data. Conclusions: Hapl-o-Mat is a versatile and efficient software for HLA haplotype frequency estimation. Its capability of processing various forms of HLA genotype data allows for a straightforward haplotype frequency estimation from typing records usually found in stem cell donor registries.
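The general idea of estimating haplotype frequencies from unphased genotypes with an expectation-maximization algorithm can be sketched as follows. This is a generic textbook EM written for illustration, not Hapl-o-Mat's implementation or interface: the function names and the toy allele labels are hypothetical, and the sketch does not handle mixed typing resolutions, multiple allele codes, or genotype list strings.

```python
from collections import defaultdict
from itertools import product

def compatible_diplotypes(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of per-locus allele pairs, e.g. (("A1", "A2"), ("B1", "B2")).
    """
    pairs = set()
    # For each locus, choose which of its two alleles sits on the first haplotype.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = tuple(locus[c] for locus, c in zip(genotype, choice))
        h2 = tuple(locus[1 - c] for locus, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies, assuming Hardy-Weinberg proportions."""
    diplos = [compatible_diplotypes(g) for g in genotypes]
    haps = sorted({h for d in diplos for pair in d for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = defaultdict(float)
        for d in diplos:
            # E-step: posterior weight of each diplotype given current frequencies.
            weights = [freq[a] * freq[b] * (1.0 if a == b else 2.0) for a, b in d]
            total = sum(weights)
            for (a, b), w in zip(d, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq
```

A small usage example: three individuals homozygous for the haplotype `("A1", "B1")` plus one double heterozygote `(("A1", "A2"), ("B1", "B2"))` leave the phase of the fourth individual ambiguous, and the EM iterations shift nearly all of that individual's weight onto the `("A1", "B1")` / `("A2", "B2")` resolution.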
Various solutions to the parameter estimation problem of a recently introduced multivariate Pareto distribution are developed and exemplified numerically. Namely, a density of the aforementioned multivariate Pareto distribution with respect to a dominating measure, rather than the corresponding Lebesgue measure, is specified and then employed to investigate the maximum likelihood estimation (MLE) approach. Also, in an attempt to fully enjoy the common shock origins of the multivariate model of interest, an adapted variant of the expectation-maximization (EM) algorithm is formulated and studied. The method of moments is discussed as a convenient way to obtain starting values for the numerical optimization procedures associated with the MLE and EM methods.
Multilevel models are often used for the analysis of grouped data. Grouped data occur for instance when estimating the performance of pupils nested within schools or analyzing multiple observations nested within individuals. Currently, multilevel models are mostly fit to static datasets. However, recent technological advances in the measurement of social phenomena have led to data arriving in a continuous fashion (i.e., data streams). In these situations the data collection is never "finished". Traditional methods of fitting multilevel models are ill-suited for the analysis of data streams because of their computational complexity. A novel algorithm for estimating random-intercept models is introduced. The Streaming EM Approximation (SEMA) algorithm is a fully-online (row-by-row) method enabling computationally-efficient estimation of random-intercept models. SEMA is tested in two simulation studies, and applied to longitudinal data regarding individuals' happiness collected continuously using smart phones. SEMA shows competitive statistical performance to existing static approaches, but with large computational benefits. The introduction of this method allows researchers to broaden the scope of their research, by using data streams. (C) 2016 Elsevier B.V. All rights reserved.