In many medical applications, combining information from multiple biomarkers could yield a better diagnosis than any single one on its own. When a gold standard is lacking, an algorithm for classifying subjects into case and non-case status is necessary for combining multiple markers. The aim of this paper is to develop a method to construct a composite test from multiple applicable tests and to derive an optimal classification rule in the absence of a gold standard. Rather than combining the tests, we treat the tests as a sequence. This sequential composite test is based on a mixture of two multivariate normal latent models for the distribution of the test results in the case and non-case groups, and the optimal classification rule is derived to attain the greatest sensitivity at a given specificity. The method is applied to a real-data example, and simulation studies have been carried out to assess the statistical properties and predictive accuracy of the proposed composite test. The method can also be implemented nonparametrically. Copyright (c) 2015 John Wiley & Sons, Ltd.
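As a rough illustration of this kind of rule (not the authors' sequential procedure), the sketch below scores subjects by the log likelihood ratio of two multivariate normal components and chooses the threshold that fixes a target specificity. The parameter values and names such as `mu_case` and `mu_ctrl` are invented placeholders standing in for estimates from a latent-class fit.

```python
# Minimal sketch: classify subjects by the log likelihood ratio of two
# fitted multivariate normal components, thresholded to fix a specificity.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Illustrative component parameters (in practice these would come from a
# fit of the two-component latent model, e.g., via an EM algorithm).
mu_case, mu_ctrl = np.array([1.0, 1.5]), np.array([0.0, 0.0])
cov = np.array([[1.0, 0.3], [0.3, 1.0]])

x_case = rng.multivariate_normal(mu_case, cov, size=200)
x_ctrl = rng.multivariate_normal(mu_ctrl, cov, size=200)

def llr(x):
    """Log likelihood ratio: case component vs. non-case component."""
    return (multivariate_normal.logpdf(x, mu_case, cov)
            - multivariate_normal.logpdf(x, mu_ctrl, cov))

target_spec = 0.90
# Threshold = upper quantile of the non-case scores, so the rule attains the
# target specificity; sensitivity is then read off the case group.
thr = np.quantile(llr(x_ctrl), target_spec)
sensitivity = np.mean(llr(x_case) > thr)
print(f"threshold={thr:.3f}, sensitivity at {target_spec:.0%} specificity: {sensitivity:.3f}")
```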
This paper discusses models for circular responses with a spike at zero. Maximum likelihood estimation of the underlying parameters and a test for the presence of a spike are also carried out. Simulations and a real data example are provided for illustration. (C) 2015 Elsevier B.V. All rights reserved.
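For intuition, here is a minimal sketch of one possible spike-at-zero circular model, assuming the continuous part is von Mises: the spike probability is estimated by the proportion of exact zeros, and the von Mises parameters by the usual maximum likelihood equations. This is an illustrative stand-in, not the specific models or test developed in the paper.

```python
# Minimal sketch (not the paper's model): with probability p the response is
# exactly 0, otherwise it follows a von Mises(mu, kappa) distribution.
import numpy as np
from scipy.special import i0, i1
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n, p_true = 500, 0.25
spike = rng.random(n) < p_true
theta = np.where(spike, 0.0, rng.vonmises(mu=0.8, kappa=2.0, size=n))

nonzero = theta[theta != 0.0]          # continuous part
p_hat = np.mean(theta == 0.0)          # MLE of the spike probability

# MLE of mu: circular mean; MLE of kappa: solve I1(k)/I0(k) = R,
# where R is the mean resultant length of the non-zero angles.
C, S = np.mean(np.cos(nonzero)), np.mean(np.sin(nonzero))
mu_hat = np.arctan2(S, C)
R = np.hypot(C, S)
kappa_hat = brentq(lambda k: i1(k) / i0(k) - R, 1e-6, 500.0)
print(f"p_hat={p_hat:.3f}, mu_hat={mu_hat:.3f}, kappa_hat={kappa_hat:.3f}")
```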
The Erlang mixture model has been widely used in modeling insurance losses due to its desirable distributional properties. In this paper, we consider the problem of efficient estimation of the Erlang mixture model. We present a new thresholding penalty function and a corresponding EM algorithm to estimate model parameters and to determine the order of the mixture. Using simulation studies and a real data application, we demonstrate the efficiency of the EM algorithm.
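For orientation, the sketch below is a plain, unpenalized EM for an Erlang mixture with a common scale and shapes 1 through M; the thresholding penalty and order determination proposed in the paper are not reproduced.

```python
# Minimal sketch of a plain (unpenalized) EM for an Erlang mixture with common
# scale theta and shapes 1..M; the paper's thresholding penalty is omitted.
import numpy as np
from scipy.stats import gamma

def erlang_mixture_em(x, M=10, n_iter=200, theta0=None):
    shapes = np.arange(1, M + 1)
    w = np.full(M, 1.0 / M)
    theta = theta0 if theta0 is not None else x.mean() / (M / 2)
    for _ in range(n_iter):
        # E-step: responsibilities of each Erlang (gamma with integer shape).
        logd = gamma.logpdf(x[:, None], a=shapes[None, :], scale=theta) + np.log(w)
        logd -= logd.max(axis=1, keepdims=True)
        r = np.exp(logd)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for the weights and the common scale.
        w = r.mean(axis=0)
        theta = x.sum() / (r * shapes).sum()
    return w, theta

rng = np.random.default_rng(2)
x = np.concatenate([rng.gamma(shape=2, scale=1.5, size=300),
                    rng.gamma(shape=7, scale=1.5, size=300)])
w_hat, theta_hat = erlang_mixture_em(x, M=10)
print("theta_hat:", round(theta_hat, 3), "heaviest shapes:", np.argsort(w_hat)[-2:] + 1)
```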
Joint models are statistical tools for estimating the association between time-to-event and longitudinal outcomes. One challenge to the application of joint models is their computational complexity. Common estimation methods for joint models include two-stage, Bayesian, and maximum-likelihood methods. In this work, we consider joint models of a time-to-event outcome and multiple longitudinal processes and develop a maximum-likelihood estimation method based on the expectation-maximization (EM) algorithm. We assess the performance of the proposed method via simulations and apply the methodology to a data set to determine the association between longitudinal systolic and diastolic blood pressure measures and time to coronary artery disease.
In this paper, we investigate frailty models for clustered survival data that are subject to both left- and right-censoring, termed "doubly-censored data". This model extends the current survival literature by broadening the application of frailty models from right-censoring to the more complicated situation with additional left-censoring. Our approach is motivated by a recent Hepatitis B study in which the sample consists of families. We adopt a likelihood approach that targets the nonparametric maximum likelihood estimator (NPMLE). A new algorithm is proposed, which not only works well for clustered data but also improves over existing algorithms for independent, doubly-censored data, the special case in which the frailty variable is constant and equal to one. This special case is well known to be a computational challenge due to the left-censoring feature of the data. The new algorithm not only resolves this challenge but also accommodates the additional frailty variable effectively. Asymptotic properties of the NPMLE are established, along with semiparametric efficiency of the NPMLE for the finite-dimensional parameters. The consistency of bootstrap estimators for the standard errors of the NPMLE is also discussed. Simulations illustrate the numerical performance and robustness of the proposed algorithm, which is also applied to the Hepatitis B data.
ISBN (print): 9781509041183
Crowdsourcing approaches rely on contributions from multiple individuals to solve problems that require the analysis of large data sets in a timely, accurate manner. The inexperience of participants, or annotators, motivates the need for robust techniques. Focusing on clustering setups, the data provided by all annotators are modeled here as a mixture of Gaussian components plus a uniformly distributed random variable that captures outliers. The proposed algorithm is based on the expectation-maximization algorithm and jointly allows for soft assignments of data to clusters, rates annotators according to their performance, and estimates the number of Gaussian components in the non-Gaussian/Gaussian mixture model.
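The sketch below illustrates the core of such a model: EM for a Gaussian mixture augmented with a uniform component that absorbs outliers. It omits the per-annotator reliability rating and the estimation of the number of components described in the abstract, and all data are simulated.

```python
# Minimal sketch of EM for a Gaussian mixture with one extra uniform component
# to absorb outliers; annotator weighting and model-order selection are omitted.
import numpy as np
from scipy.stats import multivariate_normal

def robust_gmm_em(X, K=3, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    unif_dens = 1.0 / np.prod(hi - lo)                 # uniform outlier density
    mus = X[rng.choice(n, K, replace=False)]
    covs = np.array([np.cov(X.T) for _ in range(K)])
    w = np.full(K + 1, 1.0 / (K + 1))                  # last weight = outlier component
    for _ in range(n_iter):
        # E-step: soft assignments to the K Gaussians and the uniform component.
        dens = np.column_stack(
            [multivariate_normal.pdf(X, mus[k], covs[k]) for k in range(K)]
            + [np.full(n, unif_dens)]) * w
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates for the Gaussian parameters and all weights.
        w = r.mean(axis=0)
        for k in range(K):
            rk = r[:, k]
            mus[k] = rk @ X / rk.sum()
            diff = X - mus[k]
            covs[k] = (rk[:, None] * diff).T @ diff / rk.sum() + 1e-6 * np.eye(d)
    return w, mus, covs, r

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0, 0], 0.5, (150, 2)),
               rng.normal([4, 4], 0.5, (150, 2)),
               rng.uniform(-10, 10, (30, 2))])         # outliers
w, mus, covs, resp = robust_gmm_em(X, K=2)
print("weights (last = outlier):", np.round(w, 3))
```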
Interval censoring arises frequently in clinical, epidemiological, financial and sociological studies, where the event or failure of interest is known only to occur within an interval induced by periodic monitoring. We formulate the effects of potentially time-dependent covariates on the interval-censored failure time through a broad class of semiparametric transformation models that encompasses proportional hazards and proportional odds models. We consider nonparametric maximum likelihood estimation for this class of models with an arbitrary number of monitoring times for each subject. We devise an EM-type algorithm that converges stably, even in the presence of time-dependent covariates, and show that the estimators for the regression parameters are consistent, asymptotically normal, and asymptotically efficient with an easily estimated covariance matrix. Finally, we demonstrate the performance of our procedures through simulation studies and application to an HIV/AIDS study conducted in Thailand.
The traditional estimation of mixture regression models is based on the assumption of normality (symmetry) of the component errors and is thus sensitive to outliers, heavy-tailed errors and/or asymmetric errors. In this work we deal with these issues simultaneously in the context of mixture regression by extending the classic normal model, assuming that the random errors follow a scale mixture of skew-normal distributions. This approach allows us to model data with great flexibility, accommodating skewness and heavy tails. The main virtue of considering mixture regression models under the class of scale mixtures of skew-normal distributions is that they admit a convenient hierarchical representation which allows easy implementation of inference. We develop a simple EM-type algorithm to perform maximum likelihood inference for the parameters of the proposed model. To examine the robustness of this flexible model against outlying observations, simulation studies are also presented. Finally, a real data set is analyzed, illustrating the usefulness of the proposed method.
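As a baseline for the model being generalized, here is a short EM for a two-component mixture of linear regressions with normal errors; the scale-mixture-of-skew-normal extension developed in the paper is not implemented.

```python
# Minimal sketch of EM for a two-component mixture of linear regressions with
# normal errors, i.e., the classic model the paper generalizes.
import numpy as np

def mixreg_em(X, y, K=2, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = rng.normal(size=(K, p))
    sigma2 = np.full(K, y.var())
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities from the normal component densities.
        resid = y[:, None] - X @ beta.T
        logd = (-0.5 * resid**2 / sigma2 - 0.5 * np.log(2 * np.pi * sigma2)
                + np.log(w))
        logd -= logd.max(axis=1, keepdims=True)
        r = np.exp(logd)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component.
        w = r.mean(axis=0)
        for k in range(K):
            W = r[:, k]
            beta[k] = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
            e = y - X @ beta[k]
            sigma2[k] = (W * e**2).sum() / W.sum()
    return w, beta, sigma2, r

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, 400)
X = np.column_stack([np.ones_like(x), x])
comp = rng.random(400) < 0.5
y = np.where(comp, 1 + 2 * x, -1 - 1.5 * x) + rng.normal(0, 0.3, 400)
w, beta, sigma2, _ = mixreg_em(X, y)
print("weights:", np.round(w, 2), "\nbetas:\n", np.round(beta, 2))
```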
In some acquired immunodeficiency syndrome (AIDS) clinical trials, human immunodeficiency virus-1 ribonucleic acid measurements are collected irregularly over time and are often subject to upper and lower detection limits, depending on the quantification assay. Linear and nonlinear mixed-effects models with modifications to accommodate censored observations (LMEC/NLMEC) are routinely used to analyze this type of data (Vaida and Liu, J Comput Graph Stat 18:797-817, 2009; Matos et al., Comput Stat Data Anal 57(1):450-464, 2013a). This paper presents a framework for fitting LMEC/NLMEC models with response variables recorded at irregular intervals. To address the serial correlation among the within-subject errors, a damped exponential correlation structure is adopted for the random errors, and an EM-type algorithm is developed for computing the maximum likelihood estimates, yielding as byproducts the standard errors of the fixed effects and the likelihood value. The proposed methods are illustrated with simulations and the analysis of two real AIDS case studies.
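A commonly used parameterization of the damped exponential correlation (DEC) structure is corr(e_ij, e_ik) = phi1^(|t_ij - t_ik|^phi2); the snippet below builds this matrix for a set of irregular visit times. The parameterization is assumed here, and the parameter values are arbitrary.

```python
# Minimal sketch of the damped exponential correlation (DEC) structure for the
# within-subject errors: corr(e_ij, e_ik) = phi1 ** (|t_ij - t_ik| ** phi2).
# With phi2 = 1 this reduces to a continuous-time AR(1) correlation.
import numpy as np

def dec_corr(times, phi1, phi2):
    """Correlation matrix of the DEC structure at irregular measurement times."""
    lags = np.abs(np.subtract.outer(times, times))
    return phi1 ** (lags ** phi2)

times = np.array([0.0, 0.5, 1.2, 3.0])   # irregular visit times for one subject
print(np.round(dec_corr(times, phi1=0.6, phi2=0.8), 3))
```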
With the growing availability of digitized text data both publicly and privately, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than those obtained from the outputs of a supervised segmentation method.
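For intuition only, the toy below performs dictionary-free segmentation by hard (Viterbi) EM over a unigram model whose candidate vocabulary is the set of frequent substrings. It is a drastically simplified stand-in for TopWORDS, and the example strings are invented.

```python
# Toy illustration of unsupervised word discovery via hard (Viterbi) EM over a
# unigram model; this is NOT the TopWORDS algorithm, only the general idea.
import math
from collections import Counter

def candidate_words(texts, max_len=4, min_count=2):
    counts = Counter()
    for t in texts:
        for i in range(len(t)):
            for k in range(1, max_len + 1):
                if i + k <= len(t):
                    counts[t[i:i + k]] += 1
    # keep every single character plus frequent longer substrings
    return {w for w, c in counts.items() if len(w) == 1 or c >= min_count}

def viterbi_segment(text, logp, max_len=4):
    """Best segmentation under a unigram model, by dynamic programming."""
    best = [0.0] + [-math.inf] * len(text)
    back = [0] * (len(text) + 1)
    for j in range(1, len(text) + 1):
        for k in range(1, min(max_len, j) + 1):
            w = text[j - k:j]
            if w in logp and best[j - k] + logp[w] > best[j]:
                best[j], back[j] = best[j - k] + logp[w], j - k
    words, j = [], len(text)
    while j > 0:
        words.append(text[back[j]:j])
        j = back[j]
    return words[::-1]

def hard_em(texts, n_iter=5, max_len=4):
    vocab = candidate_words(texts, max_len)
    logp = {w: -math.log(len(vocab)) for w in vocab}      # uniform start
    for _ in range(n_iter):
        counts = Counter()
        for t in texts:
            counts.update(viterbi_segment(t, logp, max_len))
        total = sum(counts.values())
        # re-estimate word probabilities from segmentation counts (smoothed)
        logp = {w: math.log((counts[w] + 0.01) / (total + 0.01 * len(vocab)))
                for w in vocab}
    return logp

texts = ["thecatsatonthemat", "thecatandthedog", "thedogsatonthecat"]
logp = hard_em(texts)
print(viterbi_segment("thedogandthecat", logp))
```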