We discuss regression analysis of current status data with the additive hazards model when the failure status may be misclassified. Such data occur commonly in many scientific fields that rely on diagnostic tests with imperfect sensitivity and specificity. In particular, we consider the situation where the sensitivity and specificity are known and propose a nonparametric maximum likelihood approach. For the implementation of the method, a novel EM algorithm is developed, and the asymptotic properties of the resulting estimators are established. Furthermore, the estimated regression parameters are shown to be semiparametrically efficient. We demonstrate the empirical performance of the proposed methodology in a simulation study and show its substantial advantages over the naive method. An application to a motivating study on chlamydia is also provided.
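To make the role of known sensitivity and specificity concrete, the following is a minimal sketch of the observed-data log-likelihood for misclassified current status data under an additive hazards model. It is an illustration under stated assumptions, not the authors' nonparametric estimator: the function names, the time-independent covariates, and the user-supplied cumulative baseline hazard `cum_baseline` are all hypothetical choices made for this example.

```python
import numpy as np


def observed_positive_prob(c, z, beta, cum_baseline, sensitivity, specificity):
    """Probability that the test reads positive at observation time c.

    Assumes an additive hazard lambda0(t) + z @ beta with time-independent
    covariates, so the cumulative hazard is Lambda0(c) + (z @ beta) * c.
    """
    c = np.asarray(c, dtype=float)
    z = np.asarray(z, dtype=float)
    cum_hazard = cum_baseline(c) + (z @ beta) * c
    failure_prob = 1.0 - np.exp(-cum_hazard)  # true P(T <= c | z)
    # True positives plus false positives.
    return sensitivity * failure_prob + (1.0 - specificity) * (1.0 - failure_prob)


def log_likelihood(y, c, z, beta, cum_baseline, sensitivity, specificity):
    """Bernoulli log-likelihood of the observed (possibly misclassified) status y."""
    y = np.asarray(y, dtype=float)
    p = observed_positive_prob(c, z, beta, cum_baseline, sensitivity, specificity)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```

With perfect sensitivity and specificity the observed probability reduces to the true failure probability, which is the case the naive method implicitly assumes.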
A content delivery network (CDN) aims to reduce the content delivery latency to end users by using distributed cache ***, deploying and maintaining cache servers on a large scale is very *** solve this problem, CDN providers have developed a new content delivery strategy: allowing end users' IoT edge devices to share their storage/bandwidth *** new edge CDN platform must address two core questions: (1) how can we incentivize end users to share IoT devices? (2) how can we facilitate a safe and transparent content transaction environment for end users? This paper introduces SmartSharing, a new content delivery network solution to address these *** SmartSharing, the over-the-top (OTT) IoT devices belonging to end users are used as mini-cache *** motivate end users to share the idle devices and storage/bandwidth resources, SmartSharing designs the content delivery schedule and the pricing scheme based on game theory and machine learning algorithms (specifically, a tailored expectation-maximization (EM) algorithm). To facilitate content trading among end users, SmartSharing creates a secure and transparent transaction platform based on smart contracts in *** addition, SmartSharing's performance is evaluated through trace-driven simulations in the real world and a prototype using content metadata and the achieved pricing *** evaluation results show that CDN providers, end users and content providers can all benefit from our SmartSharing framework.
The validity of statistical inference for panel count data with time-varying covariates depends on the correct specification of within-subject correlation structures; misspecification often leads to questionable inference. To alleviate this problem, robust inference has been proposed for mean models, which implicitly assume monotone mean functions. When covariate values fluctuate with time, however, the assumed monotonicity becomes unrealistic. In this research, we propose a robust inference procedure based on rate models that are free of such constraints. Since the asymptotic variance has no closed form under the rate model, we further develop computationally efficient robust variance estimators using the expectation-maximization (EM) algorithm, thus sidestepping the need for computationally intensive numerical methods, which could undermine the robustness. Rigorous theoretical development is provided in support of parameter estimation and inference. Extensive simulation studies demonstrate the superiority of the proposed method. We present a real clinical application to illustrate the use of the proposed method.
Compositional data (CoDa) often appear in various fields such as biology, medicine, geology, chemistry, economics, ecology and sociology. Although existing Dirichlet and related models are frequently employed in CoDa analysis, they sometimes perform unsatisfactorily in modelling CoDa, as shown in our first real data example. First, this paper develops a multivariate compositional inverse Gaussian (CIG) model as a new tool for analysing CoDa. By incorporating the stochastic representation (SR), an expectation-maximization (EM) algorithm (aided by a one-step gradient descent algorithm) can be established to estimate the parameters of the proposed distribution (model). Next, zero observations are often encountered in real CoDa analysis. Therefore, the second aim of this paper is to propose a new model (called the ZCIG model), built through a novel mixture SR based on both the CIG random vector and a so-called zero-truncated product Bernoulli random vector, to model CoDa with zeros. Corresponding statistical inference methods are also developed for the cases both without and with covariates. Two real data sets are analysed to illustrate the proposed statistical methods by comparing the proposed CIG and ZCIG models with existing Dirichlet and logistic-normal models.
We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
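For readers unfamiliar with EM in this model, the sketch below shows the standard closed-form EM iteration for a symmetric two-component mixture of linear regressions, y = s * (x @ beta) + noise with a hidden sign s = ±1 chosen with equal probability. It assumes a known noise level `sigma` and a generic design matrix rather than the pairwise comparison design studied in the abstract; the function name `em_symmetric_mlr` and its parameters are hypothetical.

```python
import numpy as np


def em_symmetric_mlr(X, y, beta_init, sigma, n_iter=50):
    """EM iterations for y_i = s_i * (x_i @ beta) + N(0, sigma^2), s_i = +/-1.

    E-step: the posterior mean of the hidden sign s_i is
            tanh(y_i * (x_i @ beta) / sigma^2).
    M-step: least squares of the sign-weighted responses on X.
    """
    beta = np.asarray(beta_init, dtype=float)
    for _ in range(n_iter):
        weights = np.tanh(y * (X @ beta) / sigma**2)  # E-step
        # M-step; lstsq returns the minimum-norm solution if X is rank deficient.
        beta, *_ = np.linalg.lstsq(X, weights * y, rcond=None)
    return beta
```

Because the model is symmetric, beta is identifiable only up to a global sign, so any error measure should compare the estimate with both beta and -beta. The initialization here is assumed to lie near the ground truth, in line with the local analysis described above.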
Misclassified current status data arise if each study subject can only be observed once and the observation status is determined by a diagnostic test with imperfect sensitivity and *** the situation, another issue that may occur is that the observation time may be correlated with the failure time of interest, which is often referred to as informative censoring or observation *** is well-known that in the presence of informative censoring, the analysis that ignores it could yield biased or even misleading *** this paper, the authors consider such data and propose a frailty-based inference *** particular, an EM algorithm based on Poisson latent variables is developed and the asymptotic properties of the resulting estimators are *** numerical results show that the proposed method works well in practice and an application to a set of real data is provided.
The continuously updated database of failures and censored data for numerous products has become large, and information on some covariates associated with the failure times is missing from the database. As the dataset is large and has missing information, the analysis tasks become complicated and the code takes a long time to run. In such situations, the divide and recombine (D&R) approach, which has a practical computational performance for big data analysis, can be applied. In this study, the D&R approach was applied to analyze the real field data of an automobile component with incomplete information on covariates using the Weibull regression model. Model parameters were estimated using the expectation-maximization algorithm. The results of the data analysis and simulation demonstrated that the D&R approach is applicable for analyzing such datasets. Further, the percentiles and reliability functions of the distribution under different covariate conditions were estimated to evaluate component performance under these conditions. The findings of this study have managerial implications regarding design decisions, safety, and reliability of automobile components.
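The general shape of a divide and recombine analysis can be sketched as follows. This is a minimal illustration, assuming random division into roughly equal subsets and recombination by simple averaging of the per-subset estimates; the abstract does not state which division or recombination rule the study used, and `divide_and_recombine` and its `fit` argument are hypothetical names. In the study's setting, `fit` would stand for the EM-based Weibull regression fit on one subset.

```python
import numpy as np


def divide_and_recombine(data, n_subsets, fit, seed=0):
    """Fit a model on random subsets of the data and average the estimates.

    data      : array-like whose first axis indexes observations
    n_subsets : number of subsets to divide the data into
    fit       : callable mapping a subset to a 1-D array of parameter estimates
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    shuffled = rng.permutation(len(data))           # divide: random row order
    subsets = np.array_split(shuffled, n_subsets)   # roughly equal index blocks
    estimates = np.array([fit(data[idx]) for idx in subsets])
    return estimates.mean(axis=0)                   # recombine: average
```

Each subset fit is independent of the others, so the per-subset calls can be run in parallel, which is where the computational benefit for large datasets comes from.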
Interval-censored multistate data arise in many studies of chronic diseases, where the health status of a subject can be characterized by a finite number of disease states and the transition between any two states is only known to occur over a broad time interval. We relate potentially time-dependent covariates to multistate processes through semiparametric proportional intensity models with random effects. We study nonparametric maximum likelihood estimation under general interval censoring and develop a stable expectation-maximization algorithm. We show that the resulting parameter estimators are consistent and that the finite-dimensional components are asymptotically normal with a covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we demonstrate through extensive simulation studies that the proposed numerical and inferential procedures perform well in realistic settings. Finally, we provide an application to a major epidemiologic cohort study.
Osteoporosis is a metabolic bone disorder that is characterized by reduced bone mineral density (BMD) and deterioration of bone microarchitecture. Osteoporosis is highly prevalent among women over 50, leading to skeletal fragility and risk of fracture. Early diagnosis and treatment of those at high risk for fracture is very important in order to avoid morbidity, mortality and economic burden from preventable fractures. The province of Manitoba established a BMD testing program in 1997. The Manitoba BMD registry is now the largest population-based BMD registry in the world, and has detailed information on fracture outcomes and other covariates for over 160,000 BMD assessments. In this paper, we develop a number of methodologies based on ranked-set type sampling designs to estimate the prevalence of osteoporosis among women of age 50 and older in the province of Manitoba. We use a parametric approach based on finite mixture models, as well as the usual approaches using simple random and stratified sampling designs. Results are obtained under perfect and imperfect ranking scenarios while the sampling and ranking costs are incorporated into the study. We observe that rank-based methodologies can be used as cost-efficient methods to monitor the prevalence of osteoporosis.
In psychology and education, tests (e.g., reading tests) and self-reports (e.g., clinical questionnaires) generate counts, but corresponding Item Response Theory (IRT) methods are underdeveloped compared with those for binary data. Recent advances include the Two-Parameter Conway-Maxwell-Poisson model (2PCMPM), which generalizes Rasch's Poisson Counts Model and has item-specific difficulty, discrimination, and dispersion parameters. Explaining differences in model parameters informs item construction and selection but has received little attention. We introduce two 2PCMPM-based explanatory count IRT models: the Distributional Regression Test Model for item covariates, and the Count Latent Regression Model for (categorical) person covariates. Estimation methods are provided and satisfactory statistical properties are observed in simulations. Two examples illustrate how the models help understand tests and underlying constructs.