The emerging field of precision medicine is shifting statistical analysis from the classical paradigm of population-average treatment effects to that of personal treatment effects. This new scientific mission calls for adequate statistical methods to assess heterogeneous covariate effects in regression analysis. This paper focuses on a subgroup analysis that consists of two primary analytic tasks: identification of treatment effect subgroups and individual group memberships, and statistical inference on treatment effects by subgroup. We propose an approach that combines supervised clustering analysis via the alternating direction method of multipliers (ADMM) algorithm with statistical inference on subgroup effects via the expectation-maximization (EM) algorithm. Our proposed procedure, termed hybrid operation for subgroup analysis (HOSA), enjoys computational speed and numerical stability with interpretability and reproducibility. We establish key theoretical properties for both the proposed clustering and inference procedures. Numerical illustration includes extensive simulation studies and analyses of motivating data from two randomized clinical trials to learn subgroup treatment effects.
In this paper, a new multivariate zero-inflated binomial (MZIB) distribution is proposed to analyse correlated proportional data with excessive zeros. The distributional properties of the proposed model are studied. The Fisher scoring algorithm and the EM algorithm are given for computing parameter estimates in the proposed MZIB model with and without covariates. Score tests and likelihood ratio tests are derived for assessing both the zero-inflation and the equality of multiple binomial probabilities in correlated proportional data. A limited simulation study is performed to evaluate the performance of the derived EM algorithms for parameter estimation in the model with and without covariates, and to compare the nominal levels and powers of the score tests and likelihood ratio tests. The whitefly data are used to illustrate the proposed methodologies.
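To make the role of the EM algorithm in zero-inflated models concrete, the following is a minimal sketch for the much simpler univariate zero-inflated binomial without covariates. It is not the MZIB algorithm of the paper: the function name `fit_zib_em`, the starting values, the fixed iteration count and the simulated data are all assumptions made for illustration. The latent indicator is whether an observed zero is a structural zero (probability `phi`) or a binomial zero.

```python
import random


def fit_zib_em(counts, trials, n_iter=200):
    """EM for a univariate zero-inflated binomial (illustrative sketch only).

    counts[i] successes out of trials[i]; returns (phi, p), where phi is the
    structural-zero probability and p the binomial success probability.
    """
    phi, p = 0.5, 0.5  # assumed starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is a structural zero.
        z = []
        for y, m in zip(counts, trials):
            if y == 0:
                binom_zero = (1.0 - phi) * (1.0 - p) ** m
                z.append(phi / (phi + binom_zero))
            else:
                z.append(0.0)  # a positive count cannot be a structural zero
        # M-step: closed-form updates weighted by the E-step probabilities.
        phi = sum(z) / len(z)
        weighted_successes = sum((1.0 - zi) * y for zi, y in zip(z, counts))
        weighted_trials = sum((1.0 - zi) * m for zi, m in zip(z, trials))
        p = weighted_successes / weighted_trials
    return phi, p


# Simulated data with assumed true values phi = 0.3, p = 0.4, m = 10 trials.
rng = random.Random(0)
m = 10
counts = [
    0 if rng.random() < 0.3 else sum(rng.random() < 0.4 for _ in range(m))
    for _ in range(4000)
]
trials = [m] * len(counts)
phi_hat, p_hat = fit_zib_em(counts, trials)
print(phi_hat, p_hat)
```

With covariates the M-step would no longer be in closed form and would need a weighted regression fit. That case is not sketched here.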
In a run-off triangle, external factors can have a similar influence on all incremental losses of the same calendar year. This can distort the triangle such that reserving methods like chain ladder or the loss ratio method do not work properly. A very recent example of such an external factor is the COVID-19 pandemic. In many countries, the insurance industry is in the process of establishing market knowledge about the impact of the pandemic on premiums and losses. We extend the additive claims reserving model to allow for calendar year effects and develop a variant of the incremental loss ratio method (also known as the additive method) that can make use of such market knowledge. We derive formulas for the mean squared error of prediction and provide a detailed numerical example.
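For orientation, the classical incremental loss ratio (additive) method that the paper extends can be sketched as follows. This is the textbook version without calendar year effects, not the variant developed in the paper. The function name `additive_reserves`, the toy triangle and the volume measures are hypothetical. Each development year gets one incremental loss ratio (observed incremental losses divided by the volumes of the accident years that contributed), and unobserved cells are predicted as volume times ratio.

```python
def additive_reserves(incremental, volumes):
    """Classical additive (incremental loss ratio) method, illustrative sketch.

    incremental[i][k] is the incremental loss of accident year i in
    development year k, or None if not yet observed.
    volumes[i] is the volume measure (e.g. premium) of accident year i.
    """
    n_dev = len(incremental[0])
    ratios = []
    for k in range(n_dev):
        # Use only accident years for which development year k is observed.
        observed = [
            (row[k], v) for row, v in zip(incremental, volumes) if row[k] is not None
        ]
        total_loss = sum(loss for loss, _ in observed)
        total_volume = sum(v for _, v in observed)
        ratios.append(total_loss / total_volume)
    # Reserve per accident year: predicted value of all unobserved cells.
    reserves = [
        sum(v * ratios[k] for k in range(n_dev) if row[k] is None)
        for row, v in zip(incremental, volumes)
    ]
    return ratios, reserves


# Hypothetical 3x3 run-off triangle and volumes.
triangle = [
    [50.0, 30.0, 10.0],
    [60.0, 36.0, None],
    [55.0, None, None],
]
volumes = [100.0, 120.0, 110.0]
ratios, reserves = additive_reserves(triangle, volumes)
print(ratios)    # incremental loss ratios per development year
print(reserves)  # reserve per accident year
```

In this toy triangle the ratios are 0.5, 0.3 and 0.1, so the youngest accident year is reserved at 110 × (0.3 + 0.1) = 44. A calendar year effect would act along the diagonals of the triangle, which this plain version cannot capture.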
This paper proposes a nonuniform subsampling method for finite mixtures of regression models to reduce large data computational tasks. A general estimator based on a subsample is investigated, and its asymptotic norma...
Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.
We consider the problem of estimating the maximum posterior probability (MAP) state sequence for a finite state and finite emission alphabet hidden Markov model (HMM) in the Bayesian setup, where both emission and transition matrices have Dirichlet priors. We study a training set consisting of thousands of protein alignment pairs. The training data are used to set the prior hyperparameters for Bayesian MAP segmentation. Since the Viterbi algorithm is no longer applicable in this setup, there is no simple procedure to find the MAP path, and several iterative algorithms are considered and compared. The main goal of the paper is to test the Bayesian setup against the frequentist one, where the parameters of the HMM are estimated using the training data.
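In the frequentist setup, once the transition and emission matrices are fixed at their estimates, the MAP path is found with the standard Viterbi algorithm. A minimal log-space sketch is given below. It does not cover the Bayesian case with Dirichlet priors discussed in the paper, and the two-state example parameters are a common textbook toy, not values from the protein alignment data.

```python
import math


def safe_log(x):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Most probable state path for an HMM with known parameters.

    obs: list of observation indices; init[s]: initial probabilities;
    trans[r][s]: P(next state s | state r); emit[s][o]: P(obs o | state s).
    """
    n_states = len(init)
    score = [safe_log(init[s]) + safe_log(emit[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + safe_log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + safe_log(trans[best][s]) + safe_log(emit[s][o]))
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy example: states 0 and 1, three possible emissions.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path = viterbi([0, 1, 2], init, trans, emit)
print(path)  # [0, 0, 1]
```

The dynamic programming step relies on the joint probability factorising over time given fixed parameters. When the parameters are integrated out under a prior that factorisation is lost, which is consistent with the abstract's statement that Viterbi no longer applies.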
In this article, we consider nonparametric estimation of the cumulative incidence function (CIF) for left-truncated and interval-censored competing risks (LT-ICC) data. To reduce the bias of the pseudo-likelihood estimator (PLE) of the CIF in the literature, we propose two alternative estimators. The first estimator, called the modified PLE (MPLE), is based on the modified NPMLE of F(t). The second estimator, called the modified maximum likelihood estimator (MMLE), is derived using modified likelihood functions for LT-ICC data, where the left endpoints of the intervals for left-censored observations with failure type j are the maximum of the left-truncation variables and the estimated left endpoint of the support of the observations. Simulation studies show that the MPLE and MMLE are less biased than the PLE in most of the cases considered and that their standard deviations are significantly smaller than that of the PLE.
The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), that addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we propose a novel fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for the relationship between two datasets is proposed, specifically addressing the high dimensionality, and its asymptotic distribution is derived. Notably, several existing data integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case-control study. Besides recovering known relationships, PO2PLS also identified novel findings. The methods are implemented in our R package PO2PLS.
The identification of factors associated with mental and behavioural disorders in early childhood is critical both for psychopathology research and for the support of primary health care practices. Motivated by the Millennium Cohort Study, in this paper we study the effect of a comprehensive set of covariates on children's emotional and behavioural trajectories in England. To this end, we develop a quantile mixed hidden Markov model for joint estimation of multiple quantiles in a linear regression setting for multivariate longitudinal data. The novelty of the proposed approach lies in the use of the multivariate asymmetric Laplace distribution, which allows joint estimation of the quantiles of the univariate conditional distributions of a multivariate response while accounting for possible correlation between the outcomes. Unobserved heterogeneity and serial dependency due to repeated measures are modelled through individual-specific, time-constant random coefficients and through time-varying parameters that evolve with a Markovian structure, respectively. Inference is carried out through a suitable expectation-maximization algorithm without parametric assumptions on the random effects distribution.
Statistical inference for normal mixture models with an unknown number of components has long been challenging due to the issues of non-identifiability, a degenerate Fisher matrix, and boundary parameters. In this paper, a penalized likelihood estimation procedure is proposed for mixtures of normals with an unknown number of components to achieve both order selection consistency and the root-n convergence rate for the component parameter estimators. We show that the proposed estimator avoids being trapped in certain degenerate regions of the non-identifiable subset of the parameter space for over-fitted normal mixture models, so that a regular asymptotic quadratic Taylor expansion of the mixture log-likelihood can be derived. With a suitable penalty function on the mixing proportions, the new estimator is proved to be consistent in order selection and to have an asymptotic normal distribution. Our derived sparsity conditions also reveal some surprising but interesting differences among commonly used penalty functions and explain why some popular penalty functions, such as Lasso and SCAD, provide unsatisfactory results in order selection. Extensive simulations and a real data analysis are conducted to demonstrate the effectiveness of the newly proposed estimator.