We address the challenge of estimating regression coefficients and selecting relevant predictors in high-dimensional mixed linear regression, where the number of predictors greatly exceeds the sample size. Recent advances in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and to achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.
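For orientation, the conditional-likelihood EM that the abstract takes as its starting point can be sketched as follows. This is a minimal, unpenalized, low-dimensional illustration and not the paper's fast group-penalized estimator. The function name `em_mixture_regression`, the Gaussian noise model, the small ridge term, and the variance floor are all assumptions made for the example.

```python
import numpy as np

def em_mixture_regression(X, y, n_components=2, n_iter=100, seed=0):
    """Plain EM for a K-component mixture of Gaussian linear regressions.

    Hypothetical helper for illustration only: no sparsity penalty is applied.
    Returns coefficients, noise variances, mixing proportions, and the
    observed-data log-likelihood evaluated at each iterate.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = rng.normal(size=(n_components, d))        # one coefficient vector per component
    sigma2 = np.full(n_components, np.var(y))        # per-component noise variance
    pi = np.full(n_components, 1.0 / n_components)   # mixing proportions (do not depend on X)
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior probability that observation i belongs to component k.
        resid = y[:, None] - X @ beta.T              # shape (n, K)
        log_dens = -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2) + np.log(pi)
        m = log_dens.max(axis=1, keepdims=True)      # log-sum-exp for numerical stability
        log_norm = m + np.log(np.exp(log_dens - m).sum(axis=1, keepdims=True))
        loglik_trace.append(float(log_norm.sum()))
        resp = np.exp(log_dens - log_norm)
        # M-step: weighted least squares and weighted variance per component.
        for k in range(n_components):
            w = resp[:, k]
            Xw = X * w[:, None]
            # Tiny ridge term (an assumption) keeps the solve well conditioned.
            beta[k] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(d), Xw.T @ y)
            r = y - X @ beta[k]
            sigma2[k] = max((w * r**2).sum() / w.sum(), 1e-8)
        pi = resp.mean(axis=0)
    return beta, sigma2, pi, loglik_trace
```

The log-likelihood trace should be non-decreasing up to rounding, which is a convenient sanity check. Component labels are arbitrary, so recovered coefficient vectors are only comparable to the truth up to a permutation.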
With the progress of information technology, large amounts of asymmetric, leptokurtic, and heavy-tailed data are arising in various fields, such as finance, engineering, genetics, and medicine. Modeling such data is very challenging, especially for extremely skewed data accompanied by very high kurtosis or heavy tails. In this article, we propose a novel class of skewed generalized t distributions (SkeGTD), constructed as scale mixtures of the skewed generalized normal distribution. The proposed SkeGTD adapts well to a variety of data because it accommodates a wide range of skewness and kurtosis and has separate location, scale, skewness, and shape parameters. We investigate some important properties of this family of distributions. Maximum likelihood estimation, L-moments estimation, and two-step estimation for the SkeGTD are explored. To illustrate the usefulness of the proposed methodology, we present simulation studies and analyze two real datasets.
In this paper, we consider the stochastic versions of three classical growth models given by ordinary differential equations (ODEs): the Gompertz, von Bertalanffy, and logistic differential equations. We assume that each stochastic differential equation (SDE) has some crucial parameters to be estimated, and we use maximum likelihood estimation (MLE) to estimate them. For estimating the diffusion parameter, we use the MLE for two cases and the quadratic variation of the data for one of the SDEs. We apply the Akaike information criterion (AIC) to choose the best model for the simulated data, treating the AIC as a function of the drift parameter. We conduct numerical experiments to validate our selection method and subsequently apply it to actual data. The proposed methodology can be applied to datasets with discrete observations, including highly sparse data. Indeed, we can use this method even in the extreme case where only one point has been observed for each path, provided that a sufficient number of trajectories has been observed. For the last two cases, the data can be viewed as incomplete observations of a model with a tractable likelihood function; we then propose a version of the expectation-maximization (EM) algorithm to estimate these parameters. This type of dataset typically appears in fisheries, for instance.
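The quadratic-variation idea mentioned above can be sketched as follows. The abstract does not state the exact form of its SDEs, so this example assumes a stochastic Gompertz model with multiplicative noise, dX = (a - b log X) X dt + sigma X dW. Under that assumption log X has quadratic variation sigma^2 T over [0, T]. The function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_log_gompertz(a, b, sigma, y0, dt, n_steps, seed=0):
    """Euler-Maruyama path of Y = log X for the assumed SDE
    dX = (a - b log X) X dt + sigma X dW."""
    rng = np.random.default_rng(seed)
    y = np.empty(n_steps + 1)
    y[0] = y0
    noise = rng.normal(scale=np.sqrt(dt), size=n_steps)
    for i in range(n_steps):
        # The -sigma^2/2 term is the Ito correction from the log transform.
        drift = a - 0.5 * sigma**2 - b * y[i]
        y[i + 1] = y[i] + drift * dt + sigma * noise[i]
    return y

def sigma_from_quadratic_variation(log_x, dt):
    """Estimate sigma from the realized quadratic variation of log X:
    the sum of squared increments divided by the total observation time."""
    increments = np.diff(log_x)
    total_time = dt * increments.size
    return float(np.sqrt(np.sum(increments**2) / total_time))

# Usage: recover sigma from one densely observed simulated path.
path = simulate_log_gompertz(a=1.0, b=0.5, sigma=0.3, y0=0.0, dt=1e-3, n_steps=20_000)
sigma_hat = sigma_from_quadratic_variation(path, dt=1e-3)
```

This estimator needs densely sampled paths. For the sparse designs described in the abstract, such as one observation per trajectory, it is not applicable, which is presumably why the likelihood-based and EM approaches are used there.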
The modeling of personal accident insurance data has been a topic of high relevance in the insurance literature. This type of data often exhibits positive skewness and heavy tails. In this work, we propose a new quantile regression model based on the scale-mixture Birnbaum-Saunders distribution for modeling personal accident insurance data. The maximum likelihood estimates of the model parameters are obtained via the EM algorithm. Two Monte Carlo simulation studies are performed using the R software. The first study analyzes the performance of the EM algorithm in obtaining the maximum likelihood estimates, as well as the randomized quantile and generalized Cox-Snell residuals. In the second simulation study, the size and power of the Wald, likelihood ratio, score, and gradient tests are evaluated. The two simulation studies are conducted considering different quantiles of interest and sample sizes. Finally, a real insurance data set is analyzed to illustrate the proposed approach.
This paper investigates the identification of multiple dipole sound sources using sound pressures measured from a microphone array. The problem is addressed in the maximum likelihood (ML) framework, where the locations, orientations, and powers of multiple dipole sound sources are unknown parameters to be estimated. By the consistency property of ML, the estimated parameters converge to their actual values, which implies an asymptotically perfect spatial resolution if a sufficiently high signal-to-noise ratio can be achieved. To reduce the dimension of the ML optimization problem, the contribution of each dipole source to the measured pressures is treated as a latent variable, and the ML problem is equivalently solved via the expectation-maximization (EM) algorithm, which iteratively and sequentially updates each source contribution and the associated sound source parameters. The number of sound sources can also be determined by model selection approaches that add a penalty on model dimension to the ML objective function. The proposed method is assessed via a laboratory experiment where the sound field is produced by dipole speakers and a wind tunnel experiment where airframe aerodynamic noise is generated at a high Reynolds number. Experimental results show that the proposed method outperforms existing approaches in the sense of higher spatial resolution, more accurate localization, and the capacity to identify the orientations of multiple dipole sound sources.
This work proposes a statistical model for crossover trials with multiple skewed responses measured in each period. Data from a 3 × 3 crossover trial, in which different doses of a drug were administered to subjects with a history of seasonal asthma rhinitis to grass pollen, are used for motivation. In each period, gene expression values for 10 genes were measured from each subject. We consider a linear mixed-effects model with a skew-normally distributed random effect or random error term to model the asymmetric responses in the crossover trials. The article examines two cases: (i) when the random effect follows a skew-normal distribution, and (ii) when the random error follows a skew-normal distribution. The expectation-maximization algorithm is used in both cases to compute maximum likelihood estimates of the parameters. Simulations and crossover data from the gene expression study illustrate the proposed approach.
Traditional statistical analysis is challenged by modern massive data sets, which have huge sample sizes and dimensions. Quantile regression has become a popular alternative to the least squares method because it provides a comprehensive description of the response distribution and is robust against heavy-tailed error distributions. On the other hand, the non-smooth quantile loss poses a new challenge for massive data sets. To address the problem, we transform the non-differentiable quantile loss function into a convex quadratic loss function based on the expectation-maximization (EM) algorithm using an asymmetric Laplace distribution. Both simulations and a real data application are conducted to illustrate the performance of the proposed methods.
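One standard way to obtain a quadratic surrogate of the quantile loss is sketched below. It writes the asymmetric Laplace error as a normal-exponential mixture with latent scale z_i, so that each M-step reduces to weighted least squares. This is a generic illustration and may differ from the paper's exact algorithm. The function name, the fixed unit scale, and the `eps` guard against zero residuals are assumptions.

```python
import numpy as np

def quantile_regression_em(X, y, p=0.5, n_iter=500, eps=1e-6):
    """EM for quantile regression under an asymmetric Laplace working likelihood.

    Mixture representation (scale fixed at 1, an assumption of this sketch):
        y_i = x_i' beta + theta * z_i + tau * sqrt(z_i) * u_i,
        z_i ~ Exp(1), u_i ~ N(0, 1),
        theta = (1 - 2p) / (p (1 - p)),  tau^2 = 2 / (p (1 - p)).
    """
    theta = (1 - 2 * p) / (p * (1 - p))
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # least-squares starting value
    for _ in range(n_iter):
        resid = y - X @ beta
        # E-step: E[1/z_i | y_i, beta] = 1 / (p (1 - p) |r_i|); eps avoids division by zero.
        w = 1.0 / (p * (1 - p) * np.maximum(np.abs(resid), eps))
        # M-step: minimize sum_i [w_i (y_i - x_i' beta)^2 - 2 theta (y_i - x_i' beta)],
        # a convex quadratic whose solution is a shifted weighted least-squares fit.
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, X.T @ (w * y - theta))
    return beta
```

At a fixed point the update satisfies the usual quantile-regression subgradient condition, so roughly a fraction p of the residuals should be non-positive. Each iteration costs one weighted least-squares solve, which is what makes this formulation attractive for large samples.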
In the drilling of oil wells, the need to accurately detect downhole formation pressure transitions has long been established as critical for safety and economics. In this article, we examine the application of hidden Markov models (HMMs) to oil-well drilling processes, with a focus on the real-time evolution of downhole formation pressures in their partially observed state. The downhole drilling pressure system can be viewed as a nonlinear, non-degrading stochastic process whose optimum performance is in a region in its warning state prior to random failure in time. The differential pressure system $(\Delta P)$ is modeled as a hidden three-state continuous-time Markov process. States 0 and 1 are not observable and represent the normally pressured (initiating $\Delta P$) and abnormally pressured or warning (reducing $\Delta P$) states, respectively. State 2 is the observable failure state (from negative $\Delta P$ and loss of well control). The signal process of the evolution of differential pressure $(\Delta P)$ is identified in the changes in the observable rate of penetration (ROP) encoded in drilling performance data. The state and observation parameters of the HMM are estimated using the expectation-maximization (EM) algorithm, and we show, for a univariate system with a depth-dependent time relationship, that the model parameter updates of the EM algorithm have explicit solutions. A Bayesian inference model is then proposed to determine the safety threshold of the system and to predict early failure at each sampling epoch. The application of our stochastic model of the dynamic evolution of downhole pressures in operational time is illustrated with a hindcast case example. The analysis showed a strong early indication of probable failure in real time and was validated in the field after a drilling system failure that resulted in significant recovery costs.
A more flexible type of mixture autoregressive model, namely the Burr mixture autoregressive (BMAR) model, is studied in this article for modeling nonlinear time series. The model consists of a mixture of K autoregressive components, with the conditional distribution of each component following a Burr distribution. The BMAR model enjoys some nice statistical properties that allow it to capture time series with: (1) unimodal or multimodal conditional distributions; (2) asymmetric or symmetric conditional distributions; (3) conditional heteroscedasticity; (4) cyclical or seasonal behavior; and (5) leptokurtic conditional distributions. Sufficient and less restrictive conditions for the ergodicity of the BMAR model are derived and discussed. A more robust constrained optimization algorithm (the EM-sequential quadratic programming method) is proposed for the nonlinear optimization problem. In the simulation studies, the parameter estimation method showed satisfactory results. The variance of the estimated parameters is also addressed using the missing information principle. Real datasets from two different fields of study are used to assess the performance of the BMAR model against competing models. The comparison in the empirical examples indicates that the BMAR model is superior in capturing the behavior of the data.
Many stochastic models in economics and finance are described by distributions with a lognormal body. Testing for a possible Pareto tail and estimating the parameters of the Pareto distribution in these models is an important topic. Although the problem has been extensively studied in the literature, most applications are characterized by some weaknesses. We propose a method that exploits all the available information by taking into account the data generating process of the whole population. After estimating a lognormal-Pareto mixture with a known threshold via the EM algorithm, we exploit this result to develop an unsupervised tail estimation approach based on the maximization of the profile likelihood function. Monte Carlo experiments and two empirical applications to the size of US metropolitan areas and of firms in an Italian district confirm that the proposed method works well and outperforms two commonly used techniques. Simulation results are available in an online supplementary appendix.