In this article, we propose and study the class of multivariate log-normal/independent distributions and linear regression models based on this class. The class of multivariate log-normal/independent distributions is very attractive for robust statistical modeling because it includes several heavy-tailed distributions suitable for modeling correlated multivariate positive data that are skewed and possibly heavy-tailed. In addition, expectation-maximization (EM)-type algorithms can be easily implemented for maximum likelihood estimation. We model the relationship between quantiles of the response variables and a set of explanatory variables, compute the maximum likelihood estimates of the parameters through EM-type algorithms, and evaluate model fit using Mahalanobis-type distances. The satisfactory performance of the quantile estimation is verified by simulation studies. An application to newborn data is presented and discussed.
State space models have been extensively applied to model and control dynamical systems in disciplines including neuroscience, target tracking, and audio processing. A common modeling assumption is that both the state and data noise are Gaussian because it simplifies the estimation of the system's state and model parameters. However, in many real-world scenarios where the noise is heavy-tailed or includes outliers, this assumption does not hold, and the performance of the model degrades. In this paper, we present a new approximate inference algorithm for state space models with Laplace-distributed multivariate data that is robust to a wide range of non-Gaussian noise. Exact inference is combined with an expectation propagation algorithm, leading to filtering and smoothing that outperform existing approximate inference methods for Laplace-distributed data while retaining a speed similar to that of the Kalman filter. Further, we present a maximum posterior expectation-maximization (EM) algorithm that learns the parameters of the model in an unsupervised way, automatically avoids over-fitting the data, and provides better model estimation than existing methods for the Gaussian model. The quality of the inference and learning algorithms is illustrated through a diverse set of experiments and an application to non-linear tracking of audio frequency.
Finite mixture models are powerful tools for modeling and analyzing heterogeneous data. Parameter estimation is typically carried out using maximum likelihood estimation via the Expectation-Maximization (EM) algorithm. Recently, the adoption of flexible distributions as component densities has become increasingly popular. Often, the EM algorithm for these models involves complicated expressions that are time-consuming to evaluate numerically. In this paper, we describe a parallel implementation of the EM algorithm suitable for both single-threaded and multi-threaded processors and for both single-machine and multiple-node systems. Numerical experiments are performed to demonstrate the potential performance gain in different settings. A comparison is also made across two commonly used platforms, R and MATLAB. For illustration, a fairly general mixture model is used in the comparison.
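The reason EM for mixtures parallelizes well is that the E-step reduces to per-observation computations whose sufficient statistics can be summed across data chunks. The following is a minimal sketch of that structure, not the implementation described in the abstract: it assumes a univariate Gaussian mixture (the paper uses a more general model), the function names `chunk_stats` and `em_fit` are hypothetical, and Python threads are used only to show the map-and-reduce pattern (they do not give a real speed-up for this pure-Python arithmetic).

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def chunk_stats(chunk, weights, means, variances):
    """E-step on one chunk: accumulate sufficient statistics and log-likelihood."""
    k = len(weights)
    s0 = [0.0] * k  # sum of responsibilities
    s1 = [0.0] * k  # sum of responsibility * x
    s2 = [0.0] * k  # sum of responsibility * x^2
    loglik = 0.0
    for x in chunk:
        dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens)
        loglik += math.log(total)
        for j in range(k):
            r = dens[j] / total
            s0[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return s0, s1, s2, loglik


def em_fit(data, init_means, n_chunks=4, n_iter=50):
    """EM for a Gaussian mixture with a chunked (map) E-step and a pooled (reduce) M-step."""
    k, n = len(init_means), len(data)
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [1.0] * k
    chunks = [data[i::n_chunks] for i in range(n_chunks)]
    history = []  # log-likelihood at the parameters used in each E-step
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for _ in range(n_iter):
            # Map: each worker handles one chunk with the current parameters.
            results = list(
                pool.map(lambda c: chunk_stats(c, weights, means, variances), chunks)
            )
            # Reduce: sum the per-chunk statistics.
            s0 = [sum(r[0][j] for r in results) for j in range(k)]
            s1 = [sum(r[1][j] for r in results) for j in range(k)]
            s2 = [sum(r[2][j] for r in results) for j in range(k)]
            history.append(sum(r[3] for r in results))
            # M-step: closed-form updates from the pooled statistics.
            weights = [s0[j] / n for j in range(k)]
            means = [s1[j] / s0[j] for j in range(k)]
            variances = [max(s2[j] / s0[j] - means[j] ** 2, 1e-6) for j in range(k)]
    return weights, means, variances, history
```

On a multi-node system the same reduce step would be a sum of the per-chunk statistics across workers; only the small statistic vectors, not the data, need to be communicated.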
Traditionally, the Gaussian assumption, implied by the Wiener process, is widely adopted for modeling degradation processes. However, when degradation data exhibit heavy tails, this assumption is not suitable. To overcome this limitation, this article proposes a novel class of tail-weighted multivariate degradation models built upon the Student-t process. The model is able to account for both between-unit variability and process dependency, while allowing the tail heaviness to be adjusted through the degrees-of-freedom parameter. For reliability assessment, we derive the system reliability function and present an efficient Monte Carlo method for its evaluation. Further, we introduce an expectation-maximization algorithm for parameter estimation and design a bootstrap method for interval estimation. Comprehensive simulation studies are conducted to validate the effectiveness of the inference method. Finally, the proposed methodology is applied to analyze two real-world degradation datasets.
Heterogeneity among patients commonly exists in clinical studies and leads to challenges in medical research. It is widely accepted that the population contains various sub-types that are distinct from each other. The approach of identifying the sub-types and thus tailoring disease prevention and treatment is known as precision medicine. The mixture model is a classical statistical model for clustering a heterogeneous population into homogeneous subpopulations. However, for a highly heterogeneous population with multiple components, its parameter estimation and clustering results may be ambiguous because the EM algorithm depends on the initial values. For sub-typing purposes, the finite mixture of regression models with concomitant variables is considered, and a novel statistical method is proposed to sequentially identify the main components with large proportions in the mixture. Compared with existing statistical inference methods, the new method not only requires no pre-specification of the number of components for model fitting, but also provides more reliable parameter estimation and clustering results. Simulation studies demonstrate the superiority of the proposed method. A real data analysis on drug response prediction illustrates its reliability in parameter estimation and its capability to identify the important subgroup.
ISBN (digital): 9781665482905
ISBN (print): 9781665482905
In current state data, each individual is observed only once, and the only available information is whether the failure event of interest occurred before the observation time. In other words, the specific survival or failure time of an individual is never observed, so current state data differ significantly from ordinary right-censored data. In this paper, we use the Cox model to jointly model the failure time of interest and the observation time. Because the model contains not only finite-dimensional regression coefficients but also an unknown infinite-dimensional function, and because some covariates cannot be observed, it is difficult to maximize the likelihood function directly. Therefore, an unobservable latent variable is introduced to describe the dependence between the two kinds of time, a step function is used to approximate the unknown function and reduce the difficulty of the non-parametric part, and the parameter estimates are obtained by the EM algorithm. The consistency and asymptotic properties of the estimators are also established. Simulation studies show that the proposed method performs well in finite samples. An analysis of a set of mouse experiments then indicates that the sterile environment has no significant effect on tumor inhibition. This paper considers only current state data and the Cox model; in the future, statistical inference under more general and more complex models can be considered.
The new model class of mixtures of generalised nonlinear models (GNMs) is introduced. The model is specified, identifiability issues discussed, the fitting in a maximum likelihood framework using the expectation-maximisation (EM) algorithm outlined and an appropriate computational implementation introduced. The new model class is applied to capture cross-country heterogeneity when considering the augmented Solow model including human capital accumulation as underlying model structure. The inherent heterogeneity is attributed to multiple regimes being present within the selected country data set. The results highlight that country-specific differences lead to distinct components. Countries belonging to the same component exhibit convergence to a homogeneous steady state. The components differ in the initial technological endowment and the contribution of the economic variables to economic growth. © 2021 EcoSta Econometrics and Statistics. Published by Elsevier B.V. All rights reserved.
Count data record the number of times an event occurs per unit of time, and zero-truncated count data, which arise in various fields, are count data without zeros. In this paper, a new zero-truncated Bell (ZTBell) distribution is proposed on the basis of the Bell distribution. We study its statistical properties and explore maximum likelihood estimation (MLE), the expectation-maximization (EM) algorithm, and the minimization-maximization (MM) algorithm for parameter estimation, and we also conduct likelihood ratio tests. In addition, we use the bootstrap method to calculate the standard errors and confidence intervals of the parameters. The simulation results show that MLE, the MM algorithm and the EM algorithm are all effective, and that as the sample size increases, the parameter estimates move closer to the true values and the root mean square error decreases. Finally, applying the model to a set of factory accident data, we find that the ZTBell distribution fits better than the other models considered and gives a fit close to that of the zero-truncated generalized Poisson distribution. Because the ZTBell distribution has only one parameter, it is also simpler than the latter. Therefore, the ZTBell distribution can be a good alternative to other zero-truncated distributions, providing more options for statistical analysis in this domain.
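As an illustration of what direct maximum likelihood estimation for such a one-parameter model can look like, the sketch below assumes the standard Bell probability mass function, P(X = x) = θ^x e^{1 − e^θ} B_x / x! with B_x the Bell numbers, and truncates it at zero. The abstract does not state the parameterization used in the paper, so this form and the function names are assumptions. Under this assumption the distribution is a power-series family, so the MLE solves "sample mean = model mean", and because the model mean is increasing in θ, bisection suffices.

```python
import math


def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] computed with the Bell triangle."""
    row = [1]
    bells = [1]
    for _ in range(n_max):
        new_row = [row[-1]]
        for v in row:
            new_row.append(new_row[-1] + v)
        row = new_row
        bells.append(row[0])
    return bells


def ztbell_pmf(x, theta, bells):
    """Assumed zero-truncated Bell pmf for x = 1, 2, ... (bells must cover index x)."""
    norm = math.expm1(math.expm1(theta))  # exp(exp(theta) - 1) - 1
    return theta ** x * bells[x] / (math.factorial(x) * norm)


def ztbell_mean(theta):
    """Mean of the assumed zero-truncated Bell distribution."""
    p_positive = -math.expm1(-math.expm1(theta))  # 1 - P(X = 0) under the Bell pmf
    return theta * math.exp(theta) / p_positive


def ztbell_mle(data, n_steps=200):
    """MLE of theta by bisection on the equation ztbell_mean(theta) = sample mean."""
    xbar = sum(data) / len(data)
    if xbar <= 1.0:
        raise ValueError("sample mean must exceed 1 for an interior solution")
    lo, hi = 1e-10, 1.0
    while ztbell_mean(hi) < xbar:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if ztbell_mean(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `ztbell_mle([1, 2, 1, 3, 1, 2, 4, 1])` returns the θ whose model mean matches the sample mean of 1.875. An EM or MM scheme, as explored in the paper, would reach the same maximizer by a different iteration.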
Model error covariances play a central role in the performance of data assimilation methods applied to nonlinear state-space models. However, these covariances are largely unknown in most applications. A misspecification of the model error covariance has a strong impact on the computation of the posterior probability density function, leading to unreliable estimates and even to a total failure of the assimilation procedure. In this work, we propose the combination of the expectation-maximization (EM) algorithm with an efficient particle filter to estimate the model error covariance using a batch of observations. Based on the principles of the EM algorithm, the proposed method encompasses two stages: the expectation stage, in which a particle filter is used with the current value of the model error covariance held fixed to find the probability density function that maximizes the likelihood, followed by a maximization stage, in which the expectation under the probability density function found in the expectation step is maximized as a function of the elements of the model error covariance. The novel algorithm presented here combines the EM algorithm with a fixed-point algorithm and does not require a particle smoother to approximate the posterior densities. We demonstrate that the new method accurately and efficiently solves the linear model problem. Furthermore, for the chaotic nonlinear Lorenz-96 model, the method is stable even for an observation error covariance 10 times larger than the estimated model error covariance matrix, and it is also successful in moderately large dimensional situations where the dimension of the estimated matrix is 40 x 40.
ISBN (print): 9781665494250
With the widespread abuse of information technology, pornographic and gambling websites have developed rapidly. They affect the physical and mental health of children and endanger personal property, so it is necessary to detect them. However, existing detection methods overlook the fact that imperfect datasets, which hamper detection, are common in this setting. These imperfections include sparse samples, mismatched datasets, and imbalanced datasets. In addition, over-reliance on visual features incurs high overhead. To overcome these shortcomings, we propose a lightweight graph-based method that detects pornographic and gambling websites through semi-supervised learning of textual content. The semi-supervised learning addresses sparse samples and mismatched datasets, while the graph-based approach combines the semi-supervised part with community discovery to deal with imbalanced datasets. Specifically, detection uses a modified TF-IDF and the Louvain method, with iteration and updating performed by the EM algorithm. The experimental results show that our method achieves the best Macro-Avg-F1 of 92.01% with the shortest CPU time and outperforms all baselines. We also show that the designed components of our model each contribute to the detection.