ISBN (print): 9781728176055
Rank-constrained spatial covariance matrix estimation (RCSCME) is a state-of-the-art blind speech extraction method for mixtures of one directional target speech source and diffuse noise. In this paper, we propose a new algorithmic extension of RCSCME. RCSCME compensates for the single deficient rank of the diffuse-noise spatial covariance matrix, which cannot be estimated by preprocessing such as independent low-rank matrix analysis, and simultaneously estimates the source model parameters. In the conventional RCSCME, the direction of the deficient basis is fixed in advance and only its scale is estimated; however, the candidate for this deficient basis is not unique in general. In the proposed RCSCM model, the deficient basis itself can be accurately estimated as a vector variable by solving a vector optimization problem. We also derive new update rules based on the EM algorithm. We confirm that the proposed method outperforms conventional methods under several noise conditions.
As a generalization of the Poisson distribution and a common alternative to other discrete distributions, the Conway-Maxwell-Poisson (CMP) distribution has the flexibility to explicitly characterize data over- or under-dispersion. The mean-parameterized version of the CMP has received increasing attention in the literature because it models the data mean directly. When the mean further depends on covariates, the mean-parameterized CMP regression model can be treated in a generalized linear models framework. In this work, we propose a mixture of mean-parameterized CMP regressions for data that potentially consist of subpopulations with different conditional means and varying degrees of dispersion. An EM algorithm is constructed to find maximum likelihood estimates of the model parameters. A simulation study is performed to test the proposed mixture of mean-parameterized CMP regressions and to compare it with fits from mixtures of Poisson regressions and mixtures of negative binomial regressions. We show the mixture of mean-parameterized CMP regressions to be a competitive model in the analysis of two real datasets.
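The dispersion behaviour described above can be illustrated with a minimal sketch. It assumes the standard rate-parameterized CMP form, P(Y = y) proportional to λ^y / (y!)^ν, not the mean-parameterized version used in the abstract, and it approximates the normalizing constant with a truncated series. The function names `cmp_pmf_table` and `mean_and_variance` and the `terms` cutoff are hypothetical choices made for illustration only.

```python
import math

def cmp_pmf_table(lam, nu, terms=200):
    """Approximate CMP probabilities for y = 0..terms-1.

    Assumes the rate parameterization P(Y=y) ∝ lam**y / (y!)**nu.
    The infinite normalizing series is truncated at `terms` (an assumption
    that is adequate only when the tail mass beyond the cutoff is negligible).
    """
    # Work in log space to avoid overflow in lam**y and (y!)**nu.
    log_w = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    peak = max(log_w)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in log_w))
    return [math.exp(v - log_z) for v in log_w]

def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(j * p for j, p in enumerate(pmf))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(pmf))
    return mean, var

# nu = 1 reduces to the Poisson case (variance equals mean);
# nu > 1 gives under-dispersion and nu < 1 gives over-dispersion.
poisson_like = mean_and_variance(cmp_pmf_table(3.0, 1.0))
under = mean_and_variance(cmp_pmf_table(3.0, 2.0))
over = mean_and_variance(cmp_pmf_table(3.0, 0.5))
```

Comparing the variance with the mean in each of the three tuples shows how the single parameter ν moves the distribution between under- and over-dispersion.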
This manuscript estimates the area under the receiver operating characteristic curve (AUC) of combined biomarkers in a high-dimensional setting. We propose a penalization approach to the inference of precision matrice...
The problem of classifying an observation into mixtures of two multivariate t-distributions is studied when the location parameters and covariance matrices are unknown. For mixtures of two multivariate t-distributions, we propose classification rules based on the maximum likelihood estimators and Bayes estimators of the parameters. The maximum penalized likelihood estimators of the parameters are derived using an informative penalty function. We derive shrinkage estimators of the covariance matrices using a regularization parameter and propose the corresponding classification rule. A kernel density-based rule is proposed that uses a diagonal matrix as the bandwidth parameter. A simulation study is carried out to compare the rules in terms of the expected probability of misclassification. Applications of the rules are described using real data sets arising in clinical studies and the stock market. (C) 2022 Elsevier B.V. All rights reserved.
This paper focuses on how to identify normal, derated-power and abnormal data in operation data, which is key to intelligent operation and maintenance applications such as wind turbine condition diagnosis and performance evaluation. Existing identification methods can distinguish normal data from the original data, but usually remove power curtailment data as outliers. A multi-Gaussian-discrete probability distribution model was used to characterize the joint probability distribution of wind speed and power from wind turbine SCADA data, taking the derated power of the wind turbine as a hidden random variable. The expectation-maximization (EM) algorithm, an iterative algorithm for model parameter estimation, was applied to obtain the maximum likelihood estimate of the proposed probability model. According to the posterior probability of the wind-power scatter points, the normal, derated-power and abnormal data in the wind turbine SCADA data were identified. The validity of the proposed method was verified on three wind turbine operational data sets with different distribution characteristics. The results show that the proposed method has a degree of universality with regard to derated-power operational data with different distribution characteristics and, in particular, that it can effectively identify operating data with a clustered distribution.
Motivated by a recent result of Daskalakis et al. (2018), we analyze the population version of the Expectation-Maximization (EM) algorithm for the case of truncated mixtures of two Gaussians. Truncated samples from a $d$-dimensional mixture of two Gaussians $\frac{1}{2}\mathcal{N}(\mu, \Sigma) + \frac{1}{2}\mathcal{N}(-\mu, \Sigma)$ means that a sample is only revealed if it falls in some subset $S \subset \mathbb{R}^d$ of positive (Lebesgue) measure. We show that for $d = 1$, EM converges almost surely (under random initialization) to the true mean (the variance $\sigma^2$ is known) for any measurable set $S$. Moreover, for $d > 1$ we show that EM almost surely converges to the true mean for any measurable set $S$ when the map of EM has only three fixed points, namely $-\mu$, $0$, $\mu$ (the covariance matrix $\Sigma$ is known), and prove local convergence if there are more than three fixed points. We also provide convergence rates for our findings. Our techniques deviate from those of Daskalakis et al. (2017), which heavily depend on the symmetry that the untruncated problem exhibits. For example, for an arbitrary measurable set $S$, it is impossible to compute a closed form of the update rule of EM. Moreover, arbitrarily truncating the mixture induces further correlations among the variables. We circumvent these challenges by using techniques from dynamical systems, probability and statistics: the implicit function theorem, stability analysis around the fixed points of the update rule of EM, and correlation inequalities (FKG).
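For orientation, the following is a minimal sketch of the EM map in the simpler untruncated, one-dimensional case, where the update does have a closed form. It is not the truncated setting analyzed in the abstract, and it uses a finite sample rather than the population expectation. For the balanced mixture with known variance, the E-step and M-step collapse into the single update mu_new = mean(x * tanh(x * mu / sigma^2)). The function names and the simulated data are hypothetical and chosen for illustration only.

```python
import math
import random

def em_step(xs, mu, sigma2):
    """One EM update for 0.5*N(mu, sigma2) + 0.5*N(-mu, sigma2), sigma2 known.

    The posterior weight of the +mu component is 1 / (1 + exp(-2*x*mu/sigma2)),
    so the M-step average of (2*w - 1) * x equals x * tanh(x*mu/sigma2).
    """
    return sum(x * math.tanh(x * mu / sigma2) for x in xs) / len(xs)

def run_em(xs, mu0, sigma2, iters=100):
    """Iterate the EM map from the starting point mu0."""
    mu = mu0
    for _ in range(iters):
        mu = em_step(xs, mu, sigma2)
    return mu

def sample_mixture(n, mu, sigma, seed=0):
    """Draw n untruncated samples from the balanced symmetric mixture."""
    rng = random.Random(seed)
    return [rng.gauss(mu if rng.random() < 0.5 else -mu, sigma) for _ in range(n)]

xs = sample_mixture(5000, mu=2.0, sigma=1.0)
mu_hat = run_em(xs, mu0=0.5, sigma2=1.0)        # positive start
mu_hat_neg = run_em(xs, mu0=-0.5, sigma2=1.0)   # negative start
mu_at_zero = em_step(xs, 0.0, 1.0)              # 0 is a fixed point of the map
```

In this untruncated sketch the three fixed points mentioned in the abstract are visible directly: a positive start moves toward the positive mean, a negative start toward its mirror image, and zero maps to itself.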
Accurate segmentation of brain magnetic resonance images is a key step in the quantitative analysis of brain images. The finite mixture model is one of the most widely used methods in brain magnetic resonance image segmentation. However, due to the presence of intensity inhomogeneity artifacts and noise, the image histogram of brain MR images may follow a heavy-tailed or asymmetric distribution, which makes it hard for traditional finite mixture models, such as the Gaussian mixture model, to achieve accurate segmentation results. To alleviate these problems, a novel spatially constrained finite skew Student's-t mixture model is proposed in this paper. Firstly, we propose anisotropic two-level spatial information, which combines the prior and posterior probabilities, to reduce the impact of noise. The proposed spatial information can preserve rich details, such as edges and corners. Secondly, we couple the anisotropic spatial information into the skew Student's-t distribution to fit the intensity distribution of observation data with a heavy-tailed or asymmetric distribution. Thirdly, we use a linear combination of a set of orthogonal basis functions to model the intensity inhomogeneities. Finally, the objective function integrates both tissue segmentation and bias field estimation. In the implementation, we use an improved expectation-maximization (EM) algorithm to estimate the model parameters. The experimental results of our model on synthetic data and brain magnetic resonance images are better than those of other state-of-the-art segmentation methods. (c) 2022 Elsevier Ltd. All rights reserved.
ISBN (print): 9781665412360
Customer behaviour within business processes can change over time, which complicates market understanding and decision making. Detecting such variations, also referred to as concept drift, can provide insight into the evolution of the business environment, offer opportunities for model refinement and enable target-oriented services that improve customer satisfaction. Concept drift in the control-flow perspective has been extensively studied, but there is a research gap in detecting process duration drift. In this paper, we use gamma mixture models (GMMs) with an expectation-maximization (EM) algorithm to fit process durations and then detect variations in their histogram, density and cumulative distributions. Specifically, three metrics (the overall difference in back-to-back histograms, the Kullback-Leibler (KL) divergence and the maximum difference in cumulative distributions) are used to evaluate how different the process durations are. Furthermore, three corresponding statistical tests (the multinomial test, the log-likelihood ratio (LLR) test and the Kolmogorov-Smirnov (KS) test) are applied to determine whether or not the differences are statistically significant. The approach is applied to a public real-life hospital billing process, where two concept drift occurrences are discovered. The main contribution of this paper is an approach for detecting process duration changes.
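Two of the metrics named above, the KL divergence between histograms and the maximum difference between cumulative distributions (the KS statistic), can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: it works on raw duration samples, omits the gamma mixture fit and the significance tests, and does not attempt the back-to-back histogram metric, whose exact definition is not given in the abstract. The function names, the bin edges and the smoothing constant `eps` are hypothetical.

```python
import math
from bisect import bisect_right

def histogram(samples, edges):
    """Count samples per bin; values outside [edges[0], edges[-1]] are ignored."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        if edges[0] <= x <= edges[-1]:
            # The right-most edge is assigned to the last bin.
            k = min(bisect_right(edges, x) - 1, len(counts) - 1)
            counts[k] += 1
    return counts

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) between two histograms, with additive smoothing eps
    (an assumption made here so that empty bins do not produce log(0))."""
    n = len(p_counts)
    p_total = sum(p_counts) + eps * n
    q_total = sum(q_counts) + eps * n
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl

def ks_statistic(a, b):
    """Maximum absolute difference between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(fa - fb))
    return gap

# Hypothetical process durations (in days) before and after a suspected drift.
before = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
after = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
edges = [0.0, 2.0, 4.0, 6.0]
kl = kl_divergence(histogram(before, edges), histogram(after, edges))
ks = ks_statistic(before, after)
```

Larger values of either quantity indicate a larger shift between the two windows; deciding whether a shift is significant would require the statistical tests listed in the abstract.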
Identifying Markov jump systems accurately and rapidly is a challenging task because the complexity of the hidden-state expectation increases exponentially with the data length. This paper presents a special non-homogeneous and non-stationary linear Markov jump system with input control, in which the hidden states are tractable, so that implementing an optimal hidden-state estimator is practical. A parameter identification algorithm that relies on the optimal estimator and the expectation-maximization (EM) algorithm is proposed for this special model, and the local-optima problem of EM is mitigated by an appropriate method. Numerical examples show that the proposed algorithm can rapidly approximate parameters that describe the data well and that it outperforms other related approaches. (C) 2021 Published by Elsevier Ltd.
We consider a multiple hypotheses testing problem with directional alternatives in a decision-theoretic framework. Considering non-symmetric alternative hypotheses, we show that the skewness in the alternatives permits the Bayes rule to make more correct discoveries than if the alternatives were symmetric. We obtain a Bayes rule under a Lebesgue (non-informative) prior subject to a constraint on the mixed directional false discovery rate, $\mathrm{mdFDR} \le \alpha$. The proposed Bayes rule is compared through simulation against rules proposed by Benjamini and Yekutieli (J Am Stat Assoc 100(469):71-80, 2005) and Efron (Ann Stat 35(4):1531-1377, 2007; J Am Stat Assoc 102(477):93-103, 2007). We illustrate the proposed methodology on two sets of data from biological experiments: HIV-transfected cell-line mRNA expression data and a quantitative trait genome-wide SNP data set.