The underdetermined blind audio source separation (BSS) problem is often addressed in the time-frequency (TF) domain assuming that each TF point is modeled as an independent random variable with a sparse distribution. On the other hand, methods based on structured spectral models, such as the Spectral Gaussian Scaled Mixture Models (Spectral-GSMMs) or Spectral Non-negative Matrix Factorization models, perform better because they exploit the statistical diversity of audio source spectrograms, which allows them to go beyond the simple sparsity assumption. However, in the case of discrete state-based models, such as Spectral-GSMMs, learning the models from the mixture can be computationally very expensive. One of the main problems is that using a classical expectation-maximization procedure often leads to a complexity that is exponential in the number of sources. In this paper, we propose a framework with linear complexity to learn spectral source models (including discrete state-based models) from noisy source estimates. Moreover, this framework allows different probabilistic models to be combined, which can be seen as a form of probabilistic fusion. We illustrate that methods based on this framework can significantly improve the BSS performance compared to state-of-the-art approaches. (c) 2012 Elsevier B.V. All rights reserved.
We use direct numerical simulation data to study the identification of coherent vortical structures that generate strong scalar flux at the free surface of an open-channel turbulent flow. Using conventional conditional averaging of events with strong scalar surface flux or large vorticity components, we characterize the correlation of surface flux with a variety of subsurface vortical structures. We then present a clustering method based on the expectation-maximization algorithm which is shown to be effective in identifying dominant turbulence structure patterns. Using this method, clustering modes are obtained for different characteristic vorticity distributions on spanwise and streamwise vertical planes. It is found that each clustering mode can be constructed by a linear combination of a small number of enstrophy-containing eigenvectors obtained by proper orthogonal decomposition (POD). Compared with the POD eigenvectors, the clustering modes have a more direct correspondence to the turbulence structures in physical space. It is shown that ring-like and asymmetric cane vortices are the dominant vortical structures related to strong scalar surface flux in open-channel flow. The clustering method is general and can also be used for other types of flows and for applications beyond interfacial scalar transport. (C) 2012 Elsevier Ltd. All rights reserved.
In this study, we applied Bayesian networks to prioritize the factors that influence hazardous material (Hazmat) transportation accidents. The Bayesian network structure was built based on expert knowledge using Dempster-Shafer evidence theory, and the structure was modified based on a test for conditional independence. We collected and analyzed 94 cases of Chinese Hazmat transportation accidents to compute the posterior probability of each factor using the expectation-maximization learning algorithm. We found that the three most influential factors in Hazmat transportation accidents were human factors, the transport vehicle and facilities, and packing and loading of the Hazmat. These findings provide an empirically supported theoretical basis for Hazmat transportation corporations to take corrective and preventative measures to reduce the risk of accidents. (C) 2011 Elsevier Ltd. All rights reserved.
In this paper we derive the maximum likelihood problem for missing data from a Gaussian model. We present a total of eight equivalent formulations of the resulting optimization problem, four of which are nonlinear least squares formulations. Some of the formulations are based on the expectation-maximization algorithm. Expressions for the derivatives needed to solve the optimization problems are presented. We also present numerical comparisons for two of the formulations for an ARMAX model. (C) 2012 Elsevier Ltd. All rights reserved.
The creation of semantically relevant clusters is vital in bag-of-visual-words models, which are known to be very successful in image classification tasks. Generally, unsupervised clustering algorithms, such as K-means, are employed to create such clusters, from which visual dictionaries are deduced. K-means achieves a hard assignment by associating each image descriptor with the cluster with the nearest mean. In this way, the within-cluster sum of squared distances is minimized. A limitation of this approach in the context of image classification is that it usually does not use any supervision, which limits the discriminative power of the resulting visual words (typically the centroids of the clusters). More recently, some supervised dictionary creation methods based on both supervised information and data fitting were proposed, leading to more discriminative visual words. However, none of them considers the uncertainty present at both the image descriptor and cluster levels. In this paper, we propose a supervised learning algorithm based on a Gaussian mixture model which not only generalizes the K-means algorithm by allowing soft assignments, but also exploits supervised information to improve the discriminative power of the clusters. Technically, our algorithm aims at optimizing, using an EM-based approach, a convex combination of two criteria: the first is unsupervised and based on the likelihood of the training data; the second is supervised and takes into account the purity of the clusters. We show on two well-known datasets that our method is able to create more relevant clusters by comparing its behavior with state-of-the-art dictionary creation methods. (C) 2011 Elsevier Ltd. All rights reserved.
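The difference between the hard assignment of K-means and the soft assignment of a Gaussian mixture model can be illustrated with a minimal one-dimensional sketch. The function names, the two means, the unit variances, and the equal weights below are assumptions chosen for illustration; they are not taken from the paper, and the supervised purity criterion is not modeled here.

```python
import math

def hard_assign(x, means):
    """K-means style assignment: index of the nearest mean."""
    return min(range(len(means)), key=lambda k: (x - means[k]) ** 2)

def soft_assign(x, means, variances, weights):
    """GMM E-step: posterior responsibility of each component for x."""
    dens = [
        w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        for m, v, w in zip(means, variances, weights)
    ]
    total = sum(dens)
    return [d / total for d in dens]

# Hypothetical setup: two clusters and a descriptor close to the midpoint.
means = [0.0, 4.0]
x = 1.9
hard = hard_assign(x, means)                          # a single cluster index
resp = soft_assign(x, means, [1.0, 1.0], [0.5, 0.5])  # one weight per cluster
```

For a point near the boundary, the hard assignment commits fully to one cluster, while the responsibilities keep a substantial share for the other cluster, which is the uncertainty that a soft-assignment model can exploit.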
Left- and right-censored lifetime data arise naturally in one-shot device testing. An experimenter is often interested in identifying the effects of several stress variables on the lifetime of a device; furthermore, multiple-stress experiments that control several variables simultaneously reduce both the experimental time and the cost of the experiment. Here, we present an expectation-maximization (EM) algorithm for developing inference on the reliability at a specific time, as well as on the mean lifetime of the device, based on one-shot device testing data under the exponential distribution when there are multiple stress factors. We use the log-linear link function for this purpose. Unlike in the typical EM algorithm, it is not necessary to obtain maximum likelihood estimates (MLEs) of the parameters at each step of the iteration. By using the one-step Newton-Raphson method, we observe that convergence occurs quickly. We also use the jackknife technique to reduce the bias of the estimate obtained from the EM algorithm. In addition, we discuss the construction of confidence intervals for some reliability characteristics by using the asymptotic properties of the MLEs based on the observed Fisher information matrix, as well as by the jackknife technique, parametric bootstrap methods, and a transformation technique. Finally, we present an example to illustrate all the inferential methods developed here.
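The EM idea for one-shot device data can be sketched in a deliberately simplified setting: a single exponential failure rate with no stress covariates, so the log-linear link and the one-step Newton-Raphson update described above are not needed and the M-step has a closed form. The function name, the data, and the starting value are hypothetical. A device that has failed by its inspection time is left-censored, and one that still works is right-censored.

```python
import math

def em_exponential_one_shot(tests, rate=1.0, iters=200):
    """Estimate an exponential failure rate from one-shot test results.

    tests: list of (inspection_time, failed) pairs.
    """
    for _ in range(iters):
        total = 0.0
        for t, failed in tests:
            if failed:
                # E-step, left-censored: E[T | T < t] for an exponential lifetime
                q = math.exp(-rate * t)
                total += 1.0 / rate - t * q / (1.0 - q)
            else:
                # E-step, right-censored: E[T | T > t] by memorylessness
                total += t + 1.0 / rate
        # M-step: complete-data MLE of the rate
        rate = len(tests) / total
    return rate

# Hypothetical data: 10 devices inspected at time 2.0, 3 of which had failed.
tests = [(2.0, True)] * 3 + [(2.0, False)] * 7
rate_hat = em_exponential_one_shot(tests)

# With one common inspection time the MLE is available in closed form,
# which gives a check on the EM fixed point.
closed_form = -math.log(1 - 3 / 10) / 2.0
```

In this special case the EM iterates approach the closed-form estimate; with several stress levels and a link function no such closed form exists, which is where an iterative scheme becomes necessary.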
Treatment switching is a frequent occurrence in clinical trials, where, during the course of the trial, patients who fail on the control treatment may change to the experimental treatment. Analysing the data without accounting for switching yields highly biased and inefficient estimates of the treatment effect. In this paper, we propose a novel class of semiparametric semicompeting risks transition survival models to accommodate treatment switches. Theoretical properties of the proposed model are examined and an efficient expectation-maximization algorithm is derived for obtaining the maximum likelihood estimates. Simulation studies are conducted to demonstrate the superiority of the model compared with the intent-to-treat analysis and other methods proposed in the literature. The proposed method is applied to data from a colorectal cancer clinical trial.
Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the ability of these algorithms to identify meaningful hidden structures, rendering their outcome unreliable. This paper develops robust clustering algorithms that aim not only to cluster the data, but also to identify the outliers. The novel approaches rely on the infrequent presence of outliers in the data, which translates to sparsity in a judiciously chosen domain. Leveraging sparsity in the outlier domain, outlier-aware robust K-means and probabilistic clustering approaches are proposed. Their novelty lies in identifying outliers while effecting sparsity in the outlier domain through carefully chosen regularization. A block coordinate descent approach is developed to obtain iterative algorithms with convergence guarantees and small excess computational complexity with respect to their non-robust counterparts. Kernelized versions of the robust clustering algorithms are also developed to efficiently handle high-dimensional data, identify nonlinearly separable clusters, or even cluster objects that are not represented by vectors. Numerical tests on both synthetic and real datasets validate the performance and applicability of the novel algorithms.
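One plausible reading of outlier-aware K-means with sparsity regularization is sketched below for one-dimensional data. The abstract does not give the cost function, so the formulation here is an assumption: each point gets its own outlier term, the cost is the sum of squared residuals plus lam times the absolute outlier terms, and block coordinate descent cycles over assignments, means, and outlier terms. The function name, the data, and the value of `lam` are hypothetical.

```python
def robust_kmeans_1d(xs, means, lam, iters=20):
    """Block coordinate descent on an assumed cost:
    sum_n (x_n - m_{c(n)} - o_n)^2 + lam * |o_n|."""
    means = list(means)
    outliers = [0.0] * len(xs)
    labels = [0] * len(xs)
    for _ in range(iters):
        # Block 1: assign each outlier-compensated point to its nearest mean.
        for n, x in enumerate(xs):
            labels[n] = min(
                range(len(means)),
                key=lambda k: (x - outliers[n] - means[k]) ** 2,
            )
        # Block 2: update each mean from its outlier-compensated members.
        for k in range(len(means)):
            members = [xs[n] - outliers[n] for n in range(len(xs)) if labels[n] == k]
            if members:
                means[k] = sum(members) / len(members)
        # Block 3: soft-threshold the residuals; small residuals give exactly 0.
        for n, x in enumerate(xs):
            r = x - means[labels[n]]
            shrink = max(0.0, 1.0 - lam / (2.0 * abs(r))) if r != 0 else 0.0
            outliers[n] = r * shrink
    return means, labels, outliers

# Hypothetical data: two tight clusters plus one gross outlier at 30.0.
xs = [0.0, 0.2, -0.2, 5.0, 5.2, 4.8, 30.0]
means, labels, outliers = robust_kmeans_1d(xs, [0.0, 5.0], lam=4.0)
```

Points whose outlier term ends up nonzero are the ones flagged as outliers. Because the soft threshold leaves a residual of lam / 2 for a flagged point, the affected mean is still pulled slightly toward the outlier, far less than in plain K-means.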
Multivariate outcomes are often measured longitudinally. For example, in hearing loss studies, hearing thresholds for each subject are measured repeatedly over time at several frequencies. Thus, each patient is associated with a multivariate longitudinal outcome. The multivariate mixed-effects model is a useful tool for the analysis of such data. There are situations in which the parameters of the model are subject to some restrictions or constraints. For example, it is known that hearing thresholds, at every frequency, increase with age. Moreover, this age-related threshold elevation is monotone in frequency, that is, the higher the frequency, the higher, on average, the rate of threshold elevation. This means that there is a natural ordering among the different frequencies in the rate of hearing loss. In practice, this amounts to imposing a set of constraints on the frequency-specific regression coefficients that model the mean effect of time and age at entry to the study on hearing thresholds. These constraints should be accounted for in the analysis. The result is a multivariate longitudinal model with restricted parameters. We propose estimation and testing procedures for such models. We show that ignoring the constraints may lead to misleading inferences regarding the direction and the magnitude of various effects. Moreover, simulations show that incorporating the constraints substantially improves the mean squared error of the estimates and the power of the tests. We used this methodology to analyze a real hearing loss study. Copyright (C) 2012 John Wiley & Sons, Ltd.
Inferring Granger-causal interactions between processes promises deeper insights into mechanisms underlying network phenomena, e.g. in the neurosciences, where the level of connectivity in neural networks is of particular interest. Renormalized partial directed coherence has been introduced as a means to investigate Granger causality in such multivariate systems. A major challenge in estimating the respective coherences is reliable parameter estimation for vector autoregressive processes. We discuss two shortcomings typical in relevant applications, i.e. non-stationarity of the processes generating the time series and contamination with observational noise. To overcome both, we present a new approach that combines renormalized partial directed coherence with state space modeling. A numerically efficient way to perform both the estimation and the statistical inference is presented. (C) 2011 Elsevier B.V. All rights reserved.