Accurate estimation of fractional vegetation cover (FVC) from digital images taken by commercially available cameras is of great significance in order to monitor the vegetation growth status, especially when plants are under water stress. Two classic threshold-based methods, namely, the intersection method (T-1 method) and the equal misclassification probability method (T-2 method), have been widely applied to Red-Green-Blue (RGB) images. However, the high coverage and severe water stress of crops in the field make it difficult to extract FVC stably and accurately. To solve this problem, this paper proposes a fixed-threshold method based on the statistical analysis of thresholds obtained from the two classic threshold approaches. Firstly, a Gaussian mixture model (GMM), including the distributions of green vegetation and backgrounds, was fitted on four color features: excessive green index, H channel of the Hue-Saturation-Value (HSV) color space, a* channel of the CIE L*a*b* color space, and the brightness-enhanced a* channel (denoted as a*_I). Secondly, thresholds were calculated by applying the T-1 and T-2 methods to the GMM of each color feature. Thirdly, based on the statistical analysis of the thresholds with better performance between T-1 and T-2, the fixed-threshold method was proposed. Finally, the fixed-threshold method was applied to the optimal color feature a*_I to estimate FVC, and was compared with the two classic approaches. Results showed that, for some images with high reference FVC, FVC was seriously underestimated by 0.128 and 0.141 when using the T-1 and T-2 methods, respectively, but this problem was eliminated by the proposed fixed-threshold method. Compared with the T-1 and T-2 methods, for images taken in plots under severe water stress, the mean absolute error of FVC obtained by the fixed-threshold method was decreased by 0.043 and 0.193, respectively. Overall, the FVC estimation using the proposed fixed-threshold method has the advantages o
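As a rough illustration of the thresholding step, the sketch below computes an intersection-style threshold from a fitted two-component Gaussian mixture: the point between the component means where the two weighted densities are equal. This is one plausible reading of the T-1 method, not the authors' code. The function names, the use of mixture weights, and the rule that vegetation lies below the threshold are all assumptions made for the example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def intersection_threshold(w1, m1, s1, w2, m2, s2):
    """Point between the means where w1*N(m1, s1) equals w2*N(m2, s2).

    Taking logs of both weighted densities gives a quadratic a*x^2 + b*x + c = 0.
    """
    a = 1.0 / (2.0 * s2 ** 2) - 1.0 / (2.0 * s1 ** 2)
    b = m1 / s1 ** 2 - m2 / s2 ** 2
    c = (m2 ** 2 / (2.0 * s2 ** 2) - m1 ** 2 / (2.0 * s1 ** 2)
         + math.log((w1 * s2) / (w2 * s1)))
    if abs(a) < 1e-12:
        # Equal variances: the quadratic degenerates to a linear equation.
        return -c / b
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("the weighted densities do not intersect")
    root = math.sqrt(disc)
    lo, hi = min(m1, m2), max(m1, m2)
    inside = [r for r in ((-b + root) / (2.0 * a), (-b - root) / (2.0 * a))
              if lo <= r <= hi]
    if not inside:
        raise ValueError("no intersection between the component means")
    # If both roots fall between the means, keep the one nearest the midpoint.
    return min(inside, key=lambda r: abs(r - 0.5 * (lo + hi)))


def cover_fraction(values, threshold):
    """Fraction of pixels classified as vegetation.

    Assumption: vegetation has the lower feature values, as with the a*
    channel, where green corresponds to negative a*.
    """
    return sum(1 for v in values if v < threshold) / len(values)
```

A fixed-threshold variant would skip `intersection_threshold` for each image and pass a single constant to `cover_fraction`; the constant itself would have to come from the statistical analysis described in the abstract.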
The human gut microbiome is one of the fundamental components of our physiology, and exploring the relationship between biological and environmental covariates and the resulting taxonomic composition of a given microbial community is an active area of research. A Dirichlet-multinomial regression framework was previously suggested to model this relationship, but it did not account for any underlying latent group structure. An underlying group structure of guts (such as enterotypes) has been observed across gut microbiome samples, in which guts in the same group share similar biota compositions. In this paper, a finite mixture of Dirichlet-multinomial regression models is proposed that accounts for this underlying group structure and allows for a probabilistic investigation of the relationship between bacterial abundance and biological and/or environmental covariates within each inferred group. Furthermore, finite mixtures of regression models that incorporate the concomitant effect of the covariates on the resulting mixing proportions are also proposed and examined within the Dirichlet-multinomial framework. We use the proposed mixture model to gain insight into underlying subgroups in a microbiome data set comprising tumour and healthy samples, and into the relationships between covariates and microbial abundance in those subgroups.
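To make the model structure concrete, here is a minimal sketch of the Dirichlet-multinomial log-likelihood and of the posterior group-membership probabilities in a finite mixture. It is not the paper's implementation. The log link `alpha_k = exp(<x, beta_k>)` and all function names are assumptions made for illustration.

```python
import math


def dm_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out


def alpha_from_covariates(x, coefs):
    """Assumed log link: one coefficient vector per taxon, alpha_k = exp(<x, beta_k>)."""
    return [math.exp(sum(xj * bj for xj, bj in zip(x, beta))) for beta in coefs]


def responsibilities(counts, x, weights, group_coefs):
    """Posterior probability of each latent group for one sample (an E-step)."""
    logs = [math.log(w) + dm_logpmf(counts, alpha_from_covariates(x, coefs))
            for w, coefs in zip(weights, group_coefs)]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

In the concomitant-variable version mentioned in the abstract, `weights` would itself depend on the covariates instead of being a fixed vector.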
In a two-mode network, the nodes are divided into two types (primary nodes and secondary nodes), and connections exist only between nodes of different types. In practice, one-mode network connections may also exist among the primary nodes of such a two-mode network, and the two kinds of networks usually coexist and are not independent. In this paper, we first propose a group Rasch mixture network model that focuses on the connections between primary nodes and secondary nodes, while incorporating the group structure and linkage information of primary nodes. We then develop a modified expectation-maximization algorithm to estimate the proposed model, with a lambda-BIC method for selecting the tuning parameter. Additionally, we provide a likelihood-ratio test statistic to examine whether the two kinds of networks are independent, and we use the leave-one-out method to construct a network prediction rule. Finally, we establish asymptotic results and demonstrate the numerical performance of the proposed methods using both simulations and the *** dataset.
We give convergence guarantees for estimating the coefficients of a symmetric mixture of two linear regressions by expectation-maximization (EM). In particular, we show that the empirical EM iterates converge to the target parameter vector at the parametric rate, provided the algorithm is initialized in an unbounded cone. Specifically, if the initial guess has a sufficiently large cosine angle with the target parameter vector, a sample-splitting version of the EM algorithm converges to the true coefficient vector with high probability. Interestingly, our analysis borrows from tools used in the problem of estimating the centers of a symmetric mixture of two Gaussians by EM. We also show that the population EM operator for mixtures of two regressions is anti-contractive from the target parameter vector if the cosine angle between the input vector and the target parameter vector is too small, thereby establishing the necessity of our conic condition. Finally, we give empirical evidence supporting this theoretical observation, which suggests that the sample-based EM algorithm may not converge to the target vector when initial guesses are drawn accordingly. Our simulation study also suggests that the EM algorithm performs well even under model misspecification (i.e., when the covariate and error distributions violate the model assumptions).
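The EM iteration for this model is short enough to sketch. The code below assumes the standard symmetric setup `y = r * <x, beta> + noise`, with `r` equal to +1 or -1 with equal probability and Gaussian noise of known standard deviation `sigma`. Under that assumption the E-step reduces to a `tanh` weight and the M-step to a weighted least-squares solve. It is a plain EM loop without the sample splitting analysed in the abstract, and every name in it is illustrative.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def em_step(xs, ys, beta, sigma):
    """One EM update: beta <- (sum x x^T)^{-1} sum tanh(y <x, beta> / sigma^2) y x."""
    d = len(beta)
    gram = [[0.0] * d for _ in range(d)]
    rhs = [0.0] * d
    for x, y in zip(xs, ys):
        # E-step: expected sign of the hidden label given the current beta.
        w = math.tanh(y * sum(xj * bj for xj, bj in zip(x, beta)) / sigma ** 2)
        for j in range(d):
            rhs[j] += w * y * x[j]
            for k in range(d):
                gram[j][k] += x[j] * x[k]
    # M-step: least squares against the sign-corrected responses.
    return solve(gram, rhs)


def run_em(xs, ys, beta0, sigma, iters=50):
    beta = list(beta0)
    for _ in range(iters):
        beta = em_step(xs, ys, beta, sigma)
    return beta


def simulate(n, beta_star, sigma, seed=0):
    """Draw n samples from the symmetric two-component regression mixture."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta_star]
        sign = rng.choice((-1.0, 1.0))
        mean = sum(xj * bj for xj, bj in zip(x, beta_star))
        xs.append(x)
        ys.append(sign * mean + rng.gauss(0.0, sigma))
    return xs, ys
```

Calling `run_em(xs, ys, beta0, sigma)` with a `beta0` that is well aligned with the true vector corresponds to the favourable conic initialization. Because the model is symmetric, the estimate is only identified up to sign.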
The expectation-maximization (EM) algorithm is a popular tool in a wide variety of statistical settings, in particular in the maximum likelihood estimation of parameters when clustering using mixture models. A serious pitfall is that in the case of a multimodal likelihood function the algorithm may become trapped at a local maximum, resulting in an inferior clustering solution. In addition, convergence to an optimal solution can be very slow. Methods are proposed to address these issues: optimizing starting values for the algorithm and targeting maximization steps efficiently. It is demonstrated that these approaches can produce superior outcomes to initialization via random starts or hierarchical clustering and that the rate of convergence to an optimal solution can be greatly improved. (C) 2012 Elsevier B.V. All rights reserved.
The four-parameter logistic model (4PLM) has recently attracted much interest in various applications. Motivated by recent studies that re-express the four-parameter model as a mixture model with two levels of latent variables, this paper develops a new expectation-maximization (EM) algorithm for marginalized maximum a posteriori estimation of the 4PLM parameters. The mixture modelling framework of the 4PLM not only makes the proposed EM algorithm easier to implement in practice, but also provides a natural connection with popular cognitive diagnosis models. Simulation studies were conducted to show the good performance of the proposed estimation method and to investigate the impact of the additional upper asymptote parameter on the estimation of other parameters. Moreover, a real data set was analysed using the 4PLM to show its improved performance over the three-parameter logistic model.
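For readers unfamiliar with the model, the item response function of the 4PLM can be written in a few lines. The parameter names below (`a` discrimination, `b` difficulty, `c` lower asymptote, `d` upper asymptote) follow common item response theory usage and are not taken from the paper. The sketch covers only the response function, not the EM estimation procedure.

```python
import math


def four_pl(theta, a, b, c, d):
    """Probability of a correct response under the four-parameter logistic model.

    theta: latent ability; a: discrimination; b: difficulty;
    c: lower asymptote (guessing); d: upper asymptote (slipping).
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))
```

Setting `d = 1` recovers the three-parameter logistic model, which is the comparison the abstract draws on real data.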
In this article, I provide an illustrative, step-by-step implementation of the expectation-maximization algorithm for the nonparametric estimation of mixed logit models. In particular, the proposed routine allows users to straightforwardly fit latent-class logit models with an increasing number of mass points so as to approximate the unobserved structure of the mixing distribution.
Bathymetry is a key element in the modeling of river systems for flood mapping, geomorphology, or stream habitat characterization. Standard practices rely on the interpolation of in situ depth measurements obtained with differential GPS or total station surveys, while more advanced techniques involve bathymetric LiDAR or acoustic soundings. However, these high-resolution active techniques are not so easily applied over large areas. Alternative methods using passive optical imagery present an interesting trade-off: they rely on the fact that wavelengths composing solar radiation are not attenuated at the same rates in water. Under certain assumptions, the logarithm of the ratio of radiances in two spectral bands is linearly correlated with depth. In this study, we go beyond these ratio methods in defining a multispectral hue that retains all spectral information. Given n coregistered bands, this spectral invariant lies on the (n-2)-sphere embedded in R^(n-1), denoted S^(n-2) and tagged 'hue hypersphere'. It can be seen as a generalization of the RGB 'color wheel' (S^1) in higher dimensions. We use this mapping to identify a hue-depth relation in a 35 km reach of the Garonne River, using high resolution (0.50 m) airborne imagery in four bands and data from 120 surveyed cross-sections. The distribution of multispectral hue over river pixels is modeled as a mixture of two components: one component represents the distribution of substrate hue, while the other represents the distribution of 'deep water' hue; parameters are fitted such that membership probability for the 'deep' component correlates with depth.
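The classical two-band ratio method that the study builds on can be sketched as follows. The example assumes an idealized exponential attenuation of radiance with depth in each band, which is a simplification, and uses made-up attenuation coefficients. It then recovers depth from the log-ratio with an ordinary least-squares line. It does not implement the multispectral hue proposed in the abstract.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical attenuation coefficients (per metre) for two spectral bands.
K1, K2 = 0.3, 0.8

# Synthetic radiances under the idealized model L_i = exp(-K_i * depth).
depths = [0.5 * i for i in range(1, 11)]
band1 = [math.exp(-K1 * z) for z in depths]
band2 = [math.exp(-K2 * z) for z in depths]

# ln(L1 / L2) = (K2 - K1) * depth, so depth is linear in the log-ratio.
log_ratio = [math.log(b1 / b2) for b1, b2 in zip(band1, band2)]
slope, intercept = fit_line(log_ratio, depths)
```

With real imagery the calibration pairs would come from surveyed depths, such as the cross-sections mentioned in the abstract, and the fit would be noisy instead of exact.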
The structure function, which quantitatively represents the relation between system states and unit states, is essential for system reliability assessment, but it is often not known in advance because of complicated interactions among units. In this article, a dynamic Bayesian network (DBN) model is put forth to leverage incomplete observation sequences of hierarchical multi-state systems for structure function learning. To achieve a consistent structure function at different time instants, a customized expectation-maximization (EM) algorithm with parameter modularization is proposed and executed in two steps: (1) filling the missing values in the incomplete observation sequences with their expectations to break the dependencies among nodes; (2) decomposing the graphical network into V-shape structures, and then integrating the identical V-shape structures at different time slices to learn the parameters in the DBN model. Based on the learned DBN model, system state distribution and reliability function over time can be readily assessed. Two illustrative examples are presented and the results demonstrate that the structure function of a hierarchical multi-state system can be accurately learned despite the incompleteness of observation sequences.
Bayesian methods have been extended for the linear system identification problem in the past ten years. The traditional Bayesian identification selects a Gaussian prior and considers the tuning of kernels, i.e., the covariance matrix of a Gaussian prior. However, Gaussian priors cannot express the system information appropriately for identifying a positive finite impulse response (FIR) model. This paper exploits the truncated Gaussian prior and develops Bayesian identification procedures for positive FIR models. The proposed parameterizations in the truncated Gaussian prior can reflect the decay rate and the correlation of the impulse response of the system to be identified. The expectation-maximization (EM) algorithm is tailored to the hyperparameter estimation problem of positive system identification with the truncated Gaussian prior. Numerical experiments compare the truncated Gaussian prior to the traditional Gaussian prior for positive FIR system identification. The simulation results demonstrate that the truncated Gaussian prior outperforms the Gaussian prior. (C) 2020 Elsevier B.V. All rights reserved.