In this paper, a two-stage unsupervised segmentation approach based on ensemble clustering is proposed to extract the focused regions from low depth-of-field (DOF) images. The first stage clusters image blocks in a joint contrast-energy feature space into three constituent groups. To achieve this, we make use of a normal mixture-based model along with the standard expectation-maximization (EM) algorithm at two consecutive levels of block size. To avoid the common problem of local optima experienced in many such models, an ensemble EM clustering algorithm is proposed. As a result, relevant blocks closely conforming to image objects are extracted. In stage two, a binary saliency map is constructed from the relevant blocks at the pixel level, based on difference-of-Gaussian (DOG) filtering and binarization. A set of morphological operations is then employed to create the region of interest (ROI) from the map. Experimental results demonstrate that the proposed approach achieves an F-measure of 91.3% and is computationally 3 times faster than the existing state-of-the-art approach. (C) 2013 Elsevier Ltd. All rights reserved.
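A minimal sketch of the two ingredients described above, assuming NumPy, SciPy, and scikit-learn; restarted EM (via n_init) stands in for the paper's ensemble EM scheme, and all function names and thresholds are illustrative, not the authors' implementation:

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from scipy.ndimage import gaussian_filter

    def cluster_blocks(features, n_groups=3, n_runs=10, seed=0):
        # Restarted EM over block features (contrast, energy); keeping the
        # best run by log-likelihood approximates the ensemble EM idea.
        gmm = GaussianMixture(n_components=n_groups, n_init=n_runs,
                              random_state=seed)
        return gmm.fit_predict(features)

    def dog_saliency(gray, sigma1=1.0, sigma2=2.0, thresh=0.1):
        # Difference-of-Gaussian response, binarized into a saliency map;
        # morphological cleanup (opening/closing) would follow.
        dog = gaussian_filter(gray, sigma1) - gaussian_filter(gray, sigma2)
        return (np.abs(dog) > thresh * np.abs(dog).max()).astype(np.uint8)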
In this paper, we propose a new approach for block-based lossless image compression using finite mixture models and adaptive arithmetic coding. Conventional arithmetic encoders encode and decode images sample by sample in raster-scan order, and conventional arithmetic coding models, whether static or adaptive, provide a single probability distribution for the whole set of source symbols to be compressed or transmitted. In the proposed scheme, by contrast, an image is divided into non-overlapping blocks and each block is encoded separately using arithmetic coding. The proposed model provides a probability distribution for each block, modeled by a mixture of non-parametric distributions that exploits the high correlation between neighboring blocks. The expectation-maximization algorithm is used to find the maximum-likelihood mixture parameters so as to maximize arithmetic coding compression efficiency. Comparative experiments show significant improvements over state-of-the-art lossless image compression standards and algorithms; in particular, the proposed algorithm beats JPEG-LS by 9.7% when switching between the pixel and prediction-error domains.
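A sketch of the probability-model side only, assuming NumPy and scikit-learn: an EM-fitted mixture over pixels from already-coded neighboring blocks yields the per-symbol distribution an arithmetic coder would consume for the current block. The coder itself, the nonparametric components, and the pixel/prediction-error switch are omitted; a Gaussian mixture is a stand-in:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def block_symbol_pmf(neighbor_pixels, levels=256, k=3):
        # Fit a small mixture to neighboring-block pixels, then discretize
        # it into the symbol probabilities used to drive arithmetic coding
        # of the current block.
        gmm = GaussianMixture(n_components=k).fit(neighbor_pixels.reshape(-1, 1))
        grid = np.arange(levels).reshape(-1, 1)
        pmf = np.exp(gmm.score_samples(grid))
        pmf += 1e-9                      # keep every symbol codable
        return pmf / pmf.sum()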
Driver support systems (DSS) of intelligent vehicles will predict potentially dangerous situations in heavy traffic, help with navigation and vehicle guidance, and interact with a human driver. Road signs provide important information necessary for understanding traffic situations. A new kernel rule has been developed for road sign classification using the Laplace probability density. The smoothing parameters of the Laplace kernel are optimized by the pseudo-likelihood cross-validation method, and an expectation-maximization algorithm is used to maximize the pseudo-likelihood function. The algorithm has been tested on a dataset of more than 4900 noisy images, and a comparison to other classification methods is given. (C) 2000 Elsevier Science B.V. All rights reserved.
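A minimal sketch of a kernel rule with a Laplace (double-exponential) kernel, assuming NumPy; the bandwidth b is fixed here for brevity, whereas the paper tunes the smoothing parameters by pseudo-likelihood cross-validation via EM:

    import numpy as np

    def laplace_kernel_classify(X_train, y_train, x, b=1.0):
        # Class-conditional density estimated as the average of product
        # Laplace kernels centered at the training points of each class;
        # classify by the largest prior-weighted density.
        classes = np.unique(y_train)
        scores = []
        for c in classes:
            Xc = X_train[y_train == c]
            # product Laplace kernel = exp(-||x - xi||_1 / b), up to a constant
            dens = np.mean(np.exp(-np.abs(Xc - x).sum(axis=1) / b))
            scores.append(len(Xc) / len(X_train) * dens)
        return classes[int(np.argmax(scores))]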
One goal of human microbiome studies is to relate host traits to human microbiome composition. The analysis of microbial community sequencing data presents great statistical challenges, especially when the samples have different library sizes and the data are overdispersed with many zeros. To address these challenges, we introduce a new statistical framework, called predictive analysis in metagenomics via inverse regression (PAMIR), for analyzing microbiome sequencing data. Within this framework, an inverse regression model is developed for the overdispersed microbiota counts given the trait, and a prediction rule is then constructed by taking advantage of the dimension-reduction structure of the model. An efficient Monte Carlo expectation-maximization algorithm is proposed for maximum likelihood estimation. The method is further generalized to accommodate other types of covariates. We demonstrate the advantages of PAMIR through simulations and two real data examples.
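A toy sketch of the inverse-regression idea under simplifying assumptions (NumPy/SciPy): each taxon's count given the trait is negative binomial with a log-linear mean, and prediction inverts the fitted model over a grid of candidate trait values. PAMIR's Monte Carlo EM and dimension-reduction structure are not reproduced; all names and parameterizations here are illustrative:

    import numpy as np
    from scipy.stats import nbinom

    def nb_logpmf(y, mu, alpha):
        # NB2 parameterization: Var = mu + alpha * mu^2
        n = 1.0 / alpha
        p = n / (n + mu)
        return nbinom.logpmf(y, n, p)

    def predict_trait(counts, a, b, alpha, grid):
        # Inverse regression: choose the trait value maximizing the joint
        # log-likelihood of the observed taxon counts over a candidate grid.
        ll = [nb_logpmf(counts, np.exp(a + b * t), alpha).sum() for t in grid]
        return grid[int(np.argmax(ll))]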
In this paper, we propose a semiparametric regression model built upon an isotonic regression model, with the assumption that the random error follows a skewed distribution. We develop an expectation-maximization algorithm for obtaining the maximum likelihood estimates of the model parameters and examine the asymptotic properties of the estimators. Simulation studies explore the performance of the proposed model, and the method is applied to evaluate the DNA-RNA-protein relationship and to identify genes that are key factors in tumor progression.
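A sketch of the isotonic backbone only, assuming scikit-learn; the paper's EM step for skew-distributed errors is not reproduced here, so a plain least-squares isotonic fit on synthetic data with skewed noise stands in:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    # The skewed-error EM would reweight/recentre the responses between
    # successive isotonic fits; this shows the monotone fit alone.
    x = np.linspace(0, 10, 200)
    y = np.log1p(x) + np.random.default_rng(0).gumbel(0, 0.3, x.size)
    fit = IsotonicRegression(increasing=True).fit(x, y)
    y_hat = fit.predict(x)           # monotone nondecreasing estimate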
Multi-level nonlinear mixed effects (ML-NLME) models have received a great deal of attention in recent years because of the flexibility they offer in handling repeated-measures data arising from various disciplines. In this study, we propose both maximum likelihood and restricted maximum likelihood estimation of ML-NLME models with two-level random effects, using first-order conditional expansion (FOCE) and the expectation-maximization (EM) algorithm. The FOCE EM algorithm was compared with the most popular Lindstrom and Bates (LB) method in terms of computational and statistical properties. Basal area growth series data measured from Chinese fir (Cunninghamia lanceolata) experimental stands and simulated data were used for evaluation. The FOCE EM and LB algorithms gave the same parameter estimates and fit statistics for the models on which both converged. However, FOCE EM converged for all the models, while LB did not, especially for models in which two-level random effects are simultaneously considered in several base parameters to account for between-group variation. We recommend the use of FOCE EM for ML-NLME models, particularly when convergence is a concern in model selection. (C) 2013 Elsevier B.V. All rights reserved.
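A sketch of the model structure only, assuming SciPy: an illustrative logistic growth curve fitted per group by nonlinear least squares, which approximates the conditional (random-effect) estimates around which FOCE linearizes. The FOCE EM variance updates, the LB comparison, and the actual base model are not reproduced; the curve form and names are hypothetical:

    import numpy as np
    from scipy.optimize import curve_fit

    def growth(t, a, b, c):
        # Illustrative logistic growth curve; the paper's base model and
        # its two-level random effects are richer than this.
        return a / (1.0 + b * np.exp(-c * t))

    def fit_by_group(t, y, groups):
        # Per-group curve fits stand in for the conditional estimates used
        # in the first-order conditional expansion.
        est = {}
        for g in np.unique(groups):
            m = groups == g
            est[g], _ = curve_fit(growth, t[m], y[m],
                                  p0=(y[m].max(), 5.0, 0.3), maxfev=10000)
        return est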
Background: Copy number variations (CNVs), including amplifications and deletions, are alterations of DNA copy number relative to a reference genome. CNVs play a crucial role in tumourigenesis and progression, including amplification of oncogenes and deletion of tumor suppressor genes, which may significantly increase the risk of cancer. CNVs are also reported to be closely related to non-cancer diseases, such as Down syndrome, Parkinson disease, and Alzheimer disease. Objective: Whole-exome sequencing (WES) has been successfully applied to the discovery of gene mutations as well as to clinical diagnosis, but evaluating copy number from WES data is quite challenging owing to read depth bias, the distribution pattern of exons, and normal cell contamination. Our aim is to develop an efficient method that overcomes these challenges and detects CNVs from WES data. Method: In this study, we present ExomeHMM, a hidden Markov model (HMM) based CNV detection algorithm. ExomeHMM exploits relative read depth, a ratio-based signal, to mitigate read depth distortion, and employs an exponentially attenuated transition matrix to handle sparsely and non-uniformly distributed exons. The expectation-maximization algorithm is used to optimize the parameters of the proposed model. Finally, the standard Viterbi algorithm is used to infer the copy number of exons. Results: Using previously identified CNVs in 1000 Genomes Project data as the gold standard, ExomeHMM achieves the highest F-score among the four methods compared in this study. When applied to triple-negative breast cancer data, ExomeHMM is capable of finding abnormal genes that are significantly associated with breast cancer. Conclusion: ExomeHMM is a suitable tool for CNV detection in both healthy samples and clinical tumor samples on whole-exome sequencing data.
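A minimal Viterbi decoder over copy-number states with Gaussian emissions on relative read depth, assuming NumPy/SciPy; the distance-dependent transitions echo the exponentially attenuated transition matrix in simplified form, and all parameters are illustrative rather than ExomeHMM's fitted values:

    import numpy as np
    from scipy.stats import norm

    def viterbi_cnv(ratio, positions, means, sd=0.2, stay=0.999, decay=1e-5):
        # States = copy-number levels with expected relative depth `means`
        # (e.g. 0.5 deletion, 1.0 normal, 1.5 amplification). The self-
        # transition probability relaxes toward uniform as the distance
        # between consecutive exons grows.
        K, T = len(means), len(ratio)
        logB = norm.logpdf(ratio[None, :], np.array(means)[:, None], sd)
        delta = np.full((K, T), -np.inf); psi = np.zeros((K, T), int)
        delta[:, 0] = np.log(1.0 / K) + logB[:, 0]
        for t in range(1, T):
            p_stay = stay * np.exp(-decay * (positions[t] - positions[t - 1]))
            logA = np.log(np.full((K, K), (1 - p_stay) / (K - 1)))
            np.fill_diagonal(logA, np.log(p_stay))
            scores = delta[:, t - 1][:, None] + logA
            psi[:, t] = scores.argmax(axis=0)
            delta[:, t] = scores.max(axis=0) + logB[:, t]
        path = [int(delta[:, -1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[path[-1], t]))
        return path[::-1]            # most likely copy-number state per exon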
Rough clustering is one of the principal research areas in data mining, machine learning, pattern recognition, and bioinformatics. Among the different variants of rough clustering, rough-probabilistic clustering is a concept introduced recently. In rough-probabilistic clustering, a class is defined as the union of two disjoint regions, namely, a crisp lower approximation region and a probabilistic boundary region. In this regard, the stomped normal (SN) distribution provides a statistical model of the data set within the rough-probabilistic clustering framework. The SN distribution models the central tendency, dispersion, and width of the lower approximation region of each class using its mean, variance, and width parameter, respectively. However, it does not take into consideration the kurtosis of the class distribution, which controls the concentration of values around the mean and the shape of the tails of the data distribution. Against this background, a novel probability distribution, named the stomped-t (St) distribution, is introduced in this paper for rough-probabilistic clustering. The proposed distribution incorporates kurtosis into the SN framework and is then integrated within the rough-probabilistic clustering framework for precise and robust clustering of the data. The efficacy of the proposed clustering algorithm is demonstrated on unsupervised data clustering and image segmentation problems, along with a comparative performance analysis against related algorithms. (C) 2017 Elsevier Inc. All rights reserved.
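One plausible construction of a flat-topped ("stomped") normal density, assuming NumPy/SciPy: normal tails outside [mu - w, mu + w] and a constant plateau at the boundary height inside, renormalized to integrate to one. The paper's exact parameterization, and the stomped-t extension with heavier tails, may differ from this sketch:

    import numpy as np
    from scipy.stats import norm

    def stomped_normal_pdf(x, mu, sigma, w):
        # Plateau height = normal density at the region boundary; the
        # stomped-t variant would replace the normal tails with Student-t
        # tails to control kurtosis.
        plateau = norm.pdf(mu + w, mu, sigma)
        mass = 2 * w * plateau + 2 * norm.sf(mu + w, mu, sigma)
        inside = np.abs(np.asarray(x) - mu) <= w
        dens = np.where(inside, plateau, norm.pdf(x, mu, sigma))
        return dens / mass           # renormalized flat-topped density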
In cancer research, interest frequently centers on factors influencing a latent event that must precede a terminal event. In practice it is often impossible to observe the latent event precisely, making inference about this process difficult. To address this problem, we propose a joint model for the unobserved time to the latent and terminal events, with the two events linked by the baseline hazard. Covariates enter the model parametrically as linear combinations that multiply, respectively, the hazard for the latent event and the hazard for the terminal event conditional on the latent one. We derive the partial likelihood estimators for this problem assuming the latent event is observed, and propose a profile likelihood-based method for estimation when the latent event is unobserved. The baseline hazard in this case is estimated nonparametrically using the EM algorithm, which allows for closed-form Breslow-type estimators at each iteration, bringing improved computational efficiency and stability compared with maximizing the marginal likelihood directly. We present simulation studies to illustrate the finite-sample properties of the method; its use in practice is demonstrated in the analysis of a prostate cancer data set.
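A minimal sketch of the Breslow-type baseline hazard update that makes each EM iteration closed-form, assuming NumPy and no tied event times; inside the paper's EM this update would be re-run each iteration with the current parameter values:

    import numpy as np

    def breslow_cumhaz(times, events, lin_pred):
        # Breslow estimator of the cumulative baseline hazard given fixed
        # linear predictors x'beta: at each event time, add 1 over the sum
        # of exp(x'beta) across subjects still at risk.
        order = np.argsort(times)
        t, d, r = times[order], events[order], np.exp(lin_pred[order])
        at_risk = np.cumsum(r[::-1])[::-1]   # risk-set totals at each time
        increments = np.where(d == 1, 1.0 / at_risk, 0.0)
        return t, np.cumsum(increments)      # H0 evaluated at sorted times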
The evaluation of nursing homes is usually based on the administration of questionnaires made up of a large number of polytomous items to their patients. In such a context, the latent class model represents a useful tool for clustering subjects into homogeneous groups corresponding to different degrees of impairment of their health conditions. It is known that the performance of model-based clustering and the accuracy of the choice of the number of latent classes may be affected by the presence of irrelevant or noise variables. In this paper, we show the application of an item selection algorithm to a dataset collected within a project, named ULISSE, on the quality of life of elderly patients hosted in Italian nursing homes. This algorithm, closely related to that proposed by Dean and Raftery in 2010, aims to find the subset of items that provides the best clustering according to the Bayesian Information Criterion; at the same time, it allows us to select the optimal number of latent classes. Given the complexity of the ULISSE study, we validate the results by means of a sensitivity analysis with respect to different specifications of the initial subset of items, and of a resampling procedure.
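A sketch of the BIC comparison at the heart of such a selection algorithm, assuming scikit-learn; a Gaussian mixture stands in for the latent class model for polytomous items (which would need a dedicated LCA fit), and the Dean-Raftery item-selection loop would wrap a comparison like this around candidate item subsets:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def best_n_classes(X, k_max=8):
        # Scan the number of latent classes and keep the BIC-optimal one;
        # lower BIC is better under scikit-learn's convention.
        bics = [GaussianMixture(k, n_init=5, random_state=0).fit(X).bic(X)
                for k in range(1, k_max + 1)]
        return int(np.argmin(bics)) + 1, bics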