ISBN: (print) 9781509016129
The completeness of gene expression data is essential to many gene expression data analysis tasks. In this paper, inspired by the idea of semi-supervised learning with tri-training, a hybrid iterative imputation method called tri-imputation is proposed to estimate the missing values in gene expression data. In each round of tri-imputation, two of the imputation methods collaborate to produce an initial imputed value, which is then supplied to the remaining imputation method as additional information. Finally, the three results are combined using their respective pre-trained confidence values. Experimental results on real microarray matrices indicate that tri-imputation estimates missing values more accurately, achieving the lowest normalized root-mean-square error.
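The final combination step can be illustrated with a minimal sketch. The abstract does not specify how the confidence values enter the combination, so the normalized weighted average below is an assumption, and the function name `combine_estimates` is hypothetical.

```python
import numpy as np

def combine_estimates(estimates, confidences):
    """Combine per-method imputed values with confidence weights.

    Assumption: the combination is a weighted average with weights
    normalized to sum to 1; the abstract does not state the exact rule.
    """
    # One row per imputation method, one column per missing entry.
    estimates = np.asarray(estimates, dtype=float)
    weights = np.asarray(confidences, dtype=float)
    weights = weights / weights.sum()  # normalize confidences
    return weights @ estimates         # weighted average per missing entry

# Three methods, two missing entries; the confidence values are made up.
combined = combine_estimates([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [1.0, 1.0, 2.0])
```

With equal confidences this reduces to a plain average of the three estimates.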
Data generated from microarray experiments often suffer from missing values. As most downstream analyses need full matrices as input, these missing values have to be estimated. Bayesian principal component analysis (BPCA) is a well-known microarray missing value estimation method, but its performance is not satisfactory on datasets with strong local similarity structure. A bicluster-based BPCA (bi-BPCA) method is proposed in this paper to fully exploit the local structure of the matrix. The genes and experimental conditions most correlated with the missing entry are identified to form a bicluster, and BPCA is conducted on these biclusters to estimate the missing values. An automatic parameter learning scheme is also developed to obtain optimal parameters. Experimental results on four real microarray matrices indicate that bi-BPCA obtains the lowest normalized root-mean-square error on 82.14% of all missing rates.
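The evaluation metric can be sketched as follows. The abstract does not give the formula, so this assumes a definition commonly used in the microarray imputation literature: the root-mean-square error over the artificially removed entries, divided by the standard deviation of their true values. The function name `nrmse` is hypothetical.

```python
import numpy as np

def nrmse(true_values, imputed_values):
    """Normalized root-mean-square error over the missing entries.

    Assumption: normalization uses the variance of the true values,
    one common convention; the paper may define it differently.
    """
    true_values = np.asarray(true_values, dtype=float)
    imputed_values = np.asarray(imputed_values, dtype=float)
    mse = np.mean((imputed_values - true_values) ** 2)
    return float(np.sqrt(mse / np.var(true_values)))

# Toy values for illustration only, not data from the paper.
perfect = nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_fill = nrmse([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
```

Under this definition, a perfect imputation scores 0 and filling every entry with the mean of the true values scores 1, so lower is better.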