Author affiliations: Sanofi Aventis, Biostat & Programming, Bridgewater, NJ 08807, USA; Univ Michigan, Dept Biostat, Ann Arbor, MI 48109, USA; Univ Michigan, Dept Pathol & Urol, Ann Arbor, MI 48109, USA
Publication: BIOINFORMATICS
Year, volume, issue: 2006, Vol. 22, No. 21
Pages: 2635-2642
Subject classification: 0710 [Science - Biology]; 08 [Engineering]; 0714 [Science - Statistics (science or economics degrees may be conferred)]; 0836 [Engineering - Bioengineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]
Funding: NIGMS NIH HHS [R01 GM072007, GM72007, R01 GM072007-04]; Funding Source: Medline
Subjects: Artificial Intelligence; Computer Simulation; Diagnosis, Computer-Assisted; Discriminant Analysis; Gene Expression Profiling; Humans; Linear Models; Models, Genetic; Multivariate Analysis; Neoplasm Proteins; Neoplasms; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity; Tumor Markers, Biological
Abstract: Motivation: The nearest shrunken centroids classifier has become a popular algorithm in tumor classification problems using gene expression microarray data. Feature selection is an embedded part of the method to select top-ranking genes based on a univariate distance statistic calculated for each gene individually. The univariate statistics summarize gene expression profiles outside of the gene co-regulation network context, leading to redundant information being included in the selection procedure. Results: We propose an Eigengene-based Linear Discriminant Analysis (ELDA) to address gene selection in a multivariate framework. The algorithm uses a modified rotated Spectral Decomposition (SpD) technique to select hub genes that associate with the most important eigenvectors. Using three benchmark cancer microarray datasets, we show that ELDA selects the most characteristic genes, leading to substantially smaller classifiers than the univariate feature selection based analogues. The resulting de-correlated expression profiles make the gene-wise independence assumption more realistic and applicable for the shrunken centroids classifier and other diagonal linear discriminant type models. Our algorithm further incorporates a misclassification cost matrix, allowing differential penalization of one type of error over another. In the breast cancer data, we show that false negative prognosis can be controlled via a cost-adjusted discriminant function.
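The idea of a cost-adjusted discriminant function mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' ELDA implementation: it assumes a plain diagonal linear discriminant model (independent genes, pooled per-gene variance) and a generic minimum-expected-cost decision rule. The function names `fit_dlda`, `posterior`, and `predict_min_cost`, the toy data, and the cost values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_dlda(X, y):
    """Fit a diagonal LDA model: class means, pooled per-gene variance, priors."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Residuals of each sample from its own class mean
    resid = X - means[np.searchsorted(classes, y)]
    # Pooled within-class variance, one value per gene (assumed non-zero)
    var = (resid ** 2).sum(axis=0) / (len(y) - len(classes))
    priors = np.array([(y == k).mean() for k in classes])
    return classes, means, var, priors

def posterior(X, means, var, priors):
    """Class posterior probabilities under the gene-wise independence assumption."""
    # Variance-scaled squared distance of each sample to each class mean
    d = ((X[:, None, :] - means[None, :, :]) ** 2 / var).sum(axis=2)
    logp = -0.5 * d + np.log(priors)
    logp -= logp.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def predict_min_cost(post, cost):
    """Pick the class index with the lowest expected misclassification cost.

    cost[j, k] is the (assumed) cost of predicting class k when the truth is j.
    """
    expected = post @ cost
    return expected.argmin(axis=1)

# Toy data: 4 samples, 2 genes, class 1 standing in for "poor prognosis"
X = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0], [6.0, 2.0]])
y = np.array([0, 0, 1, 1])
classes, means, var, priors = fit_dlda(X, y)

post = posterior(np.array([[2.5, 1.0]]), means, var, priors)
equal_cost = np.array([[0, 1], [1, 0]])
fn_heavy_cost = np.array([[0, 1], [5, 0]])  # missing class 1 costs 5x more
print(predict_min_cost(post, equal_cost), predict_min_cost(post, fn_heavy_cost))
```

With equal costs the rule reduces to picking the most probable class. Raising the cost of a false negative shifts borderline samples toward the high-risk class, which is the general mechanism by which a cost matrix trades one error type against the other.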