ISBN (digital): 9780470199091
ISBN (print): 9780470105269
Combining biology, computer science, mathematics, and statistics, bioinformatics has become a fast-growing discipline with a profound impact on all areas of biology and on industrial applications. Computational Intelligence in Bioinformatics offers an introduction to the topic, covering the most relevant and popular computational intelligence (CI) methods, while also encouraging readers to apply these methods in their own research.
Over the last decade, gene expression microarrays have had a profound impact on biomedical research. The diversity of platforms and analytical methods available to researchers has made the comparison of data from multiple platforms challenging. In this study, we describe a framework for comparisons across platforms and laboratories. We have attempted to include nearly all the available commercial and 'in-house' platforms. Using probe sequences matched at the exon level improved consistency of measurements across the different microarray platforms compared to annotation-based matches. Generally, consistency was good for highly expressed genes, and variable for genes with lower expression values, as confirmed by quantitative real-time (QRT)-PCR. Concordance of measurements was higher between laboratories on the same platform than across platforms. We demonstrate that, after stringent preprocessing, commercial arrays were more consistent than in-house arrays, and by most measures, one-dye platforms were more consistent than two-dye platforms.
In this article, we consider the problem of testing a linear hypothesis in a multivariate linear regression model which includes the case of testing the equality of mean vectors of several multivariate normal populations with common covariance matrix Sigma, the so-called multivariate analysis of variance or MANOVA problem. However, we have fewer observations than the dimension of the random vectors. Two tests are proposed and their asymptotic distributions under the hypothesis as well as under the alternatives are given under some mild conditions. A theoretical comparison of these powers is made. (c) 2006 Elsevier Inc. All rights reserved.
This paper proposes a new method for tumor classification using gene expression data, consisting of three main steps. First, genes are selected from the original DNA microarray gene expression data using t-statistics. Second, the selected genes are modeled by Independent Component Analysis (ICA). Finally, a Support Vector Machine (SVM) is used to classify the modeled data. To show the validity of the proposed method, we apply it to two DNA microarray data sets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.
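The first step of the abstract above, selecting genes by t-statistic, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which t-statistic variant is used, so the Welch (unequal-variance) form is assumed here, and the gene names, toy values, and the `rank_genes` helper are hypothetical. The ICA and SVM steps are not shown.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t-statistic with unequal variances (Welch form, assumed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def rank_genes(expr, labels, top_k):
    """Return the top_k genes by absolute t-statistic.

    expr:   {gene name: [expression value per sample]}
    labels: one 0/1 class label per sample (0 = normal, 1 = tumor)
    """
    scores = {}
    for gene, values in expr.items():
        normal = [v for v, y in zip(values, labels) if y == 0]
        tumor = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(normal, tumor))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data with hypothetical gene names: 3 normal and 3 tumor samples
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],  # strongly shifted in tumor
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no shift between classes
    "geneC": [0.5, 0.7, 0.6, 1.5, 1.2, 1.6],  # moderately shifted
}
labels = [0, 0, 0, 1, 1, 1]
selected = rank_genes(expr, labels, top_k=2)
print(selected)
```

Ranking by the absolute value treats up-regulated and down-regulated genes alike; a signed ranking would be a different design choice.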
ISBN (print): 1424404657
Gene selection is one of the major challenges of biochip technology. It addresses the curse of dimensionality, which arises especially in DNA microarray datasets where there are thousands of genes and only a few experiments (samples), and it supports gene-based diagnosis, where a small gene subset is enough to diagnose disease. This paper presents a gene selection method that trains linear SVM (support vector machine) and nonlinear MLP (multi-layer perceptron) classifiers and tests them with cross-validation to find a gene subset that is optimal or suboptimal for diagnosing binary or multiple disease classes. Genes are first selected incrementally with a linear SVM classifier for the diagnosis of each binary disease class pair, with generalization ability tested by leave-one-out cross-validation. The union of these genes is used as the initial gene subset for discriminating all the disease classes, from which genes are deleted one by one, decrementally, by removing the gene which brings the greatest decrease of the generalization power after the removal, where generalization is measured by leave-one-out and leave-4-out cross-validation. For real DNA microarray data with 2308 genes and only 64 labelled samples belonging to 4 disease classes, only 6 genes are selected as diagnostic genes. The diagnostic genes are tested with a 6-2-4 MLP using both leave-one-out and leave-4-out cross-validation, resulting in no misclassification.
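The decremental stage described above, deleting genes one at a time while monitoring leave-one-out performance, can be sketched roughly as below. This is not the paper's implementation: a nearest-centroid classifier is swapped in for the linear SVM to keep the example dependency-free, the removal criterion assumed here is "drop the gene whose removal leaves the lowest leave-one-out error, and stop when every removal makes the error worse", and the toy data and function names are hypothetical.

```python
def centroid_predict(train_X, train_y, x, genes):
    """Nearest-centroid prediction restricted to the given gene indices."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        dist = 0.0
        for g in genes:
            center = sum(r[g] for r in rows) / len(rows)
            dist += (x[g] - center) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_errors(X, y, genes):
    """Number of leave-one-out misclassifications using only `genes`."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if centroid_predict(train_X, train_y, X[i], genes) != y[i]:
            errors += 1
    return errors

def backward_eliminate(X, y, genes):
    """Drop one gene at a time while the leave-one-out error does not get worse."""
    genes = list(genes)
    current = loo_errors(X, y, genes)
    while len(genes) > 1:
        # Leave-one-out error after removing each remaining gene in turn
        trials = [(loo_errors(X, y, [g for g in genes if g != drop]), drop)
                  for drop in genes]
        best_err, drop = min(trials)
        if best_err > current:
            break  # every removal hurts, so keep the current subset
        genes.remove(drop)
        current = best_err
    return genes, current

# Toy data: gene 0 separates the two classes, genes 1 and 2 carry no signal
X = [[0, 5, 2], [2, 1, 4], [1, 3, 6],
     [10, 5, 2], [12, 1, 4], [9, 3, 6]]
y = [0, 0, 0, 1, 1, 1]
kept, err = backward_eliminate(X, y, [0, 1, 2])
print(kept, err)
```

On this toy data the two uninformative genes are removed and only gene 0 is kept. Ties between candidate removals are broken by gene index, which is an arbitrary choice.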
Authors: Molinaro, AM; Simon, R; Pfeiffer, RM
NCI, Biostat Branch, Div Canc Epidemiol & Genet, NIH, Rockville, MD 20852 USA
NCI, Biometr Res Branch, Div Canc Treatment & Diagnost, NIH, Rockville, MD 20852 USA
Yale Univ, Sch Med, Dept Epidemiol & Publ Hlth, New Haven, CT 06520 USA
Motivation: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection. Results: For small studies where features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV) and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean square error. The .632+ bootstrap is quite biased in small sample sizes with strong signal-to-noise ratios. Differences in performance among resampling methods are reduced as the number of specimens available increases. Contact: ***@*** Supplementary Information: A complete compilation of results and R code for simulations and analyses are available in Molinaro et al. (2005) (http://***/brb/***).
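The contrast between resubstitution and cross-validation "in the presence of feature selection" can be sketched as below. This is a minimal illustration, not the authors' R code: a nearest-centroid classifier and a class-mean-gap feature filter are assumed for simplicity, the data are simulated pure noise with a fixed seed, and all function names are hypothetical. The key point is that the feature filter is re-run inside every leave-one-out fold.

```python
import random

def top_features(X, y, k):
    """Indices of the k features with the largest gap between class means."""
    def gap(j):
        a = [r[j] for r, c in zip(X, y) if c == 0]
        b = [r[j] for r, c in zip(X, y) if c == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]

def predict(X, y, x, feats):
    """Nearest-centroid label (0 or 1) for x using only `feats`."""
    def dist(label):
        rows = [r for r, c in zip(X, y) if c == label]
        return sum((x[j] - sum(r[j] for r in rows) / len(rows)) ** 2
                   for j in feats)
    return 0 if dist(0) <= dist(1) else 1

def resubstitution_error(X, y, k):
    """Select features and train on all samples, then test on the same samples."""
    feats = top_features(X, y, k)
    return sum(predict(X, y, x, feats) != c for x, c in zip(X, y)) / len(X)

def loocv_error(X, y, k):
    """Leave-one-out error with feature selection repeated inside each fold."""
    wrong = 0
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_features(tr_X, tr_y, k)  # held-out sample plays no part
        wrong += predict(tr_X, tr_y, X[i], feats) != y[i]
    return wrong / len(X)

# Pure-noise data: 20 samples, 500 features, labels unrelated to the features
rng = random.Random(0)
n, p = 20, 500
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [0] * 10 + [1] * 10
resub = resubstitution_error(X, y, k=10)
cv = loocv_error(X, y, k=10)
print("resubstitution:", resub, "LOOCV:", cv)
```

Because the labels here carry no signal, an honest error estimate should sit near 50%. The resubstitution figure is expected to look optimistic because the same samples both chose the features and scored the classifier.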
ISBN (print): 081945642X
In this paper, we try to provide a global view of the DNA microarray gene expression data analysis and modeling process by combining novel and effective visualization techniques with data mining algorithms. An integrated framework is proposed to model and visualize short, high-dimensional gene expression data. The framework reduces the dimensionality of the variables before applying an appropriate temporal modeling method. A prototype has been built using Java3D to visualize the framework. The prototype takes gene expression data as input, clusters the genes, displays the clustering results using a novel graph layout algorithm, models individual gene clusters using a Dynamic Bayesian Network, and then visualizes the modeling results using simple but effective visualization techniques.
This dissertation deals with prediction using high-dimensional gene expression data. The amount of available genomic data has grown significantly over the last decade. Combining gene expression data with other data finds application in many areas. For example, in clinical cancer management it can contribute to a more accurate determination of disease prognosis. The main part of this dissertation focuses on combining gene expression data and clinical data. We use logistic regression models built with various regularization techniques. Generalized linear models make it possible to combine models with different data structures. The dissertation shows that combining a model of gene expression data with clinical data can improve prediction results compared with a model built from gene expression data alone or from clinical data alone. The proposed procedures are not computationally demanding. Testing is performed first on simulated data sets in various settings and then on real benchmark data. We also address determining the added value of microarray data. The dissertation includes a comparison of features selected by a gene expression classifier on five different breast cancer data sets. We also propose a feature selection procedure that combines gene expression data with knowledge from gene ontologies.