ISBN: 9781509055210 (print)
The number of genes in microarray data is much larger than the number of available effective samples. Consequently, handling a small number of microarray samples that are described by high-dimensional features, with high correlations and strong interference from redundancy and noise, has become one of the important tasks in gene microarray feature extraction and classification. In this paper, a new method is proposed. We use wavelet decomposition to extract features from gene microarray data, and then use the t-test, ReliefF, the Wilcoxon test, and other algorithms to select features from the wavelet-transformed data. Finally, the Borda method is used to merge the resulting rankings. Three datasets were used in the experiment, namely the leukemia dataset, the prostate dataset, and the lung cancer dataset. Experimental results show that the proposed method can effectively classify cancer gene microarray data.
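The final merging step can be illustrated with a plain Borda count. This is a minimal sketch rather than the authors' implementation: the function name `borda_merge`, the gene identifiers, and the three example rankings are hypothetical, and the abstract does not say which Borda variant or tie-breaking rule was used.

```python
def borda_merge(rankings):
    """Merge several rankings (best gene first) with a Borda count."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            # The top gene earns n - 1 points, the last gene earns 0.
            scores[gene] = scores.get(gene, 0) + (n - 1 - position)
    # Sort by total points (descending); ties are broken by gene name
    # (an assumption made here only to keep the output deterministic).
    return sorted(scores, key=lambda g: (-scores[g], g))


# Hypothetical rankings produced by three filters on the same four genes.
t_test_rank = ["g1", "g2", "g3", "g4"]
relieff_rank = ["g2", "g1", "g4", "g3"]
wilcoxon_rank = ["g1", "g3", "g2", "g4"]

merged = borda_merge([t_test_rank, relieff_rank, wilcoxon_rank])
print(merged)
```

A gene that ranks consistently high across filters accumulates the most points, so a single filter's outlier ranking has limited influence on the merged order.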
We present an “upstream analysis” strategy for causal analysis of multiple “-omics” data. It analyzes promoters using the TRANSFAC database, combines it with an analysis of the upstream signal transduction pathway...
ISBN: 9781467301275 (print)
Microarray experiments usually generate multiple missing values in gene expression data sets for several reasons. In this paper, a robust method is proposed to estimate the missing values in microarray data using biological knowledge of the genes. Missing values are imputed based on the similarity of the characteristic patterns observed among co-regulated genes. In this approach, the microarray data are first normalized, and the normalized data are then discretized to measure the similarity between genes. The estimation accuracy of the proposed method is compared with that of the existing K-Nearest Neighbor based method and Pattern Similarity Matching (PSM) on 4000 genes from 192 experiments. The experimental results show that the proposed method outperforms the other methods in terms of accuracy.
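For orientation, the K-Nearest Neighbor baseline mentioned above can be sketched as follows. This is an illustrative version under stated assumptions, not the proposed method and not the exact baseline used in the paper: the function name `knn_impute`, the root-mean-square distance over shared experiments, the unweighted average, and the toy matrix are all choices made here.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries of a genes x experiments matrix from the k nearest genes."""

    def distance(a, b):
        # Root-mean-square difference over experiments observed in both genes.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]  # leave the input untouched
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other genes with a value in experiment j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            nearest = sorted(candidates, key=lambda c: c[0])[:k]
            if nearest:
                # Unweighted mean of the neighbours' values in experiment j.
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy expression matrix (rows: genes, columns: experiments); None marks a gap.
expr = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.1],
    [0.9, 1.9, 2.9],
    [5.0, 6.0, 7.0],
]
filled = knn_impute(expr, k=2)
print(filled[0])
```

The missing entry of the first gene is estimated from the two genes whose observed profiles are closest to it, which is the purely numerical notion of similarity that knowledge-based methods aim to improve on.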
Feature subset selection is a well-known pattern recognition problem, which aims to reduce the number of features used in *** or recognition. This reduction is expected to improve the performance of *** algorithms in terms of speed, accuracy and simplicity. Most existing feature selection methods are not suitable for microarray data, so this paper focuses on the gene selection problem. The main contributions of this paper are that a new feature selection method, A-score, is introduced and an improved fuzzy Bayesian *** is constructed. We evaluate the performance of A-score using three well-known benchmark data sets (the iris data, the wine data, and the Wisconsin breast cancer data) and two microarray data sets (ALL-AML Leukemia and colon cancer). In general, A-score can *** reduce the number of genes, and performs better than T-score and C-score.
Many bioinformatics analytical tools, especially those for cancer classification and prediction, require complete data matrices. Missing values in gene expression studies significantly influence the interpretation of the final data. However, to most analysts' dismay, this has become a common problem, and relevant missing value imputation algorithms therefore have to be developed and/or refined to address it. This paper presents a review of preferred and available methods for the imputation of missing values in gene expression data. Focus is placed on the ability of the algorithms to use local or global data correlation to estimate the missing values. The approaches are categorized into global, local, hybrid, and knowledge-assisted approaches. The methods presented are accompanied by suitable performance evaluation. The aim of this review is to highlight possible improvements on existing research techniques, rather than to recommend new algorithms with the same functional aim.
This paper studies differential equation-based mathematical models and their numerical solutions for genetic regulatory network identification. The primary objectives are to design, analyze, and test a general variational framework, together with numerical methods for computing its approximate solutions, for reverse engineering genetic regulatory networks from microarray datasets. In the proposed variational framework, no structural assumption is made on the genetic network; instead, the network is determined solely by the microarray profile of the network components and is identified through a well-chosen variational principle which minimizes an energy functional. The variational principle not only serves as a selection criterion to pick out the right solution of the underlying differential equation model but also provides an effective mathematical characterization of the small-world property of genetic regulatory networks which has been observed in lab experiments. Five specific models within the variational framework, along with efficient numerical methods and algorithms for computing their solutions, are proposed and analyzed. All five proposed variational models are validated using both synthetic network datasets and subnetwork datasets of Saccharomyces cerevisiae (yeast) and E. coli, and a performance comparison against some existing genetic regulatory network identification methods is also provided.
Most current false discovery rate (FDR) procedures for microarray experiments assume restrictive dependence structures, which makes them less reliable. An FDR controlling procedure under suitable dependence structures, based on a Poisson distributional approximation, is presented. Unlike other procedures, the distribution of false null hypotheses is estimated using kernel density estimation, allowing for dependence structures among the genes. Furthermore, we develop an FDR framework that minimizes the false nondiscovery rate (FNR) subject to a constraint on the controlled level of the FDR. The performance of the proposed FDR procedure is compared with that of other existing FDR controlling procedures, with an application to the microarray study of simulated data.
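As a point of reference for the "existing FDR controlling procedures", the classical Benjamini-Hochberg step-up rule is sketched below. It is not the Poisson-approximation procedure of this abstract, and it is guaranteed to control the FDR only under independence or certain positive dependence among the tests, which is exactly the kind of restriction the abstract criticizes. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank r (1-based) with p_(r) <= r * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values from a differential expression test.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.60]
decisions = benjamini_hochberg(p_vals, alpha=0.05)
print(decisions)
```

With five tests the rank thresholds are 0.01, 0.02, 0.03, 0.04 and 0.05, so only the two smallest p-values fall under their thresholds and are rejected.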
The family of discriminant neighborhood embedding (DNE) methods comprises typical graph-based methods for dimension reduction and has been successfully applied to face recognition. This paper proposes a new variant of DNE, called similarity-balanced discriminant neighborhood embedding (SBDNE), and applies it to cancer classification using gene expression data. By introducing a novel similarity function, SBDNE treats pairs of data points from the same class and pairs from different classes in different ways. The homogeneous and heterogeneous neighbors are selected according to the new similarity function instead of the Euclidean distance. SBDNE constructs two adjacency graphs, namely a between-class adjacency graph and a within-class adjacency graph, using the new similarity function. From these two graphs, we can generate the local between-class scatter and the local within-class scatter, respectively. Thus, SBDNE can maximize the between-class scatter and simultaneously minimize the within-class scatter to find the optimal projection matrix. Experimental results on six microarray datasets show that SBDNE is a promising method for cancer classification. (C) 2015 Elsevier Ltd. All rights reserved.
Cancer is a systemic disease involving dysregulated biological processes of cell proliferation, metabolism, and apoptosis. It is known that some types of cancer are associated with longer survival, and that they are even curable if they are diagnosed and treated properly at an early stage. It is therefore essential to find biomarkers that detect these cancers in their early stages. With the rapid development of high-throughput microarray and sequencing technologies, many biomarker-based assays for early cancer diagnosis have been proposed, and some are already available on the market. Most cancer biomarkers are detected by comparing cancer samples with normal samples for a certain cancer type, but most of them are not compared against other cancer types. In this research, we propose a novel computational method to comprehensively detect highly accurate cancer biomarkers for different groups of cancer types, with a special emphasis on detection specificity against control samples, which include both samples from healthy persons and samples from other cancer types. Such biomarkers are called specific biomarkers for a given cancer group, which may be defined as cancers of the same type, cancers with similar survival rates, grades, or development stages, or cancers in the same human body system, etc. The proposed algorithm is extensively evaluated across eight cancer types, and the detection performance shows that the specific biomarkers have reasonable sensitivities and very high specificities. The main contributions of this work are (a) the detection of highly specific biomarkers for eight cancer types and (b) the detection of specific biomarkers for cancers with similar survival rates. The proposed algorithm may also be used to detect specific biomarkers for cancers of given stages, grades, or body systems, etc.
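The two evaluation measures named above can be made concrete with a small calculation. This is a sketch under assumptions, not the authors' evaluation code: the function name, the labels, and the predictions are invented for illustration. Following the abstract, the negative class is taken to contain both healthy samples and samples from other cancer types.

```python
def sensitivity_specificity(labels, predictions):
    """Compute (sensitivity, specificity) from boolean labels and predictions.

    True means "belongs to the target cancer group"; False covers every
    control sample (healthy persons and other cancer types alike).
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # target, detected
    fn = sum(1 for y, p in pairs if y and not p)      # target, missed
    tn = sum(1 for y, p in pairs if not y and not p)  # control, not flagged
    fp = sum(1 for y, p in pairs if not y and p)      # control, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: 4 target-group samples followed by 6 control samples.
labels = [True] * 4 + [False] * 6
predictions = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Including other cancer types among the controls makes the specificity figure stricter than a cancer-versus-healthy comparison, because a biomarker shared by several cancers would then count as a false positive.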