ISBN: 9780769548012 (print)
An evolutionary rough feature selection algorithm is proposed for classifying gene expression patterns. Because the data typically contain a large number of redundant features, an initial redundancy reduction of the attributes is performed to enable faster convergence. Rough set theory is employed to generate the distinction table that enables PSO to find reducts, which are minimal sets of non-redundant features capable of discerning between all objects. The effectiveness of the algorithm is demonstrated on three benchmark cancer datasets, viz. Colon, Lymphoma and Leukemia, using MOGA.
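The distinction table and the notion of a reduct mentioned in this abstract can be illustrated with a small sketch. It assumes already-discretized attributes and a toy decision table; the function names (`distinction_table`, `discerns_all`, `is_reduct`) are hypothetical, and the evolutionary search the abstract describes is not shown.

```python
from itertools import combinations


def distinction_table(X, y):
    """Build one row per pair of objects with different class labels.

    Each entry is 1 where the two objects differ on that attribute, else 0.
    """
    rows = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j]:
            rows.append([int(a != b) for a, b in zip(X[i], X[j])])
    return rows


def discerns_all(rows, subset):
    """True if every object pair is separated by at least one chosen attribute."""
    return all(any(row[k] for k in subset) for row in rows)


def is_reduct(rows, subset):
    """True if the subset discerns all pairs and no attribute can be dropped."""
    if not discerns_all(rows, subset):
        return False
    return all(
        not discerns_all(rows, [k for k in subset if k != dropped])
        for dropped in subset
    )


# Toy decision table (assumed data): 4 objects, 3 binary attributes, 2 classes.
X = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
y = [0, 0, 1, 1]
rows = distinction_table(X, y)
```

In this toy table attribute 1 alone separates the two classes, so `[1]` is a reduct, while `[1, 2]` discerns all pairs but is not minimal. A search procedure such as the one in the abstract would look for such minimal subsets over far larger attribute sets.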
ISBN: 9788993215045 (print)
Recent technological progress in high-throughput measurement of gene expression, such as microarray analysis, enables us to collect time-series gene expression data for each of tens of thousands of genes. Although genomic analysis of such data has identified key genes related to various diseases, few results on the estimation of gene regulatory networks from real microarray data are available so far. Recently, the immediate early response (IER) genes upon epidermal growth factor stimulation in a human breast cancer cell line, MCF-7, have been identified: time-course microarray data were measured over 90 minutes, and 63 IER genes were chosen from tens of thousands of genes by statistical analysis. In this paper, we estimate the gene regulatory networks among the 63 IER genes. To this end, we apply to the microarray data an estimation method based on mixed logical dynamical modeling developed in an earlier study. However, the original method is designed for continuous gene expression time-series data, whereas real microarray time-course data have very few time points. In addition, some preset parameters in the model are critical for successful network estimation. We therefore add a preprocessing step and a Monte Carlo-based calculation to the original method.
ISBN: 9781467301831 (print)
Clustering is an important data processing tool for interpreting microarray data and for genomic network inference. In this paper, we propose a non-parametric Bayesian clustering algorithm based on hierarchical Dirichlet processes (HDP). By introducing a hierarchical structure in the model, the proposed algorithm captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor. We conduct experiments on the yeast galactose and yeast cell cycle datasets and compare our clustering results with the standard results. The proposed algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. The experiments also show that it provides more information and produces fewer unnecessary clustering fragments than the clustering algorithm based on the Dirichlet mixture model.
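The Chinese restaurant metaphor referred to in this abstract can be sketched as follows. This is only the plain Chinese restaurant process prior with an assumed concentration parameter `alpha`. It is not the authors' HDP Gibbs sampler, which would also weigh each table by the likelihood of the data and share clusters across groups. The function name `crp_assignments` is hypothetical.

```python
import random


def crp_assignments(n_customers, alpha, seed=0):
    """Seat customers one by one according to the Chinese restaurant process.

    Customer n (0-based) joins existing table k with probability
    counts[k] / (n + alpha) and opens a new table with probability
    alpha / (n + alpha).
    """
    rng = random.Random(seed)
    counts = []        # number of customers at each table
    assignments = []   # table index chosen by each customer
    for _ in range(n_customers):
        # The last weight stands for "open a new table".
        weights = counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts


assignments, counts = crp_assignments(n_customers=50, alpha=1.0, seed=42)
```

Larger values of `alpha` tend to open more tables, which corresponds to more clusters. The number of clusters is not fixed in advance, which is the property that makes this construction useful for non-parametric clustering.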
ISBN: 9781467301831 (print)
This paper considers uncovering transcription factor (TF) mediated regulatory networks from microarray expression data and prior knowledge. Bayesian factor models that model direct TF regulation are formulated. To address the enormous computational complexity of the factor model for large networks, a novel, efficient basis-expansion factor model (BE-FaM) is proposed, in which the loading (regulatory) matrix is modeled as an expansion of basis functions of much lower dimension. BE-FaM achieves a large reduction in complexity because inference instead involves estimating expansion coefficients of much lower dimension. We also address the issue of incorporating prior knowledge of TF regulation to constrain the factor loading matrix. A Gibbs sampling solution is developed to estimate the unknowns. The proposed model was validated by simulation and then applied to breast cancer genomic data to uncover the corresponding TF regulatory networks.
ISBN: 9781467327466; 9781467327459 (print)
Analysis of microarray data sets is a challenging high-dimensional problem in which many weakly relevant or redundant features hurt the generalization performance of classifiers. Previous work used redundant feature detection methods to select a compact, discriminative gene set, but these methods considered only the relationships among features, not the redundancy of classification ability among features. Here, we propose a novel algorithm named RESI (Redundant fEature Selection depending on Instance), which takes label information into account when measuring feature subset redundancy. Experimental results on benchmark data sets show that RESI performs better than previous state-of-the-art redundant feature selection methods such as mRMR.
Objective: With the dramatic increase in microarray data, biclustering has become a promising tool for gene expression analysis. Biclustering has been proven to be superior over clustering in identifying multifunction...
ISBN: 9783642245527; 9783642245534 (print)
Gene association plays an important role in the complex genetic pathology of cancer. However, the development of methods for finding cancer-related gene associations is still in its infancy. Based on the biological concept of a gene association module (GAM), comprising a center gene and its expression-related genes, this paper proposes a gene association detection model called kernel GAM (kGAM). The model assumes that the expression of the center gene can be predicted from the expression-related genes. After a cost function is defined, a kernel ridge regression algorithm is developed to solve the kGAM model. Finally, a heuristic search procedure is designed to identify a compact GAM for a given center gene. Experimental results on three publicly available gene expression data sets show the effectiveness and efficiency of the proposed kGAM model in identifying cancer-related gene association patterns.
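The idea of predicting a center gene from its expression-related genes with kernel ridge regression can be sketched as below. This is standard kernel ridge regression with an assumed RBF kernel, written in plain Python. The kGAM cost function and heuristic search are not given in the abstract and are not reproduced here. The names `fit_krr` and `predict_krr`, the parameters `lam` and `gamma`, and the toy data are all hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two expression vectors (assumed kernel)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_krr(X, y, lam=1e-3, gamma=1.0):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [
        [rbf_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return solve(K, y)


def predict_krr(X_train, alpha, x_new, gamma=1.0):
    """Predict the center-gene expression for one new sample."""
    return sum(a * rbf_kernel(xi, x_new, gamma) for a, xi in zip(alpha, X_train))


# Toy data (assumed): rows are samples, columns are two expression-related genes.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [0.0, 1.0, 1.0, 2.0]  # expression of the center gene in each sample
alpha = fit_krr(X, y, lam=1e-6)
fitted = [predict_krr(X, alpha, x) for x in X]
```

With a very small `lam` the fitted values nearly reproduce the training targets. A larger `lam` trades that fit for smoother predictions, which matters when there are few samples and many genes.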
ISBN: 9781467308946; 9788994364261 (print)
In this paper, an integration model for cancer patient data types, such as DNA microarray and clinical data, is explored experimentally. The integrated data are used for cancer subtype identification with kernel-based classification methods that extend the Support Vector Machine (SVM) approach with Kernel Dimensionality Reduction (KDR). The KDR-SVM method is applied to a lymphoma cancer database and the relevant clinical information. Each data type representation is modeled as an appropriate kernel matrix. The experimental results show that the KDR-IO dimensions and data integration can improve the accuracy of cancer subtype identification.
Mining gene expression data to identify genes associated with patient survival, so that such genes can be used to achieve more accurate prognoses, is an ongoing problem in microarray-based cancer prognostic studies. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is determined by a tuning parameter that is often chosen by cross-validation. The model selected by cross-validation contains many false positives, that is, genes whose true coefficients are zero. We propose a method for estimating the false positive rate (FPR) of lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate obtained by the proposed method. We applied the proposed method to real data and illustrated the identification of false positive genes.
Modified correlation Technique has been proposed to analyze the microarray data and to search gene related information. We have used mean absolute deviation as a new approach for the differential expression analysis i...