ISBN: 9781467311830 (print)
The prediction of cancer progression is one of the most challenging problems in oncology. In this paper, we apply the penalized logistic model to microarray data in combination with co-expressed genes to identify patients with prostate cancer progression. Compared with conventional methods, penalized logistic regression (PLR) has advantages such as providing an estimate of the probability of the classification label, genetic interpretation of regression coefficients, and short computation time. We employed the top scoring pair (TSP) approach to select genes for PLR. The TSP method was originally proposed for binary classification of phenotypes according to the relative expression of one gene pair. In the proposed algorithm, we first identified co-expressed TSP genes and then applied PLR to the microarray data to predict prostate cancer progression. We identified three gene pairs associated with prostate cancer progression for the PLR model. We compared our approach with standard classification techniques such as support vector machines (SVMs), Lasso, and Fisher discriminant analysis (FDA). We found that our method yielded better performance in terms of classification and prediction. Furthermore, it has the advantage of providing the underlying probability of the predicted class, robust biomarker genes, and interpretable regression coefficients.
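The pair-selection step described above can be illustrated with a minimal sketch of the classic single-pair TSP score: for genes i and j, the score is the absolute difference between the two classes in how often gene i is expressed below gene j. This is not the paper's implementation; the function names, the toy data, and the 0/1 label encoding are assumptions made for illustration, and the co-expression filtering and the PLR fit are left out.

```python
from itertools import combinations


def pair_frequency(samples, i, j):
    """Fraction of samples in which gene i is expressed below gene j."""
    return sum(1 for s in samples if s[i] < s[j]) / len(samples)


def top_scoring_pair(X, y):
    """Return (i, j, score) for the gene pair with the largest TSP score.

    X: list of samples, each a list of expression values (one per gene).
    y: list of binary class labels (0 or 1), an assumed encoding.
    """
    class0 = [row for row, label in zip(X, y) if label == 0]
    class1 = [row for row, label in zip(X, y) if label == 1]
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        # Score: how differently the ordering of genes i and j behaves per class.
        score = abs(pair_frequency(class0, i, j) - pair_frequency(class1, i, j))
        if best is None or score > best[2]:
            best = (i, j, score)
    return best


# Hypothetical toy data: 4 samples, 3 genes. Genes 0 and 1 swap their
# relative order between the classes; gene 2 carries no class signal.
X = [
    [1.0, 5.0, 10.0],
    [2.0, 6.0, 0.0],
    [7.0, 1.0, 10.0],
    [8.0, 2.0, 0.0],
]
y = [0, 0, 1, 1]
best_i, best_j, best_score = top_scoring_pair(X, y)
```

Because the score depends only on the ordering of two expression values within a sample, it is unaffected by monotone per-sample normalization. In a pipeline like the one the abstract describes, the selected pairs would then serve as inputs to a penalized logistic model.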
In this paper, we propose a novel method based on the support vector machine (SVM) for microarray classification and gene (feature) selection. The proposed method, called similarity-based SVM (SSVM), incorporates prior knowledge of gene similarity into the standard SVM by combining the standard l2 norm with a similarity penalty over all the genes. Preliminary experiments show that our method performs better than the standard SVM, l2-l0 SVM, and SVM-RFE, especially when the features are highly similar.
Correctly inferring gene regulatory networks in order to understand the intricacies of complex biological regulation remains an intriguing task for researchers. With the availability of high-dimensional microarray data, relationships among thousands of genes can be extracted simultaneously. Among the prevalent models for reverse engineering genetic networks, the S-system is considered an efficient mathematical tool. In this paper, the Bat algorithm, based on the echolocation of bats, is used to optimize the S-system model parameters. A decoupled S-system is implemented to reduce the complexity of the algorithm. The proposed method was first tested successfully on an artificial network with and without noise. Based on the fact that a real-life genetic network is sparsely connected, a novel Accumulative Cardinality based decoupled S-system is proposed. The cardinality is varied from zero up to a maximum value, and this model is applied to the reconstruction of the DNA SOS repair network of Escherichia coli. The results show significant improvements over other existing methods, both in detecting a greater number of true regulations and in minimizing false detections.
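For reference, the S-system mentioned above is usually written in the following standard form. The abstract does not state its notation, so the symbols here are the conventional ones rather than the authors' own: $X_i$ is the expression level of gene $i$, $N$ is the number of genes, $\alpha_i, \beta_i \ge 0$ are rate constants, and $g_{ij}, h_{ij}$ are kinetic orders.

```latex
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{N} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{N} X_j^{h_{ij}}, \qquad i = 1, \dots, N
```

A nonzero $g_{ij}$ or $h_{ij}$ indicates that gene $j$ regulates gene $i$, so a sparsely connected network corresponds to most kinetic orders being zero. In a decoupled formulation, the $2N + 2$ parameters of each equation are estimated separately for each $i$ rather than fitting all $2N(N+1)$ parameters at once.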
ISBN: 9781920682729 (print)
In this paper we aim to infer a model of genetic networks from time series data of gene expression profiles by using a new gene expression programming algorithm. Gene expression networks are modelled by differential equations which represent temporal gene expression relations. Gene Expression Programming is a new extension of genetic programming. Here we combine a local search method with gene expression programming to form a memetic algorithm, in order not only to find the system of differential equations but also to fine-tune its constant parameters. The effectiveness of the proposed method is demonstrated by comparing its performance with that of conventional genetic programming applied to this problem in previous studies.
The identification of marker genes that trigger the growth of mutated cells has received significant attention from both the medical and computing communities. Through the identified genes, the pathology of mutated cells can be revealed and precautions can be taken to prevent further proliferation of abnormal cells. In this paper, we propose an innovative gene identification framework based on genetic algorithms and neural networks to identify marker genes for leukaemia. Our approach is able to provide a sharper focus on a group of highly expressed genes in a leukaemia dataset, and the identified genes have been shown to be significant to the study of leukaemia development.
ISBN: 9781467308946 (print)
In this paper, a model for integrating cancer patient data types, such as DNA microarray data and clinical data, is explored experimentally. The integrated data are used for cancer subtype identification with kernel-based classification methods that extend the Support Vector Machine (SVM) approach with Kernel Dimensionality Reduction (KDR). The KDR-SVM method is applied to a lymphoma cancer database and the relevant clinical information. Each data type is represented by an appropriate kernel matrix. The results of the experiment show that KDR with 10 dimensions, together with data integration, can improve the accuracy of cancer subtype identification.
ISBN: 9781479925797 (print)
Insecticide resistance, an inherited trait that involves alteration in one or more of an insect's genes, is now a major public health challenge undermining global malaria control efforts. Through P450 pathways, Anopheles has developed heavy resistance to pyrethroids, the only class recommended by the World Health Organization (WHO) for Indoor Residual Spray (IRS) and Long-Lasting Insecticide Treated Nets (LLITNs). We used the biochemical network of Anopheles gambiae (henceforth Ag) to deduce its resistance mechanism(s) from two expression datasets (Ag treated with pyrethroid, and a control). The computational techniques employed are accessible through a robust, multi-faceted, and user-friendly automated graphical user interface (GUI) called 'workbench', built with JavaFX Scene Builder. In this work, we introduce a computational platform to determine and elucidate, for the first time, the resistance mechanism to a commonly used class of insecticide, pyrethroids. Significantly, ours is the first computational work to identify genes associated with or involved in the efflux system in Ag as a resistance mechanism in Anopheles.
Motivation: When analyzing expression experiments, researchers are often interested in identifying the set of biological processes that are up- or down-regulated under the experimental condition studied. Current approaches, including clustering expression profiles and averaging the expression profiles of genes known to participate in specific processes, fail to provide an accurate estimate of the activity levels of many biological processes. Results: We introduce a probabilistic continuous hidden process model (CHPM) for time series expression data. CHPM can simultaneously determine the most probable assignment of genes to processes and the level of activation of these processes over time. To estimate model parameters, CHPM uses multiple time series datasets and incorporates prior biological knowledge. Applying CHPM to yeast expression data, we show that our algorithm produces more accurate functional assignments for genes than other expression analysis methods. The inferred process activity levels can be used to study the relationships between biological processes. We also report new biological experiments confirming some of the process activity levels predicted by CHPM.
Finding the minimum number of appropriate biomarkers for specific targets such as lung cancer has been a challenging issue in bioinformatics. We propose a hierarchical two-phase framework for selecting appropriate biomarkers: it first extracts candidate biomarkers from cancer microarray datasets and then selects the minimum number of appropriate biomarkers from the extracted candidates with a specific neuro-fuzzy algorithm called a neural network with weighted fuzzy membership function (NEWFM). In the first phase, the proposed framework extracts candidate biomarkers by using the Bhattacharyya distance, which measures the similarity of two discrete probability distributions. Finally, the proposed framework is able to reduce the cost of finding biomarkers by not receiving medical supplements and to improve the accuracy of the biomarkers in specific cancer target datasets.
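The Bhattacharyya distance used in the first phase can be sketched as follows for two discrete distributions p and q: the Bhattacharyya coefficient is BC = sum_i sqrt(p_i * q_i), and the distance is D = -ln(BC). This is a minimal illustration only; the function name and the example histograms are hypothetical, and the abstract does not say how expression values are binned into distributions.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete probability distributions.

    p and q are sequences of non-negative probabilities over the same bins,
    each assumed to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Bhattacharyya coefficient: overlap between the two distributions.
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0.0:
        # No overlapping bins: the distance is unbounded.
        return math.inf
    return -math.log(coefficient)


# Hypothetical example: per-class histograms of one gene's expression level.
tumor_hist = [0.5, 0.5]
normal_hist = [1.0, 0.0]
distance = bhattacharyya_distance(tumor_hist, normal_hist)
```

Identical distributions give a distance of zero and non-overlapping ones an infinite distance, so a gene whose per-class histograms are far apart under this measure is a natural candidate biomarker.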