ISBN (print): 9780769536859
We compare methods based on the Signed Distance Function (SDF), a new tool for binary classification, with standard Support Vector Machine (SVM) methods. On several microarray data sets, we demonstrate that the performance of the SDF-based methods can match or exceed that of SVM methods.
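The abstract does not say how the signed distance function is constructed. As a rough illustration only, the sketch below scores a sample by the difference between its distance to the nearest negative training example and its distance to the nearest positive one. The sign of that score agrees with a signed distance to the nearest-neighbour decision boundary, although its magnitude is not that distance. The functions `euclidean`, `signed_score` and `sdf_classify` are hypothetical names, and this is not the authors' construction.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def signed_score(x: Point, positives: List[Point], negatives: List[Point]) -> float:
    """Hypothetical signed score: > 0 when x is nearer the positive class,
    < 0 when nearer the negative class, 0 on the nearest-neighbour boundary."""
    d_pos = min(euclidean(x, p) for p in positives)
    d_neg = min(euclidean(x, n) for n in negatives)
    return d_neg - d_pos


def sdf_classify(x: Point, positives: List[Point], negatives: List[Point]) -> int:
    """Predict +1 or -1 from the sign of the score (ties go to +1)."""
    return 1 if signed_score(x, positives, negatives) >= 0 else -1


# Usage with made-up two-gene profiles.
positives = [(1.0, 1.0), (1.2, 0.8)]
negatives = [(-1.0, -1.0), (-0.9, -1.3)]
label = sdf_classify((0.9, 1.1), positives, negatives)
```

A real SDF method would replace `signed_score` with a function whose magnitude is the actual distance to a learned boundary; only the sign-based decision rule carries over.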
ISBN (print): 9780769536859
We introduce a simple and computationally trivial method for binary classification based on the evaluation of potential functions. We demonstrate that, despite the conceptual and computational simplicity of the method, its performance can match or exceed that of standard Support Vector Machine methods.
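A minimal sketch of potential-function classification, assuming an inverse-square potential (the abstract does not specify which potential is used): every training sample acts as a charge of +1 or -1 according to its class, and a new sample receives the sign of the total potential it feels. There is no training phase, and prediction costs one distance per training sample, which illustrates why such a method can be called computationally trivial. The names `potential`, `potential_classify`, `power` and `eps` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def potential(x: Point, source: Point, power: float = 2.0, eps: float = 1e-9) -> float:
    """Hypothetical inverse-power potential felt at x from a unit charge at source.
    eps avoids division by zero when x coincides with a training sample."""
    r = math.dist(x, source)  # math.dist requires Python 3.8+
    return 1.0 / (r ** power + eps)


def potential_classify(x: Point, training: List[Tuple[Point, int]], power: float = 2.0) -> int:
    """Sum the signed potentials of all training samples (labels +1 / -1)
    and return the sign of the total (ties go to +1)."""
    total = sum(label * potential(x, p, power) for p, label in training)
    return 1 if total >= 0 else -1


# Made-up two-gene training set: two positive and two negative samples.
training = [((0.0, 0.0), 1), ((0.5, 0.2), 1), ((4.0, 0.0), -1), ((4.5, 0.3), -1)]
```

The exponent `power` controls how local the decision is: larger values let the nearest samples dominate, approaching a nearest-neighbour rule.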
ISBN (print): 9780769535203
When association mining is applied to microarray data, the usual tendency is to highlight the amount of discovered knowledge, while quality analysis takes a back seat. Ideally, two further kinds of information are equally important: (a) the accuracy of the knowledge extracted in a rule with respect to known biological functions, and (b) the predictability of biological interactions from the discovered rules. Most support- and/or confidence-based techniques address only predictability, or neither. Tedious post-processing is then required to unearth the genuinely interesting rules from the bulky output set. In the present work, we exploit the notions of direct interaction (DI) and cohesion to develop a sound methodology for binding genes into common affinity groups and mining intra-group associations. To evaluate its soundness, we apply the method to yeast cell cycle data and analyze the results with the help of known biological interactions in BM. We found high values for both accuracy and predictability.
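For reference, the support and confidence measures that the abstract contrasts with its direct-interaction and cohesion approach can be computed as below. This sketches the standard measures only, not the DI/cohesion method; the discretized items such as `"A_up"` (gene A up-regulated in a sample) and the toy samples are hypothetical.

```python
from typing import List, Set

Transaction = Set[str]


def support(itemset: Set[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions (samples) containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[Transaction]) -> float:
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    s_a = support(antecedent, transactions)
    if s_a == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / s_a


# Each transaction lists the (hypothetical) discretized gene states in one sample.
samples = [
    {"A_up", "B_up"},
    {"A_up", "B_up", "C_down"},
    {"A_up"},
    {"C_down"},
]
```

Here the rule "A_up implies B_up" has support 0.5 and confidence 2/3. Neither number says anything about whether A and B are known to interact biologically, which is the gap the abstract targets.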
ISBN (print): 9783642024801
The application of microarray data to cancer classification has recently gained popularity. The main problem to be addressed is the selection of a smaller subset of genes, from the thousands in the data, that contribute to a disease. This selection is difficult because of the small number of samples compared to the huge number of genes, and because of the many irrelevant and noisy genes. Therefore, this paper proposes an improved binary particle swarm optimisation to select a near-optimal (smaller) subset of informative genes relevant for cancer classification. Experimental results show that the proposed method outperforms a standard version of particle swarm optimisation and other related previous works in terms of classification accuracy and the number of selected genes.
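The sketch below shows one common formulation of binary particle swarm optimisation for gene-subset selection. Each particle is a 0/1 mask over genes, velocities are pulled toward the particle's personal best and the swarm's global best, and a sigmoid of each velocity gives the probability that the corresponding gene is selected. It is not the improved variant proposed in the paper, whose modifications the abstract does not describe. The parameter values and the toy fitness function (which rewards three hypothetical informative genes and penalizes subset size) are assumptions; in practice the fitness would be a classifier's accuracy on the selected genes.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]


def binary_pso(fitness: Callable[[Mask], float], n_genes: int,
               n_particles: int = 20, n_iter: int = 50,
               w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               v_max: float = 4.0, seed: int = 0) -> Tuple[Mask, float]:
    """Maximize fitness over 0/1 gene masks with sigmoid-transfer binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_genes):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp so the sigmoid does not saturate
                vel[i][d] = v
                # The sigmoid of the velocity is the probability that this gene is selected.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy fitness: genes 1, 4 and 7 are "informative"; each selected gene costs 0.1.
INFORMATIVE = {1, 4, 7}


def toy_fitness(mask: Mask) -> float:
    return sum(1 for g in INFORMATIVE if mask[g]) - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_genes=10)
```

The size penalty in the fitness is what pushes the swarm toward smaller subsets; its weight is a tuning choice, not something the abstract specifies.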
ISBN (print): 9783642025037
We present a methodology to analyze replicated time-series microarray data from a zebrafish knock-out experiment. The knock-out experiment aimed to elucidate the transcriptomic regulators underlying glycocalyx regulation of vasculogenesis by performing global gene expression analysis of NDST mutants and wild-type siblings at three distinct time points during development. Cluster analysis and the construction of a genetic interaction network allow us to identify groups of genes acting in the process of early-stage vasculogenesis. We report the following findings. We found a large number of gene clusters, particularly of glycans, during the three developmental steps of the zebrafish organism. In each step, gene connectivity changes according to two different power laws. The clusters are highlighted in such a way that the dynamics of the interactions can be followed across the time points recorded in the microarray experiment. Vegf-related genes seem not to be involved at the transcriptomic level, suggesting that alternative regulatory pathways exist to modulate transcriptomic signatures in developing zebrafish. Our results show that several glycan-related genes may be involved in early processes such as vasculogenesis.
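To make the power-law statement concrete, the sketch below builds a co-expression network by linking genes whose absolute Pearson correlation reaches a threshold. It then estimates an exponent gamma in P(k) proportional to k^(-gamma) by a least-squares fit on log-log axes. The threshold, the fitting method and the function names are assumptions rather than the authors' procedure. To fit the two regimes reported in the abstract, one would apply `fit_power_law` separately to low- and high-degree ranges, with the split point chosen by the analyst.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / math.sqrt(saa * sbb)


def degrees(profiles: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Degree of each gene in a network linking genes whose |r| >= threshold."""
    n = len(profiles)
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(profiles[i], profiles[j])) >= threshold:
                deg[i] += 1
                deg[j] += 1
    return deg


def fit_power_law(degree_counts: Dict[int, int]) -> Tuple[float, float]:
    """Least-squares fit of log P(k) = log C - gamma * log k; returns (gamma, C)."""
    total = sum(degree_counts.values())
    pts = [(math.log(k), math.log(c / total))
           for k, c in degree_counts.items() if k > 0 and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)
```

A degree histogram for `fit_power_law` can be obtained with `collections.Counter(degrees(profiles))`. Log-log regression is simple but biased, and maximum-likelihood estimators are usually preferred for careful work.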
Microarray data are expected to be useful for cancer classification. The main problem to be addressed is the selection of a smaller subset of genes, from the thousands in the data, that contribute to cancer. This selection is difficult because of the many irrelevant genes, the noisy data, and the small number of samples compared to the huge number of genes (high-dimensional data). Hence, this paper aims to select a smaller subset of informative genes that is most relevant for cancer classification. To achieve this aim, a cyclic hybrid method is proposed. Five real microarray data sets are used to test the effectiveness of the method. Experimental results show that the proposed method outperforms the other methods tested and related previous works in terms of classification accuracy and the number of selected genes. In addition, a scatter gene graph and a list of informative genes in the best gene subsets are presented for biological use.
Background: The ability to generate transcriptional data on the scale of entire genomes has been a boon both in improving biological understanding and in the amount of data generated. The latter, the amount of data generated, has implications for the effective storage, analysis and sharing of these data. A number of software tools have been developed to store, analyze, and share microarray data. However, most of these tools neither offer all of these features nor specifically target the commonly used two-color Agilent DNA microarray platform. Thus, the motivation for developing EDGE(3) was to combine the storage, analysis and sharing of microarray data in a manner that would allow research groups to collaborate on Agilent-based microarray experiments without a large investment in software or extensive training of end users. Results: EDGE(3) has been developed with two major functions in mind. The first is to provide a workflow process for the generation of microarray data by a research laboratory or a microarray facility. The second is to store, analyze, and share microarray data in a manner that does not require complicated software. To satisfy the first function, EDGE(3) has been developed as a means to establish a well-defined experimental workflow and information system for microarray generation. To satisfy the second function, the user interface of EDGE(3) is a web browser. Within the web browser, a user can access the entire functionality, including, but not limited to, the ability to perform a number of bioinformatics-based analyses, to collaborate between research groups through a user-based security model, and to access the raw data files and quality control files generated by the software used to extract the signals from an array image. Conclusion: Here, we present EDGE(3), an open-source, web-based application that allows f
Data mining algorithms are extensively used to classify gene expression data, in which the prediction of disease plays a vital role. This paper aims to develop a new classification algorithm for cancer gene expression data using a minimal number of gene combinations, i.e., minimum gene subsets. The model uses a classical statistical technique for gene ranking and two different classifiers for gene selection and prediction. The proposed method is capable of producing very high accuracy with a very small number of genes. The methodology was tested on three publicly available cancer databases, and, compared with earlier approaches, the results showed better and promising prediction strength with less computational burden. This paper also emphasizes that applying an efficient gene selection method prior to classification can lead to good performance, and the results are shown to be the best.
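The abstract names neither its statistical ranking technique nor its two classifiers. As an illustration under those gaps, the sketch below ranks genes by the absolute value of Welch's t-statistic between the two classes, keeps the top k, and predicts with a nearest-centroid rule on the kept genes. Welch's t, nearest centroid, the toy data and all function names are assumptions standing in for the paper's choices.

```python
import math
from statistics import mean, variance
from typing import Callable, List, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t-statistic for a difference in means (each group needs >= 2 samples)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(X: List[List[float]], y: List[int], k: int) -> List[int]:
    """Indices of the k genes with the largest |t| between class 1 and class 0."""
    scores = []
    for g in range(len(X[0])):
        a = [x[g] for x, lab in zip(X, y) if lab == 1]
        b = [x[g] for x, lab in zip(X, y) if lab == 0]
        scores.append((abs(welch_t(a, b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def nearest_centroid(X: List[List[float]], y: List[int],
                     genes: List[int]) -> Callable[[List[float]], int]:
    """Fit one centroid per class on the selected genes; return a predict function."""
    centroids = {}
    for lab in set(y):
        rows = [x for x, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(r[g] for r in rows) for g in genes]

    def predict(x: List[float]) -> int:
        v = [x[g] for g in genes]
        return min(centroids, key=lambda lab: math.dist(v, centroids[lab]))

    return predict


# Made-up data: 6 samples x 3 genes; only gene 0 separates the classes.
X = [[5.0, 1.0, 0.3], [5.2, 1.1, 0.1], [4.9, 0.9, 0.2],
     [1.0, 1.0, 0.2], [1.2, 1.05, 0.3], [0.8, 0.95, 0.1]]
y = [1, 1, 1, 0, 0, 0]
top = rank_genes(X, y, k=1)
predict = nearest_centroid(X, y, top)
```

Ranking each gene independently is cheap, which fits the abstract's emphasis on low computational burden, but it ignores redundancy between genes; a wrapper step over the top-ranked genes is the usual remedy.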
ISBN (print): 9781920682729
In this paper, we aim to infer a model of genetic networks from time-series gene expression profiles using a new gene expression programming algorithm. Gene expression networks are modelled by differential equations that represent temporal gene expression relations. Gene Expression Programming (GEP) is a recent extension of genetic programming. Here we combine a local search method with gene expression programming to form a memetic algorithm that not only finds the system of differential equations but also fine-tunes its constant parameters. The effectiveness of the proposed method is demonstrated by comparing its performance with that of conventional genetic programming applied to this problem in previous studies.
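The memetic step described above pairs a global search over equation structures (GEP) with a local search over the constants. The sketch below shows only the local part, under stated assumptions. The right-hand side of the ODE is fixed by hand (in the paper it would be evolved by GEP), trajectories are integrated with forward Euler, and the constants are refined by a deterministic coordinate pattern search that minimizes the squared error against an observed time series. The function names, step sizes and synthetic data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

State = List[float]
RHS = Callable[[State, Sequence[float]], State]


def simulate(rhs: RHS, x0: State, params: Sequence[float], dt: float, n_steps: int) -> List[State]:
    """Forward-Euler integration of dx/dt = rhs(x, params)."""
    xs = [list(x0)]
    for _ in range(n_steps):
        x = xs[-1]
        dx = rhs(x, params)
        xs.append([xi + dt * dxi for xi, dxi in zip(x, dx)])
    return xs


def sq_error(rhs: RHS, params: Sequence[float], observed: List[State], dt: float) -> float:
    """Sum of squared differences between simulated and observed trajectories."""
    sim = simulate(rhs, observed[0], params, dt, len(observed) - 1)
    return sum((s - o) ** 2
               for s_row, o_row in zip(sim, observed)
               for s, o in zip(s_row, o_row))


def pattern_search(rhs: RHS, params: Sequence[float], observed: List[State], dt: float,
                   step: float = 1.0, tol: float = 1e-6,
                   max_iter: int = 10_000) -> Tuple[List[float], float]:
    """Coordinate pattern search: try +/- step on each constant, keep strict
    improvements, and halve the step when no move improves the error."""
    best = list(params)
    best_err = sq_error(rhs, best, observed, dt)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = best[:]
                trial[i] += delta
                err = sq_error(rhs, trial, observed, dt)
                if err < best_err:
                    best, best_err, improved = trial, err, True
                    break
        if not improved:
            step /= 2
    return best, best_err


# Hand-fixed model structure (GEP would evolve this): dx/dt = a * x for one gene.
def decay(x: State, p: Sequence[float]) -> State:
    return [p[0] * x[0]]


observed = simulate(decay, [1.0], [-0.5], dt=0.1, n_steps=20)  # synthetic "data"
fitted, err = pattern_search(decay, [0.0], observed, dt=0.1)
```

Because the pattern search only accepts strict improvements, it can never make a GEP individual worse, which is the property a memetic refinement step needs.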
The identification of marker genes that trigger the growth of mutated cells has received significant attention from both the medical and computing communities. Through the identified genes, the pathology of mutated cells can be revealed and precautions can be taken to prevent further proliferation of abnormal cells. In this paper, we propose an innovative gene identification framework based on genetic algorithms and neural networks to identify marker genes for leukaemia. Our approach is able to provide a sharper focus on a group of highly expressed genes in the leukaemia dataset, and the identified genes have been shown to be significant to the study of leukaemia development.
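A minimal sketch of combining a genetic algorithm with a neural network for marker-gene selection. It assumes the simplest possible network, a single perceptron, and a fitness equal to the perceptron's training accuracy on the selected genes minus a small per-gene penalty. The abstract does not describe the network, the encoding or the operators, so tournament selection, one-point crossover, bit-flip mutation, elitism, the toy data and all names here are assumptions. Training accuracy overfits easily; cross-validated accuracy would be the safer fitness in practice.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]


def perceptron_accuracy(X: List[List[float]], y: List[int], mask: Mask,
                        epochs: int = 20, lr: float = 0.1) -> float:
    """Train a single perceptron on the masked genes; return its training accuracy."""
    idx = [g for g, bit in enumerate(mask) if bit]
    w = [0.0] * len(idx)
    b = 0.0

    def out(x: List[float]) -> int:
        return 1 if b + sum(wj * x[g] for wj, g in zip(w, idx)) >= 0 else 0

    for _ in range(epochs):
        for x, t in zip(X, y):
            err = t - out(x)
            if err:  # perceptron rule: update only on mistakes
                for j, g in enumerate(idx):
                    w[j] += lr * err * x[g]
                b += lr * err
    return sum(out(x) == t for x, t in zip(X, y)) / len(y)


def genetic_select(fitness: Callable[[Mask], float], n_genes: int, pop_size: int = 20,
                   n_gen: int = 30, p_mut: float = 0.05, seed: int = 0) -> Tuple[Mask, float]:
    """Maximize fitness over 0/1 gene masks (requires n_genes >= 2, pop_size >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    best_fit = fitness(best)

    def tournament(scored: List[Tuple[float, Mask]]) -> Mask:
        (fa, ma), (fb, mb) = rng.sample(scored, 2)
        return ma if fa >= fb else mb

    for _ in range(n_gen):
        scored = [(fitness(ind), ind) for ind in pop]
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randint(1, n_genes - 1)  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = children
        cand = max(pop, key=fitness)
        cand_fit = fitness(cand)
        if cand_fit > best_fit:
            best, best_fit = cand[:], cand_fit
    return best, best_fit


# Made-up data: 4 samples x 6 genes; only gene 2 separates the classes.
X = [[0.3, -0.2, 2.0, 0.1, 0.5, -0.4],
     [0.1, 0.4, -2.0, -0.3, 0.2, 0.6],
     [-0.5, 0.2, 2.0, 0.4, -0.1, 0.3],
     [0.2, -0.6, -2.0, 0.5, -0.3, -0.2]]
y = [1, 0, 1, 0]


def ga_fitness(mask: Mask) -> float:
    # Hypothetical trade-off: accuracy minus 0.01 per selected gene.
    return perceptron_accuracy(X, y, mask) - 0.01 * sum(mask)


markers, score = genetic_select(ga_fitness, n_genes=6)
```

Swapping `perceptron_accuracy` for a multi-layer network or a cross-validated score changes only the fitness function; the GA loop stays the same.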