In their study, McKinney et al. showed the high potential of artificial intelligence for breast cancer screening. However, the lack of detailed methods and computer code undermines its scientific value. We identify ob...
Feature selection, in which the most informative variables are selected for model generation, is an important step in pattern recognition. Here, one often tries to optimize multiple criteria, such as the discriminating power of the descriptor, the performance of the model, and the cardinality of the subset. In this paper we propose a fuzzy criterion for multi-objective unsupervised feature selection that applies a hybridized filter-wrapper approach (FC-MOFS). These formulations allow for an efficient way to pick features from a pool and to avoid the misunderstanding of overlapping features via crisp clustered learning in a conventional multi-objective optimization procedure. The optimization problem is solved using the non-dominated sorting genetic algorithm II (NSGA-II). The performance of the proposed approach is then examined on six benchmark datasets from multiple disciplines with different numbers of features. Systematic comparisons of the proposed method with representative non-fuzzified approaches are presented. The experimental studies show that the proposed approach performs better in terms of accuracy and feasibility.
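As a rough illustration of the non-dominated sorting idea that NSGA-II builds on, the sketch below extracts the first Pareto front from a set of candidate feature subsets. It is not the FC-MOFS method itself: the two objectives (error and subset size, both minimized), the candidate values, and the function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            dominates(q, p) for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical candidates: (model error, number of selected features).
candidates = [(0.10, 8), (0.12, 5), (0.10, 9), (0.20, 3), (0.25, 4)]
front = non_dominated_front(candidates)
print(front)  # indices of the trade-off solutions
```

NSGA-II repeats this kind of ranking over successive fronts and combines it with crowding-distance selection; the sketch covers only the first front.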
Genomic structural variations play key roles in genetic diversity and disease. Despite recent advances in structural variation discovery, many variants are yet to be discovered. Midsize insertions and deletions pose p...
Peptide search engines are algorithms that are able to identify peptides (i.e., short proteins or parts of proteins) from mass spectra of biological samples. These identification algorithms report the best matching pe...
COnstraint-Based Reconstruction and Analysis (COBRA) provides a molecular mechanistic framework for integrative analysis of experimental data and quantitative prediction of physicochemically and biochemically feasible...
ISBN: (print) 9789897580154
Cancer classification through high-throughput gene expression profiles has been widely used in biomedical research. Most recently, we presented a multivariate method for large-scale gene selection based on information theory, with feature interdependence as the central issue, and we validated its effectiveness using a colon cancer benchmark. The present paper further develops our previous work on feature interdependence. First, we have refined the method and proposed a complete framework for selecting a gene signature to predict a given disease phenotype under high-throughput technologies. The framework has then been applied to a brain cancer gene expression profile derived from the Affymetrix Human Genome U95Av2 Array, where the number of interrogated genes is six times larger than in the previously studied colon cancer data set. Three information-theory-based filters were used for comparison. Our experimental results show that the framework outperforms them in classification performance on three performance measures. Additionally, to demonstrate how effectively feature interdependence can be tackled within the framework, two sets of enrichment analyses have also been performed. The results show that more statistically significant gene sets and regulatory interactions could be found in our gene signature. This framework could therefore be promising for high-throughput gene selection around gene synergy.
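For readers unfamiliar with information-theoretic filters, the sketch below ranks discretized genes by their mutual information with the class label. This is a simple univariate filter, not the authors' multivariate framework, and it does not account for feature interdependence. The gene names, expression levels, and labels are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Mutual information I(X; Y) in bits for two discrete sequences,
    estimated from empirical frequencies."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


def rank_genes(expr: Dict[str, List[int]], labels: List[int]) -> List[str]:
    """Sort gene names by decreasing mutual information with the labels."""
    return sorted(
        expr, key=lambda g: mutual_information(expr[g], labels), reverse=True
    )


# Hypothetical discretized expression levels (0 = low, 1 = high).
labels = [0, 0, 1, 1]
expression = {
    "gene_a": [0, 1, 0, 1],  # independent of the label
    "gene_b": [0, 0, 1, 1],  # perfectly informative
}
ranking = rank_genes(expression, labels)
print(ranking)
```

A univariate score like this cannot detect genes that are informative only in combination, which is the gap that interdependence-aware selection aims to close.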
The Konstanz Information Miner (KNIME) is a user-friendly graphical workflow designer with a broad user base in industry and academia. Its wide range of embedded tools and its powerful data mining and visualization tools render it ideal for scientific workflows, and it is used in a growing range of applications. However, the free version typically runs on a desktop computer, which restricts users who want to tap into more computing power. The grid and cloud User Support Environment is a free and open source project created for parallelized and distributed systems, but creating workflows with its included components has a steeper learning curve. In this work we suggest an easy-to-implement solution that combines the ease of use of the Konstanz Information Miner with the computational power of distributed computing infrastructures. We present a solution that permits the conversion of workflows between the two platforms. This enables convenient development, debugging, and maintenance of scientific workflows on the desktop. These workflows can then be deployed on a cloud or grid, thus permitting large-scale computation. To achieve our goals, we relied on the Common Tool Description XML file format, which describes the execution of arbitrary programs in a structured, easily readable, and parseable way. To integrate external programs into KNIME, we employed the Generic KNIME Nodes extension.
Standard patient parameters, tumor markers, and tumor diagnosis records are used for identifying prediction models for tumor markers as well as cancer diagnosis predictions. In this paper we present a hybrid clusterin...