Micro-array technology generates high-dimensional data, and this high dimensionality hampers the learning capability of machine learning algorithms. Dimensionality can be reduced using feature selection (FS) techniques, an essential pre-processing step for high-dimensional data. In this work, a hybrid filter-wrapper approach is proposed for feature selection. The multi-attribute decision-making method called Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) is used as a filter to extract informative features. A Binary Jaya algorithm with a time-varying transfer function is then proposed as a wrapper feature selector to find the optimal subset of features. The proposed approach is tested on 10 benchmark micro-array datasets and compared with state-of-the-art methods. Experimental results suggest that the proposed approach achieves better classification accuracy and is 10 times faster than existing approaches.
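The TOPSIS filter step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each feature is one alternative scored on several criteria (for example, different filter statistics); the abstract does not say which criteria or weights the authors use, so the function names `topsis_scores` and `top_k_features`, the weights, and the example matrix are all hypothetical.

```python
import math


def topsis_scores(matrix, weights, benefit):
    """Return one TOPSIS closeness score per row (alternative).

    matrix  : list of rows, one row per feature, one column per criterion
    weights : criterion weights (assumed, not taken from the paper)
    benefit : True if larger is better for that criterion, else False
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # Ideal and anti-ideal solutions, criterion by criterion.
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


def top_k_features(matrix, weights, benefit, k):
    """Indices of the k features with the highest closeness score."""
    scores = topsis_scores(matrix, weights, benefit)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical example: three features scored on two benefit criteria.
example = [[0.9, 0.8], [0.1, 0.2], [0.5, 0.5]]
selected = top_k_features(example, [0.5, 0.5], [True, True], k=2)
```

In a hybrid scheme of the kind described, the features kept by such a filter would then be passed to the wrapper search; that second stage is not sketched here.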
In bioinformatics, biclustering is a crucial optimization task that can reveal hidden patterns and identify groups of genes that behave similarly under certain conditions. This study aims to efficiently identify high-quality, cohesive biclusters that share common characteristics across two data dimensions. To achieve this, we propose the first biclustering approach that utilizes Multi-objective Differential Evolution (DE), a novel technique for gene group discovery. Additionally, we introduce Biclustering Binary Differential Evolution (BBDE), a new mutation operator that combines node addition and deletion, guided by an adaptive factor F. We thoroughly tested our method's effectiveness, taking into account biological relevance, noise, overlap resistance, and statistics. We compared our results to state-of-the-art algorithms using both synthetic and real datasets such as Yeast Cell Cycle, Saccharomyces cerevisiae, and Human B Cell. Our algorithms outperformed the comparison methods and effectively identified significant biclusters.
ISBN: 9781538649787 (print)
Today, high-dimensional data have become one of the most important challenges in machine learning. Among the thousands of features in such data, some are redundant or irrelevant, and selecting a few of them improves classifier performance. Micro-array data, one of the most important kinds of high-dimensional data in medicine, have a large number of features and a small number of samples. Thus, old simple methods cannot be used to select features of such data effectively. Among the several methods that have been proposed for selecting features of high-dimensional data, swarm intelligence-based methods have attracted more attention than ever. These methods are suited to time-consuming and complex problems because they search for a near-optimal solution at a desirable computational cost. In this paper, a filter-based swarm intelligence search method built on the Improved Binary Gravitational Search Algorithm (IBGSA) is proposed to integrate filter approaches with swarm intelligence-based methods and improve the feature selection process in micro-array data. The proposed method is applied to 5 high-dimensional micro-array databases, and the results are compared with one of the up-to-date methods used for feature selection in micro-array data. Experimental results verify the efficiency of the proposed algorithm.
ISBN: 9781509043309 (print)
Nowadays, with the emergence of high-dimensional data, feature selection plays an important role in machine learning, particularly in classification problems, where it can be regarded as a vital and indispensable component. As the number of data dimensions increases, simple traditional methods perform poorly and cannot be used for effective feature selection. Using embedded methods, this study first discusses data dimension reduction using a filter-based approach. Two state-of-the-art meta-heuristic methods are then applied to the selected features, and the final features are selected from the aggregation of their selected features. The proposed method is evaluated on 5 high-dimensional micro-array datasets, and the results are compared with several state-of-the-art feature selection approaches for high-dimensional data. Experimental results confirm the efficiency of the proposed method.
Background: Photosynthetic (PS) gene expression in Rhodobacter sphaeroides is regulated in response to changes in light and redox conditions mainly by the PrrB/A, FnrL and AppA/PpsR systems. The PrrB/A and FnrL systems activate the expression of PS genes under anaerobic conditions, while the AppA/PpsR system represses them under aerobic conditions. Recently, two mathematical models have been developed for the AppA/PpsR system; they demonstrated how the interaction between AppA and PpsR could lead to a phenotype in which PS genes are repressed under semi-aerobic conditions. These models have also predicted that the transition from aerobic to anaerobic growth mode could occur via a bistable regime. However, they lack experimentally quantifiable inputs and outputs. Here, we extend one of them to include such quantities, combine all relevant publicly available micro-array data for a PS gene of this bacterium, and use these data to parameterise the model. In addition, we hypothesise that the AppA/PpsR system alone might account for the observed trend of PS gene expression under semi-aerobic conditions. Results: Our extended model of the AppA/PpsR system includes the biological input of atmospheric oxygen concentration and an output of photosynthetic gene expression. Following our hypothesis that the AppA/PpsR system alone is sufficient to describe the overall trend of PS gene expression, we parameterise the model and suggest that the rate of AppA reduction in vivo should be faster than its oxidation. Also, we show that although both the reduced and oxidised forms of PpsR bind to the PS gene promoters in vitro, binding of the oxidised form as a repressor alone is sufficient to reproduce the observed PS gene expression pattern. Finally, the combinations of model parameters which fit the biological data well are broadly consistent with those which were previously determined to be required for the system to show (i) the repression of PS genes under semi-aerobic conditions, and (ii) bista
Micro-array data are typically characterized by high-dimensional features with a small number of samples. Several problems in identifying disease-causing genes from micro-array data can be transformed into the problem of classifying the features extracted from gene expression in micro-array data. However, too many features can cause low prediction accuracy as well as high computational complexity. Dimensionality reduction eliminates irrelevant features to improve prediction accuracy. Typically, the eigenvalues or the per-dimension data variance from principal component analysis are used as criteria to select relevant features. This approach is simple but not efficient, since it does not consider the degree of data overlap in each dimension of the feature space. A new method that selects relevant features based on the degree of dimensional data overlap, with proper feature selection, was introduced. Furthermore, our study concentrated on small data sets, which commonly occur in practice. The experimental results indicate that this new approach can achieve substantially higher prediction accuracy than other methods. © 2015 Elsevier Ltd. All rights reserved.
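To make the idea of "data overlap in each dimension" concrete, here is a small sketch. The abstract does not define its overlap measure, so this uses a simple stand-in: the fraction of a feature's total value range that is shared by the two classes. The names `range_overlap` and `rank_by_overlap` and the toy data are hypothetical, and this is not the authors' method.

```python
def range_overlap(values_a, values_b):
    """Fraction of the combined value range shared by two classes (0 to 1)."""
    lo = max(min(values_a), min(values_b))    # start of the shared interval
    hi = min(max(values_a), max(values_b))    # end of the shared interval
    span = max(max(values_a), max(values_b)) - min(min(values_a), min(values_b))
    if span == 0:
        return 1.0                            # identical constant values: full overlap
    return max(0.0, hi - lo) / span


def rank_by_overlap(samples, labels):
    """Rank feature indices from least to most class overlap (two classes only)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    overlaps = []
    for j in range(len(samples[0])):
        a = [s[j] for s, y in zip(samples, labels) if y == classes[0]]
        b = [s[j] for s, y in zip(samples, labels) if y == classes[1]]
        overlaps.append(range_overlap(a, b))
    order = sorted(range(len(overlaps)), key=lambda j: overlaps[j])
    return order, overlaps


# Toy data: feature 0 separates the classes, feature 1 overlaps partially.
toy_samples = [[1.0, 5.0], [2.0, 6.0], [8.0, 5.5], [9.0, 6.5]]
toy_labels = [0, 0, 1, 1]
order, overlaps = rank_by_overlap(toy_samples, toy_labels)
```

A variance-only criterion would not distinguish these two features by class separation, which is the limitation the abstract points to.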
Modern high-throughput technologies allow us to simultaneously measure the expressions of a huge number of candidate predictors, some of which are likely to be associated with survival. One difficult task is to search among an enormous number of potential predictors and to correctly identify most of the important ones, without mistakenly identifying too many spurious associations. Mere variable selection is insufficient, however, for the information from the multiple predictors must be intelligently combined and calibrated to form the final composite predictor. Many commonly used procedures overfit the training data, miss many important predictors, or both. Although it is impossible to simultaneously adjust for a huge number of predictors in an unconstrained way, we propose a method that offers a middle ground where some partial multivariate adjustments can be made in an adaptive fashion, regardless of the number of candidate predictors. We demonstrate the performance of our proposed procedure in a simulation study within the Cox proportional hazards regression framework, and we apply our new method to a publicly available data set to construct a novel prognostic gene signature for breast cancer survival.
ISBN: 3540362975 (print)
In this paper, we propose a novel model, namely g-Cluster, to mine biologically significant co-regulated gene clusters. The proposed model can (1) discover extra co-expressed genes that cannot be found by current pattern/tendency-based methods, and (2) discover inverted relationships overlooked by pattern/tendency-based methods. We also design two tree-based algorithms to mine all qualified g-Clusters. The experimental results show that (1) our approaches are effective and efficient, and (2) our approaches can find a number of co-regulated gene clusters missed by previous models, which are potentially of high biological significance.
ISBN: 0780375769 (print)
Mining large data and deriving meaning from the mined data in bioinformatics is a computationally intensive and relevant issue. In this paper we present an efficient algorithm to cluster genes into similar functional groups. This is a technique for extracting and characterizing rhythmic expression profiles from genome-wide DNA micro-array hybridization data. These patterns are clues to discovering rhythmic genes implicated in cell-cycle, circadian, or other biological processes. These functionalities (anti-correlated expression, similar time expression, etc.) are defined in the paper. We present a signal-processing approach to this problem. We also explore an information-theoretic criterion for identifying those genes exhibiting maximum variation in behavior. The genes are clustered, and relationships are then derived to propose a temporal cell-cycle model governing regulatory behavior. We are presently considering the Human Fibroblast and Yeast data sets for analysis.
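One common signal-processing way to characterize a rhythmic expression profile is to look for the strongest frequency in its periodogram. The sketch below assumes evenly spaced time points and uses a plain discrete Fourier transform; the abstract does not state which transform or criterion the authors use, so `dominant_period` and the synthetic profile are hypothetical illustrations only.

```python
import cmath
import math


def dominant_period(series, dt=1.0):
    """Return (period, power) of the strongest periodic component.

    series : expression values at evenly spaced time points (assumed)
    dt     : spacing between time points, in whatever unit the data use
    Returns (None, 0.0) when the profile is flat.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]     # remove the constant offset
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):            # skip k = 0 (the mean)
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2 / n           # periodogram value at frequency k
        if power > best_power:
            best_k, best_power = k, power
    if best_k == 0:
        return None, 0.0
    return n * dt / best_k, best_power


# Synthetic profile with a period of 8 time steps, sampled at 32 points.
profile = [math.sin(2 * math.pi * t / 8) for t in range(32)]
period, power = dominant_period(profile)
```

Genes could then be grouped by their dominant period or ranked by peak power, although how the paper forms its clusters is not described in the abstract.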