Microarray technology enables the simultaneous monitoring of the expression patterns of a very large number of genes across different experimental conditions. Biclustering of microarray data is an important technique for discovering groups of genes that are coregulated in a subset of conditions. Biclustering algorithms must identify coherent and nontrivial biclusters, i.e., biclusters with a low mean squared residue and a high row variance. A multiobjective genetic biclustering technique is proposed here that optimizes these two objectives simultaneously. A novel encoding scheme that uses variable chromosome length is developed. Moreover, a new quantitative measure for evaluating the goodness of the biclusters is proposed. The performance of the proposed algorithm has been evaluated on both simulated and real-life gene expression datasets and compared with several other well-known biclustering techniques.
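The two objectives named above can be made concrete with a short sketch. It assumes the mean squared residue of Cheng and Church, H(I, J) = 1/(|I||J|) Σ (a_ij − a_iJ − a_Ij + a_IJ)², and the row variance as commonly defined in the biclustering literature, 1/(|I||J|) Σ (a_ij − a_iJ)²; the paper's exact definitions may differ, and the function names are illustrative only.

```python
import numpy as np

def mean_squared_residue(data, rows, cols):
    """Cheng-Church mean squared residue of the submatrix data[rows, cols] (assumed definition)."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)  # a_iJ
    col_means = sub.mean(axis=0, keepdims=True)  # a_Ij
    overall = sub.mean()                         # a_IJ
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())

def row_variance(data, rows, cols):
    """Mean squared deviation of each entry from its row mean; high values rule out flat (trivial) biclusters."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    return float(((sub - row_means) ** 2).mean())

# Toy matrix: rows 0-2 on columns 0-2 follow an additive (row-shifted) pattern; column 3 breaks it.
data = np.array([[1.0, 2.0, 3.0, 9.0],
                 [2.0, 3.0, 4.0, 0.0],
                 [5.0, 6.0, 7.0, 4.0]])
rows, cols = [0, 1, 2], [0, 1, 2]
print(round(mean_squared_residue(data, rows, cols), 10))  # 0.0
print(round(row_variance(data, rows, cols), 4))           # 0.6667
```

A coherent bicluster drives the residue toward zero, while the row variance penalizes the trivial solution of picking a constant block; the two pull in different directions, which is why a multiobjective search is natural here.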
In this paper, a fuzzy point-symmetry-based genetic clustering technique (Fuzzy-VGAPS) is proposed that can automatically determine the number of clusters present in a data set as well as a good fuzzy partitioning of the data. The clusters can be of any size, shape, or convexity as long as they possess the property of symmetry. The membership values of points to the different clusters are computed using the newly proposed point-symmetry-based distance. A variable number of cluster centers is encoded in the chromosomes. A new fuzzy symmetry-based cluster validity index, the FSym-index, is first proposed and then used to measure the fitness of the chromosomes. The proposed index can detect non-convex as well as convex but non-hyperspherical partitionings with a variable number of clusters. It is mathematically justified via its relationship to a well-defined hard cluster validity function, Dunn's index, for which the condition of uniqueness has already been established. The results of Fuzzy-VGAPS are compared with those obtained by seven other algorithms, including both fuzzy and crisp methods, on four artificial and four real-life data sets. Real-life applications of Fuzzy-VGAPS to automatically clustering gene expression data and to segmenting a magnetic resonance brain image with multiple sclerosis lesions are also demonstrated. (C) 2009 Elsevier Inc. All rights reserved.
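To illustrate why a point-symmetry-based distance favors symmetric clusters, here is a minimal sketch of one common formulation: reflect the point x about the center c to get 2c − x, average the distances from that mirror image to its knear nearest data points, and scale by the Euclidean distance from x to c. The choice knear = 2, the brute-force neighbor search, and the function name are assumptions for illustration; the paper's exact definition and its fuzzy membership computation are not reproduced here.

```python
import numpy as np

def point_symmetry_distance(x, c, data, knear=2):
    """Point-symmetry distance of x from center c (sketch; knear=2 is an assumed default)."""
    reflected = 2.0 * c - x                      # mirror image of x about c
    dists = np.linalg.norm(data - reflected, axis=1)
    d_sym = np.sort(dists)[:knear].mean()        # mean distance from the mirror image to its knear nearest points
    return float(d_sym * np.linalg.norm(x - c))  # scaled by the Euclidean distance to the center

center = np.array([0.0, 0.0])
x = np.array([1.0, 0.0])
mirrored = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [-2.0, 0.0]])  # symmetric about the center
one_sided = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])               # no mirror images present
print(point_symmetry_distance(x, center, mirrored))   # 0.5
print(point_symmetry_distance(x, center, one_sided))  # 2.5
```

In both cases x lies at Euclidean distance 1 from the center, but the distance is much smaller when the data actually contain points near the mirror image of x, which is the property that lets symmetric clusters of arbitrary shape be recognized.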
In this article, the effectiveness of a variable string length genetic algorithm combined with a recently developed fuzzy cluster validity index (PBMF) is demonstrated for clustering a data set into an unknown number of clusters. The flexibility of a variable string length genetic algorithm (VGA) is used in conjunction with the fuzzy indices to determine both the number of clusters present in a data set and a good fuzzy partition of the data for that number of clusters. A comparative study is performed for different validity indices, namely PBMF, XB, PE and PC. The results of the fuzzy VGA algorithm are compared with those obtained by the well-known FCM algorithm, which is applicable only when the number of clusters is fixed a priori. Moreover, another genetic clustering scheme, which also requires the number of clusters to be fixed in advance, is implemented for comparison. The effectiveness of the PBMF index as the optimization criterion, together with a genetic fuzzy partitioning technique, is demonstrated on a number of artificial and real data sets, including a remote sensing image of the city of Kolkata. (c) 2005 Elsevier B.V. All rights reserved.
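A sketch of how a chromosome's cluster centers could be scored by a PBMF-style index follows. It assumes the form usually quoted for PBMF, (1/K · E_1/E_K · D_K)², where E_K = Σ_k Σ_j u_kj^m ‖x_j − z_k‖, E_1 is the same sum for a single cluster at the data centroid, and D_K is the largest distance between two centers; memberships are derived from the centers with the standard FCM update. The fuzzifier m = 2 and all names are assumptions, and the index is only meaningful for K ≥ 2.

```python
import numpy as np

def fcm_memberships(data, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update u[k, j] given fixed centers (eps avoids division by zero)."""
    d = np.linalg.norm(data[None, :, :] - centers[:, None, :], axis=2) + eps  # d[k, j]
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))             # ratio[k, l, j]
    return 1.0 / ratio.sum(axis=1)

def pbmf_index(data, centers, m=2.0):
    """PBMF-style validity index as commonly stated (assumed form); larger is better."""
    u = fcm_memberships(data, centers, m)
    d = np.linalg.norm(data[None, :, :] - centers[:, None, :], axis=2)
    e_k = float(((u ** m) * d).sum())                                   # fuzzy compactness for K clusters
    e_1 = float(np.linalg.norm(data - data.mean(axis=0), axis=1).sum())  # compactness with a single cluster
    d_k = float(np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2).max())  # max center separation
    return (e_1 * d_k / (len(centers) * e_k)) ** 2

data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = np.array([[0.0, 0.5], [10.0, 0.5]])  # centers on the two groups
bad = np.array([[5.0, 0.0], [5.0, 1.0]])    # centers between the groups
print(pbmf_index(data, good) > pbmf_index(data, bad))  # True
```

In a VGA setting, each chromosome would carry a different number of centers, and this value would serve as its fitness, so the search compares partitions with different K directly.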
A methodology based on the concept of a variable string length GA (VGA) is developed for automatically determining the number of hyperplanes needed to model the class boundaries in a GA-classifier. The genetic operators and the fitness function are defined to handle the variability in chromosome length. It is proved that, after a sufficiently large number of iterations, the method arrives at the optimal number of misclassifications and needs only a minimal number of hyperplanes to do so. Experimental results on different artificial and real-life data sets demonstrate that the classifier, using the concept of a variable-length chromosome, can automatically determine an appropriate number of hyperplanes and also performs better than the fixed-length version. A comparison with another VGA-based approach is also provided. (C) 1998 Elsevier Science B.V. All rights reserved.
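The fitness idea can be sketched under a few stated assumptions: a chromosome is a variable-length list of hyperplanes, each given here directly as a normal vector w and offset b (the paper's own encoding may differ); a point's region is its pattern of sides; each region predicts the majority class of the training points inside it; and, as a hypothetical tie-break, fewer hyperplanes win when misclassifications are equal.

```python
from collections import Counter, defaultdict
import numpy as np

def misclassifications(hyperplanes, X, y):
    """Count training errors when each region (sign pattern) predicts its majority class."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        code = tuple(bool(np.dot(w, xi) + b > 0) for w, b in hyperplanes)  # side of each hyperplane
        groups[code].append(yi)
    return sum(len(labels) - Counter(labels).most_common(1)[0][1] for labels in groups.values())

def fitness(hyperplanes, X, y):
    """Hypothetical lexicographic fitness (lower is better): errors first, then hyperplane count."""
    return (misclassifications(hyperplanes, X, y), len(hyperplanes))

# XOR-like data: no single line separates the classes, two axis-parallel lines do.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = [0, 0, 1, 1]
one = [(np.array([1.0, 0.0]), -0.5)]             # x1 = 0.5
two = one + [(np.array([0.0, 1.0]), -0.5)]       # plus x2 = 0.5
print(fitness(one, X, y))  # (2, 1)
print(fitness(two, X, y))  # (0, 2)
```

Because chromosomes of different lengths are scored on the same scale, the GA can trade extra hyperplanes for fewer errors, which is the behavior the variable-length encoding is meant to enable.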
The problem of classifying an image into different homogeneous regions is viewed as a task of clustering the pixels in the intensity space. In this letter, a newly developed genetic clustering technique is used for automatically segmenting remote sensing satellite images. Each cluster is divided into several small hyperspherical subclusters, and the centers of all these subclusters are encoded in a chromosome to represent the whole clustering. When assigning points to clusters, these local subclusters are considered individually. For objective function evaluation, the subclusters are merged appropriately to form a variable number of global clusters. A newly proposed point-symmetry-distance-based cluster validity index, the Sym-index, is used to measure the validity of the corresponding segmentation. The effectiveness of the proposed technique is demonstrated, in comparison with fuzzy C-means clustering, a recently proposed Sym-index-based GAPS clustering method, and subtractive clustering, in identifying different land cover regions from two numeric image data sets and a remote sensing image of a part of the city of Kolkata.
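The two-level decoding described above (assign pixels to the nearest subcluster center, then merge subclusters into global clusters) can be sketched as follows. The merge rule here, single linkage on subcluster centers with a distance threshold, is a hypothetical stand-in chosen for illustration; the letter's actual merging criterion and the Sym-index evaluation are not reproduced.

```python
import numpy as np

def assign_to_subclusters(data, sub_centers):
    """Index of the nearest subcluster center for every point."""
    d = np.linalg.norm(data[:, None, :] - sub_centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def merge_subclusters(sub_centers, threshold):
    """Hypothetical merge rule: join subclusters whose centers lie within `threshold` (single linkage)."""
    n = len(sub_centers)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(sub_centers[i] - sub_centers[j]) <= threshold:
                parent[find(i)] = find(j)
    roots = {}
    return np.array([roots.setdefault(find(i), len(roots)) for i in range(n)])  # relabel as 0..K-1

def global_labels(data, sub_centers, threshold):
    """Decode a chromosome of subcluster centers into global cluster labels for each point."""
    return merge_subclusters(sub_centers, threshold)[assign_to_subclusters(data, sub_centers)]

pixels = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])  # toy one-band intensities
sub_centers = np.array([[0.5], [2.0], [10.5]])            # three subclusters encoded in one chromosome
print(global_labels(pixels, sub_centers, threshold=2.0).tolist())  # [0, 0, 0, 1, 1]
```

Keeping many small hyperspherical subclusters in the chromosome lets the merged global clusters take non-spherical shapes, while the number of global clusters falls out of the merge step rather than being fixed in advance.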
ISBN: 9781424418220 (print)
The objective of any biclustering algorithm in microarray data is to discover a subset of genes that are expressed similarly in a subset of conditions. The boundaries of biclusters usually overlap, as genes and conditions may belong to different biclusters with different membership degrees. Hence the notion of fuzzy sets is useful for discovering such overlapping biclusters. This article develops a multiobjective genetic algorithm based approach to probabilistic fuzzy biclustering that minimizes the residue and maximizes the cluster size and the expression profile variance. A novel variable string length encoding is proposed that encodes multiple biclusters in a single string. A new performance measure that reflects how statistically distinguishable a bicluster is from the background is also proposed. The performance of the proposed algorithm is compared with several well-known biclustering algorithms.
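Optimizing three conflicting objectives at once means comparing candidate biclusters by Pareto dominance rather than by a single score. The sketch below shows only that comparison step, using crisp objective values; the field names are hypothetical, "size" could be taken as |rows| × |cols|, and the paper's fuzzy objective definitions and selection operators are not reproduced.

```python
from typing import NamedTuple

class BiclusterScore(NamedTuple):
    residue: float   # minimize
    size: int        # maximize (e.g., number of rows times number of columns)
    variance: float  # maximize

def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = a.residue <= b.residue and a.size >= b.size and a.variance >= b.variance
    better = a.residue < b.residue or a.size > b.size or a.variance > b.variance
    return no_worse and better

def non_dominated(scores):
    """Candidates not dominated by any other candidate (the current Pareto front)."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]

scores = [BiclusterScore(0.5, 100, 2.0),
          BiclusterScore(0.4, 120, 2.5),
          BiclusterScore(0.2, 50, 1.0),
          BiclusterScore(0.6, 80, 1.5)]
print(non_dominated(scores))
# [BiclusterScore(residue=0.4, size=120, variance=2.5), BiclusterScore(residue=0.2, size=50, variance=1.0)]
```

The small, very coherent bicluster and the large, slightly noisier one both survive, because neither is better on all three objectives; a multiobjective GA keeps such trade-offs instead of collapsing them into one weighted sum.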