Search results - Inner Mongolia University Library

Towards improving fuzzy clustering using support vector machine: Application to gene expression data

PATTERN RECOGNITION, 2009, vol. 42, no. 11, pp. 2744-2763

Authors: Mukhopadhyay, Anirban; Maulik, Ujjwal. Univ Kalyani, Dept Comp Sci & Engn, Kalyani 741235, W Bengal, India; Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, India

Recent advances in microarray technology permit monitoring the expression levels of a large set of genes across a number of time points simultaneously. Extracting knowledge from such a huge volume of microarray gene expression data requires computational analysis. Clustering is one of the important data mining tools for analyzing such microarray data, grouping similar genes into clusters. Researchers have proposed a number of clustering algorithms for this purpose. In this article, an attempt is made to improve the performance of fuzzy clustering by combining it with a support vector machine (SVM) classifier. A recently proposed real-coded variable string length genetic algorithm based clustering technique and an iterated version of fuzzy C-means clustering are utilized for this purpose. The performance of the proposed clustering scheme is compared with that of some well-known existing clustering algorithms and their SVM-boosted versions on one simulated and six real-life gene expression data sets. A statistical significance test based on analysis of variance (ANOVA), followed by an a posteriori Tukey-Kramer multiple comparison test, has been conducted to establish the statistical significance of the superior performance of the proposed clustering scheme. Moreover, the biological significance of the clustering solutions has been established. (C) 2009 Elsevier Ltd. All rights reserved.

Keywords: Microarray gene expression data; Fuzzy clustering; Cluster validity indices; variable string length genetic algorithm; Support vector machines; Gene ontology

Unsupervised Pixel Classification in Satellite Imagery: A Two-stage Fuzzy Clustering Approach

FUNDAMENTA INFORMATICAE, 2008, vol. 86, no. 4, pp. 411-428

Authors: Mukhopadhyay, Anirban; Maulik, Ujjwal. Univ Kalyani, Dept Comp Sci & Engn, Kalyani 741235, W Bengal, India; Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, India

A popular approach to landcover classification in remotely sensed satellite images is clustering the pixels in the spectral domain into several fuzzy partitions. It has been observed that the performance of clustering algorithms deteriorates as the overlap in the data sets increases. Motivated by this observation, this article describes a two-stage fuzzy clustering algorithm that utilizes the concept of points having significant membership to multiple classes. The points situated in the overlapped regions of different clusters are first identified and excluded from consideration while clustering. Thereafter, these points are given class labels by a Support Vector Machine classifier trained on the remaining points. The well-known Fuzzy C-Means algorithm and some recently proposed genetic clustering schemes are utilized in the process. The effectiveness of the two-stage clustering technique has been demonstrated on IRS remote sensing satellite images of the cities of Bombay and Calcutta and compared with other well-known clustering techniques. A statistical significance test has also been carried out to establish the statistical significance of the clustering results.

Keywords: Unsupervised pixel classification; significant multiclass membership; cluster validity index; variable string length genetic algorithm; multiobjective genetic algorithm; Support Vector Machine
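
The first stage described in the abstract above (finding points with significant membership in more than one class) might be sketched as follows. This is a minimal illustration, not the authors' procedure: the membership formula is the standard fuzzy C-means one, while the function names and the `threshold` rule for calling a point ambiguous are assumptions, since the abstract does not state the actual criterion.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy C-means membership of one point in each cluster."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]

def split_by_ambiguity(points, centers, threshold=0.3, m=2.0):
    """Separate confidently assigned points from overlap-region points.

    Assumption: a point counts as ambiguous when its second-largest
    membership reaches `threshold` (a hypothetical rule for illustration).
    """
    confident, ambiguous = [], []
    for p in points:
        u = fcm_memberships(p, centers, m)
        ranked = sorted(u, reverse=True)
        if ranked[1] >= threshold:
            ambiguous.append(p)
        else:
            confident.append((p, u.index(ranked[0])))
    return confident, ambiguous

centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (9.0, 0.0), (5.0, 0.0)]
confident, ambiguous = split_by_ambiguity(points, centers)
print(confident)   # labeled points usable as classifier training data
print(ambiguous)   # points left for the second stage
```

In the two-stage scheme, the labeled `confident` points would then train a classifier such as an SVM, which assigns labels to the `ambiguous` points; that second stage is not shown here.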

Fuzzy symmetry based real-coded genetic clustering technique for automatic pixel classification in remote sensing imagery

FUNDAMENTA INFORMATICAE, 2008, vol. 84, no. 3-4, pp. 471-492

Authors: Saha, Sriparna; Bandyopadhyay, Sanghamitra. Indian Stat Inst, Machine Intelligence Unit, Kolkata 700035, W Bengal, India

The problem of classifying an image into different homogeneous regions is viewed as the task of clustering the pixels in the intensity space. In particular, satellite images contain landcover types, some of which cover large areas while others (e.g., bridges and roads) occupy much smaller regions. Automatically detecting regions or clusters of such widely varying sizes is a challenging task. In this paper, a newly developed real-coded variable string length genetic fuzzy clustering technique with a new point symmetry distance is used for this purpose. The proposed algorithm is capable of automatically determining the number of segments present in an image. Here pixels are assigned to clusters based on the point symmetry based distance rather than the Euclidean distance. The cluster centers are encoded in the chromosomes, and a newly developed fuzzy point symmetry distance based cluster validity index, FSym-index, is used as a measure of the validity of the corresponding partition. This validity index is able to correctly indicate the presence of clusters of different sizes and shapes as long as they are internally symmetrical. The space and time complexities of the proposed algorithm are also derived. The effectiveness of the proposed technique is first demonstrated in identifying two small objects against a large background in an artificially generated image, and then in identifying different landcover regions in remote sensing imagery. Results are compared, both qualitatively and quantitatively, with those obtained using the well-known fuzzy C-means algorithm.

Keywords: cluster validity index; fuzzy clustering; symmetry; point symmetry based distance; Kd tree; variable string length genetic algorithm; remote sensing imagery

Efficient two-stage fuzzy clustering of microarray gene expression data

9th International Conference on Information Technology (ICIT 2006)

Authors: Mukhopadhyay, Anirban; Maulik, Ujjwal; Bandyopadhyay, Sanghamitra. Univ Kalyani, Dept Comp Sci & Engn, Kalyani 741235, W Bengal, India; Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, W Bengal, India; Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, India

ISBN: 9780769526355 (print)

This article presents an efficient two-stage clustering method for microarray gene expression time series data. The algorithm is based on the identification of genes having significant membership to multiple classes. A recently proposed variable string length genetic scheme and an iterated version of the well-known fuzzy C-means algorithm are utilized as the underlying clustering techniques. To demonstrate its effectiveness, the two-stage clustering technique has been compared on some publicly available gene expression data with hierarchical clustering algorithms, which are widely used for clustering gene expression data.

Keywords: microarray gene expression data; cluster validity indices; fuzzy clustering; significant multi-class membership; variable string length genetic algorithm

Active site driven ligand design: An evolution approach

Journal of Bioinformatics and Computational Biology, 2005, vol. 3, no. 5, pp. 1053-1070

Authors: Bandyopadhyay, Sanghamitra; Bagchi, Angshuman; Maulik, Ujjwal. Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700 108, India; Bioinformatics Center, Bose Institute, Kolkata 700 054, India; Department of Computer Science and Engineering, Jadavpur University, Kolkata 700 032, India

An evolutionary approach for designing a ligand molecule that can bind to the active site of a target protein is described in this article. An earlier attempt in this regard assumed a fixed tree structure of the ligand on both sides of the pharmacophore, and used a genetic algorithm to optimize the van der Waals energy. However, knowledge about the size of the tree is difficult to obtain a priori. Moreover, it will change from one active site to another. This limitation is overcome in the present article by using a variable string length genetic algorithm (VGA) to evolve an appropriate arrangement of the basic functional units of the molecule to be designed, whose size may now vary. The crossover and mutation operators are redesigned to handle variable length chromosomes. Once the geometry of the molecule is obtained, its possible three-dimensional structure and docking energy are determined. Results are demonstrated for five different target proteins both numerically and pictorially. It is found that the molecule designed using the variable length representation has, in general, lower energy values and also lower docking energies than the molecule evolved using the fixed size representation. © Imperial College Press.

Keywords: Active site; Ligand; Docking; Target protein receptor; Tree structured representation; variable string length genetic algorithm
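
The abstract above says the crossover operator was redesigned for variable length chromosomes but does not say how. One generic way to cross over strings of different lengths, shown here purely as an illustrative assumption and not as the operator used in the paper, is to pick an independent cut point in each parent so that the children may differ in length from both parents. The function name and the gene labels are hypothetical.

```python
import random

def variable_length_crossover(parent_a, parent_b, rng):
    """One-point crossover with an independent cut point per parent.

    Generic sketch only; the paper's redesigned operator is not
    described in the abstract and may differ.
    """
    if len(parent_a) < 2 or len(parent_b) < 2:
        raise ValueError("each parent needs at least two genes")
    cut_a = rng.randint(1, len(parent_a) - 1)  # keeps both pieces non-empty
    cut_b = rng.randint(1, len(parent_b) - 1)
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2

rng = random.Random(0)
a = ["a1", "a2", "a3", "a4"]   # hypothetical genes, e.g. functional units
b = ["b1", "b2"]
child_1, child_2 = variable_length_crossover(a, b, rng)
print(len(child_1), len(child_2))
```

The children together carry exactly the genes of the two parents, but the split between them varies, which is what lets the string length evolve.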

Fuzzy partitioning using a real-coded variable-length genetic algorithm for pixel classification

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2003, vol. 41, no. 5, pp. 1075-1081

Authors: Maulik, U; Bandyopadhyay, S. Kalyani Govt Engn Coll, Dept Comp Sci, Kalyani 741235, W Bengal, India; Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, W Bengal, India

The problem of classifying an image into different homogeneous regions is viewed as the task of clustering the pixels in the intensity space. Real-coded variable string length genetic fuzzy clustering with automatic evolution of clusters is used here for this purpose. The cluster centers are encoded in the chromosomes, and the Xie-Beni index is used as a measure of the validity of the corresponding partition. The effectiveness of the proposed technique is demonstrated for classifying different landcover regions in remote sensing imagery. Results are compared with those obtained using the well-known fuzzy C-means algorithm.

Keywords: cluster validity; fuzzy clustering; pattern recognition; remote sensing imagery; variable string length genetic algorithm; Xie-Beni index
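
For readers unfamiliar with the Xie-Beni index mentioned in the abstract above, the sketch below computes it in its usual form: the membership-weighted within-cluster scatter divided by the number of points times the smallest squared distance between two centers. Lower values indicate a more compact, better separated partition. The function name and the toy data are assumptions for illustration; how the paper plugs the index into its genetic algorithm is not shown.

```python
import math

def xie_beni(points, centers, memberships):
    """Xie-Beni index; memberships[i][k] is the membership of points[k] in cluster i."""
    n = len(points)
    # Compactness: squared-membership-weighted squared distances to centers.
    compactness = sum(
        memberships[i][k] ** 2 * math.dist(points[k], centers[i]) ** 2
        for i in range(len(centers))
        for k in range(n)
    )
    # Separation: smallest squared distance between any two centers.
    separation = min(
        math.dist(centers[i], centers[j]) ** 2
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (n * separation)

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = [(0.5, 0.0), (10.5, 0.0)]
memberships = [[1.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 1.0]]
print(xie_beni(points, centers, memberships))
```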

Relation between VGA-classifier and MLP: Determination of network architecture

Fundamenta Informaticae, 1999, vol. 37, no. 1, pp. 177-199

Authors: Bandyopadhyay, Sanghamitra; Pal, Sankar K.

An analogy is established between a genetic algorithm based pattern classification scheme (where hyperplanes are used to approximate the class boundaries through searching) and a multilayer perceptron (MLP) based classifier. Based on this, a method for determining the MLP architecture automatically is described. It is shown that the architecture would need at most two hidden layers, the neurons of which are responsible for generating hyperplanes and regions. The neurons in the second hidden layer and the output layer perform the AND and OR functions, respectively. The methodology also includes a post-processing step which automatically removes any redundant neuron in the hidden/output layer. An extensive comparative study of the performance of the MLP derived using the proposed method against that of several other conventional MLPs is presented for different data sets.

Keywords: hyperplane fitting; boundary approximation; hard limiting neuron; network architecture design; variable string length genetic algorithm
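
The layered structure described in the abstract above (hyperplane neurons, then AND neurons forming regions, then an OR neuron forming the class) can be illustrated with hard limiting neurons as below. This is a hand-built toy, assuming two square regions chosen for the example; in the paper the hyperplanes come from the genetic classifier, which is not reproduced here, and all names are hypothetical.

```python
def hard_limit(x):
    """Hard limiting activation: 1 if the net input is non-negative, else 0."""
    return 1 if x >= 0 else 0

def neuron(inputs, weights, bias):
    return hard_limit(sum(w * x for w, x in zip(weights, inputs)) + bias)

def in_class(point, hyperplanes, regions):
    # First hidden layer: one neuron per hyperplane, reporting which side we are on.
    sides = [neuron(point, w, b) for w, b in hyperplanes]
    # Second hidden layer: one AND neuron per region. Each region maps a
    # hyperplane index to the side (1 or 0) the point must lie on.
    region_outputs = []
    for required in regions:
        weights = [0] * len(sides)
        for index, side in required.items():
            weights[index] = 1 if side == 1 else -1
        ones = sum(1 for side in required.values() if side == 1)
        region_outputs.append(neuron(sides, weights, 0.5 - ones))
    # Output layer: an OR neuron over the regions.
    return neuron(region_outputs, [1] * len(region_outputs), -0.5)

# Half-planes in 2-D written as (weights, bias): w . x + bias >= 0.
hyperplanes = [
    ((1, 0), 0),    # 0: x >= 0
    ((-1, 0), 1),   # 1: x <= 1
    ((0, 1), 0),    # 2: y >= 0
    ((0, -1), 1),   # 3: y <= 1
    ((1, 0), -2),   # 4: x >= 2
    ((-1, 0), 3),   # 5: x <= 3
]
regions = [
    {0: 1, 1: 1, 2: 1, 3: 1},  # square [0, 1] x [0, 1]
    {4: 1, 5: 1, 2: 1, 3: 1},  # square [2, 3] x [0, 1]
]
for p in [(0.5, 0.5), (2.5, 0.5), (1.5, 0.5)]:
    print(p, in_class(p, hyperplanes, regions))
```

Each AND neuron fires only when every required hyperplane side matches, and the OR neuron fires when at least one region does, so the class is a union of intersections of half-spaces.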
