Search results - Inner Mongolia University Library

Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

EURASIP JOURNAL ON BIOINFORMATICS AND SYSTEMS BIOLOGY, 2012, vol. 2012, no. 1, pp. 1-18

Authors: Durston, Kirk K.; Chiu, David K. Y.; Wong, Andrew K. C.; Li, Gary C. L. 1. School of Computer Science, University of Guelph, 50 Stone Road East, Guelph, ON N1G 2W1, Canada. 2. Department of System Design Engineering, University of Waterloo, 200 University Ave. W, Waterloo, ON N2L 3G1, Canada.

Background: Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there have been indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results: The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams, which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appears to play an important role in the molecular structure and/or function. Co

Keywords: k-modes algorithm; Site cluster; Associations; Ubiquitin; Transthyretin; Pattern discovery; Cluster tree; Attribute clustering; Protein structural sub-domains
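
The interdependence between two aligned sites that the abstract above relies on can be illustrated with a small sketch. The abstract does not say how the mutual information is normalized, so dividing by the joint entropy is an assumption here, and the toy alignment and function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of categorical symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())

def normalized_mutual_information(col_x, col_y):
    """I(X;Y) / H(X,Y) for two aligned columns (assumed normalization)."""
    h_xy = entropy(list(zip(col_x, col_y)))
    if h_xy == 0.0:
        return 0.0  # both columns are constant, so no dependency is measurable
    mutual_info = entropy(col_x) + entropy(col_y) - h_xy
    return mutual_info / h_xy

# Hypothetical toy alignment: rows are sequences, columns are aligned sites.
alignment = ["AKL", "AKL", "GRL", "GRV"]
columns = list(zip(*alignment))
nmi_01 = normalized_mutual_information(columns[0], columns[1])  # sites vary together
nmi_02 = normalized_mutual_information(columns[0], columns[2])  # weaker association
```

Sites 0 and 1 change in lockstep and score higher than sites 0 and 2; a site clustering method would group columns whose pairwise scores are high.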

On the impact of dissimilarity measure in k-modes clustering algorithm

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2007, vol. 29, no. 3, pp. 503-507

Authors: Ng, Michael K.; Li, Mark Junjie; Huang, Joshua Zhexue; He, Zengyou. Hong Kong Baptist Univ, Dept Math, Hong Kong, Hong Kong, Peoples R China; Univ Hong Kong, E Business Technol Inst, Hong Kong, Hong Kong, Peoples R China; Harbin Inst Technol, Dept Comp Sci & Engn, Harbin 150001, Peoples R China

This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching dissimilarity measure for categorical objects, a heuristic approach was developed in [4], [12] which allows the use of the k-modes paradigm to obtain a cluster with strong intrasimilarity and to efficiently cluster large categorical data sets. The main aim of this paper is to rigorously derive the updating formula of the k-modes clustering algorithm with the new dissimilarity measure and the convergence of the algorithm under the optimization framework.

Keywords: data mining; clustering; k-modes algorithm; categorical data
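
For orientation, the baseline that the abstract above extends can be sketched as follows. This is a minimal k-modes loop using the plain simple matching dissimilarity, not the modified measure the paper analyzes; the data, function names and initial modes are hypothetical.

```python
from collections import Counter

def simple_matching(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(objects):
    """Most frequent value per attribute (ties go to the first value seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*objects))

def k_modes(data, initial_modes, max_iter=100):
    """Alternate assignment and mode update until the assignment is stable."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: simple_matching(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes

# Hypothetical categorical objects with three attributes each.
data = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
        ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r")]
labels, modes = k_modes(data, initial_modes=[data[0], data[3]])
```

Swapping `simple_matching` for another dissimilarity function is the point at which a modified measure would plug in; the update rule for the modes would then need to be rederived, which is the concern the abstract raises.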

A new initialization method for categorical data clustering

EXPERT SYSTEMS WITH APPLICATIONS, 2009, vol. 36, no. 7, pp. 10223-10228

Authors: Cao, Fuyuan; Liang, Jiye; Bai, Liang. Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China; Minist Educ Res Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Peoples R China

In clustering algorithms, choosing a subset of representative examples from the data set is very important. Such "exemplars" can be found by randomly choosing an initial subset of data objects and then iteratively refining it, but this works well only if that initial choice is close to a good solution. In this paper, based on the frequency of attribute values, the average density of an object is defined. Furthermore, a novel initialization method for categorical data is proposed, in which the distance between objects and the density of each object are considered. We also apply the proposed initialization method to the k-modes algorithm and the fuzzy k-modes algorithm. Experimental results illustrate that the proposed initialization method is superior to the random initialization method and can be applied to large data sets because of its linear time complexity with respect to the number of data objects. (C) 2009 Elsevier Ltd. All rights reserved.

Keywords: Density; Distance; Initialization method; Initial cluster center; k-modes algorithm
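
One way to read the density-and-distance idea in the abstract above is sketched below. The abstract does not give the formulas, so both the density definition (mean relative frequency of an object's attribute values) and the selection rule (maximize distance to the nearest chosen center times density) are assumptions, as are all names and data.

```python
from collections import Counter

def average_densities(data):
    """Assumed density: mean relative frequency of each object's attribute values."""
    n, m = len(data), len(data[0])
    counts = [Counter(col) for col in zip(*data)]
    return [sum(counts[a][x[a]] for a in range(m)) / (m * n) for x in data]

def mismatch(a, b):
    """Simple matching distance between two categorical objects."""
    return sum(u != v for u, v in zip(a, b))

def init_centers(data, k):
    """Pick k initial centers that are dense and far from centers already chosen."""
    dens = average_densities(data)
    chosen = [max(range(len(data)), key=lambda i: dens[i])]
    while len(chosen) < k:
        def score(i):
            # distance to the nearest chosen center, weighted by density
            return min(mismatch(data[i], data[c]) for c in chosen) * dens[i]
        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]

# Hypothetical categorical objects with three attributes each.
data = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
        ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r")]
centers = init_centers(data, k=2)
```

The result is deterministic, which is the practical advantage over random initialization: repeated runs start from the same centers.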

Genetic k-modes based DNA Splice Site Adjacent sequences Feature Analysis

7th World Congress on Intelligent Control and Automation

Authors: Zhang, Quanwei; Peng, Qinke; Sun, Hequan; Li, Kankan. Xi An Jiao Tong Univ, State Key Lab Mfg Syst Engn, Xian 710049, Peoples R China; Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Xian 710049, Peoples R China

ISBN: (print) 9781424421138

DNA splice site adjacent sequences have remarkably conserved features, and mining their underlying biological knowledge has become a key issue in the field of DNA sequence analysis. In this paper, we analyze the features of human DNA splice site adjacent sequences. First, we propose a clustering method for DNA splice site sequences based on Genetic k-modes; second, we analyze the frequencies of the various bases, di-bases and tri-bases in the experimental data set and in each cluster; last, we propose a Markov-model-based frequent pattern discovery algorithm and use it to mine the frequent patterns of the experimental data set and of each cluster.

Keywords: clustering; genetic algorithm; k-modes algorithm; splice site; Markov model
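
The base, di-base and tri-base frequency analysis mentioned in the abstract above amounts to counting overlapping k-mers. The sketch below shows that counting step only; the sequences are made up for illustration and the function name is hypothetical.

```python
from collections import Counter

def kmer_frequencies(sequences, k):
    """Relative frequency of every overlapping k-mer across the sequences."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

# Hypothetical sequences; real input would be the splice site adjacent regions.
seqs = ["AGGTAAGT", "CAGGTGAG"]
base_freq = kmer_frequencies(seqs, 1)     # single bases
dibase_freq = kmer_frequencies(seqs, 2)   # di-bases
tribase_freq = kmer_frequencies(seqs, 3)  # tri-bases
```

Running the same function on the members of each cluster, rather than on the whole data set, gives the per-cluster profiles that can then be compared.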

Genetic k-modes based DNA Splice Site Adjacent sequences Feature Analysis

7th World Congress on Intelligent Control and Automation (WCICA 2008), vol.6

Authors: Quanwei Zhang; Qinke Peng; Hequan Sun; Kankan Li. State Key Laboratory for Manufacturing Systems Engineering and School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China

DNA splice site adjacent sequences have remarkably conserved features, and mining their underlying biological knowledge has become a key issue in the field of DNA sequence analysis. In this paper, we analyze the features of human DNA splice site adjacent sequences. First, we propose a clustering method for DNA splice site sequences based on Genetic k-modes; second, we analyze the frequencies of the various bases, di-bases and tri-bases in the experimental data set and in each cluster; last, we propose a Markov-model-based frequent pattern discovery algorithm and use it to mine the frequent patterns of the experimental data set and of each cluster.

Keywords: Clustering; Genetic algorithm; k-modes algorithm; Splice site; Markov model

A k-populations algorithm for clustering categorical data

PATTERN RECOGNITION, 2005, vol. 38, no. 7, pp. 1131-1134

Authors: Kim, DW; Lee, K; Lee, D; Lee, KH. Korea Adv Inst Sci & Technol, Dept BioSyst, Taejon 305701, South Korea; Korea Adv Inst Sci & Technol, Adv Informat Technol Res Ctr, Taejon 305701, South Korea; Korea Adv Inst Sci & Technol, Dept Elect Engn & Comp Sci, Taejon 305701, South Korea

In this paper, the conventional k-modes-type algorithms for clustering categorical data are extended by representing the clusters of categorical data with k-populations instead of the hard-type centroids used in the conventional algorithms. Use of a population-based centroid representation makes it possible to preserve the uncertainty inherent in data sets as long as possible before actual decisions are made. The k-populations algorithm was found to give markedly better clustering results through various experiments. (c) 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: clustering; categorical data; hierarchical algorithm; k-modes algorithm; fuzzy k-modes algorithm
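
The contrast in the abstract above between a hard centroid and a population-based one can be made concrete with a small sketch. The abstract does not define the population or its distance, so treating a population as per-attribute relative frequencies, and the distance as the expected number of mismatches, is an assumption; names and data are hypothetical.

```python
from collections import Counter

def population(members):
    """Assumed population centroid: per-attribute relative frequencies."""
    n = len(members)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*members)]

def distance_to_population(obj, pop):
    """Expected number of mismatches between obj and a member drawn from pop."""
    return sum(1.0 - dist.get(v, 0.0) for v, dist in zip(obj, pop))

# Hypothetical cluster of three objects with two attributes each.
cluster = [("a", "x"), ("a", "y"), ("b", "x")]
pop = population(cluster)
d_typical = distance_to_population(("a", "x"), pop)
d_rare = distance_to_population(("b", "y"), pop)
d_unseen = distance_to_population(("c", "z"), pop)
```

A hard mode would be `("a", "x")` and would treat `("b", "y")` and `("c", "z")` as equally distant; the population keeps the minority values `b` and `y` in play, so `d_rare` is smaller than `d_unseen`.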

Fuzzy clustering of categorical data using fuzzy centroids

PATTERN RECOGNITION LETTERS, 2004, vol. 25, no. 11, pp. 1263-1271

Authors: Kim, DW; Lee, KH; Lee, D. Korea Adv Inst Sci & Technol, Dept Elect Engn & Comp Sci, Taejon 305701, South Korea; Korea Adv Inst Sci & Technol, Dept Biosyst, Taejon 305701, South Korea; Korea Adv Inst Sci & Technol, Adv Informat Technol Res Ctr, Taejon 305701, South Korea

In this paper the conventional fuzzy k-modes algorithm for clustering categorical data is extended by representing the clusters of categorical data with fuzzy centroids instead of the hard-type centroids used in the original algorithm. Use of fuzzy centroids makes it possible to fully exploit the power of fuzzy sets in representing the uncertainty in the classification of categorical data. To test the proposed approach, the proposed algorithm and two conventional algorithms (the k-modes and fuzzy k-modes algorithms) were used to cluster three categorical data sets. The proposed method was found to give markedly better clustering results. (C) 2004 Elsevier B.V. All rights reserved.

Keywords: fuzzy clustering; k-modes algorithm; fuzzy k-modes algorithm; categorical data; fuzzy centroid
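
A fuzzy centroid of the kind the abstract above describes can be pictured as a weight for every category value of every attribute, built from the objects' membership degrees. The sketch below is one possible construction under that reading; the exponent `alpha`, the normalization and all names are assumptions rather than the paper's definitions.

```python
def fuzzy_centroid(data, memberships, alpha=1.5):
    """Per-attribute category weights from memberships raised to alpha."""
    centroid = [dict() for _ in data[0]]
    for obj, w in zip(data, memberships):
        for a, value in enumerate(obj):
            centroid[a][value] = centroid[a].get(value, 0.0) + w ** alpha
    for weights in centroid:
        total = sum(weights.values())
        if total > 0:  # leave all-zero weights untouched
            for value in weights:
                weights[value] /= total
    return centroid

# Hypothetical objects and their membership degrees in one cluster.
data = [("a", "x"), ("a", "y"), ("b", "y")]
memberships = [1.0, 1.0, 0.0]
centroid = fuzzy_centroid(data, memberships, alpha=2.0)
```

With crisp memberships (only 0 or 1) the centroid reduces to the relative frequencies among the cluster's members; intermediate memberships let an object contribute partially to several clusters' centroids.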
