Search Results - Inner Mongolia University Library

Visualization methods for statistical analysis of microarray clusters

BMC BIOINFORMATICS, 2005, Vol. 6, No. 1, pp. 1-10

Authors: Hibbs, MA; Dirksen, NC; Li, K; Troyanskaya, OG. Affiliations: Princeton Univ, Dept Comp Sci, Princeton NJ 08544, USA; Lewis Sigler Inst Integrat Genom, Princeton NJ 08544, USA

Background: The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gold standard. Appropriate data visualization tools can aid this analysis process, but existing visualization methods do not specifically address this issue. Results: We present several visualization techniques that incorporate meaningful, noise-robust statistics for analyzing the results of clustering algorithms on microarray data. These include a rank-based visualization method that is more robust to noise, a difference display method to aid assessment of cluster quality and detection of outliers, and a projection of high-dimensional data into a three-dimensional space in order to examine relationships between clusters. Our methods are interactive and are dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture is scalable for use on both desktop/laptop screens and large-scale display devices. This methodology is implemented in GeneVAnD (Genomic Visual ANalysis of Datasets) and is available at http://***/GeneVAnD. Conclusion: Incorporating relevant statistical information into data visualizations is key for analysis of large biological datasets, particularly because of high levels of noise and the lack of a gold standard for comparisons. We developed several new visualization techniques and demonstrated their effectiveness for evaluating cluster quality and relationships between clusters.

Keywords: Cluster Algorithm; Microarray Data; Singular Value Decomposition; Visualization Method; Cluster Quality
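A rank-based display of the kind mentioned above rests on replacing raw expression values with their ranks, which limits how far a single extreme measurement can distort a profile. The Python sketch below is a hypothetical illustration, not GeneVAnD code: `average_ranks` assigns 1-based ranks (tied values share their average rank), and `spearman` correlates two profiles by applying Pearson correlation to those ranks.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the block of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Rank (Spearman) correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


print(average_ranks([10, 20, 20, 5]))  # [2.0, 3.5, 3.5, 1.0]
```

For example, `spearman([1, 2, 3, 4, 100], [1, 2, 3, 4, 5])` equals 1 up to rounding, whereas `pearson` on the same raw values is pulled well below 1 by the single outlier.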


Reproducible clusters from microarray research: Whither?


BMC BIOINFORMATICS, 2005, Vol. 6, Suppl. 2, p. S10

Authors: Garge, NR; Page, GP; Sprague, AP; Gorman, BS; Allison, DB. Affiliations: Univ Alabama Birmingham, Dept Biostat, Sect Stat Genet, Birmingham AL 35294, USA; Univ Alabama Birmingham, Dept Comp & Informat Sci, Birmingham AL 35294, USA; Hofstra Univ, Dept Psychol, Hempstead NY 11550, USA; Med Coll Georgia, Augusta GA 30912, USA

Motivation: In cluster analysis, assessing the validity of specific solutions, algorithms, and procedures presents significant challenges because there is no null hypothesis to test and no 'right answer'. It has been noted that a replicable classification is not necessarily a useful one, but a useful one that characterizes some aspect of the population must be replicable. By replicable we mean reproducible across multiple samplings from the same population. Methodologists have suggested that the validity of clustering methods should be based on classifications that yield reproducible findings beyond chance levels. We used this approach to determine the performance of commonly used clustering algorithms and the degree of replicability achieved using several microarray datasets. Methods: We considered four commonly used iterative partitioning algorithms (Self Organizing Maps (SOM), K-means, Clustering LARge Applications (CLARA), and Fuzzy C-means) and evaluated their performance on 37 microarray datasets, with sample sizes ranging from 12 to 172. We assessed the reproducibility of each clustering algorithm by measuring the strength of the relationship between the clustering outputs of subsamples of the 37 datasets. Cluster stability was quantified using Cramér's V² from a k×k table. Cramér's V² is equivalent to the squared canonical correlation coefficient between two sets of nominal variables. Potential scores range from 0 to 1, with 1 denoting perfect reproducibility. Results: All four clustering routines show increased stability with larger sample sizes. K-means and SOM showed a gradual increase in stability with increasing sample size. CLARA and Fuzzy C-means, however, yielded low stability scores until sample sizes approached 30 and then gradually increased thereafter. Average stability never exceeded 0.55 for the four clustering routines, even at a sample size of 50. These findings suggest several plausible scenarios: (1) microarray datasets lack natural clustering structure thereby

Keywords: Cluster Algorithm; Real Dataset; Cluster Solution; Simulated Dataset; Microarray Dataset
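Cramér's V² for two partitions of the same items can be computed from their contingency table as $V^2 = \chi^2 / (n\,(k-1))$, where $\chi^2$ is Pearson's chi-square statistic, $n$ is the number of items and $k$ is the smaller of the two cluster counts (for the k×k tables described above, simply k). The Python sketch below illustrates that standard definition; the function name and the way the two labelings are passed in are assumptions, not the authors' code.

```python
from collections import Counter


def cramers_v_squared(labels_a: list[int], labels_b: list[int]) -> float:
    """Cramer's V^2 between two cluster assignments of the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    joint = Counter(zip(labels_a, labels_b))  # contingency-table cells
    rows = Counter(labels_a)                   # row totals
    cols = Counter(labels_b)                   # column totals
    k = min(len(rows), len(cols))
    if k < 2:
        raise ValueError("each labeling needs at least two clusters")
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            # Counter returns 0 for empty cells.
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return chi2 / (n * (k - 1))


# Two clusterings that agree up to relabeling: result is 1.0 up to rounding.
print(cramers_v_squared([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))
```

Because only co-membership matters, the label values themselves are irrelevant, which is what makes the statistic suitable for comparing clusterings of different subsamples.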


GenClust: A genetic algorithm for clustering gene expression data (art. no. 289)


BMC BIOINFORMATICS, 2005, Vol. 6, No. 1, p. 289

Authors: Di Gesú, V; Giancarlo, R; Lo Bosco, G; Raimondi, A; Scaturro, D. Affiliation: Univ Palermo, Dipartimento Matemat & Applicaz, I-90123 Palermo, Italy

Background: Clustering is a key step in the analysis of gene expression data; indeed, many classical clustering algorithms are used for the task, and more innovative ones have been designed and validated for it. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data-driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion: Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithms may be statistically insignificant, and the algorithms could be considered equivalent. However, there are cases in which an algorithm may be better than the others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data-driven internal validation measures and, in particular, the FOM methodology.

Keywords: Cluster Algorithm; Internal Variance; True Solution; Cluster Solution; Rand Index
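The FOM (figure of merit) methodology mentioned above is commonly described as a leave-one-condition-out estimate: cluster the genes using every condition except one, then measure how closely each cluster's mean predicts the withheld condition. The Python sketch below follows that common definition as we understand it (aggregate FOM = sum over withheld conditions of the root-mean-square deviation from cluster means); it is not GenClust code, and `cluster_fn` and `threshold_split` are hypothetical placeholders for any clustering routine that returns one label per gene.

```python
import math
from typing import Callable

Matrix = list[list[float]]


def figure_of_merit(data: Matrix, cluster_fn: Callable[[Matrix], list[int]]) -> float:
    """Aggregate FOM: sum over withheld conditions of the RMS deviation of
    that condition's values from their cluster means (lower is better)."""
    n_genes, n_conds = len(data), len(data[0])
    total = 0.0
    for e in range(n_conds):
        # Cluster genes on every condition except e.
        reduced = [[row[j] for j in range(n_conds) if j != e] for row in data]
        labels = cluster_fn(reduced)
        # Per-cluster mean of the withheld condition e.
        sums: dict[int, float] = {}
        counts: dict[int, int] = {}
        for g, c in enumerate(labels):
            sums[c] = sums.get(c, 0.0) + data[g][e]
            counts[c] = counts.get(c, 0) + 1
        squared = sum((data[g][e] - sums[c] / counts[c]) ** 2
                      for g, c in enumerate(labels))
        total += math.sqrt(squared / n_genes)
    return total


def threshold_split(m: Matrix) -> list[int]:
    """Hypothetical clustering routine: split genes on their first remaining condition."""
    return [0 if row[0] < 0.5 else 1 for row in m]


profiles = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(figure_of_merit(profiles, threshold_split))  # 0.0
```

In practice the FOM is compared across algorithms and numbers of clusters on the same data, since its absolute value depends on the scale of the measurements.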


Shortest triplet clustering: reconstructing large phylogenies using representative sets


BMC BIOINFORMATICS, 2005, Vol. 6, No. 1, p. 92

Authors: Vinh, LS; von Haeseler, A. Affiliations: Heinrich-Heine-Universität Düsseldorf, WE Informatik, Universitätstr. 1, D-040225 Düsseldorf, Germany; Forschungszentrum Jülich, Germany

Background: Understanding the evolutionary relationships among species based on their genetic information is one of the primary objectives in phylogenetic analysis. Reconstructing phylogenies for large data sets is still a challenging task in Bioinformatics. Results: We propose a new distance-based clustering method, the shortest triplet clustering algorithm (STC), to reconstruct phylogenies. The main idea is the introduction of a natural definition of so-called k-representative sets. Based on k-representative sets, shortest triplets are reconstructed and serve as building blocks for the STC algorithm, which agglomerates sequences for tree reconstruction in $O(n^2)$ time for n sequences. Simulations show that STC gives better topological accuracy than other tested methods that also build a first starting tree. STC appears to be a very good method for building the starting tree. However, all tested methods give similar results if balanced nearest neighbor interchange (BNNI) is applied as a post-processing step; BNNI leads to an improvement in all instances. The program is available at http://***/software/stc/. Conclusion: The results demonstrate that the new approach efficiently reconstructs phylogenies for large data sets. We found that BNNI boosts the topological accuracy of all methods, including STC; therefore, BNNI should be used as a post-processing step to obtain better topological accuracy.

Keywords: Path Length; Cluster Algorithm; Average Path Length; Stochastic Error; Large Simulation


Colloidal stabilization via nanoparticle halo formation


Physical Review E, 2005, Vol. 72, No. 6, p. 061401

Authors: Jiwen Liu; Erik Luijten. Affiliation: Department of Materials Science and Engineering and Frederick Seitz Materials Research Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA

We present a detailed numerical study of effective interactions between micrometer-sized silica spheres, induced by highly charged zirconia nanoparticles. It is demonstrated that the effective interactions are consistent with a recently discovered mechanism for colloidal stabilization. In accordance with the experimental observations, small nanoparticle concentrations induce an effective repulsion that counteracts the intrinsic van der Waals attraction between the colloids and thus stabilizes the suspension. At higher nanoparticle concentrations an attractive potential is recovered, resulting in reentrant gelation. Monte Carlo simulations of this highly size-asymmetric mixture are made possible by means of a geometric cluster Monte Carlo algorithm. A comparison is made to results obtained from the Ornstein-Zernike equations with the hypernetted-chain closure.

Keywords: HARD-SPHERE MIXTURES; CLUSTER ALGORITHM; ATTRACTION; SIMULATION; DEPLETION FORCES
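The geometric cluster algorithm used in this study generalizes the pivot (point-reflection) cluster move for hard-core particles. As a hedged illustration of that hard-core special case only, the Python sketch below applies it to hypothetical hard rods of length `sigma` on a periodic line: a random rod is reflected through a random pivot, and every rod that overlaps an already reflected rod is reflected too. Reflection preserves distances inside the moved cluster, and every rod left behind was checked against each moved one, so the move never creates overlaps. The interaction-dependent rule that the full algorithm uses for soft potentials is deliberately omitted.

```python
import random


def ring_distance(a: float, b: float, length: float) -> float:
    """Shortest distance between two points on a ring of the given length."""
    d = abs(a - b) % length
    return min(d, length - d)


def pivot_cluster_move(pos: list[float], sigma: float, length: float,
                       rng: random.Random) -> list[float]:
    """One hard-core pivot cluster move; returns new positions (input untouched)."""
    n = len(pos)
    pivot = rng.uniform(0.0, length)
    new = list(pos)
    moved = [False] * n
    seed = rng.randrange(n)
    moved[seed] = True
    new[seed] = (2.0 * pivot - pos[seed]) % length  # point reflection through pivot
    stack = [seed]
    while stack:
        j = stack.pop()
        # Any not-yet-moved rod overlapping the reflected rod j joins the cluster.
        for k in range(n):
            if not moved[k] and ring_distance(new[j], pos[k], length) < sigma:
                moved[k] = True
                new[k] = (2.0 * pivot - pos[k]) % length
                stack.append(k)
    return new


def has_overlap(pos: list[float], sigma: float, length: float,
                tol: float = 1e-9) -> bool:
    """True if any two rods are closer than sigma (minus a rounding tolerance)."""
    n = len(pos)
    return any(ring_distance(pos[i], pos[j], length) < sigma - tol
               for i in range(n) for j in range(i + 1, n))


rng = random.Random(1)
positions = [2.0 * i for i in range(5)]  # 5 rods of length 1 on a ring of length 10
for _ in range(1000):
    positions = pivot_cluster_move(positions, 1.0, 10.0, rng)
print(has_overlap(positions, 1.0, 10.0))  # expected: False
```

Since there is no acceptance test, every proposed move is carried out; this rejection-free property is what allows cluster moves to handle strongly size-asymmetric mixtures efficiently.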


Efficient Simulation of Resistively Shunted Josephson Junctions


Physical Review Letters, 2005, Vol. 95, No. 6, p. 060201

Authors: Philipp Werner; Matthias Troyer. Affiliation: Institut für theoretische Physik, ETH Hönggerberg, CH-8093 Zürich, Switzerland

We present a cluster algorithm for resistively shunted Josephson junctions or similar physical systems, which dramatically improves sampling efficiency, and apply it to the superconductor-to-metal transition in a single junction. Measuring the temperature dependence of the zero bias resistance, we confirm that the critical point does not depend on the strength of the Josephson coupling. However, we find that the correlation exponents vary continuously along the phase boundary, indicating that the Schmid-Bulgadaev transition is a line of fixed points.

Keywords: Phase Boundary; junctions; cluster algorithm; Resistively shunted


RBF neural network with optimal selection cluster algorithm and its application


4th World Congress on Intelligent Control and Automation

Authors: Liu, TN; Guan, XZ; Liu, ZY; Xie, AH; Zhang, H. Affiliation: Daqing Petr Inst, Dept Automat & Control Engn, Anda 151400, Heilongjiang, Peoples R China

ISBN (print): 0780372689

In this paper, an RBF neural network (RBFNN) model is constructed for the identification of nonlinear systems. First, an optimal selection clustering algorithm is proposed; it determines the optimal number of hidden-layer nodes of the RBFNN from the input samples and, at the same time, yields initial values for the RBF parameters. Then the RBF parameters are estimated by a gradient algorithm with a momentum term, and the RBFNN weights are identified by a recursive least squares algorithm, with the two algorithms iterated alternately. This hybrid algorithm not only raises the identification precision of the RBFNN but also improves the generalization of the network. Applications demonstrate the validity of the scheme.

Keywords: RBF neural network; optimal selection cluster algorithm; identification; gradient algorithm; recursive least square algorithm
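The weight-identification step described above can be sketched with Gaussian basis functions and a standard recursive least squares (RLS) update. This Python sketch is an illustration under stated assumptions, not the authors' scheme: the centers and width are fixed by hand (hypothetical values standing in for the output of the clustering step), the gradient/momentum update of the RBF parameters is omitted, and `delta` (initial scale of the inverse-correlation matrix P) and `forgetting` are illustrative defaults.

```python
import math


def gaussian_features(x: float, centers: list[float], width: float) -> list[float]:
    """Hidden-layer outputs of a 1-D Gaussian RBF layer."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


class RLSWeights:
    """Recursive least squares estimate of the linear output weights."""

    def __init__(self, n: int, delta: float = 1e6, forgetting: float = 1.0) -> None:
        self.w = [0.0] * n
        # P approximates the inverse input-correlation matrix; start large (weak prior).
        self.P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
        self.lam = forgetting

    def predict(self, phi: list[float]) -> float:
        return sum(wi * pi for wi, pi in zip(self.w, phi))

    def update(self, phi: list[float], y: float) -> None:
        n = len(self.w)
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]
        err = y - self.predict(phi)  # a priori prediction error
        self.w = [self.w[i] + gain[i] * err for i in range(n)]
        # P <- (P - gain * phi^T P) / lam, using phi^T P = (P phi)^T for symmetric P.
        self.P = [[(self.P[i][j] - gain[i] * p_phi[j]) / self.lam for j in range(n)]
                  for i in range(n)]


centers = [0.0, 1.0, 2.0]  # hypothetical hidden-node centers
width = 0.5
true_w = [1.0, -2.0, 0.5]  # weights used to generate noise-free targets
rls = RLSWeights(len(centers))
for _ in range(2):
    for step in range(21):
        phi = gaussian_features(0.1 * step, centers, width)
        rls.update(phi, sum(w * p for w, p in zip(true_w, phi)))
print([round(w, 3) for w in rls.w])
```

With noise-free targets the estimates approach the generating weights; a forgetting factor below 1 discounts old samples, which helps when the identified system drifts.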


Incremental genetic K-means algorithm and its application in gene expression data analysis


BMC BIOINFORMATICS, 2004, Vol. 5, No. 1, p. 172

Authors: Lu, Y; Lu, SY; Fotouhi, F; Deng, YP; Brown, SJ. Affiliations: Univ So Mississippi, Dept Biol Sci, Hattiesburg MS 39406, USA; Wayne State Univ, Dept Comp Sci, Detroit MI 48202, USA; Kansas State Univ, Div Biol, Manhattan KS 66506, USA

Background: In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering and SOM, genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as microarrays, new efficient and effective methods for clustering must be developed to process this growing amount of biological data. Results: In this paper, we propose a new clustering algorithm, the Incremental Genetic K-means Algorithm (IGKA). IGKA is an extension of our previously proposed clustering algorithm, the Fast Genetic K-means Algorithm (FGKA). IGKA outperforms FGKA when the mutation probability is small. The main idea of IGKA is to update the objective value, the Total Within-Cluster Variation (TWCV), and the cluster centroids incrementally whenever the mutation probability is small. IGKA inherits the salient feature of FGKA of always converging to the global optimum. A C program is freely available at http://***/proj/FGKA/***. Conclusions: Our experiments indicate that, while the IGKA algorithm has a convergence pattern similar to FGKA, it has better time performance once the mutation probability decreases below some point. Finally, we used IGKA to cluster a yeast dataset and found that it increased the enrichment of genes of similar function within the cluster.

Keywords: Cluster Algorithm; Functional Category; Mutation Operator; Mutation Probability; Gene Expression Data Analysis
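Incremental updating of TWCV and centroids is possible because both can be maintained from per-cluster counts, coordinate sums and sums of squared norms, via $\mathrm{TWCV} = \sum_k \left(Q_k - \lVert S_k \rVert^2 / n_k\right)$, where $n_k$ is the size of cluster $k$, $S_k$ the vector sum of its members and $Q_k$ the sum of their squared norms. Reassigning one gene then costs time proportional to the number of conditions instead of a full pass over the data. The Python class below is a hypothetical sketch of that bookkeeping, not the authors' C implementation; the genetic operators of IGKA/FGKA are omitted.

```python
class ClusterStats:
    """Per-cluster sufficient statistics supporting O(dim) reassignment."""

    def __init__(self, data: list[list[float]], labels: list[int], k: int) -> None:
        self.data = data
        self.labels = list(labels)
        dim = len(data[0])
        self.count = [0] * k
        self.sums = [[0.0] * dim for _ in range(k)]
        self.sqnorm = [0.0] * k
        for x, c in zip(data, labels):
            self._apply(x, c, 1)

    def _apply(self, x: list[float], c: int, sign: int) -> None:
        # Add (sign=+1) or remove (sign=-1) point x from cluster c's statistics.
        self.count[c] += sign
        for d, v in enumerate(x):
            self.sums[c][d] += sign * v
        self.sqnorm[c] += sign * sum(v * v for v in x)

    def move(self, i: int, new_c: int) -> None:
        """Reassign gene i to cluster new_c, updating statistics incrementally."""
        old = self.labels[i]
        if old != new_c:
            self._apply(self.data[i], old, -1)
            self._apply(self.data[i], new_c, 1)
            self.labels[i] = new_c

    def centroid(self, c: int) -> list[float]:
        return [s / self.count[c] for s in self.sums[c]]

    def twcv(self) -> float:
        # TWCV = sum_k (Q_k - |S_k|^2 / n_k) over non-empty clusters.
        return sum(self.sqnorm[c] - sum(s * s for s in self.sums[c]) / self.count[c]
                   for c in range(len(self.count)) if self.count[c] > 0)


def twcv_from_scratch(data: list[list[float]], labels: list[int], k: int) -> float:
    """Reference TWCV computed directly from the definition."""
    total = 0.0
    for c in range(k):
        members = [x for x, lab in zip(data, labels) if lab == c]
        if members:
            mu = [sum(col) / len(members) for col in zip(*members)]
            total += sum(sum((v - m) ** 2 for v, m in zip(x, mu)) for x in members)
    return total


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
stats = ClusterStats(data, [0, 0, 0, 1, 1], 2)
stats.move(2, 1)  # e.g. the effect of a mutation on gene 2
print(stats.centroid(0))  # [0.5, 0.0]
```

Comparing `stats.twcv()` with `twcv_from_scratch` after each move is a convenient consistency check; for very large values the subtraction in the incremental formula can lose some floating-point precision.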


Dynamic critical behavior of the Swendsen-Wang algorithm for the three-dimensional Ising model


NUCLEAR PHYSICS B, 2004, Vol. 691, No. 3, pp. 259-291

Authors: Ossola, G; Sokal, AD. Affiliation: NYU, Dept Phys, New York NY 10003, USA

We have performed a high-precision Monte Carlo study of the dynamic critical behavior of the Swendsen-Wang algorithm for the three-dimensional Ising model at the critical point. For the dynamic critical exponents associated with the integrated autocorrelation times of the "energy-like" observables, we find $z_{\mathrm{int},N} = z_{\mathrm{int},\varepsilon} = z_{\mathrm{int},\varepsilon'} = 0.459 \pm 0.005 \pm 0.025$, where the first error bar represents statistical error (68% confidence interval) and the second represents possible systematic error due to corrections to scaling (68% subjective confidence interval). For the "susceptibility-like" observables, we find $z_{\mathrm{int},M^2} = z_{\mathrm{int},S_2} = 0.443 \pm 0.005 \pm 0.030$. For the dynamic critical exponent associated with the exponential autocorrelation time, we find $z_{\exp} \approx 0.481$. Our data are consistent with the Coddington-Baillie conjecture $z_{\mathrm{SW}} = \beta/\nu \approx 0.5183$, especially if it is interpreted as referring to $z_{\exp}$. (C) 2004 Elsevier B.V. All rights reserved.

Keywords: Ising model; Potts model; Swendsen-Wang algorithm; cluster algorithm; Monte Carlo; autocorrelation time; dynamic critical exponent
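For reference, one Swendsen-Wang update for the Ising model with coupling $J = 1$ places a bond between each pair of aligned nearest neighbours with probability $p = 1 - e^{-2\beta}$, identifies the resulting clusters, and flips each cluster independently with probability 1/2. The Python sketch below implements that textbook step on a small periodic $L^3$ lattice with a union-find structure; it is an illustration only (function names and lattice indexing are our assumptions) and is far too slow for high-precision studies like the one above.

```python
import math
import random


def _find(parent: list[int], i: int) -> int:
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def swendsen_wang_step(spins: list[int], L: int, beta: float,
                       rng: random.Random) -> list[int]:
    """One Swendsen-Wang update of an L x L x L periodic Ising lattice (J = 1), in place."""
    n = L ** 3
    p_bond = 1.0 - math.exp(-2.0 * beta)
    parent = list(range(n))

    def index(x: int, y: int, z: int) -> int:
        return (x * L + y) * L + z

    # 1. Bond placement: aligned neighbours are bonded with probability p_bond.
    for x in range(L):
        for y in range(L):
            for z in range(L):
                i = index(x, y, z)
                for j in (index((x + 1) % L, y, z), index(x, (y + 1) % L, z),
                          index(x, y, (z + 1) % L)):
                    if spins[i] == spins[j] and rng.random() < p_bond:
                        ri, rj = _find(parent, i), _find(parent, j)
                        if ri != rj:
                            parent[ri] = rj
    # 2. Flip every cluster independently with probability 1/2.
    flip: dict[int, bool] = {}
    for i in range(n):
        root = _find(parent, i)
        if root not in flip:
            flip[root] = rng.random() < 0.5
        if flip[root]:
            spins[i] = -spins[i]
    return spins


rng = random.Random(0)
lattice = [rng.choice((-1, 1)) for _ in range(4 ** 3)]
for _ in range(10):
    swendsen_wang_step(lattice, 4, 0.2216544, rng)  # near the 3D critical coupling
magnetization = sum(lattice) / len(lattice)
```

To estimate autocorrelation times as in the abstract, one would record observables such as the energy or $M^2$ after every step and analyse the resulting time series.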


Consensus clustering and functional interpretation of gene-expression data


GENOME BIOLOGY, 2004, Vol. 5, No. 11, pp. 1-16

Authors: Swift, S; Tucker, A; Vinciotti, V; Martin, N; Orengo, C; Liu, XH; Kellam, P. Affiliations: UCL, Windeyer Inst, Dept Infect, Virus Genom & Bioinformat Grp, London W1T 4JF, England; Brunel Univ, Dept Informat Syst & Comp, Uxbridge UB8 3PH, Middx, England; Univ London Birkbeck Coll, Sch Comp Sci & Informat Syst, London WC1E 7HX, England; UCL, Dept Biochem & Mol Biol, London WC1E 6BT, England

Microarray analysis using clustering algorithms can suffer from a lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NF-κB and the unfolded protein response in certain B-cell lymphomas.

Keywords: Cluster Algorithm; Simulated Annealing; Synthetic Dataset; Consensus Cluster; Robust Cluster
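A simple way to combine several clusterings, shown here as a generic stand-in rather than the authors' method, is a co-association (agreement) matrix: for every pair of genes, compute the fraction of clusterings that place them together, then keep only pairs whose agreement reaches a threshold. The Python sketch below does this and returns the connected groups; the function names and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def co_association(labelings: list[list[int]]) -> list[list[float]]:
    """Fraction of clusterings in which each pair of genes shares a cluster."""
    n, m = len(labelings[0]), len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return [[c / m for c in row] for row in counts]


def consensus_groups(labelings: list[list[int]], threshold: float) -> list[list[int]]:
    """Connected components of the graph linking pairs with agreement >= threshold."""
    agree = co_association(labelings)
    n = len(agree)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if agree[i][j] >= threshold:
            parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


runs = [[0, 0, 1, 1], [5, 5, 7, 7], [0, 0, 0, 1]]  # three clusterings of four genes
print(consensus_groups(runs, 1.0))  # [[0, 1], [2], [3]]
```

Because groups are taken as connected components, a chain of strongly agreeing pairs can merge genes that do not directly agree; a stricter variant would require the threshold to hold for every pair within a group.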
