Search Results - Inner Mongolia University Library

arXiv, 2020

Authors: Hunt, Emily L.; Reffert, Sabine (Landessternwarte, Zentrum für Astronomie der Universität Heidelberg, Königstuhl 12, 69117 Heidelberg, Germany)

Context. The census of open clusters in the Milky Way is in a never-before-seen state of flux. Recent works have reported hundreds of new open clusters thanks to the astrometric quality of the Gaia satellite, but other works have also reported that many open clusters discovered in the pre-Gaia era may be associations. Aims. We aim to conduct a comparison of clustering algorithms used to detect open clusters, attempting to statistically quantify their strengths and weaknesses by deriving the sensitivity, specificity, and precision of each as well as their true positive rate against a larger sample. Methods. We selected DBSCAN, HDBSCAN, and Gaussian mixture models for further study, owing to their speed and appropriateness for use with Gaia data. We developed a preprocessing pipeline for Gaia data and developed the algorithms further for the specific application to open clusters. We derived detection rates for all 1385 open clusters in the fields in our study as well as more detailed performance statistics for 100 of these open clusters. Results. DBSCAN was sensitive to 50% to 62% of the true positive open clusters in our sample, with generally very good specificity and precision. HDBSCAN traded precision for a higher sensitivity of up to 82%, especially across different distances and scales of open clusters. Gaussian mixture models were slow and only sensitive to 33% of open clusters in our sample, which tended to be larger objects. Additionally, we report on 41 new open cluster candidates detected by HDBSCAN, three of which are closer than 500 pc. Conclusions. When used with additional post-processing to mitigate its false positives, we have found that HDBSCAN is the most sensitive and effective algorithm for recovering open clusters in Gaia data. Our results suggest that many more new and already reported open clusters have yet to be detected in Gaia data. © 2020, CC BY.

Keywords: clustering algorithms
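
The sensitivity, specificity, and precision named in the abstract follow from standard confusion-matrix counts. The following is a minimal sketch of those definitions only; the function name `detection_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, precision) from confusion counts.

    tp: real clusters that were detected
    fp: reported detections that are not real clusters
    tn: non-clusters that were correctly not reported
    fn: real clusters that were missed
    """
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else nan  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else nan    # reliability of detections
    return sensitivity, specificity, precision


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, prec = detection_metrics(tp=8, fp=2, tn=85, fn=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} precision={prec:.3f}")
```

An algorithm that reports more candidates can raise sensitivity while lowering precision, which is the kind of trade-off the abstract describes for HDBSCAN.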

A Rapid Review of clustering algorithms

arXiv, 2024

Authors: Yin, Hui; Aryani, Amir; Petrie, Stephen; Nambissan, Aishwarya; Astudillo, Aland; Cao, Shengyuan (Swinburne University of Technology, Victoria, Australia; Australian National University, Canberra, Australia)

Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in everyday applications such as marketing and e-commerce, healthcare, data organization and analysis, and social media. Numerous clustering algorithms exist, with ongoing developments introducing new ones. Each algorithm possesses its own set of strengths and weaknesses, and as of now, there is no universally applicable algorithm for all tasks. In this work, we analyzed existing clustering algorithms and classified mainstream algorithms across five different dimensions: underlying principles and characteristics, data point assignment to clusters, dataset capacity, predefined cluster numbers, and application area. This classification helps researchers understand clustering algorithms from various perspectives and identify algorithms suitable for solving specific tasks. Finally, we discussed the current trends and potential future directions in clustering algorithms. We also identified and discussed open challenges and unresolved issues in the *** Codes 68-02 © 2024, CC BY.

Keywords: clustering algorithms

Packet-pair behavior in wired and 802.11-type wireless connection and the use of data clustering algorithms for dispersion-mode tracking

Proceedings of the International Convention MIPRO

Authors: M. Hosseinpour; M. J. Tunnicliffe (Faculty of Computing, Information Systems and Mathematics, Kingston University, Surrey, UK)

Packet-pair bandwidth probing in wired-cum-wireless network paths was tested and analyzed in a C++ simulation environment using link models verified alongside Opnet results. Some major differences were noted between these results and those of pure wired scenarios investigated in earlier work. Attempts were made to use a dynamic Gaussian-mix algorithm to identify data clusters within the bandwidth distribution.

Keywords: clustering algorithms; Bandwidth; Delay; Ethernet networks; Probes; Testing; Throughput; Dispersion; Information systems; Mathematics

Data Analysis on Student's Performance based on Health status using Genetic Algorithm and clustering algorithms

International Conference on Computing Methodologies and Communication (ICCMC)

Authors: V. Preetha (SRI SRNM COLLEGE, Tamilnadu)

Data analysis is an emerging research field that relies on methods and techniques for drawing insights from data sets. The main objective of this research is to analyze students' academic performance in relation to their health status, covering nutritious food intake, hygienic lifestyle, and frequency of health issues. The data sets were obtained by questionnaire, and the analysis was carried out initially with clustering algorithms, namely the K-means algorithm, hierarchical clustering, and the EM method. In the second phase, a genetic search was performed and the outputs were generated. The statistical output for the important attributes is presented using the Orange software. The algorithmic experiments were also carried out with the Weka data mining tool on a student data set that has 113 instances and 93 attributes. The research found that the K-means algorithm outperformed the EM method and hierarchical clustering. The genetic search method predicted correlated attributes for the selected class attribute. The statistical data analysis shows that the nutrition and health issues of female students have an impact on the academic performance of students.

Keywords: Data analysis; Search methods; Software algorithms; clustering algorithms; Tools; Prediction algorithms; Genetics

Comparing clustering algorithms on Wisconsin data set

IEEE Signal Processing and Communications Applications (SIU)

Authors: Mücahit Erken (Havacılık ve Uzay Teknolojileri Enstitüsü, Hava Harp Okulu, İstanbul, Türkiye)

The amount and diversity of data produced and processed have increased dramatically in parallel with improvements in technology. Unfortunately, the data produced usually lack labels, which would make classification and information building easier. This has raised the importance of data clustering for building information. In this work, the K-Means, spectral clustering, and Girvan-Newman algorithms are studied and compared on the Breast Cancer Wisconsin Data Set (BCWDS).

Keywords: clustering algorithms; Algorithm design and analysis; Art; Reactive power; Conferences; Machine learning algorithms; Cancer
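
For readers unfamiliar with K-Means, one of the algorithms named in the abstract, the following is a minimal sketch of the standard Lloyd iteration on unlabeled points. It is not the paper's implementation: the function name `kmeans`, the toy points, and the initial centroids are all hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update.

    points and centroids are lists of equal-length tuples of floats.
    Returns the final centroids and the clusters from the last assignment.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged, nothing moved
            break
        centroids = updated
    return centroids, clusters


# Hypothetical toy data: two well-separated groups, no labels.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
final_centroids, final_clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids)
```

The result depends on the initial centroids, which is one reason comparisons across algorithms usually fix or repeat the initialization.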

Performance Analysis and Benchmarking of clustering algorithms with gene datasets

International Conference on Advances in Science, Engineering and Robotics Technology

Authors: Meskat Jahan; Mahmudul Hasan (Computer Science and Engineering, Rangamati Science and Technology University, Rangamati, Bangladesh; Computer Science and Engineering, Comilla University, Cumilla, Bangladesh)

Clustering is the identification of similar data within rough, scaled, or transformed data and the grouping of those data into clusters. Clusters show the symmetry and asymmetry of data and its relations. In this paper, comparisons of three fuzzy clustering algorithms and two conventional clustering algorithms are presented. The analysis is conducted on four datasets, which include three gene expression datasets. Clustering performance is evaluated using both internal and external validation measurements, and an attempt is made to find the optimum number of clusters. This analysis provides an effective way of selecting a suitable algorithm for a particular dataset among different hardcore and soft-core clustering approaches.

Keywords: clustering algorithms; Partitioning algorithms; Iris; Breast cancer; Classification algorithms; Indexing

Evaluating the Performance of Hierarchical clustering algorithms to Detect Spatio-Temporal Crime Hot-Spots

International Conference on Computing, Mathematics and Engineering Technologies (iCoMET)

Authors: Anees Baqir; Sami ul Rehman; Sayyam Malik; Faizan ul Mustafa; Usman Ahmad (Faculty of Computing and IT, University of Sialkot, Sialkot, Pakistan)

ISBN (digital): 9781728149707

ISBN (print): 9781728149714

The constant growth in urbanization is a cause of significant social and economic transformations in urban areas. Areas where crime rates are above the normal level are known as crime hot-spots. The increase in urban population poses challenges related to management, services, and safety from criminal activities. It is important to monitor criminal activities, and for law enforcement agencies, providing much-needed public safety is an increasingly complex task. This task can be handled by new technologies that help these agencies effectively analyze and understand crime trends and patterns with respect to their geographic locations. This paper uses hierarchical density-based spatial clustering of applications with noise (HDBSCAN) to find spatio-temporal crime hot-spots by clustering, and the results show that this technique outperforms others.

Keywords: Urban areas; clustering algorithms; Law enforcement; Random access memory; Safety; Market research; Kernel

Semantic explorative evaluation of document clustering algorithms

Federated Conference on Computer Science and Information Systems (FedCSIS)

Authors: Hung Son Nguyen; Sinh Hoa Nguyen; Wojciech Świeboda (Institute of Mathematics, The University of Warsaw, Warsaw, Poland)

In this paper, we investigate the problem of quality analysis of clustering results using semantic annotations given by experts. We propose a novel approach to constructing an evaluation measure, which is based on the Minimum Description Length (MDL) principle. The proposed measure, called SEE (Semantic Evaluation by Exploration), is an improvement on existing evaluation methods such as the Rand Index or Normalized Mutual Information, and it fixes some of the weaknesses of the original methods. We illustrate the proposed evaluation method on the freely accessible biomedical research articles from PubMed Central (PMC). Many articles from PubMed Central are annotated by experts using the Medical Subject Headings (MeSH) thesaurus. This paper is part of the research on designing and developing a dialog-based semantic search engine for the SONCA system, which is part of the SYNAT project. We compare different semantic techniques for search result clustering using the proposed measure.

Keywords: clustering algorithms; Extraterrestrial measurements; Semantics; Moon; Decision trees; Biomedical measurement; Indexes
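
The abstract names the Rand Index as one of the existing measures that SEE aims to improve on. As background, the following is a minimal sketch of the plain (unadjusted) Rand Index, which scores the fraction of item pairs on which two labelings agree. It does not implement SEE; the function name `rand_index` and the toy labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree.

    A pair agrees when both labelings put the two items in the same
    group, or both put them in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical example: expert annotations versus a clustering result.
expert = ["x", "x", "y", "y"]
predicted = [0, 0, 1, 2]
print(rand_index(expert, predicted))
```

Only group membership matters here, so the label names used by the two labelings do not need to match.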

K-means and fuzzy relational eigenvector centrality-based clustering algorithms for defensive islanding

IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT Europe)

Authors: Mohammed Mahdi; V. M. Istemihan Genc (Department of Electrical Engineering, Istanbul Technical University, Istanbul, Turkey)

ISBN (print): 9781509033591

Among power system corrective controls, defensive islanding is considered the last resort to secure the system from severe cascading contingencies. The primary motive of defensive islanding is to limit the affected areas, maintain the stability of the resulting subsystems, and reduce the total loss of load in the system. Slow-coherency-based islanding can be applied successfully to defensive islanding. In this paper, two partitioning methods are proposed: a K-means clustering algorithm and a fuzzy relational eigenvector centrality-based clustering algorithm. The proposed methods use data measured by phasor measurement units to determine the islands to be used in defensive islanding. The methods are demonstrated on the 16-generator 68-bus power system, and their performance is discussed by comparing their results.

Keywords: clustering algorithms; Power system stability; Generators; Islanding; Partitioning algorithms; Algorithm design and analysis; Phasor measurement units

A method for evaluating the quality of string dissimilarity measures and clustering algorithms for EST clustering

IEEE Symposium on Bioinformatics and Bioengineering (BIBE)

Authors: J. Zimmermann; Z. Liptak; S. Hazelhurst (Research Group Algorithms, Data Structures and Applications, Institute of Theoretical Computer Science, ETH Zurich, Zurich, Switzerland; Technische Fakultät, AG Genominformatik, Universität Bielefeld, Bielefeld, Germany; School of Computer Science, University of the Witwatersrand, Johannesburg, South Africa)

We present a method for evaluating the suitability of different string dissimilarity measures and clustering algorithms for EST clustering, one of the main techniques used in transcriptome projects. The method comprises generating simulated ESTs with user-specified parameters, and then evaluating the quality of clusterings produced when different dissimilarity measures and different clustering algorithms are used. We implemented two tools to do this: ESTSim (EST simulator), which generates simulated EST sequences from mRNAs/cDNAs using user-specified parameters, and ECLEST (evaluator for clusterings of ESTs), which computes and evaluates a clustering of a set of input ESTs, where the dissimilarity measure, the clustering algorithm, and the clustering validity index can be specified independently. We demonstrate the method on a sample of 699 cDNAs, generating approximately 16,000 simulated ESTs. We conducted two experiments and derived statistically significant results from this study comparing subword-based dissimilarity measures to alignment-based ones.

Keywords: clustering algorithms; Computational modeling; Pollution measurement; Sequences; Computer science; Bioinformatics; Data structures; Genomics; Africa; DNA
