Search results - Inner Mongolia University Library

arXiv, 2022

Author: Gagolewski, Marek. Affiliations: Warsaw University of Technology, Faculty of Mathematics and Information Science, ul. Koszykowa 75, Warsaw 00-662, Poland; Deakin University, Data to Intelligence Research Centre, School of IT, Geelong VIC 3220, Australia

The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Also, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. In order to overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark dataset collections referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive datasets explorer, the documentation of the Python API, a description of the ways to interact with the framework from other programming languages such as R or MATLAB, and other details are all provided at https://***. © 2022, CC BY.

Keywords: clustering algorithms
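The abstract argues that a clustering should be compared against expert reference groupings, and that several groupings of the same data may be equally valid. A minimal sketch of one way to honour both points, assuming the adjusted Rand index (ARI) as the agreement measure and keeping the best score over all available reference partitions. The framework's actual scoring and Python API are not described in the abstract, and the function names here are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat partitions of the same points.
    Assumption: ARI is used as the agreement measure; the framework may use another."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same points")
    total = comb(len(labels_a), 2)
    if total == 0:
        return 1.0  # fewer than two points: nothing to disagree about
    # Pairs of points placed together in both partitions (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # both partitions trivial (one cluster each, or all singletons)
    return (together_both - expected) / (maximum - expected)


def best_reference_score(predicted, references):
    """Score a clustering against several equally valid reference partitions,
    keeping the best agreement instead of penalising a valid alternative."""
    return max(adjusted_rand_index(predicted, ref) for ref in references)


predicted = [0, 0, 1, 1, 2, 2]
references = [
    [0, 0, 0, 0, 1, 1],  # a coarser, two-cluster reading of the same points
    [5, 5, 7, 7, 9, 9],  # a three-cluster reading; equals `predicted` up to relabelling
]
print(best_reference_score(predicted, references))  # 1.0
```

Taking the maximum means a result that matches any one valid reference is not marked down for disagreeing with another; averaging over references would instead reward compromise solutions.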

Comparative Study of Clustering Algorithms in Parallel and Serial Environments

International Conference on Futuristic Technologies (INCOFT)

Authors: B Hemachandran Chirla Pavan Rakesh Reddy R Kannadasan D Harsha Vardhan Reddy. Affiliations: 4th Year B.Tech (CSE), Vellore Institute of Technology, Vellore, Tamil Nadu, India; Department of Software Systems, Vellore Institute of Technology, Vellore, Tamil Nadu, India

Machine learning has become a core part of computing and has affected countless sectors with better implementations of existing systems. Machine learning algorithms use various methods to organize and learn from data, and clustering is one such method. Clustering, as the name suggests, forms different clusters of data from a dataset based on their characteristics. However, clustering can be computationally onerous, and the cost grows as the number of clusters or the number of data points increases. Parallelizing the algorithms is one way to reduce the time taken: clustering algorithms can be parallelized by sharing the workload across multiple CPUs or across multiple cores of a single CPU. This paper focuses on the performance analysis of parallelized clustering algorithms and other mainstream clustering algorithms. DBSCAN (Density-Based Spatial Clustering of Applications with Noise), K-Means, Mini-Batch K-Means, and Mean Shift were chosen from different types of clustering to diversify the comparison. The paper provides a comparative analysis of the performance of these clustering algorithms, controlling the environment to be either single- or multi-threaded.

Keywords: Machine learning algorithms; clustering algorithms; Machine learning; Performance analysis
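The abstract describes parallelizing clustering by sharing the workload across cores. A minimal sketch of that idea for K-means (Lloyd's algorithm), assuming the assignment step is the part split across workers. It uses Python threads only to show the chunking; because of CPython's global interpreter lock it will not speed up this pure-Python arithmetic, for which a process pool or native code would be needed. The function names are hypothetical and this is not the paper's code.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def assign_serial(points, centroids):
    return [nearest_centroid(p, centroids) for p in points]


def assign_parallel(points, centroids, workers=4):
    """Split the points into contiguous chunks and label each chunk in a worker thread."""
    size = max(1, -(-len(points) // workers))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so concatenation keeps the point order.
        parts = list(pool.map(lambda chunk: assign_serial(chunk, centroids), chunks))
    return [label for part in parts for label in part]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of its points; keep it if its cluster is empty."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point, label in zip(points, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += point[d]
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10, parallel=False):
    """Lloyd's K-means; only the assignment step differs between the two modes."""
    assign = assign_parallel if parallel else assign_serial
    labels = []
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
start = [(0.0, 0.0), (10.0, 10.0)]
print(kmeans(points, start) == kmeans(points, start, parallel=True))  # True
```

The serial and multi-threaded runs produce identical labels and centroids, so any comparison of the two environments measures only time, not result quality.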

Systematic review of unsupervised genomic clustering algorithms techniques for high dimensional datasets

Technology Reports of Kansai University, 2020, Vol. 62, No. 3, pp. 355-374

Authors: Najim Adeen, Idrees Mohammed Abdulazeez, Adnan Mohsin Zeebaree, Diyar Qader. Affiliations: Akre Technical College, Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq; Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq; Research Center of Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq

High-dimensional data is described by a considerable number of features, which poses new problems for clustering. The term "high dimension" was originally introduced to explain the increase in time complexity common to many computational problems, and general clustering algorithms consequently perform poorly on such data. Accordingly, many works have focused on introducing new techniques and clustering algorithms to process higher-dimensional data. A trait shared by all clustering algorithms is that they require an evaluation of the similarity between data objects. However, current clustering algorithms still leave some research problems open. This review summarizes the characteristics of high-dimensional data spaces and their effects on different clustering algorithms. It also provides a detailed overview of several types of clustering algorithms, including subspace methods, model-based clustering, density-based clustering, and partition-based clustering, with a more detailed description of the advantages and disadvantages of recent work in solving the problem of higher-dimensional data. Finally, the scope of future work for extending existing clustering methods and algorithms is discussed. © 2020 Hamdard Foundation Pakistan. All rights reserved.

Keywords: clustering algorithms

Reliability evaluation of smart grid using various classic and metaheuristic clustering algorithms considering system uncertainties

International Transactions on Electrical Energy Systems, 2021, Vol. 31, No. 6

Authors: Memari, Mehran; Karimi, Ali; Hashemi-Dezaki, Hamed. Affiliations: Univ Kashan, Fac Elect & Comp Engn, 6 Km Ghotbravandi Blvd, Kashan ***, Iran; Univ West Bohemia (UWB), Fac Elect Engn, Reg Innovat Ctr Elect Engn (RICE), Plzen, Czech Republic

The reliability of the smart grid is adversely affected by system uncertainties, and the steadily growing deployment of renewable distributed generation (DG) units increases these uncertainties further. Hence, it is essential to consider uncertainties when evaluating the reliability of smart grids. Although Monte Carlo simulation (MCS) has received a great deal of attention in the literature, there is a research gap in using clustering algorithms to assess smart grids' reliability. This article aims to fill that gap by proposing a new reliability assessment method that uses various clustering algorithms. The method's accuracy and fast computation are highlighted as benefits, particularly where optimal operation, optimal short-term planning, and other repetitive problems must be studied. In this paper, the performance and accuracy of various classic (k-means, fuzzy c-means, and k-medoids) and metaheuristic (genetic algorithm, particle swarm optimization, differential evolution, harmony search, and artificial bee colony) clustering algorithms are studied. Comparing different scenario reduction algorithms within the proposed reliability evaluation method is one of the main contributions. The proposed method is applied to two realistic test systems. Test results indicate that the proposed method is adequately precise while requiring less computation time than MCS-based approaches. For both test systems, the results imply that the proposed method can estimate the expected energy not supplied (EENS) with an error of less than 2.1%. The fuzzy c-means clustering algorithm yields the best accuracy among the studied classic and nonclassic (metaheuristic) algorithms.

Keywords: clustering algorithms; Monte Carlo simulation; reliability evaluation; renewable distributed generations; scenario reduction; smart grids
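The abstract uses clustering for scenario reduction, so that only a few representative system states are evaluated instead of every Monte Carlo sample. A minimal sketch of that general idea, assuming plain K-means (only one of the several classic and metaheuristic algorithms the paper compares), synthetic two-dimensional (demand, capacity) states, and a hypothetical `load_not_supplied` function standing in for the real per-state reliability evaluation. It is not the authors' method and produces no EENS figures.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def reduce_scenarios(samples, k, iterations=20, seed=0):
    """Cluster sampled system states with K-means and return a list of
    (representative state, probability) pairs."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(samples, k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda j: squared_distance(s, centroids[j]))
            groups[nearest].append(s)
        # Move each centroid to the mean of its group; an empty group keeps its old centroid.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    n = len(samples)
    # A representative's probability is the share of samples in its cluster.
    return [(centroids[j], len(g) / n) for j, g in enumerate(groups) if g]


def load_not_supplied(state):
    """Hypothetical stand-in for an expensive per-state evaluation:
    demand (state[0]) minus available capacity (state[1]), floored at zero."""
    return max(0.0, state[0] - state[1])


# Synthetic (demand, capacity) states in place of real Monte Carlo samples.
rng = random.Random(1)
samples = [(rng.uniform(50, 100), rng.uniform(60, 90)) for _ in range(500)]

reps = reduce_scenarios(samples, k=10)
reduced = sum(p * load_not_supplied(state) for state, p in reps)
full = sum(load_not_supplied(s) for s in samples) / len(samples)
print(f"{len(reps)} representatives: {reduced:.2f} vs full-sample {full:.2f}")
```

The reduced estimate generally differs from the full-sample one because the per-state function is nonlinear; raising `k` trades computation time for accuracy, which is the kind of trade-off the abstract weighs against MCS.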

Fundamental clustering algorithms suite

SoftwareX, 2021, Vol. 13

Authors: Thrun, Michael C.; Stier, Quirin. Affiliations: Philipps Univ Marburg, Databion Res Grp, Hans Meerwein Str 6, D-35043 Marburg, Germany; Philipps Univ Marburg, Dept Hematol Oncol & Immunol, Hans Meerwein Str 6, D-35043 Marburg, Germany

The article presents immediate access to over fifty fundamental clustering algorithms. Additionally, it provides access to the clustering benchmark datasets previously published as the "Fundamental Clustering Problems Suite" (FCPS). The software library is named "FCPS"; it is available in R on CRAN and accessible from Python. The inputs and outputs of the clustering algorithms are standardized so that users can run cluster analyses swiftly. By combining mirrored-density plots (MD plots) with statistical testing, FCPS provides a tool for quickly investigating cluster tendency before the cluster analysis itself. Common clustering challenges can be generated with an arbitrary sample size. Additionally, FCPS collects 26 indicators for estimating the number of clusters and provides an appropriate implementation of clustering accuracy for more than two clusters. © 2020 The Author(s). Published by Elsevier B.V.

Keywords: Cluster analysis; clustering algorithms; Clusterability; Cluster-tendency; Number of clusters
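The abstract mentions a clustering accuracy that works for more than two clusters. The usual idea is to find the one-to-one matching between cluster ids and class ids that maximizes the number of correctly placed points. A minimal sketch of that idea, assuming a brute-force search over matchings, which is fine for a handful of clusters (the Hungarian algorithm is the standard choice at scale). It is not FCPS's implementation, and the function name is hypothetical.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_labels):
    """Share of points whose cluster maps to their class under the best
    one-to-one matching of cluster ids to class ids. Brute force over all
    matchings (assumption: only a few clusters), so cost grows factorially."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("both labelings must cover the same points")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    # Pad with None so that surplus clusters can be left unmatched.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    counts = Counter(zip(cluster_labels, true_labels))
    best = 0
    for mapping in permutations(padded, len(clusters)):
        hits = sum(counts[(c, t)] for c, t in zip(clusters, mapping))
        best = max(best, hits)
    return best / len(true_labels)


truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
found = [2, 2, 2, 0, 0, 1, 1, 1, 1]
print(clustering_accuracy(truth, found))  # 8 of 9 points matched, about 0.889
```

Because the matching is searched over, renaming cluster ids never changes the score; only points that land in the "wrong" cluster for their class lower it.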

An Approach to Study the Poverty Reduction Effect of Digital Inclusive Finance from a Multidimensional Perspective Based on Clustering Algorithms

Scientific Programming, 2021, Vol. 2021, No. 1

Authors: Zhou, Lu; Wang, Huiling. Affiliations: Tianjin Univ Finance & Econ, Coll Finance, Tianjin 300222, Peoples R China; Tianjin Renai Coll, Tianjin 301636, Peoples R China; Chongqing City Management Coll, Chongqing 401331, Peoples R China

The evaluation of clustering algorithms is intrinsically difficult because of the lack of objective measures. On the basis of the DIFI and China's provincial panel data, this study aims to test the poverty reduction effect of digital inclusive finance in three dimensions (income, education, and healthcare) and to examine further the transmission mechanism of digital inclusive finance in poverty alleviation. The results indicate that digital inclusive finance exerts a poverty reduction effect in three dimensions: medical poverty, income poverty, and education poverty. Of these, coverage breadth significantly affects the alleviation of medical poverty, usage depth significantly affects the alleviation of income poverty and education poverty, and the digitization level affects the alleviation of poverty in all three dimensions. The level of regional economic development plays an intermediary role in the poverty alleviation effect of digital inclusive finance. Compared with the less developed western region, the poverty reduction effect of digital inclusive finance in the eastern region is more significant.

Keywords: clustering algorithms

Improving the Clustering Algorithms Automatic Generation Process with Cluster Quality Indexes

20th International Conference on Computational Science and Its Applications (ICCSA)

Authors: Montenegro, Michel; Meiguins, Aruanda; Meiguins, Bianchi; Morais, Jefferson. Affiliation: Fed Univ Para, Comp Sci Postgrad Program, BR-66075110 Belem, Para, Brazil

ISBN (print): 9783030587994; 9783030587987

Autoclustering is a computational tool for the automatic generation of clustering algorithms: it combines and evaluates the main parts of density-based algorithms to generate more appropriate clustering solutions for a given dataset. Autoclustering uses the Estimation of Distribution Algorithms (EDA) evolutionary technique to create the algorithms (individuals), and an adapted CLEST method (originally designed to determine the best number of groups in a dataset) with a decision-tree classifier to compute individual fitness. To improve the quality of the results generated by Autoclustering, and to avoid possible bias from adopting a classifier, this work proposes to increase the efficiency of the evaluation process by adding a quality metric based on a fusion of three cluster quality indexes. The three indexes, Silhouette, Dunn, and Davies-Bouldin, assess intra- and inter-cluster structure; they are distance-based and independent of how the groups were generated. The final score of a specific solution (algorithm + parameters) is the average of the normalized quality metric and the normalized fitness. Compared with traditional Autoclustering, the proposal produced solutions with higher cluster quality metrics, higher average fitness, and greater diversity of generated individuals (clustering algorithms).

Keywords: Autoclustering; Cluster quality index; clustering algorithms
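The abstract fuses Silhouette, Dunn, and Davies-Bouldin into one quality metric and then averages the normalized quality metric with the normalized fitness. The exact fusion and normalization are not given, so this sketch assumes min-max normalization across candidate solutions, a plain mean of the three normalized indexes with Davies-Bouldin inverted (lower is better for it), and illustrative, made-up index values. The function names are hypothetical.

```python
def min_max(values):
    """Rescale to [0, 1] across candidates; if all values tie, they all map to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def final_scores(silhouette, dunn, davies_bouldin, fitness):
    """One score per candidate solution (algorithm + parameters).
    Assumed fusion rule: mean of the min-max-normalised indexes."""
    sil = min_max(silhouette)  # higher is better
    dun = min_max(dunn)  # higher is better
    dbi = [1.0 - v for v in min_max(davies_bouldin)]  # lower is better, so flip
    quality = [(a + b + c) / 3 for a, b, c in zip(sil, dun, dbi)]
    # Average of the normalised fused quality metric and the normalised fitness.
    return [(q + f) / 2 for q, f in zip(min_max(quality), min_max(fitness))]


# Illustrative values for three candidate solutions (not from the paper).
scores = final_scores(
    silhouette=[0.2, 0.5, 0.8],
    dunn=[0.1, 0.3, 0.2],
    davies_bouldin=[1.5, 0.9, 0.6],
    fitness=[0.7, 0.9, 0.8],
)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 1
```

Here candidate 2 has the best fused quality but candidate 1 wins on the combined score because of its higher fitness; weighting the two terms equally is what makes that trade-off possible.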

Holistic Assessment of Structure Discovery Capabilities of Clustering Algorithms

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)

Authors: Hoeppner, Frank; Jahnke, Maximilian. Affiliation: Ostfalia Univ Appl Sci, Dept Comp Sci, D-38302 Wolfenbuttel, Germany

ISBN (print): 9783030461508; 9783030461492

Existing cluster validity indices often possess a similar bias as the clustering algorithm they were introduced for, e.g. to determine the optimal number of clusters. We suggest an efficient and holistic assessment of the structure discovery capabilities of clustering algorithms based on three criteria. We determine the robustness or stability of cluster assignments and interpret it as the confidence of the clustering algorithm in its result. This information is then used to label the data and evaluate the consistency of the stability-assessment with the notion of a cluster as an area of dense and separated data. The resulting criteria of stability, structure and consistency provide interpretable means to judge the capabilities of clustering algorithms without the typical biases of prominent indices, including the judgment of a clustering tendency.

Keywords: clustering algorithms
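The abstract treats the stability of cluster assignments as the clustering algorithm's confidence in its result. A minimal sketch of one common way to measure that, assuming repeated random subsampling and pairwise together/apart agreement with the full-data clustering. This is a generic stability check, not the authors' exact procedure, and `gap_clusters` is a hypothetical toy clusterer used only to make the example runnable.

```python
import random


def gap_clusters(values, threshold=2.0):
    """Hypothetical toy 1-D clusterer: start a new cluster wherever the gap
    between consecutive sorted values exceeds `threshold`."""
    order = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    current = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > threshold:
            current += 1
        labels[cur] = current
    return labels


def pointwise_stability(values, cluster_fn, runs=50, fraction=0.8, seed=0):
    """Per-point confidence: how often, across random subsamples, each point's
    together/apart relation with co-sampled points matches the full-data result.
    Requires at least two points; never-sampled points get NaN."""
    rng = random.Random(seed)
    reference = cluster_fn(values)
    n = len(values)
    agree, seen = [0] * n, [0] * n
    size = max(2, int(fraction * n))
    for _ in range(runs):
        idx = rng.sample(range(n), size)
        labels = cluster_fn([values[i] for i in idx])
        for a in range(size):
            for b in range(a + 1, size):
                i, j = idx[a], idx[b]
                match = (labels[a] == labels[b]) == (reference[i] == reference[j])
                for p in (i, j):
                    agree[p] += int(match)
                    seen[p] += 1
    return [agree[p] / seen[p] if seen[p] else float("nan") for p in range(n)]


separated = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0]  # two clear groups
chain = [0.0, 1.5, 3.0, 4.5, 6.0]  # one cluster only because of bridging points
print(pointwise_stability(separated, gap_clusters))  # expect all 1.0
print(pointwise_stability(chain, gap_clusters))
```

In the chain, dropping any interior point opens a gap larger than the threshold, so the full-data "one cluster" answer is not reproduced on those subsamples and the points' scores fall below 1.0, flagging a result the algorithm is not confident about.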

Performance evaluation of clustering algorithms for varying cardinality and dimensionality of data sets

1st International Conference on Recent Advances in Materials and Manufacturing (ICRAMM)

Authors: Renjith, Shini; Sreekumar, A.; Jathavedan, M. Affiliations: Cochin Univ Sci & Technol, Dept Comp Applicat, Kochi 682022, Kerala, India; Mar Baselios Coll Engn & Technol, Dept Comp Sci & Engn, Thiruvananthapuram 695015, Kerala, India

Clustering is the most widely used unsupervised machine learning technique and has extensive applications in statistical analysis. Many clustering algorithms exist in theory, and many more implementations are available in practice. A substantial body of literature focuses on the quality of clustering algorithms, using various internal and external evaluation techniques. This work is motivated by the scarcity of literature on the performance of clustering algorithms in terms of turnaround time. The paper summarizes an experimental analysis of the performance of multiple clustering algorithms with respect to cardinality and dimensionality. The analysis is performed in R, a free and open-source programming language used mainly for statistical computing. The work evaluates nine key algorithms from the partitioning, hierarchical, density-based, and model-based clustering approaches, using different social media data sets. We captured performance trends of these algorithms in terms of turnaround time by varying the cardinality and dimensionality of the data sets. In our experiments, the CLARA, CLARANS, and k-means algorithms performed best with varying cardinality. We also observed that changes in dimensionality do not affect hierarchical clustering approaches, whereas they have a positive influence on the execution time of partitioning, density-based, and model-based clustering approaches. © 2019 Elsevier Ltd. All rights reserved.

Keywords: clustering algorithms; clustering performance; clustering quality; Social media; Turnaround time

Towards the Use of Clustering Algorithms in Recommender Systems

26th Conference of the Association-for-Information-Systems (AMCIS)

Authors: Miranda, Leandro; Viterbo, Jose; Bernardini, Flavia. Affiliation: Univ Fed Fluminense, Niteroi, RJ, Brazil

ISBN (print): 9781733632546

Recommender Systems have been used intensively in Information Systems over the last decades, facilitating the choice of items for each user individually based on their history. Clustering techniques have frequently been used in commercial and scientific domains for data mining tasks and visualization tools. However, the literature lacks secondary studies that analyze the use of clustering algorithms in Recommender Systems and their behavior in different aspects. In this work, we present a Systematic Literature Review (SLR) that discusses the different types of information systems using clustering algorithms in Recommender Systems, typically involving the three main recommendation approaches found in the literature: collaborative filtering, content-based filtering, and hybrid recommendation. Finally, we performed a quantitative analysis using K-means clustering to find patterns among clustering algorithms, recommendation approaches, and the datasets used in the reviewed publications.

Keywords: Machine learning; clustering algorithms; recommender systems
