Search Results: Inner Mongolia University Library

Partition-Based Clustering Algorithms Applied to Mixed Data for Educational Data Mining: A Survey From 1971 to 2024

IEEE Access, 2024, Vol. 12, pp. 172923-172942

Authors: Dutt, Ashish; Ismail, Maizatul Akmar; Herawan, Tutut; Hashem, Ibrahim Abaker. Affiliations: Monash Univ Malaysia, Sch Sci, Jalan Lagoon Selatan, Bandar 47500, Selangor, Malaysia; Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 57600, Selangor, Malaysia; Univ Sharjah, Coll Comp & Informat, Dept Comp Sci, Sharjah, U Arab Emirates

Educational Data Mining (EDM) is the application of data mining methods in the educational domain. EDM data are often mixed, i.e., they contain both text and numeric data types. Grouping or clustering such data is challenging because the similarity between mixed-type records is poorly defined. Existing partition clustering algorithms for such data follow two approaches: data type conversion, where all variables are converted to a single data type, and a mixed approach, where the similarity measures of the different data types are merged into a single similarity measure, either through a weighted sum as in Gower's distance or through a mixed dissimilarity function as used in the k-Medoids algorithm. Data type conversion causes information loss, an aspect not discussed in the existing research literature. This study systematically reviews fifty-three years of research, from 1971 to 2024, on partition clustering algorithms applied to mixed data in EDM. A review of 104 research articles found that most partitional clustering algorithms handle either continuous or categorical variables, but not mixed variables. Researchers and practitioners often cite the lack of methods for analyzing continuous and categorical variables together. Therefore, developing machine learning algorithms that can handle the mixed data inherent in the educational domain is becoming increasingly important. In addition to a comparative analysis and an analysis based on several factors, this article identifies research gaps and outlines future directions.

Keywords: clustering algorithms; unsupervised learning; data mining
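
The abstract contrasts type conversion with merging per-type similarities, as in Gower's distance. The following is a minimal Python sketch of an unweighted Gower dissimilarity, assuming each record is a tuple and the caller supplies a flag per attribute marking it as categorical or numeric: numeric attributes contribute their range-normalized absolute difference, categorical attributes contribute a 0/1 mismatch, and the result is the mean over attributes. The function names and the student records are hypothetical illustrations, not taken from the survey.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def attribute_ranges(rows: Sequence[Sequence[Value]], is_categorical: Sequence[bool]) -> List[float]:
    """Range (max - min) of each numeric column; 0.0 placeholder for categorical columns."""
    ranges = []
    for k, cat in enumerate(is_categorical):
        if cat:
            ranges.append(0.0)
        else:
            column = [float(row[k]) for row in rows]
            ranges.append(max(column) - min(column))
    return ranges


def gower_distance(
    a: Sequence[Value], b: Sequence[Value], is_categorical: Sequence[bool], ranges: Sequence[float]
) -> float:
    """Unweighted Gower dissimilarity in [0, 1] between two mixed-type records.

    Numeric attribute:     |a_k - b_k| / range_k  (contributes 0 if the column is constant)
    Categorical attribute: 0 if the values match, 1 otherwise
    """
    total = 0.0
    for x, y, cat, r in zip(a, b, is_categorical, ranges):
        if cat:
            total += 0.0 if x == y else 1.0
        elif r > 0:
            total += abs(float(x) - float(y)) / r
    return total / len(is_categorical)


# Hypothetical student records: (exam score, attendance %, programme, gender)
students = [
    (72.0, 90.0, "CS", "F"),
    (55.0, 60.0, "Math", "M"),
    (88.0, 95.0, "CS", "M"),
]
kinds = [False, False, True, True]
ranges = attribute_ranges(students, kinds)
print(round(gower_distance(students[0], students[2], kinds, ranges), 4))  # 0.4069
```

No data type is converted here: each attribute keeps its own comparison rule, and only the per-attribute scores are averaged. Per-attribute weights are a common extension and are omitted from this sketch.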

A Survey of Parallel Clustering Algorithms Based on Spark

Scientific Programming, 2020, Vol. 2020, No. 1

Authors: Xiao, Wen; Hu, Juan. Affiliations: Wanjiang Univ Technol, Key Lab Unmanned Aerial Vehicle Dev & Data Applic, Maanshan 243000, Peoples R China; Wanjiang Univ Technol, Maanshan Engn Technol Res Ctr Wireless Sensor Net, Maanshan 243000, Peoples R China

Clustering is one of the most important unsupervised machine learning tasks and is widely used in information retrieval, social network analysis, image processing, and other fields. With the explosive growth of data, classical clustering algorithms can no longer meet the requirements of big data clustering. Spark is one of the most popular parallel processing platforms for big data, and researchers have proposed many parallel clustering algorithms based on it. In this paper, the existing Spark-based parallel clustering algorithms are classified and summarized, the parallel design framework of each kind of algorithm is discussed, and, after a comparison of the different kinds of algorithms, directions for future research are discussed.

Keywords: clustering algorithms

Optimized routers positions for large-scale RF mesh networks based on clustering algorithms

Ad Hoc Networks, 2019, Vol. 93, 101901

Authors: Mezher, Ahmad Mohamad; Cardenas-Barrera, Julian; Rajendran, Nisha; Meng, Julian; Guerra, Eduardo Castillo. Affiliation: UNB, Dept Elect & Comp Engn, Fredericton, NB, Canada

Both the research and the industrial communities have shown great interest in upgrading the existing power grid to a smart grid (SG). More specifically, smart metering and communication methods for the SG have recently been studied extensively. However, the design and development of an efficient routing protocol for a Radio Frequency (RF) mesh network that connects the advanced metering infrastructure (AMI) to collectors, and vice versa, depends heavily on the positions of the routers. Accordingly, this paper focuses on optimizing the positions of the available routers to achieve the highest possible connectivity between smart meters and collectors. To do so, we used two well-known clustering algorithms, the maximum distance to average vector (MDAV) and the Lloyd algorithm, to place routers at optimized positions in a smart grid scenario. Extensive simulations were carried out with the proposed algorithms, showing significant improvement with respect to the initial distribution of routers. (C) 2019 Elsevier B.V. All rights reserved.

Keywords: Advanced metering infrastructure (AMI); Smart grid communications; clustering algorithms; Wireless mesh networks
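
The Lloyd algorithm named in this abstract is the classic k-means iteration: assign every point to its nearest center, then move each center to the mean of its assigned points, and repeat until nothing changes. A minimal sketch follows, assuming smart meters are 2D coordinates and each router is treated as one cluster center. This only minimizes meter-to-nearest-router squared distance; the paper's connectivity objective and its MDAV step are not modelled, and the meter coordinates and function name are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lloyd(points: Sequence[Point], centers: Sequence[Point], max_iter: int = 100) -> List[Point]:
    """Lloyd (k-means) iterations: assign to nearest center, then move centers to centroids."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest center.
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the centroid of its group;
        # a center that attracted no points stays where it is.
        new_centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return centers


# Hypothetical meter coordinates in two neighbourhoods; routers start at one meter in each.
random.seed(0)
meters = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)] + [
    (random.gauss(20, 1), random.gauss(20, 1)) for _ in range(50)
]
routers = lloyd(meters, [meters[0], meters[-1]])
print(sorted(routers))
```

Starting the routers at meters from different neighbourhoods keeps the example deterministic; with random starts, Lloyd iterations can settle in a poorer local optimum, which is one reason a deterministic seeding step such as MDAV can be paired with it.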

Gaussian mixture model clustering algorithms for the analysis of high-precision mass measurements

Nuclear Instruments & Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 2022, Vol. 1027, 166299

Authors: Weber, C. M.; Ray, D.; Valverde, A. A.; Clark, J. A.; Sharma, K. S. Affiliations: Argonne Natl Lab, Phys Div, Lemont, IL 60439, USA; Univ Manitoba, Dept Phys & Astron, Winnipeg, MB R3T 2N2, Canada

The development of the phase-imaging ion-cyclotron resonance (PI-ICR) technique for Penning trap mass spectrometry (PTMS) increased the speed and precision with which PTMS experiments can be carried out. In PI-ICR, data sets of the locations of individual ion hits on a detector show how ions cluster into spots according to their cyclotron frequency. An ideal data set would consist of a single, spherically symmetric 2D spot with no noise, but in practice data sets typically contain multiple spots, non-spherical spots, or significant noise, all of which make determining the locations of spot centers non-trivial. A method for assigning ions to their respective spots and determining the spot centers is therefore essential for further improving precision and confidence in PI-ICR experiments. We present the class of Gaussian mixture model (GMM) clustering algorithms as an optimal solution. We show that on simulated PI-ICR data, several types of GMM clustering algorithms perform better than other clustering algorithms over a variety of typical scenarios encountered in PI-ICR. The mass spectra of 163Gd, 163mGd, 162Tb, and 162mTb measured using PI-ICR at the Canadian Penning trap mass spectrometer were checked using GMMs, producing results in close agreement with previously published values.

Keywords: PI-ICR; Penning trap; Mass spectrometry; clustering algorithms; Gaussian mixture models; Machine learning
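
To illustrate how a GMM assigns detector hits to spots and estimates spot centers, here is a rough stdlib-only sketch of expectation-maximization (EM) for a 2D mixture with one isotropic (circular) variance per component. The spherical-covariance choice, the fixed iteration count, the initial means, and the simulated hits are all assumptions; the paper compares several GMM variants that this sketch does not reproduce.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def gmm_spherical_2d(
    points: Sequence[Point], init_means: Sequence[Point], n_iter: int = 50
) -> Tuple[List[float], List[Point], List[float], List[List[float]]]:
    """EM for a 2D Gaussian mixture with one isotropic variance per component.

    Returns (weights, means, variances, responsibilities from the last E-step).
    """
    k, n = len(init_means), len(points)
    means = [tuple(m) for m in init_means]
    weights = [1.0 / k] * k
    variances = [1.0] * k
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior probability that each component generated each hit.
        resp = []
        for x, y in points:
            dens = [
                w / (2 * math.pi * v) * math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * v))
                for w, (mx, my), v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow far from all spots
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, spot centers, and spot variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) or 1e-300
            mx = sum(r[c] * x for r, (x, _) in zip(resp, points)) / nc
            my = sum(r[c] * y for r, (_, y) in zip(resp, points)) / nc
            sq = sum(r[c] * ((x - mx) ** 2 + (y - my) ** 2) for r, (x, y) in zip(resp, points))
            weights[c] = nc / n
            means[c] = (mx, my)
            variances[c] = max(sq / (2 * nc), 1e-6)  # 2 dimensions; floor avoids collapse
    return weights, means, variances, resp


# Hypothetical simulated hits: two spots of 100 ions each.
random.seed(1)
hits = [(random.gauss(0.0, 0.3), random.gauss(0.0, 0.3)) for _ in range(100)] + [
    (random.gauss(3.0, 0.3), random.gauss(3.0, 0.3)) for _ in range(100)
]
weights, means, variances, resp = gmm_spherical_2d(hits, [(0.5, 0.5), (2.5, 2.5)])
spot_of_hit = [max(range(len(r)), key=r.__getitem__) for r in resp]
print([tuple(round(v, 2) for v in m) for m in means])
```

Each hit is assigned to the component with the largest responsibility, and the fitted means serve as the spot centers; unlike k-means, the soft responsibilities and per-spot variances let a GMM express overlap and spot width.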

Early outlier detection in three-phase induction heating systems using clustering algorithms

Ain Shams Engineering Journal, 2024, Vol. 15, No. 3

Authors: Qais, Mohammed H.; Kewat, Seema; Loo, K. H.; Lai, Cheung-Ming. Affiliations: Ctr Adv Reliabil & Safety, Hong Kong, Peoples R China; Karlstad Univ, Dept Engn & Phys, Karlstad, Sweden; Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China

Induction heating (IH) devices transfer electric power to cookware without contact, via an electromagnetic field, so the cookware temperature must be measured remotely. Early detection of cookware overheating ensures the user's safety and extends the remaining useful life of the electronic components. This work therefore presents a clustering-based outlier detection model for IH systems that uses data measured by two thermal sensors. First, a healthy dataset of inverter and cookware temperatures is collected for cookware items of different sizes and materials, different amounts of water in the cookware, and different electrical power levels. K-means and fuzzy c-means are then used to cluster this normal dataset, and the maximum distance between the cluster centers and the data points is selected as the threshold. Finally, the clustering model is evaluated on a test dataset that includes outliers. According to the results, the K-means algorithm detected around 96% of the generated outliers, whereas the fuzzy c-means algorithm detected around 68%. In conclusion, the clustering model is simple to deploy for outlier detection because it requires only the threshold and the cluster centers.

Keywords: Outlier detection; clustering algorithms; Induction Heating; K-means; Unsupervised machine learning
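
The detection rule in this abstract can be sketched in a few lines: take the cluster centers fitted on healthy data, set the threshold to the largest distance from any healthy sample to its nearest center, and flag a new sample as an outlier when its nearest center is farther away than that. This minimal Python version assumes the centers are already available (for example, from a prior k-means fit) and uses a single global threshold; whether the paper uses one threshold overall or one per cluster is an assumption here, and the temperature readings are invented for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def nearest_center_distance(p: Vector, centers: Sequence[Vector]) -> float:
    """Euclidean distance from p to the closest cluster center."""
    return min(math.dist(p, c) for c in centers)


def fit_threshold(healthy: Sequence[Vector], centers: Sequence[Vector]) -> float:
    """Largest distance from any healthy sample to its nearest center."""
    return max(nearest_center_distance(p, centers) for p in healthy)


def is_outlier(p: Vector, centers: Sequence[Vector], threshold: float) -> bool:
    """Flag p when it lies farther from every center than any healthy sample did."""
    return nearest_center_distance(p, centers) > threshold


# Hypothetical healthy readings (inverter temp, cookware temp) in °C, forming two groups;
# the centers are the group means, standing in for a prior k-means fit.
healthy = [(40.0, 80.0), (42.0, 84.0), (41.0, 82.0), (55.0, 150.0), (57.0, 155.0)]
centers = [(41.0, 82.0), (56.0, 152.5)]
threshold = fit_threshold(healthy, centers)
print(is_outlier((42.0, 83.0), centers, threshold))   # False
print(is_outlier((60.0, 240.0), centers, threshold))  # True
```

As the abstract notes, deployment needs only the stored centers and one number; in practice the two sensor channels would likely be scaled first so that neither temperature dominates the distance.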

Decision-Making Support for the Evaluation of Clustering Algorithms Based on MCDM

Complexity, 2020, Vol. 2020, No. 1

Authors: Wu, Wenshuai; Xu, Zeshui; Kou, Gang; Shi, Yong. Affiliations: Sichuan Univ, Business Sch, Chengdu 610065, Sichuan, Peoples R China; SouthWestern Univ Finance & Econ, Sch Business Adm, Chengdu 611130, Sichuan, Peoples R China; Chinese Acad Sci, CAS Res Ctr Fictitious Econ & Data Sci, Beijing 100190, Peoples R China; Chinese Acad Sci, Key Lab Big Data Min & Knowledge Management, Beijing 100190, Peoples R China

In many disciplines, evaluating algorithms that process massive data is a challenging research issue. Different algorithms can produce different or even conflicting evaluation results, and this phenomenon has not been fully investigated. This paper aims to propose a scheme for evaluating clustering algorithms that reconciles such different or even conflicting evaluation results. To this end, it proposes and develops a model called decision-making support for evaluation of clustering algorithms (DMSECA), which evaluates clustering algorithms by merging expert wisdom in order to reconcile differences in their evaluation performance for information fusion during a complex decision-making process. The proposed model is tested and verified in an experimental study using six clustering algorithms, nine external measures, and four multiple-criteria decision-making (MCDM) methods on 20 UCI data sets comprising a total of 18,310 instances and 313 attributes. The model generates a list of algorithm priorities that forms an optimal ranking scheme satisfying the decision preferences of all participants. The results indicate that the developed model is an effective tool for selecting the most appropriate clustering algorithms for given data sets. Furthermore, the model can reconcile different or even conflicting evaluation performance to reach a group agreement in a complex decision-making environment.

Keywords: clustering algorithms

Density peak clustering algorithms: A review on the decade 2014-2023

Expert Systems with Applications, 2024, Vol. 238, Part A

Authors: Wang, Yizhang; Qian, Jiaxin; Hassan, Muhammad; Zhang, Xinyu; Zhang, Tao; Yang, Chao; Zhou, Xingxing; Jia, Fengjin. Affiliations: Yangzhou Univ, Coll Informat & Engn, Yangzhou, Peoples R China; Inst Sci & Tech Informat China, Beijing, Peoples R China; Tsinghua Univ, Inst Biopharmaceut & Hlth Engn, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China; Shenzhen Children Hosp, Dept Ophthalmol, Shenzhen 518026, Guangdong, Peoples R China

The density peak clustering (DPC) algorithm has become a well-known clustering method over the last decade. Because the research community regards DPC as a powerful tool in various fields, each with distinct contemporary issues and future prospects, it is time to summarize the research progress of DPC and help researchers quickly learn which issues have been resolved, which remain open, and what to do in the future. In this survey, we first describe several frequently used synthetic, UCI, and image datasets, and then review all DPC-related works, categorized into finding clusters with different densities, optimizing parameter values, preventing domino effects, clustering large datasets, implementing parameter-less DPC, clustering mixed data, and clustering imbalanced data. Next, we compare recent and widely used extensions of DPC on 26 synthetic and UCI datasets. Finally, based on this analysis, the survey concludes by summarizing the improvements of DPC on synthetic and UCI datasets; revisiting challenges such as large-scale data clustering, parameter-less clustering, and privacy-preserving clustering; proposing solutions for deploying DPC on Spark; introducing deep clustering to DPC; and, finally, federated DPC clustering. To the best of our knowledge, this is the first review that summarizes the progress of DPC over the last decade.

Keywords: Density peak clustering; clustering algorithms; Challenges
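
The DPC variants surveyed here build on two per-point quantities from the baseline density peak method: a local density rho_i (in this sketch, the cut-off kernel, i.e., the number of other points closer than d_c) and delta_i, the distance to the nearest point of higher density. Points with both large rho and large delta are chosen as centers, and every other point inherits the label of its nearest higher-density neighbour. The following is a minimal O(n^2) Python rendering of that baseline, assuming the cut-off `dc` and the number of centers are supplied by the caller; none of the survey's extensions are implemented, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def density_peaks(points: Sequence[Sequence[float]], dc: float, n_centers: int) -> List[int]:
    """Baseline density peak clustering; returns one cluster label per point."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    # rho[i]: number of other points closer than the cut-off distance dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]

    # Visit points in decreasing density; ties broken by index for determinism.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n
    parent = [-1] * n  # nearest point that precedes i in `order` (higher density)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point gets delta = its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: largest gamma = rho * delta. The stable sort over `order` keeps the
    # densest point first (its gamma is maximal), so it is always a center.
    ranked = sorted(order, key=lambda i: rho[i] * delta[i], reverse=True)
    labels = [-1] * n
    for cluster_id, c in enumerate(ranked[:n_centers]):
        labels[c] = cluster_id
    # Remaining points, in density order, copy the label of their parent.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
print(density_peaks(pts, dc=0.5, n_centers=2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The single-pass label propagation along `parent` links is what makes one wrong early assignment spread to many points, the "domino effect" that several of the surveyed variants try to prevent.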

Expediency Analysis of Clustering Algorithms for Electric Two-Wheeler Driving Cycle Development Under Indian Smart City Driving Conditions

IEEE Access, 2024, Vol. 12, pp. 180279-180300

Authors: Gurusamy, Azhaganathan; Bokdia, Akshat; Kumar, Harsh; Radhika, A.; Ashok, Bragadeshwaran; Gunavathi, Chellamuthu. Affiliations: Vellore Inst Technol, Sch Mech Engn, Vellore 632014, Tamil Nadu, India; Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India; Velammal Coll Engn & Technol, Dept Elect & Elect Engn, Madurai 625009, Tamil Nadu, India

The standard driving cycles (DCs) used to evaluate spark-ignition engine-based two-wheelers are inadequate for electric two-wheelers (E2Ws), and they fail to represent actual driving conditions in specific areas accurately, which leads to inaccurate performance evaluations. This research focuses on constructing an electric two-wheeler urban driving cycle (E2WUDC) that reflects the driving conditions of a smart city in India. The denoised speed data are used to extract micro-trips and compute their driving parameters, and the dimensionality of the data is then reduced using principal component analysis (PCA). Subsequently, the data are clustered using several methods, including k-means, X-means, hierarchical clustering, and density-based spatial clustering of applications with noise (DBSCAN). The Calinski-Harabasz index (CHI), Davies-Bouldin index (DBI), and silhouette score are then used to assess the homogeneity and completeness of the clusters produced by the selected algorithms. Based on these performance indices, X-means is selected as a suitable clustering algorithm and is used to develop the E2WUDC. The key driving features of the E2WUDC include a total duration of 1914 seconds and a total distance of 14.49 km, with average and maximum driving speeds of 8 and 13.88 m/s, respectively. Ultimately, this work establishes a foundation for assessing the energy economy, driving range, and energy demand for the widespread deployment of electric two-wheelers in urban commuting.

Keywords: Electric vehicle; electric two-wheeler; smart city; urban driving cycle; clustering algorithms; energy consumption
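
Of the three validity indices named in this abstract, the silhouette score is the simplest to state: for point i, a(i) is its mean distance to the other members of its own cluster, b(i) is the smallest mean distance to the members of any other cluster, s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the score is the mean of s(i). The following is a minimal stdlib sketch, assuming Euclidean distance, at least two clusters, and the usual convention that singleton clusters contribute s(i) = 0. The toy points stand in for hypothetical micro-trip features (for example, two PCA scores) and are not data from the study.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b), Euclidean distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label.
        by_cluster: Dict[int, List[float]] = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                by_cluster[labels[j]].append(math.dist(p, q))
        own = by_cluster[labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(round(silhouette_score(features, [0, 0, 1, 1]), 4))  # 0.9002
print(silhouette_score(features, [0, 1, 0, 1]) < 0)        # True
```

Scores near 1 indicate compact, well-separated clusters, while negative scores indicate that points sit closer to another cluster than to their own, as in the deliberately bad labelling above.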

An Investigation of Clustering Algorithms in the Identification of Similar Web Pages

Journal of Web Engineering, 2009, Vol. 8, No. 4, pp. 346-370

Authors: De Lucia, Andrea; Risi, Michele; Scanniello, Giuseppe; Tortora, Genoveffa. Affiliations: Univ Salerno, Dipartimento Matemat & Informat, Salerno, Italy; Univ Basilicata, Dipartimento Matemat & Informat, Potenza, Italy

In this paper we investigate the effect of using clustering algorithms in reverse engineering to identify web pages that are similar either at the structural level or at the content level. To this end, we used two instances of a general process that differ only in the measure used to compare web pages: pages are compared at the structural level using the Levenshtein edit distance and at the content level using Latent Semantic Indexing. The static pages of two web applications and one static web site were used to compare the results achieved by the considered clustering algorithms at both the structural and the content level. On these applications we generally achieved comparable results. However, the investigation also suggested some heuristics for quickly identifying the best partition of web pages into clusters among the possible partitions, at both the structural and the content level.

Keywords: clone analysis; clustering algorithms; latent semantic indexing; Levenshtein string edit distances; program comprehension; reverse engineering
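
The structural comparison in this study relies on the Levenshtein edit distance: the minimum number of single-symbol insertions, deletions, and substitutions that turn one sequence into another. A minimal dynamic-programming sketch follows. Applying it to sequences of HTML tag names rather than raw characters is an assumption made for illustration, since the abstract does not say how pages are encoded; the page sequences are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences, keeping one rolling DP row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a to b[:j]
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical tag sequences of two pages sharing only their outer skeleton.
page_a = ["html", "body", "table", "tr", "td", "/td", "/tr", "/table", "/body", "/html"]
page_b = ["html", "body", "div", "p", "/p", "/div", "/body", "/html"]
print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein(page_a, page_b))       # 6
```

A clustering step would typically turn these distances into a normalized dissimilarity, for example dividing by the longer sequence length, so that large and small pages are comparable; that normalization is also an assumption, not a detail from the paper.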

Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions

PeerJ Computer Science, 2024, Vol. 10, e2286

Authors: Wani, Aasim Ayaz. Affiliation: Cornell Univ, Sch Engn, Ithaca, NY 14850, USA

This survey rigorously explores contemporary clustering algorithms within the machine learning paradigm, focusing on five primary methodologies: centroid-based, hierarchical, density-based, distribution-based, and graph-based clustering. Through the lens of recent innovations such as deep embedded clustering and spectral clustering, we analyze their strengths, limitations, and breadth of application domains, ranging from bioinformatics to social network analysis. Notably, the survey introduces novel contributions by integrating clustering techniques with dimensionality reduction and proposing advanced ensemble methods to enhance stability and accuracy across varied data structures. This work uniquely synthesizes the latest advancements and offers new perspectives on overcoming traditional challenges such as scalability and noise sensitivity, thus providing a comprehensive roadmap for future research and practical applications in data-intensive environments.

Keywords: clustering algorithms; Unsupervised learning; Scalability and efficiency; Centroid-based clustering; Hierarchical clustering; Density-based clustering; Distribution-based clustering; clustering challenges and solutions