Search Results - Inner Mongolia University Library

Partition-Based Clustering Algorithms Applied to Mixed Data for Educational Data Mining: A Survey From 1971 to 2024

IEEE ACCESS, 2024, vol. 12, pp. 172923-172942

Authors: Dutt, Ashish; Ismail, Maizatul Akmar; Herawan, Tutut; Hashem, Ibrahim Abaker. Affiliations: Monash Univ Malaysia Sch Sci Jalan Lagoon Selatan Bandar 47500 Selangor Malaysia; Univ Malaya Fac Comp Sci & Informat Technol Dept Informat Syst Kuala Lumpur 57600 Selangor Malaysia; Univ Sharjah Coll Comp & Informat Dept Comp Sci Sharjah U Arab Emirates

Educational Data Mining (EDM) is the application of data mining methods in the educational domain. In the EDM field, we see mixed data (i.e., text and number data types). Grouping or clustering such data is challenging because the similarity between mixed data is poorly defined. Existing partition clustering algorithms for handling such data are based on two approaches: conversion of data types, where all data variables are converted to a single data type, and a mixed approach, where the similarity measures of different data types are merged, either by a weighted sum as in Gower's distance or by a mixed dissimilarity function as used in the k-Medoids algorithm, to define a single similarity measure for mixed data. Datatype conversion causes information loss, and this aspect is not discussed in the existing research literature. This study systematically reviews the past fifty-three years (1971 to 2024) of research on partition clustering algorithms applied to mixed data in EDM. A review of 104 research articles noted that most partitional clustering algorithms handle continuous or categorical variables but not mixed variables. Researchers and practitioners often cite the lack of analysis methods for continuous and categorical variables. Therefore, developing machine learning algorithms that can handle the mixed data inherently present in the educational domain is becoming increasingly important. In addition to comparative analysis and analysis based on several factors, research gaps are identified in this article, and future insights are outlined.

Keywords: clustering algorithms; unsupervised learning; data mining
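The weighted-sum idea behind Gower's distance, mentioned in the abstract above, can be illustrated with a minimal sketch. It is not taken from the survey: the function name `gower_distance`, the equal feature weights, and the student records are assumptions made for illustration only. Numeric features contribute a range-normalized absolute difference, categorical features contribute 0 on a match and 1 otherwise, and the contributions are averaged.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    numeric_ranges: Sequence[Optional[float]],
) -> float:
    """Equal-weight Gower distance between two mixed-type records.

    numeric_ranges[i] is the observed range (max - min) of feature i when it
    is numeric, or None when feature i is categorical.
    """
    total = 0.0
    for x, y, value_range in zip(a, b, numeric_ranges):
        if value_range is None:
            # Categorical feature: simple mismatch indicator.
            total += 0.0 if x == y else 1.0
        elif value_range > 0:
            # Numeric feature: absolute difference scaled into [0, 1].
            total += abs(x - y) / value_range
        # A numeric feature with zero range contributes nothing.
    return total / len(a)


# Hypothetical student records: (exam score, hours studied, major).
student_1 = (80.0, 10.0, "math")
student_2 = (60.0, 10.0, "physics")
ranges = (100.0, 20.0, None)  # assumed ranges; None marks the categorical field

print(gower_distance(student_1, student_2, ranges))
```

Because every per-feature term lies in [0, 1], no feature is converted to another data type, which is the property the abstract contrasts with conversion-based approaches.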

An FPGA Based Accelerator for Clustering Algorithms With Custom Instructions

IEEE TRANSACTIONS ON COMPUTERS, 2021, vol. 70, no. 5, pp. 725-732

Authors: Wang, Chao; Gong, Lei; Jia, Fahui; Zhou, Xuehai. Affiliations: Univ Sci & Technol China Hefei 230027 Anhui Peoples R China; Univ Sci & Technol China Suzhou Inst Suzhou 215123 Peoples R China

Clustering algorithms are becoming popular and widely applied in many academic fields, such as machine learning, pattern recognition, and artificial intelligence. Accelerating these algorithms poses significant challenges due to the explosive data scale and wide variety of applications. However, previous studies mainly focus on raw speedup, with insufficient attention to the flexibility of the accelerator to support various applications. In order to accelerate different clustering algorithms in one accelerator, in this article we design an FPGA-based accelerating framework for four state-of-the-art clustering methods: the K-means, PAM, SLINK, and DBSCAN algorithms. Moreover, we provide both Euclidean and Manhattan distances as similarity metrics in the accelerator design paradigm, along with a custom instruction set to operate the accelerators within each application. In order to evaluate the performance and hardware cost of the accelerator, we constructed a hardware prototype on the state-of-the-art Xilinx FPGA platform. Experimental results demonstrate that the accelerator framework is able to achieve up to a 23x speedup over an Intel Xeon processor, and is 9.46x more energy efficient than NVIDIA GTX 750 GPU accelerators.

Keywords: clustering algorithms; Hardware; Field programmable gate arrays; Machine learning algorithms; Arrays; Logic arrays; Acceleration; Accelerators; clustering; custom instructions; machine learning; FPGA
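The two similarity metrics named in the abstract above can be sketched in software as follows. This is an illustrative sketch only and says nothing about the FPGA design or its custom instructions; the helper `nearest_center` and the sample coordinates are assumptions. It shows that the choice of metric can change which cluster center a point is assigned to.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    """City-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_center(
    point: Point,
    centers: Sequence[Point],
    metric: Callable[[Point, Point], float],
) -> int:
    """Index of the center closest to `point` under the given metric."""
    return min(range(len(centers)), key=lambda i: metric(point, centers[i]))


# Hypothetical data: one point and two candidate cluster centers.
point = (0.0, 0.0)
centers = [(3.0, 3.0), (0.0, 5.0)]

print(nearest_center(point, centers, euclidean))  # center 0 is closer in L2
print(nearest_center(point, centers, manhattan))  # center 1 is closer in L1
```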

End-point detection of the aerobic phase in a biological reactor using SOM and clustering algorithms

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2006, vol. 19, no. 1, pp. 19-28

Authors: González, IM; García, HL. Affiliation: Univ Oviedo Dept Ingn Elect Elect & Computadores & Sistemas Escuela Politecn Super Ingn Gijon 33204 Spain

The estimation of the aerobic phase end-point is usually used to improve the operating capacity in a sequencing batch reactor. In this paper, a software tool and a configuration of the dissolved oxygen control closed loop are proposed to achieve the aerobic end-point detection of a sequencing batch reactor in a coke wastewater treatment plant. The proposed software tool consists of a self-organizing map (SOM) and clustering algorithms. Moreover, a validation method for SOM training is outlined and a predefined criterion to determine the SOM size is tested. (c) 2005 Elsevier Ltd. All rights reserved.

Keywords: waste treatment; biological treatment; sequencing batch reactor; self-organizing mapping; clustering algorithms

Comparison of Machine Learning Classification and Clustering Algorithms for TV Commercials Detection

IEEE ACCESS, 2023, vol. 11, pp. 116741-116751

Authors: Abdelfattah, Eman; Joshi, Shreehar. Affiliations: Sacred Heart Univ Sch Comp Sci & Engn Fairfield CT 06825 USA; Ramapo Coll Sch Theoret & Appl Sci Mahwah NJ 07430 USA; Boring Co Las Vegas NV 89169 USA

One of the essential aspects of broadcast monitoring is to detect and consequently extract commercial blocks in telecast news videos. Research carried out until now has been based almost entirely on preconceived characteristics that are associated with a channel. With advertisers constantly looking to work around the existing policies, reliance on the nature of channels during an advertisement does not suffice. The other approach to identifying a commercial is a frequentist approach. However, it is often the case that sponsored programs and other programs share similar time in any specified hour, rendering the frequentist approach almost useless in the process. As such, this paper uses a machine-learning-based approach, which is more generic and can exploit inherent differences that commercials have over their non-commercial counterparts, for classifying and clustering commercials in news videos. The datasets, which contain 90 hours of recordings from five different news channels from the US, England, and India, have been used to train and test nine different classifiers (K Neighbors, Support Vector Machine, Decision Tree, Random Forests, Ada Boost, Gradient Boost, Gaussian NB, Linear Discriminant Analysis, and Quadratic Discriminant Analysis) and five different clustering algorithms (K Means, Agglomerative, Birch, Mini-Batch K Means, and Gaussian Mixture). Our results show that Random Forests outperforms all the other classifiers used with respect to F1 score and median time to train and test on each of these datasets, each of which consists of features of shots extracted from 18 hours of video. Similarly, Mini-Batch K Means was found to perform the best for forming clusters of news and commercials.

Keywords: TV; Videos; Feature extraction; clustering algorithms; Classification algorithms; Machine learning algorithms; Machine learning; Advertising; TV commercial detection; machine learning; classification; clustering

A Novel Demand-Responsive Customized Bus Based on Improved Ant Colony Optimization and Clustering Algorithms

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, vol. 24, no. 8, pp. 8492-8506

Authors: Shu, Wanneng; Li, Yan. Affiliations: South Cent Univ Nationalities Coll Comp Sci Wuhan 430074 Peoples R China; Huazhong Agr Univ Coll Sci Wuhan 430070 Peoples R China

The customized bus operating mode based on passenger demand is an effective way to solve the problem of bus services in low travel density areas such as urban fringe areas, ensure the profitability of bus enterprises, and promote the development of customized buses and other emerging bus services. First, this study introduces the concept and operating principle of the customized bus, determines its advantages and disadvantages, evaluates the relevant theories of customized bus line and station planning, and determines the principles of customized bus line and station planning. Second, according to the characteristics of the customized bus, this study proposes a novel customized bus line and station planning method based entirely on passenger travel demand, including travel demand data processing, traffic community division, joint station planning, the establishment of a customized bus line planning model, and the solution of the planning model. Finally, the proposed planning method and the improved ant colony optimization and clustering algorithms are verified by simulation experiments. The experimental results show that the station and line planning method proposed in this paper can better realize the line planning of demand-responsive customized buses and meet diverse passenger travel needs.

Keywords: Planning; Costs; Heuristic algorithms; Urban areas; clustering algorithms; Vehicle dynamics; Layout; Demand response; customized bus; route and station planning; K-means clustering; ant colony

Class of constrained clustering algorithms for object boundary extraction

IEEE TRANSACTIONS ON IMAGE PROCESSING, 1996, vol. 5, no. 11, pp. 1507-1521

Authors: Abrantes, AJ; Marques, JS. Affiliations: INST SUPER ENGN LISBOA LISBON PORTUGAL; Univ Tecn Lisboa DEPT ENGN ELECTROTECN & COMP INST SUPER TECN LISBON PORTUGAL

Boundary extraction is a key task in many image analysis operations. This paper describes a class of constrained clustering algorithms for object boundary extraction that includes several well-known algorithms proposed in different fields (deformable models, constrained clustering, data ordering, and traveling salesman problems). The algorithms belonging to this class are obtained by the minimization of a cost function with two terms: a quadratic regularization term and an image-dependent term defined by a set of weighting functions. The minimization of the cost function is achieved by lowpass filtering the previous model shape and by attracting the model units toward the centroids of their attraction regions. To define a new algorithm belonging to this class, the user has to specify a regularization matrix and a set of weighting functions that control the attraction of the model units toward the data. The usefulness of this approach is twofold: it provides a unified framework for many existing algorithms in pattern recognition and deformable models, and it allows the design of new recursive schemes.

Keywords: clustering algorithms; Data mining; Deformable models; Cost function; Image analysis; Traveling salesman problems; Minimization methods; Filtering; Shape; Weight control

COMPARATIVE-STUDY OF SIMILARITY COEFFICIENTS AND CLUSTERING ALGORITHMS IN CELLULAR MANUFACTURING

JOURNAL OF MANUFACTURING SYSTEMS, 1994, vol. 13, no. 2, pp. 119-127

Authors: SEIFODDINI, H; HSU, CP. Affiliation: University of Wisconsin-Milwaukee USA

Three components of a machine cell formation process (similarity coefficients, clustering algorithms, and performance measures) are studied. A new performance measure is introduced, and a comparative study of three different similarity coefficients (Jaccard's similarity coefficient, the weighted similarity coefficient, and the commonality score) is conducted.

Keywords: CELLULAR MANUFACTURING; MACHINE-COMPONENT GROUPING; SIMILARITY COEFFICIENT; clustering algorithms
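As a small illustration of the first coefficient named in the abstract above, Jaccard's similarity between two machines can be computed from the sets of parts each machine processes: the number of parts visiting both machines divided by the number visiting at least one of them. The sketch below is not drawn from the paper; the routing data and the name `jaccard_similarity` are hypothetical, and the weighted coefficient and commonality score are not reproduced.

```python
from typing import Dict, Set


def jaccard_similarity(parts_i: Set[str], parts_j: Set[str]) -> float:
    """Parts processed on both machines divided by parts processed on either."""
    union = parts_i | parts_j
    if not union:
        # Two machines that process no parts are treated as dissimilar here.
        return 0.0
    return len(parts_i & parts_j) / len(union)


# Hypothetical machine-part routing: machine name -> parts it processes.
routing: Dict[str, Set[str]] = {
    "M1": {"P1", "P2", "P3"},
    "M2": {"P2", "P3", "P4"},
    "M3": {"P5"},
}

print(jaccard_similarity(routing["M1"], routing["M2"]))  # shares P2 and P3
print(jaccard_similarity(routing["M1"], routing["M3"]))  # no shared parts
```

A clustering algorithm for cell formation would then group machines whose pairwise similarity is high, so that most parts are processed within a single cell.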

EFFICIENT IMPLEMENTATION OF THE FUZZY C-MEANS CLUSTERING ALGORITHMS

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1986, vol. 8, no. 2, pp. 248-255

Authors: CANNON, RL; DAVE, JV; BEZDEK, JC. Affiliation: IBM CORP CTR SCI PALO ALTO CA 94304

This paper reports the results of a numerical comparison of two versions of the fuzzy c-means (FCM) clustering algorithms. In particular, we propose and exemplify an approximate fuzzy c-means (AFCM) implementation based upon replacing the necessary "exact" variates in the FCM equation with integer-valued or real-valued estimates. This approximation enables AFCM to exploit a lookup table approach for computing Euclidean distances and for exponentiation. The net effect of the proposed implementation is that CPU time during each iteration is reduced to approximately one sixth of the time required for a literal implementation of the algorithm, while apparently preserving the overall quality of terminal clusters produced. The two implementations are tested numerically on a nine-band digital image, and a pseudocode subroutine is given for the convenience of applications-oriented readers. Our results suggest that AFCM may be used to accelerate FCM processing whenever the feature space is comprised of tuples having a finite number of integer-valued coordinates.

Keywords: clustering algorithms; Table lookup; Image analysis; Fuzzy sets; Acceleration; Equations; Testing; Digital images; Pattern analysis; Image processing
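For readers unfamiliar with FCM, the sketch below shows the standard ("literal") membership update that the abstract above refers to, in which the membership of a point in cluster i is 1 / sum_j (d_i / d_j)^(2/(m-1)) for fuzzifier m > 1. It is an assumption-laden illustration, not the paper's pseudocode: the function name `fcm_memberships` and the sample data are hypothetical, and the lookup-table approximation used by AFCM is not reproduced. The distance and exponentiation steps in this update are the operations the abstract says AFCM replaces with table lookups.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def fcm_memberships(point: Point, centers: Sequence[Point], m: float = 2.0) -> List[float]:
    """Standard fuzzy c-means membership of one point in each cluster.

    m is the fuzzifier (m > 1); distances are Euclidean.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs only to those centers.
    coincident = sum(1 for d in dists if d == 0.0)
    if coincident:
        return [1.0 / coincident if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
        for d_i in dists
    ]


# Hypothetical one-dimensional example embedded in 2-D coordinates.
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(memberships)       # the nearer center receives the larger membership
print(sum(memberships))  # memberships of a point sum to 1 (up to rounding)
```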

Evolutionary Multiobjective Clustering Algorithms With Ensemble for Patient Stratification

IEEE TRANSACTIONS ON CYBERNETICS, 2022, vol. 52, no. 10, pp. 11027-11040

Authors: Wang, Yunhe; Li, Xiangtao; Wong, Ka-Chun; Chang, Yi; Yang, Shengxiang. Affiliations: Jilin Univ Sch Artificial Intelligence Changchun 130012 Peoples R China; Northeast Normal Univ Sch Informat Sci & Technol Changchun 130117 Peoples R China; De Montfort Univ Sch Comp Sci & Informat Leicester LE1 9BH Leics England; City Univ Hong Kong Dept Comp Sci Hong Kong Peoples R China

Patient stratification has been studied widely to tackle subtype diagnosis problems for effective treatment. Due to the dimensionality curse and poor interpretability of data, there is always a long-lasting challenge in constructing a stratification model with high diagnostic ability and good generalization. To address these problems, this article proposes two novel evolutionary multiobjective clustering algorithms with ensemble (NSGA-II-ECFE and MOEA/D-ECFE) with four cluster validity indices used as the objective functions. First, an effective ensemble construction method is developed to enrich the ensemble diversity. After that, an ensemble clustering fitness evaluation (ECFE) method is proposed to evaluate the ensembles by measuring the consensus clustering under those four objective functions. To generate the consensus clustering, ECFE exploits the hybrid co-association matrix from the ensembles and then dynamically selects the suitable clustering algorithm on that matrix. Multiple experiments have been conducted to demonstrate the effectiveness of the proposed algorithm in comparison with seven clustering algorithms, twelve ensemble clustering approaches, and two multiobjective clustering algorithms on 55 synthetic datasets and 35 real patient stratification datasets. The experimental results demonstrate the competitive edges of the proposed algorithms over those compared methods. Furthermore, the proposed algorithm is applied to extend its advantages by identifying cancer subtypes from five cancer-related single-cell RNA-seq datasets.

Keywords: clustering algorithms; Linear programming; Optimization; clustering methods; Cancer; Urban areas; Heuristic algorithms; Ensemble clustering; multiobjective optimization (MOO); patient stratification

Fuzzy C-Means Clustering Algorithms with Weighted Membership and Distance

INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2022, vol. 30, no. 4, pp. 567-594

Authors: Pimentel, Bruno Almeida; Silva, Rafael de Amorim; Santos Costa, Jadson Crislan. Affiliation: Univ Fed Alagoas IC UFAL Inst Comp Maceio Alagoas Brazil

The Fuzzy C-means (FCM) clustering algorithm is an important and popular clustering algorithm which is utilized in various application domains such as pattern recognition, machine learning, and data mining. Although this algorithm has shown acceptable performance in diverse problems, the current literature does not have studies about how it can improve the clustering quality of partitions with overlapping classes. The better the clustering quality of a partition, the better the interpretation of the data, which is essential to understand real problems. This work proposes two robust FCM algorithms to prevent ambiguous membership in clusters. For this, we compute two types of weights: one weight to avoid the problem of overlapping clusters, and another weight to enable the algorithm to identify clusters of different shapes. We perform a study with synthetic datasets, where each one contains classes of different shapes and different degrees of overlapping. Moreover, the study considered real application datasets. Our results indicate that such weights are effective in reducing the ambiguity of membership assignments, thus generating a better data interpretation.

Keywords: clustering algorithms; ambiguous membership; fuzzy C-means; weighting
