ISBN (electronic): 9783030336073
ISBN (print): 9783030336073; 9783030336066
Mixed-data clustering algorithms have been emerging slowly since the end of the last century. One of the most recent algorithms proposed for this data type is kamila (KAy-means for MIxed LArge data). Although kamila has outperformed previous mixed-data algorithms, it has some gaps. One of them is the definition of the weights of the numerical and categorical variables, which are either user-defined parameters or, by default, equal to one for all features. Hence, we propose an optimization algorithm called Biased Random-Key Genetic Algorithm for Features Weighting (BRKGAFW) to weight the numerical and categorical variables in the kamila algorithm. The experiment used six real-world mixed data sets and two baseline algorithms for comparison: kamila with the default weights, and kamila with weights set by a traditional genetic algorithm. The results revealed that the proposed algorithm outperformed the baseline algorithms on all data sets.
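To illustrate the general idea of a biased random-key genetic algorithm searching for feature weights, here is a minimal Python sketch. It is not the authors' BRKGAFW: the decoder `decode`, the weight range [0, 2), the parameter values, and the fitness function are all hypothetical. In the paper, fitness would presumably come from running kamila with the candidate weights and scoring the resulting clustering; here a toy quadratic stands in for that step. The sketch follows the usual BRKGA structure: the elite is copied unchanged, a few random "mutant" vectors are injected, and the rest of the population is produced by crossover that favours the elite parent.

```python
import random
from typing import Callable, List, Sequence


def decode(keys: Sequence[float], scale: float = 2.0) -> List[float]:
    # Hypothetical decoder: map each random key in [0, 1) to a feature weight in [0, scale).
    return [scale * k for k in keys]


def brkga(
    fitness: Callable[[List[float]], float],
    n_genes: int,
    pop_size: int = 30,
    elite_frac: float = 0.2,
    mutant_frac: float = 0.1,
    rho_e: float = 0.7,
    generations: int = 50,
    seed: int = 0,
) -> List[float]:
    """Minimal BRKGA that maximizes fitness(decode(keys)); returns the best key vector."""
    rng = random.Random(seed)
    n_elite = max(1, int(elite_frac * pop_size))
    n_mutant = max(1, int(mutant_frac * pop_size))
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank individuals by the fitness of their decoded weights, best first.
        pop.sort(key=lambda keys: fitness(decode(keys)), reverse=True)
        elite, non_elite = pop[:n_elite], pop[n_elite:]
        # Mutants are fresh random-key vectors that preserve diversity.
        mutants = [[rng.random() for _ in range(n_genes)] for _ in range(n_mutant)]
        # Biased crossover: each gene is taken from the elite parent with probability rho_e.
        offspring = []
        for _ in range(pop_size - n_elite - n_mutant):
            e, o = rng.choice(elite), rng.choice(non_elite)
            offspring.append([eg if rng.random() < rho_e else og for eg, og in zip(e, o)])
        # The elite survives unchanged, so the best fitness never decreases.
        pop = elite + mutants + offspring
    return max(pop, key=lambda keys: fitness(decode(keys)))


# Toy stand-in for a clustering-quality score (hypothetical): closer to `target` is better.
target = [1.5, 0.5, 1.0]


def toy_fitness(weights: List[float]) -> float:
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


best = decode(brkga(toy_fitness, n_genes=len(target)))
print([round(w, 3) for w in best])
```

Working on random keys in [0, 1) keeps crossover and mutation problem-independent; only the decoder knows that the keys represent weights, so swapping in a different weight range or a real kamila-based fitness would not change the genetic operators.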
The control chart is a useful tool that has gained popularity in Quality Control: it monitors a process over time, identifies potential changes, helps to understand variation, and ultimately improves the quality and performance of the process. This article introduces a new class of multivariate semiparametric control charts for monitoring multivariate mixed-type data, which comprise both continuous and discrete random variables (rvs). Our methodology combines ideas from clustering and Statistical Process Control to develop control charts for MIxed-type data. We propose four control chart schemes based on modified versions of the KAy-means for MIxed LArge data (kamila) clustering algorithm, where we assume that the two existing clusters represent the reference and the test sample. The charts are semiparametric: the continuous rvs follow a distribution that belongs to the class of elliptical distributions, and the categorical rvs follow a multinomial distribution. We present the algorithmic procedures and study the characteristics of the new control charts. The performance of the proposed schemes is evaluated in terms of the False Alarm Rate and the in-control Average Run Length. Finally, we demonstrate the effectiveness and applicability of the proposed methods on real-world data.
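The False Alarm Rate (FAR) and the in-control Average Run Length (ARL0) are typically estimated by simulating the chart on in-control data. The Python sketch below shows that procedure for a deliberately simple stand-in: a two-sided Shewhart-type chart on a standard normal statistic with a hypothetical control limit `limit`. It does not implement the kamila-based charts from the abstract. When samples signal independently, the run length is geometric, so ARL0 = 1 / FAR. For |Z| > 2.576 that gives FAR ≈ 0.01 and ARL0 ≈ 100.

```python
import random
from statistics import NormalDist, mean
from typing import List, Tuple


def simulate_run_length(limit: float, rng: random.Random, max_len: int = 100_000) -> int:
    """Count in-control samples until the statistic first falls outside +/- limit."""
    for t in range(1, max_len + 1):
        z = rng.gauss(0.0, 1.0)  # assumed in-control charting statistic ~ N(0, 1)
        if abs(z) > limit:
            return t  # first (false) alarm
    return max_len  # safety cap; practically never reached for sensible limits


def in_control_performance(limit: float, n_runs: int = 5_000, seed: int = 42) -> Tuple[float, float]:
    """Monte Carlo estimates of (FAR, ARL0) for the illustrative chart."""
    rng = random.Random(seed)
    run_lengths: List[int] = [simulate_run_length(limit, rng) for _ in range(n_runs)]
    arl0 = mean(run_lengths)
    far = n_runs / sum(run_lengths)  # alarms per in-control sample
    return far, arl0


limit = 2.576  # hypothetical limit chosen so that P(|Z| > limit) is about 0.01
theoretical_far = 2 * (1 - NormalDist().cdf(limit))
far, arl0 = in_control_performance(limit)
print(f"theoretical FAR={theoretical_far:.5f}, ARL0={1 / theoretical_far:.1f}")
print(f"simulated   FAR={far:.5f}, ARL0={arl0:.1f}")
```

For charts whose signals are not independent over time (for example, charts with memory), the geometric relationship no longer holds exactly. Running the same simulation on the actual charting statistic then gives FAR and ARL0 directly.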