ISBN (print): 9781467392808
Multi-class classification is a challenging problem in pattern recognition. Clustering-based classification (CC) is one of the most effective classification methods: it first divides the data into several clusters and then describes each cluster with a One-Class Classifier (OCC). Scalability and accuracy are two key advantages of this clustering-enhanced approach. Continuing this strategy, in this paper we propose Spectral clustering-based classification (SCC). In contrast to many other clustering algorithms, Spectral clustering (SC) aims to place the more mutually interconnected data points in one cluster, and hence produces clusters with smoother borders. A simpler border is easier for an OCC to describe, which leads to higher accuracy. Experiments on seven UCI data sets of various nature and size confirm the improved accuracy while preserving scalability.
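The general CC pipeline described in this abstract (cluster the data, then describe each cluster with a one-class model) can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' method: a tiny k-means stands in for spectral clustering, and a centroid-plus-radius descriptor stands in for a real OCC. The names `kmeans`, `fit_cc`, and `predict_cc` are hypothetical.

```python
import math
from collections import defaultdict

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means (assumed stand-in for the clustering step)."""
    centroids = [list(p) for p in points[:k]]  # seed with the first k points
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return [g for g in groups if g]

def fit_cc(X, y, k=2):
    """Cluster each class separately; describe each cluster by (label, centroid, radius)."""
    by_class = defaultdict(list)
    for p, label in zip(X, y):
        by_class[label].append(p)
    models = []
    for label, pts in by_class.items():
        for g in kmeans(pts, min(k, len(pts))):
            centroid = [sum(col) / len(g) for col in zip(*g)]
            # Radius of the cluster; a small floor avoids division by zero
            # for single-point clusters.
            radius = max(math.dist(p, centroid) for p in g) or 1e-9
            models.append((label, centroid, radius))
    return models

def predict_cc(models, p):
    """Return the label of the descriptor with the smallest distance/radius ratio."""
    return min(models, key=lambda m: math.dist(p, m[1]) / m[2])[0]
```

Because each cluster gets its own small model, training cost grows with cluster size rather than with the whole data set, which is the scalability argument the abstract makes. Swapping in spectral clustering would change only the `kmeans` call.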
ISBN (print): 9781467392808
Representation learning is a fast-growing approach in machine learning that aims to improve the quality of the input data, rather than relying on complex downstream learning algorithms. In this paper, we propose to use Denoising AutoEncoders (DAEs), one of the most effective representation learning methods, in clustering-based classification (CC). CC is a multi-class classification solution for large-scale and complicated data sets. In this approach, data are divided into small and simple clusters, which are described by One-Class Classifiers (OCCs). In the proposed Representation Learning for clustering-based classification (RLCC), a new representation of each cluster is generated locally to increase the accuracy of the OCCs. The method still preserves scalability, one of the significant advantages of CC methods. RLCC is evaluated on six different data sets from UCI. The experimental results show that RLCC has higher generalization power than the standard version of CC.
A musical nuance classification model is proposed using a clustering-based classification approach. Gamelan, a traditional Indonesian music ensemble, is used as the subject of this study. The proposed approach employs initial and final data segmentation to analyze symbolic music data, followed by concatenation of the clustering results from both segments to generate a more complex label. Structure-based segmentation divides the composition into an initial segment, representing the theme introduction, and a final segment, serving as a closing or resolution. This aims to capture the distinct characteristics of the initial and final segments of the composition. The approach reduces clustering complexity while maintaining the relevance of local patterns. The clustering process, performed using the K-Means algorithm, demonstrates strong performance and promising results. Furthermore, the classification rules derived from data segmentation and concatenation help mitigate clustering complexity, resulting in an effective classification outcome. The model was evaluated by measuring the similarity within the classes formed from data merging using a Euclidean distance score, where values below three indicate high similarity and values greater than ten indicate strong dissimilarity. Of the 13 formed classes with more than one data point, three (Class 5, Class 12, and Class 18) demonstrate high similarity with a value below three. Five other classes (Class 7, Class 10, Class 11, Class 15, and Class 20) exhibit near-high similarity, with values ranging from three to four, while the remaining five classes fall within the range of four to five.
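The label concatenation and within-class evaluation described above might look like the following sketch. It assumes the cluster ids for the initial and final segments are already available (for example from K-Means), and it assumes the "Euclidean distance score" is the mean pairwise distance inside a class; the abstract does not specify the exact formula. The function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def concat_labels(initial_labels, final_labels):
    """Join the cluster id of the initial and final segment into one class label."""
    return [f"{a}-{b}" for a, b in zip(initial_labels, final_labels)]

def within_class_distance(features, labels):
    """Mean pairwise Euclidean distance for every class with more than one member."""
    members = defaultdict(list)
    for vec, label in zip(features, labels):
        members[label].append(vec)
    scores = {}
    for label, vecs in members.items():
        if len(vecs) > 1:  # single-member classes have no pairs to compare
            pairs = list(combinations(vecs, 2))
            scores[label] = sum(math.dist(u, v) for u, v in pairs) / len(pairs)
    return scores
```

Under the thresholds quoted in the abstract, a class whose score is below three would count as highly similar, and one above ten as strongly dissimilar.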
When dealing with a classification problem on mixed data, most conventional supervised learning algorithms cannot perform well because they are designed for numerical attributes. However, some clustering algorithms, such as the k-prototypes algorithm, show potential in clustering mixed data. Therefore, the current study intends to develop a novel clustering-based classification algorithm for mixed data that combines the merits of classification and clustering. The proposed algorithm employs a sine-cosine algorithm (SCA) to find attribute weights and initial centroids for a k-prototypes algorithm. The objective function of the algorithm is formulated as a sum-up purity. To improve the performance of SCA, a mutation strategy containing Gaussian mutation, Cauchy mutation, Levy mutation, and single-point mutation is embedded into the original SCA. The proposed algorithm is compared with some metaheuristic-based classification algorithms and existing classification algorithms. Based on 10 data sets from UCI, the experimental results indicate that the proposed algorithm can achieve superior classification performance in terms of accuracy and Cohen's Kappa. In addition, the mutation mechanism improves the performance of SCA.
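A purity-style objective of the kind mentioned above can be sketched as follows. This assumes the standard definition of purity (the fraction of points that belong to the majority class of their cluster); the exact "sum-up purity" used in the paper may differ. The function name `purity` is hypothetical.

```python
from collections import Counter, defaultdict

def purity(cluster_ids, class_labels):
    """Fraction of points that carry the majority class label of their cluster."""
    groups = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        groups[cluster].append(label)
    # For each cluster, count the members of its most common class, then sum.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in groups.values()
    )
    return majority_total / len(class_labels)
```

An optimizer such as SCA would then search for attribute weights and initial centroids that maximize this value on the training data.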
With the rapid development in business transactions, especially in recent years, it has become necessary to develop different mechanisms to trace business user records in web server log in an efficient *** business transactions have increased, especially when the user or customer cannot obtain the required *** example, with the spread of the epidemic Coronavirus (COVID-19) throughout the world, there is a dire need to rely more on online business *** order to improve the efficiency and performance of E-business structure, a web server log must be well utilized to have the ability to trace and record infinite user *** paper proposes an event stream mechanism based on formula patterns to enhance business processes and record all user activities in a structured log *** user activity is recorded with a set of tracing parameters that can predict the behavior of the user in business *** experimental results are conducted by applying clustering-based classification algorithms on two different datasets; namely, Online Shoppers Purchasing Intention and Instacart Market Basket *** clustering process is used to group related objects into the same cluster, then the classification process measures the predicted classes of clustered *** experimental results record provable accuracy in predicting user preferences on both datasets.
ISBN (print): 9781595936097
Event-related potentials (ERP) are brain electrophysiological patterns created by averaging electroencephalographic (EEG) data, time-locking to events of interest (e.g., stimulus or response onset). In this paper, we propose a generic framework for mining and developing domain ontologies and apply it to mine brainwave (ERP) ontologies. The concepts and relationships in ERP ontologies can be mined according to the following steps: pattern decomposition, extraction of summary metrics for concept candidates, hierarchical clustering of patterns for classes and class taxonomies, and clustering-based classification and association rules mining for relationships (axioms) of concepts. We have applied this process to several dense-array (128-channel) ERP datasets. Results suggest good correspondence between mined concepts and rules, on the one hand, and patterns and rules that were independently formulated by domain experts, on the other. Data mining results also suggest ways in which expert-defined rules might be refined to improve ontology representation and classification results. The next goal of our ERP ontology mining framework is to address some long-standing challenges in conducting large-scale comparison and integration of results across ERP paradigms and laboratories. In a more general context, this work illustrates the promise of an interdisciplinary research program, which combines data mining, neuroinformatics and ontology engineering to address real-world problems.
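The time-locked averaging that defines an ERP can be illustrated with a short sketch. It assumes a single-channel signal stored as a list of samples and event onsets given as sample indices; the function name `average_erp` and the `pre`/`post` window parameters are hypothetical, and real pipelines also apply filtering and baseline correction that are omitted here.

```python
def average_erp(signal, event_indices, pre, post):
    """Average fixed-length epochs of `signal`, time-locked to each event index.

    Each epoch spans `pre` samples before the event up to (not including)
    `post` samples after it. Events too close to either end are skipped.
    """
    epochs = [
        signal[i - pre : i + post]
        for i in event_indices
        if i - pre >= 0 and i + post <= len(signal)
    ]
    if not epochs:
        raise ValueError("no complete epochs for the given events and window")
    # Average sample-by-sample across epochs.
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]
```

Averaging across many epochs suppresses activity that is not aligned with the event, which is why the resulting pattern is treated as the response to that event.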