ISBN: 9781733632546 (print)
Recommender Systems have been intensively used in Information Systems in the last decades, facilitating the choice of items individually for each user based on their history. Clustering techniques have been frequently used in commercial and scientific domains in data mining tasks and visualization tools. However, there is a lack of secondary studies in the literature that analyze the use of clustering algorithms in Recommender Systems and their behavior in different aspects. In this work, we present a Systematic Literature Review (SLR) that discusses the different types of information systems that use clustering algorithms in Recommender Systems, which typically involve three main recommendation approaches found in the literature: collaborative filtering, content-based filtering, and hybrid recommendation. In the end, we performed a quantitative analysis using K-means clustering to find patterns among clustering algorithms, recommendation approaches, and some of the datasets used in the publications.
ISBN: 9780769535630 (print)
The popular fuzzy c-means algorithm (FCM) converges to a local minimum of the objective function. Hence, different initializations may lead to different results. The important issue is how to avoid a bad local minimum in order to improve cluster accuracy. Particle swarm optimization (PSO) is a popular and robust strategy for optimization problems. However, the main difficulty in applying PSO to real-world applications is that PSO usually needs a large number of fitness evaluations before a satisfying result can be obtained. In this paper, an improved algorithm, "Fuzzy C-Mean based on Picard iteration and PSO (PPSO-FCM)", is proposed. Two real data sets were used to show that the PPSO-FCM algorithm performs better than the conventional FCM algorithm and the PSO-FCM algorithm.
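The sensitivity to initialization described above can be illustrated with a minimal sketch of plain FCM on 1-D data. This is not the PPSO-FCM algorithm from the abstract: it uses the standard alternating membership and center updates, and a simple "restart from several seeds and keep the lowest objective" strategy swapped in purely for illustration. The function names, the 1-D restriction, and the fuzzifier default `m=2.0` are assumptions.

```python
import random

def memberships(xs, centers, m):
    """Standard FCM membership update: u[k][i] is the degree of point k in cluster i."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in xs:
        # Clamp distances away from zero so a point lying on a center cannot divide by zero.
        d = [max(abs(x - v), 1e-12) for v in centers]
        u.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
    return u

def objective(xs, centers, u, m):
    """FCM objective: membership-weighted sum of squared distances to the centers."""
    return sum(u[k][i] ** m * (xs[k] - centers[i]) ** 2
               for k in range(len(xs)) for i in range(len(centers)))

def fcm(xs, c, m=2.0, iters=100, seed=0, init=None):
    """Plain FCM on 1-D data. Returns (sorted centers, objective value)."""
    # Initial centers: either given explicitly or sampled from the data (hypothetical choice).
    centers = list(init) if init is not None else random.Random(seed).sample(xs, c)
    for _ in range(iters):
        u = memberships(xs, centers, m)
        centers = [
            sum(u[k][i] ** m * xs[k] for k in range(len(xs)))
            / sum(u[k][i] ** m for k in range(len(xs)))
            for i in range(len(centers))
        ]
    u = memberships(xs, centers, m)
    return sorted(centers), objective(xs, centers, u, m)

def best_of_restarts(xs, c, seeds):
    """Run FCM once per seed and keep the run with the lowest objective."""
    return min((fcm(xs, c, seed=s) for s in seeds), key=lambda run: run[1])
```

Calling `best_of_restarts(xs, 3, range(5))` runs five initializations and returns the centers of the best run. Each run may end in a different local minimum, which is the problem the abstract's PSO-based approach is aimed at.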
ISBN: 9781479926053; 9781479926046 (print)
Clustering algorithms are being widely used on biomedical data. They aim to extract important information that can be used to improve life conditions by helping specialized technicians in the decision process. Clustering algorithms based on information theory concepts claim that, by using higher-order statistics, they are able to extract more information from the data and therefore provide much better results. In this work we try to verify this claim by comparing the performance of some entropic clustering algorithms against more conventional ones. The results of the experiments are not conclusive, but they seem to indicate that entropic algorithms of this kind may provide some improvements when clustering biomedical data.
ISBN: 0769506569 (print)
A crucial step in understanding a large legacy software system is to decompose it into meaningful subsystems, which can be separately studied. This decomposition can be done either manually or automatically by a software clustering algorithm (SCA). Similar versions of a software system can be expected to have similar decompositions. We say an SCA is stable if small changes in its input (the software system) produce small changes in its output (the decomposition). This paper defines stability formally, explains why it is an essential property for an SCA, and gives experimental results from evaluating the stability of various decomposition algorithms suggested in the literature.
ISBN: 9781538659892 (print)
As part of the 2018 MIT-Amazon Graph Challenge on subgraph isomorphism, we propose a novel joint hierarchical clustering and parallel counting technique called the PHC algorithm that can compute the exact number of triangles in large graphs. The PHC algorithm consists of pruning, followed by hierarchical clustering based on geodesic distance, and then triangle counting in parallel. This allows a scalable software framework such as MapReduce/Hadoop to count, in parallel, the triangles inside each cluster as well as those straddling clusters. We characterize the performance of the PHC algorithm mathematically, and its performance evaluation using representative graphs, including random graphs, demonstrates its computational efficiency over other existing techniques.
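The split between triangles inside a cluster and triangles straddling clusters can be sketched as follows. This is a serial illustration only, assuming a cluster assignment is already given. It does not reproduce the PHC pruning, the geodesic-distance clustering, or the parallel MapReduce execution, and the function name and the integer node labels are hypothetical.

```python
def count_triangles(edges, cluster_of):
    """Count triangles exactly, split into (inside one cluster, straddling clusters).

    edges: iterable of (u, v) pairs with comparable node labels.
    cluster_of: dict mapping each node to its cluster id.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    inside = straddling = 0
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            # Common neighbours w with u < v < w, so each triangle is counted once.
            for w in adj[u] & adj[v]:
                if w <= v:
                    continue
                if cluster_of[u] == cluster_of[v] == cluster_of[w]:
                    inside += 1
                else:
                    straddling += 1
    return inside, straddling
```

The two counts always sum to the exact total, so in a parallel setting the per-cluster counts and the straddling counts could in principle be computed by separate workers and then added.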
ISBN: 9781467376983 (print)
The paper describes the clustering of article abstracts, taken from MEDLINE, the largest bibliographic database of life sciences and biomedical information, into categories that correspond to types of medical interventions (types of patient treatments). Experiments were carried out to evaluate the quality of clustering for the following algorithms: K-means, K-means++, hierarchical clustering, and SIB (Sequential Information Bottleneck), together with the LSA (Latent Semantic Analysis) and MI (Mutual Information) methods, which allow selecting feature vectors. The best clustering results were achieved by K-means++ together with LSA when a 210-dimensional space was chosen: Purity = 0.5719, Entropy = 1.3841, Normalized Entropy = 0.6299.
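For readers unfamiliar with the reported metrics, the sketch below computes purity, entropy, and normalized entropy of a clustering against reference classes using common textbook definitions. The abstract does not state the exact formulas, so the natural logarithm and the normalization by the log of the number of classes are assumptions, as is the function name.

```python
import math
from collections import Counter, defaultdict

def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy, normalized entropy) for a clustering.

    Assumed definitions: purity is the fraction of points in the majority
    class of their cluster; entropy is the size-weighted average class
    entropy per cluster (natural log); normalized entropy divides by
    log(number of classes).
    """
    n = len(cluster_ids)
    members = defaultdict(list)
    for c, y in zip(cluster_ids, class_labels):
        members[c].append(y)

    purity = 0.0
    entropy = 0.0
    for labels in members.values():
        counts = Counter(labels)
        size = len(labels)
        purity += max(counts.values()) / n
        h = -sum((k / size) * math.log(k / size) for k in counts.values())
        entropy += (size / n) * h

    n_classes = len(set(class_labels))
    normalized = entropy / math.log(n_classes) if n_classes > 1 else 0.0
    return purity, entropy, normalized
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference categories.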
ISBN: 9781479972593 (print)
Semi-supervised clustering algorithms introduce partial knowledge into traditional unsupervised methods and generally improve results. Partial constrained clustering is one of the main kinds of semi-supervised clustering algorithms. Notably, constraints selected at random are likely to bring only trivial improvement. To improve both the effectiveness and the efficiency of partial constrained clustering algorithms, active selection of constraints is important. However, there are only a few studies on the active selection of constraints. In view of this problem, we propose an improved approach to active constraint selection for partial constrained clustering algorithms. Experiments on a number of public benchmark data sets show that, compared to the state-of-the-art Explore and Consolidate approach, (i) our approach can find more informative constraints for partial constrained clustering algorithms and bring encouraging improvement; and (ii) our approach can quickly find constraints distributed among all the classes in the investigated data sets, which shows that it can be used in more situations where only a small number of constraints is allowed.
ISBN: 9783540681243 (print)
Prior works have elaborated on the problem of joint clustering in the optimization and geography domains. However, prior works neither clearly specify the connected constraint in the geography domain nor propose efficient algorithms. In this paper, we formulate the joint clustering problem in which a connected constraint and the number of clusters should be specified. We propose an algorithm, K-means with Local Search (abbreviated as KLS), to solve the joint clustering problem with the connected constraint. Experimental results show that KLS can find correct clusters efficiently.
ISBN: 0769509967; 0769509975 (print)
Clustering is one of the important techniques in Data Mining. The objective of clustering is to group objects into clusters such that objects within a cluster are more similar to each other than objects in different clusters. The similarity between two objects is defined by a distance function, e.g., the Euclidean distance, which satisfies the triangle inequality. Distance calculation is computationally very expensive, and many algorithms have been proposed so far to solve this problem. This paper considers the gradual clustering problem. From practice, we noticed that users often begin clustering on a small number of attributes, e.g., two. If the result is partially satisfying, the user will continue clustering on a higher number of attributes, e.g., ten. We refer to this as the gradual clustering problem. In fact, gradual clustering can be considered as vertically incremental clustering. Approaches are proposed to solve this problem. The main idea is to reduce the number of distance calculations by using the triangle inequality. Our method first stores in an index the distances between a representative object and objects in n-dimensional space. Then these pre-computed distances are used to avoid distance calculations in (n+m)-dimensional space. Two experiments on real data sets demonstrate the added value of our approaches. The implemented algorithms are based on the DBSCAN algorithm with an associated M-Tree as bidder tree. However, the principles of our idea can well be integrated with other tree structures such as MVP-Tree, R*-Tree, etc., and with other clustering algorithms.
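One way the pre-computed n-dimensional distances could prune work in (n+m)-dimensional space is sketched below for an epsilon-range query of the kind DBSCAN issues. It relies on two facts about Euclidean distance: adding dimensions can only increase a distance, and the triangle inequality gives |d(q, r) - d(p, r)| <= d(q, p) for a representative object r. The flat list used as the "index", the function names, and the single representative are assumptions for illustration. The paper's M-Tree-based method is not reproduced here.

```python
import math

def dist(a, b, dims):
    """Euclidean distance restricted to the first `dims` coordinates."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(dims)))

def build_index(points, rep, n):
    """Store the n-dimensional distance from each point to the representative object."""
    return [dist(p, rep, n) for p in points]

def range_query(points, ref_dists, q_idx, eps, total_dims):
    """Indices within eps of points[q_idx] in the full space, plus the number
    of full-dimensional distance computations actually performed."""
    q = points[q_idx]
    neighbors, computed = [], 0
    for i, p in enumerate(points):
        # Lower bound: |d_n(q, r) - d_n(p, r)| <= d_n(q, p) <= d_full(q, p).
        # If the bound already exceeds eps, p cannot be a neighbor.
        if abs(ref_dists[q_idx] - ref_dists[i]) > eps:
            continue
        computed += 1
        if dist(q, p, total_dims) <= eps:
            neighbors.append(i)
    return neighbors, computed
```

The query returns the same neighbors as a brute-force scan, since the bound never discards a true neighbor. The saving depends on how well the representative object separates the data.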
ISBN: 9781728124551 (print)
Aiming at the problem that the number of classification types is not easy to determine in current fuzzy C-means classification, this paper proposes a power system load clustering method based on hierarchical and fuzzy theory, and introduces the silhouette coefficient from mathematics into power system load classification to measure classification quality. To address the problem of the number of clusters in the original fuzzy C-means clustering algorithm, the idea of decision tree classification in the hierarchical clustering algorithm is integrated into the original algorithm, yielding a fused, improved algorithm. The improved algorithm can avoid the influence of prior values on the classification results, and then determine the optimal number of classifications according to the silhouette coefficient index. Finally, the reliability and validity of the algorithm are verified using load data from the PJM market in the United States.
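The role of the silhouette coefficient in choosing the number of clusters can be illustrated with a small sketch that scores candidate labelings and keeps the one with the highest mean silhouette. It uses the standard silhouette definition with Euclidean distance. The function names and the convention of scoring singleton clusters as 0 are assumptions, and the hierarchical and fuzzy C-means steps of the paper are not reproduced.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient of a hard labeling (needs at least two clusters)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)

def best_labeling(points, candidates):
    """Pick the candidate labeling with the highest mean silhouette."""
    return max(candidates, key=lambda labels: silhouette(points, labels))
```

In practice the candidate labelings would come from running a clustering algorithm with different numbers of clusters; here they are simply passed in as lists of cluster ids.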