This paper presents a chain vector-quantization clustering (CVQC) algorithm for real-time speech recognition. Compared with the conventional dynamic time warping (DTW) and vector-quantization (VQ) methods, it trains and recognizes faster and requires less memory.
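The conventional VQ baseline named in this abstract can be pictured roughly as follows. This is a minimal sketch of codebook-based recognition, assuming one codebook per vocabulary word and squared Euclidean distortion; it is not the CVQC algorithm, whose details the abstract does not give. The function names, the 2-D feature vectors, and the two-word vocabulary are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def average_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    return sum(
        min(squared_distance(frame, codeword) for codeword in codebook)
        for frame in frames
    ) / len(frames)


def recognize(frames, codebooks):
    """Return the label whose codebook quantizes the frames with least distortion."""
    return min(
        codebooks,
        key=lambda label: average_distortion(frames, codebooks[label]),
    )


# Hypothetical per-word codebooks and one utterance, for illustration only.
codebooks = {
    "yes": [(0.0, 0.0), (1.0, 1.0)],
    "no": [(5.0, 5.0), (6.0, 4.0)],
}
utterance = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8)]
best_label = recognize(utterance, codebooks)
```

Unlike DTW, this kind of matching ignores the order of the frames, which is one reason VQ recognizers tend to be cheap in both time and memory.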
In recent years, social network services have grown rapidly. The number of friends each user has on these services has also increased significantly, to the point where clustering and managing them has become difficult. In this paper, we propose an algorithm called mCAF that automatically clusters friends. We also propose methods that define the distance between friends based on different sets of measurements. The mCAF algorithm aims to reduce the effort and time users need to manage their friends in social network services. It could be made more flexible and convenient by applying different privacy settings to different groups of friends. In our experiments, mCAF improves on SCAN by 35.8% in similarity and 84.9% in F1 score.
Aim: Clustering is a form of unsupervised learning that divides the data objects in a data set into multiple clusters or classes, so that objects in the same cluster have high similarity. Background: The clustering of spatial data objects can be solved by optimization based on a clustering objective function. Objective: To study intelligent analysis and processing technology for computer big data based on clustering algorithms. Methods: First, a new dynamic self-organizing feature mapping model is proposed, and the training algorithm of the model is given. Then, spectral clustering and related concepts are introduced, the spectral clustering algorithm is analyzed, and a spectral clustering algorithm that automatically determines the number of clusters is proposed. Furthermore, an algorithm for constructing a discrete Morse function to find the optimal solution is proposed, and the constructed function is proved to be the optimal discrete Morse function. Two optimization models based on discrete Morse theory are also constructed. Finally, the optimization model based on discrete Morse theory is applied to cluster analysis, and a density clustering algorithm based on the discrete Morse optimization model is proposed. Results: This study focuses on designing and implementing a partition-based clustering algorithm for big data that can cluster huge datasets with low computational requirements. The experiments evaluate time and space complexity and show that the algorithm runs in very little time without compromising clustering quality. The experiments are carried out on artificial data and UCI data. Conclusion: The efficiency and superiority of the new model are verified by comparing its clustering results with those of the DBSCAN algorithm.
To compare the performance of a clustering algorithm on two data processing architectures, this paper first gives implementations of the k-means clustering algorithm on two big data architectures. We then analyze, from a mathematical point of view, how the theoretical performance of k-means differs between the two architectures. The theoretical analysis shows that the Spark architecture is superior to Hadoop in average execution time and I/O time. Finally, a text data set of user behavior from a social networking site is used for the experiments. The results show that, for the k-means algorithm, Spark takes significantly less execution time and I/O time than MapReduce. The theoretical analysis and the implementation techniques presented in this paper are a useful reference for applying big data technology.
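For readers unfamiliar with the algorithm being benchmarked, the following is a minimal single-machine sketch of k-means (Lloyd's iteration) in plain Python. It is not the paper's Spark or Hadoop implementation; the function names, the seeded random initialization, and the toy points are assumptions made for illustration. In a distributed setting, the assignment step is typically the part that runs in parallel over data partitions, and the update step is the aggregation.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: each centroid moves to the mean of its assigned points."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            new_centroids.append(
                tuple(sum(column) / len(members) for column in zip(*members))
            )
        else:
            new_centroids.append(old)  # an empty cluster keeps its old centroid
    return new_centroids


def kmeans(points, k, max_iterations=100, seed=0):
    """Run Lloyd's iteration until the centroids stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

Every iteration rereads the full data set, which is why an engine that keeps the working set in memory between iterations can spend less time on I/O than one that writes intermediate results to disk.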
Metagenomic data is a novel and valuable source for personalized medicine approaches to improving human health. Data visualization is a crucial technique in data analysis for exploring data and finding patterns in it. Metagenomic data in particular often has very high dimensionality, which makes it hard for humans to interpret. In this study, we introduce a visualization method based on the Mean-shift algorithm that lets us observe high-dimensional data as images exhibiting the features clustered by that method. These generated synthetic images are then fed into a convolutional neural network for disease prediction. The proposed method shows promising results when evaluated on four metagenomic bacterial species abundance datasets related to four diseases: Liver Cirrhosis, Colorectal Cancer, Obesity, and Type 2 Diabetes.
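The abstract does not say how the images are built from the clustering, so the sketch below illustrates only the generic mean-shift update it relies on: a point is moved repeatedly to the kernel-weighted mean of the data until it settles on a density mode. The Gaussian kernel, the bandwidth value, the function name, and the one-dimensional toy data are all assumptions.

```python
import math


def mean_shift_mode(start, data, bandwidth, max_iterations=100, tolerance=1e-6):
    """Follow the mean-shift update from `start` until it converges to a mode."""
    x = tuple(start)
    for _ in range(max_iterations):
        # Gaussian kernel weight of every data point relative to the current x.
        weights = [
            math.exp(-math.dist(x, p) ** 2 / (2 * bandwidth ** 2)) for p in data
        ]
        total = sum(weights)
        # New position: the kernel-weighted mean of the data.
        shifted = tuple(
            sum(w * p[d] for w, p in zip(weights, data)) / total
            for d in range(len(x))
        )
        if math.dist(shifted, x) < tolerance:
            return shifted
        x = shifted
    return x


# Toy one-dimensional data with two dense regions.
data = [(0.0,), (0.2,), (0.4,), (5.0,), (5.2,), (5.4,)]
modes = [mean_shift_mode(p, data, bandwidth=0.5) for p in data]
```

Points whose trajectories end at the same mode belong to the same cluster, so the number of clusters does not have to be fixed in advance.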
The fuzzy c-means (FCM) clustering algorithm is sensitive to noise points and outliers. The possibilistic fuzzy c-means (PFCM) clustering algorithm largely overcomes this problem, but it has problems of its own: it is still sensitive to the initial cluster centers, and its clustering results are poor when the noisy test datasets are highly unbalanced. This paper proposes an improved kernel possibilistic fuzzy c-means algorithm based on invasive weed optimization (IWO-KPFCM). The algorithm first uses the invasive weed optimization (IWO) algorithm to search for an optimal solution to serve as the initial cluster centers, and introduces a kernel method to map the input data from the sample space into a high-dimensional feature space. Then, the sample variance is introduced into the objective function to measure the compactness of the data. Finally, the improved algorithm is used to cluster the data. Simulation results on University of California, Irvine (UCI) data sets and artificial data sets show that the proposed algorithm resists noise better, clusters more accurately, and converges faster than the PFCM algorithm.
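As background for the FCM family discussed in this abstract, here is a minimal sketch of one iteration of standard fuzzy c-means: the membership update followed by the center update. It deliberately omits everything IWO-KPFCM adds (the IWO initialization, the kernel mapping, the possibilistic term, and the variance term), and the function names, the fuzzifier `m = 2`, and the toy points are assumptions.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership of every point in every cluster; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        distances = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in distances):
            # A point sitting exactly on a center belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in distances]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [
                1.0 / sum((d_i / d_k) ** power for d_k in distances)
                for d_i in distances
            ]
        )
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    dimensions = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            tuple(
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(dimensions)
            )
        )
    return centers


# Toy data: two pairs of points on a line, with hypothetical starting centers.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
centers = [(0.5, 0.0), (9.5, 0.0)]
memberships = fcm_memberships(points, centers)
new_centers = fcm_centers(points, memberships)
```

Because every point contributes to every center, a distant outlier still pulls each center toward itself, which is the noise sensitivity that the possibilistic variants try to reduce.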
How to uncover the business insights and market rules behind commodity transaction data, and how to explore the relationships between commodities so that e-commerce companies can classify and promote commodity categories more scientifically and rationally and improve sales performance, has become a recent research hotspot. To this end, this paper proposes using a clustering algorithm to explore the hidden patterns in commodity-related big data. The article first reviews a large body of literature, systematically summarizes the theory of association-rule methods and clustering algorithms, and gives a detailed introduction to their application in mining commodity-association big data, which lays a sufficient theoretical foundation for the study. Next, the Apriori association-rule algorithm and the K-means clustering algorithm are used in a fast clustering experiment on a sparse network of commodity-related big data, and the process of association analysis and cluster analysis of commodity transaction data is described in detail. Then, taking China's well-known e-commerce platform Jingdong Mall as an example, the commodity transaction records of Jingdong Mall from the fourth week of July are examined through association and cluster analysis. Among the results, mobile phones and Bluetooth earphones, laptops and Bluetooth earphones, and laptops and hard disks have the highest correlation, with confidence thresholds reaching 25%, 35%, and 40% respectively. Finally, when the clustering results were tested, the same associations were also found in the store. Strengthening the promotion and shopping guidance of highly correlated product combinations on the website's pages will increase product sales.
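The confidence figures quoted above are association-rule statistics of the kind Apriori works with. As a minimal sketch, the confidence of a rule "A implies B" is the support of A and B together divided by the support of A. The transactions below are invented for illustration and are not Jingdong Mall data; the function names are likewise hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Invented shopping baskets, for illustration only.
transactions = [
    {"phone", "earphone"},
    {"phone"},
    {"phone", "case"},
    {"phone", "earphone", "case"},
    {"laptop", "disk"},
]
rule_confidence = confidence(transactions, {"phone"}, {"earphone"})
```

Apriori itself adds a search strategy on top of these two measures: it grows candidate itemsets level by level and discards any candidate that has an infrequent subset.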
With China's rapid development of e-commerce and logistics, many large e-commerce enterprises have begun to build high-volume warehouses. Distributing goods takes a long time on every order, so optimizing the distribution step can save considerable time and is of practical significance. To optimize goods inventory and delivery, e-commerce shopping-cart stream data needs to be analyzed. This paper proposes a novel incremental-update clustering algorithm, named IUCStream, for commodity stream data analysis in e-commerce and logistics. The algorithm calculates the correlation between goods and uses an efficient incremental update of the goods data stream to cluster different goods into groups. Finally, the algorithm's superiority and effectiveness are verified by an example.
This paper proposes a new clustering algorithm based on ant colony optimization to solve the unsupervised clustering problem. Ant colony optimization (ACO) is a population-based meta-heuristic that can be used to find approximate solutions to difficult combinatorial optimization problems. Cluster analysis, an important method in data mining, classifies a set of observations into two or more mutually exclusive unknown groups. This paper presents ESacc, an effective ant colony clustering algorithm in which the best solution is kept stochastically. The algorithm is based on the Sacc algorithm proposed by P.S. Shelokar. Its main advantage is that the best values found across iterations are retained stochastically. In addition, the new algorithm uses the Jaccard index to identify the optimal number of clusters. Repeated experiments on three datasets show that ESacc runs faster, clusters better, and is more stable than Sacc, which validates the efficiency of the new algorithm. Three clustering validity indices are also selected and used to evaluate the clustering solutions of ESacc and Sacc.
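The abstract does not explain how ESacc applies the Jaccard index, so the following is only a sketch of the index in its common pair-counting form for comparing two clusterings of the same items: the number of item pairs grouped together in both clusterings, divided by the number of pairs grouped together in at least one. The function name and the convention of returning 1.0 when neither clustering groups any pair are assumptions.

```python
from itertools import combinations


def jaccard_index(labels_a, labels_b):
    """Pair-counting Jaccard index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    total = both + only_a + only_b
    # Assumed convention: two all-singleton clusterings count as identical.
    return both / total if total else 1.0


# The index depends on the grouping, not on the label names.
identical = jaccard_index([0, 0, 1, 1], [1, 1, 0, 0])
partial = jaccard_index([0, 0, 0, 1], [0, 0, 1, 1])
```

One plausible way to use such an index for choosing a cluster count is to score candidate clusterings against a reference and keep the best-scoring count, but whether ESacc does this is not stated in the abstract.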
Traditional customer segmentation methods cannot extract sufficiently useful information from massive customer data, which hampers the formulation of marketing strategies. To address this, this study constructs a customer segmentation marketing strategy model that integrates support vector machines and clustering algorithms. The model first uses support vector machines to segment existing customer data, and then integrates support vector machines and clustering algorithms to construct a customer segmentation model. Finally, simulation experiments are conducted on the dataset. The results show that the model algorithm reaches the optimal solution at 50 iterations. In the customer segmentation process, the model algorithm has an average error rate of 6.82% and an average recall rate of 91.28%, and the average profit predicted for the strategy developed from the segmentation model is 29.88%, which differs from the true value by 2.53%.