ISBN: 9783642284908 (print)
Clustering is a central task in unsupervised learning, and clustering validity is a major issue in cluster analysis. In this paper, a new strategy called the Clustering Algorithm Based on Histogram Threshold (HTCA) is proposed to reduce execution time. HTCA combines a hierarchical clustering method with Otsu's method. Compared with traditional clustering algorithms, the proposed method reduces execution time by a factor of at least ten without losing accuracy. The experiments show that HTCA runs considerably faster than traditional methods.
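Otsu's method, one ingredient of HTCA, picks the histogram threshold t that maximizes the between-class variance w0*w1*(mu0 - mu1)^2. Here w0 and w1 are the counts on each side of t, and mu0 and mu1 are the corresponding mean bin indices. The following is a minimal standard-library sketch of that step alone. The abstract does not say how HTCA builds the histogram or couples the threshold with hierarchical clustering, so none of that is shown. The function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(hist):
    """Return the bin index t that maximizes between-class variance.

    Bins 0..t form class 0 and bins t+1..L-1 form class 1.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram is empty")
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0.0    # running count of class 0
    sum0 = 0.0  # running weighted sum of bin indices in class 0
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no split here
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # between-class variance up to the constant factor 1 / total**2,
        # which does not change the argmax
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For a bimodal histogram such as `[5, 10, 5, 0, 0, 0, 5, 10, 5]`, the first maximizing split falls right after the first mode, at bin 2.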
The rapid development of online social networks has allowed users to obtain information, communicate with each other, and express different opinions. Within a single social network, users tend to influence each other and hold similar views, whereas users on another social network may hold opposite views on the same event. Research on a single social network therefore cannot meet the needs of hot topic community discovery. A "cross social network" refers to multiple social networks considered together: integrating information from several social network platforms yields a new unified dataset, in which information about the same event from different platforms may contain similar or platform-specific topics. This paper proposes a hot topic discovery method on cross social networks. First, text data from different social networks are fused to build a unified model. Then, latent topic distributions are obtained from the unified model using the Labeled Biterm Latent Dirichlet Allocation (LB-LDA) model. Based on these distributions, similar topics are clustered to form several topic communities. Finally, hot topic communities are selected according to their scores. Experimental results on data from three social networks show that the model is effective and has practical application value.
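Biterm-based topic models work on short texts by modelling unordered word pairs ("biterms") instead of single words. This sketch assumes that the "Biterm" in LB-LDA means biterms in the sense of the Biterm Topic Model. It shows only how biterms are extracted from one tokenized post. The label handling and topic inference of LB-LDA are not described in the abstract and are not shown. The `window` parameter and the function name are illustrative.

```python
from itertools import combinations


def extract_biterms(tokens, window=None):
    """Return unordered word pairs (biterms) from one short text.

    With window=None every pair of token positions is used; otherwise only
    tokens at most `window` positions apart are paired.
    """
    biterms = []
    for i, j in combinations(range(len(tokens)), 2):
        if window is not None and j - i > window:
            continue
        # canonical (sorted) order so that (a, b) and (b, a) count as one biterm
        biterms.append(tuple(sorted((tokens[i], tokens[j]))))
    return biterms
```

The output for a whole corpus can be pooled, for example with `collections.Counter`, to obtain the biterm frequencies a topic model would be fitted on.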
Wireless sensor networks (WSNs) are a special type of ad hoc network and an emerging technology that is increasingly successful in scientific, logistical, and military applications. They deliver technologically sophisticated benefits to users while offering high flexibility. However, sensor size is an important limitation, mainly for energy autonomy and lifetime, because the battery must be very small. For this reason, many current studies focus on managing the energy consumed by the sensors in the network. With this in mind, we propose a clustering-based algorithm that improves quality of service. To confirm the improvements, we simulate the algorithm in MATLAB and compare its performance with existing clustering protocols (LEACH and SEP).
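LEACH, one of the baselines, rotates the energy-hungry cluster-head role with a stochastic rule. In round r, an eligible node becomes a cluster head when a uniform random number falls below T(n) = p / (1 - p * (r mod 1/p)). Here p is the desired fraction of cluster heads, and eligibility means the node has not yet served as head in the current epoch of 1/p rounds. This sketch covers only that textbook rule. The authors' own QoS algorithm is not described in the abstract, and the function names are hypothetical.

```python
import random


def leach_threshold(p, r, eligible):
    """LEACH threshold T(n) for round r; 0 for nodes outside the set G."""
    if not eligible:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (r % period))


def elect_cluster_heads(node_ids, p, r, last_head_round, rng=random):
    """Return the nodes that become cluster heads in round r.

    last_head_round maps node id -> last round it served as cluster head
    and is updated in place.
    """
    period = round(1 / p)
    epoch = r // period
    heads = set()
    for n in node_ids:
        last = last_head_round.get(n)
        # eligible (in G) if not a cluster head earlier in the current epoch
        eligible = last is None or last // period != epoch
        if rng.random() < leach_threshold(p, r, eligible):
            heads.add(n)
            last_head_round[n] = r
    return heads
```

Because T(n) reaches 1 in the last round of each epoch, every node serves as cluster head exactly once per epoch. This even spreading of energy drain is the property the rule is designed for.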
The era of big data has arrived. As data volumes grow ever faster, inspecting piles of data with the naked eye becomes increasingly laborious, and data mining technology emerged in response. This paper analyzes the current state of big data mining, explains the relevant concepts, characteristics, process, and algorithms of data mining, and discusses future development directions and trends in data mining technology.
The GM_PHD (Gaussian mixture probability hypothesis density) filter cannot fully track multiple targets, such as flying birds in the complex low-altitude airspace near an airport, because it lacks birth detection, track extraction, and death detection steps. A new algorithm is proposed to solve this problem, with three main contributions. First, the k-nearest neighbour algorithm is used to detect the birth of bird targets from measurements, which is necessary for constructing the birth intensity function. Second, a clustering algorithm is introduced into the probability hypothesis density filter framework to extract the bird targets' tracks from the filtering results. Third, an algorithm is added to detect the death of bird targets for better tracking. The Gaussian mixture implementation of the algorithm, denoted BT_GM_PHD (Bird Tracking GM_PHD), is presented. Test results on simulated and ground-truth data show that BT_GM_PHD can effectively track multiple flying bird targets in the complex low-altitude airspace near the airport and outperforms the GM_PHD filter.
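Plain GM_PHD can only produce per-scan point estimates. In the standard Gaussian mixture PHD filter, the sum of the component weights is the expected number of targets. States are read off by keeping the means of components whose weight exceeds 0.5, with a component of weight near 2 counted as two targets. These estimates carry no identity linking them across scans, which is why a separate track-extraction step is needed. This is a sketch of that baseline extraction only, not of BT_GM_PHD. The `Component` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    weight: float
    mean: tuple  # state vector, e.g. (x, y, vx, vy)


def expected_target_count(components):
    # the integral of the PHD intensity, i.e. the sum of mixture weights
    return sum(c.weight for c in components)


def extract_states(components, threshold=0.5):
    """Baseline GM-PHD state extraction: unlabelled means of heavy components."""
    states = []
    for c in components:
        if c.weight > threshold:
            # a component with weight near 2 is reported as two targets
            states.extend([c.mean] * round(c.weight))
    return states
```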
In this article, we prove the relaxed triangle inequality for the Southworth and Hawkins, Drummond, and Jopek orbital similarity criteria on the set of non-rectilinear Keplerian orbits with eccentricity bounded above. We estimate the minimal coefficient in the inequality for each criterion and show that one of the calculated coefficients is exactly minimal. The obtained inequalities can be used to accelerate algorithms that compute pairwise distances between orbits. We present an algorithm that computes all distances not exceeding a fixed threshold in a quasi-metric space and demonstrate that it is faster than computing all pairwise distances on a set of meteor orbits. Finally, we estimate the correlation dimensions of the sets of main-belt asteroid orbits and meteor orbits with respect to various orbital metrics and quasi-metrics.
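The abstract does not give the authors' algorithm. One standard way to exploit a relaxed triangle inequality d(x, z) <= C * (d(x, y) + d(y, z)) is pivot-based pruning. From d(x, p) <= C * (d(x, z) + d(z, p)), and the same inequality with x and z swapped, it follows that d(x, z) >= max(d(x, p)/C - d(z, p), d(z, p)/C - d(x, p)). Any pair whose lower bound exceeds the threshold r can therefore be skipped without computing its distance. The sketch assumes a symmetric distance, a known C >= 1 and a single pivot. The function name is hypothetical.

```python
from itertools import combinations


def pairs_within(points, dist, r, C=1.0, pivot_index=0):
    """All index pairs (i, j), i < j, with dist(points[i], points[j]) <= r.

    Assumes dist is symmetric and satisfies
    dist(x, z) <= C * (dist(x, y) + dist(y, z)).
    Returns (pairs, number of dist evaluations performed).
    """
    pivot = points[pivot_index]
    to_pivot = [dist(p, pivot) for p in points]
    evaluations = len(points)
    result = []
    for i, j in combinations(range(len(points)), 2):
        a, b = to_pivot[i], to_pivot[j]
        lower = max(a / C - b, b / C - a)  # lower bound on dist(i, j)
        if lower > r:
            continue  # pruned: cannot be within r
        evaluations += 1
        if dist(points[i], points[j]) <= r:
            result.append((i, j))
    return result, evaluations
```

Squared Euclidean distance, for example, is not a metric but satisfies the relaxed inequality with C = 2, so it can be passed with `C=2.0`. More pivots tighten the bound at the cost of more pivot distances.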
ISBN: 9781538695692 (print)
The large and growing number of scientific research works, together with easy access to them, has led to abusive exploitation by profiteers and illicit use of these works in scientific and academic environments. "Plagiarism" refers to using others' scientific research works without correctly citing them. Because Persian electronic resources are growing rapidly, this paper addresses plagiarism detection in Persian texts. Plagiarism detection consists of two distinct steps, Candidate Retrieval and Text Alignment, and the proposed method addresses both. In the first step, a Convolutional Neural Network (CNN) produces a document-level vector representation, and candidate documents are then retrieved using the k-means clustering algorithm. For text alignment, sentence-level features are extracted using a CNN, and classification algorithms are then used to detect copied sentences. Experiments were performed on the corpus prepared for the AAI competition and on the corpus prepared for the PAN2015 competition. The achieved precision and recall are 0.843 and 0.806, respectively, on the first corpus, and 0.833 and 0.826 on the second.
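Candidate retrieval with k-means can be pictured as follows. Source-document vectors are clustered, and a suspicious document's candidates are the sources in the cluster whose centroid is nearest to its vector. This is a sketch under stated assumptions. Plain tuples stand in for the CNN document embeddings, and the nearest-cluster retrieval rule is assumed, since the abstract does not spell it out. `kmeans` here is plain Lloyd's iteration. Both function names are hypothetical.

```python
import math
import random


def kmeans(vectors, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # assignment step: nearest centroid by Euclidean distance
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # update step: mean of assigned vectors (keep old centroid if empty)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, labels


def retrieve_candidates(query, centroids, labels):
    """Indices of source documents in the cluster nearest to the query."""
    nearest = min(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    return [i for i, lab in enumerate(labels) if lab == nearest]
```

Only the returned candidates would then go on to the more expensive sentence-level alignment step.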
ISBN: 9781450363532 (print)
BIRCH is a hierarchical clustering method that is especially suitable for clustering very large datasets. The traditional BIRCH algorithm uses distance to control the shape of clusters, so it performs poorly on non-spherical datasets, and in some cases a non-spherical cluster is split into several clusters. To overcome this limitation, this paper presents an improved BIRCH algorithm based on links, drawing on the link concept of the ROCK algorithm. Experiments show that the improved algorithm can find clusters of arbitrary shape.
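In ROCK, two points are neighbours when their similarity is at least a threshold theta. The link between two points is the number of neighbours they share. Because links count shared neighbourhood rather than raw distance, they can follow chains of points in non-spherical clusters. This sketch computes links only, using Jaccard similarity on item sets, as ROCK does for categorical data. How the paper feeds links into BIRCH's CF tree is not described, so it is not shown. Points are not counted as their own neighbours here, which is a convention choice. The function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def rock_links(points, theta, sim=jaccard):
    """Map each index pair (i, j), i < j, to its number of common neighbours.

    Points i and j are neighbours when sim(points[i], points[j]) >= theta.
    Pairs with zero links are omitted.
    """
    n = len(points)
    neighbours = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if sim(points[i], points[j]) >= theta:
            neighbours[i].add(j)
            neighbours[j].add(i)
    links = {}
    for i, j in combinations(range(n), 2):
        common = len(neighbours[i] & neighbours[j])
        if common:
            links[(i, j)] = common
    return links
```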
ISBN: 9781538676356 (print)
Multi-objective route planning is an active research topic with applications in many areas of daily life. As problem scale grows, many approximation and heuristic algorithms have been proposed to solve it. This paper proposes a solution to multi-objective route planning with balanced task assignment, in two steps. First, a clustering algorithm, cbk-means (cluster balance k-means), is proposed. It improves the similarity measure used during clustering and overcomes shortcomings of traditional k-means, such as an uncertain number of points per cluster and inflexible measurement criteria; this is the key step toward fair task assignment. Second, a genetic algorithm is used to obtain an optimal route for each cluster. Experimental results show that cbk-means makes the workload of each cluster more balanced at negligible cost, which greatly improves the fairness of task assignment. In addition, this hybrid solution saves computational time and obtains better results.
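The abstract does not give cbk-means's modified similarity measure. The sketch below therefore swaps in a different, simpler balancing technique: a capacity-constrained k-means in which no cluster may hold more than ceil(n / k) points. In each iteration, all (point, centroid) pairs are visited in order of increasing distance. A point joins the first centroid that still has room. This greedy rule always assigns every point, because total capacity is at least n. The function name and the deterministic initialization (first k points) are hypothetical choices.

```python
import math


def balanced_kmeans(points, k, iters=20):
    """k-means variant with cluster sizes capped at ceil(n / k)."""
    n = len(points)
    cap = math.ceil(n / k)
    centroids = [list(p) for p in points[:k]]  # simple deterministic init
    labels = [None] * n
    for _ in range(iters):
        # visit every (distance, point, centroid) triple, closest first
        pairs = sorted(
            (math.dist(p, centroids[c]), i, c)
            for i, p in enumerate(points) for c in range(k)
        )
        labels = [None] * n
        sizes = [0] * k
        for _, i, c in pairs:
            if labels[i] is None and sizes[c] < cap:
                labels[i] = c
                sizes[c] += 1
        # update step: centroid = mean of its members
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, labels
```

On four points near the origin and two far away, plain k-means would return a 4/2 split, while this variant forces 3/3. That is the kind of workload balance the abstract targets, obtained here at some cost in compactness.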
ISBN: 9781728105482 (print)
Reasonable route planning for taxis can both improve the customer experience and maximize drivers' profit. Most current taxi route planning schemes aim at the shortest route or the shortest time rather than the best profit. In this paper, we propose a route planning scheme with best profit for taxis (RPSBPT). First, we define the optimal profit point and the profit-per-unit-time function. Second, we design a workflow of data cleaning, sampling, and partitioning to preprocess the taxi trajectory dataset. Then, we integrate the DBSCAN and K-means algorithms to obtain the optimal profit points. Finally, simulated annealing (SA), a genetic algorithm (GA), and ant colony optimization (ACO) are each used to plan taxi routes. We built a prototype taxi route planning system and applied the proposed scheme to it. Using this system and taxi trajectory data collected in Jinjiang District, Chengdu, we ran a series of experiments comparing the three heuristic algorithms in terms of optimal route length, algorithm stability, total profit, and profitability per unit time. Experimental results show that ACO performs best.
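DBSCAN groups points that lie in dense regions, such as clusters of pick-up locations, without fixing the number of clusters in advance. It labels isolated points as noise. The abstract does not say how RPSBPT combines DBSCAN with K-means, so this sketch shows only textbook DBSCAN with a brute-force neighbourhood query. It assumes 2-D coordinates and Euclidean distance. The function name is hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def region(i):
        # brute-force eps-neighbourhood, including the point itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
        cluster += 1
    return labels
```

The region query is O(n) per point, so the whole sketch is O(n^2). For real trajectory datasets, a spatial index would normally replace the brute-force `region` function.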