ISBN: 9781479941698 (print)
A huge volume of tweets about real-world events is generated on Twitter every day. However, these unorganized messages must be classified by topic and event, which is one of the challenges to extracting knowledge effectively. To solve this problem, we propose a novel method that combines a clustering algorithm with a label propagation algorithm to detect topics on Twitter. First, we use the canopy clustering algorithm to cluster tweets; canopy clustering can assign a tweet to several clusters, and a tweet that belongs to only one cluster is labeled. Second, label propagation is used to label the tweets that lie in the overlap of different clusters. To evaluate our algorithm, we use two baseline algorithms, LDA (Latent Dirichlet Allocation) and the Single-Pass clustering algorithm. We apply the three algorithms to a tweet dataset with three topics and some noisy data, and the experimental results show that our method outperforms the other algorithms in precision and recall.
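The two steps in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no thresholds, distance measure, or propagation rule, so the 2D points, the Euclidean distance, the thresholds `t1` and `t2`, and the nearest-seed majority vote (a one-pass simplification of label propagation) are all assumptions made for illustration. Real tweets would more likely be term vectors compared with a cosine-style distance.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy_clusters(points, t1, t2):
    """Canopy clustering with loose threshold t1 and tight threshold t2 (t1 > t2).

    Returns a list of canopies, each a set of point indices.
    A point may appear in more than one canopy.
    """
    assert t1 > t2
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within t1 of the center joins this canopy.
        canopies.append({i for i in range(len(points))
                         if dist(points[center], points[i]) <= t1})
        # Points within t2 of the center can no longer start a canopy.
        remaining = [i for i in remaining
                     if dist(points[center], points[i]) > t2]
    return canopies

def detect_topics(points, t1, t2, k=3):
    """Label single-canopy points directly, then label overlapping points."""
    canopies = canopy_clusters(points, t1, t2)
    membership = [[c for c, members in enumerate(canopies) if i in members]
                  for i in range(len(points))]
    # Step 1: a point in exactly one canopy takes that canopy as its label.
    seeds = {i: m[0] for i, m in enumerate(membership) if len(m) == 1}
    labels = dict(seeds)
    # Step 2 (simplified propagation): an overlapping point takes the
    # majority label among its k nearest seeds, restricted to its own canopies.
    for i, m in enumerate(membership):
        if len(m) > 1:
            nearest = sorted(seeds, key=lambda j: dist(points[i], points[j]))[:k]
            votes = Counter(seeds[j] for j in nearest if seeds[j] in m)
            labels[i] = votes.most_common(1)[0][0] if votes else m[0]
    return [labels[i] for i in range(len(points))]

# Two tight groups plus one point that falls inside both canopies.
points = [(0, 0), (0.5, 0), (0, 0.5), (10, 0), (10.5, 0), (10, 0.5), (4.8, 0)]
topic_labels = detect_topics(points, t1=5.5, t2=5.0)
```

In this toy data the last point lies within `t1` of both canopy centers, so it is left unlabeled by the first step and receives its label from the nearest already-labeled points.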
In wireless sensor networks, repairing packet loss increases the network load and consumes substantial energy. To reduce the network load, most transport protocols employ a Cumulative Acknowledgment (CACK) policy for received packets. In this paper, we propose a Cluster-based Packet Acknowledgement (CPA) approach that naturally reduces the number of ACKs further than traditional approaches of accumulating multiple packets from multiple nodes. A distributed greedy clustering algorithm is employed to form the minimum number of clusters. Different strategies are used to reduce the number of packet ACKs within clusters and between clusters, respectively. In addition, a scalable Bloom filter is used to fix the length of the CACK, which effectively saves storage. We also discuss a condition that guarantees the reduction of the packet ACK path length. Our simulation results show that the CPA approach effectively increases the packet delivery ratio and reduces both the average end-to-end delay and the total length of ACKs, thereby decreasing the network load.
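To show why a Bloom filter gives an acknowledgment of fixed length, here is a minimal sketch. It is a plain fixed-size Bloom filter, not the scalable variant the abstract mentions, and the class name, the filter size, the number of hash functions, and the `(node, sequence number)` packet identifiers are all hypothetical choices, not details from the paper.

```python
import hashlib

class BloomFilter:
    """Fixed-size Bloom filter; the bit array is stored in one Python int."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True for every added item; may also be True for an item that was
        # never added (false positive), but never False for an added item.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def to_bytes(self):
        # The serialized size depends only on num_bits, not on item count.
        return self.bits.to_bytes((self.num_bits + 7) // 8, "big")

# Hypothetical packet identifiers: (node name, sequence number).
received = [("node7", 1), ("node7", 2), ("node9", 5)]
ack = BloomFilter()
for packet_id in received:
    ack.add(packet_id)
payload = ack.to_bytes()  # same length however many packets are acknowledged
```

The trade-off is that a sender checking `packet_id in ack` can occasionally see a false positive, so the filter size and hash count have to be chosen for the expected number of packets per acknowledgment.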
ISBN: 9781479941681 (print)
A huge volume of tweets about real-world events is generated on Twitter every day. However, these unorganized messages must be classified by topic and event, which is one of the challenges to extracting knowledge effectively. To solve this problem, we propose a novel method that combines a clustering algorithm with a label propagation algorithm to detect topics on Twitter. First, we use the canopy clustering algorithm to cluster tweets; canopy clustering can assign a tweet to several clusters, and a tweet that belongs to only one cluster is labeled. Second, label propagation is used to label the tweets that lie in the overlap of different clusters. To evaluate our algorithm, we use two baseline algorithms, LDA (Latent Dirichlet Allocation) and the Single-Pass clustering algorithm. We apply the three algorithms to a tweet dataset with three topics and some noisy data, and the experimental results show that our method outperforms the other algorithms in precision and recall.
ISBN: 9787506456357 (print)
New developments in computer science and information technology make it possible to realize mass customization in the apparel industry. In recent years, many scholars and researchers have worked on classifying and characterizing human body shapes to accelerate paper pattern making. This paper is based on an anthropometric survey of 280 Chinese women aged 18 to 50, carried out with the [TC]² non-contact 3D body scanning system. By means of principal component analysis (PCA), 39 measurement items were transformed into 7 uncorrelated principal factors. These principal factors were then given professional definitions according to their eigenvectors. To establish effective criteria for classifying female body types, all samples were sorted by these factors using a dynamic sample clustering algorithm. In conclusion, the study provides a new way to study female body types and will be useful for further somatotype research and for practical garment manufacturing for mass customization.
Background: We present the algorithm PFClust (Parameter Free Clustering), which can automatically cluster data and identify a suitable number of clusters to group them into without requiring any parameters to be specified by the user. The algorithm partitions a dataset into a number of clusters that share some common attributes, such as their minimum expectation value and variance of intra-cluster similarity. A set of n objects can be clustered into any number of clusters from one to n, and there are many different hierarchical and partitional, agglomerative and divisive, clustering methodologies available that can be used to do this. Nonetheless, automatically determining the number of clusters present in a dataset constitutes a significant challenge for clustering algorithms. Identifying a putative optimum number of clusters to group the objects into involves computing and evaluating a range of clusterings with different numbers of clusters. However, there is no agreed or unique definition of optimum in this context. Thus, we test PFClust on datasets for which an external gold standard of 'correct' cluster definitions exists, noting that this division into clusters may be suboptimal according to other reasonable criteria. PFClust is heuristic in the sense that it cannot be described in terms of optimising any single simply-expressed metric over the space of possible clusterings. Results: We validate PFClust firstly with reference to a number of synthetic datasets consisting of 2D vectors, showing that its clustering performance is at least equal to that of six other leading methodologies - even though five of the other methods are told in advance how many clusters to use. We also demonstrate the ability of PFClust to classify the three-dimensional structures of protein domains, using a set of folds taken from the structural bioinformatics database CATH. Conclusions: We show that PFClust is able to cluster the test datasets a little better, on average, than
A systematic method for the analysis of the hydration structure of proteins is demonstrated on the case study of lysozyme. The method utilises multiple structural data sets of the same protein deposited in the Protein Data Bank. Clusters of high water occupancy are localised and characterised in terms of their interaction with the protein. It is shown that they constitute a network of interconnected hydrogen bonds anchored to the protein molecule. The high occupancy of the clusters does not directly correlate with water-protein interaction energy, as was originally hypothesised. Rather, the highly occupied clusters correspond to the nodes of the hydration network that have the maximum number of hydrogen bonds, including those to both the protein atoms and the surrounding water clusters. Copyright (c) 2013 John Wiley & Sons, Ltd.
E-government in China has entered the development stage of personalized services, and user segmentation has become an urgent need. Building on a systematic interpretation of e-government development stages, the authors introduce CRM and the concept of customer segmentation into e-government, construct an e-government user segmentation model, and obtain user segmentation results through empirical analysis. Compared with existing experience-based segmentation methods, the proposed model, which draws on the customer segmentation concept and the K-means algorithm, makes segmentation more scientific and reasonable, and it can adjust dynamically and improve continuously as user needs change.
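As an illustration of K-means applied to user segmentation, the following sketch runs the standard Lloyd iteration on a toy data set. The two user features (visits per month and number of distinct services used), the data values, and the deterministic initialization from the first `k` points are assumptions made for the example; the abstract does not state which features or settings the authors used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns (assignment list, centroid list)."""
    # Assumption: initialize centroids with the first k points so the
    # example is deterministic. Random or k-means++ seeding is more usual.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assignment, centroids

# Hypothetical users: (visits per month, distinct services used).
users = [(1, 1), (9, 8), (2, 1), (8, 9), (1, 2), (9, 9)]
segments, centers = kmeans(users, k=2)
```

Re-running the function on updated usage data yields updated segments, which is one simple way to read the claim that the segmentation can adjust as user needs change.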
ISBN: 9781467360500 (print)
An HTTP-flooding attack disables the victim web server by sending a large number of HTTP GET requests. Recent research tends to detect these attacks with anomaly-based approaches, which detect HTTP flooding by modeling the behavior of normal web users. However, most existing anomaly-based detection approaches cannot filter out the web-crawling traces of unknown search bots mixed into normal web browsing logs. These web-crawling traces can bias the detection model in the training phase, thereby affecting the performance of anomaly-based detection schemes. This paper proposes a novel anomaly-based HTTP-flooding detection scheme (HTTP-sCAN), which eliminates the influence of web-crawling traces with a clustering algorithm. The simulation results show that HTTP-sCAN is immune to interference from unknown search sessions and can detect all HTTP-flooding attacks.
ISBN: 9780819496386 (print)
Burnt areas resulting from wildfires can be readily detected from high-resolution aerial photographs or satellite imagery of the zone that includes the wildfire. Moderate-resolution remote sensing data, as provided by MODIS, can also be used to detect active or past wildfires, most often from daily records of a suitable combination of reflectance bands. The objective of the present work was to test some simple algorithms and variations for automatic blind detection of burnt areas from MODIS biweekly vegetation index time series data. MODIS-derived NDVI 250 m time series data for the Valencia region, Southeast Spain, were subjected to a two-step process for the detection of candidate burnt areas, and the results were compared with the record of wildfires with an affected area greater than 100 hectares. For each pixel and date in the data series, a model was fitted to both the preceding and the subsequent time series data. Discrepancies or jumps between the pre- and post-models exceeding a certain threshold were used as seeds to define clusters of pixels, the candidate burnt areas, with similarities between pixels derived either from their extreme discrepancy dates or from their parameters in the fitted models. Results using a simple combination of a constant fitted model and pixel similarity from jump dates were in good agreement with the perimeters of the actual burnt areas. A computationally efficient implementation of the method was developed using a digital-filter-type approach.
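The per-pixel jump detection with a constant fitted model can be sketched as follows. This covers only the first of the two steps (finding the jump date for one pixel), not the clustering of pixels into candidate burnt areas. The window length, the threshold, the sample NDVI values, and the choice to look only for drops in NDVI are assumptions for illustration; the abstract does not give these details.

```python
def largest_drop_date(series, window=3, threshold=0.2):
    """Return the index of the largest pre/post drop above threshold, or None.

    For each candidate date t, a constant model (the mean) is fitted to the
    `window` values before t and to the `window` values from t onward.
    The jump is the difference between the two fitted constants.
    """
    best_t, best_drop = None, threshold
    for t in range(window, len(series) - window + 1):
        pre = sum(series[t - window:t]) / window   # constant fit before t
        post = sum(series[t:t + window]) / window  # constant fit from t on
        drop = pre - post
        if drop > best_drop:
            best_t, best_drop = t, drop
    return best_t

# Hypothetical biweekly NDVI values for one pixel; vegetation is lost at index 4.
ndvi = [0.60, 0.62, 0.61, 0.60, 0.20, 0.22, 0.25, 0.30]
jump_index = largest_drop_date(ndvi)
```

Because each fitted constant is a moving average, the jump at every date is the difference of two sliding means, which is consistent with the digital-filter style of implementation mentioned in the abstract. Pixels whose jump dates coincide could then be grouped into candidate burnt areas.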