ISBN: 9781479958771 (print)
Crime analysis has been widely studied, but the problem of identifying conspirators through communication network analysis is still not well resolved. In this paper, we propose a fuzzy clustering algorithm to detect hidden criminals in a topic network without using any prior information about individuals' identities. We first compute a local suspicion degree from each node's neighborhood information (nodes and edges); then, using global information, we apply the fuzzy k-means clustering algorithm and take each node's membership in the suspicious group as its global suspicion degree. Experiments show that the method works well for identification: known suspects receive relatively high values and known innocents receive relatively low values.
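The idea of reading a fuzzy membership as a suspicion degree can be sketched as follows. This is only a minimal illustration, not the authors' implementation: it runs textbook fuzzy c-means on one-dimensional scores, and the score values, the fuzzifier m = 2, the two-cluster setup, and all names are assumptions made for the example.

```python
def fuzzy_c_means_1d(xs, centers, m=2.0, iters=50):
    """Textbook fuzzy c-means on scalar data; returns (centers, memberships)."""
    centers = list(centers)
    u = []
    for _ in range(iters):
        # Membership update: proportional to distance^(-2/(m-1)),
        # normalized so each point's memberships sum to 1.
        u = []
        for x in xs:
            d = [abs(x - c) for c in centers]
            if min(d) == 0.0:
                # Point coincides with a center: full membership there.
                k = d.index(0.0)
                row = [1.0 if j == k else 0.0 for j in range(len(centers))]
            else:
                w = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                s = sum(w)
                row = [wj / s for wj in w]
            u.append(row)
        # Center update: mean of the data weighted by membership^m.
        for j in range(len(centers)):
            num = sum((row[j] ** m) * x for row, x in zip(u, xs))
            den = sum(row[j] ** m for row in u)
            centers[j] = num / den
    return centers, u


# Hypothetical local suspicion scores for six nodes (made-up values).
local_scores = [0.05, 0.10, 0.15, 0.80, 0.85, 0.95]
centers, u = fuzzy_c_means_1d(local_scores, centers=[0.0, 1.0])

# Treat the cluster with the higher center as the "suspicious" group and
# read each node's membership in it as a global suspicion degree.
suspicious = 0 if centers[0] > centers[1] else 1
global_suspicion = [row[suspicious] for row in u]
```

Because memberships are graded rather than binary, nodes near the boundary between the two groups get intermediate values instead of a hard label, which is what makes the membership usable as a ranking score.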
ISBN: 9781479940875 (print)
Affinity is common among Virtual Machines (VMs) in cloud environments. If VMs collaborating on a job are split across geographically distributed clouds, the low-bandwidth, high-latency inter-cloud communication over a wide area network (WAN) will dramatically degrade system performance. A potential solution is to migrate all of the VMs collaborating on a job in parallel, so as to avoid wide area communication. However, if the job is too large, it becomes impractical to migrate all of the VMs simultaneously because of limited WAN bandwidth and a high block dirty rate. We propose a migration optimization mechanism called Clique Migration, which partitions a large group of VMs into subgroups based on the traffic affinities among VMs. The subgroups are then migrated one at a time. Based on Clique Migration, we propose and implement two algorithms called R-Min-Cut and Kmeans-SF. Analysis of the traffic trace of 68 VMs in an IBM production cluster shows that our algorithms can reduce inter-cloud traffic by 25% to 60% when the degree of parallel migration ranges from 2 to 32. Tests of the MPI multi-PingPing benchmark running on simulated inter-cloud environments show that our algorithms can significantly shorten the period during which applications undergo performance degradation. Tests of the MPI Reduce scatter benchmark show that R-Min-Cut can keep the performance during migration at 26% to 75% of the non-migration scenario.
ISBN: 9781479959969 (print)
This paper provides a comparative study of several enhanced versions of the fuzzy c-means clustering algorithm in an application of histogram-based image color reduction. A common preprocessing step is performed before clustering, consisting of a preliminary color quantization, histogram extraction and selection of frequently occurring colors of the image. The selected colors are then clustered by the tested c-means algorithms. Clustering is followed by another common step, which creates the output image. Besides conventional hard (HCM) and fuzzy c-means (FCM) clustering, the so-called generalized improved partition FCM algorithm, and several versions of the suppressed FCM (s-FCM) in its conventional and generalized form, are included in this study. Accuracy is measured as the average color difference between pixels of the input and output image, while efficiency is mostly characterized by the total runtime of the performed color reduction. Numerical evaluation found all enhanced FCM algorithms more accurate, and four out of seven enhanced algorithms faster, than FCM. All tested algorithms can create reduced color images of acceptable quality.
ISBN: 9781479958771 (print)
In many social networks, people interact based on their interests. Community detection algorithms are then useful to reveal the sub-structures of a network and help us find interest groups. Identifying these social communities can help in understanding and predicting users' behaviors. However, for some kinds of online community sites, such as question-and-answer (Q&A) sites or forums, there is no friendship-based social network structure, which means people are not aware of whom they are in contact with. Therefore, many traditional community detection techniques do not apply directly. In this paper, we propose an empirical approach for extracting, from Q&A sites, data suitable for community detection methods. We then compare three kinds of community detection methods applied to a dataset extracted from the popular Q&A site StackOverflow. We analyze and comment on the results of each method.
ISBN: 9781479958771 (print)
In this paper we outline important differences between (1) protein interaction networks and (2) social and other complex networks, in terms of fine-grained network community profiles. While these families of networks present some general similarities, they also have some stark differences in the way the communities are formed. Namely, we find that the sizes of the best communities in such biological networks are an order of magnitude smaller than in social and other complex networks. We furthermore find that the generative model describing biological networks is very different from the model describing social networks. While for the latter the Forest-Fire model best approximates their network community profile, for biological networks it is a random rewiring model that generates networks with the observed profiles. Our study suggests that these families of networks should be treated differently when deriving results from network analysis, and that a fine-grained analysis is needed to better understand their structure.
ISBN: 9781479958771 (print)
Community detection and influence analysis are significant notions in social networks. We exploit the implicit knowledge of influence-based connectivity and proximity encoded in the network topology, and propose a novel algorithm for both community detection and influence ranking. Using a new influence cascade model, the algorithm generates an influence vector for each node, which captures in detail how the node's influence is distributed through the network. Similarity in this influence space defines a new, meaningful and refined connectivity measure for the closeness of any pair of nodes. Our approach not only differentiates the influence ranking but also effectively finds communities in both undirected and directed networks, and incorporates these two important tasks into one integrated framework. We demonstrate its superior performance with extensive tests on a set of real-world networks and synthetic benchmarks.
ISBN: 9781479959969 (print)
Clinical chemistry tests are widely used in medical diagnosis. Physicians typically interpret them in a univariate sense, by comparing each parameter to a reference interval; however, their correlation structure may also be of interest, as it can shed light on common physiologic or pathological mechanisms. The correlation analysis of such parameters is hindered by two problems: the relationships between the variables are sometimes non-linear and of unknown functional form, and the number of such variables is high, making the use of classical tools infeasible. This paper presents a novel approach to address both problems. It uses an information theory-based measure called total correlation to quantify the dependence between clinical chemistry variables. Total correlation can detect any dependence between the variables, including non-linear or even non-monotone ones, so it is insensitive to the actual nature of the relationship. Another advantage is that it can quantify dependence not only between pairs of variables, but between larger groups of variables as well. By virtue of this fact, a novel approach is presented that can handle the high dimensionality of clinical chemistry parameters. The approach is implemented and illustrated on a real-life database from the representative US public health survey NHANES.
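For reference, total correlation is the sum of the marginal entropies minus the joint entropy. The sketch below computes a simple plug-in estimate on discrete data; it is an illustration only, since the abstract does not state which estimator or discretization the authors use, and the function names and the toy XOR data are assumptions.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a list of hashable observations."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def total_correlation(rows):
    """Sum of marginal entropies minus the joint entropy, for discrete rows."""
    columns = list(zip(*rows))
    marginal = sum(entropy(list(col)) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal - joint


# Toy data (made up): the third variable is the XOR of the first two.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
tc_all = total_correlation(rows)                          # all three variables
tc_pair = total_correlation([(x, y) for x, y, _ in rows])  # first two only
```

On this toy data every pair of variables is independent, so the pairwise value is zero, while the three variables together have a total correlation of one bit. This mirrors the two properties named in the abstract: sensitivity to non-monotone dependence and applicability to groups larger than pairs.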
ISBN: 9781479937806 (print)
Traffic classification is the foundation for many network activities, such as Quality of Service (QoS), security monitoring, Lawful Interception and Intrusion Detection Systems (IDS). A recent statistics-based approach, which addresses the unsatisfactory results of traditional port-based and payload-based approaches, has attracted attention. However, the presence of non-informative attributes and noisy instances degrades the performance of this approach. To address this problem, we propose a hybrid clustering-classification approach (called CluClas) that improves the accuracy and efficiency of network traffic classification by selecting informative attributes and representative instances. An extensive empirical study on four traffic data sets shows the effectiveness of our proposed approach.
ISBN: 9781479958771 (print)
Community detection is one of the most important problems in social network analysis in the context of the structure of the underlying graphs. Many researchers have proposed their own methods for discovering dense regions in social networks. Such methods are designed using only the links of the underlying social network. However, with the development of recent applications, rich edge content can be available to give another view of the community detection process. In this study, we focus on improving community detection with the edge content in social networks. In order to regulate the effect of both linkage structure and edge content, we propose two feature integration strategies. Experimental results illustrate that the presence of edge content provides unprecedented opportunities and flexibility for the community detection process.
ISBN: 9781479959969 (print)
Agglutinative languages, such as Hungarian, use inflection to modify the meaning of words. Inflection is a string transformation that describes how a word can be converted into its inflected form. The transformation can be described by a transformational string. Words can be classified by their transformational strings, so inflection can be treated as a classification problem. Linear separability of clusters is important for creating an efficient and accurate classification method. This paper reviews a linear programming based method for testing linear separability. The method was analyzed on generated data sets; these measurements showed that the time cost of the algorithm grows polynomially with the number of points. The accusative case of Hungarian was used to create a data set of 56,000 samples. The words were represented in vector space by alphabetical and phonetic encoding with left and right adjustment, so four different representations of words were used during the tests. Our test results showed that there are cluster pairs that are not linearly separable in both of the representations.
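The abstract does not spell out the linear program it reviews. One common formulation, given here only as an assumed illustration, tests two finite point sets $\{x_1, \dots, x_p\}$ and $\{y_1, \dots, y_q\}$ in $\mathbb{R}^d$ by asking whether the following feasibility problem has a solution:

```latex
\begin{aligned}
\text{find } & w \in \mathbb{R}^d,\ b \in \mathbb{R} \\
\text{subject to } & w^{\top} x_i + b \ge 1, \quad i = 1, \dots, p, \\
& w^{\top} y_j + b \le -1, \quad j = 1, \dots, q.
\end{aligned}
```

The two sets are strictly linearly separable exactly when this system is feasible; the margin of 1 loses no generality, because any strictly separating pair $(w, b)$ can be rescaled to satisfy it. Any LP solver can decide feasibility by minimizing a constant objective subject to these $p + q$ constraints.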