Knowledge discovery on social network data can benefit the general public, since these data contain latent social trends and valuable information. Recent research finds that preserving data privacy plays a vital role in knowledge discovery. Therefore, social network data need to be anonymized to protect users' identities before the data can be released for research purposes. In this paper, we model social network data as directed graphs with signed edge weights and formally define privacy and attack models for the anonymization problem. Based on our analysis, we develop a graph anonymization approach. Our other main contribution is a graph clustering algorithm that effectively groups similar graph nodes into clusters subject to a minimum cluster size constraint. Finally, we carry out a series of experiments to evaluate the effectiveness and utility of our approach on anonymizing social network data.
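The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of clustering under a minimum cluster size constraint. It assumes each node is described by a numeric feature vector (for instance, a row of its signed, weighted adjacency matrix); the function names and the greedy seed-plus-nearest-neighbours strategy are hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_size_clusters(features: Dict[str, Sequence[float]], k: int) -> List[List[str]]:
    """Hypothetical greedy grouping in which every cluster gets at least k nodes."""
    if k < 1 or len(features) < k:
        raise ValueError("need k >= 1 and at least k nodes")
    unassigned = sorted(features)  # deterministic processing order
    clusters: List[List[str]] = []
    while len(unassigned) >= k:
        seed = unassigned[0]
        # Take the seed plus its k - 1 nearest still-unassigned nodes.
        ranked = sorted(unassigned[1:], key=lambda n: euclidean(features[seed], features[n]))
        members = [seed] + ranked[: k - 1]
        clusters.append(members)
        unassigned = [n for n in unassigned if n not in members]
    # Fewer than k nodes remain: attach each one to the cluster whose seed is closest.
    for n in unassigned:
        best = min(clusters, key=lambda c: euclidean(features[c[0]], features[n]))
        best.append(n)
    return clusters


features = {"a": (0, 0), "b": (0, 1), "c": (10, 10), "d": (10, 11), "e": (0, 2)}
clusters = min_size_clusters(features, k=2)  # [["a", "b", "e"], ["c", "d"]]
```

Every returned cluster has at least `k` members; the greedy pass is simple rather than optimal, and the distance measure (here plain Euclidean) is left open.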
ISBN: 9788393484850 (print)
Mobile ad hoc networks (MANETs) are widely used for data communication between mobile nodes on the digital battlefield. Clustering has become an important research topic in military MANETs because it supports reconfiguration of C4I systems during highly dynamic operations. Clustering divides the network nodes into groups according to the command system, and each cluster has a Cluster Head (CH) that acts as its coordinator. For policy-based MANETs with Cognitive Radio (CR), it is very important to have an appropriate set of policies for each of the specific scenarios in which the data network is used. Several algorithms for distributed clustering have recently been studied and proposed. The presented algorithm is based on multi-criteria decision making, using a weighted function of the main network parameters. An appropriate choice of weight values for each particular scenario is critical to the effectiveness of MANET-CR. As the simulations have shown, the number of clusters in the network and the number of nodes per cluster determine how well CR can adapt to make better use of the available spectrum resources. The performance of the proposed algorithm has been evaluated through simulations in the OMNeT++ simulator, and the results are encouraging.
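As an illustration of a weighted multi-criteria decision, here is a minimal sketch that scores candidate cluster heads with a weighted sum of min-max normalised parameters. The criteria (residual energy, node degree, mobility), the weights, and the node values are hypothetical; the abstract does not list the parameters or the exact weighted function.

```python
from typing import Dict, FrozenSet


def weighted_scores(nodes: Dict[str, Dict[str, float]],
                    weights: Dict[str, float],
                    cost_criteria: FrozenSet[str] = frozenset()) -> Dict[str, float]:
    """Weighted sum of min-max normalised criteria for every candidate node."""
    scores = {name: 0.0 for name in nodes}
    for crit, w in weights.items():
        values = [params[crit] for params in nodes.values()]
        lo, hi = min(values), max(values)
        for name, params in nodes.items():
            # Normalise to [0, 1]; a constant criterion scores 1.0 for every node.
            norm = 1.0 if hi == lo else (params[crit] - lo) / (hi - lo)
            # For cost criteria (e.g. mobility) lower raw values are better.
            if crit in cost_criteria:
                norm = 1.0 - norm
            scores[name] += w * norm
    return scores


def select_cluster_head(nodes: Dict[str, Dict[str, float]],
                        weights: Dict[str, float],
                        cost_criteria: FrozenSet[str] = frozenset()) -> str:
    scores = weighted_scores(nodes, weights, cost_criteria)
    # Highest score wins; iterating over sorted names makes ties deterministic.
    return max(sorted(scores), key=scores.__getitem__)


# Hypothetical candidates: residual energy, neighbour count, speed.
nodes = {
    "n1": {"energy": 0.9, "degree": 4, "mobility": 2.0},
    "n2": {"energy": 0.5, "degree": 6, "mobility": 1.0},
    "n3": {"energy": 0.7, "degree": 2, "mobility": 5.0},
}
cost = frozenset({"mobility"})
ch_energy = select_cluster_head(nodes, {"energy": 0.5, "degree": 0.3, "mobility": 0.2}, cost)  # "n1"
ch_degree = select_cluster_head(nodes, {"energy": 0.1, "degree": 0.7, "mobility": 0.2}, cost)  # "n2"
```

The two calls show why the weight choice matters per scenario: the same nodes yield a different cluster head when the weights shift from energy towards connectivity.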
Filtering is used in intrusion detection to remove insignificant events from a log, which helps the analysis method focus on the significant events and minimizes processing overhead. Filtering is generally performed using filtering rules, which are derived from a set of data (training data) or from known facts about anomalous events. Because of this dependence on prior knowledge, the filterer can filter in only the anomalies it already recognizes, leaving the rest unavailable for further scrutiny. This problem was addressed earlier by a filterer that processes the tested log data, using the patterns and volume of events to calculate the filtering threshold. Although this threshold retained the anomalous events in most heterogeneous logs, it failed when those events were high in volume, and also because of inaccuracies in cluster formation. This paper therefore proposes a refined filterer for unsupervised heterogeneous anomaly detection that retains most anomalous events regardless of their volume in the logs, and discusses how the refined filterer supports detection. The experiments show that the refined filterer retained almost all abnormal events, thereby enabling the detection of the maximum number of anomalies. Copyright (c) 2016 John Wiley & Sons, Ltd.
By analyzing large-scale human behavior data, we propose new parallel and distributed algorithms for social role discovery based on dynamic, fine-grained human behavior attributes in social networks. We first mine and propose a number of properties that represent human behavior. Then, to handle the large volume of behavior data, we develop a simple, scalable, distributed parallel clustering algorithm based on grids and density. Theoretical analysis and experimental results show that the algorithm achieves better efficiency and effectiveness, and that it reveals valuable findings on real-life datasets. In addition, the user role discovery methodology in this paper can be applied to social networks in general.
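A minimal single-machine sketch of grid- and density-based clustering, assuming two-dimensional behavior attributes, a fixed cell size, and a density threshold `min_pts` (all hypothetical choices). Points are binned into cells, cells holding at least `min_pts` points are kept as dense, and dense cells that touch (including diagonally) are merged into one cluster; points in sparse cells are treated as noise. The paper's parallel, distributed version is not reproduced here.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def grid_density_clusters(points: List[Point], cell_size: float, min_pts: int) -> List[List[Point]]:
    # 1. Bin every point into its grid cell (floor division also handles negatives).
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    # 2. Keep only dense cells.
    dense = {c for c, pts in cells.items() if len(pts) >= min_pts}
    # 3. Join neighbouring dense cells with a breadth-first search.
    seen: Set[Cell] = set()
    clusters: List[List[Point]] = []
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members: List[Point] = []
        while queue:
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(members)
    return clusters


points = [(0.1, 0.2), (0.5, 0.5), (0.8, 0.3), (1.2, 0.4), (1.5, 0.6), (1.9, 0.1),
          (5.1, 5.2), (5.5, 5.5), (5.9, 5.8), (9.5, 0.5)]
clusters = grid_density_clusters(points, cell_size=1.0, min_pts=3)
```

Because the binning step only needs per-cell counts, it parallelises naturally (each worker counts its own partition and the counts are summed); this is a general property of grid-based methods, not a description of the paper's implementation.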
ISBN: 9783642240966 (print)
With the continuous development of data mining technology, applying data mining techniques to the transportation sector can help serve transportation scientifically and rationally. In intelligent transportation, analyzing traffic flow data is very important, and analyzing it intelligently is a difficult problem. It is therefore necessary and meaningful to replace traditional data analysis and interpretation methods with newer data mining techniques. Clustering is the process of dividing a collection of physical or abstract objects into classes of similar objects. This paper surveys the various kinds of data mining clustering algorithms, proposes a clustering-based method for processing traffic flow data, applies it to actual traffic data, and finally uses the clustering algorithm to analyze the traffic volume of various vehicle types at each highway toll station.
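The abstract does not say which clustering algorithm was applied to the toll station data, so the following is only a sketch using one common choice, one-dimensional k-means (Lloyd's algorithm), to group traffic volume readings into low, medium, and high classes. The volumes below are invented for illustration.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm on scalar data with a deterministic initialisation."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Spread the initial centres evenly over the sorted data.
    centres = [float(ordered[i * (len(ordered) - 1) // max(k - 1, 1)]) for i in range(k)]

    def assign(cs: List[float]) -> List[int]:
        # Index of the nearest centre for every value.
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(max_iter):
        labels = assign(centres)
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centre.
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:  # converged
            break
        centres = new_centres
    return centres, assign(centres)


# Hypothetical hourly vehicle counts at one toll station.
volumes = [120, 130, 125, 800, 820, 790, 400, 410]
centres, labels = kmeans_1d(volumes, k=3)
# labels -> [0, 0, 0, 2, 2, 2, 1, 1]; centres -> [125.0, 405.0, 803.33...]
```

With a single feature, sorting gives a simple deterministic initialisation; for multi-feature records (for example, volumes per vehicle type) the same loop would use a vector distance instead.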
Routing at the network layer is pivotal in the architecture of wireless sensor networks. As an active branch of routing technology, cluster-based routing protocols excel at network topology management, energy minimization, data aggregation, and so on. Because sensor nodes in wireless sensor networks have limited energy, this paper clusters the network based on the physical locations of the sensor nodes in the target area: a sector-division method assigns nodes that are closely related in physical location to the same cluster and nodes that are weakly related to different clusters. The algorithm reduces the energy and computational overhead of running the clustering algorithm and maintains a stable network topology, extending the network lifetime.
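A minimal sketch of sector-based clustering, assuming the sink (base station) sits at a known position and the area around it is split into `sectors` equal angular wedges; each node joins the cluster of the wedge it falls in. The sink position and the equal-angle split are assumptions, since the abstract does not give the exact division rule, and cluster-head election is omitted.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def sector_clusters(nodes: Dict[str, Tuple[float, float]],
                    sink: Tuple[float, float],
                    sectors: int) -> Dict[int, List[str]]:
    """Assign each node to the angular sector (around the sink) that contains it."""
    width = 2 * math.pi / sectors
    clusters: Dict[int, List[str]] = defaultdict(list)
    for name, (x, y) in sorted(nodes.items()):
        # atan2 returns (-pi, pi]; the modulo maps it to [0, 2*pi).
        angle = math.atan2(y - sink[1], x - sink[0]) % (2 * math.pi)
        # min() guards against a floating-point angle landing exactly on 2*pi.
        idx = min(int(angle // width), sectors - 1)
        clusters[idx].append(name)
    return dict(clusters)


# Hypothetical node coordinates with the sink at the origin and four sectors.
nodes = {"a": (1.0, 0.5), "b": (2.0, 1.0), "c": (-1.0, 1.0), "d": (-1.0, -1.0), "e": (1.0, -2.0)}
groups = sector_clusters(nodes, sink=(0.0, 0.0), sectors=4)
# {0: ["a", "b"], 1: ["c"], 2: ["d"], 3: ["e"]}
```

Because each node's sector depends only on its own coordinates and the sink position, nodes can determine their cluster locally without exchanging messages, which is one plausible source of the low clustering overhead the abstract mentions.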
A Gaussian point-adaptive grouping scheme for the multilevel fast multipole algorithm (MLFMA) is proposed in this communication to significantly reduce the heavy memory cost that results from using basis functions defined on large patches, without sacrificing the accuracy of the numerical solutions. The grouping process in the MLFMA is treated as a single-objective optimization problem and is solved by applying a clustering algorithm to the Gaussian quadrature points on each patch, while the constraint of the addition theorem for the MLFMA is still satisfied. As a result, the presented scheme obtains an optimal number of multipoles for each basis function. Compared with the conventional octree grouping scheme and other previously used grouping schemes, the proposed method is advantageous in terms of both memory efficiency and solution accuracy. Numerical examples are given to demonstrate the validity and effectiveness of the proposed scheme.
ISBN: 9781509006199 (print)
Information visualization is essential for improving the effectiveness and efficiency of data exploration and knowledge discovery. Visualization has therefore been used in a wide range of fields, from biology, medicine, and criminal activity analysis to business and education. Information visualization has become more important than ever as the amount of data being generated has increased dramatically in recent years. One of the major difficulties of information visualization is performance, and this is even more critical when visualizing big data. One potential solution is data sampling that maintains the fidelity of the visual representation. In this paper, we propose two new centrality clustering-based sampling approaches that apply centrality measures to clusters of data points in order to make more informed sampling decisions than random sampling approaches. We evaluate the new methods on graph data sets. The results show that the new methods significantly outperform existing data sampling methods in terms of perceived differences and their ability to preserve essential visual information. Moreover, their computational complexity is comparable to, or even better than, that of simple random sampling methods.
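A minimal sketch of the idea, assuming the graph has already been clustered and using degree centrality as the centrality measure. The abstract does not name the clustering step or the centrality measures of the two proposed methods, so these choices, the function name, and the per-cluster quota rule are hypothetical. Within each cluster the highest-degree nodes are kept, so every cluster stays represented in the sample.

```python
from collections import defaultdict
from typing import Dict, List, Set


def centrality_cluster_sample(adj: Dict[str, Set[str]],
                              clusters: Dict[str, int],
                              rate: float) -> List[str]:
    """Keep the top-degree nodes of every cluster, roughly `rate` of each cluster."""
    by_cluster: Dict[int, List[str]] = defaultdict(list)
    for node, cid in clusters.items():
        by_cluster[cid].append(node)
    sample: List[str] = []
    for cid in sorted(by_cluster):
        members = by_cluster[cid]
        quota = max(1, round(rate * len(members)))  # every cluster keeps at least one node
        # Rank by degree centrality, breaking ties by node name for determinism.
        ranked = sorted(members, key=lambda n: (-len(adj.get(n, ())), n))
        sample.extend(ranked[:quota])
    return sample


# Hypothetical undirected graph with two pre-computed clusters.
adj = {
    "a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a", "x"},
    "x": {"d", "y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
clusters = {"a": 0, "b": 0, "c": 0, "d": 0, "x": 1, "y": 1, "z": 1}
sample = centrality_cluster_sample(adj, clusters, rate=0.34)  # ["a", "x"]
```

Compared with uniform random sampling, this keeps the hubs that tend to define a graph's visual structure, at the cost of one sort per cluster.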
This study proposes a new Bayesian network (BN) model for diagnosing the most frequent breast pathologies and describes its implementation in a medical diagnostic system, including the maintenance of that system. The model reproduces the doctor's diagnostic process, which identifies a disease through its symptoms. The proposed Bayesian network represents qualitative and quantitative knowledge expressing uncertainty, organized into four levels: the clinical level, the medical imaging level, the biological level, and the diagnostic level. The Bayesian network is used to calculate the probabilities of the most likely a posteriori causes of an observed anomaly, using the clustering algorithm provided by the GeNIe tool, which should be sufficient for our application. To improve the performance of the system, given errors in the construction of the model (which is supplied a priori by an expert) and changes in domain dynamics, we propose BN maintenance that implements policies for updating a fixed structure and allows its reorganization by defining supplementary variables, called maintenance actions, which can be added or deleted and whose values can be edited.
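To make the "most likely a posteriori cause" computation concrete, here is a minimal sketch of exact inference by enumeration on a tiny network in which one diagnosis node has two findings that are conditionally independent given the diagnosis. To our understanding, GeNIe's clustering algorithm is a join-tree method for exact inference in general networks; on a network this small, plain enumeration gives the same posterior. The states, findings, and probabilities are hypothetical and are not taken from the paper's model.

```python
from typing import Dict


def posterior(prior: Dict[str, float],
              likelihood: Dict[str, Dict[str, float]],
              observed: Dict[str, bool]) -> Dict[str, float]:
    """P(diagnosis | findings) when findings are independent given the diagnosis."""
    joint: Dict[str, float] = {}
    for diagnosis, p_d in prior.items():
        p = p_d
        for finding, present in observed.items():
            p_present = likelihood[diagnosis][finding]
            p *= p_present if present else 1.0 - p_present
        joint[diagnosis] = p  # P(diagnosis, findings)
    evidence = sum(joint.values())  # P(findings)
    return {d: v / evidence for d, v in joint.items()}


# Hypothetical two-state diagnosis with one clinical and one imaging finding.
prior = {"pathology_A": 0.8, "pathology_B": 0.2}
likelihood = {
    "pathology_A": {"clinical_sign": 0.3, "imaging_sign": 0.1},
    "pathology_B": {"clinical_sign": 0.9, "imaging_sign": 0.8},
}
post = posterior(prior, likelihood, {"clinical_sign": True, "imaging_sign": True})
most_likely = max(post, key=post.get)  # "pathology_B": 0.144 / 0.168 = 6/7
```

Even though pathology_A has the higher prior, observing both findings makes pathology_B the most likely cause, which is the kind of update a diagnostic BN performs at scale.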
Inferences of population structure, and more precisely the identification of genetically homogeneous groups of individuals, are essential to the fields of ecology, evolutionary biology and conservation biology. Such population structure inferences are routinely investigated via the program structure, which implements a Bayesian algorithm to identify groups of individuals at Hardy-Weinberg and linkage equilibrium. While the method performs relatively well under various population models with even sampling between subpopulations, its robustness to uneven sample sizes between subpopulations and/or hierarchical levels of population structure has not yet been tested, even though both are commonly encountered in empirical data sets. In this study, I used simulated and empirical microsatellite data sets to investigate the impact of uneven sample sizes between subpopulations and/or hierarchical levels of population structure on the detected population structure. The results demonstrated that uneven sampling often leads to wrong inferences of hierarchical structure and to downward-biased estimates of the true number of subpopulations. Distinct subpopulations with reduced sampling tended to be merged together, while at the same time individuals from extensively sampled subpopulations were generally split, despite belonging to the same panmictic population. Four new supervised methods to detect the number of clusters were developed and tested as part of this study, and they were found to outperform the existing methods on both evenly and unevenly sampled data sets. Additionally, a subsampling strategy aimed at reducing sampling unevenness between subpopulations is presented and tested. Altogether, these results demonstrate that when sampling evenness is accounted for, the detection of the correct population structure is greatly improved.
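The abstract does not detail the subsampling strategy, so the following is only a generic sketch of the underlying idea: randomly draw the same number of individuals from every sampled group so that no group dominates the data set. The cap (by default the size of the smallest group), the seeded random generator, and the group labels are assumptions for illustration.

```python
import random
from typing import Dict, List, Optional


def even_subsample(groups: Dict[str, List[str]],
                   cap: Optional[int] = None,
                   seed: int = 0) -> Dict[str, List[str]]:
    """Randomly subsample each group to at most `cap` individuals."""
    if cap is None:
        cap = min(len(members) for members in groups.values())
    rng = random.Random(seed)  # seeded so that a replicate can be reproduced
    result: Dict[str, List[str]] = {}
    for name in sorted(groups):
        members = groups[name]
        if len(members) > cap:
            result[name] = sorted(rng.sample(members, cap))
        else:
            result[name] = list(members)  # groups at or below the cap are kept whole
    return result


# Hypothetical sampling sites with very uneven sample sizes.
groups = {
    "site_A": [f"A{i}" for i in range(50)],
    "site_B": [f"B{i}" for i in range(10)],
    "site_C": [f"C{i}" for i in range(5)],
}
balanced = even_subsample(groups)        # 5 individuals per site
capped = even_subsample(groups, cap=10)  # 10, 10 and 5 individuals
```

Since a single random draw can itself be unrepresentative, several replicates with different seeds would typically be analysed and their results compared.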