Association rule mining (ARM) is a popular and well-researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules in databases using different measures of interestingness. Most ARM algorithms focus on a sequential or centralized environment where no external communication is required. Distributed ARM (DARM) algorithms aim to generate rules from different data sets spread over various geographical sites; hence, they require external communication throughout the entire process. DARM algorithm efficiency is highly dependent on data distribution. The classical algorithms used in DARM are the Count Distribution Algorithm (CDA), the Fast Distributed Mining (FDM) algorithm, and the Optimized Distributed Association Mining (ODAM) algorithm. This paper presents the implementation details and experimental results of these algorithms. It also highlights how the message exchange size of current DARM algorithms can affect communication costs in a distributed environment.
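To make the count-exchange idea concrete, the following is a minimal single-process sketch in the spirit of count distribution: every site counts the same candidate itemsets on its own transactions, and only the counts are summed globally. It is not the implementation evaluated in the paper; the function names, the in-memory `sites` list standing in for remote sites, and the absolute `min_support` threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, candidates):
    """Count how many local transactions contain each candidate itemset."""
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:
                counts[candidate] += 1
    return counts


def count_distribution(sites, min_support):
    """Return {itemset: global count} for all globally frequent itemsets.

    sites: list of sites, each a list of transactions (hypothetical stand-in
    for remote sites). min_support: minimum absolute global support count.
    """
    # Pass 1 candidates: every single item seen at any site.
    candidates = {frozenset([item]) for site in sites for t in site for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Each site counts locally; only the counts are exchanged and summed.
        global_counts = Counter()
        for site in sites:
            global_counts.update(local_counts(site, candidates))
        level = {c: n for c, n in global_counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets and keep those whose
        # k-subsets are all frequent (Apriori pruning).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent
```

In this sketch the message size per pass grows with the number of candidates rather than with the number of transactions, which is the communication cost the abstract refers to.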
Aiming at the severe energy and computing resource constraints of wireless sensor networks (WSN), based on rough set theory and the ART2 network, a distributed data mining model for WSN is *** model poses a three-layer MLP for data aggregation in the clustered sensor network. The input layer neuron and the first layer neuron are located in every cluster member, while the second layer neuron and the output layer neuron are located in every cluster head. The features of the training samples were extracted to build up the decision table; rough set theory was applied to reduce the decision ***, the reduced decision attributes were used to construct an ART2 neural network for classifying data. The constructed data mining algorithm can be integrated in each sensor network *** results show that, after the raw data is processed by the data mining algorithm, data dimension is reduced and data redundancy is eliminated, and that communication traffic is decreased and the life of the WSN is extended.
ISBN: 9781479989997 (print)
The article describes an approach to parallel data mining algorithms to be executed on multicore processors of various architectures. The suggested method presents an algorithm as a sequence of pure functions with unified interfaces. For parallel execution, additional functions are introduced to share data and models between the parallel threads. In addition, such functions make it possible to obtain various parallel algorithm structures and to implement various execution strategies for different environment conditions. Application of the described method is illustrated with the Naive Bayes algorithm.
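As a rough illustration of building an algorithm from pure functions plus a function that shares models between threads, the sketch below trains Naive Bayes count tables per data partition and merges them. It is not the article's framework: the function names, the use of `ThreadPoolExecutor`, and the add-one smoothing that assumes binary-valued categorical features are all assumptions chosen for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_partition(rows):
    """Pure function: build count tables from one partition.

    rows: list of (features_tuple, label) pairs with categorical features.
    Returns (label_counts, feature_counts) without touching shared state.
    """
    label_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        label_counts[label] += 1
        for index, value in enumerate(features):
            feature_counts[(label, index, value)] += 1
    return label_counts, feature_counts


def merge_models(left, right):
    """Pure function: combine two partial models by adding their counts."""
    return left[0] + right[0], left[1] + right[1]


def train_parallel(partitions, workers=2):
    """Train one partial model per partition in threads, then merge them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_models = list(pool.map(train_partition, partitions))
    model = (Counter(), Counter())
    for partial in partial_models:
        model = merge_models(model, partial)
    return model


def predict(model, features):
    """Return the label with the highest smoothed Naive Bayes score."""
    label_counts, feature_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, 0.0
    for label, count in label_counts.items():
        score = count / total
        for index, value in enumerate(features):
            # Add-one smoothing; the +2 assumes each feature has two values.
            score *= (feature_counts[(label, index, value)] + 1) / (count + 2)
        if best_label is None or score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because the counts are additive, the merged model equals the one obtained by training on all rows at once, so the partitioning strategy can change without changing the result.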
ISBN: 9783319251592; 9783319251585 (print)
In this paper, we propose differentially private protocols for Naive Bayes classification over distributed data. Compared with existing works, the privacy and security models in the proposed protocols are stronger: first, both the miner and the parties can be arbitrarily malicious and can collude with each other to violate the remaining honest parties' privacy; second, all communication channels between them can be assumed to be insecure. Specifically, we build a guarantee of differential privacy into the cryptographic construction so that the proposed protocols can tolerate collusion and resist eavesdropping attacks caused by insecure communication channels. Additionally, the proposed protocols can be implemented at lower computation and communication costs, and some extensions to our protocols (e.g., supporting parties dynamically joining or leaving) are also proposed in this paper. Both theoretical analysis and simulation results show that the proposed privacy-preserving protocols for Naive Bayes have strong security and better classification performance than the standard one.
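For readers unfamiliar with differential privacy on count tables, here is a minimal sketch of the plain Laplace mechanism applied to Naive Bayes counts. This is a swapped-in textbook technique, not the paper's protocol: it includes no cryptographic construction and no protection against malicious or colluding parties. The function names and the dictionary layout of `counts` are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def privatize_counts(counts, epsilon, sensitivity, rng):
    """Return a noisy copy of `counts` under the Laplace mechanism.

    sensitivity: the most the whole count table can change (in L1 norm)
    when one record is added or removed; for Naive Bayes with d features
    that is d + 1 (one class count plus d feature-value counts).
    """
    scale = sensitivity / epsilon
    return {key: value + laplace_noise(scale, rng) for key, value in counts.items()}
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of less accurate counts.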
ISBN: 9781614995784; 9781614995777 (print)
When inducing decision trees, Windowing consists in selecting a random subset of the available training instances (the window) to induce a tree, and then enhancing it by adding counterexamples, i.e., instances not covered by the tree, to the window for inducing a new tree. The process iterates until all instances are well classified or no accuracy is gained. In favorable domains, the technique is known to speed up the induction process and to enhance the accuracy of the induced tree, while reducing the number of training instances used. In this paper, a Windowing-based strategy that exploits an optimized, GPU-based search for counterexamples is introduced to cope with distributed data mining (DDM) scenarios. The strategy is defined and implemented in JaCa-DDM, a novel system founded on the Agents & Artifacts paradigm. Our approach is well suited for DDM problems generating large amounts of training instances. Some experiments in diverse domains compare our strategy with the traditional centralized approach, including an exploratory case study on pixel-based segmentation for the detection of precancerous cervical lesions on colposcopic images.
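The Windowing loop described above can be sketched as follows. This is a sequential toy version, not JaCa-DDM and not GPU-accelerated; to stay self-contained it uses a one-feature threshold rule (`fit_stump`) as a stand-in for a decision tree learner, and it stops on "no counterexamples left" or a round limit rather than on an accuracy-gain test. All names and parameters are assumptions for illustration.

```python
import random


def fit_stump(rows):
    """Stand-in learner: best single-threshold rule on one numeric feature."""
    best = None
    labels = sorted({label for _, label in rows})
    for threshold in sorted({x for x, _ in rows}):
        for low in labels:
            for high in labels:
                errors = sum(
                    (low if x <= threshold else high) != y for x, y in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: low if x <= threshold else high


def windowing(rows, fit, window_size, max_rounds=10, seed=0):
    """Grow a training window with counterexamples until none remain."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    window, rest = shuffled[:window_size], shuffled[window_size:]
    model = fit(window)
    for _ in range(max_rounds):
        # Counterexamples: instances outside the window the model gets wrong.
        wrong = [(x, y) for x, y in rest if model(x) != y]
        if not wrong:
            break
        rest = [(x, y) for x, y in rest if model(x) == y]
        window = window + wrong
        model = fit(window)
    return model, window
```

On data that a single threshold separates, the loop usually ends with a window much smaller than the full training set, which is the reduction in training instances the abstract mentions.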
ISBN: 9783319231266; 9783319231259 (print)
This article describes an approach to building a data mining cloud service based on the actor model. It describes how an algorithm decomposed into functional blocks is mapped onto a set of actors, as well as the architecture and implementation of a cloud service that runs data mining algorithms on actors. As an example, it describes the implementation of, and experiments with, a neural network learning algorithm on a cluster of actors.
ISBN: 9781467377232 (print)
Frequent itemset mining (FIM) is a very effective method for knowledge acquisition from data, but with the advent of the era of big data, traditional memory-based algorithms face severe challenges in computation speed and storage capacity. Fortunately, the MapReduce model provides an efficient framework for distributed programming and operation. This paper proposes a novel MapReduce-based H-mine algorithm (MRH-mine), a version of the H-mine algorithm for the distributed operation environment. Experimental results show that the MRH-mine algorithm has better performance and scalability than traditional H-mine when facing massive data growth.
ISBN: 9781479987924 (print)
Distributed data mining techniques are widely used in many applications, such as marketing, decision making, and statistical analysis. In a distributed data environment, each participating site holds local information, which is combined to extract the global mining result. However, these techniques raise privacy and security concerns about each individual site's information. To solve this problem, many cryptographic techniques have been investigated, but there is still room for improvement. In this paper, we propose an efficient approach for privacy-preserving distributed association rule mining. We use an onion routing protocol to exchange information among the participating sites, and elliptic curve (EC) based cryptography to achieve security and privacy of each site's information in an unsecured distributed environment. Finally, we analyze the proposed solution in terms of security, privacy, computational cost, and communication cost.
ISBN: 9781479984909 (print)
Consolidation of virtual machines (VMs) is one of the key strategies used to reduce the power consumption of Cloud servers, and for this reason it is extensively studied. Nevertheless, the effectiveness of a consolidation strategy strongly depends on the forecast of the VM resource needs. This paper describes the design and development of a system for energy-aware allocation of virtual machines, driven by predictive data mining models. In particular, migrations are driven by the forecast of the future computational needs (CPU, RAM) of each virtual machine, in order to allocate them efficiently to the available servers. Experimental results, obtained on data from a real Cloud data center, show encouraging benefits in terms of energy saving.
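A toy version of forecast-driven consolidation might look like the sketch below. It is not the system described in the paper: a moving average stands in for the predictive data mining models, first-fit decreasing stands in for the allocation policy, servers are assumed identical, and every name and parameter is hypothetical.

```python
def forecast(history, window=3):
    """Stand-in predictor: mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def consolidate(vm_histories, server_cpu, server_ram):
    """Pack VMs onto identical servers with a first-fit decreasing heuristic.

    vm_histories: {vm_name: {"cpu": [...], "ram": [...]}} of past usage.
    Returns a list of servers, each a list of VM names.
    """
    demands = {
        name: (forecast(h["cpu"]), forecast(h["ram"]))
        for name, h in vm_histories.items()
    }
    servers = []  # each entry: [free_cpu, free_ram, [vm names]]
    # Place the VMs with the highest forecast CPU demand first.
    for name, (cpu, ram) in sorted(demands.items(), key=lambda kv: -kv[1][0]):
        if cpu > server_cpu or ram > server_ram:
            raise ValueError(f"{name} does not fit on any server")
        for server in servers:
            if server[0] >= cpu and server[1] >= ram:
                server[0] -= cpu
                server[1] -= ram
                server[2].append(name)
                break
        else:
            # No powered-on server has room, so one more is switched on.
            servers.append([server_cpu - cpu, server_ram - ram, [name]])
    return [names for _, _, names in servers]
```

The number of returned servers is the quantity a consolidation strategy tries to keep low, since fewer active servers means lower power consumption.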
ISBN: 9783319231266; 9783319231259 (print)
This article describes an approach to parallelizing data mining algorithms implemented in a functional programming language for distributed data processing on a cluster. It states the requirements that the functions forming these algorithms must meet in order to be converted into parallel form. As an example, we describe an implementation of the Naive Bayes algorithm in Common Lisp, its conversion into parallel form, and its execution on a cluster with MPI.