Search results - Inner Mongolia University Library

Peer sampling gossip-based distributed clustering algorithm for unstructured P2P networks

NEURAL COMPUTING & APPLICATIONS, 2018, Vol. 29, No. 2, pp. 593-612

Authors: Azimi, Rasool; Sajedi, Hedieh. Affiliations: Islamic Azad Univ Qazvin Branch, Young Researchers & Elite Club, Qazvin, Iran; Univ Tehran, Coll Sci, Sch Math Stat & Comp Sci, Dept Comp Sci, Tehran, Iran

Clustering, an unsupervised learning method and an important data mining process, is one aspect of the analysis of large, distributed data. In many applications, such as peer-to-peer systems, huge volumes of data are distributed among multiple sources. Analyzing these data and identifying appropriate clusters is challenging because of transmission, processing and storage costs. In this paper, a gossip-based distributed clustering algorithm for P2P networks, called Efficient GBDC-P2P, is proposed. It builds on an improved gossip communication approach that combines peer sampling with the CYCLON protocol, together with the idea of partitioning-based data clustering. The algorithm is suited to data clustering in unstructured P2P networks and adapts to the dynamic conditions of these networks. In Efficient GBDC-P2P, peers perform the clustering in a distributed way, communicating only locally with their neighbors. The approach neither relies on a central server to carry out the clustering task nor requires synchronized operations. Evaluation results verify the efficiency of the proposed algorithm for data clustering in unstructured P2P networks. Furthermore, comparative analyses with other well-established distributed clustering approaches demonstrate the superior accuracy of the proposed method.

Keywords: distributed data mining; clustering; gossiping; overlay; peer-to-peer network
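
The CYCLON peer-sampling protocol named in the abstract keeps a small partial view of neighbours at every peer and periodically shuffles part of it with the oldest neighbour. The Python sketch below shows one simplified CYCLON-style shuffle, assuming the textbook description of CYCLON (age every entry, contact the oldest neighbour, swap small random subsets, fill empty slots before overwriting entries that were sent away). `Peer`, `shuffle`, `VIEW_SIZE` and `SHUFFLE_LEN` are hypothetical names; this is not the Efficient GBDC-P2P implementation.

```python
import random
from dataclasses import dataclass, field

VIEW_SIZE = 4    # hypothetical cache size (max neighbour entries per peer)
SHUFFLE_LEN = 3  # hypothetical shuffle length (entries exchanged per shuffle)


@dataclass
class Peer:
    pid: int
    view: dict = field(default_factory=dict)  # neighbour id -> entry age

    def merge(self, received, sent):
        """Add received entries: fill free slots first, then overwrite sent ones."""
        replaceable = [pid for pid in sent if pid in self.view]
        for pid, age in received.items():
            if pid == self.pid or pid in self.view:
                continue  # never point to itself, never keep duplicates
            if len(self.view) < VIEW_SIZE:
                self.view[pid] = age
            elif replaceable:
                del self.view[replaceable.pop()]
                self.view[pid] = age


def shuffle(p, peers, rng):
    """One simplified CYCLON-style shuffle initiated by peer p."""
    if not p.view:
        return
    for pid in p.view:
        p.view[pid] += 1                        # age all entries by one
    q = peers[max(p.view, key=p.view.get)]      # contact the oldest neighbour
    del p.view[q.pid]                           # its slot is freed for the reply
    others = rng.sample(list(p.view), min(SHUFFLE_LEN - 1, len(p.view)))
    sent_by_p = {pid: p.view[pid] for pid in others}
    sent_by_p[p.pid] = 0                        # fresh descriptor of p itself
    reply = rng.sample(list(q.view), min(SHUFFLE_LEN, len(q.view)))
    sent_by_q = {pid: q.view[pid] for pid in reply}
    q.merge(sent_by_p, reply)
    p.merge(sent_by_q, others)


# Usage: 10 peers, each initially knowing the next three peers on a ring.
rng = random.Random(1)
peers = {i: Peer(i, {(i + k) % 10: 0 for k in (1, 2, 3)}) for i in range(10)}
for _ in range(50):
    for peer in peers.values():
        shuffle(peer, peers, rng)
```

Contacting the oldest entry bounds how long a descriptor can survive in a view, which is one reason protocols of this kind cope with the churn of unstructured P2P networks.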

Distributed classification for image spam detection

MULTIMEDIA TOOLS AND APPLICATIONS, 2018, Vol. 77, No. 11, pp. 13249-13278

Authors: Amir, Amiza; Srinivasan, Bala; Khan, Asad I. Affiliations: Univ Malaysia Perlis, Sch Comp & Commun Engn, Arau, Perlis, Malaysia; Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia

Spam appears in various forms, and the current trend in spamming is moving towards multimedia spam objects. Image spam is a new type of spam attack that attempts to bypass spam filters, which are mostly text-based. Spamming attacks users in many ways, and these attacks are usually countered by having a server filter out the spammers. This paper provides a fully distributed pattern recognition system within P2P networks that uses the distributed associative memory tree (DASMET) algorithm to detect spam; unlike server-based systems, it is cost-efficient and not prone to a single point of failure. The algorithm is scalable for large and frequently updated data sets, and is specifically designed for data sets that consist of similar occurring *** have evaluated our system against centralised state-of-the-art algorithms (NN, k-NN, naive Bayes, BPNN and RBFN) and distributed P2P-based algorithms (Ivote-DPV, ensemble k-NN, ensemble naive Bayes, and P2P-GN). The experimental results show that our method is highly accurate, with a 98 to 99% accuracy rate, and incurs a small number of messages: in the best case, it requires only two messages per recall test. In summary, our experimental results show that DASMET performs best, using a relatively small amount of resources for spam detection compared with other distributed methods.

Keywords: P2P classification; distributed pattern recognition; spam detection; image spam; distributed classification; distributed data mining; P2P data mining

A gossip based information fusion protocol for distributed frequent itemset mining

ENTERPRISE INFORMATION SYSTEMS, 2018, Vol. 12, No. 6, pp. 674-694

Author: Sohrabi, Mohammad Karim. Affiliation: Islamic Azad Univ, Semnan Branch, Dept Comp Engn, Semnan, Iran

The computational complexity, large memory requirements and time-consuming nature of frequent pattern mining are the main motivations for distributing and parallelizing this mining process. In addition, the emergence of distributed computational and operational environments, in which data are produced and maintained on different distributed data sources, makes the parallelization and distribution of the knowledge discovery process inevitable. In this paper, a gossip-based distributed itemset mining (GDIM) algorithm is proposed to extract frequent itemsets, a special type of frequent pattern, in a wireless sensor network environment. In this algorithm, the nodes are clustered using a LEACH-based protocol, and the local frequent itemsets of each sensor are extracted using a bit-wise horizontal approach (LHPM). Cluster heads use a gossip-based protocol to communicate with each other and find the patterns whose global support is equal to or greater than the specified support threshold. Experimental results show that the proposed algorithm outperforms the best existing gossip-based algorithm in terms of execution time.

Keywords: distributed data mining; frequent itemset mining; gossip-based protocol; bit-wise approach; wireless sensor network
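
The abstract does not detail LHPM, but the two ingredients it names can be illustrated separately. In the sketch below (an assumption, not the GDIM code), each transaction is stored horizontally as an integer bitmask over the items, so a transaction supports an itemset exactly when `t & mask == mask`; the cluster heads then estimate the global support ratio by pairwise gossip averaging of their local counts and database sizes. `ITEMS`, `encode`, `local_count` and `gossip_average` are hypothetical names, and the toy databases are made up for illustration.

```python
import random

ITEMS = ["a", "b", "c", "d"]  # hypothetical item universe; item i -> bit i


def encode(transaction):
    """Encode a transaction (set of items) as a horizontal bitmask."""
    mask = 0
    for item in transaction:
        mask |= 1 << ITEMS.index(item)
    return mask


def local_count(db, itemset):
    """Count local transactions that contain every item of `itemset`."""
    target = encode(itemset)
    return sum(1 for t in db if t & target == target)


def gossip_average(values, rounds, rng):
    """Pairwise gossip: two random nodes replace their values by their mean.

    The sum of all values is preserved, so every node converges to the mean.
    """
    values = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(values)), 2)
        values[i] = values[j] = (values[i] + values[j]) / 2
    return values


# Hypothetical local databases held by three cluster heads.
heads = [
    [encode(t) for t in ({"a", "b"}, {"a", "c"}, {"a", "b", "d"})],
    [encode(t) for t in ({"b", "c"}, {"a", "b"})],
    [encode(t) for t in ({"a", "b", "c"}, {"d"}, {"a", "b"}, {"c"})],
]
itemset = {"a", "b"}
rng = random.Random(0)
counts = gossip_average([local_count(db, itemset) for db in heads], 200, rng)
sizes = gossip_average([len(db) for db in heads], 200, rng)
# Each head estimates the global support ratio from its own averaged values.
estimates = [c / s for c, s in zip(counts, sizes)]
min_support = 0.5
frequent_at_head_0 = estimates[0] >= min_support
```

Because pairwise averaging preserves the sum of the values, the ratio of the averaged counts to the averaged sizes equals the ratio of the global sums, so each head can test the support threshold locally once the values have converged.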

FSCOAL: Parallel simultaneous fuzzy co-clustering and learning

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2018, Vol. 33, No. 7, pp. 1364-1380

Authors: Biton, David; Kalech, Meir; Rokach, Lior. Affiliation: Ben Gurion Univ Negev, Software & Informat Syst Engn Dept, Beer Sheva, Israel

Model-based co-clustering divides the data along two main axes and simultaneously trains a supervised model for each co-cluster using all other input features. For example, in the rating prediction task of a recommender system, the two main axes are items and users. In each co-cluster, we train a regression model that predicts the rating from other features, such as user characteristics (e.g., gender), item characteristics (e.g., genre), contextual features (e.g., location), and so on. In reality, users and items do not necessarily belong to a single co-cluster but can be associated with several co-clusters. We extend model-based co-clustering to support fuzzy co-clustering. In this setting, each item-user pair is associated with every co-cluster with some membership grade, which indicates the relevance of the item-user pair to that co-cluster. Furthermore, we propose a distributed algorithm, based on a map-reduce approach, to handle big datasets. Evaluating the fuzzy co-clustering algorithm on three datasets shows a significant improvement compared with a regular co-clustering algorithm. In addition, a map-reduce version of the fuzzy co-clustering algorithm significantly reduces the runtime.

Keywords: distributed data mining; fuzzy co-clustering; predictive modeling
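
The abstract does not give FSCOAL's update rules. As a hedged illustration of what a membership grade can look like, the sketch below borrows the standard fuzzy c-means membership formula, applied to each co-cluster model's squared prediction error on one item-user pair, and then predicts the rating as the membership-weighted combination of the co-cluster models. The fuzzifier `m = 2`, the linear models and all names are assumptions for illustration.

```python
def memberships(errors, m=2.0):
    """Fuzzy c-means style membership grades from per-co-cluster squared errors."""
    zero = [k for k, e in enumerate(errors) if e == 0]
    if zero:  # a perfect fit takes the whole membership mass
        return [1 / len(zero) if k in zero else 0.0 for k in range(len(errors))]
    p = 1 / (m - 1)
    return [1 / sum((e_k / e_j) ** p for e_j in errors) for e_k in errors]


def predict(features, models, grades):
    """Membership-weighted rating prediction from per-co-cluster linear models."""
    return sum(
        g * (bias + sum(w * x for w, x in zip(weights, features)))
        for g, (bias, weights) in zip(grades, models)
    )


# Hypothetical co-cluster regression models: (bias, weights) per co-cluster.
models = [(3.0, [0.5, 0.0]), (2.0, [0.0, 1.0])]
features = [1.0, 2.0]  # e.g. encoded user and item characteristics
observed_rating = 3.6
errors = [(observed_rating - predict(features, [mdl], [1.0])) ** 2 for mdl in models]
grades = memberships(errors)
rating = predict(features, models, grades)
```

With `m = 2`, each grade is proportional to the inverse squared error, so the model that fits the pair best dominates without the other co-clusters being switched off entirely.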

A distributed data clustering algorithm in P2P networks

APPLIED SOFT COMPUTING, 2017, Vol. 51, pp. 147-167

Authors: Azimi, Rasool; Sajedi, Hedieh; Ghayekhloo, Mohadeseh. Affiliations: Islamic Azad Univ Qazvin Branch, Young Researchers & Elite Club, Qazvin, Iran; Univ Tehran, Coll Sci, Sch Math Stat & Comp, Dept Comp Sci, Tehran, Iran

Clustering is one of the important data mining issues, especially for large and distributed data analysis. Distributed computing environments such as Peer-to-Peer (P2P) networks involve separate, scattered data sources distributed among the peers. Because of the unpredictable growth and dynamic nature of P2P networks, the peers' data are constantly changing. Owing to the high volume of computation and communication, and to privacy concerns, such data should be processed in a distributed way, without central management. Today, most applications of P2P systems focus on unstructured P2P systems. In unstructured P2P networks, gossip spreading is a simple and efficient communication method that can adapt to the dynamic conditions of these networks. Recently, several algorithms with different pros and cons have been proposed for data clustering in P2P networks. In this paper, by combining a novel method for extracting representative data, a gossip-based protocol and a new centralized clustering method, a Gossip Based Distributed Clustering algorithm for P2P networks, called GBDC-P2P, is proposed. The GBDC-P2P algorithm is suitable for data clustering in unstructured P2P networks and adapts to the dynamic conditions of these networks. In GBDC-P2P, peers perform data clustering in a distributed manner, communicating only with their neighbours. GBDC-P2P does not need to rely on a central server, and it operates asynchronously. Evaluation results demonstrate the superior performance of the GBDC-P2P algorithm. A comparative analysis with other well-established methods also illustrates the efficiency of the proposed method. (C) 2016 Elsevier B.V. All rights reserved.

Keywords: distributed data mining; data clustering; gossiping; overlay; peer-to-peer network
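
The abstract combines representative-data extraction, gossip and a clustering step without describing how they fit together. One plausible arrangement, sketched below purely as an assumption, lets every peer summarise its data as `k` weighted representatives (centroid, point count); whenever two neighbours gossip, they pool their representatives, halve the weights and reduce the pool back to `k` representatives with weighted k-means. One-dimensional data, the ring topology and all function names are hypothetical simplifications, not the GBDC-P2P method.

```python
def weighted_kmeans(points, k, iters=20):
    """Reduce weighted 1-D points [(value, weight), ...] to at most k centroids."""
    pts = sorted(points)
    # Deterministic initialisation: spread the k seeds over the sorted values.
    centroids = [pts[i * (len(pts) - 1) // max(k - 1, 1)][0] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v, w in pts:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append((v, w))
        centroids = [
            sum(v * w for v, w in g) / sum(w for _, w in g) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    # Each output pair is (weighted mean, total weight) of one non-empty group.
    return [(centroids[c], sum(w for _, w in g)) for c, g in enumerate(groups) if g]


def local_representatives(data, k):
    """Summarise a peer's raw data as k weighted representatives."""
    return weighted_kmeans([(x, 1.0) for x in data], k)


def gossip_exchange(reps, i, j, k):
    """Peers i and j pool their representatives, halve the weights, re-cluster.

    Halving keeps the network-wide total weight constant, so repeated
    exchanges average the peers' summaries (a push-pull averaging step).
    """
    pooled = [(v, w / 2) for v, w in reps[i] + reps[j]]
    merged = weighted_kmeans(pooled, k)
    reps[i], reps[j] = merged, list(merged)


# Usage: four peers on a ring whose local data come from two groups (near 0 and near 10).
local_data = [[0.0, 1.0, 10.0], [0.5, 9.0, 11.0], [1.5, 10.5], [-0.5, 9.5, 10.0, 0.0]]
reps = [local_representatives(d, k=2) for d in local_data]
for _ in range(40):
    for i in range(len(reps)):
        gossip_exchange(reps, i, (i + 1) % len(reps), k=2)
```

Halving the weights turns each exchange into an averaging step, so every peer's summary converges toward the network-wide cluster centres (each holding an average share of the total weight), while no peer ever sees another peer's raw data.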

Data Mining Technique for Reduction of Association Rules in Distributed System

1st International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT)

Authors: Waghamare, Bhagyashri; Bodhe, Yogesh. Affiliations: Shree Ramchandra Coll Engn, Dept Comp Engn, Pune, Maharashtra, India; Dr DY Patil Sch Engn, Dept Comp Engn, Pune, Maharashtra, India

ISBN: (print) 9781509020805

In today's world, a large number of transactions are performed on social media. In such a distributed environment, where timely access to data is important, it becomes difficult to generate strong association rules. It is therefore necessary to reduce these rules, that is, to increase the rule reduction rate. This paper uses the w-Tabular algorithm, which combines a weight assignment method with the Quine-McCluskey method and which increases data processing time in the distributed system.

Keywords: association rule mining; data mining; distributed data mining; frequent item set mining; reduction framework
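
The abstract names the Quine-McCluskey method without detail. Its core tabulation step, repeatedly merging implicants that differ in exactly one fixed bit until no further merge is possible, can be sketched as below. How w-Tabular assigns and uses weights is not described, so the weighting is omitted; implicants are written as strings over `0`, `1` and `-`, and `combine` and `prime_implicants` are hypothetical names.

```python
from itertools import combinations


def combine(a, b):
    """Merge two implicants that differ in exactly one fixed bit, else None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms, n_bits):
    """Quine-McCluskey tabulation: return the prime implicants of the minterms."""
    current = {format(m, f"0{n_bits}b") for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= current - used  # terms that could not be merged are prime
        current = merged
    return primes


# Usage: a small three-variable function given by its minterms 0, 1, 2, 5, 6, 7.
primes = prime_implicants({0, 1, 2, 5, 6, 7}, 3)
```

The full method then selects a minimal cover of the minterms from these prime implicants; that covering step is not shown here.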

Distributed Execution Environment for Data Mining as Service

IEEE North-West-Russia-Section Young Researchers in Electrical and Electronic Engineering Conference (ElConRusNW)

Authors: Kholod, Ivan; Borisenko, Konstantin. Affiliation: St Petersburg Electrotech Univ LETI, Fac Comp Sci & Technol, St Petersburg, Russia

ISBN: (print) 9781509004454

The article describes how an algorithm decomposed into functional blocks is mapped onto a distributed execution environment. It also describes the architecture and implementation of a service that executes data mining algorithms in that environment. As an example, it describes the implementation of, and experiments with, the 1R classification algorithm.

Keywords: distributed data mining; cloud computing; data mining cloud
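
For reference, 1R (OneR) builds one rule per attribute that maps each attribute value to the majority class among the training rows having that value, and keeps the attribute whose rule makes the fewest training errors. The sketch below is a single-process Python version with hypothetical names and toy data; the article's decomposition into functional blocks is not reproduced, but the counting is split into a per-partition step and a merge step, the kind of split a distributed execution environment could exploit.

```python
from collections import Counter, defaultdict


def count_partition(rows, labels):
    """Per-partition statistics: (attribute, value) -> Counter of classes."""
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for attr, value in enumerate(row):
            counts[(attr, value)][label] += 1
    return counts


def merge_counts(partials):
    """Merge statistics computed independently on separate partitions."""
    total = defaultdict(Counter)
    for part in partials:
        for key, counter in part.items():
            total[key].update(counter)
    return total


def one_r(counts, n_attrs):
    """Pick the attribute whose value -> majority-class rule has fewest errors."""
    best = None
    for attr in range(n_attrs):
        rule, errors = {}, 0
        for (a, value), counter in counts.items():
            if a != attr:
                continue
            label, hits = counter.most_common(1)[0]
            rule[value] = label
            errors += sum(counter.values()) - hits
        if best is None or errors < best[1]:
            best = (attr, errors, rule)
    return best


# Usage: two partitions of a toy (outlook, windy) -> play dataset.
part1 = count_partition(
    [("sunny", "no"), ("sunny", "yes"), ("overcast", "no")], ["no", "no", "yes"]
)
part2 = count_partition(
    [("overcast", "yes"), ("rain", "no"), ("rain", "yes")], ["yes", "yes", "no"]
)
attr, errors, rule = one_r(merge_counts([part1, part2]), n_attrs=2)
```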

Distributed data mining for e-business

INFORMATION TECHNOLOGY & MANAGEMENT, 2011, Vol. 12, No. 2, pp. 67-79

Authors: Liu, Bin; Cao, Shu Gui; He, Wu. Affiliations: Hebei Univ Sci & Technol, Coll Econ & Management, Shijiazhuang 050018, Peoples R China; Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Dept Comp Sci & Technol, Beijing 100084, Peoples R China; Old Dominion Univ, Ctr Learning Technol, Norfolk, VA 23529, USA

In the Internet-based e-business environment, most business data are distributed, heterogeneous and private. To achieve true business intelligence, it is necessary to mine large amounts of distributed data. Through a thorough literature review, this paper identifies four main issues in distributed data mining (DDM) systems for e-business and classifies modern DDM systems into three classes, with representative examples. To address the identified issues, this paper proposes a novel DDM model named DRHPDM (Data source Relevance-based Hierarchical Parallel Distributed data Mining Model). In addition, to improve the quality of the final result, the data sources are divided into a centralized mining layer and a distributed mining layer according to their relevance. To improve the openness, cross-platform ability and intelligence of the DDM system, web service and multi-agent technologies are adopted. The feasibility of DRHPDM was verified by building a prototype system and applying it to a web usage mining scenario.

Keywords: distributed data mining; e-business; web service; multi-agent; knowledge integration

Performance Evaluation of a Distributed Clustering Approach for Spatial Datasets

15th Australasian Data Mining Conference (AusDM)

Authors: Bendechache, Malika; Le-Khac, Nhien-An; Kechadi, M-Tahar. Affiliations: Univ Coll Dublin, Insight Ctr Data Analyt, Obrien Bldg Ctr East, Dublin 04, Ireland; Univ Coll Dublin, Dublin 04, Ireland

ISBN: (print) 9789811302923; 9789811302916

The analysis of big data requires powerful, scalable and accurate data analytics techniques, which traditional data mining and machine learning, taken as a whole, do not provide. New data analytics frameworks are therefore needed to deal with big data challenges such as the volume, velocity, veracity and variety of the data. Distributed data mining is a promising approach for big data sets, as they are usually produced at distributed locations, and processing them at their local sites significantly reduces response times, communication, etc. In this paper, we study the performance of a distributed clustering approach called Dynamic Distributed Clustering (DDC). DDC can generate clusters remotely and then aggregate them using an efficient aggregation algorithm. The technique is developed for spatial datasets. We evaluated DDC using two types of communication (synchronous and asynchronous) and tested it under various load distributions. The experimental results show that the approach has superlinear speed-up, scales up very well, and can take advantage of recent programming models, such as the MapReduce model, as its results are not affected by the type of communication.

Keywords: distributed data mining; distributed computing; synchronous communication; asynchronous communication; spatial data mining; super-speedup

Conditions for Parallel Execution of Functions in Data Mining Algorithm

IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus)

Author: Kholod, Ivan I. Affiliation: St Petersburg Electrotech Univ LETI, Fac Comp Sci & Technol, St Petersburg, Russia

ISBN: (print) 9781538643402

The paper describes necessary and sufficient conditions for the parallel execution of functions in data mining algorithms. These conditions take into account the data dependencies between functions, based on the mining model elements that each function uses and modifies. We determine the conditions for parallel execution in computing environments with distributed and shared memory. As an example, we describe how the conditions for parallel execution of Naive Bayes classifier functions are determined.

Keywords: parallel data mining; distributed data mining; data dependency
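
The abstract does not state the conditions themselves. As a point of reference, the classic Bernstein conditions capture the same idea in terms of the elements each function uses (reads) and modifies (writes): two functions can run in parallel when neither modifies an element the other uses and they do not modify the same element. The Naive Bayes decomposition and element names below are hypothetical, chosen only to illustrate the check; they are not the paper's conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    uses: frozenset      # mining model elements the function reads
    modifies: frozenset  # mining model elements the function writes


def can_run_in_parallel(f, g):
    """Bernstein's conditions: no write/read, read/write or write/write overlap."""
    return (
        not (f.modifies & g.uses)
        and not (f.uses & g.modifies)
        and not (f.modifies & g.modifies)
    )


# Hypothetical decomposition of Naive Bayes training into functions.
count_classes = Function("count_classes", frozenset({"data"}), frozenset({"class_counts"}))
count_values = Function("count_values", frozenset({"data"}), frozenset({"value_counts"}))
estimate_probs = Function(
    "estimate_probs",
    frozenset({"class_counts", "value_counts"}),
    frozenset({"probabilities"}),
)
```

Here the two counting passes write disjoint elements and can run concurrently, while the probability estimate reads both counts and must wait for them.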
