Computing-intensive data mining for inherently Internet-wide distributed data, referred to as distributed data mining (DDM), calls for the support of a powerful Grid with an effective scheduling framework. DDM often shares the computing paradigm of local processing and global synthesizing. It involves every phase of the data mining (DM) process, which makes the DDM workflow very complex, so that it can be modelled only by a Directed Acyclic Graph (DAG) with multiple data entries. Motivated by the need for a practical solution to the Grid scheduling problem for the DDM workflow, this paper proposes a novel two-phase scheduling framework, comprising External Scheduling and Internal Scheduling, on a two-level Grid architecture (InterGrid, IntraGrid). Currently a DM IntraGrid, named DMGCE (Data Mining Grid Computing Environment), has been developed with a dynamic scheduling framework for competitive DAGs in a heterogeneous computing environment. This system is implemented in an established Multi-Agent System (MAS) environment, in which the reuse of existing DM algorithms is achieved by encapsulating them into agents. Practical classification problems from oil well logging analysis are used to measure the system performance. The detailed experiment procedure and result analysis are also discussed in this paper. (C) 2006 Elsevier B.V. All rights reserved.
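The idea of a DAG workflow with multiple data entries, scheduled on heterogeneous resources, can be illustrated with a minimal sketch. This is not the DMGCE scheduler described in the abstract: it is a generic greedy list scheduler, and the task names, work costs, and worker speeds are hypothetical values chosen for illustration. Data transfer time is ignored.

```python
from collections import deque


def topological_order(deps):
    """Return tasks so that every task follows its prerequisites.

    deps maps each task to the list of tasks it depends on. Tasks with an
    empty list are the (possibly multiple) data entry nodes of the DAG.
    """
    indegree = {task: len(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order


def greedy_schedule(deps, cost, speeds):
    """Assign each task to the worker that would finish it earliest.

    cost[task] is in abstract work units, speeds[worker] in units per second.
    This greedy rule is an assumption of the sketch, not the paper's method.
    """
    worker_free = {worker: 0.0 for worker in speeds}
    finish = {}
    placement = {}
    for task in topological_order(deps):
        # A task may start only after all of its prerequisites have finished.
        earliest_start = max((finish[p] for p in deps[task]), default=0.0)
        best = min(
            speeds,
            key=lambda w: max(worker_free[w], earliest_start) + cost[task] / speeds[w],
        )
        start = max(worker_free[best], earliest_start)
        finish[task] = start + cost[task] / speeds[best]
        worker_free[best] = finish[task]
        placement[task] = best
    return placement, finish


# Hypothetical workflow: two local mining tasks feed one global synthesis step.
workflow = {
    "mine_site_a": [],
    "mine_site_b": [],
    "synthesize": ["mine_site_a", "mine_site_b"],
}
work = {"mine_site_a": 4.0, "mine_site_b": 2.0, "synthesize": 1.0}
workers = {"fast": 2.0, "slow": 1.0}
placement, finish = greedy_schedule(workflow, work, workers)
```

The two entry tasks have no prerequisites, so they can be placed on different workers and run concurrently, while the synthesis step waits for both. This mirrors the local processing and global synthesizing paradigm the abstract mentions.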
Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high-performance applications and solving complex problems, offering effective support to the implementation and use of data mining and knowledge discovery systems. The Web Services Resource Framework has become the standard for the implementation of grid services and applications, and it can be exploited for developing high-level services for distributed data mining applications. This paper describes how distributed data mining patterns, such as collective learning, ensemble learning, and meta-learning models, can be implemented as Web Services Resource Framework mining services by exploiting the grid infrastructure. The goal of this work was to design a distributed architectural model that can be exploited for different distributed mining patterns deployed as grid services for the analysis of dispersed data sources. To validate this approach, we also presented the implementation of two clustering algorithms on the developed architecture. In particular, distributed k-means and distributed expectation maximization were used as pilot examples to show the suitability of the implemented service-oriented framework. An extensive evaluation of its performance was provided. Copyright (c) 2011 John Wiley & Sons, Ltd.
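As a rough illustration of how k-means can be split across sites, the sketch below uses the standard decomposition of Lloyd's algorithm into per-site sums and counts that a coordinator merges. It is not the WSRF service implementation the abstract refers to. The function names, the fixed iteration count, and the sample data are assumptions made for this example.

```python
def local_statistics(points, centroids):
    """Run on each site: per-cluster coordinate sums and counts for local data."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in points:
        # Assign the point to the nearest centroid (squared Euclidean distance).
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def merge_statistics(per_site, centroids):
    """Run on the coordinator: combine site statistics into new centroids."""
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in per_site)
        if total == 0:
            new_centroids.append(list(old))  # keep an empty cluster where it was
            continue
        new_centroids.append(
            [sum(sums[k][d] for sums, _ in per_site) / total for d in range(len(old))]
        )
    return new_centroids


def distributed_kmeans(sites, centroids, iterations=10):
    """Alternate local statistics and global merging for a fixed number of rounds."""
    for _ in range(iterations):
        per_site = [local_statistics(points, centroids) for points in sites]
        centroids = merge_statistics(per_site, centroids)
    return centroids


# Hypothetical data held at two sites; only sums and counts leave each site.
site_a = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
site_b = [(2.0, 0.0), (2.0, 2.0), (10.0, 12.0)]
centers = distributed_kmeans([site_a, site_b], [[0.0, 0.0], [10.0, 10.0]])
```

Because each site sends only sums and counts per cluster, the raw records never have to be moved to a central node. In a real deployment the list comprehension in `distributed_kmeans` would be replaced by remote service calls.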
The continuous increase of data volumes available from many sources raises new challenges for their effective understanding. Knowledge discovery in large data repositories involves processes and activities that are computationally intensive, collaborative, and distributed in nature. The Grid is a profitable infrastructure that can be effectively exploited for handling distributed data mining and knowledge discovery. To achieve this goal, advanced software tools and services are needed to support the development of KDD applications. The Knowledge Grid is a high-level framework providing Grid-based knowledge discovery tools and services. Such services allow users to create and manage complex knowledge discovery applications that integrate data sources and data mining tools provided as distributed services on the Grid. All of these services are currently being re-designed and re-implemented as WSRF-compliant Grid Services. This paper highlights design aspects and implementation choices involved in such a process. (C) 2006 Elsevier B.V. All rights reserved.
Multi-agent systems (MAS) offer an architecture for distributed problem solving. Distributed data mining (DDM) algorithms focus on one class of such distributed problem solving tasks: analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms in the context of multi-agent systems. It discusses broadly the connection between DDM and MAS. It provides a high-level survey of DDM, then focuses on distributed clustering algorithms and some potential applications in multi-agent-based problem solving scenarios. It reviews algorithms for distributed clustering, including privacy-preserving ones. It describes challenges for clustering in sensor-network environments, potential shortcomings of the current algorithms, and future work accordingly. It also discusses confidentiality (privacy preservation) and presents a new algorithm for privacy-preserving density-based clustering. (c) 2005 Elsevier Ltd. All rights reserved.
In many industrial, scientific and commercial applications, it is often necessary to analyze large data sets, maintained over geographically distributed sites, by using the computational power of distributed and parallel systems. The grid can play a significant role in providing an effective computational support for knowledge discovery applications. We describe a software architecture for geographically distributed high-performance knowledge discovery applications called KNOWLEDGE GRID, which is designed on top of computational grid mechanisms, provided by grid environments such as Globus. The KNOWLEDGE GRID uses the basic grid services such as communication, authentication, information, and resource management to build more specific parallel and distributed knowledge discovery tools and services. The paper discusses how the KNOWLEDGE GRID can be used to implement distributed data mining services. (C) 2002 Elsevier Science B.V. All rights reserved.
Distributed data mining is expected to discover previously unknown, implicit and valuable information from massive data sets inherently distributed over a network. In recent years several approaches to distributed data mining have been developed, but only a few of them make use of intelligent agents. This paper gives the rationale for applying multi-agent technology in distributed data mining and presents a distributed data mining system based on multi-agent technology that deals with heterogeneity in such environments. Building on the advantages of both the CS model and the agent-based model, the system is able to address the specific concerns of increasing scalability and enhancing performance.
ISBN (print): 9783319635644; 9783319635637
In the field of wireless network optimization, as networks grow in size and structural complexity, traditional processing methods cannot effectively identify the causes of network faults in the face of increasing network data. In this paper, we propose a root-cause-analysis method based on distributed data mining (DRCA). First, we put forward an improved decision tree, in which the best split feature is selected according to the feature's purity gain, and then we convert the problem of root-cause analysis into building an improved decision tree and interpreting the tree model. To solve the memory and efficiency problems associated with large-scale data, we parallelize the algorithm and distribute the tasks to multiple computers. The experiments show that DRCA is an effective, efficient, and scalable method.
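The abstract does not define purity gain, so the sketch below rests on an assumed definition: the purity of a node is the share of its most common label, and the gain of a feature is the size-weighted purity of the child nodes minus the purity of the parent. The record fields and labels are hypothetical and serve only to show how a split feature could be chosen under that assumption.

```python
from collections import Counter


def purity(labels):
    """Share of the most common label; 1.0 means the node is pure."""
    if not labels:
        return 0.0
    return Counter(labels).most_common(1)[0][1] / len(labels)


def purity_gain(rows, labels, feature):
    """Weighted child purity minus parent purity for a categorical feature.

    This definition is an assumption made for illustration; the paper's
    exact formula is not given in the abstract.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    weighted = sum(len(g) / len(labels) * purity(g) for g in groups.values())
    return weighted - purity(labels)


def best_split_feature(rows, labels):
    """Pick the feature with the largest purity gain."""
    return max(rows[0], key=lambda f: purity_gain(rows, labels, f))


# Hypothetical network records: one row per cell, labelled with its state.
records = [
    {"interference": "high", "vendor": "x"},
    {"interference": "high", "vendor": "y"},
    {"interference": "low", "vendor": "x"},
    {"interference": "low", "vendor": "y"},
]
states = ["fault", "fault", "ok", "ok"]
chosen = best_split_feature(records, states)
```

In this toy data the feature `interference` separates faulty cells from healthy ones perfectly, so it is the one selected. Reading the chosen split features along a path to a fault leaf is one way to interpret a tree as a root-cause explanation.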
In the internet-based e-business environment, most business data are distributed, heterogeneous and private. To achieve true business intelligence, mining large amounts of distributed data is necessary. Through a thorough literature review, this paper identifies four main issues in distributed data mining (DDM) systems for e-business and classifies modern DDM systems into three classes with representative samples. To address these identified issues, this paper proposes a novel DDM model named DRHPDM (Data source Relevance-based Hierarchical Parallel Distributed Data Mining Model). In addition, to improve the quality of the final result, the data sources are divided into a centralized mining layer and a distributed mining layer, according to their relevance. To improve the openness, cross-platform ability, and intelligence of the DDM system, web service and multi-agent technologies are adopted. The feasibility of DRHPDM was verified by building a prototype system and applying it to a web usage mining scenario.
ISBN (print): 9788132212980; 9788132212997
Grid computing has emerged as an important new branch of distributed computing focused on large-scale resource sharing and high-performance orientation. In many applications, it is necessary to analyze very large data sets. The data are often large and geographically distributed, and their complexity is increasing. In these areas, grid technologies provide effective computational support for applications such as knowledge discovery. This paper is an introduction to grid infrastructure and its potential for machine learning tasks.
ISBN (print): 9781509022212
The paper discusses an approach for distributed execution of data mining algorithms based on the actor model and the concept of the Internet of Things. The suggested approach allows us to decompose data mining algorithms into actors and execute them in a distributed environment. It provides data analysis both in centralized systems (cloud computing) and in distributed systems (fog computing) for IoT.
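To give a feel for what decomposing a mining step into actors can look like, here is a minimal sketch using threads and mailboxes from the Python standard library. It is not the paper's framework. The `Actor` class, the frequency-counting task, and the split into counting actors and one aggregating actor are all assumptions made for illustration.

```python
import queue
import threading
from collections import Counter


class Actor:
    """Minimal actor: a private mailbox drained by a single thread."""

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Deliver a message asynchronously; the sender does not wait."""
        self._mailbox.put(message)

    def stop(self):
        """Process all queued messages, then shut the actor down."""
        self._mailbox.put(None)  # sentinel placed behind pending messages
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            self._handler(message)


# The aggregating actor owns the global state; no other thread touches it
# while the actor is running.
totals = Counter()
aggregator = Actor(lambda partial: totals.update(partial))


def count_partition(items):
    """Counting actor behaviour: summarize local items, forward the summary."""
    aggregator.send(Counter(items))


# Hypothetical partitions, e.g. readings held on two edge devices.
counters = [Actor(count_partition) for _ in range(2)]
counters[0].send(["a", "b", "a"])
counters[1].send(["b", "c"])

for actor in counters:
    actor.stop()   # every partition has been counted and forwarded
aggregator.stop()  # every partial count has been merged
```

Each actor handles one message at a time and shares nothing except messages, so the counting actors could run near the data (fog) and the aggregator centrally (cloud) without changing the logic.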