In this paper we present and evaluate Inhambu, a distributed object-oriented system that supports the execution of data mining applications on clusters of PCs and workstations. The system provides a resource management layer, built on top of Java/RMI, that supports the execution of the data mining tool Weka. We evaluate the performance of Inhambu through several experiments on homogeneous, heterogeneous, and non-dedicated clusters. The results are compared with those achieved by a similar system named Weka-parallel. Inhambu outperforms its counterpart for coarse-grained applications, particularly on heterogeneous and non-dedicated clusters. In addition, our system provides advantages such as application checkpointing, support for dynamically adding hosts to the cluster, automatic restarting of failed tasks, and more effective usage of the cluster. Inhambu is therefore a promising tool for efficiently executing real-world data mining applications. The software is available at the project's web site at http://***/projects/inhambu/. (c) 2006 Elsevier Inc. All rights reserved.
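The abstract names Java/RMI as the transport under Inhambu's resource management layer but gives no code, so the following is only a minimal sketch of how a task-dispatch layer over RMI typically looks. The `TaskWorker` interface, its method, and the registry binding name are illustrative assumptions, not Inhambu's actual API.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical remote interface: a worker that runs one mining task
// (e.g., a Weka cross-validation fold) and returns a serializable result.
interface TaskWorker extends Remote {
    String runTask(String taskSpec) throws RemoteException;
}

// Worker-side implementation; each cluster node would export one of these.
class TaskWorkerImpl extends UnicastRemoteObject implements TaskWorker {
    protected TaskWorkerImpl() throws RemoteException { super(); }

    @Override
    public String runTask(String taskSpec) throws RemoteException {
        // A real system would invoke Weka here; this sketch just echoes.
        return "result-of:" + taskSpec;
    }

    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("worker", new TaskWorkerImpl());
        System.out.println("Worker registered; waiting for tasks...");
    }
}
```

A master process would look up such workers through the registry, submit task specifications, and, on a `RemoteException`, reschedule the task on another host; catching and resubmitting in this way is the usual mechanism behind features like the automatic restarting of failed tasks that the abstract mentions.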
Interest in parallel and distributed data mining in grid environments has grown over the past decade. As an important branch of spatial data mining, spatial outlier mining can be used to find interesting and unexpected spatial patterns in many applications. In this paper, a new parallel and distributed spatial outlier mining algorithm (PD-SOM) is proposed to detect global and local outliers simultaneously in a grid environment. PD-SOM is a Delaunay triangulation (D-TIN) based approach, encapsulated and deployed on a distributed platform to provide a parallel and distributed spatial outlier mining service. A distributed system framework for PD-SOM is designed on top of a geographical knowledge service grid (GeoKSGrid) developed by our research group; a two-step strategy for spatial outlier detection is put forward to support the encapsulation and distributed deployment of the geographical knowledge service; and two key techniques of the service are discussed: parallel and distributed computation of the Delaunay triangulation and the implementation of the PD-SOM algorithm. Finally, the efficiency of the spatial outlier mining service is analyzed theoretically; its practicality is confirmed by a demonstrative application to abnormality analysis of soil geochemical survey samples from the eastern coastal zone of Fujian, China; and the effectiveness and superiority of PD-SOM in a balanced, scalable grid environment, owing to the large number of computing cores involved, are verified through comparison with the popular spatial outlier mining algorithm SLOM.
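The abstract does not include code, but the Delaunay-based detection step can be illustrated with a small sketch. Assuming the D-TIN has already been computed (by any triangulation library) and is given as an adjacency list, a common global-outlier criterion flags points whose mean incident-edge length is unusually long; the class name and the mean-plus-k-standard-deviations threshold below are illustrative assumptions, not the exact PD-SOM rule.

```java
import java.util.*;

// Hedged sketch: flags "global" spatial outliers from a precomputed
// neighbor graph (standing in for the Delaunay triangulation). A point
// is flagged when its mean edge length to its neighbors exceeds the
// global mean of that statistic by k standard deviations.
public class EdgeLengthOutliers {
    static double dist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static Set<Integer> flag(double[][] pts, List<List<Integer>> nbrs, double k) {
        int n = pts.length;
        double[] meanEdge = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0;
            for (int j : nbrs.get(i)) sum += dist(pts[i], pts[j]);
            meanEdge[i] = nbrs.get(i).isEmpty() ? 0 : sum / nbrs.get(i).size();
        }
        double mu = Arrays.stream(meanEdge).average().orElse(0);
        double var = Arrays.stream(meanEdge).map(x -> (x - mu) * (x - mu)).average().orElse(0);
        double cut = mu + k * Math.sqrt(var);
        Set<Integer> outliers = new HashSet<>();
        for (int i = 0; i < n; i++) if (meanEdge[i] > cut) outliers.add(i);
        return outliers;
    }
}
```

In a grid deployment such as the one the paper describes, the per-point edge statistics are independent, so the loop over points is the natural unit to partition across nodes.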
ISBN: (Print) 9783319076171; 9783319076164
Nowadays, society confronts a huge volume of information that has to be transformed into knowledge. One of the most relevant aspects of knowledge extraction is the detection of outliers, and numerous algorithms have been proposed for this purpose. However, not all of them are suitable for very large data sets. In this work, a new approach aimed at detecting outliers in very large data sets within a limited execution time is presented. The algorithm treats tuples as N-dimensional particles, each creating a potential well around itself. The potential created by all the particles is then used to discriminate outliers from objects belonging to clusters. Moreover, the capacity to be parallelized was a key point in the design of the algorithm. In this proof of concept, the algorithm is tested using sequential and parallel implementations. The results demonstrate that the algorithm can process large data sets in an affordable execution time, thereby overcoming the curse of dimensionality.
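Since the abstract describes the method only at a high level, the following is a hedged sketch of the potential-well idea: each tuple contributes a Gaussian well, and points whose total potential stays shallow (few nearby particles) are flagged. The kernel width `sigma`, the quantile threshold, and the class name are assumptions for illustration; Java parallel streams stand in for whatever parallel backend the paper uses.

```java
import java.util.*;
import java.util.stream.IntStream;

// Hedged sketch: each tuple is an N-dimensional particle creating a
// Gaussian potential well; points sitting in shallow total potential
// (few nearby particles) are flagged as outliers.
public class PotentialOutliers {
    public static boolean[] detect(double[][] data, double sigma, double quantile) {
        int n = data.length;
        double[] potential = new double[n];
        // Embarrassingly parallel over points, matching the paper's
        // emphasis on parallelizability (here via parallel streams).
        IntStream.range(0, n).parallel().forEach(i -> {
            double sum = 0;
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double d2 = 0;
                for (int k = 0; k < data[i].length; k++) {
                    double diff = data[i][k] - data[j][k];
                    d2 += diff * diff;
                }
                sum -= Math.exp(-d2 / (2 * sigma * sigma)); // deeper = denser
            }
            potential[i] = sum;
        });
        // Flag the shallowest (least negative) fraction as outliers,
        // e.g. quantile = 0.95 flags roughly the top 5%.
        double[] sorted = potential.clone();
        Arrays.sort(sorted);
        double cut = sorted[(int) (quantile * (n - 1))];
        boolean[] outlier = new boolean[n];
        for (int i = 0; i < n; i++) outlier[i] = potential[i] > cut;
        return outlier;
    }
}
```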
ISBN: (Print) 9781728182063
Several data mining and machine learning problems can be reduced to the computational geometry problem of finding intersections among a set of geometric objects, such as line segments or rectangles/boxes. Currently, the state-of-the-art approach for addressing such intersection problems in Euclidean space is collectively known as the sweep-line or plane-sweep algorithm, and has been utilized in a variety of application domains, including databases, gaming, and transportation, to name a few. The idea behind the sweep line is to employ a conceptual line that is swept across the plane, stopping at intersection points. However, to report all K intersections among N objects, the standard sweep-line algorithm (based on the Bentley-Ottmann algorithm) has a time complexity of O((N + K) log N) and therefore cannot scale to a very large number of objects or to cases with many intersections. In this paper, we propose MRSWEEP and MRSWEEP-D, two sophisticated and highly scalable algorithms that parallelize the sweep line and its variants. We provide algorithmic details of fully distributed in-memory versions of the proposed algorithms using the MapReduce programming paradigm in the Apache Spark cluster environment. A theoretical analysis of the proposed algorithms is presented, along with a thorough experimental evaluation that demonstrates their scalability at varying levels of problem complexity. We make the source code and datasets available to support the reproducibility of the results.
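MRSWEEP's distributed Spark implementation is too large to reproduce here, but the sequential kernel it partitions, an active-set sweep, fits in a short sketch. The rectangle-intersection variant below is an assumption chosen for brevity (the paper also treats segment intersection via Bentley-Ottmann); the class name and data layout are illustrative.

```java
import java.util.*;

// Minimal single-machine sweep-line sketch over axis-aligned rectangles:
// sort by left edge, keep an "active set" of rectangles whose x-range
// still overlaps the sweep position, and test y-overlap against it.
// This illustrates the sequential kernel that a MapReduce scheme would
// partition across workers; it is not the paper's Spark implementation.
public class RectSweep {
    // Each rect is {xmin, ymin, xmax, ymax}.
    public static List<int[]> intersectingPairs(double[][] rects) {
        Integer[] order = new Integer[rects.length];
        for (int i = 0; i < rects.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> rects[i][0]));

        List<int[]> pairs = new ArrayList<>();
        List<Integer> active = new ArrayList<>();
        for (int idx : order) {
            double[] r = rects[idx];
            // Drop rectangles the sweep line has already passed.
            active.removeIf(a -> rects[a][2] < r[0]);
            for (int a : active) {
                // x-ranges overlap by construction; check y-ranges.
                if (rects[a][1] <= r[3] && r[1] <= rects[a][3]) {
                    pairs.add(new int[]{a, idx});
                }
            }
            active.add(idx);
        }
        return pairs;
    }
}
```

Because pairs can only intersect when their x-ranges overlap, a distributed version can split the x-axis into strips, run this kernel per strip, and deduplicate pairs found in more than one strip, which is the general shape of plane-sweep parallelization the paper builds on.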