Search Results - Inner Mongolia University Library

25th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC)

Authors: Patwary, Md. Mostofa Ali; Palsetia, Diana; Agrawal, Ankit; Liao, Wei-keng; Manne, Fredrik; Choudhary, Alok. Affiliations: Northwestern Univ, Evanston, IL 60208, USA; Univ Bergen, N-5020 Bergen, Norway

ISBN: (print) 9781467308052; 9781467308045

DBSCAN is a well-known density-based clustering algorithm capable of discovering arbitrarily shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload and hence result in low parallel efficiency. We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory. Using data sets containing up to several hundred million high-dimensional points, we show that PDSDBSCAN significantly outperforms the master-slave approach, achieving speedups of up to 25.97 using 40 cores on shared memory architecture, and speedups of up to 5,765 using 8,192 cores on distributed memory architecture.
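The idea of building DBSCAN clusters with a disjoint-set (union-find) structure can be illustrated with a small sequential sketch. This is not the PDSDBSCAN implementation from the paper: it is a minimal single-threaded version that uses a brute-force neighbor search, and the names `DisjointSet` and `dbscan_union_find` are hypothetical. It assumes the common convention that a point counts as its own neighbor.

```python
from math import dist


class DisjointSet:
    """Union-find over the integers 0..n-1 (illustrative helper)."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # The lower-numbered root becomes the root of the merged tree.
            self.parent[max(ra, rb)] = min(ra, rb)


def dbscan_union_find(points, eps, min_pts):
    """Return one label per point: a cluster representative, or -1 for noise."""
    n = len(points)
    # Brute-force eps-neighborhoods (each point is its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    ds = DisjointSet(n)
    assigned = list(is_core)  # core points always belong to a cluster
    for i in range(n):
        if not is_core[i]:
            continue
        for j in neighbors[i]:
            if is_core[j]:
                ds.union(i, j)       # merge the trees of two core points
            elif not assigned[j]:
                assigned[j] = True   # a border point joins exactly one cluster
                ds.union(i, j)
    return [ds.find(i) if assigned[i] else -1 for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan_union_find(points, eps=1.5, min_pts=3)
```

Because each union only touches the two trees involved, the order in which points are processed does not change which core points end up clustered together, which is the property the abstract refers to as breaking the access sequentiality.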

Keywords: Density based clustering; Union-Find algorithm; disjoint-set data structure


A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure 12


ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis

Authors: Mostofa Ali Patwary; Diana Palsetia; Ankit Agrawal; Wei-keng Liao; Fredrik Manne; Alok Choudhary. Affiliations: Northwestern University; University of Bergen

ISBN: (print) 9781467308052

Keywords: Density based clustering Union-Find algorithm disjoint-set data structure distributed memory architecture shared memory systems data structures Memory architecture distributed memory Parallel Lines algorithms Master-slave Workload Scalability


A data structure for Nearest Common Ancestors with Linking


ACM Transactions on Algorithms, 2017, Vol. 13, No. 4, pp. 1–28

Author: Gabow, Harold N. Affiliation: Univ Colorado, Dept Comp Sci, Boulder, CO 80309, USA

Consider a forest that evolves via link operations that make the root of one tree the child of a node in another tree. Intermixed with link operations are nca operations, which return the nearest common ancestor of two given nodes when such exists. This article shows that a sequence of m such nca and link operations on a forest of n nodes can be processed online in time O(m α(m, n) + n). This was previously known only for a restricted type of link operation. The special case where a link only extends a tree by adding a new leaf occurs in Edmonds' algorithm for finding a maximum weight matching on a general graph. Incorporating our algorithm into the implementation of Edmonds' algorithm in [9] achieves time O(n(m + n log n)) for weighted matching, an arguably optimum asymptotic bound (n and m are the number of vertices and edges, respectively). Our data structure also provides a simple alternative implementation of the incremental-tree set merging algorithm of Gabow and Tarjan [11].
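To make the two operations concrete, the following sketch shows only their semantics. It is a naive parent-pointer forest in which each `nca` query walks to the root, so it takes time proportional to tree depth and does not achieve the bound stated in the abstract. The class name `LinkingForest` and its methods are hypothetical.

```python
class LinkingForest:
    """Naive forest supporting link and nca (illustrative, not Gabow's structure)."""

    def __init__(self, n):
        # parent[v] is None while v is the root of its tree.
        self.parent = [None] * n

    def _root(self, v):
        while self.parent[v] is not None:
            v = self.parent[v]
        return v

    def link(self, root, node):
        """Make `root`, currently a tree root, a child of `node` in another tree."""
        if self.parent[root] is not None:
            raise ValueError("first argument must be a tree root")
        if self._root(node) == root:
            raise ValueError("node must lie in a different tree")
        self.parent[root] = node

    def nca(self, u, v):
        """Return the nearest common ancestor of u and v, or None if none exists."""
        ancestors = set()
        while u is not None:          # collect u and all of its ancestors
            ancestors.add(u)
            u = self.parent[u]
        while v is not None:          # first ancestor of v seen above is the nca
            if v in ancestors:
                return v
            v = self.parent[v]
        return None


forest = LinkingForest(5)
forest.link(1, 0)   # 1 becomes a child of 0
forest.link(2, 0)   # 2 becomes a child of 0
forest.link(3, 1)   # 3 becomes a child of 1
```

With this forest, `forest.nca(3, 2)` returns node 0, and a query between nodes in different trees, such as `forest.nca(3, 4)`, returns `None`.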

Keywords: Least common ancestors; matching algorithms; set merging; disjoint-set data structure


PARDICLE: Parallel Approximate Density-based Clustering 14


International Conference on High Performance Computing, Networking, Storage and Analysis

Authors: Patwary, Md. Mostofa Ali; Satish, Nadathur; Sundaram, Narayanan; Manne, Fredrik; Habib, Salman; Dubey, Pradeep. Affiliations: Intel Parallel Comp Lab, Santa Clara, CA 95052, USA; Univ Bergen, N-5020 Bergen, Norway; Argonne Natl Lab, Argonne, IL 60439, USA

ISBN: (print) 9781479955008

DBSCAN is a widely used isodensity-based clustering algorithm for particle data, well-known for its ability to isolate arbitrarily-shaped clusters and to filter noise data. The algorithm is super-linear (O(n log n)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm for DBSCAN using density based sampling, which performs equally well in quality compared to exact algorithms, but is more than an order of magnitude faster. Our experiments on astrophysics and synthetic massive datasets (8.5 billion numbers) show that our approximate algorithm is up to 56x faster than exact algorithms with almost identical quality (Omega-Index >= 0.99). We develop a new parallel DBSCAN algorithm, which uses dynamic partitioning to improve load balancing and locality. We demonstrate near-linear speedup on shared memory (15x using 16 cores, single node Intel (R) Xeon (R) processor) and distributed memory (3917x using 4096 cores, multinode) computers, with 2x additional performance improvement using Intel (R) Xeon Phi (TM) coprocessors. Additionally, existing exact algorithms can achieve up to 3.4 times speedup using dynamic partitioning.
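The "near-linear speedup" claim can be checked with simple arithmetic: parallel efficiency is the speedup divided by the number of cores. The sketch below applies that standard definition to the two figures quoted in the abstract; the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup achieved (1.0 means perfectly linear)."""
    return speedup / cores


# Figures quoted in the abstract above.
shared = parallel_efficiency(15, 16)           # shared memory, 16 cores
distributed = parallel_efficiency(3917, 4096)  # distributed memory, 4096 cores
print(f"shared: {shared:.1%}, distributed: {distributed:.1%}")
```

Both values come out above 90 percent, which is consistent with describing the scaling as near-linear.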

Keywords: Density based clustering; approximate clustering algorithm; Union-Find algorithm; disjoint-set data structure


Scalable Parallel OPTICS Data Clustering Using Graph Algorithmic Techniques 13


International Conference for High Performance Computing, Networking, Storage and Analysis (SC)

Authors: Patwary, Md Mostofa Ali; Palsetia, Diana; Agrawal, Ankit; Liao, Wei-keng; Manne, Fredrik; Choudhary, Alok. Affiliations: Northwestern Univ, Evanston, IL 60208, USA; Univ Bergen, N-5020 Bergen, Norway

ISBN: (print) 9781450323789

OPTICS is a hierarchical density-based data clustering algorithm that discovers arbitrary-shaped clusters and eliminates noise using adjustable reachability distance thresholds. Parallelizing OPTICS is considered challenging as the algorithm exhibits a strongly sequential data access order. We present a scalable parallel OPTICS algorithm (POPTICS) designed using graph algorithmic concepts. To break the data access sequentiality, POPTICS exploits the similarities between the OPTICS algorithm and Prim's Minimum Spanning Tree algorithm. Additionally, we use the disjoint-set data structure to achieve high parallelism for distributed cluster extraction. Using high-dimensional datasets containing up to a billion floating point numbers, we show scalable speedups of up to 27.5 for our OpenMP implementation on a 40-core shared-memory machine, and up to 3,008 for our MPI implementation on a 4,096-core distributed-memory machine. We also show that the quality of the results given by POPTICS is comparable to those given by the classical OPTICS algorithm.
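One way to picture cluster extraction with a disjoint-set structure is sketched below. It assumes a spanning tree is already available as `(u, v, weight)` edges and merges the endpoints of every edge whose weight does not exceed a chosen threshold. This is a simplified illustration under those assumptions, not the POPTICS procedure: it omits core-distance handling, noise detection, and any parallelism, and the name `extract_clusters` is hypothetical.

```python
def extract_clusters(n, tree_edges, threshold):
    """Label points 0..n-1 by merging across tree edges with weight <= threshold."""
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, weight in tree_edges:
        if weight <= threshold:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[rv] = ru  # merge the two components
    return [find(i) for i in range(n)]


# Hypothetical spanning-tree edges: one long edge separates two tight groups.
edges = [(0, 1, 0.5), (1, 2, 0.7), (2, 3, 5.0), (3, 4, 0.4)]
cluster_labels = extract_clusters(5, edges, threshold=1.0)
```

Each edge is handled independently of the order in which edges arrive, which is what makes a union-based extraction step amenable to being split across workers.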

Keywords: Density-based clustering; Minimum spanning tree; Union-Find algorithm; disjoint-set data structure

