Search Results - Inner Mongolia University Library

A Survey and Experimental Review on Data Distribution Strategies for Parallel Spatial Clustering Algorithms

Journal of Computer Science & Technology, 2024, Vol. 39, No. 3, pp. 610-636

Authors: Jagat Sesh Challa, Navneet Goyal, Amogh Sharma, Nikhil Sreekumar, Sundar Balasubramaniam, Poonam Goyal. Affiliations: Advanced Data Analytics and Parallel Technologies Laboratory, Birla Institute of Technology and Science, Pilani 333031, India; Uber, New York 11101, U.S.A.; Computer Science and Engineering Department, University of Minnesota, Minneapolis 55455, U.S.A.

The advent of Big Data has led to rapid growth in the use of parallel clustering algorithms that work over distributed computing frameworks such as MPI, MapReduce, and *** important step for any parallel clustering algorithm is the distribution of data amongst the cluster *** step governs the methodology and performance of the entire *** typically use random, or a spatial/geometric distribution strategy like kd-tree based partitioning and grid-based partitioning, as per the requirements of the ***, these strategies are generic and are not tailor-made for any specific parallel clustering *** this paper, we give a comprehensive literature survey of MPI-based parallel clustering algorithms with special reference to the specific data distribution strategies they *** also propose three new data distribution strategies, namely Parameterized Dimensional Split for parallel density-based clustering algorithms like DBSCAN and OPTICS; Cell-Based Dimensional Split for dGridSLINK, a grid-based hierarchical clustering algorithm that exhibits efficiency for disjoint spatial distribution; and Projection-Based Split, which is a generic distribution *** of these preserve spatial locality, achieve disjoint partitioning, and ensure good data load *** experimental analysis shows the benefits of using the proposed data distribution strategies for the algorithms they are designed for, based on which we give appropriate recommendations for their usage.

Keywords: parallel data mining; data distribution; parallel clustering; spatial locality preservation
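The kd-tree based partitioning that the abstract names as a common baseline can be illustrated with a minimal Python sketch: recursively split the data at the median of one coordinate, cycling through dimensions, until the desired number of disjoint, spatially compact partitions remains. This is a generic illustration under the assumption that the number of partitions is a power of two; it is not the paper's Parameterized Dimensional Split, Cell-Based Dimensional Split, or Projection-Based Split, and the function name `kd_partition` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kd_partition(points: Sequence[Point], num_parts: int, depth: int = 0) -> List[List[Point]]:
    """Split points into num_parts disjoint groups by recursive median cuts.

    Assumes num_parts is a power of two (hypothetical helper, generic kd-tree style).
    Each cut halves the current group along dimension depth % d, so every
    partition covers a compact, non-overlapping region of space.
    """
    if num_parts <= 1:
        return [list(points)]
    if not points:
        return [[] for _ in range(num_parts)]
    dim = depth % len(points[0])                      # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[dim])    # order by the split coordinate
    mid = len(ordered) // 2                           # median position -> balanced halves
    half = num_parts // 2
    return (kd_partition(ordered[:mid], half, depth + 1)
            + kd_partition(ordered[mid:], half, depth + 1))


if __name__ == "__main__":
    grid_points = [(x, y) for x in range(4) for y in range(4)]
    for rank, part in enumerate(kd_partition(grid_points, 4)):
        print(f"node {rank}: {part}")
```

Median cuts give each node the same number of points (good load balance) and keep neighbouring points together (spatial locality). The price is a full scan and sort of the data to find each cut.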

Android Web Security Solution using Cross-device Federated Learning

14th International Conference on COMmunication Systems and NETworkS, COMSNETS 2022

Authors: Singh, A K; Goyal, Navneet. Affiliation: Advanced Data Analytics & Parallel Technologies Lab, Dept. of CSIS, BITS Pilani, Pilani Campus, India

ISBN: (print) 9781665421041

Over the last decade or so, Machine Learning has changed the global technology landscape, with applications in almost all disciplines and verticals. Mobile and Web Security is an important research area in which researchers have been trying to apply Machine Learning, but data privacy concerns and the high cost of communicating data to a central Machine Learning server have limited its use. Federated Learning is emerging as a promising solution that addresses privacy concerns and drastically reduces communication costs. In Federated Learning, data from individual devices is not communicated to a central server, and model learning happens in a distributed manner. In this paper, we propose a Federated Learning solution for the security of Android-based devices. Mobile and Web Security solutions have evolved from signature-based detection to Machine Learning models trained over large centralized malware repositories. We use Federated Learning to learn security patterns from users' browsing data, which resides on individual devices and never leaves them. Federated Learning preserves users' privacy because it shares with the central server only the model learned from users' browsing data, not the data itself. In this way, each mobile platform trains its own web security model on its own data and shares that model with the centralized server. The centralized server aggregates the trained models received from numerous mobile devices into a global model, which in turn is sent back to the mobile devices for inference. Mobile security solutions based on this concept create a sustained, self-evolving security ecosystem in which millions of mobile platforms share their learned models to form a robust distributed security paradigm. The results obtained using Federated Learning are comparable with those of centralized Machine Learning. © 2022 IEEE.

Keywords: Mobile security
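The abstract says the server "aggregates" the client models but does not state the rule it uses. As an assumption, the minimal Python sketch below uses federated averaging (FedAvg), a widely used rule that averages client parameter vectors weighted by each client's local sample count. The function name `federated_average` and the flat-list model representation are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client model parameters into one global model (FedAvg rule).

    Each client's parameter vector is weighted by its number of local
    training samples. Only parameters reach this function; the raw
    browsing data stays on the devices.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_model = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client models must have the same shape")
        for i, w in enumerate(weights):
            global_model[i] += w * n / total   # sample-weighted contribution
    return global_model


if __name__ == "__main__":
    # Two hypothetical devices: one with 1 local sample, one with 3.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

With these inputs the device holding three samples contributes three quarters of each global parameter, so the result is `[2.5, 3.5]`.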

Resource-Aware Multi-Criteria Vehicle Participation for Federated Learning in Internet of Vehicles

SSRN, 2023

Authors: Wen, Jie; Zhang, Jingbo; Zhang, Zhixia; Cui, Zhihua; Cai, Xingjuan; Chen, Jinjun. Affiliations: The Shanxi Key Laboratory of Advanced Control and Equipment Intelligence, Taiyuan University of Science and Technology, Taiyuan, Shanxi, China; The Shanxi Key Laboratory of Big Data Analysis and Parallel Computing, Taiyuan University of Science and Technology, Taiyuan, Shanxi, China; The State Key Lab for Novel Software Technology, Nanjing University, China; Department of Computing Technologies, Swinburne University of Technology, Melbourne, Australia

Federated learning (FL), as a safe distributed training mode, provides strong support for the edge intelligence of the Internet of Vehicles (IoV), enabling efficient collaborative control and safe data sharing. However, because of resource limitations and the instability of the training environment in the complex IoV, the ideal performance of FL cannot be achieved. When actual resource constraints and federated task requirements are taken into account, the diversity of device selection criteria turns resource-aware vehicle selection into a multi-criteria selection problem. To effectively support FL for IoV, we formulate the resource-aware multi-criteria vehicle selection problem as a many-objective optimization problem and propose a resource-aware many-objective vehicle selection model (RA-MaOVSM) to optimize resource efficiency. RA-MaOVSM considers the heterogeneous resources (such as computation, communication, energy, and data resources) of on-board devices in IoV and jointly optimizes learning efficiency, energy cost, and global performance. Additionally, a novel probability distribution combination game strategy is applied to a many-objective evolutionary algorithm (MaOEA) to improve model-solving performance. Simulation results demonstrate that RA-MaOVSM can effectively optimize IoV resources and FL model performance, and that the designed algorithm exhibits good convergence and distribution, achieving a good balance among multiple device selection criteria. © 2023, The Authors. All rights reserved.

Keywords: Evolutionary algorithms
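Many-objective evolutionary algorithms such as the MaOEA mentioned above rank candidate solutions by Pareto dominance rather than by a single score. The minimal Python sketch below shows only that dominance test and the resulting non-dominated set for candidate vehicle selections. The three objectives in the example (round latency, energy cost, and 1 minus accuracy, all minimized) are hypothetical. The sketch does not implement RA-MaOVSM or its probability distribution combination game strategy, and the names `dominates` and `pareto_front` are illustrative.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates: Sequence[Objectives]) -> List[Objectives]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    # Hypothetical vehicle subsets scored as (latency, energy, 1 - accuracy).
    selections = [(1.0, 5.0, 0.2), (2.0, 3.0, 0.1), (2.0, 6.0, 0.3), (1.5, 3.0, 0.1)]
    print(pareto_front(selections))
```

Here `(2.0, 6.0, 0.3)` is dominated by `(1.0, 5.0, 0.2)` and `(2.0, 3.0, 0.1)` by `(1.5, 3.0, 0.1)`. The two survivors trade latency against energy and accuracy, and neither is better in every criterion. As the number of objectives grows, more candidates become mutually non-dominated, which is why many-objective methods add extra selection pressure on top of this test.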

A Rapid Prototyping Approach for High Performance Density-Based Clustering

International Conference on Data Science and Advanced Analytics (DSAA)

Authors: Saiyedul Islam, Sundar Balasubramaniam, Poonam Goyal, Ankit Sultana, Lakshit Bhutani, Saurabh Raje, Navneet Goyal. Affiliation: Advanced Data Analytics & Parallel Technologies Laboratory, Birla Institute of Technology and Science Pilani, India

Big Data has significantly increased the dependence of the data analytics community on High Performance Computing (HPC) systems. However, efficiently programming an HPC system is still a tedious task that requires specialized skills in parallelization and the use of platform-specific languages and mechanisms. We present a framework for quickly prototyping new or existing density-based clustering algorithms while obtaining low running times and high speedups via automatic parallelization. The user is required only to specify the sequential algorithm in a Domain Specific Language (DSL) for clustering at a very high level of abstraction. The parallelizing compiler for the DSL does the rest to leverage distributed systems, in particular typical scale-out clusters made of commodity hardware. Our approach is based on recurring, parallelizable programming patterns known as Kernels, which are identified and parallelized by the compiler. We demonstrate the ease of programming and scalable performance for the DBSCAN, SNN, and RECOME algorithms. We also establish that the proposed approach can achieve performance comparable to state-of-the-art manually parallelized implementations while requiring minimal programming effort, several orders of magnitude smaller than that required on other parallel platforms like MPI/Spark.

Rapid Prototyping of Hierarchical Agglomerative Clustering Algorithms for Distributed Systems

IEEE International Conference on Big Data

Authors: Saiyedul Islam, Navneet Goyal, Sundar Balasubramaniam, Poonam Goyal, Achal Agarwal, Kirti Singh Rathore, Nischay Singh. Affiliation: Advanced Data Analytics & Parallel Technologies Laboratory, Birla Institute of Technology & Science Pilani, Pilani Campus, India

ISBN: (electronic) 9781728108582

ISBN: (print) 9781728108599

Hierarchical Agglomerative Clustering (HAC) algorithms are used in many applications where clusters have a hierarchical relationship between them. Their parallelization is challenging because every agglomeration step depends on all previous agglomerations. Although a few parallel algorithms have been proposed for the SLINK HAC algorithm, only limited work has been done to parallelize other HAC algorithms. In this paper, we present a high-level abstraction that provides a uniform way to specify any HAC algorithm, together with a framework for automatically parallelizing such algorithms for distributed memory systems. The abstraction is supported by constructs in a high-level, domain-specific language, and a compiler translates algorithms expressed in this language into efficient parallel code targeting distributed systems. Our experiments on multiple HAC algorithms show that the runtime performance achieved is comparable with state-of-the-art manual parallel implementations on Spark and MPI while requiring only a fraction of the programming effort. At runtime, master-slave execution is used, and load is balanced among the slaves in an algorithm-agnostic way, in marked contrast to the custom load-balancing techniques seen in the literature on parallel HAC algorithms.

Keywords: Clustering algorithms; Programming; Merging; Heuristic algorithms; DSL; Measurement; Manuals

BCS-Net: Boundary, Context and Semantic for Automatic COVID-19 Lung Infection Segmentation from CT Images

arXiv, 2022

Authors: Cong, Runmin; Yang, Haowei; Jiang, Qiuping; Gao, Wei; Li, Haisheng; Wang, Cong; Zhao, Yao; Kwong, Sam. Affiliations: The Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China; The Department of Computer Science, City University of Hong Kong, Hong Kong; The Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China; The School of Information Science and Engineering, Ningbo University, Ningbo 315211, China; The School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen 518055, China; Peng Cheng Laboratory, Shenzhen 518055, China; The Distributed and Parallel Software Lab, Huawei Technologies, Shenzhen 518129, China; The City University of Hong Kong Shenzhen Research Institute, Shenzhen 51800, China

The spread of COVID-19 has been a disaster for the world, and automatic segmentation of infection regions can help doctors make diagnoses quickly and reduce their workload. However, accurate and complete segmentation faces several challenges, such as the scattered distribution of infection areas, complex background noise, and blurred segmentation boundaries. To this end, in this paper we propose a novel network for automatic COVID-19 lung infection segmentation from CT images, named BCS-Net, which considers boundary, context, and semantic attributes. BCS-Net follows an encoder-decoder architecture, and most of its design effort focuses on the decoder stage, which includes three progressive Boundary-Context-Semantic Reconstruction (BCSR) blocks. In each BCSR block, the attention-guided global context (AGGC) module is designed to learn the most valuable encoder features for the decoder by highlighting important spatial and boundary locations and modeling global context dependence. In addition, a semantic guidance (SG) unit generates a semantic guidance map that refines the decoder features by aggregating multi-scale high-level features at the intermediate resolution. Extensive experiments demonstrate that the proposed framework outperforms existing competitors both qualitatively and quantitatively. © 2022, CC BY-NC-SA.

Keywords: COVID-19

Scalable Parallel Algorithms for Shared Nearest Neighbor Clustering

International Conference on High Performance Computing

Authors: Sonal Kumari, Saurabh Maurya, Poonam Goyal, Sundar S Balasubramaniam, Navneet Goyal. Affiliation: Advanced Data Analytics & Parallel Technologies Laboratory, Department of Computer Science & Information Systems, India

ISBN: (print) 9781509054121

Clustering is a popular data mining technique that discovers structure in unlabeled data by grouping objects together on the basis of a similarity criterion. Traditional similarity measures lose their meaning as the number of dimensions increases, and as a consequence, distance- or density-based clustering algorithms become less meaningful. Shared Nearest Neighbor (SNN) is a solution for clustering high-dimensional data that can find clusters of varying density. SNN assigns objects that share a large number of their nearest neighbors to the same cluster. However, SNN is compute- and memory-intensive for data of large size and/or dimensionality. Nearest neighbor queries account for a major proportion of the computation in SNN, which lowers efficiency for larger numbers of nearest neighbors (k). The main motivation of this work is to improve the efficiency of SNN and to parallelize it so that it can be used for clustering large high-dimensional datasets and for large values of k. Existing SNN algorithms become inefficient in these situations. In this paper, we present a new sequential SNN algorithm, R-SNN, which uses an R-tree to execute neighborhood queries efficiently and exploits spatial locality to minimize memory usage. R-SNN is benchmarked against the best available implementation of SNN and is found to be up to 77 times faster when tested on various real datasets. R-SNN is parallelized for distributed memory, shared memory, and hybrid systems. The significant speedup and scalability achieved can be attributed to parallelization, good load balancing strategies, and the exploitation of spatial locality. Experimental results demonstrate this for datasets of varying dimensionality and size. The maximum speedups achieved for the shared, distributed, and hybrid models are 427.19 using 48 threads, 394.24 using 32 processes, and 1380.69 on 32 nodes (with each node spawning 4 threads), respectively. Super-linear speedup for some datasets is attributed

Keywords: Clustering algorithms; Algorithm design and analysis; Time complexity; Spatial databases; Parallel algorithms; Memory management; Density measurement
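The shared-nearest-neighbor idea in the abstract can be made concrete with a minimal Python sketch. Two points are linked only if each appears in the other's k-nearest-neighbor list, and the link strength is the number of neighbors they share (one common, Jarvis-Patrick style variant). The k-NN lists here are computed by brute force, which costs O(n^2) distance evaluations. That cost is what R-SNN's R-tree queries are designed to avoid, so this is not the authors' R-SNN. The function names `knn_sets` and `snn_similarity` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_sets(points: Sequence[Point], k: int) -> List[Set[int]]:
    """Brute-force k-nearest-neighbor index sets, excluding the point itself.

    Every point is compared with every other point (O(n^2) distances);
    a spatial index such as an R-tree would prune most of these comparisons.
    """
    result = []
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in others[:k]})
    return result


def snn_similarity(points: Sequence[Point], k: int) -> Dict[Tuple[int, int], int]:
    """For each mutually k-nearest pair (i, j), count their shared neighbors."""
    nn = knn_sets(points, k)
    sim: Dict[Tuple[int, int], int] = {}
    for i in range(len(points)):
        for j in nn[i]:
            if i < j and i in nn[j]:          # keep only mutual neighbors, once per pair
                sim[(i, j)] = len(nn[i] & nn[j])
    return sim


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(snn_similarity(pts, k=2))
```

In the example, the two tight groups of three points each form their own links and no link crosses between them. SNN clustering would then keep pairs whose shared count passes a threshold and grow clusters from those links.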

A Parallel Framework for Grid-Based Bottom-Up Subspace Clustering

International Conference on Data Science and Advanced Analytics (DSAA)

Authors: Poonam Goyal, Sonal Kumari, Shubham Singh, Vivek Kishore, Sundar S. Balasubramaniam, Navneet Goyal. Affiliation: Advanced Data Analytics & Parallel Technologies Laboratory, Department of Computer Science & Information Systems, Pilani Campus, India

ISBN: (print) 9781509052073

Clustering is a popular data mining and machine learning technique that discovers interesting patterns in unlabeled data by grouping similar objects together. Clustering high-dimensional data is a challenging task because points in high-dimensional space are nearly equidistant from each other, which renders commonly used similarity measures ineffective. Subspace clustering has emerged as a possible solution to this problem. In subspace clustering, we try to find clusters in different subspaces within a dataset. Many subspace clustering algorithms have been proposed over the last two decades to find clusters in multiple overlapping subspaces of high-dimensional data. Subspace clustering algorithms iteratively find the best subset of dimensions for a cluster from the 2^d - 1 possible combinations in d-dimensional data. Subspace clustering is extremely compute-intensive because of the exhaustive search of subspaces, especially in bottom-up subspace clustering algorithms. To address this issue, we develop an efficient parallel framework for grid-based bottom-up subspace clustering algorithms, considering the popular algorithms in this category. The framework is implemented for shared memory, distributed memory, and hybrid systems and is tested on three grid-based bottom-up subspace clustering algorithms: CLIQUE, MAFIA, and ENCLUS. All parallel implementations exhibit impressive speedup and scalability on real datasets.

Keywords: Conferences
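The count of 2^d - 1 candidate subspaces, and the way bottom-up algorithms avoid searching all of them, can be shown with a minimal Python sketch. Bottom-up methods in the CLIQUE family rely on monotonicity: if a (q+1)-dimensional subspace contains a dense region, every q-dimensional projection of it does too. So a (q+1)-dimensional candidate is kept only if all of its q-dimensional projections are dense. The density test itself is omitted here, since the sets of dense subspaces are simply given as input. The function names `all_subspaces` and `next_candidates` are hypothetical and do not come from the paper's framework.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Subspace = FrozenSet[int]


def all_subspaces(d: int) -> List[Subspace]:
    """Every non-empty subset of the d dimensions: 2**d - 1 subspaces in total."""
    return [frozenset(c) for q in range(1, d + 1) for c in combinations(range(d), q)]


def next_candidates(dense: Iterable[Subspace]) -> Set[Subspace]:
    """Apriori-style bottom-up step over dense q-dimensional subspaces.

    Join pairs whose union has q + 1 dimensions, then prune any candidate
    with a q-dimensional projection that is not dense (monotonicity).
    """
    dense = set(dense)
    candidates: Set[Subspace] = set()
    for a in dense:
        for b in dense:
            union = a | b
            if len(union) == len(a) + 1 and all(union - {dim} in dense for dim in union):
                candidates.add(union)
    return candidates


if __name__ == "__main__":
    print(len(all_subspaces(10)))   # 2**10 - 1 subspaces for d = 10
    dense_1d = [frozenset({0}), frozenset({1}), frozenset({2})]
    print(next_candidates(dense_1d))
```

For d = 10 there are already 1023 subspaces. The pruning step is what keeps the bottom-up search tractable: if only {0, 1} and {0, 2} are dense, the 3-dimensional candidate {0, 1, 2} is discarded without any data scan, because its projection {1, 2} is not dense.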

A Fast, Scalable SLINK Algorithm for Commodity Cluster Computing Exploiting Spatial Locality

IEEE International Conference on High Performance Computing and Communications

Authors: Poonam Goyal, Sonal Kumari, Sumit Sharma, Dhruv Kumar, Vivek Kishore, Sundar Balasubramaniam, Navneet Goyal. Affiliation: Advanced Data Analytics & Parallel Technologies Laboratory, Department of Computer Science & Information Systems, BITS-Pilani, Pilani Campus, India

ISBN: (print) 9781509042982

The single linkage (SLINK) hierarchical clustering algorithm is often preferred over traditional partitioning-based clustering because it does not require the number of clusters as input. However, because of its high time complexity and inherent data dependencies, it does not scale well to large datasets. To the best of our knowledge, all existing parallel SLINK algorithms are based on the traditional SLINK algorithm and thus require a large number of computing resources. In this paper, we present GridSLINK, a novel optimization of the SLINK algorithm that is an order of magnitude faster than the existing state-of-the-art implementation. The speedup in GridSLINK comes from reducing the number of distance calculations required by SLINK. This reduction is achieved by exploiting the spatial locality of data points and using an adaptive gridding technique. GridSLINK is parallelized for distributed memory systems, and performance scales as the number of compute nodes increases. The proposed parallel algorithm, dGridSLINK, is benchmarked against the best existing parallel algorithm in the literature and outperforms it on all the real datasets considered. dGridSLINK can cluster millions of data points in a few seconds to minutes using a small number of processing elements, without compromising the quality of clustering.

Keywords: parallel computing; multi-core; multi-node; clustering; SLINK; adaptive gridding
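The idea of cutting distance calculations through spatial locality and gridding can be illustrated with a minimal Python sketch. Two points within distance eps always fall into the same or adjacent cells of a uniform grid with cell side eps, so only those cell pairs need to be compared. The sketch computes the flat single-linkage clusters at one cut height eps. These clusters are the connected components of the graph that joins points at distance at most eps, and they are found here with a union-find structure. This is a simplification with stated assumptions: the grid is uniform rather than GridSLINK's adaptive grid, no full dendrogram is built, and the function name `single_link_at` is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def single_link_at(points: Sequence[Point], eps: float) -> List[int]:
    """Label each point with its single-linkage cluster at cut height eps.

    Points are bucketed into a uniform grid with cell side eps, so distance
    checks are limited to the same or adjacent cells; linked points are
    merged with union-find.
    """
    grid: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(math.floor(c / eps) for c in p)].append(i)

    parent = list(range(len(points)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    dims = len(points[0]) if points else 0
    for cell, members in grid.items():
        for offset in product((-1, 0, 1), repeat=dims):     # 3**dims neighbor cells
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in grid.get(neighbor, ()):
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        parent[find(i)] = find(j)           # union the two clusters
    return [find(i) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (1.2, 0.0), (5.0, 5.0), (5.4, 5.0)]
    print(single_link_at(pts, eps=1.0))
```

Points 0 and 2 are 1.2 apart but end up in one cluster through point 1. That chaining is the defining property of single linkage. The two far-away points form a second cluster without ever being compared against the first group.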

DD-Rtree: A dynamic distributed data structure for efficient data distribution among cluster nodes for spatial data mining algorithms

IEEE International Conference on Big Data

Authors: Jagat Sesh Challa, Poonam Goyal, S. Nikhil, Aditya Mangla, Sundar S. Balasubramaniam, Navneet Goyal. Affiliations: Department of Computer Science & Information Systems, Birla Institute of Technology & Science Pilani, Pilani Campus, India; Advanced Data Analytics & Parallel Technologies Laboratory, Department of Computer Science & Information Systems, Birla Institute of Technology & Science Pilani, Pilani Campus, India

ISBN: (print) 9781467390064

Parallelizing data mining algorithms has become a necessity as we try to mine ever-increasing volumes of data. Spatial data mining algorithms like DBSCAN, OPTICS, and SLINK have been parallelized to exploit a cluster infrastructure. The efficiency achieved by existing algorithms can be attributed to spatial locality preservation, using spatial indexing structures like the k-d-tree, quad-tree, and grid files to distribute data among cluster nodes. However, these indexing structures are static in nature: they need to scan the entire dataset to determine the partitioning coordinates. This results in a high data distribution cost when the data size is large. In this paper, we propose a dynamic distributed data structure, DD-Rtree, which preserves spatial locality while distributing data across compute nodes in a shared-nothing environment. Moreover, DD-Rtree is dynamic, i.e., it can be constructed incrementally, which makes it useful for handling big data. We compare the quality of data distribution achieved by DD-Rtree with that of a recent distributed indexing structure, SD-Rtree. We also compare the efficiency of queries supported by these indexing structures, along with the overall efficiency of the DBSCAN algorithm. Our experimental results show that DD-Rtree achieves better data distribution and thereby improves overall efficiency.

Keywords: Data structures; Clustering algorithms; Data mining; Indexing; Distributed databases; Algorithm design and analysis
