Purpose: This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other. DDPML can also be deployed on other distributed systems such as P2P networks, clusters, cloud computing or other technologies. Design/methodology/approach: In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can later be used for prediction. This knowledge thus becomes a great asset in companies' hands, which is precisely the objective of data mining. But with data and knowledge being produced at an ever faster pace, we now speak of Big Data mining. For this reason, the authors' proposed work mainly aims at solving the problems of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. The problem raised in this work is how to make machine learning algorithms work in a distributed and parallel way at the same time without losing the accuracy of the classification results. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML). The work is divided into two parts. In the first, the authors propose a distributed architecture controlled by a Map-Reduce algorithm, which in turn depends on a random sampling technique. The distributed architecture is designed to handle big data processing in a way that is coherent and efficient with the sampling strategy proposed in this work. This architecture also helps verify the classification results obtained using the representative learning base (RLB). In the second part, the authors extract the representative learning base by sampling at
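A minimal sketch of the idea the abstract describes: draw a random sample as a representative learning base (RLB), then classify partitions of the data in a map/reduce style. The function names (extract_rlb, map_classify, reduce_results) and the k-NN stand-in learner are illustrative assumptions, not DDPML's actual components.

```python
import random
from collections import Counter

def extract_rlb(dataset, sample_ratio=0.1, seed=42):
    """Random sampling step: draw the representative learning base (RLB)."""
    random.seed(seed)
    return random.sample(dataset, max(1, int(len(dataset) * sample_ratio)))

def map_classify(partition, rlb, k=3):
    """Map step: classify each record of one partition by a simple k-NN vote
    against the RLB (a stand-in for the paper's learner)."""
    out = []
    for features, true_label in partition:
        neighbors = sorted(
            rlb, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], features))
        )[:k]
        pred = Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]
        out.append((true_label, pred))
    return out

def reduce_results(mapped_partitions):
    """Reduce step: concatenate per-partition predictions."""
    return [row for part in mapped_partitions for row in part]

# Toy usage: 2-D points labeled by which side of x = 0.5 they fall on.
pts = [(random.random(), random.random()) for _ in range(1000)]
data = [((x, y), int(x > 0.5)) for x, y in pts]
rlb = extract_rlb(data)
partitions = [data[i::4] for i in range(4)]            # 4 simulated worker nodes
predictions = reduce_results([map_classify(p, rlb) for p in partitions])
accuracy = sum(t == p for t, p in predictions) / len(predictions)
print(f"accuracy with RLB of {len(rlb)} samples: {accuracy:.2f}")
```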
Monitoring data sources for possible changes is an important consumption requirement for applications that interact with the Web of Data. In this article, MonARCh, an architecture for monitoring result changes of registered SPARQL queries in the Linked Data environment, is proposed. MonARCh can be understood as a publish/subscribe system in the general sense, but it differs in how communication with the data sources is realized: data sources in the Linked Data environment do not publish changes to their data. MonARCh provides the necessary communication infrastructure between the data sources and the consumers for the notification of changes. Users register SPARQL queries with the system, which are then converted to federated queries. MonARCh periodically checks for updates by re-executing SERVICE clauses and notifies users in case of any result change. In addition, to provide scalability, MonARCh takes advantage of the concurrent computation of the actor model, and the parallel join algorithm it utilizes speeds up query execution and result generation. The design science methodology was used during the design, implementation and evaluation of the architecture. Compared to the literature, MonARCh meets all the requirements identified from both the Linked Data monitoring and state-of-the-art perspectives while offering many outstanding features from both points of view. The evaluation results show that, even under a limited two-node cluster setting, MonARCh can reach a monitoring capacity of 300 to 25,000 queries depending on the query selectivities executed within our test bench.
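A small sketch of the polling loop the abstract describes: periodically re-execute the SERVICE subqueries of a registered federated query, join the partial results, and notify subscribers when the joined result changes. The class and function names, the fingerprinting scheme, and the naive positional join are illustrative assumptions, not MonARCh's actual API or join algorithm.

```python
import hashlib
import json
import time

def result_fingerprint(rows):
    """Hash a result set so two executions can be compared cheaply."""
    canonical = json.dumps(sorted(rows, key=json.dumps), sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

class QueryMonitor:
    def __init__(self, service_clauses, execute_service, notify):
        self.service_clauses = service_clauses   # endpoint URL -> SERVICE subquery text
        self.execute_service = execute_service   # callable: (endpoint, subquery) -> list of row dicts
        self.notify = notify                     # callable invoked when the result changes
        self.last_fingerprint = None

    def poll_once(self):
        # Re-execute every SERVICE clause and merge the partial results.
        # (A naive positional merge stands in for the parallel join algorithm.)
        partials = [self.execute_service(ep, q) for ep, q in self.service_clauses.items()]
        joined = [dict(pair for row in combo for pair in row.items())
                  for combo in zip(*partials)]
        fp = result_fingerprint(joined)
        if self.last_fingerprint is not None and fp != self.last_fingerprint:
            self.notify(joined)
        self.last_fingerprint = fp

# Usage with a stubbed endpoint executor (no network access needed).
def fake_execute(endpoint, subquery):
    return [{"city": "Ankara", "temp": int(time.time()) % 3}]

monitor = QueryMonitor({"http://example.org/sparql": "SELECT ..."},
                       fake_execute,
                       notify=lambda rows: print("result changed:", rows))
for _ in range(3):
    monitor.poll_once()
    time.sleep(1)
```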
The proliferation of current and next-generation mobile and sensing devices has increased at an alarming rate. With these state-of-the-art devices, the global positioning system (GPS) has made remote sensing and location tracking more viable, enabling a variety of location-based spatial queries. One such query is the All Nearest Neighbor (ANN) query, which extracts and returns, for every query object, the data objects in its close vicinity. An ANN query is a combination of k-nearest neighbor (kNN) and join queries. Hence, ANN is useful for applications in different domains such as transportation optimization, locating safe zones, and ride-sharing. An example application is "find the nearest gas station for each car parking lot". Because these applications generate a massive number of query requests, a large amount of computation is required to answer them, and a single machine cannot meet this demand. In this study, we therefore propose a distributed query processing framework that processes ANN queries using Apache Spark. In an empirical study, our proposed framework achieved superior query efficiency and scalability compared to other methods and design alternatives.
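To make the query concrete, here is a naive ANN baseline on Spark (a cartesian product followed by a per-query reduce), the kind of brute-force computation a partition-aware framework like the one above would improve upon. The schema, point generation, and application name are assumptions for illustration; running it requires a PySpark installation.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ann-sketch").getOrCreate()
sc = spark.sparkContext

# (id, (x, y)) pairs: query objects (e.g. parking lots) and data objects (e.g. gas stations).
queries = sc.parallelize([(q, (float(q), float(q))) for q in range(100)])
data = sc.parallelize([(d, (d * 0.7, d * 1.3)) for d in range(500)])

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

# Keep, for each query object, the closest data object (squared distance).
ann = (queries.cartesian(data)
       .map(lambda qd: (qd[0][0], (qd[1][0], dist2(qd[0][1], qd[1][1]))))
       .reduceByKey(lambda a, b: a if a[1] <= b[1] else b))

print(ann.take(5))
spark.stop()
```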
The best region search (BRS) problem is one of the major research problems in geospatial data processing applications. Its objective is to discover the ideal location of a rectangle of a specified size, with the goal of maximizing a user-defined scoring function. Existing solutions for finding the top-k best regions have focused on designing algorithms for centralized settings and are not suitable for processing massive datasets. In this paper, we enable Hadoop MapReduce-based parallel and distributed computation to obtain significant performance improvements. In addition to the parallel and distributed setting, we incorporate early pruning strategies that eliminate the need to process rectangles which cannot be part of the output, thereby minimizing the communication cost involved in computing the k-BRS. We further introduce a redistribution strategy on top of the initially proposed methodology that handles skew inherent in the dataset. Our results are obtained from extensive experimentation on both synthetic and real-world datasets.
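The early-pruning idea can be illustrated in a few lines: a candidate rectangle is only scored exactly if an optimistic upper bound on its score can still beat the current k-th best. The grid of candidates, the weight-sum scoring function, and the x-slab bound below are illustrative choices, not the paper's exact formulation.

```python
import heapq
import random

random.seed(0)
points = [(random.uniform(0, 100), random.uniform(0, 100), random.randint(1, 5))
          for _ in range(5000)]                       # (x, y, weight)
W, H, k = 10.0, 10.0, 3                               # rectangle size and top-k

def score(x0, y0):
    """Exact user-defined score: total weight covered by the rectangle at (x0, y0)."""
    return sum(w for x, y, w in points if x0 <= x <= x0 + W and y0 <= y <= y0 + H)

def upper_bound(x0):
    """Cheap optimistic bound: ignore the y-extent and count the whole x-slab."""
    return sum(w for x, _, w in points if x0 <= x <= x0 + W)

top = []                                              # min-heap of (score, (x0, y0))
for x0 in range(0, 90, 5):
    bound = upper_bound(x0)
    for y0 in range(0, 90, 5):
        if len(top) == k and bound <= top[0][0]:
            continue                                  # pruned without exact evaluation
        s = score(x0, y0)
        if len(top) < k:
            heapq.heappush(top, (s, (x0, y0)))
        elif s > top[0][0]:
            heapq.heapreplace(top, (s, (x0, y0)))

print(sorted(top, reverse=True))                      # top-k best regions
```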
With the wide penetration of smart robots in multifarious fields, the simultaneous localization and mapping (SLAM) technique in robotics has attracted growing attention in the community. Yet collaborative SLAM over multiple robots remains challenging due to the conflict between the intensive graphics computation of SLAM and the limited computing capability of robots. While traditional solutions resort to powerful cloud servers acting as external computation providers, we show by real-world measurements that the significant communication overhead of data offloading prevents their practicality in real deployments. To tackle these challenges, this article brings the emerging edge-computing paradigm into multirobot SLAM and proposes RecSLAM, a multirobot laser SLAM system that focuses on accelerating the map construction process under the robot-edge-cloud architecture. In contrast to conventional multirobot SLAM, which generates graphic maps on robots and completely merges them on the cloud, RecSLAM develops a hierarchical map fusion technique that directs robots' raw data to edge servers for real-time fusion and then sends the fused results to the cloud for global merging. To optimize the overall pipeline, an efficient multirobot SLAM collaborative processing framework is introduced that adaptively optimizes robot-to-edge offloading tailored to heterogeneous edge resource conditions while ensuring workload balancing among the edge servers. Extensive evaluations show that RecSLAM achieves up to 39.31% processing latency reduction over the state of the art. Besides, a proof-of-concept prototype is developed and deployed in real scenes to demonstrate its effectiveness.
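As a toy illustration of the workload-balanced robot-to-edge assignment mentioned above, the sketch below greedily assigns each robot's scan stream to the edge server whose estimated finish time stays lowest. The robot workloads, edge speeds, and greedy rule are made-up assumptions, not RecSLAM's actual offloading optimizer.

```python
robots = {"r1": 8.0, "r2": 5.0, "r3": 7.0, "r4": 3.0}   # per-robot scan workload (arbitrary units)
edges = {"edge-A": 2.0, "edge-B": 1.5}                  # edge processing speed (units/s)

load = {e: 0.0 for e in edges}                          # accumulated finish time per edge server
assignment = {}
for robot, work in sorted(robots.items(), key=lambda kv: -kv[1]):   # place largest jobs first
    best = min(edges, key=lambda e: load[e] + work / edges[e])
    assignment[robot] = best
    load[best] += work / edges[best]

print(assignment)   # which edge server fuses each robot's submaps
print(load)         # balanced estimated completion times per edge
```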
ISBN (print): 9798350320107
Hyperspectral target detection (HTD) aims to detect fine targets in hyperspectral images (HSIs). Traditional HTD methods operating on low-resolution hyperspectral images (LR-HSIs) are incapable of detecting small targets clearly and precisely. Accordingly, in this paper, we propose a hyperspectral and multispectral image fusion target detection method based on cloud-edge collaboration. In this method, the LR-HSI is first employed for coarse detection, which outputs a set of suspicious target areas. Afterwards, hyperspectral and multispectral image (HSI-MSI) fusion is performed on these areas for precise target detection. To ensure the efficiency of HTD, we accelerate our method in parallel based on a cloud-edge collaborative architecture. Furthermore, we establish an optimization model and design a greedy strategy to find the deployment that minimizes the runtime on the cloud-edge collaborative architecture. The experimental results demonstrate that our proposed method significantly improves computational efficiency while ensuring accuracy.
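A minimal coarse-to-fine sketch of the pipeline described above: a cheap detector on the low-resolution HSI flags suspicious windows, and only those windows would be handed to the expensive HSI-MSI fusion and precise detection stage. The spectral-angle thresholding, cube sizes, and threshold value are placeholder assumptions, not the paper's detectors.

```python
import numpy as np

def spectral_angle(cube, target):
    """Per-pixel spectral angle between cube pixels and a target signature."""
    num = (cube * target).sum(axis=-1)
    den = np.linalg.norm(cube, axis=-1) * np.linalg.norm(target) + 1e-12
    return np.arccos(np.clip(num / den, -1.0, 1.0))

lr_hsi = np.random.rand(64, 64, 30)          # low-resolution HSI (H, W, bands), synthetic
target = np.random.rand(30)                  # target spectral signature, synthetic

# Coarse stage on the LR-HSI: candidate pixels whose spectra resemble the target.
angles = spectral_angle(lr_hsi, target)
suspicious = np.argwhere(angles < 0.2)

# Fine stage (not shown): HSI-MSI fusion and precise detection run only on
# windows around the suspicious pixels, which is what makes the edge/cloud split pay off.
windows = [(max(r - 2, 0), max(c - 2, 0)) for r, c in suspicious[:10]]
print(f"{len(suspicious)} suspicious pixels; first candidate windows: {windows}")
```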
ISBN (digital): 9781665427920
ISBN (print): 9781665427920
Hyperspectral computational imaging (HCI) aims to reconstruct hyperspectral images (HSIs) from the compressed signals collected by remote sensing and imaging systems. Collaborative Tucker3 tensor decomposition is beneficial for HCI models in reconstructing high-fidelity HSIs. However, the ever-increasing amount of compressed data imposes a heavy computational burden on tensor decomposition-based HCI models, which may exceed the computing capacity of a single machine. For this reason, this paper proposes a Spark-based distributed and parallel HCI implementation via collaborative Tucker3 tensor decomposition. The proposed implementation decomposes the processing flow of the HCI algorithm into several stages, each of which can be processed in parallel on Spark. In addition, we develop parallel strategies to improve the performance of the redundant computational procedures and the data storage procedure, respectively. Experimental results demonstrate that the parallel algorithm not only achieves high accuracy but also improves computational efficiency when processing large-scale HSI datasets.
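A small numeric sketch of why this workload partitions well: the Tucker3 reconstruction Y = G ×₁ U1 ×₂ U2 ×₃ U3 can be computed independently per spatial chunk (per block of rows of a factor matrix), which is exactly the kind of stage that maps onto Spark partitions. The shapes and the two-chunk split are toy assumptions; this is plain NumPy, not the paper's Spark implementation.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Multiply a 3-way tensor by a matrix along the given mode."""
    return np.moveaxis(
        np.tensordot(matrix, np.moveaxis(tensor, mode, 0), axes=(1, 0)), 0, mode
    )

G = np.random.rand(8, 8, 6)                                   # core tensor
U1, U2, U3 = np.random.rand(32, 8), np.random.rand(32, 8), np.random.rand(30, 6)

# Full reconstruction in one shot.
full = mode_n_product(mode_n_product(mode_n_product(G, U1, 0), U2, 1), U3, 2)

# Same result computed per row-chunk of U1; each chunk could live on a separate partition.
chunks = [mode_n_product(mode_n_product(mode_n_product(G, U1[s], 0), U2, 1), U3, 2)
          for s in (slice(0, 16), slice(16, 32))]
assert np.allclose(full, np.concatenate(chunks, axis=0))
print("chunked reconstruction matches the monolithic one:", full.shape)
```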
We present the Feature Tracking Kit (FTK), a framework that simplifies, scales, and delivers various feature-tracking algorithms for scientific data. The key to FTK is our simplicial spacetime meshing scheme, which generalizes both regular and unstructured spatial meshes to spacetime while tessellating spacetime mesh elements into simplices. The benefits of using simplicial spacetime meshes include (1) reducing ambiguity cases for feature extraction and tracking, (2) simplifying the handling of degeneracies using symbolic perturbations, and (3) enabling scalable and parallel processing. The use of simplicial spacetime meshing simplifies and improves the implementation of several feature-tracking algorithms for critical points, quantum vortices, and isosurfaces. As a software framework, FTK provides end users with VTK/ParaView filters, Python bindings, a command line interface, and programming interfaces for feature-tracking applications. We demonstrate use cases as well as scalability studies through both synthetic data and scientific applications including tokamak, fluid dynamics, and superconductivity simulations. We also conduct end-to-end performance studies on the Summit supercomputer. FTK is open sourced under the MIT license: https://***/hguo/ftk.
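To make the phrase "tessellating spacetime mesh elements into simplices" concrete, here is the standard split of a 2-D spatial triangle extruded over one time step (a prism) into three tetrahedra. This is a generic textbook prism split included only as an illustration, not FTK's own meshing code.

```python
def extrude_triangle_to_tets(tri):
    """tri: three vertex ids of a spatial triangle at time t.
    Returns the 3 spacetime tetrahedra spanning t..t+1, where
    vertex (v, 0) lives at time t and (v, 1) at time t+1."""
    a, b, c = sorted(tri)   # a consistent vertex ordering keeps adjacent prisms conforming
    return [
        ((a, 0), (b, 0), (c, 0), (a, 1)),
        ((b, 0), (c, 0), (a, 1), (b, 1)),
        ((c, 0), (a, 1), (b, 1), (c, 1)),
    ]

for tet in extrude_triangle_to_tets((7, 3, 5)):
    print(tet)
```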
The large data volume and high algorithm complexity of hyperspectral image (HSI) problems have posed big challenges for the efficient classification of massive HSI data repositories. Recently, cloud computing architectures have become more relevant for addressing the big computational challenges introduced in the HSI field. This article proposes an acceleration method for HSI classification that relies on scheduling metaheuristics to automatically and optimally distribute the workload of HSI applications across multiple computing resources on a cloud platform. By analyzing the procedure of a representative classification method, we first develop its distributed and parallel implementation based on the MapReduce mechanism on Apache Spark. The subtasks of the processing flow that can be processed in a distributed way are identified as divisible tasks. The optimal execution of this application on Spark is then formulated as a divisible scheduling framework that takes into account both task execution precedences and task divisibility when allocating the divisible and indivisible subtasks onto computing nodes. The formulated scheduling framework is an optimization procedure that searches for optimized task assignments and partition counts for the divisible tasks. Two metaheuristic algorithms are developed to solve this divisible scheduling problem. The scheduling results provide an optimized solution to the automatic processing of HSI big data on clouds, improving the computational efficiency of HSI classification by exploiting the parallelism of the processing flow. Experimental results demonstrate that our scheduling-guided approach achieves remarkable speedups by facilitating the automatic processing of HSI classification on Spark, and it is scalable to increasing HSI data volumes.
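For flavor, a toy metaheuristic of the kind the abstract mentions: simulated annealing over assignments of subtasks to cluster nodes, minimizing the estimated makespan. The task costs, node speeds, and annealing schedule are illustrative assumptions, and the sketch ignores the precedence and divisibility constraints of the paper's actual formulation.

```python
import math
import random

random.seed(1)
task_costs = [random.uniform(1, 10) for _ in range(20)]   # subtask workloads (arbitrary units)
node_speed = [1.0, 1.5, 2.0]                               # relative node speeds

def makespan(assignment):
    finish = [0.0] * len(node_speed)
    for task, node in enumerate(assignment):
        finish[node] += task_costs[task] / node_speed[node]
    return max(finish)

current = [random.randrange(len(node_speed)) for _ in task_costs]
best, best_cost = current[:], makespan(current)
temperature = 5.0
for _ in range(2000):
    candidate = current[:]
    candidate[random.randrange(len(candidate))] = random.randrange(len(node_speed))
    delta = makespan(candidate) - makespan(current)
    if delta < 0 or random.random() < math.exp(-delta / temperature):
        current = candidate                      # accept improving (or occasionally worse) moves
        if makespan(current) < best_cost:
            best, best_cost = current[:], makespan(current)
    temperature *= 0.999                          # cool down

print("best makespan:", round(best_cost, 2), "assignment:", best)
```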
We present a novel distributed union-find algorithm that features asynchronous parallelism and k-d tree based load balancing for scalable visualization and analysis of scientific data. Applications of union-find include level set extraction and critical point tracking, but distributed union-find can suffer from high synchronization costs and imbalanced workloads across parallel processes. In this study, we prove that global synchronizations in existing distributed union-find can be eliminated without changing final results, allowing overlapped communications and computations for scalable processing. We also use a k-d tree decomposition to redistribute inputs, in order to improve workload balancing. We benchmark the scalability of our algorithm with up to 1,024 processes using both synthetic and application data. We demonstrate the use of our algorithm in critical point tracking and super-level set extraction with high-speed imaging experiments and fusion plasma simulations, respectively.
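For readers unfamiliar with the data structure, a compact serial union-find with path compression is shown below, applied to merging cells of the same super-level set component. It is included only to make the primitive concrete; the paper's contribution, the asynchronous distributed version with k-d tree load balancing, is not reproduced here.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path compression: point every visited node directly at the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)   # deterministic root choice

# Usage: connect cells that belong to the same super-level set component.
uf = UnionFind(10)
for a, b in [(0, 1), (1, 2), (5, 6)]:
    uf.union(a, b)
print([uf.find(i) for i in range(10)])   # component label per cell
```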