Processing large-scale graphs is challenging due to the nature of the computation, which causes irregular memory access patterns. Managing such irregular accesses may cause significant performance degradation on both CPUs and GPUs. Thus, recent research trends propose graph processing acceleration with Field-Programmable Gate Arrays (FPGAs). FPGAs are programmable hardware devices that can be fully customised to perform specific tasks in a highly parallel and efficient manner. However, FPGAs have a limited amount of on-chip memory that cannot fit the entire graph. Due to the limited device memory size, data needs to be repeatedly transferred to and from the FPGA on-chip memory, which makes data transfer time dominate over the computation time. A possible way to overcome the FPGA accelerators' resource limitation is to employ a multi-FPGA distributed architecture with an efficient partitioning scheme. Such a scheme aims to increase data locality and minimise communication between different partitions. This work proposes an FPGA processing engine that overlaps, hides and customises all data transfers so that the FPGA accelerator is fully utilised. This engine is integrated into a framework for using FPGA clusters and is able to use an offline partitioning method to facilitate the distribution of large-scale graphs. The proposed framework uses Hadoop at a higher level to map a graph to the underlying hardware platform. The higher layer of computation is responsible for gathering the blocks of data that have been pre-processed and stored on the host's file system and distributing them to a lower layer of computation made of FPGAs. We show how graph partitioning combined with an FPGA architecture leads to high performance, even when the graph has millions of vertices and billions of edges. In the case of the PageRank algorithm, widely used for ranking the importance of nodes in a graph, our implementation is the fastest compared to state-of-the-art CPU and GPU solutions.
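As a rough illustration of the block-partitioned computation described above (not the paper's FPGA engine), the following sketch iterates PageRank over pre-partitioned edge blocks, processing one block at a time the way a partition could be streamed to an accelerator. The block layout and function names are assumptions made for the example.

```python
# Hypothetical sketch: PageRank over pre-partitioned edge blocks, mimicking how
# blocks could be shipped to an accelerator one partition at a time.
import numpy as np

def pagerank_blocked(edge_blocks, num_vertices, d=0.85, iters=20):
    """edge_blocks: list of (src_array, dst_array) partitions, one per device/worker."""
    rank = np.full(num_vertices, 1.0 / num_vertices)
    out_deg = np.zeros(num_vertices)
    for src, _ in edge_blocks:
        np.add.at(out_deg, src, 1.0)
    out_deg[out_deg == 0] = 1.0                      # avoid division by zero for sink vertices
    for _ in range(iters):
        contrib = np.zeros(num_vertices)
        for src, dst in edge_blocks:                 # each block would be streamed to one device
            np.add.at(contrib, dst, rank[src] / out_deg[src])
        rank = (1.0 - d) / num_vertices + d * contrib
    return rank

# Toy usage: a 4-vertex cycle split into two edge partitions.
blocks = [(np.array([0, 1]), np.array([1, 2])), (np.array([2, 3]), np.array([3, 0]))]
print(pagerank_blocked(blocks, num_vertices=4))
```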
In distributed computing such as grid computing, online users submit their tasks anytime and anywhere to dynamic resources. Task arrival and execution processes are stochastic. How to adapt to the consequent uncertainties, as well as to scheduling overhead and response time, is the main concern in dynamic scheduling. Based on decision theory, scheduling is formulated as a Markov decision process (MDP). To address this problem, a machine learning approach is used to learn task arrival and execution patterns online. The proposed algorithm can automatically acquire such knowledge without any modeling in advance, and proactively allocates tasks taking into account the forthcoming tasks and their execution dynamics. Compared with four classic algorithms, namely Min-Min, Min-Max, Suffrage, and ECT, the proposed algorithm has much less scheduling overhead. Experiments over both synthetic and practical environments reveal that the proposed algorithm outperforms the other algorithms in terms of average response time. The smaller variance of the average response time further validates the robustness of our algorithm. (C) 2014 Elsevier Inc. All rights reserved.
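The paper's own learning algorithm is not reproduced here, but as an illustrative sketch of framing dynamic scheduling as an MDP, the snippet below uses tabular Q-learning where the state summarizes resource queue lengths and the action picks a resource; all names and the bucketing heuristic are assumptions.

```python
# Illustrative only: a Q-learning task scheduler over an assumed queue-length state.
import random
from collections import defaultdict

class QScheduler:
    def __init__(self, num_resources, alpha=0.1, gamma=0.9, eps=0.1):
        self.n, self.alpha, self.gamma, self.eps = num_resources, alpha, gamma, eps
        self.q = defaultdict(float)                      # Q[(state, action)]

    def state(self, queues):
        return tuple(min(q, 5) for q in queues)          # bucket queue lengths to keep the table small

    def choose(self, queues):
        s = self.state(queues)
        if random.random() < self.eps:                   # occasionally explore
            return random.randrange(self.n)
        return max(range(self.n), key=lambda a: self.q[(s, a)])

    def update(self, queues, action, reward, next_queues):
        s, s2 = self.state(queues), self.state(next_queues)
        best_next = max(self.q[(s2, a)] for a in range(self.n))
        self.q[(s, action)] += self.alpha * (reward + self.gamma * best_next - self.q[(s, action)])

# Usage: the reward could be the negative response time of the scheduled task.
sched = QScheduler(num_resources=3)
queues = [2, 0, 4]
a = sched.choose(queues)
sched.update(queues, a, reward=-1.7, next_queues=[2, 1, 4])
```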
With the wide application of deep learning, the amount of data required to train deep learning models is becoming increasingly large, resulting in longer training times and higher requirements for computing resources. To improve the throughput of a distributed learning system, both task scheduling and resource scheduling are required. This article proposes to combine ARIMA and GRU models to predict the future task volume. For task scheduling, multi-priority task queues are used to divide tasks into different queues according to their priorities, ensuring that high-priority tasks can be completed in advance. For resource scheduling, a reinforcement learning method is adopted to manage limited computing resources. The reward function of reinforcement learning is constructed based on the resources occupied by the task, the training time, and the accuracy of the model. When a distributed learning model tends to converge, the computing resources of the task are gradually reduced so that they can be allocated to other learning tasks. Experimental results demonstrate that RLPTO tends to use more computing nodes when facing tasks with large data scale and has good scalability. The distributed learning system reward experiment shows that RLPTO enables the computing cluster to obtain the largest reward.
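A minimal sketch under assumed weights (the paper's exact reward function is not specified here): a reward trading off resource usage, training time, and model accuracy, plus multi-priority queues that always dequeue from the highest non-empty priority.

```python
# Assumed weights and class names; not the RLPTO implementation.
from collections import deque

def reward(resources_used, train_time, accuracy, w_res=0.3, w_time=0.3, w_acc=0.4):
    # Lower resource usage and training time are better; higher accuracy is better.
    return -w_res * resources_used - w_time * train_time + w_acc * accuracy

class PriorityQueues:
    def __init__(self, levels=3):
        self.queues = [deque() for _ in range(levels)]   # index 0 = highest priority

    def submit(self, task, priority):
        self.queues[priority].append(task)

    def next_task(self):
        for q in self.queues:                            # scan from highest to lowest priority
            if q:
                return q.popleft()
        return None

pq = PriorityQueues()
pq.submit("train_resnet", priority=0)
pq.submit("train_lstm", priority=2)
print(pq.next_task(), reward(resources_used=4, train_time=2.5, accuracy=0.92))
```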
Communication technologies are of primary importance for today's activities, since they contribute to delivering a wide range of services for humans, from simple phone calls to advanced multimedia services, banking activities, and healthcare, to cite a few [1]. They are also the key factor of many measurement applications in the contexts of Industry 4.0, transportation, environmental monitoring, telemetering, building automation, and the emerging applications of the Internet of Things (IoT) and Industrial Internet of Things (IIoT) [2]-[4]. In other words, communication technologies enable the concept of "Networking for measurements", which explains the crucial role of networks for measurement applications.
Process descriptions are the backbones for creating products and delivering services automatically. Computing the alignments between process descriptions (such as process models) and process behavior is one of the fundamental tasks leading to better processes and services. The reason is that the computed results can be directly used in checking compliance, diagnosing deviations, and analyzing bottlenecks for processes. Although various alignment techniques have been proposed in recent years, their performance is still challenged by large logs and models. In this work, we introduce an efficient approach to accelerate the computation of alignments. Specifically, we focus on the computation of optimal alignments, and try to improve the performance of the state-of-the-art A*-based method through Petri net decomposition. We present the details of our designs and also show that our approach can be easily implemented in a distributed environment using the Spark platform. Using datasets with large event logs and process models, we experimentally demonstrate that our approach can indeed accelerate current A*-based implementations in general. (c) 2022 Elsevier Inc. All rights reserved.
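A rough sketch of the distribution idea only, not the authors' implementation: per-fragment alignment jobs are fanned out with PySpark after a model has been decomposed into fragments. The alignment itself is stood in for by a plain edit distance between a projected trace and a fragment's reference sequence; a real system would run A* on each sub-net instead, and all fragment and trace data below are invented.

```python
from pyspark import SparkContext

def edit_distance(a, b):
    # Standard Levenshtein distance with a rolling row.
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def align_cost(trace, fragment_seq):
    # Project the trace onto the fragment's alphabet, then use edit distance as a
    # crude stand-in for the optimal alignment cost on that sub-net.
    alphabet = set(fragment_seq)
    return edit_distance([e for e in trace if e in alphabet], fragment_seq)

if __name__ == "__main__":
    sc = SparkContext(appName="decomposed-alignments")
    fragments = [("f1", ["a", "b"]), ("f2", ["c", "d", "e"])]     # (fragment_id, reference sequence)
    traces = [["a", "b", "c", "e"], ["a", "c", "d", "e"]]
    jobs = sc.parallelize([(t, f) for t in traces for f in fragments])
    costs = (jobs.map(lambda tf: (tf[1][0], align_cost(tf[0], tf[1][1])))
                 .reduceByKey(lambda x, y: x + y)                 # total cost per fragment
                 .collect())
    print(costs)
    sc.stop()
```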
We give a protocol for Asynchronous Distributed Key Generation (A-DKG) that is optimally resilient (can withstand f < n/3 faulty parties), has a constant expected number of rounds, has O(λn³) expected communication complexity, and assumes only the existence of a PKI. Prior to our work, the best A-DKG protocols required Ω(n) expected number of rounds and Ω(n⁴) expected communication. Our A-DKG protocol relies on several building blocks that are of independent interest. We define and design a Proposal Election (PE) protocol that allows parties to retrospectively agree on a valid proposal after enough proposals have been sent from different parties. With constant probability the elected proposal was proposed by a nonfaulty party. In building our PE protocol, we design a Verifiable Gather protocol which allows parties to communicate which proposals they have and have not seen in a verifiable manner. The final building block of our A-DKG is a Validated Asynchronous Byzantine Agreement (VABA) protocol. We use our PE protocol to construct a VABA protocol that does not require leaders or an asynchronous DKG setup. Our VABA protocol can be used more generally when it is not possible to use threshold signatures.
A new technique for distribution of GEANT4 processes is introduced to simplify running a simulation in a parallel environment such as a tightly coupled computer cluster. Using a new C++ class derived from the GEANT4 toolkit, multiple runs forming a single simulation are managed across a local network of computers with a simple inter-node communication protocol. The class is integrated with the GEANT4 toolkit and is designed to scale from a single symmetric multiprocessing (SMP) machine to compact clusters ranging in size from tens to thousands of nodes. User-designed 'work tickets' are distributed to clients using a client-server workflow model to specify the parameters for each individual run of the simulation. The new g4distributedRunmanager class was developed and well tested in the course of our Neutron Stimulated Emission Computed Tomography (NSECT) experiments. It will be useful for anyone running GEANT4 on large discrete data sets, such as covering a range of angles in computed tomography, calculating dose delivery with multiple fractions, or simply speeding up the throughput of a single model. (C) 2014 Elsevier B.V. All rights reserved.
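Conceptual sketch only (the actual class is C++ inside the GEANT4 toolkit): a tiny "work ticket" server that hands out per-run parameters to worker nodes over TCP, mirroring the client-server workflow described above. The ticket fields, port, and protocol below are invented for illustration.

```python
import json, socket, threading

# Hypothetical tickets: one run per angle, each with a fixed event count.
TICKETS = [{"run_id": i, "angle_deg": i * 15, "events": 100000} for i in range(8)]
LOCK = threading.Lock()

def serve(srv):
    # Hand one ticket (or None when exhausted) to each client that connects.
    while True:
        conn, _ = srv.accept()
        with LOCK:
            ticket = TICKETS.pop(0) if TICKETS else None
        conn.sendall(json.dumps(ticket).encode())
        conn.close()

def worker(host="127.0.0.1", port=5555):
    # A worker keeps requesting tickets until the server replies with None.
    while True:
        with socket.create_connection((host, port)) as conn:
            ticket = json.loads(conn.recv(4096).decode())
        if ticket is None:
            break
        print("would launch a GEANT4 run with parameters:", ticket)

srv = socket.socket()
srv.bind(("127.0.0.1", 5555))
srv.listen()
threading.Thread(target=serve, args=(srv,), daemon=True).start()
worker()
```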
With the arrival of the current digital era and the advancement of information transmission technologies, there has been an unprecedented rise in data. Efficient extraction of useful information from these volumes of data has garnered growing interest from academia and industry. Data mining research focuses on finding utility patterns in large datasets. However, inherent complications such as frequent scans and the creation of substantial candidate sets plague the mining process for large datasets. Distributed-architecture-based approaches also prove ineffective due to high communication overhead over iterations, and the high cost of exchanging data both locally and remotely further aggravates the situation. We propose a Communication Cost Effective Utility-based Pattern Mining (CEUPM) algorithm based on the Spark framework to address this issue. Spark accelerates iterative scanning by storing scanned datasets in a memory abstraction called resilient distributed datasets (RDDs). RDD operations require a redistribution (shuffle) of data among cluster nodes during processing, which incurs communication overhead. To minimize this cost, we adopt a search space division strategy based on data parallelism for fair and effective task allocation across cluster nodes. Experimental results on four real datasets demonstrate that CEUPM considerably reduces shuffling overhead and outperforms existing methods in terms of memory usage, communication cost, execution time, and scalability.
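A hedged illustration of the general idea of search-space division on Spark, not the CEUPM algorithm itself: each promising item becomes the owner of one slice of the search space, transactions are projected and shuffled once to the node owning that slice, and each worker then mines its share locally. The data, threshold, and single-item "utility" measure are assumptions for the example.

```python
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="search-space-division")
    # Each transaction: list of (item, utility) pairs (toy data).
    transactions = [
        [("a", 5), ("b", 2), ("c", 1)],
        [("a", 3), ("c", 4)],
        [("b", 6), ("c", 2)],
    ]
    rdd = sc.parallelize(transactions, numSlices=2)

    # Step 1: per-item utility totals, combined with a single reduce.
    item_utils = rdd.flatMap(lambda t: t).reduceByKey(lambda x, y: x + y)
    promising = set(item_utils.filter(lambda kv: kv[1] >= 5).keys().collect())

    # Step 2: project each transaction onto every promising item it contains and
    # partition by item, so each worker owns one slice of the search space.
    projected = (rdd.flatMap(lambda t: [(i, [p for p in t if p[0] in promising])
                                        for i, _ in t if i in promising])
                    .partitionBy(2))
    # Each worker would now run local mining over its projections; here we just count them.
    print(sorted(projected.countByKey().items()))
    sc.stop()
```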
This paper presents an efficient real-time person re-identification (ReID) and pedestrian tracking solution optimized for resource-constrained edge devices in multi-camera surveillance. Our key contribution is a hybrid distributed architecture that offloads lightweight detection tasks (using YOLOv10n) to edge devices, while a centralized server handles advanced feature extraction (OSNet) and robust identity tracking (ByteTrack). To improve efficiency, we integrate adaptive frame skipping on edge devices and parallel batch processing on the server. Semantic-enhanced embeddings and a memory-based retrieval mechanism improve ReID performance in crowded scenes. Additionally, we employ Apache Kafka for efficient load balancing and video stream management. Experimental results on CUHK03 and Penn-Fudan demonstrated high accuracy while maintaining real-time performance on limited-resource hardware (2 vCPU, 4 GB RAM, and Jetson Nano). These results make our approach a practical solution for real-world surveillance applications in crowded environments. Our code is available at: https://***/2uanDM/reid-pipeline.
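A simplified sketch of the edge side of such a pipeline (the library choice, topic name, and skipping heuristic are assumptions, not the authors' exact design): adaptive frame skipping on an edge device, publishing only the frames that pass the skip filter to Kafka for the central server to run feature extraction and tracking on.

```python
import json
from kafka import KafkaProducer   # pip install kafka-python

class AdaptiveSkipper:
    def __init__(self, base_skip=2, max_skip=8):
        self.skip, self.max_skip, self.counter = base_skip, max_skip, 0

    def should_process(self, num_detections_last_frame):
        # Crowded scenes -> process every frame; sparse scenes -> skip more frames.
        self.skip = 1 if num_detections_last_frame >= 5 else min(
            self.max_skip, 2 + 6 // max(1, num_detections_last_frame))
        self.counter += 1
        return self.counter % self.skip == 0

producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode())
skipper = AdaptiveSkipper()
for frame_id in range(100):
    detections = 3                          # stand-in for the detector's output on this frame
    if skipper.should_process(detections):
        producer.send("reid-frames", {"camera": "cam01", "frame": frame_id, "boxes": []})
producer.flush()
```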
Deep learning's widespread adoption in various fields has made distributed training across multiple computing nodes essential. However, frequent communication between nodes can significantly slow down training speed, creating a bottleneck in distributed training. To address this issue, researchers are focusing on communication optimization algorithms for distributed deep learning systems. In this paper, we propose a standard that systematically classifies all communication optimization algorithms based on mathematical modeling, which is not achieved by existing surveys in the field. We categorize existing works into four categories based on the optimization strategies of communication: communication masking, communication compression, communication frequency reduction, and hybrid optimization. Finally, we discuss potential future challenges and research directions in the field of communication optimization algorithms for distributed deep learning systems.
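As a toy example of one of the categories above, communication compression (not code from the survey itself): top-k gradient sparsification with local error feedback, so each worker sends only the k largest-magnitude gradient entries per step and accumulates the rest locally.

```python
import numpy as np

class TopKCompressor:
    def __init__(self, shape, k):
        self.k = k
        self.residual = np.zeros(shape)        # error feedback: accumulates what was not sent

    def compress(self, grad):
        corrected = grad + self.residual
        idx = np.argpartition(np.abs(corrected.ravel()), -self.k)[-self.k:]
        values = corrected.ravel()[idx]
        self.residual = corrected.copy()
        self.residual.ravel()[idx] = 0.0       # entries that were sent leave the residual
        return idx, values                     # this sparse pair is what goes on the wire

def decompress(idx, values, shape):
    out = np.zeros(shape)
    out.ravel()[idx] = values
    return out

comp = TopKCompressor(shape=(4, 4), k=3)
grad = np.random.randn(4, 4)
idx, vals = comp.compress(grad)
print(decompress(idx, vals, (4, 4)))
```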