Graph Neural Networks (GNNs) have attracted growing interest across a wide range of applications owing to their outstanding ability to extract latent representations from graph-structured data. To render GNN-based services for IoT-driven smart applications, traditional model-serving paradigms usually resort to the cloud, fully uploading geo-distributed input data to remote datacenters. However, our empirical measurements reveal the significant communication overhead of such cloud-based serving and highlight the profound potential of the emerging fog computing paradigm. To maximize the architectural benefits of fog computing, in this paper we present Fograph, a novel distributed real-time GNN inference framework that leverages the diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources. By introducing heterogeneity-aware execution planning and GNN-specific compression techniques, Fograph tailors its design to the unique characteristics of GNN serving in fog environments. Prototype-based evaluation and a case study demonstrate that Fograph significantly outperforms state-of-the-art cloud serving and fog deployment, with up to 5.39x execution speedup and 6.84x throughput improvement.
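To make the heterogeneity-aware planning idea concrete, here is a minimal sketch of capacity-proportional workload partitioning: graph vertices are greedily assigned to fog nodes so that each node's aggregation load matches its relative compute power. The greedy rule, capacities, and per-vertex degrees are illustrative assumptions, not Fograph's actual planner.

```python
# Minimal sketch: capacity-proportional vertex assignment across fog nodes.
# (Hypothetical greedy heuristic; not the paper's algorithm.)
import numpy as np

def partition_by_capacity(degrees, capacities):
    """Greedily assign vertices so each node's aggregation workload
    (sum of neighbor counts) is proportional to its capacity."""
    shares = np.asarray(capacities, dtype=float)
    shares /= shares.sum()
    targets = shares * degrees.sum()          # ideal load per fog node
    loads = np.zeros(len(capacities))
    assignment = {}
    # Place heavy vertices first; pick the node with the most remaining headroom.
    for v in sorted(range(len(degrees)), key=lambda v: -degrees[v]):
        node = int(np.argmax(targets - loads))
        assignment[v] = node
        loads[node] += degrees[v]
    return assignment, loads

degrees = np.array([5, 3, 8, 2, 7, 4])        # per-vertex neighborhood sizes
capacities = [1.0, 2.0]                       # relative power of two fog nodes
assignment, loads = partition_by_capacity(degrees, capacities)
print(assignment, loads)                      # faster node gets ~2/3 of the work
```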
This paper presents a computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model that can identify the participants in a conversation without requiring a large audio database for training. An unsupervised online update mechanism based on the cosine similarity of speaker embeddings is proposed for the Federated Learning model. Moreover, the proposed diarization system addresses speaker change detection via unsupervised segmentation techniques using Hotelling's T-squared statistic and the Bayesian Information Criterion. In this new approach, speaker change detection is biased around detected quasi-silences, which reduces the severity of the trade-off between the missed detection and false detection rates. Additionally, the computational overhead of frame-by-frame speaker identification is reduced via unsupervised clustering of speech segments. The results demonstrate the effectiveness of the proposed training method in the presence of non-IID speech data. They also show a considerable reduction in false and missed detections at the segmentation stage, along with lower computational overhead. The improved accuracy and reduced computational cost make the mechanism suitable for real-time speaker diarization across a distributed IoT audio network.
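A minimal sketch of the cosine-similarity update rule described above: a new segment embedding is matched to the closest enrolled speaker centroid, which is then nudged online toward the new embedding; below a similarity threshold, a new speaker is enrolled. The threshold, learning rate, and random stand-in embeddings are illustrative assumptions.

```python
# Minimal sketch: unsupervised online speaker enrollment via cosine similarity.
# (Threshold and learning rate are hypothetical choices.)
import numpy as np

def update_speakers(embedding, centroids, threshold=0.7, lr=0.1):
    embedding = embedding / np.linalg.norm(embedding)
    if centroids:
        sims = [float(embedding @ c) for c in centroids]   # cosine (unit vectors)
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            # Online update: move the matched centroid toward the new embedding.
            c = (1 - lr) * centroids[best] + lr * embedding
            centroids[best] = c / np.linalg.norm(c)
            return best, centroids
    centroids.append(embedding)        # unseen speaker: enroll a new centroid
    return len(centroids) - 1, centroids

centroids = []
rng = np.random.default_rng(0)
for seg in rng.normal(size=(5, 128)):  # stand-in for d-vector embeddings
    speaker_id, centroids = update_speakers(seg, centroids)
    print(speaker_id)
```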
This work explores distributed processing techniques, together with recent advances in multi-agent reinforcement learning (MARL), to implement a fully decentralized reward and decision-making scheme that efficiently allocates resources (spectrum and power). The method targets processes with strong dynamics and stringent requirements, such as cellular vehicle-to-everything (C-V2X) networks. In our approach, the C-V2X network is seen as a strongly connected network of intelligent agents that adopt a distributed reward scheme in a cooperative and decentralized manner, taking into consideration their channel conditions and selected actions in order to achieve their goals cooperatively. The simulation results demonstrate the effectiveness of the developed algorithm, named Distributed Multi-Agent Reinforcement Learning (DMARL), which achieves performance very close to that of a centralized reward design, with the advantage of avoiding the limitations and vulnerabilities inherent to a fully or partially centralized solution.
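As a rough illustration of a decentralized reward in this spirit, each agent could combine its own achieved rate with rates reported by its neighbors, approximating the global objective without a central critic. The topology, rates, and weighting below are illustrative assumptions, not the DMARL reward design.

```python
# Minimal sketch: neighbor-shared local reward for cooperative agents.
# (Hypothetical reward shape; the paper's scheme may differ.)
import numpy as np

def local_reward(agent, rates, neighbors, weight=0.5):
    """Own rate plus a discounted average of the neighbors' rates."""
    own = rates[agent]
    nbrs = neighbors[agent]
    shared = np.mean([rates[n] for n in nbrs]) if nbrs else 0.0
    return own + weight * shared

rates = np.array([2.1, 1.4, 3.0, 0.8])              # per-agent rate (bit/s/Hz)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # strongly connected chain
print([local_reward(a, rates, neighbors) for a in range(len(rates))])
```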
We propose an efficient distributed out-of-memory implementation of the non-negative matrix factorization (NMF) algorithm for heterogeneous high-performance-computing systems. The proposed implementation is based on prior work on NMFk, which can perform automatic model selection and extract latent variables and patterns from data. In this work, we extend NMFk by adding support for dense and sparse matrix operations on multi-node, multi-GPU systems. The resulting algorithm is optimized for out-of-memory problems where the memory required to factorize a given matrix exceeds the available GPU memory. Memory complexity is reduced by batching/tiling strategies, and sparse and dense matrix operations are significantly accelerated with GPU cores (or tensor cores when available). Input/output latency associated with batch copies between host and device is hidden using CUDA streams to overlap data transfers and compute asynchronously, and latency associated with collective communications (both intra-node and inter-node) is reduced using optimized NVIDIA Collective Communication Library (NCCL) based communicators. Benchmark results show a significant improvement, from 32x to 76x speedup, for the new GPU implementation over the CPU-based NMFk. Good weak scaling was demonstrated on up to 4096 multi-GPU cluster nodes with approximately 25,000 GPUs when decomposing a dense 340 TB matrix and an 11 EB sparse matrix of density $10^{-6}$.
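A minimal NumPy sketch of the batching/tiling idea described above: standard multiplicative NMF updates applied one row (or column) tile of A at a time, so only a single tile needs to be resident in fast memory at once; on a GPU each tile would be staged in via a stream. Tile size and matrix shapes are illustrative, and this is not the NMFk code.

```python
# Minimal sketch: tiled multiplicative-update NMF (Lee-Seung rules).
import numpy as np

def nmf_tiled(A, k, iters=50, tile=256, eps=1e-9):
    m, n = A.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(iters):
        HHt = H @ H.T                               # small k-by-k Gram matrix
        for r in range(0, m, tile):                 # W update, one row tile at a time
            Ar, Wr = A[r:r+tile], W[r:r+tile]
            W[r:r+tile] = Wr * (Ar @ H.T) / (Wr @ HHt + eps)
        WtW = W.T @ W
        for c in range(0, n, tile):                 # H update, one column tile at a time
            H[:, c:c+tile] *= (W.T @ A[:, c:c+tile]) / (WtW @ H[:, c:c+tile] + eps)
    return W, H

A = np.abs(np.random.default_rng(1).random((1000, 600)))
W, H = nmf_tiled(A, k=8)
print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))  # relative reconstruction error
```

The key point is that the cross terms (H H^T and W^T W) are only k-by-k, so each tile update touches just one block of A plus small shared factors, which is what makes the out-of-memory batching viable.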
With the dramatic increase in diverse service requirements and mobile devices, application-specific data tends to span multiple consecutive timeslots to complete its transmission, while the demand for spectrum resources is further exacerbated. Recent works have suggested that integrating spatial frequency reuse with multi-cell networks can enhance spectral efficiency and alleviate spectrum scarcity. Hence, this paper considers a downlink multi-cell, multi-timeslot orthogonal frequency division multiple access (OFDMA) cellular system in which users keep downloading data from the base stations (BSs) until reaching a predetermined cache size. Specifically, we aim to minimize the transmission delay by jointly optimizing BS selection, subcarrier assignment, and transmit power allocation, taking into account the current cache size. Due to inter-cell interference and multi-timeslot coupling, this problem is challenging to solve directly. We prove that it can be transformed into sequential online sum-rate maximization subproblems under causal channel state information (CSI). To solve the subproblems, we first develop a centralized dynamic resource allocation algorithm based on parameter transformation and majorization-minimization (MM). In view of the trade-off between performance and complexity, we further propose a distributed algorithm built on a designed BS selection scheme and the MM approach. Simulation results demonstrate that the distributed algorithm achieves performance comparable to the centralized algorithm, and both outperform the benchmark schemes in terms of transmission delay.
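For intuition on the sum-rate power-allocation subproblem, here is a minimal sketch of classic water-filling over parallel subcarriers. It deliberately ignores the inter-cell interference and MM machinery handled in the paper; the channel gains and power budget are illustrative assumptions.

```python
# Minimal sketch: water-filling power allocation over parallel channels.
import numpy as np

def water_filling(gains, p_total):
    """Allocate p_total across channels with channel-to-noise ratios g_i,
    maximizing sum of log2(1 + g_i * p_i)."""
    order = np.argsort(gains)[::-1]          # strongest channel first
    g = gains[order]
    for k in range(len(g), 0, -1):           # shrink the active set until feasible
        mu = (p_total + np.sum(1.0 / g[:k])) / k   # water level
        p = mu - 1.0 / g[:k]
        if p[-1] >= 0:                       # weakest active channel still nonnegative
            powers = np.zeros_like(gains)
            powers[order[:k]] = p
            return powers
    return np.zeros_like(gains)

gains = np.array([3.0, 1.2, 0.4, 2.5])       # per-subcarrier channel-to-noise ratios
p = water_filling(gains, p_total=2.0)
print(p, np.sum(np.log2(1 + gains * p)))     # allocation and achieved sum rate
```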
Distributed Processing Systems (DPS) take sophisticated tasks as input and process them in a distributed manner using spread resources. In this paper, we evaluate DPS implemented on low-level System-on-Chip (SoC) interconnection architectures: mesh, concentrated mesh, and fat tree. We propose autonomous algorithms for nodes and routers and define efficiency metrics that are later used for evaluation. The evaluation is performed with a dedicated experimentation system in which a fully functional DPS is implemented. We focus on resource utilization in the interconnection networks and also present their energy consumption, including the trade-off between utilization and electrical energy consumption. The results show that the concentrated mesh is a promising interconnection network, well suited to the needs of distributed processing systems.
Transient simulation in power engineering is crucial, as it models the dynamic behavior of power systems during sudden events such as faults or short circuits. Electromagnetic transient simulations involve multiple coordinated tasks. Traditional simulations are centralized and struggle to meet scalability requirements, so distributed electromagnetic transient simulation has emerged as a new trend. However, distribution introduces network communication, and achieving real-time simulation across distributed nodes poses the challenge of minimizing communication costs. In this paper, we propose optimizing the task orchestration to reduce communication costs. Specifically, the tasks in an electromagnetic transient simulation have a fixed communication pattern in which the communication partners of each task are pre-defined. We represent this pattern as a graph, with tasks as nodes and communications as edges. We then apply graph partitioning with the objective of minimal communication cost and fine-tune the partitions according to the resource requirements of each distributed node. The experimental results demonstrate that our proposal achieves high-performance electromagnetic transient simulation.
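A minimal sketch of the graph-partitioning step described above, using the Kernighan-Lin bisection from networkx as a stand-in partitioner: tasks are nodes, edge weights are communication volumes, and the weight of the cut approximates the inter-node communication cost. The task graph itself is an illustrative assumption.

```python
# Minimal sketch: partition a task-communication graph to minimize cut cost.
import networkx as nx

G = nx.Graph()
edges = [("t0", "t1", 5), ("t1", "t2", 1), ("t2", "t3", 4),
         ("t0", "t2", 1), ("t1", "t3", 1)]
G.add_weighted_edges_from(edges)            # weight = messages per simulation step

part_a, part_b = nx.algorithms.community.kernighan_lin_bisection(G, weight="weight")
cut = sum(d["weight"] for u, v, d in G.edges(data=True)
          if (u in part_a) != (v in part_a))
print(part_a, part_b, "cut cost:", cut)     # tasks per simulation node, comm cost
```

A production partitioner (e.g., METIS-style multilevel partitioning) would also balance per-partition resource demands, which corresponds to the fine-tuning step the abstract mentions.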
Increasing the deployment of Renewable Energy Resources (RES), along with innovations in Information and Communication Technologies (ICT), allows prosumers to engage in the energy market and trade their excess energy with each other and with the main grid. To ensure efficient and safe energy trading, the Peer-to-Peer (P2P) approach has emerged as a viable paradigm that provides the necessary flexibility and coordinates energy sharing between pairs of prosumers. The P2P approach is based on decentralized energy trading between prosumers (i.e., energy producers and consumers). However, security protection and real-time transaction issues in the P2P market present serious challenges. In this paper, we propose a decentralized P2P energy trading approach for energy markets with high RES penetration. First, the proposed P2P energy market platform coordinates the energy trading between energy providers and consumers to maximize their social welfare. A distributed algorithm based on the Alternating Direction Method of Multipliers (ADMM) is applied to solve the market-clearing problem, which reduces the computational complexity. Furthermore, a P2P Manager (P2PM) utility is introduced as an entity that solves the synchronization problem between peers during market clearing. Finally, through a real-time Hardware-In-the-Loop (HIL) application, the effectiveness of the proposed P2PM is verified in terms of synchronizing the market participants and improving the power transactions.
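A minimal consensus-ADMM sketch of the market-clearing idea: each peer minimizes a private quadratic cost locally and in parallel, while a coordinator (playing the P2PM role) projects the traded quantities onto the supply-demand balance sum(p) = 0 and updates price-like dual variables. The cost curvatures, target injections, and rho are illustrative assumptions, not the paper's market model.

```python
# Minimal sketch: consensus ADMM for a toy P2P market-clearing problem.
# Each peer i minimizes a_i * (p_i - d_i)^2 subject to sum(p) = 0.
import numpy as np

a = np.array([1.0, 2.0, 1.5, 0.5])     # private cost curvatures (hypothetical)
d = np.array([3.0, -1.0, -2.5, 1.0])   # desired injections (+sell, -buy), kW
rho = 1.0
p = z = u = np.zeros_like(d)
for _ in range(100):
    p = (2 * a * d + rho * (z - u)) / (2 * a + rho)  # local peer updates (parallel)
    z = (p + u) - np.mean(p + u)                     # P2PM: project onto sum(p)=0
    u = u + p - z                                    # dual (price-like) update
print(np.round(p, 3), "balance:", round(float(p.sum()), 6))
```

The projection step is the only global operation, which mirrors the abstract's division of labor: peers solve their own subproblems, and the P2PM synchronizes them at each clearing round.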
ISBN (Print): 9781479980482
Today's networks are increasingly complex and vast, and it is difficult to gauge their characteristics. Network administrators need information about network behavior for capacity planning, quality of service requirements, and planning network expansion. Software-defined networking (SDN) is an approach that introduces an abstraction to simplify the network into two layers, one controlling the traffic and the other forwarding it. Hadoop is used for distributed processing. In this paper, we combine the abstraction of SDN with the processing power of Hadoop to propose an architecture we call the Advanced Control Distributed Processing Architecture (ACDPA), which determines flow characteristics and sets the priority of flows, essentially setting the quality of service (QoS). We provide experimental details with sample traffic showing how to set up this architecture, and we present results for traffic classification and for setting host priorities.
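As a toy illustration of the classification-then-prioritization step, the sketch below aggregates per-flow byte counts (the paper does this at scale with Hadoop) and maps heavy flows to a lower priority than small, latency-sensitive ones. The flow records and threshold are illustrative assumptions, not the ACDPA pipeline.

```python
# Minimal sketch: aggregate flow volume and assign a QoS priority class.
from collections import defaultdict

records = [("10.0.0.1", "10.0.0.9", 1500), ("10.0.0.1", "10.0.0.9", 1500),
           ("10.0.0.2", "10.0.0.9", 64), ("10.0.0.1", "10.0.0.9", 1500)]

volume = defaultdict(int)
for src, dst, nbytes in records:       # the reduce step of a word-count-style job
    volume[(src, dst)] += nbytes

def priority(total_bytes, heavy=3000):
    return "low" if total_bytes >= heavy else "high"  # elephants yield to mice

for flow, total in volume.items():
    print(flow, total, "->", priority(total))  # would drive controller flow rules
```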
ISBN (Print): 9781510822023
In this paper we present a development-cycle model for a phased-array radar with distributed processing, aimed at reducing project development time. To this end, we developed a new tool that shortens the total time required to include and modify algorithms by evaluating the impact of each one on the distributed processing resources prior to implementation. This tool avoids the common implement-test-correct loop, replacing it with iterative steps of high-level evaluation followed by a deployment that is more likely to work.