Graph Neural Networks (GNNs) have attracted growing interest across a wide range of applications owing to their outstanding ability to extract latent representations from graph-structured data. To render GNN-based services for IoT-driven smart applications, traditional model-serving paradigms usually resort to the cloud, fully uploading geo-distributed input data to remote datacenters. However, our empirical measurements reveal the significant communication overhead of such cloud-based serving and highlight the profound potential of the emerging fog computing paradigm. To maximize the architectural benefits of fog computing, in this paper we present Fograph, a novel distributed real-time GNN inference framework that leverages the diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources. By introducing heterogeneity-aware execution planning and GNN-specific compression techniques, Fograph tailors its design to the unique characteristics of GNN serving in fog environments. Prototype-based evaluation and a case study demonstrate that Fograph significantly outperforms state-of-the-art cloud serving and fog deployment, with up to 5.39x execution speedup and 6.84x throughput improvement.
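To make the heterogeneity-aware planning idea concrete, here is a minimal sketch of capacity-proportional workload partitioning: graph vertices are greedily assigned to fog nodes so that each node's aggregation load matches its relative compute power. The greedy rule, capacities, and per-vertex degrees are illustrative assumptions, not Fograph's actual planner.

```python
# Minimal sketch: capacity-proportional vertex assignment across fog nodes.
# (Hypothetical greedy heuristic; not the paper's algorithm.)
import numpy as np

def partition_by_capacity(degrees, capacities):
    """Greedily assign vertices so each node's aggregation workload
    (sum of neighbor counts) is proportional to its capacity."""
    shares = np.asarray(capacities, dtype=float)
    shares /= shares.sum()
    targets = shares * degrees.sum()          # ideal load per fog node
    loads = np.zeros(len(capacities))
    assignment = {}
    # Place heavy vertices first; pick the node with the most remaining headroom.
    for v in sorted(range(len(degrees)), key=lambda v: -degrees[v]):
        node = int(np.argmax(targets - loads))
        assignment[v] = node
        loads[node] += degrees[v]
    return assignment, loads

degrees = np.array([5, 3, 8, 2, 7, 4])        # per-vertex neighborhood sizes
capacities = [1.0, 2.0]                       # relative power of two fog nodes
assignment, loads = partition_by_capacity(degrees, capacities)
print(assignment, loads)                      # faster node gets ~2/3 of the work
```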
This paper presents a computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model that can identify the participants in a conversation without requiring a large audio database for training. An unsupervised online update mechanism based on the cosine similarity of speaker embeddings is proposed for the Federated Learning model. Moreover, the proposed diarization system addresses speaker change detection via unsupervised segmentation techniques using Hotelling's T-squared statistic and the Bayesian Information Criterion. In this new approach, speaker change detection is biased around detected quasi-silences, which reduces the severity of the trade-off between the missed detection and false detection rates. Additionally, the computational overhead of frame-by-frame speaker identification is reduced via unsupervised clustering of speech segments. The results demonstrate the effectiveness of the proposed training method in the presence of non-IID speech data. They also show a considerable reduction in false and missed detections at the segmentation stage, along with lower computational overhead. The improved accuracy and reduced computational cost make the mechanism suitable for real-time speaker diarization across a distributed IoT audio network.
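A minimal sketch of the cosine-similarity update rule described above: a new segment embedding is matched to the closest enrolled speaker centroid, which is then nudged online toward the new embedding; below a similarity threshold, a new speaker is enrolled. The threshold, learning rate, and random stand-in embeddings are illustrative assumptions.

```python
# Minimal sketch: unsupervised online speaker enrollment via cosine similarity.
# (Threshold and learning rate are hypothetical choices.)
import numpy as np

def update_speakers(embedding, centroids, threshold=0.7, lr=0.1):
    embedding = embedding / np.linalg.norm(embedding)
    if centroids:
        sims = [float(embedding @ c) for c in centroids]   # cosine (unit vectors)
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            # Online update: move the matched centroid toward the new embedding.
            c = (1 - lr) * centroids[best] + lr * embedding
            centroids[best] = c / np.linalg.norm(c)
            return best, centroids
    centroids.append(embedding)        # unseen speaker: enroll a new centroid
    return len(centroids) - 1, centroids

centroids = []
rng = np.random.default_rng(0)
for seg in rng.normal(size=(5, 128)):  # stand-in for d-vector embeddings
    speaker_id, centroids = update_speakers(seg, centroids)
    print(speaker_id)
```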
This work explores distributed processing techniques, together with recent advances in multi-agent reinforcement learning (MARL), to implement a fully decentralized reward and decision-making scheme that efficiently allocates resources (spectrum and power). The method targets processes with strong dynamics and stringent requirements, such as cellular vehicle-to-everything (C-V2X) networks. In our approach, the C-V2X network is seen as a strongly connected network of intelligent agents that adopt a distributed reward scheme in a cooperative and decentralized manner, taking into consideration their channel conditions and selected actions in order to achieve their goals cooperatively. The simulation results demonstrate the effectiveness of the developed algorithm, named Distributed Multi-Agent Reinforcement Learning (DMARL), which achieves performance very close to that of a centralized reward design, with the advantage of avoiding the limitations and vulnerabilities inherent to a fully or partially centralized solution.
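As a rough illustration of a decentralized reward in this spirit, each agent could combine its own achieved rate with rates reported by its neighbors, approximating the global objective without a central critic. The topology, rates, and weighting below are illustrative assumptions, not the DMARL reward design.

```python
# Minimal sketch: neighbor-shared local reward for cooperative agents.
# (Hypothetical reward shape; the paper's scheme may differ.)
import numpy as np

def local_reward(agent, rates, neighbors, weight=0.5):
    """Own rate plus a discounted average of the neighbors' rates."""
    own = rates[agent]
    nbrs = neighbors[agent]
    shared = np.mean([rates[n] for n in nbrs]) if nbrs else 0.0
    return own + weight * shared

rates = np.array([2.1, 1.4, 3.0, 0.8])              # per-agent rate (bit/s/Hz)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # strongly connected chain
print([local_reward(a, rates, neighbors) for a in range(len(rates))])
```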
We propose an efficient distributed out-of-memory implementation of the non-negative matrix factorization (NMF) algorithm for heterogeneous high-performance-computing systems. The proposed implementation is based on prior work on NMFk, which can perform automatic model selection and extract latent variables and patterns from data. In this work, we extend NMFk by adding support for dense and sparse matrix operations on multi-node, multi-GPU systems. The resulting algorithm is optimized for out-of-memory problems where the memory required to factorize a given matrix exceeds the available GPU memory. Memory complexity is reduced by batching/tiling strategies, and sparse and dense matrix operations are significantly accelerated with GPU cores (or tensor cores when available). Input/output latency associated with batch copies between host and device is hidden using CUDA streams to overlap data transfers and compute asynchronously, and latency associated with collective communications (both intra-node and inter-node) is reduced using optimized NVIDIA Collective Communication Library (NCCL) based communicators. Benchmark results show a significant improvement, from 32x to 76x speedup, for the new GPU implementation over the CPU-based NMFk. Good weak scaling was demonstrated on up to 4096 multi-GPU cluster nodes with approximately 25,000 GPUs when decomposing a dense 340 TB matrix and an 11 EB sparse matrix of density $10^{-6}$.
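A minimal NumPy sketch of the batching/tiling idea described above: standard multiplicative NMF updates applied one row (or column) tile of A at a time, so only a single tile needs to be resident in fast memory at once; on a GPU each tile would be staged in via a stream. Tile size and matrix shapes are illustrative, and this is not the NMFk code.

```python
# Minimal sketch: tiled multiplicative-update NMF (Lee-Seung rules).
import numpy as np

def nmf_tiled(A, k, iters=50, tile=256, eps=1e-9):
    m, n = A.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(iters):
        HHt = H @ H.T                               # small k-by-k Gram matrix
        for r in range(0, m, tile):                 # W update, one row tile at a time
            Ar, Wr = A[r:r+tile], W[r:r+tile]
            W[r:r+tile] = Wr * (Ar @ H.T) / (Wr @ HHt + eps)
        WtW = W.T @ W
        for c in range(0, n, tile):                 # H update, one column tile at a time
            H[:, c:c+tile] *= (W.T @ A[:, c:c+tile]) / (WtW @ H[:, c:c+tile] + eps)
    return W, H

A = np.abs(np.random.default_rng(1).random((1000, 600)))
W, H = nmf_tiled(A, k=8)
print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))  # relative reconstruction error
```

The key point is that the cross terms (H H^T and W^T W) are only k-by-k, so each tile update touches just one block of A plus small shared factors, which is what makes the out-of-memory batching viable.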
With the dramatic increase in diverse service requirements and mobile devices, application-specific data tends to span multiple consecutive timeslots to complete its transmission, while the demand for spectrum resources is further exacerbated. Recent works have suggested that integrating spatial frequency reuse with multi-cell networks can enhance spectral efficiency and alleviate spectrum scarcity. Hence, this paper considers a downlink multi-cell, multi-timeslot orthogonal frequency division multiple access (OFDMA) cellular system in which users keep downloading data from the base stations (BSs) until reaching a predetermined cache size. Specifically, we aim to minimize the transmission delay by jointly optimizing BS selection, subcarrier assignment, and transmit power allocation, taking into account the current cache size. Due to inter-cell interference and multi-timeslot coupling, this problem is challenging to solve directly. We prove that it can be transformed into sequential online sum-rate maximization subproblems under causal channel state information (CSI). To solve the subproblems, we first develop a centralized dynamic resource allocation algorithm based on parameter transformation and majorization-minimization (MM). In view of the trade-off between performance and complexity, we further propose a distributed algorithm built on a designed BS selection scheme and the MM approach. Simulation results demonstrate that the distributed algorithm achieves performance comparable to the centralized algorithm, and both outperform the benchmark schemes in terms of transmission delay.
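For intuition on the sum-rate power-allocation subproblem, here is a minimal sketch of classic water-filling over parallel subcarriers. It deliberately ignores the inter-cell interference and MM machinery handled in the paper; the channel gains and power budget are illustrative assumptions.

```python
# Minimal sketch: water-filling power allocation over parallel channels.
import numpy as np

def water_filling(gains, p_total):
    """Allocate p_total across channels with channel-to-noise ratios g_i,
    maximizing sum of log2(1 + g_i * p_i)."""
    order = np.argsort(gains)[::-1]          # strongest channel first
    g = gains[order]
    for k in range(len(g), 0, -1):           # shrink the active set until feasible
        mu = (p_total + np.sum(1.0 / g[:k])) / k   # water level
        p = mu - 1.0 / g[:k]
        if p[-1] >= 0:                       # weakest active channel still nonnegative
            powers = np.zeros_like(gains)
            powers[order[:k]] = p
            return powers
    return np.zeros_like(gains)

gains = np.array([3.0, 1.2, 0.4, 2.5])       # per-subcarrier channel-to-noise ratios
p = water_filling(gains, p_total=2.0)
print(p, np.sum(np.log2(1 + gains * p)))     # allocation and achieved sum rate
```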
Distributed Processing Systems (DPS) take sophisticated tasks as input and process them in a distributed manner using spread resources. In this paper, we evaluate DPS implemented on low-level System-on-Chip (SoC) interconnection architectures: mesh, concentrated mesh, and fat tree. We propose autonomous algorithms for nodes and routers and define efficiency metrics that are later used for evaluation. The evaluation is performed with a dedicated experimentation system in which a fully functional DPS is implemented. We focus on resource utilization in the interconnection networks and also present their energy consumption, including the trade-off between utilization and electrical energy consumption. The results show that the concentrated mesh is a promising interconnection network, well suited to the needs of distributed processing systems.
Transient simulation in power engineering is crucial, as it models the dynamic behavior of power systems during sudden events such as faults or short circuits. Electromagnetic transient simulations involve multiple coordinated tasks. Traditional simulations are centralized and struggle to meet scalability requirements, so distributed electromagnetic transient simulation has emerged as a new trend. However, distribution introduces network communication, and achieving real-time simulation across distributed nodes poses the challenge of minimizing communication costs. In this paper, we propose optimizing the task orchestration to reduce communication costs. Specifically, the tasks in an electromagnetic transient simulation have a fixed communication pattern in which the communication partners of each task are pre-defined. We represent this pattern as a graph, with tasks as nodes and communications as edges. We then apply graph partitioning with the objective of minimal communication cost and fine-tune the partitions according to the resource requirements of each distributed node. The experimental results demonstrate that our proposal achieves high-performance electromagnetic transient simulation.
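A minimal sketch of the graph-partitioning step described above, using the Kernighan-Lin bisection from networkx as a stand-in partitioner: tasks are nodes, edge weights are communication volumes, and the weight of the cut approximates the inter-node communication cost. The task graph itself is an illustrative assumption.

```python
# Minimal sketch: partition a task-communication graph to minimize cut cost.
import networkx as nx

G = nx.Graph()
edges = [("t0", "t1", 5), ("t1", "t2", 1), ("t2", "t3", 4),
         ("t0", "t2", 1), ("t1", "t3", 1)]
G.add_weighted_edges_from(edges)            # weight = messages per simulation step

part_a, part_b = nx.algorithms.community.kernighan_lin_bisection(G, weight="weight")
cut = sum(d["weight"] for u, v, d in G.edges(data=True)
          if (u in part_a) != (v in part_a))
print(part_a, part_b, "cut cost:", cut)     # tasks per simulation node, comm cost
```

A production partitioner (e.g., METIS-style multilevel partitioning) would also balance per-partition resource demands, which corresponds to the fine-tuning step the abstract mentions.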
Increasing the deployment of Renewable Energy Resources (RES), along with innovations in Information and Communication Technologies (ICT), allows prosumers to engage in the energy market and trade their excess energy with each other and with the main grid. To ensure efficient and safe energy trading, the Peer-to-Peer (P2P) approach has emerged as a viable paradigm that provides the necessary flexibility and coordinates energy sharing between pairs of prosumers. The P2P approach is based on decentralized energy trading between prosumers (i.e., energy producers and consumers). However, security protection and real-time transaction issues in the P2P market present serious challenges. In this paper, we propose a decentralized P2P energy trading approach for energy markets with high RES penetration. First, the proposed P2P energy market platform coordinates the energy trading between energy providers and consumers to maximize their social welfare. A distributed algorithm based on the Alternating Direction Method of Multipliers (ADMM) is applied to solve the market-clearing problem, which reduces the computational complexity. Furthermore, a P2P Manager (P2PM) utility is introduced as an entity that solves the synchronization problem between peers during market clearing. Finally, through a real-time Hardware-In-the-Loop (HIL) application, the effectiveness of the proposed P2PM is verified in terms of synchronizing the market participants and improving the power transactions.
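A minimal consensus-ADMM sketch of the market-clearing idea: each peer minimizes a private quadratic cost locally and in parallel, while a coordinator (playing the P2PM role) projects the traded quantities onto the supply-demand balance sum(p) = 0 and updates price-like dual variables. The cost curvatures, target injections, and rho are illustrative assumptions, not the paper's market model.

```python
# Minimal sketch: consensus ADMM for a toy P2P market-clearing problem.
# Each peer i minimizes a_i * (p_i - d_i)^2 subject to sum(p) = 0.
import numpy as np

a = np.array([1.0, 2.0, 1.5, 0.5])     # private cost curvatures (hypothetical)
d = np.array([3.0, -1.0, -2.5, 1.0])   # desired injections (+sell, -buy), kW
rho = 1.0
p = z = u = np.zeros_like(d)
for _ in range(100):
    p = (2 * a * d + rho * (z - u)) / (2 * a + rho)  # local peer updates (parallel)
    z = (p + u) - np.mean(p + u)                     # P2PM: project onto sum(p)=0
    u = u + p - z                                    # dual (price-like) update
print(np.round(p, 3), "balance:", round(float(p.sum()), 6))
```

The projection step is the only global operation, which mirrors the abstract's division of labor: peers solve their own subproblems, and the P2PM synchronizes them at each clearing round.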
ISBN (Print): 9781479980482
Today's networks are increasingly complex and vast, and it is difficult to gauge their characteristics. Network administrators need information about network behavior for capacity planning, quality of service requirements, and planning network expansion. Software-defined networking (SDN) is an approach that introduces an abstraction to simplify the network into two layers, one controlling the traffic and the other forwarding it. Hadoop is used for distributed processing. In this paper, we combine the abstraction of SDN with the processing power of Hadoop to propose an architecture we call the Advanced Control Distributed Processing Architecture (ACDPA), which determines flow characteristics and sets the priority of flows, essentially setting the quality of service (QoS). We provide experimental details with sample traffic showing how to set up this architecture, and we present results for traffic classification and for setting host priorities.
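As a toy illustration of the classification-then-prioritization step, the sketch below aggregates per-flow byte counts (the paper does this at scale with Hadoop) and maps heavy flows to a lower priority than small, latency-sensitive ones. The flow records and threshold are illustrative assumptions, not the ACDPA pipeline.

```python
# Minimal sketch: aggregate flow volume and assign a QoS priority class.
from collections import defaultdict

records = [("10.0.0.1", "10.0.0.9", 1500), ("10.0.0.1", "10.0.0.9", 1500),
           ("10.0.0.2", "10.0.0.9", 64), ("10.0.0.1", "10.0.0.9", 1500)]

volume = defaultdict(int)
for src, dst, nbytes in records:       # the reduce step of a word-count-style job
    volume[(src, dst)] += nbytes

def priority(total_bytes, heavy=3000):
    return "low" if total_bytes >= heavy else "high"  # elephants yield to mice

for flow, total in volume.items():
    print(flow, total, "->", priority(total))  # would drive controller flow rules
```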
ISBN (Print): 9781510822023
In this paper we present a development-cycle model for a phased-array radar with distributed processing, aimed at reducing project development time. To this end, we developed a new tool that shortens the total time required to include and modify algorithms by evaluating the impact of each one on the distributed processing resources prior to implementation. This tool avoids the common implement-test-correct loop, replacing it with iterative steps of high-level evaluation followed by a deployment that is more likely to work.