ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Recurrent neural networks (RNNs) are widely used in control systems due to their dynamic capabilities. However, the control accuracy of RNN-based systems can be compromised by noise interference, and there has been little research on RNN-based control in disturbed multi-agent systems. To address this, we developed an enhanced distributed RNN (DRNN) structure and proposed a Novel DRNN-based Control Protocol (NDRNN-CP). The enhancement introduces a time-delay component that allows the protocol to adaptively learn noise variation patterns. As a result, the NDRNN-CP effectively resists various periodic noise interferences and achieves more precise control of each agent. Additionally, our optimized activation function ensures that all agents reach consensus within a predefined time. To demonstrate the advantages of NDRNN-CP, we conducted extensive experiments that confirmed its significant improvements in noise resistance and convergence performance.
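The delayed-feedback idea is the core of this abstract: feeding back the state from some steps earlier lets the network correlate the current error with an earlier point in a periodic disturbance. A minimal sketch, assuming a simple tanh recurrence and an arbitrary delay length (the weights and delay below are illustrative; the paper's actual update rule and predefined-time activation are not reproduced here):

```python
import numpy as np

class DelayedRNNCell:
    """Recurrent unit with an added time-delay term (illustrative only)."""

    def __init__(self, n, delay=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n, n))    # recurrent weights
        self.Wd = rng.normal(scale=0.1, size=(n, n))   # weights on the delayed state
        self.buffer = [np.zeros(n) for _ in range(delay)]  # ring buffer of past states

    def step(self, h, u):
        h_delayed = self.buffer.pop(0)   # state from `delay` steps ago
        self.buffer.append(h.copy())
        # The delayed feedback term is what lets the cell track a periodic
        # disturbance whose period roughly matches the delay length.
        return np.tanh(self.W @ h + self.Wd @ h_delayed + u)
```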
Graph neural networks (GNNs) have delivered remarkable results in various fields. However, the rapid increase in the scale of graph data has introduced significant performance bottlenecks for GNN inference. Both compu...
Solar photovoltaic (PV) power prediction is easily affected by weather factors. To reduce PV power prediction deviation and improve prediction accuracy, a distributed solar photov...
This paper presents a work in progress that aims to reduce the overall training and processing time of feed-forward multi-layer neural networks. If the network is large, processing is expensive in terms of both time and space. In this paper, we suggest a cost-effective and presumably faster processing technique by utilizing a heterogeneous distributed system composed of a set of commodity computers connected by a local area network. Neural network computations can be viewed as a set of matrix multiplication processes, which can be adapted to utilize existing matrix multiplication algorithms tailored for such systems. With Java technology as an implementation means, we discuss the different factors that should be considered to achieve this goal, highlighting some issues that might affect the proposed implementation.
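The key observation, that a feed-forward pass reduces to matrix multiplications that can be partitioned across machines, can be sketched as a row-block decomposition. The sketch below uses local processes in place of the paper's LAN-connected commodity computers, and Python in place of its Java implementation:

```python
import numpy as np
from multiprocessing import Pool

def slice_matvec(args):
    W_block, x = args
    return W_block @ x           # each worker multiplies its row slice only

def distributed_forward(W, x, n_workers=4):
    blocks = np.array_split(W, n_workers, axis=0)  # partition rows of W
    with Pool(n_workers) as pool:
        parts = pool.map(slice_matvec, [(b, x) for b in blocks])
    return np.concatenate(parts)                   # reassemble the layer output

if __name__ == "__main__":
    W, x = np.random.rand(1000, 784), np.random.rand(784)
    assert np.allclose(distributed_forward(W, x), W @ x)
```

On a real LAN the communication cost of shipping x and collecting the partial results is the limiting factor, which is exactly the kind of issue the paper flags for its heterogeneous setting.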
ISBN (digital): 9798350317152
ISBN (print): 9798350317169
Privacy preservation is critical for neural network inference, which often involves collaborative execution by different parties to make predictions on sensitive data based on sensitive neural network models. However, the expensive cryptographic operations of privacy preservation also pose performance challenges to neural network inference. We address this performance-security tension by designing PP-Stream, a distributed stream processing system for high-performance privacy-preserving neural network inference. PP-Stream adopts hybrid privacy-preserving mechanisms for the linear and non-linear operations of neural network inference. It treats inference data as real-time data streams and parallelizes the inference operations across multiple pipelined stages executed by multiple servers and threads. It also solves load-balanced resource allocation across servers and threads as an optimization problem. We prototype PP-Stream and show via testbed experiments that it achieves low inference latencies on various neural network models.
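The pipelined-stage design can be sketched with threads and bounded queues. The stage functions below are trivial placeholders; in PP-Stream the stages would wrap the cryptographic linear and non-linear operators instead:

```python
import queue
import threading

def stage_worker(fn, q_in, q_out):
    while True:
        item = q_in.get()
        if item is None:          # sentinel: propagate shutdown downstream
            q_out.put(None)
            break
        q_out.put(fn(item))

def run_pipeline(stages, inputs):
    # One bounded queue between each pair of adjacent stages.
    qs = [queue.Queue(maxsize=8) for _ in range(len(stages) + 1)]
    for fn, q_in, q_out in zip(stages, qs, qs[1:]):
        threading.Thread(target=stage_worker, args=(fn, q_in, q_out),
                         daemon=True).start()
    for x in inputs:              # stream samples in as they arrive
        qs[0].put(x)
    qs[0].put(None)
    results = []
    while (y := qs[-1].get()) is not None:
        results.append(y)
    return results

# e.g. a "linear" stage followed by a "non-linear" stage:
print(run_pipeline([lambda x: 2 * x + 1, lambda x: max(x, 0)], range(10)))
```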
Massive MIMO systems are promising for wireless communications beyond 5G, but scalable Direction-of-Arrival (DOA) estimation in these systems is challenging due to the increasing number of required antennas. Existing solutions, whether model-based or data-driven (typically using neural networks), face scalability issues as the antenna array grows. To address this issue, we propose a hybrid system that makes the overall approach scalable. In the front-end, we employ a modular distributed approach, namely the method of sparse linear inverse, to compute a proxy spectrum from the sampled covariance matrix of the antenna subarrays. The proxy drives a fixed lightweight back-end consisting of a 1-dimensional convolutional neural network (1D-CNN) and a simplified peak extraction. Because the proxy dimension is independent of the antenna count, the neural network input is invariant to the array size, enabling it to handle multiple array sizes without any modification of the network structure. To reduce the computation of the covariance matrix and proxy spectrum, we employ a system of subarrays with nearest-neighbor communication. The proposed approach was implemented on a Xilinx ZCU102 FPGA targeting a 100 MHz frequency for 8- to 256-element arrays. We achieve below 1 ms processing time for an array of 256 antennas while requiring significantly less computation than both model-based and data-driven approaches for large antenna arrays.
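A sketch of the fixed back-end, assuming a 256-bin proxy spectrum; the layer sizes and the peak-extraction rule are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ProxyCNN(nn.Module):
    """Small 1D-CNN over a fixed-length proxy spectrum (sizes assumed)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(8, 1, kernel_size=5, padding=2),
        )

    def forward(self, proxy):                        # proxy: (batch, n_bins)
        # Same network regardless of antenna count, because the front-end
        # always reduces the array to an n_bins-long proxy.
        return self.net(proxy.unsqueeze(1)).squeeze(1)

def extract_peaks(spectrum, k=3):
    # Simplified peak extraction: indices of the k largest local maxima.
    s = spectrum.detach()
    local_max = (s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])
    idx = torch.nonzero(local_max).flatten() + 1
    return idx[s[idx].argsort(descending=True)][:k]

model = ProxyCNN()
proxy = torch.rand(1, 256)               # one 256-bin proxy spectrum
print(extract_peaks(model(proxy)[0]))    # estimated DOA bin indices
```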
This paper presents the development and evaluation of a distributed system employing low-latency embedded field-programmable gate arrays (FPGAs) to optimize scheduling for deep learning (DL) workloads and to configure multiple deep learning accelerator (DLA) architectures. Aimed at advancing FPGA applications in real-time edge computing, this study focuses on achieving optimal latency for a distributed computing system. A novel methodology was adopted, using configurable hardware to examine clusters of DLAs varying in architecture and scheduling technique. The system demonstrated its capability to parallel-process diverse neural network (NN) models, manage compute graphs in a pipelined sequence, and allocate computational resources efficiently to intensive NN layers. We examined five configurable DLAs, namely the Versatile Tensor Accelerator (VTA), Nvidia DLA (NVDLA), Xilinx Deep Processing Unit (DPU), Tensil Compute Unit (CU), and Pipelined Convolutional Neural Network (PipeCNN), across two FPGA cluster types consisting of Zynq-7000 and Zynq UltraScale+ System-on-Chip (SoC) processors, respectively. Four deep neural network (DNN) workloads were tested: Scatter-Gather, AI Core Assignment, Pipeline Scheduling, and Fused Scheduling. These methods revealed an exponential decay in processing time, with up to a 90% speedup, although deviations were noted depending on the workload and cluster configuration. This research substantiates FPGAs' utility in adaptable, efficient DL deployment, setting a precedent for future experimental configurations and performance benchmarks.
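As a toy illustration of the core-assignment idea named above, a greedy least-loaded placement of layers onto DLAs; the cost figures, DLA names used as labels, and the greedy algorithm itself are assumptions for illustration, not the paper's schedulers:

```python
import heapq

def assign_layers(layer_costs, accelerators):
    """Place each layer on the currently least-loaded accelerator."""
    heap = [(0.0, name) for name in accelerators]   # (accumulated load, DLA)
    heapq.heapify(heap)
    placement = {}
    # Heaviest layers first, so big costs get balanced early.
    for layer, cost in sorted(layer_costs.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)            # least-loaded DLA
        placement[layer] = name
        heapq.heappush(heap, (load + cost, name))
    return placement

print(assign_layers({"conv1": 4.0, "conv2": 6.0, "fc": 1.0},
                    ["VTA", "NVDLA", "DPU"]))
```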
For ECG signal processing, information extraction from a noisy background is the fundamental objective. Filtering (noise suppression, baseline wander elimination) is a very important step in efficient ECG feature extraction, enhancing the performance of automatic detection and classification of different cardiac diseases. In this paper we use distributed approximating functional (DAF) wavelets to develop algorithms for signal approximation and filtering. These algorithms use a moving-average artificial neural network with a wavelet-type Hermite activation function. They are evaluated in MATLAB with signals from the MIT-BIH arrhythmia database, and comparisons are made with classical artificial neural networks (with radial basis function and sigmoid-type activation functions). New functions were created and integrated into the MATLAB environment. The outcomes indicate a good tradeoff between accuracy and response time, making this type of algorithm desirable also for real-time implementation.
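A sketch of what a wavelet-type Hermite activation looks like: a (physicists') Hermite polynomial times a Gaussian envelope, which for n = 2 is the familiar "Mexican hat" shape. Its use inside a moving-average filter below is an assumption about the setup, not the paper's exact formulation:

```python
import numpy as np

def hermite_activation(x, n=2):
    # H_n(x) * exp(-x^2 / 2): a wavelet-shaped, localized activation.
    H = np.polynomial.hermite.Hermite.basis(n)   # physicists' Hermite H_n
    return H(x) * np.exp(-x**2 / 2.0)

def filter_ecg(signal, width=16):
    # Moving-average style smoothing using Hermite-wavelet taps.
    t = np.linspace(-3, 3, width)
    taps = hermite_activation(t)
    taps /= np.abs(taps).sum()                   # normalize filter gain
    return np.convolve(signal, taps, mode="same")
```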
Deep learning has emerged as a cornerstone technology across various domains, from image classification to natural language processing. However, the computational and data demands of training large-scale neural networks pose significant challenges. Distributed learning approaches, particularly those leveraging data parallelism, have become critical to addressing these challenges. Among these, the parameter server architecture stands out as a widely adopted and scalable solution, enabling efficient training of large models across distributed systems. This survey provides a comprehensive exploration of the parameter server architecture, detailing its design principles and operation. It categorizes and critically analyzes research advancements across five key aspects: consistency control, network optimization, parameter management, straggler handling, and fault tolerance. By synthesizing insights from a wide range of studies, this work highlights the trade-offs and practical effectiveness of various approaches while identifying open challenges and future research directions. The survey aims to serve as a foundational resource for researchers and practitioners striving to enhance the performance and scalability of distributed deep learning systems.
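The parameter server pattern the survey covers can be sketched in a few lines: workers pull the current parameters, compute gradients on their data shard, and push them back for an update. A minimal single-process sketch, assuming synchronous (BSP) consistency and a least-squares toy model for simplicity:

```python
import numpy as np

class ParameterServer:
    def __init__(self, dim, lr=0.1):
        self.w, self.lr = np.zeros(dim), lr

    def push(self, grads):
        # Aggregate one synchronous round of worker gradients, then update.
        self.w -= self.lr * np.mean(grads, axis=0)

    def pull(self):
        return self.w.copy()

def worker_grad(w, X, y):
    # Least-squares gradient on this worker's shard.
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
shards = np.array_split(np.arange(100), 4)       # 4 data-parallel workers
ps = ParameterServer(dim=5)
for _ in range(50):                              # one pull/push per round
    w = ps.pull()
    ps.push([worker_grad(w, X[s], y[s]) for s in shards])
```

The survey's five aspects map directly onto this skeleton: when push may lag pull (consistency control), how gradients cross the network (network optimization), how w is sharded (parameter management), what to do with slow workers (straggler handling), and what happens when a server fails (fault tolerance).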