Large-scale distributed neural networks have demonstrated promise for various inference tasks in Internet of Things (IoT) devices, including intelligent security monitoring and defense against network threats. However...
The hybrid algorithm strategy proposed in this paper aims to combine optimal power flow with voltage-var optimization to meet the load demand, reduce transmission line losses, and maintain the voltage within a practicable range. A distributed neural network algorithm is used to seek an optimal active power flow solution that minimizes the cost of active power. To ensure that the optimal power flow does not seriously affect the stability of the power grid, voltage-var optimization engines employing multi-algorithm coordination are presented to optimize the power grid losses and the bus voltages. Simulation on the IEEE 30-bus system shows that the proposed hybrid algorithm strategy not only minimizes the cost of active power generation but also satisfies the load demand while keeping all bus voltages within the reference range. Comparisons of power-loss percentages verify that the proposed hybrid algorithm strategy effectively decreases the transmission line losses of the power grid without seriously affecting grid stability.
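For context, a minimal sketch of the standard quadratic-cost optimal power flow objective that an active-power-cost minimization of this kind typically uses; the cost coefficients a_i, b_i, c_i and the constraint set shown here are illustrative assumptions from the common OPF formulation, not taken from the paper:

% Standard quadratic generation-cost OPF objective (illustrative, not the paper's exact formulation)
\min_{P_G}\; \sum_{i \in \mathcal{G}} \left( a_i P_{G_i}^2 + b_i P_{G_i} + c_i \right)
\quad \text{s.t.} \quad
\sum_{i \in \mathcal{G}} P_{G_i} = P_D + P_{\mathrm{loss}}, \qquad
V_k^{\min} \le V_k \le V_k^{\max} \;\; \forall k

The voltage-var stage described in the abstract then adjusts reactive-power controls so that the bus-voltage bounds above hold while line losses P_loss stay small.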
The increase in the amount of neural recording data will bring challenges for data communication, which is constrained by either limited bandwidth or a limited power budget. Real-time neural spike detection and classification will greatly re...
Maintaining proper security in the constantly changing world of distributed computing is more crucial than ever, given the rising complexity and variety of cyber threats. In this study, we describe a ...
Recent advances in Artificial Intelligence (AI) have accelerated the adoption of AI at a pace never seen before. Large Language Models (LLMs) with tens of billions of parameters show the crucial importance of par...
Scaling deep neural network training to more processors and larger batch sizes is key to reducing end-to-end training time; yet, maintaining comparable convergence and hardware utilization at larger scales is challenging. Increases in training scales have enabled natural gradient optimization methods as a reasonable alternative to stochastic gradient descent and variants thereof. Kronecker-factored Approximate Curvature (K-FAC), a natural gradient method, preconditions gradients with an efficient approximation of the Fisher Information Matrix to improve per-iteration progress when optimizing an objective function. Here we propose a scalable K-FAC algorithm and investigate K-FAC's applicability in large-scale deep neural network training. Specifically, we explore layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling, with the goal of preserving convergence while minimizing training time. We evaluate the convergence and scaling properties of our K-FAC gradient preconditioner for image classification, object detection, and language modeling applications. In all applications, our implementation converges to baseline performance targets in 9-25% less time than standard first-order optimizers on GPU clusters across a variety of scales.
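As background, a minimal NumPy sketch of the per-layer K-FAC preconditioning step for a dense layer: the Fisher block is approximated by a Kronecker product of an input-activation covariance and a pre-activation-gradient covariance, whose inverses are applied on either side of the gradient. The factor names, damping value, and toy shapes are illustrative assumptions; the paper's distributed, inverse-free variant is not reproduced here.

import numpy as np

def kfac_precondition(grad_W, a, g, damping=1e-3):
    """Precondition a dense layer's gradient with Kronecker factors.

    grad_W : (out_dim, in_dim) gradient of the loss w.r.t. the weight matrix
    a      : (batch, in_dim)  layer input activations
    g      : (batch, out_dim) gradients w.r.t. the layer's pre-activations
    """
    # Kronecker factors of the Fisher approximation: F ~ A (x) G
    A = a.T @ a / a.shape[0]          # (in_dim, in_dim) input covariance
    G = g.T @ g / g.shape[0]          # (out_dim, out_dim) output-gradient covariance

    # Tikhonov damping keeps the factor inverses well conditioned
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))

    # (A (x) G)^-1 vec(grad) corresponds to G^-1 @ grad_W @ A^-1 in matrix form
    return G_inv @ grad_W @ A_inv

# Toy usage: one linear layer, batch of 32
rng = np.random.default_rng(0)
a = rng.standard_normal((32, 64))     # inputs
g = rng.standard_normal((32, 10))     # pre-activation gradients
grad_W = g.T @ a / 32                 # plain gradient for a linear layer
precond = kfac_precondition(grad_W, a, g)
print(precond.shape)                  # (10, 64)

The strategies the abstract lists (layer-wise distribution, inverse-free evaluation, decoupled updates) all target the expensive factor inversions and their placement across workers.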
ISBN: 9781665481069 (Print)
As the number of edge devices with computing resources (e.g., embedded GPUs, mobile phones, and laptops) increases, recent studies demonstrate that it can be beneficial to collaboratively run convolutional neural network (CNN) inference on more than one edge device. However, these studies make strong assumptions about the devices' conditions, and their application is far from practical. In this work, we propose a general method, called DistrEdge, to provide CNN inference distribution strategies in environments with multiple IoT edge devices. By addressing heterogeneity in devices, network conditions, and the nonlinear characteristics of CNN computation, DistrEdge adapts to a wide range of cases (e.g., different network conditions and various device types) using deep reinforcement learning. We use the latest embedded AI computing devices (e.g., NVIDIA Jetson products) to construct heterogeneous device configurations in the experiments. Based on our evaluations, DistrEdge can properly adjust the distribution strategy according to the devices' computing characteristics and the network conditions, achieving a 1.1x to 3x speedup compared to state-of-the-art methods.
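To make the distribution problem concrete, a minimal sketch of the kind of cost model such a strategy has to reason about: per-device compute latency plus transfer latency for a candidate layer-wise split of a CNN across two heterogeneous devices. The device throughputs, layer costs, and bandwidth below are made-up illustrations, and the exhaustive search stands in for DistrEdge's actual deep-reinforcement-learning policy.

# Illustrative cost model for splitting CNN layers across two edge devices.
# All numbers are hypothetical; DistrEdge learns its strategy with deep RL instead.

layer_gflops = [0.2, 0.8, 1.6, 1.6, 0.4]        # compute cost per layer (GFLOPs)
layer_out_mb = [3.1, 1.6, 0.8, 0.4, 0.1]         # activation size after each layer (MB)
device_gflops_s = {"jetson_nano": 15.0, "jetson_xavier": 120.0}  # sustained throughput
bandwidth_mb_s = 12.0                            # link between the two devices

def split_latency(cut, fast_first=False):
    """Latency (s) if layers [0, cut) run on one device and [cut, n) on the other."""
    first, second = (("jetson_xavier", "jetson_nano") if fast_first
                     else ("jetson_nano", "jetson_xavier"))
    t_first = sum(layer_gflops[:cut]) / device_gflops_s[first]
    t_xfer = layer_out_mb[cut - 1] / bandwidth_mb_s if 0 < cut < len(layer_gflops) else 0.0
    t_second = sum(layer_gflops[cut:]) / device_gflops_s[second]
    return t_first + t_xfer + t_second

best = min(((split_latency(c), c) for c in range(1, len(layer_gflops))), key=lambda x: x[0])
print(f"best cut after layer {best[1]}: {best[0] * 1000:.1f} ms")

The heterogeneity the abstract emphasizes shows up here as differing device throughputs and link bandwidths; the learned policy has to trade extra compute on a slow device against transferring large early-layer activations.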
Zeroing neural network (ZNN), an effective method for tracking solutions of dynamic equations, has been developed and improved by various strategies, typically the application of nonlinear activation functions (AFs) and varying parameters (VPs). Unlike VPs, AFs applied in ZNN models act directly on the real-time error, and the processing unit of the AF needs to obtain the neural state in real time. In the implementation process, highly nonlinear AFs become an important cause of time delays, which eventually lead to instability and oscillation. However, most studies focus on exploring new theoretically valid AFs to improve the performance of ZNNs while ignoring the adverse effects of highly nonlinear AFs. The nonlinearity of AFs requires us to fully consider the time-delay tolerance of ZNNs using nonlinear AFs, so as to ensure that the model does not become unstable even when disturbed by time delays. In this work, a delay-perturbed generalized ZNN (DP-GZNN) is proposed to investigate the time-delay tolerance of the generalized ZNN (G-ZNN) in solving the dynamic Lyapunov equation. Considering the nonlinearity of AFs, two delay terms are added to the G-ZNN, from which the DP-GZNN is derived. After rigorous mathematical derivations, sufficient conditions in a linear matrix inequality (LMI) form are presented for the global convergence of DP-GZNN. Through extensive numerical experiments, the hyperparameters involved in the analysis are discussed in detail. Comparative simulations are also conducted to compare the ability of different ZNN models to resist time delays. It is worth mentioning that this is the first work to consider the ability of G-ZNN to resist discrete and distributed time delays.
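For reference, a minimal sketch of the standard ZNN construction for the dynamic Lyapunov equation, which the G-ZNN and DP-GZNN models build on; the notation follows common ZNN usage, and the added delay terms of DP-GZNN are not written out here:

% Standard ZNN design for the dynamic Lyapunov equation (common formulation, not the paper's delayed model)
% Error function measuring how far X(t) is from solving the equation:
E(t) = A^{\mathsf T}(t)\,X(t) + X(t)\,A(t) + C(t)
% ZNN evolution law: drive the error to zero through an activation function \Phi with gain \gamma > 0
\dot{E}(t) = -\gamma\,\Phi\bigl(E(t)\bigr)

Because \Phi is applied to E(t) element-wise at every instant, any delay in evaluating a highly nonlinear \Phi perturbs this evolution law, which is the effect the delay-perturbed model is designed to analyze.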
Distributed DNN inference is becoming increasingly important as the demand for intelligent services at the network edge grows. By leveraging the power of distributed computing, edge devices can perform complicated and...
ISBN: 9781450397339 (Print)
As the field of deep learning progresses and neural networks become larger, training them has become a demanding and time-consuming task. To tackle this problem, distributed deep learning must be used to scale the training of deep neural networks to many workers. Synchronous algorithms, commonly used for distributing the training, are susceptible to faulty or straggling workers. Asynchronous algorithms do not suffer from the problems of synchronization, but introduce a new problem known as staleness. Staleness is caused by applying out-of-date gradients, and it can greatly hinder the convergence process. Furthermore, asynchronous algorithms that incorporate momentum often require keeping a separate momentum buffer for each worker, which costs additional memory proportional to the number of workers. We introduce a new asynchronous method, SMEGA(2), which requires a single momentum buffer regardless of the number of workers. Our method works in a way that lets us estimate the future position of the parameters, thereby minimizing the staleness effect. We evaluate our method on the CIFAR and ImageNet datasets and show that SMEGA(2) outperforms existing methods in terms of final test accuracy while scaling up to as many as 64 asynchronous workers. Open-source code: https://***/rafi-cohen/SMEGA2
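To illustrate the general idea rather than the exact SMEGA(2) update rule, a minimal sketch of a parameter server that keeps one shared momentum buffer for all workers and hands out a momentum-based look-ahead of the parameters, so the gradient each worker computes is evaluated closer to where the parameters will be when it is finally applied. The class name, look-ahead horizon, and toy objective are illustrative assumptions.

import numpy as np

class AsyncMomentumServer:
    """Toy parameter server with a single shared momentum buffer.

    Workers pull a look-ahead estimate of the parameters, compute a gradient
    on it, and push the gradient back whenever they finish (asynchronously).
    """

    def __init__(self, params, lr=0.1, beta=0.9, lookahead_steps=4):
        self.params = params.copy()            # shared model parameters
        self.momentum = np.zeros_like(params)  # ONE buffer, independent of worker count
        self.lr, self.beta = lr, beta
        self.lookahead_steps = lookahead_steps

    def pull(self):
        # Estimate where the parameters will drift over the next few steps under
        # momentum alone, so stale gradients land closer to their point of application.
        horizon = sum(self.beta ** k for k in range(1, self.lookahead_steps + 1))
        return self.params - self.lr * horizon * self.momentum

    def push(self, grad):
        # Standard momentum update applied with whatever gradient arrives next.
        self.momentum = self.beta * self.momentum + grad
        self.params -= self.lr * self.momentum

# Toy usage: minimize f(x) = ||x||^2; a single loop stands in for asynchronous workers.
server = AsyncMomentumServer(np.array([5.0, -3.0]))
for step in range(50):
    x_est = server.pull()        # worker pulls a look-ahead copy
    server.push(2.0 * x_est)     # worker pushes grad of ||x||^2 at that copy
print(server.params)             # converges toward [0, 0]

The memory argument in the abstract is visible here: the buffer lives on the server and is shared, so its size does not grow with the number of workers.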