ISBN (print): 9781617823800
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks.
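As a toy illustration of distributed dual averaging of subgradients (a minimal sketch under our own assumptions: ring topology, absolute-loss objectives f_i(x) = |x - a_i|, and a 1/sqrt(t) step size; not the paper's exact algorithm), each agent mixes its neighbours' dual variables through a doubly stochastic matrix and adds its local subgradient:

```python
import numpy as np

# Toy decentralized dual averaging. The global optimum of sum_i |x - a_i|
# is the median of the a_i, so every agent should converge there.
n, T = 5, 2000
a = np.array([0.0, 1.0, 2.0, 3.0, 10.0])       # local targets; median = 2
P = np.zeros((n, n))                            # doubly stochastic ring mixing
for i in range(n):
    P[i, i] = P[i, (i - 1) % n] = P[i, (i + 1) % n] = 1 / 3

z = np.zeros(n)                                 # accumulated (dual) subgradients
x = np.zeros(n)                                 # primal iterates, one per agent
x_avg = np.zeros(n)                             # running averages (rates hold for these)
for t in range(1, T + 1):
    g = np.sign(x - a)                          # local subgradient of |x - a_i|
    z = P @ z + g                               # mix neighbours' duals, add own subgradient
    x = -z / np.sqrt(t)                         # dual-averaging (projection) step
    x_avg += (x - x_avg) / t

print(np.round(x_avg, 1))                       # each agent close to the median 2.0
```

Slower-mixing topologies (smaller spectral gap) need proportionally more iterations before the agents agree, which is exactly the network-dependence the abstract quantifies.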
ISBN (print): 9781538674628
The recent impressive growth of AI applications across the most diverse heterogeneous domains is largely driven by the availability of hardware accelerators, from the backstage of data centers (such as TPUs, Tensor Processing Units, or VPUs, Visual Processing Units) to the far edge of embedded devices equipped with DPUs (Deep Learning Processing Units). High-level toolchains for friendlier usability of these platforms had similar relevance in this process. In this paper we consider edge devices that provide an essential contribution to the deployment of "distributed intelligence" and are typically used at the gateway, CPE, or edge-computing level. One typical assumption is that Field Programmable Gate Arrays (FPGAs) are far more expensive, with respect to power consumption, than legacy SBCs (single-board computers). The main contribution of the paper is a fair comparison (at the same clock frequency and with the same main CPU) of the processing time and power consumption of two different boards used for deep neural network classification. We highlight the relevance of classification speed with respect to the common KPIs adopted to compare the performance of automatic classification, such as Loss, Precision, and Recall. This is particularly relevant in the challenging domain of hardware-accelerated real-time control loops providing distributed intelligence at the application level, but also in the inner functions of emerging networking architectures.
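For reference, the classification KPIs named above reduce to simple ratios over confusion-matrix counts; a minimal sketch (the counts are made up for illustration):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts: 90 true positives, 10 false positives, 30 false negatives.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r)   # 0.9 0.75
```

Throughput (classifications per second) is the extra axis the paper adds to these accuracy-style KPIs when comparing boards.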
ISBN (print): 9781713845393
While Attention has come to be an important mechanism in deep learning, there remains limited intuition for why it works so well. Here, we show that Transformer Attention can be closely related, under certain data conditions, to Kanerva's Sparse Distributed Memory (SDM), a biologically plausible associative memory model. We confirm that these conditions are satisfied in pre-trained GPT2 Transformer models. We discuss the implications of the Attention-SDM map and provide new computational and biological interpretations of Attention.
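A minimal sketch of the associative-read view (our own toy, not the paper's construction: binary ±1 patterns stand in for SDM addresses, one-hot values tag each pattern, and beta is an assumed inverse temperature). Softmax Attention over stored (key, value) pairs exponentially down-weights far-away keys, much as SDM's read weighting decays with Hamming distance:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 10
keys = rng.choice([-1.0, 1.0], size=(n, d))    # stored +/-1 patterns (SDM-style addresses)
vals = np.eye(n)                               # one-hot values tag each stored pattern

query = keys[3].copy()
flip = rng.choice(d, size=5, replace=False)
query[flip] *= -1.0                            # noisy cue: 5 bits flipped

beta = 1.0                                     # assumed inverse temperature
scores = beta * (keys @ query)                 # dot-product scores, as in Attention
w = np.exp(scores - scores.max())
w /= w.sum()                                   # softmax weights decay with distance
readout = w @ vals                             # attention-weighted associative read

print(int(readout.argmax()))                   # recovers the cued pattern, index 3
```

With ±1 keys, the dot product is an affine function of Hamming distance, so the softmax weighting is exactly an exponential decay in distance from the query.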
As a step towards thinking machines, a modular neural architecture for an artificial cognitive system is presented. This system is designed to reproduce inner imagery and inner speech and to emulate cognitive functions like perception, attention, learning, etc. The system utilizes distributed signal representation and associative processing.
ISBN (print): 9781713871088
Decentralized optimization plays an important role in applications such as training large machine learning models. Despite its superior practical performance, there has been some lack of fundamental understanding of its theoretical properties. In this work, we address the following open research question: to train an overparameterized model over a set of distributed nodes, what is the minimum communication overhead (in terms of bits exchanged) that the system needs to sustain while still achieving (near) zero training loss? We show that for a class of overparameterized models, where the number of parameters D is much larger than the total number of data samples N, the best possible communication complexity is Omega(N), which is independent of the problem dimension D. Further, for a few specific overparameterized models (i.e., linear regression and certain multi-layer neural networks with one wide layer), we develop a set of algorithms that use linear compression followed by adaptive quantization, and show that they achieve dimension-independent, near-optimal communication complexity. To our knowledge, this is the first time dimension-independent communication complexity has been shown for distributed optimization.
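A hedged sketch of why dimension-independent communication is plausible for overparameterized linear regression (our own illustration, not the paper's compression-plus-adaptive-quantization scheme): the least-squares gradient X^T r always lies in the N-dimensional row span of the data, so a worker can transmit the N residuals r instead of the D-dimensional gradient:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 1000, 5                                 # parameters >> samples
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
w = np.zeros(D)

lr = 1.0 / (np.linalg.norm(X, 2) ** 2)         # safe step size for least squares
for _ in range(200):
    r = X @ w - y                              # N numbers: all the worker must send
    w -= lr * (X.T @ r)                        # receiver rebuilds the D-dim gradient from r
loss = float(np.sum((X @ w - y) ** 2))
print("floats per round:", N, "instead of", D, "| final loss:", loss)
```

Each round communicates N floats rather than D, and gradient descent still drives the training loss to (near) zero, matching the Omega(N) flavor of the bound.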
ISBN (print): 9781538618950
The increasing amount of data to be processed from multiple sources, as in the case of sensor networks, and the need to cope with security and privacy constraints, make it necessary to use computationally efficient techniques on simple and cheap hardware architectures, often distributed in pervasive scenarios. The Random Vector Functional-Link is a neural network model usually adopted for processing distributed big data, but no constraints have been considered so far to deal with limited hardware resources. This paper focuses on implementing a modified version of the Random Vector Functional-Link network with finite-precision arithmetic, in order to make it suited to hardware architectures even based on a simple microcontroller. A genetic optimization is also proposed to ensure that the overall performance is comparable with standard software implementations. The numerical results prove the efficacy of the proposed approach.
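A minimal sketch of a Random Vector Functional-Link net under finite precision (our own toy: the fixed-point grid, ridge solve, and target function are assumptions, and we use plain rounding rather than the paper's genetic optimization). Random hidden weights stay fixed, output weights come from a regularized least-squares solve, and everything is then quantized to an 8-bit fixed-point grid as a microcontroller would store it:

```python
import numpy as np

def quantize(w, bits=8, scale=4.0):
    """Round to a signed fixed-point grid covering [-scale, scale)."""
    step = 2 * scale / (2 ** bits)
    return np.clip(np.round(w / step) * step, -scale, scale - step)

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(np.pi * X[:, 0]) * X[:, 1]          # toy regression target

Wh = rng.uniform(-2, 2, size=(2, 50))          # random hidden weights, never trained
bh = rng.uniform(-2, 2, size=50)               # random hidden biases
H = np.tanh(X @ Wh + bh)
# ridge solve keeps output weights small enough for the fixed-point range
beta = np.linalg.solve(H.T @ H + 1.0 * np.eye(50), H.T @ y)

# inference entirely with quantized weights
pred = np.tanh(X @ quantize(Wh) + quantize(bh)) @ quantize(beta)
mse = float(np.mean((pred - y) ** 2))
print(mse)
```

The ridge term matters here: without it the least-squares weights can exceed the fixed-point range and clipping destroys the fit, which is the kind of constraint the paper's optimization addresses.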
ISBN (print): 9781713871088
We study acceleration for distributed sparse regression in high dimensions, which allows the parameter size to exceed and grow faster than the sample size. When applicable, existing distributed algorithms employing acceleration perform poorly in this setting, theoretically and numerically. We propose a new accelerated distributed algorithm suitable for high dimensions. The method couples a suitable instance of Nesterov's accelerated proximal gradient with consensus and gradient-tracking mechanisms, aiming at estimating locally the gradient of the empirical loss while enforcing agreement on the local estimates. Under standard assumptions on the statistical model and tuning parameters, the proposed method is proved to globally converge at a linear rate to an estimate that is within the statistical precision of the model. The iteration complexity scales as O(√κ), while the communications per iteration are at most Õ(log m / (1 − ρ)), where κ is the restricted condition number of the empirical loss, m is the number of agents, and ρ ∈ [0, 1) measures the network connectivity. As a by-product of our design, we also report an accelerated method for high-dimensional estimation over master-worker architectures, which is of independent interest and compares favorably with existing works.
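A single-node sketch of the accelerated proximal-gradient building block (FISTA-style) for l1-regularized regression; the paper couples this with consensus and gradient tracking across agents, which we omit here, and the problem sizes and λ below are our own assumptions:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(3)
n, d, k = 100, 200, 5                          # high-dimensional: d > n, k-sparse truth
A = rng.standard_normal((n, d)) / np.sqrt(n)   # columns roughly unit norm
x_true = np.zeros(d); x_true[:k] = 1.0
b = A @ x_true

lam = 0.01
L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
x = np.zeros(d); y = x.copy(); t = 1.0
for _ in range(500):
    x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)
    t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2   # Nesterov momentum schedule
    y = x_new + (t - 1) / t_new * (x_new - x)  # extrapolation step
    x, t = x_new, t_new

err = float(np.max(np.abs(x - x_true)))
print(err)                                     # small: support and values recovered
```

The accelerated scheme reaches the statistical precision of the sparse model in far fewer iterations than plain proximal gradient, which is the O(√κ) advantage the abstract claims in the distributed setting.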
ISBN (print): 0780314212
Neural network (or parallel distributed processing) models have been shown to have some potential for solving optimization problems. Most formulations result in NP-complete problems, and solutions rely on energy-based models, so there is no guarantee that the network converges to a globally optimal solution. In this paper, we propose a non-energy-based neural shortest-path network based on the principle of dynamic programming and a least-take-all network. No problem of local minima exists, and the network is guaranteed to reach the optimal solution. The network can work purely in an asynchronous mode, which greatly increases computation speed.
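The least-take-all idea can be sketched as a parallel Bellman relaxation (a toy graph of our own, not the paper's network): each unit repeatedly takes the minimum of (neighbour's value + edge cost), and the values settle at shortest-path costs with no local minima:

```python
import math

# Toy directed graph: cost[u][v] = edge weight from u to v.
cost = {
    'A': {'B': 1, 'C': 4},
    'B': {'C': 2, 'D': 6},
    'C': {'D': 3},
    'D': {},
}
nodes = list(cost)
value = {u: math.inf for u in nodes}
value['D'] = 0.0                               # destination unit clamped to 0

for _ in range(len(nodes)):                    # synchronous "least take all" sweeps
    value = {u: min([value[u]] + [c + value[v] for v, c in cost[u].items()])
             for u in nodes}

print(value['A'])                              # shortest A->D cost: 6.0 via A-B-C-D
```

Because each update is a monotone minimum, the sweeps can also be run asynchronously in any order and still converge to the same fixed point, which is the asynchronous-mode property the abstract highlights.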
ISBN (print): 0262112450
Many belief networks have been proposed that are composed of binary units. However, for tasks such as object and speech recognition, which produce real-valued data, binary network models are usually inadequate. Independent component analysis (ICA) learns a model from real data, but the descriptive power of this model is severely limited. We begin by describing the independent factor analysis (IFA) technique, which overcomes some of the limitations of ICA. We then create a multilayer network by cascading single-layer IFA models. At each level, the IFA network extracts real-valued latent variables that are non-linear functions of the input data with a highly adaptive functional form, resulting in a hierarchical distributed representation of these data. Because exact maximum-likelihood learning of the network is intractable, we derive an algorithm that maximizes a lower bound on the likelihood, based on a variational approach.
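The lower bound mentioned in the last sentence is the standard variational (Jensen) bound; in generic notation (observed data x, latent variables z, variational posterior q), a sketch of it reads:

```latex
\log p(x)
  = \log \int p(x, z)\, dz
  = \log \mathbb{E}_{q(z)}\!\left[\frac{p(x, z)}{q(z)}\right]
  \ge \mathbb{E}_{q(z)}\!\left[\log p(x, z) - \log q(z)\right]
  = \mathcal{F}(q),
```

where the inequality is Jensen's. Learning alternates between tightening F over the variational posterior q and increasing it over the model parameters, so the true likelihood never needs to be computed exactly.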
ISBN (print): 9781538633601
Distributed Denial-of-Service (DDoS) attacks are serious threats to the availability of smart grid infrastructure services and can cause massive blackouts. This study describes an anomaly detection method for improving the detection rate of DDoS attacks in a smart grid. The improvement was achieved by increasing the classification accuracy of the training and testing phases in a convolutional neural network (CNN). An improved version of the variance fractal dimension trajectory (VFDTv2) was used to extract inherent features from the non-pure fractal input data. A discrete wavelet transform (DWT) was applied to the input data and the VFDTv2 output to extract distinguishing features during data pre-processing. A support vector machine (SVM) was used for post-processing. The implementation detected DDoS attacks with 87.35% accuracy.
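A hedged sketch of the pre-processing idea only (all data synthetic, and a trivial threshold rule stands in for the paper's VFDTv2 + CNN + SVM pipeline): a one-level Haar DWT summarizes traffic windows into coarse features, in which a flood burst separates cleanly from steady traffic:

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)       # local averages (low-pass)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)       # local differences (high-pass)
    return a, d

rng = np.random.default_rng(4)
normal = rng.poisson(5, size=(50, 64)).astype(float)    # steady request rate
attack = rng.poisson(5, size=(50, 64)).astype(float)
attack[:, 20:40] += rng.poisson(40, size=(50, 20))      # flood burst mid-window

def features(window):
    a, d = haar_dwt(window)
    return np.array([a.mean(), d.std()])                # energy-style summary

F = np.array([features(w) for w in np.vstack([normal, attack])])
labels = np.array([0] * 50 + [1] * 50)

threshold = F[:, 0].mean()                              # toy one-feature decision rule
pred = (F[:, 0] > threshold).astype(int)
acc = float((pred == labels).mean())
print(acc)
```

On this synthetic traffic the single DWT approximation feature already separates the classes perfectly; the paper's fractal features and CNN/SVM stages are what make the method work on real, far less separable, grid traffic.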