ISBN (Print): 9781728116440
This work presents a novel approach to distributed training of deep neural networks (DNNs) that aims to overcome the issues related to mainstream approaches to data-parallel training. Established techniques for data-parallel training are discussed from both a parallel computing and a deep learning perspective; then a different approach is presented that is meant to allow DNN training to scale while retaining good convergence properties. An experimental implementation is presented, along with some preliminary results.
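A minimal sketch of the synchronous data-parallel training scheme the abstract contrasts itself against: each worker computes a gradient on its own data shard, the gradients are averaged (the all-reduce step), and every replica applies the same update. The toy least-squares loss and all names are illustrative, not taken from the paper.

```python
def local_gradient(w, shard):
    # Gradient of the mean of 0.5 * (w*x - y)^2 over this worker's shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.1):
    grads = [local_gradient(w, s) for s in shards]  # one gradient per worker
    avg = sum(grads) / len(grads)                   # all-reduce: average gradients
    return w - lr * avg                             # identical update on every replica

# Two workers, each holding a shard of points on the line y = 2x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = 0.0
for _ in range(100):
    w = data_parallel_step(w, shards)
# w converges to the true slope 2.0
```

Because every replica sees the same averaged gradient, the replicas stay in lockstep; this is exactly the synchronization pressure that motivates alternative approaches like the one the paper proposes.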
ISBN (Print): 0769520839
This work attempts to demonstrate the advantages provided by Globus software in a distributed-memory environment when running a parallel application. To this end, we have programmed with MPI and evaluated the code, which is part of the training of an artificial neural net used in the search for the Higgs boson, in a Grid/Globus environment using the MPICH-G2 message-passing library.
ISBN (Print): 1892512459
Computer networks need routing methods that maximize their performance. To achieve high-speed communications in a network with changing load patterns, an adaptive method that makes routing decisions at high speed must be used. This paper presents a neural-network method that enables high-speed hot-potato routing on general network topologies. Hot-potato routing is a distributed adaptive method. It does not buffer packets and hence avoids problems that may arise from the general store-and-forward method.
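A toy sketch of the hot-potato (deflection) principle described above: a node never buffers, so every incoming packet must leave on some output port each cycle; a packet whose preferred port is already taken is deflected to a free one. The first-free-port deflection rule here is a placeholder for the paper's neural-network decision method, and it assumes at least as many ports as packets.

```python
def route(packets, prefer, num_ports):
    """Assign each packet an output port; deflect when the preferred port is taken."""
    assignment = {}
    taken = set()
    for pkt in packets:
        port = prefer[pkt]
        if port in taken:  # preferred port busy: deflect instead of buffering
            port = next(p for p in range(num_ports) if p not in taken)
        assignment[pkt] = port
        taken.add(port)
    return assignment

# Packets "a" and "b" both prefer port 0; "b" is deflected, nothing is queued.
out = route(["a", "b", "c"], {"a": 0, "b": 0, "c": 1}, num_ports=3)
# out == {"a": 0, "b": 1, "c": 2}
```

The appeal for high-speed routing is that the per-packet decision is a constant-time local choice with no queue management at all.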
ISBN (Print): 0819442836
A hybrid neural network system for the recognition of handwritten characters using SOFM, BP, and fuzzy networks is presented. The horizontal and vertical projections of the preprocessed character and the 4-directional edge projection are used as feature vectors. In order to improve recognition, the GAT algorithm is applied. Through the hybrid neural network system, the recognition rate is noticeably improved.
ISBN (Print): 9781728116440
We introduce Arbor, a performance-portable library for simulation of large networks of multi-compartment neurons on HPC systems. Arbor is open-source software developed under the auspices of the HBP. Performance portability is achieved through back-end-specific optimizations for x86 multicore, Intel KNL, and NVIDIA GPUs. Coupled with low memory overheads, these optimizations make Arbor an order of magnitude faster than the most widely used comparable simulation software. The single-node performance can be scaled out to run very large models at extreme scale with efficient weak scaling.
ISBN (Print): 0769511538
In this study, a new algorithm with distributed systems is proposed in order to optimise the structure of classifiers, which are of great importance in pattern recognition. The algorithm is applied to a multilayer neural network classifier that uses back-propagation learning. By minimizing the hardware realization of the classifier, the long processing time is shortened and the expected high operating speed in pattern recognition is achieved.
ISBN (Print): 9781450388160
The emergence of the Internet of Things (IoT) has led to a remarkable increase in the volume of data generated at the network edge. In order to support real-time smart IoT applications, massive amounts of data generated from edge devices need to be processed using methods such as deep neural networks (DNNs) with low latency. To improve application performance and minimize resource cost, enterprises have begun to adopt Edge computing, a computation paradigm that advocates processing input data locally at the network edge. However, as edge nodes are often resource-constrained, running data-intensive DNN inference tasks on each individual edge node often incurs high latency, which seriously limits the practicality and effectiveness of this model. In this paper, we study the problem of distributed execution of inference tasks on edge clusters for convolutional neural networks (CNNs), one of the most prominent DNN models. Unlike previous work, we present Fully Decomposable Spatial Partition (FDSP), which naturally supports resource heterogeneity and dynamicity in edge computing environments. We then present a compression technique that further reduces network communication overhead. Our system, called ADCNN, provides up to 2.8x speedup compared to state-of-the-art approaches, while achieving a competitive inference accuracy.
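A minimal 1-D illustration of the spatial-partition idea behind distributing convolution across edge nodes: the input is split into tiles with a small halo (overlap) of `len(kernel) - 1` elements, each node convolves its tile independently, and concatenating the pieces reproduces the full output. Tile sizes, the kernel, and the assumption that the input divides evenly are illustrative; the paper's FDSP scheme also handles heterogeneous node capacities.

```python
def conv1d(x, k):
    # 'valid' 1-D convolution (cross-correlation) with kernel k
    n = len(k)
    return [sum(x[i + j] * k[j] for j in range(n)) for i in range(len(x) - n + 1)]

def partitioned_conv1d(x, k, parts):
    halo = len(k) - 1          # overlap needed so tiles can be convolved independently
    size = len(x) // parts     # assumes len(x) is divisible by parts
    out = []
    for p in range(parts):
        lo, hi = p * size, (p + 1) * size
        tile = x[lo:min(hi + halo, len(x))]  # tile plus halo from the next tile
        out.extend(conv1d(tile, k))          # each edge node computes locally
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
k = [1.0, -1.0]
# partitioned_conv1d(x, k, 2) matches conv1d(x, k) exactly
```

Because each tile is self-contained once the halo is attached, nodes need no communication during the convolution itself, which is what makes the partition "fully decomposable".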
The paper proposes a distributed computational structure called a knowledge-based network as a classification scheme in pattern recognition. Unlike existing architectures and algorithms of pattern recognition, the network allows for an explicit representation of domain classification knowledge while maintaining its learning capabilities. The knowledge-based network blends useful properties of knowledge-based systems (namely, explicit knowledge representation) with those advantageous in neural networks (viz., learning). The network is composed of basic AND and OR neurons. Fuzzy clustering constitutes a preprocessing phase leading towards developing geometric constructs, which contribute to a conceptual level around which the numerical processing of the classifier is centred.
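A hedged sketch of the basic AND and OR neurons mentioned above, using the common min/max (t-norm/s-norm) realizations from fuzzy-neurocomputing textbooks; the exact weighting scheme in the paper may differ.

```python
def or_neuron(x, w):
    # OR neuron: s-norm (max) over t-norm (min) combinations of inputs and weights
    return max(min(xi, wi) for xi, wi in zip(x, w))

def and_neuron(x, w):
    # AND neuron: t-norm (min) over s-norm (max) combinations of inputs and weights
    return min(max(xi, wi) for xi, wi in zip(x, w))

x = [0.8, 0.3]   # fuzzy membership degrees in [0, 1]
w = [0.9, 0.5]   # connection weights in [0, 1]
# or_neuron(x, w) -> 0.8, and_neuron(x, w) -> 0.5
```

The appeal for knowledge-based classification is that each weight has a direct logical reading (how strongly an input condition contributes to the rule), so learned weights remain interpretable.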
ISBN (Print): 9781728186160
This paper proposes a novel strategy for hierarchical distribution of data and deep neural networks over edge and end devices. In the Industrial Internet of Things environment, deep learning tasks such as smoke and fire classification based on convolutional neural networks usually need to be performed on edge servers and end devices; the end devices have limited computing resources, while the edge servers have abundant computing resources. While accommodating inference of a deep neural network (DNN) at the edge server, a distributed deep neural network (DDNN) also allows localized inference using a portion of the neural network at the end sensing devices. The proposed distributed strategy can therefore dynamically adjust the network layers and the data-allocation proportion between end devices and edge servers according to the task, shortening the data-processing time. A joint optimization problem is formulated to minimize the total delay, which is affected by the complexity of the DL model, the inference error rate, and the computing power of the end devices and the edge servers. A closed-form analytical solution is derived, and an optimal distributed data-allocation and neural-network-allocation algorithm is proposed.
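An illustrative sketch of the kind of layer-split search such a strategy implies: for each candidate cut point, the total delay is the device compute time for the early layers, plus the time to transmit the intermediate activation, plus the server compute time for the rest. All names and numbers are made up for illustration; the paper derives a closed-form solution rather than searching.

```python
def best_split(flops, out_bytes, input_bytes, dev_speed, srv_speed, bw):
    """Return (cut, delay): layers [0, cut) run on the end device, the rest on the edge server."""
    best_cut, best_delay = 0, float("inf")
    for cut in range(len(flops) + 1):
        dev = sum(flops[:cut]) / dev_speed            # early layers on the end device
        srv = sum(flops[cut:]) / srv_speed            # remaining layers on the edge server
        sent = input_bytes if cut == 0 else out_bytes[cut - 1]
        tx = sent / bw                                # transfer of the activation (or raw input)
        delay = dev + tx + srv
        if delay < best_delay:
            best_cut, best_delay = cut, delay
    return best_cut, best_delay

cut, delay = best_split(
    flops=[10.0, 10.0, 10.0],    # per-layer work
    out_bytes=[8.0, 2.0, 1.0],   # per-layer activation sizes
    input_bytes=20.0,            # raw input is largest: pure offloading pays in bandwidth
    dev_speed=1.0, srv_speed=10.0, bw=1.0,
)
# cut == 1: run one layer locally to shrink the data before offloading
```

The trade-off visible here is the one the abstract describes: running more layers locally shrinks the transmitted data but uses the slower device, so the optimum sits between pure local inference and pure offloading.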
ISBN (Print): 9781450397339
As the field of deep learning progresses and neural networks become larger, training them has become a demanding and time-consuming task. To tackle this problem, distributed deep learning must be used to scale the training of deep neural networks to many workers. Synchronous algorithms, commonly used for distributing the training, are susceptible to faulty or straggling workers. Asynchronous algorithms do not suffer from the problems of synchronization, but introduce a new problem known as staleness. Staleness is caused by applying out-of-date gradients, and it can greatly hinder the convergence process. Furthermore, asynchronous algorithms that incorporate momentum often require keeping a separate momentum buffer for each worker, which costs additional memory proportional to the number of workers. We introduce a new asynchronous method, SMEGA(2), which requires a single momentum buffer regardless of the number of workers. Our method works in a way that lets us estimate the future position of the parameters, thereby minimizing the staleness effect. We evaluate our method on the CIFAR and ImageNet datasets, and show that SMEGA(2) outperforms existing methods in terms of final test accuracy while scaling up to as many as 64 asynchronous workers. Open-Source Code: https://***/rafi-cohen/SMEGA2
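A heavily simplified sketch of the two ingredients the abstract names: a parameter server that keeps a single shared momentum buffer for all workers, and a Nesterov-style lookahead that hands each worker an estimate of the parameters' future position to blunt staleness. This illustrates the concept only; it is not the SMEGA(2) algorithm, and all names are invented.

```python
class AsyncMomentumServer:
    def __init__(self, w, lr=0.1, beta=0.9):
        self.w, self.v = w, 0.0        # ONE momentum buffer, regardless of worker count
        self.lr, self.beta = lr, beta

    def params_for_worker(self):
        # Lookahead: estimated future position of the parameters,
        # so gradients computed now are less stale when they arrive.
        return self.w - self.lr * self.beta * self.v

    def apply_gradient(self, g):
        self.v = self.beta * self.v + g
        self.w -= self.lr * self.v

# Workers repeatedly fetch the lookahead estimate and return a gradient of f(w) = w^2.
server = AsyncMomentumServer(w=5.0)
for _ in range(200):
    w_est = server.params_for_worker()
    server.apply_gradient(2.0 * w_est)   # gradient evaluated at the estimate
# server.w converges to the minimizer 0.0
```

The memory point from the abstract shows up directly: per-worker momentum buffers would cost O(workers) extra state, while this server holds exactly one buffer no matter how many workers poll it.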