ISBN (print): 9781450397339
As the field of deep learning progresses and neural networks become larger, training them has become a demanding and time-consuming task. To tackle this problem, distributed deep learning must be used to scale the training of deep neural networks to many workers. Synchronous algorithms, commonly used for distributing the training, are susceptible to faulty or straggling workers. Asynchronous algorithms do not suffer from the problems of synchronization, but they introduce a new problem known as staleness. Staleness is caused by applying out-of-date gradients, and it can greatly hinder the convergence process. Furthermore, asynchronous algorithms that incorporate momentum often require keeping a separate momentum buffer for each worker, which costs additional memory proportional to the number of workers. We introduce a new asynchronous method, SMEGA2, which requires a single momentum buffer regardless of the number of workers. Our method works in a way that lets us estimate the future position of the parameters, thereby minimizing the staleness effect. We evaluate our method on the CIFAR and ImageNet datasets and show that SMEGA2 outperforms existing methods in terms of final test accuracy while scaling up to as many as 64 asynchronous workers. Open-Source Code: https://***/rafi-cohen/SMEGA2
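The core idea lends itself to a short sketch. Below is a minimal, hypothetical parameter-server loop in Python illustrating a single momentum buffer shared by all workers, together with a momentum-based lookahead that estimates the parameters' future position; the class name, the lookahead rule, and all hyperparameters are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

class Smega2LikeServer:
    """Minimal sketch of an asynchronous parameter server with a single
    shared momentum buffer. Names, the lookahead rule, and hyperparameters
    are illustrative assumptions, not the authors' implementation."""

    def __init__(self, params, lr=0.1, momentum=0.9, num_workers=8):
        self.params = params.astype(np.float64)
        self.velocity = np.zeros_like(self.params)  # one buffer for all workers
        self.lr = lr
        self.momentum = momentum
        self.num_workers = num_workers

    def pull(self):
        # Hand the worker an estimated *future* position of the parameters:
        # extrapolate the shared momentum over the updates expected to land
        # from the other in-flight workers, to counteract staleness.
        horizon = sum(self.momentum ** k for k in range(1, self.num_workers))
        return self.params + horizon * self.velocity

    def push(self, grad):
        # Fold a (possibly stale) worker gradient into the shared buffer.
        self.velocity = self.momentum * self.velocity - self.lr * grad
        self.params = self.params + self.velocity
```

Because the buffer is shared, the extra memory cost is one velocity vector in total, rather than one per worker.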
ISBN (print): 9781665405409
A distributed array consisting of multiple subarrays is attractive for high-resolution direction-of-arrival (DOA) estimation when a large-scale array is infeasible. To achieve effective distributed DOA estimation, the information observed at the subarrays must be transmitted to the fusion center, where DOA estimation is performed. For noncoherent data fusion, covariance matrices are used for subarray fusion. To address the complexity associated with a large array size, we propose a compression framework consisting of multiple parallel encoders and a classifier. The parallel encoders at the distributed subarrays are trained to compress their respective covariance matrices. The compressed results are sent to the fusion center, where the signal DOAs are estimated by a classifier operating on the compressed covariance matrices.
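As a sketch of this pipeline, the following hypothetical PyTorch modules pair one encoder per subarray with a fusion-center classifier over a discretized DOA grid; the layer sizes, names, and code dimension are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SubarrayEncoder(nn.Module):
    """Encoder kept at one subarray: compresses that subarray's complex
    covariance matrix into a short code (illustrative layer sizes)."""
    def __init__(self, n_sensors: int, code_dim: int):
        super().__init__()
        in_dim = 2 * n_sensors * n_sensors  # real and imaginary parts
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, code_dim))

    def forward(self, cov):  # cov: (batch, n_sensors, n_sensors), complex
        x = torch.cat([cov.real, cov.imag], dim=-1).flatten(1)
        return self.net(x)

class FusionClassifier(nn.Module):
    """Fusion-center classifier over the concatenated subarray codes;
    each output class corresponds to one cell of a DOA grid."""
    def __init__(self, n_subarrays: int, code_dim: int, n_grid: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_subarrays * code_dim, 256),
                                 nn.ReLU(), nn.Linear(256, n_grid))

    def forward(self, codes):  # codes: list of (batch, code_dim) tensors
        return self.net(torch.cat(codes, dim=-1))
```

Only the short codes cross the network, so the per-subarray uplink cost scales with the code dimension rather than with the covariance matrix size.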
Drowsiness Detection (DD) is the process of identifying signs of drowsiness in individuals, especially in critical situations such as driving, operating heavy machinery, or piloting aircraft. Hybrid Bi-directional Long...
Long-term memory (LTM) is generally considered the unlimited and permanent storage of information in the human brain. This concept has spurred extensive research on associative memory models, such as ...
ISBN (print): 9781728112466
In this paper, we present the design, implementation, and evaluation of a novel predictive control framework for reliable distributed stream data processing, which features a Deep Recurrent Neural Network (DRNN) model for performance prediction and dynamic grouping for flexible control. Specifically, we present a novel DRNN model that makes accurate performance predictions from multi-level runtime statistics, carefully accounting for interference from co-located worker processes. Moreover, we design a new grouping method, dynamic grouping, which can distribute or re-distribute data tuples to downstream tasks according to any given split ratio on the fly, so it can be used to re-direct data tuples to bypass misbehaving workers. We implemented the proposed framework on Storm, a widely used Distributed Stream Data Processing System (DSDPS). For validation and performance evaluation, we developed two representative stream data processing applications: Windowed URL Count and Continuous Queries. Extensive experimental results show that: 1) the proposed DRNN model outperforms widely used baselines, ARIMA and SVR, in terms of prediction accuracy; 2) dynamic grouping works as expected; and 3) the proposed framework enhances reliability, incurring only minor performance degradation in the presence of misbehaving workers.
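The dynamic-grouping idea can be sketched as a weighted router whose split ratio can be changed at runtime. The interface below is hypothetical (it is not Storm's actual grouping API); it only illustrates how a controller could steer tuples away from a misbehaving downstream task.

```python
import random

class DynamicGrouping:
    """Sketch of dynamic grouping: routes tuples to downstream tasks
    according to a split ratio that can be changed on the fly.
    Hypothetical interface, not Storm's grouping API."""

    def __init__(self, task_ids):
        self.task_ids = list(task_ids)
        self.weights = [1.0] * len(self.task_ids)  # even split initially

    def set_split_ratio(self, weights):
        # Controller call: setting a misbehaving worker's weight to 0
        # re-directs its share of tuples to the remaining tasks.
        assert len(weights) == len(self.task_ids) and sum(weights) > 0
        self.weights = list(weights)

    def route(self, data_tuple):
        # Weighted random choice realizes the split ratio in expectation.
        return random.choices(self.task_ids, weights=self.weights, k=1)[0]

grouping = DynamicGrouping(task_ids=[0, 1, 2])
grouping.set_split_ratio([0.5, 0.5, 0.0])  # bypass misbehaving task 2
target = grouping.route({"url": "example.com"})
```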
Spiking neural network (SNN) simulation is important for studying brain function and validating neuroscience hypotheses, and it can also be used in artificial intelligence. GPU-based simulators have been developed to support real-time simulation of SNNs, but their simulation performance and scale are severely limited due to the random memory access pattern and the global communication between nodes. To address these problems, we propose SWsnn, an efficient distributed heterogeneous SNN simulator based on the Sunway accelerators (including SW26010 and SW26010pro), which supports accurate simulation with a small time step (1/16 ms), random delay sizes for synapses, and larger-scale network simulation. Compared with existing GPUs, the Local Dynamic Memory (LDM) in Sunway (similar to a cache) is much bigger (4 MB or 16 MB in each core group). To improve simulation performance, we redesign the network data storage structure and the synaptic plasticity flow so that most random accesses occur in the LDM. SWsnn hides Message Passing Interface (MPI)-related operations to reduce communication costs by separating them from the general SNN computation. Moreover, SWsnn relies on the parallel Compute Processing Elements (CPEs) rather than the serial Management Processing Element (MPE) to control the communication buffers, using Register-Level Communication (RLC) and Direct Memory Access (DMA). In addition, SWsnn is further optimized using vectorization and DMA-hiding techniques. Results show that SWsnn runs 1.4-2.2 times faster than the state-of-the-art GPU-based SNN simulator GeNN (GPU-enhanced Neuronal Networks) and supports much larger-scale real-time simulation.
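The scratchpad-friendly layout can be illustrated with a simple delay ring that bins spike events by arrival slot, so each time step drains one small contiguous buffer of exactly the kind that can live in a cache or LDM. This is a plain-Python sketch of the general idea, not SWsnn's Sunway data structures; all names are illustrative.

```python
import numpy as np

class DelayRing:
    """Sketch of a delay-ring event queue for SNN simulation: synaptic
    events are binned by arrival slot so each time step touches one small
    contiguous buffer. Illustrative only, not SWsnn's data layout."""

    def __init__(self, num_neurons, max_delay):
        self.num_neurons = num_neurons
        self.max_delay = max_delay
        self.slots = [[] for _ in range(max_delay)]

    def push(self, t, post, weight, delay):
        # Queue a synaptic event for delivery `delay` steps in the future.
        self.slots[(t + delay) % self.max_delay].append((post, weight))

    def deliver(self, t):
        # Drain only the slot that is due at time t.
        current = np.zeros(self.num_neurons)
        slot = self.slots[t % self.max_delay]
        for post, w in slot:
            current[post] += w
        slot.clear()
        return current
```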
Scalable data management is essential for processing large scientific datasets on HPC platforms for distributed deep learning. In-memory distributed storage is preferred for its speed, enabling rapid, random, and frequ...
ISBN (print): 9789819648559
Wide dielectric barrier discharge (DBD) has broad application prospects for the modification of insulating materials, but electrode aging directly affects the modification effect during application. As the size of the DBD device increases, real-time evaluation of its modification effect becomes more complicated. Therefore, this paper proposes a real-time prediction and evaluation method for the modification effect of wide-DBD-treated insulating materials based on distributed current measurement and neural network models. Operating parameters such as the DBD excitation voltage amplitude, repetition frequency, discharge working-gas flow rate, and reaction-medium flow rate were varied. The discharge current at different positions was measured with a self-made current coil, and the water contact angle and flashover voltage at the corresponding positions were tested experimentally as the evaluation criteria for the modification effect. Features of the distributed current are extracted by manual and image-recognition methods, and prediction and evaluation models are established with a BP neural network and a convolutional neural network (CNN), respectively; the accuracy and generalization ability of the two models are compared. The results show that the CNN model based on image recognition achieves higher accuracy and generalization ability than the BP neural network model based on manual feature extraction when predicting the water contact angle and flashover voltage of the material surface. Compared with the BP neural network, the CNN model reduces the mean absolute error (MAE) by 41.3% and the root mean square error (RMSE) by 36.1% when predicting the water contact angle, and reduces the MAE by 47.7% and the RMSE by 40.2% for the flashover voltage. Experimental results at different processing distances are used to examine the generalization ability of the two models, and the results show that...
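The image-based path can be sketched as a small CNN regressor mapping an image of the distributed current waveform to a scalar target such as water contact angle. The architecture below is an assumption for illustration, not the paper's model; MAE and RMSE on a held-out set then follow as `(pred - y).abs().mean()` and `((pred - y) ** 2).mean().sqrt()`.

```python
import torch
import torch.nn as nn

class CurrentImageCNN(nn.Module):
    """Sketch of a CNN regressor from a discharge-current waveform image
    to a modification metric (e.g., water contact angle). Illustrative
    layer sizes, not the paper's architecture."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))              # -> (batch, 32, 4, 4)
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16, 64),
                                  nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):  # x: (batch, 1, H, W) current-waveform image
        return self.head(self.features(x))
```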
The continued development of neural network architectures drives ever-increasing demand for computing power. While data center scaling continues, inference away from the cloud will increasingly rely on distributed inferen...
ISBN (print): 9798400717932
Graph neural networks (GNNs) are a computationally efficient method to learn embeddings and classifications on graph data. However, GNN training has low computational intensity, making communication costs the bottleneck for scalability. Sparse-matrix dense-matrix multiplication (SpMM) is the core computational operation in full-graph training of GNNs. Previous work parallelizing this operation focused on sparsity-oblivious algorithms, in which matrix elements are communicated regardless of the sparsity pattern. This leads to a predictable communication pattern that can be overlapped with computation and enables the use of collective communication operations, at the expense of wasting significant bandwidth on unnecessary data. We develop sparsity-aware algorithms that tackle the communication bottlenecks in GNN training with three novel approaches. First, we communicate only the necessary matrix elements. Second, we utilize a graph partitioning model to reorder the matrix and drastically reduce the number of communicated elements. Finally, we address the high load imbalance in communication with a tailored partitioning model that minimizes both the total communication volume and the maximum sending volume. We further couple these sparsity-exploiting approaches with a communication-avoiding approach (1.5D parallel SpMM) in which submatrices are replicated to reduce communication. We explore the tradeoffs of these combined optimizations and show up to a 14x improvement on 256 GPUs; on some instances, communication is reduced to almost zero, yielding effectively communication-free parallel training relative to a popular GNN framework based on sparsity-oblivious SpMM.
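The first approach, communicating only the necessary elements, can be demonstrated in a few lines: from the local sparse block's column indices, each rank derives exactly which remote rows of the dense matrix it needs, instead of shipping whole row blocks. The SciPy sketch below simulates two ranks in one process; the names, sizes, and two-rank layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.sparse import random as sprand

def needed_remote_rows(block, lo, hi):
    """Columns the local sparse block actually references that fall
    outside this rank's own row range [lo, hi)."""
    cols = np.unique(block.tocsr().indices)
    return cols[(cols < lo) | (cols >= hi)]

# Two "ranks", each owning half the rows of sparse A and dense H.
n, f = 8, 4
A = sprand(n, n, density=0.2, format="csr", random_state=0)
H = np.random.rand(n, f)
for rank, (lo, hi) in enumerate([(0, n // 2), (n // 2, n)]):
    block = A[lo:hi, :]
    remote = needed_remote_rows(block, lo, hi)
    # A sparsity-oblivious algorithm would ship all n - (hi - lo) remote
    # rows of H to this rank; only `remote` rows are actually required.
    print(f"rank {rank}: needs {len(remote)} of {n - (hi - lo)} remote rows")
    Y_local = block @ H  # with the needed rows gathered, a local SpMM
```

In a real distributed run the `remote` index sets would drive point-to-point exchanges (e.g., over MPI) rather than reads from a shared array.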