Recurrent neural networks (RNNs) are extensively used to solve various class-recognition problems such as image processing, prediction of biomedical data, and speech recognition. Owing to gradient problems, the RNN is gradually being displaced by the Long Short-Term Memory (LSTM) network. However, hardware implementation of the LSTM is more challenging because of its complexity and high power consumption, which makes it unsuitable for deployment in biological Internet of Things networks for the prediction of heart disease. Several algorithms have been proposed for efficient LSTM implementation, but the trade-off between performance and resource utilization still needs improvement. This paper proposes a novel energy-efficient, high-performance architecture for LSTM networks, the Pipelined Stochastic Adaptive Distributed Architecture (P-SCADA). In this architecture, a hybrid structure is developed that combines a new distributed arithmetic stochastic computing (DSC) scheme with binary circuits to improve FPGA metrics such as energy, area, and accuracy. The proposed system is implemented on an Artix-7 FPGA together with special-purpose software and is evaluated on different ECG datasets. For the different series data, area utilization is about 40%-44% and power consumption is about 20%-25%, with a prediction accuracy of 98%. Moreover, the proposed architecture is compared with existing architectures, such as SPARSE and conventional stochastic architectures, and excels in terms of area, power, and efficiency.
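The abstract does not detail the DSC circuits; as a rough, hypothetical illustration of the stochastic-computing side of such a hybrid design, the Python sketch below shows how unipolar bitstreams turn multiplication, the dominant operation in LSTM gate computations, into a bitwise AND, which is the property stochastic FPGA designs typically exploit to save area and power. Stream lengths and function names are illustrative, not taken from the paper.

import numpy as np

def to_stochastic(value, length, rng):
    # Encode a value in [0, 1] as a unipolar stochastic bitstream:
    # each bit is 1 with probability equal to the value.
    return (rng.random(length) < value).astype(np.uint8)

def sc_multiply(stream_a, stream_b):
    # Unipolar stochastic multiplication: a bitwise AND of two
    # independent streams approximates the product of the encoded values.
    return stream_a & stream_b

def from_stochastic(stream):
    # Decode a unipolar bitstream back to a value: the fraction of ones.
    return stream.mean()

rng = np.random.default_rng(0)
a, b, n = 0.6, 0.7, 4096  # stream length trades accuracy for latency
prod = from_stochastic(sc_multiply(to_stochastic(a, n, rng),
                                   to_stochastic(b, n, rng)))
print(f"stochastic estimate of {a}*{b}: {prod:.3f}")  # close to 0.42

Longer bitstreams give better accuracy but higher latency, which is the kind of accuracy/area/energy balance the hybrid binary-stochastic design above is meant to manage.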
ISBN (Print): 9781713871088
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck, and gradient compression is widely used to reduce the number of bits sent within each communication round of iterative methods. There are two classes of compression operators and separate algorithms making use of them. In the case of unbiased random compressors with bounded variance (e.g., rand-k), the DIANA algorithm of Mishchenko et al. (2019), which implements a variance reduction technique for handling the variance introduced by compression, is the current state of the art. In the case of biased and contractive compressors (e.g., top-k), the EF21 algorithm of Richtarik et al. (2021), which instead implements an error-feedback mechanism, is the current state of the art. These two classes of compression schemes and algorithms are distinct, with different analyses and proof techniques. In this paper, we unify them into a single framework and propose a new algorithm, recovering DIANA and EF21 as particular cases. Our general approach works with a new, larger class of compressors, which has two parameters, the bias and the variance, and includes unbiased and biased compressors as particular cases. This allows us to inherit the best of the two worlds: like EF21 and unlike DIANA, biased compressors, like top-k, whose good performance in practice is recognized, can be used. And like DIANA and unlike EF21, independent randomness at the compressors makes it possible to mitigate the effects of compression, with the convergence rate improving when the number of parallel workers is large. This is the first time that an algorithm with all these features is proposed. We prove its linear convergence under certain conditions. Our approach takes a step towards a better understanding of two so-far distinct worlds of communication-efficient distributed learning.
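The two compressor classes contrasted above can be made concrete; the sketch below gives the textbook definitions of rand-k (unbiased with bounded variance, the DIANA setting) and top-k (biased and contractive, the EF21 setting). It does not reproduce the unified algorithm itself.

import numpy as np

def rand_k(x, k, rng):
    # Unbiased rand-k compressor: keep k random coordinates and rescale
    # by d/k so that the expectation equals x (bounded-variance class).
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def top_k(x, k):
    # Biased, contractive top-k compressor: keep the k largest-magnitude
    # coordinates without rescaling, so it is not unbiased.
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(0)
g = rng.normal(size=10)
print(rand_k(g, 3, rng))  # unbiased on average, but high variance
print(top_k(g, 3))        # contractive, biased toward the large entries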
ISBN (Print): 9781665414555
We perform a theoretical analysis comparing the scalability of data versus model parallelism, applied to the distributed training of deep convolutional neural networks (CNNs), along five axes: batch size, node (floating-point) arithmetic performance, node memory bandwidth, network link bandwidth, and cluster dimension. Our study relies on analytical performance models that can be configured to reproduce the components and organization of the CNN model as well as the hardware configuration of the target distributed platform. In addition, we provide evidence of the accuracy of the analytical models by performing a validation against a Python library for distributed deep learning training.
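The paper's calibrated models are not reproduced here; as a hypothetical illustration of what such an analytical performance model looks like, the sketch below estimates one data-parallel training step as per-node compute time plus a ring all-reduce of the gradients. All workload and hardware figures are invented for the example.

def data_parallel_step_time(batch_size, flops_per_sample, node_flops,
                            model_bytes, link_bandwidth, num_nodes):
    # Rough analytical estimate of one data-parallel training step:
    # per-node compute time plus a ring all-reduce of the gradients.
    # This is a generic model, not the one calibrated in the paper.
    compute = (batch_size / num_nodes) * flops_per_sample / node_flops
    allreduce = 2.0 * (num_nodes - 1) / num_nodes * model_bytes / link_bandwidth
    return compute + allreduce

# Example: a ResNet-50-like workload on a 16-node cluster (illustrative numbers).
t = data_parallel_step_time(batch_size=1024,
                            flops_per_sample=8e9,     # ~8 GFLOPs per image
                            node_flops=10e12,         # 10 TFLOP/s per node
                            model_bytes=100e6,        # ~100 MB of gradients
                            link_bandwidth=10e9 / 8,  # 10 Gb/s link, in bytes/s
                            num_nodes=16)
print(f"estimated step time: {t * 1e3:.1f} ms")

Varying the batch size, node performance, link bandwidth, and node count in such a model is what makes it possible to reason about the five scalability axes listed above without running the full training.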
Typhoons, formidable natural phenomena, typically unleash a trail of destruction, inflicting severe wind damage, floods, and even triggering tsunamis in coastal regions. Therefore, accurate prediction of typhoon inten...
The use of new possibilities introduced by 5G networks also creates new problems and concerns, specifically in the field of user mobility in wireless communication systems. In this paper, the Authors investigate the e...
Graph neural networks (GNNs) have received much attention, as they have recently been applied successfully to non-Euclidean data. However, manually designed graph neural networks often fail to achieve satisfactory model performance on a given graph dataset. With the rise of automatic machine learning, graph neural architecture search can effectively construct GNNs that reach the expected model performance. The challenge is to obtain the optimal GNN architecture efficiently and automatically in a vast search space. Existing search methods evaluate GNN architectures serially, which severely limits system efficiency. To solve these problems, we develop an Automatic Graph Neural Architecture Search framework (Auto-GNAS) with parallel estimation that implements an automatic graph neural search process requiring almost no manual intervention. In Auto-GNAS, we design the search algorithm with multiple genetic searchers. Each searcher can simultaneously use evaluation feedback, information entropy, and search results from the other searchers through a sharing mechanism to improve search efficiency. To the best of our knowledge, this is the first work that uses parallel computing to improve the system efficiency of graph neural architecture search. In experiments on real datasets, Auto-GNAS obtains competitive model performance and better search efficiency than other search algorithms. Since the parallel estimation ability of Auto-GNAS is independent of the search algorithm, we extend Auto-GNAS with different search algorithms for scalability experiments. The results show that Auto-GNAS with varying search algorithms can achieve nearly linear acceleration as computing resources increase.
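Auto-GNAS itself is not reproduced here; the sketch below only illustrates the parallel-estimation idea, scoring candidate GNN architectures concurrently inside a simple genetic loop, with a placeholder fitness function standing in for actual GNN training. The search space and scoring are invented for the example.

import random
from concurrent.futures import ProcessPoolExecutor

SEARCH_SPACE = {
    "aggregator": ["mean", "max", "sum"],
    "hidden_dim": [16, 32, 64, 128],
    "num_layers": [2, 3, 4],
    "activation": ["relu", "elu", "tanh"],
}

def random_architecture(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(arch, rng):
    child = dict(arch)
    gene = rng.choice(list(SEARCH_SPACE))
    child[gene] = rng.choice(SEARCH_SPACE[gene])
    return child

def evaluate(arch):
    # Placeholder fitness: a real system would train the GNN described by
    # `arch` on the target graph and return its validation accuracy.
    score = 0.5 + 0.1 * (arch["num_layers"] == 3) + 0.001 * arch["hidden_dim"]
    return arch, score

def search(generations=5, population=8, workers=4, seed=0):
    rng = random.Random(seed)
    pool = [random_architecture(rng) for _ in range(population)]
    best = (None, -1.0)
    with ProcessPoolExecutor(max_workers=workers) as ex:
        for _ in range(generations):
            # Parallel estimation: candidate architectures are scored concurrently.
            results = sorted(ex.map(evaluate, pool), key=lambda r: r[1], reverse=True)
            best = max(best, results[0], key=lambda r: r[1])
            # Shared feedback: the next generation mutates the current elites.
            elites = [a for a, _ in results[: population // 2]]
            pool = elites + [mutate(rng.choice(elites), rng) for _ in elites]
    return best

if __name__ == "__main__":
    arch, score = search()
    print("best architecture:", arch, "score:", round(score, 3))

Because each candidate evaluation is independent, the wall-clock time of one generation shrinks roughly in proportion to the number of workers, which is the near-linear acceleration the abstract reports.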
ISBN (Print): 9798400716713
Recently, AI and deep neural networks have found extensive applications in mobile devices, drones, carts, and more. There is a growing need to process large-scale data and provide DNN inference services with minimal latency. However, IoT devices, with their limited computing capabilities, are not well suited for AI inference. Moreover, considering the diverse requirements of different services, inference services must be provided that address these variations. To tackle these challenges, many previous studies have explored collaborative approaches between edge servers and cloud servers by partitioning DNN models. However, these methods have difficulty finding optimal partitioning points for splitting DNN models and are heavily influenced by network bandwidth, since intermediate computation results must be transmitted to other devices. In this paper, we propose an adaptive block-based DNN inference framework. A large DNN model is broken down into block-level networks, which are trained using knowledge distillation so that inference can be performed with each block network alone. Block-level inference computations are then dynamically offloaded according to the computing capabilities of the edge cluster to provide inference results. Even when multiple devices are used, our method is not affected by network bandwidth, since only input images need to be transmitted. Experimental results demonstrate that our approach consistently reduces inference latency as the number of devices increases. Additionally, by controlling the trade-off between latency and accuracy, we can provide inference services tailored to various latency requirements.
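The paper's offloading policy is not spelled out in the abstract; the sketch below is a hypothetical greedy assignment of standalone block networks to edge devices under a latency budget, illustrating why only the input image, rather than intermediate activations, would need to be transmitted. All block and device figures are invented.

from dataclasses import dataclass

@dataclass
class BlockNet:
    name: str
    flops: float      # compute cost of running this block network
    accuracy: float   # standalone accuracy obtained via knowledge distillation

@dataclass
class EdgeDevice:
    name: str
    flops_per_sec: float

def assign_blocks(blocks, devices, latency_budget_s):
    # Greedy sketch: give each device the most accurate block network it can
    # run within the latency budget. Each device receives only the input
    # image, so network bandwidth does not constrain the assignment.
    assignment = {}
    remaining = sorted(blocks, key=lambda b: b.accuracy, reverse=True)
    for dev in sorted(devices, key=lambda d: d.flops_per_sec, reverse=True):
        for blk in remaining:
            if blk.flops / dev.flops_per_sec <= latency_budget_s:
                assignment[dev.name] = blk.name
                remaining.remove(blk)
                break
    return assignment

blocks = [BlockNet("block1", 2e9, 0.91), BlockNet("block2", 5e9, 0.94),
          BlockNet("block3", 9e9, 0.96)]
devices = [EdgeDevice("jetson", 4e9), EdgeDevice("rpi", 1e9)]
print(assign_blocks(blocks, devices, latency_budget_s=2.0))
# {'jetson': 'block2', 'rpi': 'block1'}

Tightening or relaxing the latency budget shifts the assignment toward smaller or larger block networks, which is one way to expose the latency/accuracy trade-off mentioned above.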
ISBN (Print): 9789811903908; 9789811903892
This paper focuses on synchronization for complex-valued shunting inhibitory cellular neural networks (SICNNs) with distributed delays and designs a novel feedback controller to ensure module-phase synchronization. For the discussion of module-phase synchronization, a lemma is given to show the existence of a bounded solution of the drive system. By constructing a Lyapunov functional and employing the inequality technique, sufficient conditions for module-phase synchronization of complex-valued SICNNs are derived. Finally, the validity of the obtained results is demonstrated by a numerical example.
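The paper's controller and Lyapunov functional are not reproduced here; for reference, module-phase synchronization of a drive state $z_{ij}(t)$ and a response state $\tilde z_{ij}(t)$ is usually understood as the simultaneous convergence of their moduli and phases:

$$\lim_{t\to\infty}\big|\,|\tilde z_{ij}(t)| - |z_{ij}(t)|\,\big| = 0, \qquad \lim_{t\to\infty}\big|\arg \tilde z_{ij}(t) - \arg z_{ij}(t)\big| = 0,$$

so the feedback controller must drive both error quantities to zero, which is why boundedness of the drive system's solution is needed before the Lyapunov argument can be applied.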
Based on the Google Inception network model architecture, research is conducted on issues such as the structural design of the model, data preprocessing, tuning of training parameters, computing clusters in a distribut...
ISBN (Digital): 9781713852889
ISBN (Print): 9781713852889
The large volumes of video data cause network congestion and high latency in centralized cloud computing systems. Fog computing architectures, which enable the use of edge devices, have already been employed to address these problems. This paper proposes an application model called the Video Analytic Data Reduction Model (VADRM), which divides video analytic jobs into smaller tasks with fewer processing requirements. A prototype of the VADRM application model for typical video analytics applications (i.e., surveillance cameras) is implemented with a convolutional neural network (CNN). An analytical model is created based on the workload characterization of the prototype and used in a general simulation to measure the effectiveness of VADRM in employing edge computing instead of the cloud. The results show that VADRM can allocate 45% of the data size for edge processing and 55.50% for cloud processing. The iFogSim toolkit is used to simulate the fog environment and measure network performance when using the VADRM model.
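The paper's workload characterization is not reproduced here; the toy sketch below only illustrates the kind of edge/cloud split VADRM performs, keeping lightweight tasks (and their data) at the edge and sending the rest to the cloud. Task names, MIPS figures, and data sizes are invented and merely chosen to land near the reported split.

def split_tasks(tasks, edge_capacity_mips):
    # Toy sketch of the edge/cloud split: tasks light enough for the edge
    # device stay at the edge, and the rest (with their data) go to the cloud.
    # Thresholds and task figures are illustrative, not from the paper.
    edge, cloud = [], []
    for t in tasks:
        (edge if t["mips"] <= edge_capacity_mips else cloud).append(t)
    total = sum(t["data_mb"] for t in tasks)
    return (sum(t["data_mb"] for t in edge) / total,
            sum(t["data_mb"] for t in cloud) / total)

tasks = [
    {"name": "frame_decode",      "mips": 200,  "data_mb": 30},
    {"name": "motion_detection",  "mips": 400,  "data_mb": 15},
    {"name": "object_detection",  "mips": 3000, "data_mb": 40},
    {"name": "re_identification", "mips": 5000, "data_mb": 15},
]
edge_frac, cloud_frac = split_tasks(tasks, edge_capacity_mips=1000)
print(f"edge: {edge_frac:.0%}, cloud: {cloud_frac:.0%}")  # edge: 45%, cloud: 55%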