Recently, the data-parallel pipeline approach has been widely used in training DNN models on commodity GPU servers. However, there are still three challenges for hybrid parallelism on commodity GPU servers: i) a balanced model partition is crucial for efficiency, whereas prior works lack a sound solution for generating a balanced partition automatically; ii) an orchestrated device mapping is essential to reduce communication contention, yet prior works ignore server heterogeneity, exacerbating that contention; iii) startup overhead is inevitable and especially significant for deep pipelines, making it a major source of pipeline bubbles that severely limits pipeline scalability. We propose AutoPipe-H to address these three problems. It comprises i) a pipeline partitioner that automatically and quickly generates a balanced sub-block partition scheme; ii) a device mapping component that assigns pipeline stages to devices while accounting for server heterogeneity to reduce communication contention; and iii) a distributed training runtime that reduces pipeline startup overhead by splitting the micro-batch evenly. The experimental results show that AutoPipe-H can accelerate training by up to 1.26x over the hybrid parallelism frameworks DAPPLE and Piper, with a 2.73x-12.7x improvement in partition balance and an order-of-magnitude reduction in partition-scheme search time.
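As an illustration of the balanced-partition problem such a partitioner addresses, the sketch below splits a profile of per-layer compute costs into contiguous pipeline stages so that the slowest stage is as fast as possible. It is a minimal min-max dynamic program over hypothetical layer costs, not AutoPipe-H's actual sub-block partitioning algorithm.

```python
# Minimal sketch (not AutoPipe-H itself): balanced pipeline partitioning as a
# min-max dynamic program. `layer_costs` are hypothetical per-layer compute
# times; the goal is to split consecutive layers into `num_stages` stages so
# that the slowest (bottleneck) stage is as fast as possible.
from functools import lru_cache

def balanced_partition(layer_costs, num_stages):
    n = len(layer_costs)
    prefix = [0.0]
    for c in layer_costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Minimal achievable bottleneck cost for layers i..n-1 using k stages.
        if k == 1:
            return prefix[n] - prefix[i]
        result = float("inf")
        for j in range(i + 1, n - k + 2):  # first stage takes layers i..j-1
            stage_cost = prefix[j] - prefix[i]
            result = min(result, max(stage_cost, best(j, k - 1)))
        return result

    return best(0, num_stages)

if __name__ == "__main__":
    costs = [4.0, 2.0, 3.0, 7.0, 1.0, 5.0, 2.0, 6.0]
    print(balanced_partition(costs, num_stages=4))  # bottleneck stage time
```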
Mobile edge computing (MEC) brings storage, cloud computing, and analysis capabilities close to users in 5G communication systems. MEC and deep learning (DL) are combined in 5G networks to enable automated network management that provides resource allocation (RA), energy efficiency (EE), and adaptive security, thereby reducing computational costs and enhancing user services. This study presents a hybrid quantum-classical convolutional neural network (HQCCNN) with a simplicial attention network (SAN) that allocates appropriate resources to the various users in the network. First, the green anaconda optimization (GAO) algorithm is used to optimize the objective function for effective RA. The neural network then receives the optimized objective functions and allocates resources. The proposed HQCCNN-GAO model assesses each user's degree of need and, based on those needs, allots resources to every user in the 5G network while maintaining high throughput and EE. Throughput, latency, mean square error, processing time, bit error rate, and EE are used to measure the proposed model's efficiency, and the results are compared with several existing RA models. The results show that the proposed model provides a low latency of 0.08 s and a high throughput of 790 kbps for a range of network users.
The Hopfield network is an example of an artificial neural network used to implement associative memories. In a traditional Hopfield neural network, a neuron's state is represented by a single binary digit. Inspired by the human brain's ability to cope simultaneously with multiple sensorial inputs, this paper presents three multi-modal Hopfield-type neural networks that treat multi-dimensional data as a single entity. In the first model, called the vector-valued Hopfield neural network, the neuron's state is a vector of binary digits. Synaptic weights are modeled as finite impulse response (FIR) filters in the second model, yielding the so-called convolutional associative memory. Finally, the synaptic weights are modeled by linear time-varying (LTV) filters in the third model. Besides their potential applications in multi-modal intelligence, the new associative memories may also be used for signal and image processing and for solving optimization and classification tasks.
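For reference, the sketch below shows the classical bipolar Hopfield associative memory that these multi-modal models generalize: Hebbian storage followed by iterated sign updates. The stored patterns and noisy probe are illustrative only, not taken from the paper.

```python
# Minimal sketch of the classical bipolar Hopfield associative memory that the
# paper's vector-valued and filter-based models generalize.
import numpy as np

def store(patterns):
    # Hebbian outer-product rule with zeroed self-connections.
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0.0)
    return W / n

def recall(W, state, steps=10):
    # Synchronous sign updates until a fixed point (or step limit) is reached.
    for _ in range(steps):
        new_state = np.where(W @ state >= 0, 1, -1)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state

if __name__ == "__main__":
    patterns = np.array([[1, -1, 1, -1, 1, -1], [1, 1, -1, -1, 1, 1]])
    W = store(patterns)
    probe = np.array([1, -1, 1, -1, -1, -1])   # corrupted copy of pattern 0
    print(recall(W, probe))                    # converges back to pattern 0
```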
Artificial neural networks (ANNs) represent a fundamentally connectionist and distributed approach to computing, and as such they differ from classical computers that utilize the von Neumann architecture. This has revived research interest in new unconventional hardware for more efficient ANNs rather than emulating them on traditional machines. To fully leverage ANNs, optimization algorithms must account for hardware limitations and imperfections. Photonics offers a promising platform with scalability, speed, energy efficiency, and parallel processing capabilities. However, fully autonomous optical neural networks (ONNs) with in-situ learning are scarce. In this work, we propose and demonstrate a ternary-weight high-dimensional semiconductor laser-based ONN and introduce a method for achieving ternary weights using Boolean hardware, enhancing the ONN's information processing capabilities. Furthermore, we design an in-situ optimization algorithm that is compatible with both Boolean and ternary weights. Our algorithm yields benefits in both convergence speed and performance. Our experimental results show the ONN's long-term inference stability, with a consistency above 99% for over 10 h. Our work is of particular relevance in the context of in-situ learning under restricted hardware resources, especially since minimizing the power consumption of auxiliary hardware is crucial to preserving the efficiency gains achieved by non-von Neumann ANN implementations.
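One common way to realize ternary weights {-1, 0, +1} with purely Boolean (on/off) elements is as the difference of two Boolean masks applied in separate passes; whether this matches the paper's laser-based scheme is an assumption, and the sketch below only illustrates the arithmetic equivalence.

```python
# Minimal sketch of ternary weights built from Boolean masks: w = w_plus - w_minus.
# Mapping to the paper's laser-based hardware is an assumption.
import numpy as np

def ternary_readout(x, w_plus, w_minus):
    # x: input vector; w_plus / w_minus: Boolean {0,1} masks with disjoint support.
    assert not np.any(w_plus & w_minus), "a weight cannot be +1 and -1 at once"
    return x @ w_plus - x @ w_minus     # equivalent to x @ (w_plus - w_minus)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=8)
    ternary = rng.choice([-1, 0, 1], size=8)
    w_plus = (ternary == 1).astype(int)
    w_minus = (ternary == -1).astype(int)
    print(np.isclose(ternary_readout(x, w_plus, w_minus), x @ ternary))  # True
```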
Edge nodes, which are expected to grow into a multi-billion-dollar market, are essential for detecting a variety of cyber threats on Internet-of-Things endpoints. Adopting current network intrusion detection systems with deep learning models (DLMs) based on FedACNN is constrained by the resource limitations of this network equipment layer. We address this issue by creating a unique, lightweight, fast, and accurate DLM-based edge detection model to identify distributed denial-of-service attacks on edge nodes. Our approach can generate results at a relevant pace even with limited resources, such as low power, memory, and processing capability. The federated convolutional neural network (FedACNN) deep learning method uses attention mechanisms to minimise communication delay. The developed model uses a recent cybersecurity dataset (UNSW 2015) and is deployed on an edge node simulated by a Raspberry Pi. Our findings show that, compared to traditional DLM methodologies, our model retains a high accuracy rate of about 99%, even with reduced CPU and memory use. It is also about three times smaller in volume than the most advanced model while requiring far less testing time.
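The sketch below shows the server-side aggregation step a FedACNN-style setup builds on: clients train locally and the server combines their parameters, here weighted by hypothetical attention scores. The paper's actual attention mechanism is not described in the abstract, so the weighting scheme is only an assumption.

```python
# Minimal sketch of federated aggregation with hypothetical attention weights.
import numpy as np

def aggregate(client_params, attention_scores):
    # client_params: list of dicts {layer_name: ndarray}, one dict per client.
    weights = np.asarray(attention_scores, dtype=float)
    weights = weights / weights.sum()
    aggregated = {}
    for name in client_params[0]:
        aggregated[name] = sum(w * params[name]
                               for w, params in zip(weights, client_params))
    return aggregated

if __name__ == "__main__":
    clients = [{"conv1": np.full((3, 3), float(i))} for i in range(1, 4)]
    print(aggregate(clients, attention_scores=[0.5, 0.3, 0.2])["conv1"][0, 0])
```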
We designed efficient signal processing, implemented with artificial intelligence using a deep neural network, for image monitoring of underwater laser cutting for nuclear power plant dismantling. Monitoring images of underwater laser cutting with intense flames in turbid water are characterized by low visibility, while the pixel values of an image are distributed over the entire dynamic range. The visibility for underwater laser cutting operations was improved by widely stretching the pixel value distribution to the full possible dynamic range after removing excessively dark or bright pixels that lie far from the dominant pixel intensity distribution. Areas of intense flame, where pixel values are close to saturation, are preserved. In addition, an efficiently designed look-up table increases contrast in cutting areas with intense flames, and an image acquisition method using the lowest pixel values in the latest frames reduces intermittent monitoring interference caused by flames erupting in irregular patterns and by flowing bubbles. A deep learning neural network trained with the designed signal processing datasets effectively improved image monitoring performance in underwater laser cutting experiments.
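The two preprocessing ideas described above translate naturally into a few lines of array code; the sketch below shows percentile-based contrast stretching that ignores outlier pixels and a temporal minimum over the latest frames. The percentile cutoffs and window length are illustrative assumptions, not the paper's tuned values.

```python
# Minimal sketch of outlier-aware contrast stretching and a temporal-minimum
# image over recent frames to suppress erupting flames and passing bubbles.
import numpy as np

def stretch_contrast(image, low_pct=1.0, high_pct=99.0):
    # Clip pixels far outside the dominant intensity range, then rescale to 0..255.
    lo, hi = np.percentile(image, [low_pct, high_pct])
    stretched = np.clip((image.astype(float) - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return (stretched * 255).astype(np.uint8)

def temporal_minimum(frames):
    # Pixel-wise minimum over the most recent frames; bright, short-lived flame
    # bursts and bubbles rarely persist across the whole window.
    return np.min(np.stack(frames, axis=0), axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.integers(0, 256, size=(64, 64), dtype=np.uint8) for _ in range(5)]
    clean = stretch_contrast(temporal_minimum(frames))
    print(clean.shape, clean.dtype)
```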
In computational fluid dynamics (CFD), mesh-smoothing methods are widely used to refine mesh quality for achieving high-precision numerical results. In particular, optimization-based smoothing is used for high-quality mesh smoothing, but it incurs significant computational costs. Prior works have improved its smoothing efficiency by adopting supervised learning to learn smoothing methods from high-quality meshes. However, they pose difficulties in smoothing mesh nodes with varying degrees and require data augmentation to address the node input sequence problem. Moreover, the required labeled high-quality meshes further limit the applicability of the proposed methods. In this paper, we present graph-based smoothing mesh net (GMSNet), a lightweight neural network model for intelligent mesh smoothing. GMSNet adopts graph neural networks (GNNs) to extract features of a node's neighbors and outputs the optimal node position. During smoothing, we also introduce a fault-tolerance mechanism to prevent GMSNet from generating negative-volume elements. As a lightweight model, GMSNet can effectively smooth mesh nodes with varying degrees and remains unaffected by the order of the input data. A novel loss function, MetricLoss, is developed to eliminate the need for high-quality meshes and provides stable and rapid convergence during training. We compare GMSNet with commonly used mesh-smoothing methods on two-dimensional (2D) triangle meshes. The results show that GMSNet achieves outstanding mesh-smoothing performance with 5% of the model parameters of the previous model, while offering a speedup of 13.56 times over optimization-based smoothing.
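To make the idea of a label-free, quality-driven objective concrete, the sketch below scores the triangles around a movable node with a standard area-to-edge-length quality measure and turns the mean quality into a loss. The exact metric behind MetricLoss is not given in the abstract, so this is an illustrative stand-in.

```python
# Minimal sketch of a mesh-quality-driven loss: triangle quality scaled so an
# equilateral triangle scores 1 and degenerate triangles approach 0.
import numpy as np

def triangle_quality(a, b, c):
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    area = 0.5 * abs(cross)
    edges_sq = np.sum((b - a) ** 2) + np.sum((c - b) ** 2) + np.sum((a - c) ** 2)
    return 4.0 * np.sqrt(3.0) * area / max(edges_sq, 1e-12)

def metric_loss(node, neighbors_by_triangle):
    # Lower loss = higher mean quality of the triangles incident to the moved node.
    qualities = [triangle_quality(node, p, q) for p, q in neighbors_by_triangle]
    return 1.0 - float(np.mean(qualities))

if __name__ == "__main__":
    node = np.array([0.2, 0.1])
    ring = [([1.0, 0.0], [0.5, 0.9]), ([0.5, 0.9], [-0.5, 0.9]), ([-0.5, 0.9], [-1.0, 0.0])]
    print(metric_loss(node, ring))
```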
ISBN (print): 9798350344820; 9798350344813
In this paper we develop a novel learning-based approach for mobile distributed beamforming without channel state information. We consider narrowband beamforming between a mobile UAV group and a base station under limited feedback, and propose a graph recurrent neural network (GRNN) approach to leverage local collaboration among the UAVs. The GRNN method is shown to be robust to variations in UAV speed and group heading, and scales with the UAV group size. We compare against codebook and binary feedback methods and show that the proposed GRNN method achieves better performance.
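The sketch below illustrates the generic graph recurrent update such a policy can be built on: each UAV refreshes a hidden state from its own observation and an aggregate of its neighbours' states over the communication graph. The weight shapes, mean aggregation, and tanh nonlinearity are assumptions rather than the paper's exact architecture.

```python
# Minimal sketch of a generic graph recurrent cell over a UAV communication graph.
import numpy as np

def grnn_step(hidden, obs, adjacency, W_h, W_n, W_x):
    # hidden: (N, d) per-UAV states; obs: (N, k) local observations;
    # adjacency: (N, N) 0/1 communication graph, aggregated by neighbour mean.
    deg = np.maximum(adjacency.sum(axis=1, keepdims=True), 1.0)
    neighbor_mean = (adjacency @ hidden) / deg
    return np.tanh(hidden @ W_h + neighbor_mean @ W_n + obs @ W_x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d, k = 4, 8, 3
    hidden, obs = np.zeros((N, d)), rng.normal(size=(N, k))
    A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
    W_h, W_n, W_x = (rng.normal(scale=0.1, size=s) for s in [(d, d), (d, d), (k, d)])
    print(grnn_step(hidden, obs, A, W_h, W_n, W_x).shape)  # (4, 8)
```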
ISBN (digital): 9798350368499
ISBN (print): 9798350368505
The continued development of neural network architectures drives ever-growing demand for computing power. While data center scaling continues, inference away from the cloud will increasingly rely on distributed inference across multiple devices. Most prior efforts have focused on optimizing single-device inference or partitioning models to enhance inference throughput, while energy consumption keeps growing in importance as a design consideration. This work proposes a framework that searches for optimal model splits and distributes the partitions across a combination of devices, taking both throughput and energy into account. Participating devices are strategically grouped into homogeneous and heterogeneous clusters consisting of general-purpose CPU and GPU architectures, as well as emerging Compute-In-Memory (CIM) accelerators. The framework simultaneously optimizes inference throughput and energy consumption. It demonstrates up to a 4x speedup with approximately a 4x per-device energy reduction in a heterogeneous setup compared to single-GPU inference. The algorithm also finds a smooth Pareto-like curve in the energy-throughput space for CIM devices.
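As a toy version of such a search, the sketch below enumerates contiguous splits of a per-layer cost profile across an ordered set of devices and keeps the points that are Pareto-optimal in (latency, energy). The per-device speed and energy factors are hypothetical placeholders, not measured values from the paper.

```python
# Minimal sketch of a model-split search over devices, scored on bottleneck
# latency and total energy, returning the Pareto-optimal split points.
from itertools import combinations

def evaluate(split_points, layer_costs, devices):
    bounds = [0, *split_points, len(layer_costs)]
    stage_work = [sum(layer_costs[a:b]) for a, b in zip(bounds, bounds[1:])]
    latency = max(w / d["speed"] for w, d in zip(stage_work, devices))       # bottleneck stage
    energy = sum(w * d["energy_per_unit"] for w, d in zip(stage_work, devices))
    return latency, energy

def pareto_splits(layer_costs, devices):
    n, k = len(layer_costs), len(devices)
    candidates = [(evaluate(split, layer_costs, devices), split)
                  for split in combinations(range(1, n), k - 1)]
    # Keep points not dominated in both latency and energy.
    front = [c for c in candidates
             if not any(o[0][0] <= c[0][0] and o[0][1] <= c[0][1] and o[0] != c[0]
                        for o in candidates)]
    return sorted(front)

if __name__ == "__main__":
    layers = [3.0, 5.0, 2.0, 4.0, 6.0, 1.0]
    devices = [{"speed": 1.0, "energy_per_unit": 1.0},   # CPU-like
               {"speed": 4.0, "energy_per_unit": 2.5},   # GPU-like
               {"speed": 2.0, "energy_per_unit": 0.4}]   # CIM-like
    for (latency, energy), split in pareto_splits(layers, devices):
        print(f"split={split} latency={latency:.2f} energy={energy:.1f}")
```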
Traffic management systems have primarily relied on live traffic sensors for real-time traffic guidance. However, this dependence often results in uneven service delivery due to the limited scope of sensor coverage or potential sensor failures. This research introduces a novel approach to overcome this limitation by synergistically integrating a Physics-Informed Neural Network-based Traffic State Estimator (PINN-TSE) with a powerful natural language processing model, GPT-4. The purpose of this integration is to provide a seamless and personalized user experience, while ensuring accurate traffic density prediction even in areas with limited data availability. The innovative PINN-TSE model was developed and tested, demonstrating a promising level of precision with a mean absolute error of less than four vehicles per mile in traffic density estimation. This performance underlines the model's ability to provide dependable traffic information, even in regions where conventional traffic sensors may be sparsely distributed or data communication is likely to be interrupted. Furthermore, the incorporation of GPT-4 enhances user interactions by understanding and responding to inquiries in a manner akin to human conversation. This not only provides precise traffic updates but also interprets user intentions for a tailored experience. The results of this research showcase an AI-integrated traffic guidance system that outperforms traditional methods in terms of traffic estimation, personalization, and reliability. While the study primarily focuses on a single road segment, the methodology shows promising potential for expansion to network-level traffic guidance, offering even greater accuracy and usability. This paves the way for a smarter and more efficient approach to traffic management in the future.
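To make the physics-informed part concrete, the sketch below penalizes the residual of the LWR conservation law with a Greenshields speed law on a grid of predicted densities, alongside a data term on sparse observations. Using LWR/Greenshields and this loss weighting is an assumption; the paper's exact physics model and training setup are not stated in the abstract.

```python
# Minimal sketch of a PINN-style loss for traffic density: data misfit plus the
# finite-difference residual of d(rho)/dt + d(rho * v(rho))/dx = 0.
import numpy as np

def greenshields_flux(rho, v_free=60.0, rho_max=120.0):
    # Flow = density * speed, with speed decreasing linearly in density.
    return rho * v_free * (1.0 - rho / rho_max)

def physics_residual(rho_grid, dt, dx):
    # rho_grid: (T, X) predicted densities on a space-time grid.
    drho_dt = np.gradient(rho_grid, dt, axis=0)
    dflux_dx = np.gradient(greenshields_flux(rho_grid), dx, axis=1)
    return drho_dt + dflux_dx

def pinn_loss(rho_pred, rho_obs, obs_mask, dt, dx, weight=0.1):
    data_term = np.mean((rho_pred[obs_mask] - rho_obs[obs_mask]) ** 2)
    physics_term = np.mean(physics_residual(rho_pred, dt, dx) ** 2)
    return data_term + weight * physics_term

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho_pred = rng.uniform(10, 100, size=(20, 50))
    rho_obs = rho_pred + rng.normal(scale=2.0, size=rho_pred.shape)
    mask = rng.random(rho_pred.shape) < 0.2          # sparse sensor coverage
    print(pinn_loss(rho_pred, rho_obs, mask, dt=1.0, dx=0.1))
```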