Identifying bird species from audio recordings is a challenging task due to the presence of multiple species in the same recording, background noise, and long recording durations. Moreover, choosing a p...
Parallel Split Learning (SL) allows resource-constrained devices that cannot participate in Federated Learning (FL) to train deep neural networks (NNs) by splitting the NN model into parts. In particular, such devices (clients) may offload the processing task of the largest model part to a computationally powerful helper, and multiple helpers may be employed and work in parallel. In hybrid federated and split learning (HFSL), on the other hand, devices can participate in the training process through either of the two protocols (SL and FL), depending on the system's characteristics. This can considerably reduce the maximum training time over all clients (makespan), especially in highly heterogeneous scenarios. In this paper, we study the joint problem of training protocol selection, client-helper assignment, and scheduling decisions to minimize the training makespan. We prove this problem is NP-hard and propose two solution methods: one based on decomposing the problem by leveraging its inherent symmetry, and a second, fully scalable one. Through numerical evaluations using our testbed's measurements, we build a solution strategy combining these methods. This strategy finds near-optimal solutions and achieves a makespan up to 71% shorter than the baseline schemes.
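As a rough illustration of the assignment problem studied above (not the paper's decomposition method or its scalable algorithm), the sketch below greedily decides, per client, between local FL training and SL offloading to the least-loaded helper. The function name, the timing inputs, and the assumption that each helper processes its offloaded parts sequentially are all simplifications introduced here for illustration:

```python
# Hypothetical greedy heuristic for HFSL protocol selection and
# client-helper assignment; all timings are placeholder inputs.
def greedy_hfsl_assignment(fl_time, sl_client_time, sl_helper_time, num_helpers):
    """fl_time[i]: client i's time to train the full model locally (FL).
    sl_client_time[i]: client i's time for its small SL model part.
    sl_helper_time[i]: helper-side time for client i's offloaded part."""
    helper_load = [0.0] * num_helpers
    decisions = []
    # Consider the slowest FL clients first: they benefit most from offloading.
    for i in sorted(range(len(fl_time)), key=lambda i: -fl_time[i]):
        h = min(range(num_helpers), key=lambda h: helper_load[h])  # least-loaded helper
        sl_finish = max(sl_client_time[i], helper_load[h] + sl_helper_time[i])
        if sl_finish < fl_time[i]:      # offloading shortens this client's finish time
            helper_load[h] += sl_helper_time[i]
            decisions.append((i, "SL", h, sl_finish))
        else:
            decisions.append((i, "FL", None, fl_time[i]))
    makespan = max(finish for _, _, _, finish in decisions)
    return decisions, makespan
```

Such a heuristic captures the trade-off the paper formalizes: offloading helps slow clients but congests helpers, so the protocol choice, assignment, and implied schedule must be optimized jointly.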
Background: Pneumonia is a respiratory disease caused by bacteria; it affects many people, particularly in impoverished countries where pollution, unclean living standards, overpopulation, and insufficient medical infrastructure are prevalent. To guarantee curative therapy and boost survival chances, it is vital to detect pneumonia early. Chest X-ray imaging is the most common way of detecting pneumonia; however, analyzing chest X-rays is a complex process vulnerable to subjective variation. Moreover, the available data is growing exponentially, and training a model to predict pneumonia can take hours or days, whereas timely prediction is essential to guarantee better cure and treatment. Existing work by different authors needs more precision, and its computation time for predicting pneumonia is also much longer. There is therefore a requirement for early forecasting: using X-ray image samples, the system must provide continuous, unsupervised learning for early diagnosis. Methods: In this article, the training time of the model is accelerated using a distributed data-parallel approach and the computational power of high-performance computing devices. This research aims to diagnose pneumonia from X-ray images with more precision, greater speed, and fewer processing resources. Distributed deep learning techniques are gaining popularity owing to the rising need for computational resources for deep learning models with many parameters. In contrast to conventional training methods, data-parallel training enables several compute nodes to train massive deep-learning models concurrently, improving training efficiency. Deploying the model in Spark addresses scalability and acceleration: Spark's distributed-processing capability reads data from multiple nodes, and the results demonstrate that training time can be drastically reduced by utilizing these techniques, which is a significant necessity when dealing with large datasets.
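The article deploys its model on Spark; the following is only a minimal sketch of the same distributed data-parallel idea, written with PyTorch DistributedDataParallel. The dataset path, the ResNet-18 backbone, and the hyperparameters are assumptions for illustration, and the launcher (e.g., torchrun) is assumed to set the rendezvous environment variables:

```python
# Minimal data-parallel training sketch (illustrative, not the paper's Spark setup).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler
from torchvision import datasets, transforms, models

def train(rank, world_size, data_dir="chest_xray/train"):  # hypothetical path
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    tfm = transforms.Compose([transforms.Grayscale(3),
                              transforms.Resize((224, 224)),
                              transforms.ToTensor()])
    ds = datasets.ImageFolder(data_dir, transform=tfm)  # NORMAL vs PNEUMONIA folders
    sampler = DistributedSampler(ds, num_replicas=world_size, rank=rank)
    loader = DataLoader(ds, batch_size=32, sampler=sampler, num_workers=4)

    model = DDP(models.resnet18(num_classes=2).cuda(rank), device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(5):
        sampler.set_epoch(epoch)            # reshuffle data shards each epoch
        for x, y in loader:
            x, y = x.cuda(rank), y.cuda(rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()  # gradients are allreduced across nodes
            opt.step()
    dist.destroy_process_group()
```

Each worker trains on its own shard of the X-ray images while gradients are synchronized automatically, which is the mechanism behind the training-time reduction the abstract reports.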
Based on the theory of speech recognition and the characteristics of music recognition, this study recognizes and processes musical sounds. An artificial neural network (ANN) is a distributed parallel information ...
With machine learning workloads now at very large scales, models are distributed across large compute systems, where their performance is limited by the bandwidth of chip-to-chip communication. To relieve this bottleneck, spiking neural networks (SNNs) can be utilized to reduce inter-chip communication traffic by exploiting their inherent sparsity. However, in comparison to traditional artificial neural networks (ANNs), SNNs can suffer significant performance degradation as network scale increases. This research proposes a hybrid neural-network accelerator that uses the best of both spiking and non-spiking layers by allocating the majority of resources to non-spiking layers on the interior of the chip, while bandwidth-limited areas (e.g., I/O pads or chip separation boundaries) employ spike-based data traffic. By limiting the overall use of spiking layers within the network, we realize the energy savings of SNNs without the accuracy degradation that comes with large spike-based networks. We present a scalable chiplet architecture and show how hybrid data is managed with both spiking and non-spiking data communication. We also demonstrate how the asynchronous spike-based model is integrated efficiently with synchronous artificial-based deep learning workloads. Our hybrid architecture offers significant improvements in performance, accuracy, and energy consumption in comparison to SNNs and ANNs. With up to a 1.34× increase in energy efficiency and a 1.56× decrease in single-inference latency, the versatility of the architecture is demonstrated by validation across multiple datasets, encompassing both language processing and computer vision tasks.
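To make the boundary idea concrete, here is a toy sketch of one common way dense activations can be turned into sparse 1-bit spike events before crossing a bandwidth-limited link and accumulated back into rates on the other side. Bernoulli rate coding, the timestep count, and the clipping are assumptions chosen for illustration; this is not the paper's accelerator design:

```python
# Toy spike-coding sketch for a bandwidth-limited chip boundary (illustrative only).
import numpy as np

def rate_encode(activations, timesteps=8):
    """Bernoulli rate coding: each unit fires with probability proportional
    to its clipped activation, yielding 1-bit events per timestep."""
    p = np.clip(activations, 0.0, 1.0)
    return np.random.rand(timesteps, *activations.shape) < p  # bool spike trains

def rate_decode(spikes):
    """Recover an approximate activation as the mean firing rate."""
    return spikes.mean(axis=0)

acts = np.random.rand(1024)        # dense activations at the chip edge
spikes = rate_encode(acts)         # only 1 bit per unit per timestep crosses the link
recovered = rate_decode(spikes)
print("mean abs error:", np.abs(acts - recovered).mean())
```

The accuracy/traffic trade-off is visible even in this toy: more timesteps mean a better rate estimate but more events on the link, which is why the paper confines spiking layers to the bandwidth-limited regions.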
It is important to scale out deep neural network (DNN) training to reduce model training time. High communication overhead is one of the major performance bottlenecks for distributed DNN training across multiple GPUs. Our investigations have shown that popular open-source DNN systems achieve a speedup of only 2.5× on 64 GPUs connected by a 56 Gbps network. To address this problem, we propose a communication backend named GradientFlow for distributed DNN training and employ a set of network optimization techniques. First, we integrate ring-based allreduce, mixed-precision training, and computation/communication overlap into GradientFlow. Second, we propose lazy allreduce, which improves network throughput by fusing multiple communication operations into a single one, and design coarse-grained sparse communication, which reduces network traffic by transmitting only important gradient chunks. When training AlexNet and ResNet-50 on the ImageNet dataset using 512 GPUs, our approach achieves speedups of 410.2× and 434.1×, respectively.
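The fusion idea behind lazy allreduce can be sketched in a few lines: rather than launching one allreduce per gradient tensor, the gradients are flattened into a single contiguous buffer so the network sees one large message. This is a minimal illustration of the general technique, not the GradientFlow implementation, and it assumes torch.distributed has already been initialized:

```python
# Sketch of fusing many small gradient allreduces into one (illustrative).
import torch
import torch.distributed as dist

def fused_allreduce(grads):
    """Average a list of gradient tensors across workers with one collective."""
    numels = [g.numel() for g in grads]
    flat = torch.cat([g.reshape(-1) for g in grads])  # one contiguous buffer
    dist.all_reduce(flat, op=dist.ReduceOp.SUM)       # single network operation
    flat /= dist.get_world_size()                     # sum -> mean
    for g, chunk in zip(grads, flat.split(numels)):   # scatter results back
        g.copy_(chunk.view_as(g))
```

Fewer, larger messages amortize per-operation latency, which is why fusion raises throughput on bandwidth-constrained clusters; the paper's coarse-grained sparse communication further drops unimportant chunks from the fused buffer.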
Distributed clustering algorithms are employed in wireless sensor networks (WSNs) to improve local data analysis. This process is carried out collaboratively with the help of nearby neighbours without a central cont...
Distributed Acoustic Sensing (DAS) technology leverages optical fibers to detect acoustic signals over long distances, offering high-resolution data critical for applications such as seismic monitoring, structural health monitoring, and security. A significant challenge in DAS systems is the accurate classification of detected events, which is crucial for their reliability. Traditional signal processing methods often struggle with the high-dimensional, noisy data produced by DAS systems, making advanced machine learning techniques essential for improved event classification. However, the lack of large, high-quality datasets has hindered progress. In this study, we present a comprehensive labeled dataset of DAS measurements collected around a university campus, featuring events such as walking, running, and vehicular movement, as well as potential security threats. This dataset provides a valuable resource for developing and validating machine learning models, enabling more accurate and automated event classification. The quality of the dataset is demonstrated through the successful training of a convolutional neural network (CNN).
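A minimal sketch of the kind of CNN such a dataset can validate is shown below, treating each DAS measurement as a 2-D patch (fiber channels × time samples). The architecture, input size, and class count are assumptions for illustration, not the model trained in the study:

```python
# Illustrative CNN for DAS event classification on spatial-temporal patches.
import torch
import torch.nn as nn

class DASEventCNN(nn.Module):
    def __init__(self, num_classes=4):  # e.g. walking, running, vehicle, background
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),     # tolerate variable patch sizes
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

patch = torch.randn(8, 1, 128, 512)   # batch of 8 patches: 128 channels x 512 samples
logits = DASEventCNN()(patch)         # -> shape (8, num_classes)
```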
ISBN (print): 9783031488023; 9783031488030
In this paper we demonstrate that it is possible to obtain considerable improvements in performance- and energy-aware metrics for training deep neural networks on a modern parallel multi-GPU system by enforcing selected, non-default power caps on the GPUs. We measure the power and energy consumption of the whole node using a professional, certified hardware power meter. For a high-performance workstation with 8 GPUs, we were able to find non-default GPU power cap settings within the range of 160-200 W that improve the difference between percentage energy gain and performance loss by over 15.0%, EDP (abbreviations and terms used are described in the main text) by over 17.3%, EDS with k = 1.5 by over 2.2%, EDS with k = 2.0 by over 7.5%, and pure energy by over 25%, compared to the default power cap setting of 260 W per GPU. These findings demonstrate the potential of today's CPU+GPU systems for configuration tuning in the context of performance-energy consumption metrics.
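For context, applying a non-default power cap and sampling board power can be done through NVML. The sketch below picks 180 W as one point inside the paper's 160-200 W range and approximates energy with a simple rectangle rule; it is not the paper's measurement setup, which uses a certified hardware power meter, and the set call requires root privileges:

```python
# Sketch: set a GPU power cap and integrate sampled power into energy (NVML).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
pynvml.nvmlDeviceSetPowerManagementLimit(handle, 180_000)  # 180 W, in milliwatts

energy_j, dt = 0.0, 0.1
t_end = time.time() + 60.0                  # sample for one minute of the workload
while time.time() < t_end:
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
    energy_j += power_w * dt                # rectangle-rule energy accumulation
    time.sleep(dt)
print(f"GPU energy over window: {energy_j:.0f} J")
pynvml.nvmlShutdown()
```

Sweeping the cap value and recording runtime alongside energy would yield the energy/performance trade-off curves from which metrics such as EDP and EDS are computed.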
This paper addresses the challenge of joint active user detection (AUD) and channel estimation (CE) for grant-free random access within massive machine-type communications (mMTC)-enabled distributed antenna systems (D...