Serverless computing offers attractive scalability, elasticity and cost-effectiveness. However, constraints on memory, CPU and function runtime have hindered its adoption for data-intensive applications and machine learning...
Based on a comprehensive analysis of different work and management indicators, as well as the quantitative assessment outcomes of the implemented personal safety responsibility system, this study proposes the utilizat...
ISBN:
(Print) 9783031352591; 9783031352607
Deep neural networks used for computer vision tasks are typically trained on datasets consisting of thousands of images, called examples. Recent studies have shown that examples in a dataset are not of equal importance for model training and can be categorized based on quantifiable measures reflecting a notion of "hardness" or "importance". In this work, we conduct an empirical study of the impact of importance-aware partitioning of the dataset examples across workers on the performance of data-parallel training of deep neural networks. Our experiments with CIFAR-10 and CIFAR-100 image datasets show that data-parallel training with importance-aware partitioning can perform better than vanilla data-parallel training, which is oblivious to the importance of examples. More specifically, the proper choice of the importance measure, partitioning heuristic, and the number of intervals for dataset repartitioning can improve the best accuracy of the model trained for a fixed number of epochs. We conclude that the parameters related to importance-aware data-parallel training, including the importance measure, number of warmup training epochs, and others defined in the paper, may be considered as hyperparameters of data-parallel model training.
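The abstract describes importance-aware partitioning only in prose, so the Python sketch below illustrates one plausible realization under stated assumptions: the importance score is a hypothetical running loss collected during warmup epochs, and the heuristic is a simple sorted round-robin deal. The paper evaluates its own measures and heuristics, which this sketch does not reproduce.

    import numpy as np

    def importance_aware_partition(importance, num_workers):
        """Deal examples across workers so that each worker receives a
        similar spread of 'hard' and 'easy' examples.

        importance  -- per-example importance scores, e.g. a running loss
                       from warmup epochs (a hypothetical choice).
        num_workers -- number of data-parallel workers.
        Returns a list of index arrays, one per worker.
        """
        # Sort example indices from most to least important.
        order = np.argsort(-np.asarray(importance))
        # Round-robin deal of the sorted indices, so every worker gets
        # a balanced mix of importance values.
        return [order[w::num_workers] for w in range(num_workers)]

    # Toy usage: 10 examples split across 2 workers.
    scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 0.05]
    for w, idx in enumerate(importance_aware_partition(scores, 2)):
        print("worker", w, idx)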
ISBN:
(Print) 9798350383225
Checkpointing (or snapshotting) a system's state has long been a problem of great interest and is widely used for ensuring system reliability, record-replay debugging, job migration and running high-throughput transaction systems. In the last few years, ultra-fast hardware-assisted NVM-based checkpointing schemes have emerged that can collect incremental full-system checkpoints in milliseconds. Unfortunately, such systems incur large overheads in terms of write amplification (an increased number of writes), which in turn seriously reduces the reliability and lifetime of NVM devices. We propose the first tunable scheme in this space, JASS, which, given a checkpoint latency (CL), near-optimally minimizes the write amplification (WA). This allows us to run parallel programs in a disciplined fashion. To realize this goal, we propose several novel hardware mechanisms along the way, such as a rigorous method of flushing pre-checkpoint messages in the NoC, a novel DRAM scrubber and locality predictor, and a control-theoretic algorithm to guarantee a CL while minimizing the WA. We reduce WA by 35-96% compared to the nearest state-of-the-art competing method and improve the performance of PARSEC benchmarks by 19.4%.
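The checkpoint-latency/write-amplification trade-off that JASS tunes can be illustrated with a toy software simulation; everything below (store rate, working-set size, page size, the once-per-interval flush model) is a made-up assumption and does not reflect the paper's actual hardware mechanisms.

    import random

    def simulated_write_amplification(interval_ms, run_ms, stores_per_ms=1000,
                                      working_set_pages=256, page_bytes=4096):
        """Flush every page dirtied within a checkpoint interval once per
        interval and compare NVM bytes written against bytes the program
        stored.  Shorter intervals (lower checkpoint latency) re-flush the
        same hot pages more often, inflating write amplification."""
        random.seed(0)
        app_bytes = nvm_bytes = 0
        for start in range(0, run_ms, interval_ms):
            dirty = set()
            for _ in range(min(interval_ms, run_ms - start) * stores_per_ms):
                dirty.add(random.randrange(working_set_pages))  # one 8-byte store
                app_bytes += 8
            nvm_bytes += len(dirty) * page_bytes  # incremental checkpoint flush
        return nvm_bytes / app_bytes

    for interval in (1, 10, 100):  # target checkpoint latency in ms
        print(interval, "ms ->", round(simulated_write_amplification(interval, 1000), 3))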
An electrical grid-connected wind energy system tries to effectively incorporate wind power while maintaining constant voltage and frequency. A power electronics-based system called the STATCOM offers quick and adaptab...
ISBN:
(Print) 9798350383461; 9798350383454
Deep Neural Network (DNN) frameworks need parallelism plans to execute immense models. The computed plans often combine data, model, and pipeline parallelism. Unfortunately, due to the intractability of the problem, current parallelism planners often fail to derive plans for immense DNNs: they either rely on experts to generate plans manually or on profiling for plan evaluation, making planners expensive and sub-optimal. We propose RAPID, an automatic parallelism planner for immense DNNs driven by a hierarchical abstract machine model. This model enables the design of a symbolic cost model that achieves robust prediction of parallelism cost through symbolic simplification. RAPID divides the parallelization problem hierarchically and symmetrically into linear-time sub-problems. We prove that the composition of the sub-problem solutions is optimal. Large-scale cluster experiments show that RAPID can reduce the planning time of immense DNNs (e.g., BERT) by up to 67x compared to state-of-the-art planners, while exhibiting high performance that matches expert-optimized plans.
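For intuition about the search space such a planner explores, the toy sketch below enumerates (data, tensor, pipeline) parallelism degrees for a given device count and scores each with a crude analytic cost; all formulas and numbers are assumptions for illustration only and bear no relation to RAPID's hierarchical abstract machine model or symbolic cost model.

    from itertools import product

    def toy_parallelism_plan(num_devices, flops, params, acts, flop_rate, link_bw):
        """Enumerate factorizations num_devices = dp * tp * pp and return the
        degrees minimizing a rough per-step cost estimate (seconds)."""
        best = None
        for dp, tp, pp in product(range(1, num_devices + 1), repeat=3):
            if dp * tp * pp != num_devices:
                continue
            compute = flops / (num_devices * flop_rate)                        # ideal work split
            grad_sync = 2 * (dp - 1) / dp * (params / (tp * pp)) * 4 / link_bw  # data-parallel all-reduce
            act_sync = 2 * (tp - 1) / tp * acts * 4 / link_bw                   # tensor-parallel all-reduce
            bubble = compute * (pp - 1) / pp                                     # pipeline fill/drain
            cost = compute + grad_sync + act_sync + bubble
            if best is None or cost < best[0]:
                best = (cost, (dp, tp, pp))
        return best

    # Hypothetical numbers: 16 devices, 1 PFLOP/step, 1e9 params, 2e8 activation values.
    print(toy_parallelism_plan(16, 1e15, 1e9, 2e8, 1e14, 25e9))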
The remarkable success of deep neural networks benefits from the rise of big data. As deep learning models grow larger in scale than ever before, their requirements for memory bandwidth are growing at a tremendous pace...
The number of Web services on the Internet has been steadily increasing in recent years due to their growing popularity. Under the big data environment, how to effectively manage Web services is of significance for se...
ISBN:
(Print) 9798400704369
Dataset Distillation (DD) is a technique for synthesizing smaller, compressed datasets from large original datasets while retaining the essential information needed to maintain efficacy. Efficient DD is a current research focus. Squeeze, Recover and Relabel (SRe²L) and Adversarial Prediction Matching (APM) are two advanced and efficient DD methods, yet their performance is moderate at lower volumes of distilled data. This paper proposes an ingenious improvement method, Distributed Boosting (DB), capable of significantly enhancing the performance of these two algorithms at low distillation volumes, leading to DB-SRe²L and DB-APM. Specifically, DB is divided into three stages: Distribute & Encapsulate, Distill, and Integrate & Mix-relabel. DB-SRe²L, compared to SRe²L, demonstrates performance improvements of 25.2%, 26.9%, and 26.2% on full 224x224 ImageNet-1k at Images Per Class (IPC) 10, CIFAR-10 at IPC 10, and CIFAR-10 at IPC 50, respectively. Meanwhile, DB-APM, in comparison to APM, exhibits performance enhancements of 21.2% and 20.9% on CIFAR-10 at IPC 10 and CIFAR-100 at IPC 1, respectively. Additionally, we provide a theoretical proof of convergence for DB. To the best of our knowledge, DB is the first method suitable for distributed parallel computing scenarios.
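A minimal sketch of the three-stage flow named in the abstract, assuming a generic base distiller distill_fn (standing in for SRe²L or APM) and a plain shard-and-concatenate scheme; the paper's actual encapsulation and mix-relabeling steps are not reproduced here.

    import numpy as np

    def distributed_boosting(images, labels, num_shards, distill_fn):
        """Illustrative three-stage flow sketched from the abstract.

        distill_fn(x, y) -> (x_syn, y_syn) is a placeholder for a base
        distiller; each shard could be distilled on a separate node."""
        # Stage 1: Distribute & Encapsulate -- shard the original dataset.
        shards = np.array_split(np.random.permutation(len(images)), num_shards)
        # Stage 2: Distill -- run the base distiller on every shard.
        distilled = [distill_fn(images[idx], labels[idx]) for idx in shards]
        # Stage 3: Integrate & Mix-relabel -- merge the synthetic shards
        # (the paper's relabeling step is omitted in this sketch).
        x_syn = np.concatenate([x for x, _ in distilled])
        y_syn = np.concatenate([y for _, y in distilled])
        return x_syn, y_syn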
Hybrid fuzzing, as an automated vulnerability detection technique, has gained widespread attention in recent years. It combines the advantages of fuzzing and concolic execution. Yet, existing hybrid fuzzing techniques ...