ISBN: 9798400708893 (print)
Deep Neural Networks (DNNs) have become increasingly computationally intensive and their parameter counts have grown, requiring efficient parallelization or distribution across multiple accelerators. Pipeline parallelism has been proposed as an effective way to distribute models and improve hardware utilization. However, pipeline parallelism involves a trade-off between speedup and accuracy: synchronous approaches do not provide sufficient speedup, while asynchronous approaches suffer from accuracy degradation because their training scheme differs from that of a single worker. In this paper, we propose AshPipe, a hybrid parallel framework that combines data parallelism and asynchronous pipeline parallelism to achieve efficient speedup for training. The proposed runtime uses the 1F1B schedule and data parallelism, with non-uniform numbers of workers and identical global batch sizes across stages. A Switch Parallelism (SP) mechanism is also proposed as an option to mitigate accuracy degradation; it switches from data parallelism to hybrid parallelism in the course of training. Experimental results show that AshPipe achieves 1.844x the throughput of data parallelism for ViT-H/14, which has 632M parameters. With the SP mechanism, AshPipe achieved a 30.2% reduction in training time with comparable accuracy compared to data parallelism when training on the CIFAR100 dataset.
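The 1F1B ("one forward, one backward") schedule named in the abstract above can be illustrated with a minimal sketch. This is a generic per-stage ordering, not AshPipe's runtime: the function name `one_f_one_b` and its parameters are hypothetical, and the sketch assumes a fixed number of micro-batches that is drained at the end, which an asynchronous pipeline may not do.

```python
def one_f_one_b(stage: int, num_stages: int, num_microbatches: int) -> list[str]:
    """Return the order of operations for one pipeline stage.

    "F3" is the forward pass of micro-batch 3, "B3" its backward pass.
    Hypothetical helper for illustration only.
    """
    # Warm-up: earlier stages run a few forwards before the first
    # backward pass can reach them from the end of the pipeline.
    warmup = min(num_stages - stage - 1, num_microbatches)
    ops = [f"F{i}" for i in range(warmup)]

    next_fwd, next_bwd = warmup, 0
    # Steady state: alternate one forward with one backward.
    while next_fwd < num_microbatches:
        ops.append(f"F{next_fwd}")
        next_fwd += 1
        ops.append(f"B{next_bwd}")
        next_bwd += 1

    # Cool-down: finish the backward passes still outstanding.
    while next_bwd < num_microbatches:
        ops.append(f"B{next_bwd}")
        next_bwd += 1
    return ops


if __name__ == "__main__":
    for s in range(4):
        print(s, " ".join(one_f_one_b(s, num_stages=4, num_microbatches=6)))
```

In this ordering the last stage strictly alternates forward and backward passes, while earlier stages hold at most `num_stages - stage` micro-batches in flight, which is what bounds activation memory.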
The Encoder-Decoder architecture is often used for sequence-to-sequence conversion problems in natural language processing. The Encoder is used for feature extraction and the Decoder to generate the sequence. This paper add...
Malaria is a common and life-threatening disease in our country, with high-risk areas in several villages and hill tracts. Current detection methods are time-consuming and inaccessible. Our system analyzes digital ima...
The serial pipeline structure of the DSP restricts system performance, while the FPGA processes high-frequency sampling data in parallel to meet the design needs. Firstly, this paper introduces the FPGA process...
In contemporary social networks, dynamic privacy protection remains a pivotal yet challenging endeavor due to the intricate and evolving nature of information exchange. Traditional privacy models, predominantly static...
With the increasing concern for environmental protection and resource optimization, efficient waste sorting has become a serious challenge today. In this paper, we propose a new offloading control problem that aims to...
ISBN: 9783031751691; 9783031751707 (print)
Adders are essential components of modern digital circuits, and their primary design goal is to achieve high speed. However, power consumption and chip area are also important considerations in modern circuit design. Optimizing digital adder performance plays a crucial role in enhancing the speed of binary operations within complex circuits. Various architectures address the carry propagation bottleneck, each with its own strengths and weaknesses. Choosing the most appropriate architecture depends on the specific application requirements, ensuring optimal performance within the available resource constraints. This paper provides a comprehensive analysis of various adder topologies and their performance characteristics. By carefully considering the trade-offs between delay, power consumption, and area, engineers can choose the optimal architecture for their specific application requirements, leading to significant improvements in digital system performance and efficiency. The analyzed adder topologies include the Ripple Carry Adder (RCA), Carry Lookahead Adder (CLA), Carry Skip Adder (CSK), Carry Select Adder (CSLA), Carry Increment Adder (CIA), Brent-Kung Adder (BKA), and Kogge-Stone Adder. The analysis is conducted using HDL on the Xilinx ISE 14.7 platform.
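The carry propagation bottleneck mentioned in the abstract above can be made concrete with a small behavioral model of two of the listed topologies. This is a software sketch in Python, not the paper's HDL; the function names and the 8-bit default width are assumptions chosen for illustration.

```python
def ripple_carry_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Ripple Carry Adder: each bit waits for the carry of the previous bit."""
    carry, total = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_lookahead_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Carry Lookahead Adder: carries derived from generate/propagate signals."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # generate: x AND y
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate: x XOR y
    # c[i+1] = g[i] OR (p[i] AND c[i]). Hardware expands this recurrence so
    # that every carry is a function of g, p, and c[0] only and all carries
    # are computed in parallel; this loop evaluates the same logic in order.
    c = [0]
    for i in range(width):
        c.append(g[i] | (p[i] & c[i]))
    total = 0
    for i in range(width):
        total |= (p[i] ^ c[i]) << i
    return total, c[width]


if __name__ == "__main__":
    print(ripple_carry_add(200, 100))
    print(carry_lookahead_add(200, 100))
```

Both functions return the same `(sum, carry_out)` pair for any inputs; the topologies differ in delay, power, and area when synthesized, which a behavioral model like this cannot measure.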
Text-based person retrieval primarily aims to retrieve the images of target persons represented by a given text query. In this task, how to effectively align images and text globally and locally is an important challe...
Graphics processing units (GPUs) are widely used in the area of scientific computing. While GPUs provide much higher peak performance, efficient implementation of real applications on the GPU architectures is still a ...
In the present context, the rise of artificial intelligence (AI) has brought to light the importance of expediting processes. This issue holds significance across various domains of machin...