In view of the disadvantages of traditional global maximum power point tracking (GMPPT) algorithms such as particle swarm optimization and cuckoo search algorithm that the structure is complex, the tracking path is un...
ISBN: (print) 9789819708338; 9789819708345
With the development of deep learning, DNN models have become more complex. Large-scale model parameters improve the accuracy of DNN models, but they also pose more severe challenges to the hardware training platform, because training a large model requires substantial computing and memory resources that can easily exceed the capacity of an accelerator. In addition, as academia and industry demand ever higher accuracy from DNN models, the number of training iterations is also growing rapidly. Against this background, more accelerators are integrated on a hierarchical platform to conduct distributed training. On distributed training platforms, the computation of the DNN model and the communication of the intermediate parameters are handled by different hardware modules, so the degree to which they run in parallel profoundly affects the training speed. In this work, based on the widely used hierarchical Torus-Ring training platform and the Ring All-Reduce collective communication algorithm, we improve the speed of distributed training by optimizing the parallelism of communication and computation. Specifically, based on an analysis of the distributed training process, we schedule computation and communication so that they execute simultaneously as much as possible. As a result, we reduce the communication exposure time for data parallelism and the computation exposure time for model parallelism. Compared with previous work, the training speed (over 5 training iterations) of the ResNet50 model and the Transformer model is increased by 23.77%-25.64% and 11.66%-12.83%, respectively.
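The abstract above names the Ring All-Reduce collective but does not spell it out. As a rough illustration only, the following single-process simulation shows the standard two-phase form (reduce-scatter, then all-gather). The function name `ring_all_reduce` and the list-based "workers" are assumptions made for this sketch; it does not model the paper's scheduling of communication against computation or the Torus-Ring hierarchy.

```python
def ring_all_reduce(buffers):
    """Simulate a sum Ring All-Reduce over n workers arranged in a ring.

    buffers: list of n equal-length lists (one per worker); the length must
    be divisible by n so that each buffer splits into n equal chunks.
    Returns the per-worker buffers after the collective completes.
    """
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "buffer length must be divisible by worker count"
    chunk = size // n
    data = [list(b) for b in buffers]  # work on copies

    def sl(c):
        # Slice covering chunk index c.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) mod n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, data[r][sl((r - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            dst = (r + 1) % n
            data[dst][sl(c)] = [a + b for a, b in zip(data[dst][sl(c)], payload)]

    # Worker r now holds the fully reduced chunk (r + 1) mod n.
    # Phase 2, all-gather: in step s, worker r forwards chunk (r + 1 - s) mod n,
    # and the right neighbour overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, data[r][sl((r + 1 - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            data[(r + 1) % n][sl(c)] = payload

    return data


if __name__ == "__main__":
    workers = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    for buf in ring_all_reduce(workers):
        print(buf)
```

Each worker sends and receives only one chunk per step, which is why the per-step traffic stays constant as workers are added; the overlap of these steps with layer computation is the part the abstract addresses and this sketch leaves out.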
At present, the concept of ecological civilization has been widely recognized around the world, and multiple countries have implemented a series of policies that guarantee the development of the electric vehicle (EV) industry. Therefore, predicting the charging load of EVs is important for addressing the challenges of power system planning and operation. The Monte Carlo (MC) method is preferred by many scholars for EV charging load prediction because it is well suited to describing random characteristics and gives good prediction results. To obtain more reliable and efficient prediction results, this paper analyzes the application of parallel computing technology in MC simulation. Firstly, EVs in the region are classified according to their battery capacity. Based on the voltage change curve of lithium-ion power batteries during constant current charging under different capacities, the charging power, charging time, and state of charge (SOC) of EVs are investigated. Secondly, the behavior characteristics of users and the driving parameter characteristics of EVs are studied separately, and a probability distribution model of multi-source information is established. Thirdly, parallel computing technology from the computer field is introduced, and an improved MC method is proposed based on the multi-core CPU architecture. Taking full account of the fitted constant current charging process, the charging load of EVs in a region of East China is simulated. Finally, the time cost and the load forecasting results of the serial and parallel methods are compared and analyzed, and the advancement and effectiveness of the parallel method are verified. Results show that the charging load has four peaks in a day, taxis are the main source of the peak load of the power grid, the charging load of buses fluctuates the most, and private cars are the main backup capacity to participate in V2G dispatching in the future. In addition, under the experimen
Parallel/distributed particle filters estimate the states of dynamic systems by using Bayesian inference and stochastic sampling techniques with multiple processing units (PUs). The sampling procedure and the resam...
ISBN: (digital) 9781665496209; (print) 9781665496209
Component trees are powerful image processing tools for analyzing the connected components of an image. One attractive strategy consists of building the nested relations first and deriving the components' attributes afterward, so that the user can switch between different attribute functions without having to re-compute the entire tree. So far, only sequential algorithms allow such an approach; no parallel algorithm is available. In this paper, we extend a recent method using distributed memory techniques to enable posterior attribute computation in a parallel or distributed manner. This novel approach significantly reduces the computational time needed for combining several attribute functions interactively on giga- and tera-scale data sets.
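To make the idea of posterior attribute computation concrete, here is a minimal sequential sketch, assuming the nesting relations are already available as a parent array. The names `accumulate_attribute`, `parent`, `order`, and the sample values are hypothetical and chosen for illustration; this is not the distributed-memory method the abstract describes.

```python
def accumulate_attribute(parent, order, own_value, merge):
    """Compute one attribute per tree node without rebuilding the tree.

    parent[i]  : index of node i's parent, or -1 for the root
    order      : node indices listed so that children come before parents
    own_value  : per-node contribution from the pixels stored at that node
    merge      : associative function combining a parent and a child value
    """
    attr = list(own_value)
    for node in order:
        p = parent[node]
        if p >= 0:
            # Fold the finished child attribute into its parent.
            attr[p] = merge(attr[p], attr[node])
    return attr


# A small fixed tree: node 0 is the root, nodes 1 and 2 are its children,
# nodes 3 and 4 are children of node 1.
parent = [-1, 0, 0, 1, 1]
order = [3, 4, 1, 2, 0]

# Attribute 1: area (number of pixels in each component).
pixels_at_node = [2, 1, 3, 4, 1]
area = accumulate_attribute(parent, order, pixels_at_node, lambda a, b: a + b)

# Attribute 2: highest gray level in each component, on the same tree.
level_at_node = [0, 5, 3, 9, 7]
peak = accumulate_attribute(parent, order, level_at_node, max)
```

The tree arrays are built once and reused for both attributes, which is the property the abstract highlights; only the `own_value` and `merge` arguments change between attribute functions.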
Sparse LU factorization is a critical kernel in scientific computing and engineering applications. Reordering a sparse matrix to obtain a better nonzero pattern can accelerate LU factorization. Traditionally it's diff...
ISBN: (print) 9781665469586
Load balancing generally refers to the technique of properly partitioning computation among processing elements in order to achieve optimal resource usage and thus reduce computation time. In this paper, we present a dynamic load balancing application in the context of the parallel execution of Cellular Automata, where the domain space is partitioned into two-dimensional regions that are assigned to different processing elements. Starting from general closed-form expressions that allow the optimal workload assignment to be computed dynamically when partitioning takes place along only one dimension, we extend the procedure to allow partitioning and balancing along both dimensions. As confirmed by the experimental results, two-dimensional partitioning by itself speeds up the execution, and further improvements are obtained when the load balancing occurs along both dimensions.
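The one-dimensional case can be illustrated with a small sketch. The abstract does not give its closed-form expressions, so the code below substitutes a simple prefix-sum heuristic: it assumes a measured cost per column of the automaton and cuts where the cumulative cost first reaches each equal share. The name `balance_1d` and the cost values are assumptions for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def balance_1d(col_cost, parts):
    """Split columns into `parts` contiguous slabs of roughly equal cost.

    col_cost : measured cost of each column (e.g. active cells per column)
    parts    : number of processing elements
    Returns boundaries b such that slab k owns columns b[k]:b[k + 1].
    """
    prefix = list(accumulate(col_cost))  # prefix[i] = cost of columns 0..i
    total = prefix[-1]
    bounds = [0]
    for k in range(1, parts):
        target = total * k / parts
        # Cut right after the first column whose cumulative cost reaches the target.
        cut = bisect_left(prefix, target) + 1
        cut = max(cut, bounds[-1])  # keep boundaries non-decreasing
        bounds.append(min(cut, len(col_cost)))
    bounds.append(len(col_cost))
    return bounds


# Uniform cost gives equal-width slabs; skewed cost gives narrower slabs
# where the work is concentrated.
uniform = balance_1d([1] * 8, 4)
skewed = balance_1d([3, 3, 1, 1, 1, 1, 1, 1], 2)
```

Re-running such a function on fresh cost measurements every few steps is one way to make the assignment dynamic; extending it to both dimensions, as the abstract proposes, would require balancing rows within each column slab as well.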
ISBN: (print) 9781510686793
The proceedings contain 118 papers. The topics discussed include: research on improving sagging control strategy based on distributed inverter; thermal electric coupling analysis of direct heating of oil shale by power supply; research on control strategy of grid-connected inverter for compressed air energy storage system; research on high performance analysis technology of distribution network digital twin based on parallel acceleration; research on the feedback braking optimization system of electric vehicle that makes full use of ground adhesion; lightweight method for cognitive large model of electric power equipment operation and inspection based on knowledge distillation; research on intelligent control algorithms and applications for mechatronic systems of robots; and research on the application of computer vision technology in automatic detection of substation equipment.
Accurate and effective load forecasting holds utmost importance in ensuring the secure and stable operation of power grids. To bolster the precision and efficiency of load forecasting models, this research integrates ...
Sufficient reduction in energy loss of distribution networks is achievable through simultaneous network switching (NS), capacitor banks (CBs) placement, and the control of distributed energy sources (DESs), where the ...