ISBN: (print) 9798350389463; 9798350389470
This paper proposes a federated learning approach to address practical challenges faced by emerging green energy industries, specifically Predictive Health Management (PHM) for wind turbines. Unlike many federated learning applications, which are evaluated only in simulation, the application in this paper focuses on real industrial problems using raw data collected in the field. A large volume of real data was collected by sensors on more than ten wind turbines across different areas of China and transmitted to storage for timely processing. The proposed framework, called TurboFed, can handle the raw data and achieves good prediction performance in practical wind power generation systems. The framework also helped improve the efficiency of the wind turbines. The paper makes three novel contributions. First, to the best of the authors' knowledge, it is the first federated learning framework to address position adjustment of wind turbines in a real environment. Second, it deploys customized recurrent neural computing models to the wind turbines, which are treated as the client devices under the federated learning paradigm. Finally, it incorporates new customized aggregation algorithms on the server side.
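As a point of reference for the server-side aggregation step, the sketch below shows plain federated averaging (FedAvg), in which client parameters are averaged with weights proportional to each client's sample count. It is not TurboFed's customized aggregation algorithm, which the abstract does not specify; the function name, the dictionary-of-lists weight layout, and the example values are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Plain FedAvg: average client parameters, weighted by sample count.

    This is the textbook rule, used here only as a baseline illustration.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Two hypothetical turbines (clients) holding 1 and 3 samples respectively.
turbine_a = {"w": [1.0, 2.0]}
turbine_b = {"w": [3.0, 4.0]}
global_weights = federated_average([turbine_a, turbine_b], [1, 3])
```

A customized aggregation rule would replace the weighting inside `federated_average` while keeping the same client-to-server data flow.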
Adopting FPGAs as accelerators in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still have to manually perform code reconstruction and cumbersome parameter tuning to achieve optimal performance. While many learning models have been leveraged by existing work to automate the design of efficient accelerators, the unpredictability of modern HLS tools is a major obstacle to maintaining high accuracy. To address this problem, we propose an automated DSE framework, AutoDSE, that leverages a bottleneck-guided coordinate optimizer to systematically find a better design point. AutoDSE detects the bottleneck of the design in each step and focuses on high-impact parameters to overcome it. The experimental results show that AutoDSE is able to identify a design point that achieves a geometric-mean speedup of 19.9x over one CPU core for the MachSuite and Rodinia benchmarks. Compared to the manually optimized HLS vision kernels in the Xilinx Vitis libraries, AutoDSE can reduce their optimization pragmas by 26.38x while achieving similar performance. With less than one optimization pragma per design on average, we are making progress towards democratizing customizable computing by enabling software programmers to design efficient FPGA accelerators.
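The bottleneck-guided idea described above can be illustrated with a toy coordinate search. This is only a sketch under stated assumptions, not AutoDSE's implementation: `toy_evaluate` is a hypothetical stand-in for an HLS run, and the parameter names (`unroll`, `burst`), the `KNOBS` mapping from bottleneck to tunable parameters, and the cost model are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Design = Dict[str, int]

# Hypothetical mapping: which parameters are worth tuning for each bottleneck.
KNOBS: Dict[str, List[str]] = {"compute": ["unroll"], "memory": ["burst"]}
# Hypothetical candidate values for each parameter.
CHOICES: Dict[str, List[int]] = {"unroll": [1, 2, 4, 8], "burst": [1, 2, 4, 8]}


def toy_evaluate(design: Design) -> Tuple[float, str]:
    """Stand-in for an HLS report: returns (latency, name of the slower stage)."""
    compute = 1024 / design["unroll"]
    memory = 512 / design["burst"]
    bottleneck = "compute" if compute >= memory else "memory"
    return compute + memory, bottleneck


def bottleneck_guided_search(
    start: Design,
    evaluate: Callable[[Design], Tuple[float, str]],
    max_steps: int = 20,
) -> Tuple[Design, float]:
    """Tune one parameter at a time, restricted to the current bottleneck's knobs."""
    best = dict(start)
    best_cost, bottleneck = evaluate(best)
    for _ in range(max_steps):
        improved = False
        for knob in KNOBS[bottleneck]:
            for value in CHOICES[knob]:
                candidate = {**best, knob: value}
                cost, candidate_bottleneck = evaluate(candidate)
                if cost < best_cost:  # keep strict improvements only
                    best, best_cost = candidate, cost
                    bottleneck = candidate_bottleneck
                    improved = True
        if not improved:
            break  # no knob for the current bottleneck helps any further
    return best, best_cost


best_design, best_latency = bottleneck_guided_search(
    {"unroll": 1, "burst": 1}, toy_evaluate
)
```

Restricting each step to the knobs tied to the current bottleneck keeps the number of evaluations small, which matters when each evaluation is a slow synthesis run.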
ISBN: (print) 9781450382182
Adopting FPGAs as accelerators in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still must manually perform code reconstruction and cumbersome parameter tuning to achieve optimal performance. While many learning models have been leveraged by existing work to automate the design of efficient accelerators, the unpredictability of modern HLS tools is a major obstacle to maintaining high accuracy. We address this problem with an automated DSE framework, AutoDSE, that leverages a bottleneck-guided gradient optimizer to systematically find a better design point. AutoDSE finds the bottleneck of the design in each step and focuses on high-impact parameters to overcome it, much as an expert would. The experimental results show that AutoDSE is able to find a design point that achieves a geometric-mean speedup of 19.9x over one CPU core for the MachSuite and Rodinia benchmarks, and 1.04x over the manually designed HLS accelerated vision kernels in the Xilinx Vitis libraries, with a 26x reduction in their optimization pragmas.
Convolutional Neural Network (CNN) accelerator design on resource-limited platforms lacks an efficient design space exploration (DSE) method because the design space is huge and irregular. Numerous parameters belonging to the accelerator architecture and dataflow mode jointly construct a huge design space, while power and resource constraints make the design space quite irregular. Under such circumstances, traditional DSE methods based on exhaustive search are infeasible for the non-trivial design space, and methods based on general optimization algorithms are also inefficient because of the irregular distribution of design points. In this paper, we provide an efficient DSE method named ERDSE for CNN accelerator design on resource-limited platforms. ERDSE is based on the reinforcement learning algorithm REINFORCE but refines it to adapt to the complex design space. ERDSE implements an off-policy strategy to decouple the sampling and learning phases, then refines each separately to further improve exploration ability and sample utilization. We apply ERDSE to optimize the computing latency of CNN accelerators for VGG-16 and MobileNet-V3. Under the tightest constraints, ERDSE achieves 1.2x-1.7x (on VGG-16) and 2.3x-4.9x (on MobileNet-V3) latency improvement compared with other DSE methods, which demonstrates the efficiency of ERDSE.
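For readers unfamiliar with REINFORCE, the sketch below shows the basic on-policy form applied to a single categorical design parameter, with reward defined as negative latency. It is not ERDSE: the off-policy decoupling and the refinements described above are omitted, and the function names, the latency table, the running-mean baseline, and the hyperparameters are all assumptions chosen for illustration.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_search(latencies: List[float], steps: int = 2000,
                     lr: float = 0.1, seed: int = 0) -> List[float]:
    """Basic on-policy REINFORCE over one categorical design choice.

    latencies[i] is the (hypothetical) latency of picking option i;
    the reward is its negative. Returns the final policy probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(latencies)
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for step in range(1, steps + 1):
        probs = softmax(logits)
        action = rng.choices(range(len(latencies)), weights=probs)[0]
        reward = -latencies[action]
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
        baseline += (reward - baseline) / step
    return softmax(logits)


# Hypothetical latencies for four candidate design options.
policy = reinforce_search([4.0, 3.0, 1.0, 2.0])
```

In a setup like this, probability mass would be expected to shift toward lower-latency options over time; a real accelerator design space has many interacting parameters and constraints, which is the difficulty the abstract highlights.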
ISBN: (print) 9781450341851
This paper focuses on the development of an infrastructure to enable FPGA-based acceleration in data centers. We present an initial version of an integrated solution that includes automated compilation for accelerator generation, runtime accelerator resource scheduling and management, and acceleration libraries for FPGA-based customized computing for big data applications. The solution can help overcome some of the main challenges with FPGA-based accelerated computing. It has the potential to bring significant performance and energy efficiency improvement for data center applications.