In recent years, driven by advances in hardware technology, the computing power and programmability of GPUs have developed rapidly. With their highly parallel computing characteristics, GPUs are no longer limited to daily g...
ISBN: (print) 9781728190747
Deep learning has succeeded in many fields, such as acoustic, image, and natural language processing. However, because of the unique characteristics of graphs, applying deep learning to general graph data is not easy. Graph Attention Networks (GATs) show the best performance on multiple authoritative node classification benchmarks, both transductive and inductive. The purpose of this research is to design and implement S-GAT, an FPGA-based accelerator for graph attention networks that achieves excellent speedup and energy efficiency without losing accuracy and does not rely on DSPs or large amounts of on-chip memory. We design S-GAT with software and hardware co-optimization. Specifically, we use model compression and feature quantization to reduce the model size, and use shift addition units (SAUs) to convert multiplications into shift operations, further reducing the computation requirements. We integrate these optimizations into a universal hardware pipeline for various GAT structures. Finally, we evaluate our design on an Inspur F10A board with an Intel Arria 10 GX1150 and 16 GB of DDR3 memory. Experimental results show that S-GAT achieves a 7.34 times speedup over an Nvidia Tesla V100 and 593 times over a Xeon Gold 5115 CPU while maintaining accuracy, with 48 times and 2400 times better energy efficiency, respectively.
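The idea of replacing multiplications with shifts can be illustrated with a minimal sketch. It assumes each weight is rounded to a signed power of two, which is one common way to make a multiply reducible to a shift; the abstract does not state how S-GAT's SAUs encode weights, and the function names `quantize_pow2`, `shift_multiply`, and `shift_add_dot` are hypothetical.

```python
import math


def quantize_pow2(w):
    """Round a weight to sign * 2**exponent (nearest exponent in the log domain).

    Assumption for illustration only: weights are restricted to signed powers of two.
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exponent = round(math.log2(abs(w)))
    return sign, exponent


def shift_multiply(x, sign, exponent):
    """Multiply the integer x by sign * 2**exponent using only shifts and a sign flip."""
    if exponent >= 0:
        shifted = x << exponent      # multiply by 2**exponent
    else:
        shifted = x >> -exponent     # divide by 2**(-exponent), truncating
    return sign * shifted


def shift_add_dot(xs, weights):
    """Dot product of integer features with power-of-two-quantized weights."""
    acc = 0
    for x, w in zip(xs, weights):
        sign, exponent = quantize_pow2(w)
        acc += shift_multiply(x, sign, exponent)
    return acc


result = shift_add_dot([8, 4, 16], [2.0, -0.5, 0.25])
```

Weights that are not exact powers of two incur a rounding error in this scheme, so any accuracy claim depends on the quantization actually used in the paper.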
Transformer discharge is a serious power system fault that can lead to catastrophic accidents. Intelligent analysis of discharge faults from collected transformer sound provides a non-invasive and sustainable health monitoring method. This paper proposes a lightweight voice fault diagnosis model based on knowledge distillation. The method uses the computing power of a supercomputing platform to train multiple teacher models in parallel, and then distills the knowledge of the teacher models into a lightweight student model, which achieves good classification accuracy and can be easily deployed to edge computing platforms. The experimental results show that the method reaches an accuracy of 95.65% with only 1.28M parameters and 0.35 GFLOPs of computation, which demonstrates the superiority of the method.
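A minimal sketch of a multi-teacher distillation loss is given here for orientation. It assumes the standard temperature-softened formulation, with teacher distributions averaged and the KL term scaled by the squared temperature; the abstract does not say how the paper combines its teachers, and `temperature`, `alpha`, and all function names are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature):
    """Average the softened class distributions of several teachers (an assumption)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teachers || student) + (1 - alpha) * cross-entropy(label)."""
    soft_targets = ensemble_soft_targets(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(soft_targets, student_soft) if t > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# A student that already matches its single teacher has zero soft-target loss.
matched = distillation_loss([1.0, 2.0, 3.0], [[1.0, 2.0, 3.0]], label=2, alpha=1.0)
mismatched = distillation_loss([3.0, 2.0, 1.0], [[1.0, 2.0, 3.0], [0.5, 2.5, 3.5]], label=2)
```

In a real training loop this loss would be computed per batch with an automatic-differentiation framework; the pure-Python version only shows the arithmetic.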
Outcome-based education (OBE) is being adopted in most engineering colleges across India. The process is accredited by a national body, the National Board of Accreditation (NBA). Accreditation involves continuously defining Program Outcomes (POs) and Course Outcomes (COs), which are evaluated to check the OBE outcome level. The course evaluation covers course content, the methods used to evaluate students, teaching pedagogy, Bloom's taxonomy mapping of learning levels, feedback from stakeholders, and analysis of results and outcomes. If the required level is not attained, the COs and POs are reviewed and the curriculum is revised accordingly. Faculty face many challenges in creating and assessing a curriculum to NBA standards. These challenges are unique and non-trivial for PDC/HPC (parallel and distributed computing/high-performance computing) courses. Preparing an OBE-based PDC/HPC course involves many stakeholders and brainstorming over multiple sessions. Many universities across the world have recently begun adopting PDC/HPC courses. To provide a pointer for developing, delivering, and reviewing a PDC/HPC course, this paper presents the course development process for the benefit of various stakeholders, especially PDC/HPC educators and institutions. This research paper presents the undergraduate teaching experience of a parallel computing (PC) course with a critical evaluation based on Course Outcomes (COs) and Bloom's taxonomy mapping of learning levels. The paper compiles a list of challenges faced by PDC/HPC educators and stakeholders, focusing on the Indian education scenario. It lists activities and suggestions that can be applied to address these challenges. The research and teaching experience-based discussions and strategies proposed in this paper help other PDC/HPC educators and stakeholders in the CS (Computer Sci
As an alternative to traditional computing architecture, cloud computing is growing rapidly. However, it is generally based on models such as cluster computing. Supercomputers are becoming more and more powerful, helping scientists gain a more in-depth understanding of the world. At the same time, clusters of commodity servers have become mainstream in the IT industry, powering not only large Internet services but also a growing number of data-intensive scientific applications, such as MPI-based deep learning applications. To reduce energy cost, more and more effort is going into reducing the energy consumption of HPC systems. Because I/O accesses account for a large portion of the execution time of data-intensive applications, it is critical to design energy-aware parallel I/O functions to address the challenges of HPC energy efficiency. The Message Passing Interface (MPI), the de facto standard for designing parallel applications in cluster environments, is widely used in high-performance computing. Obtaining the energy consumption information of MPI applications is therefore critical for improving the energy efficiency of HPC systems. In this work, we first present our energy measurement tool, a software framework that eases energy collection in cluster environments. We then present an approach that can optimise the energy efficiency of parallel I/O operations. The energy scheduling algorithm is evaluated on a cluster.
Under the “Double Carbon” strategy, the pace of new energy installation and grid connection has continuously increased. To better absorb new energy power generation and ensure the safe and stable operation of the power system, ancillary service resources such as frequency regulation and peak shaving are urgently needed. The virtual power plant aggregates the distributed energy scattered across the power grid through advanced communication, computing, dispatching, market, and other means, making it a “power generation system” that can be uniformly dispatched, follows dispatching instructions, and participates in the ancillary service market. Firstly, the structure and development of the virtual power plant are introduced. Secondly, the types of virtual power plants participating in the ancillary service market are analyzed. Thirdly, the actual participation of virtual power plants in ancillary service markets for frequency regulation and peak shaving is compared and analyzed with reference to market access conditions, compensation mechanisms, and allocation mechanisms, which lays a theoretical foundation for different provinces to further improve the mechanisms by which virtual power plants participate in ancillary service markets. Finally, in light of China’s national conditions, suggestions are proposed for constructing market mechanisms for virtual power plants to participate in ancillary services.
An algorithm for distributed optimal voltage regulation of distribution networks with distributed generators (DGs) at the grid edge is proposed in the paper. We first introduce a distributed recursive algorithm to est...
ISBN: (print) 9781728190747
Work-stealing is a key component of many parallel task graph libraries such as Intel Threading Building Blocks (TBB) FlowGraph, Microsoft Task Parallel Library (TPL) ***, Cpp-Taskflow, and Nabbit. However, designing a correct and effective work-stealing scheduler is notoriously difficult because of subtle implementation details in concurrency control and decentralized coordination between threads. The problem becomes even more challenging when striving for optimal thread usage on parallel workloads with complex task graphs. In this paper, we therefore introduce an effective work-stealing scheduler for executing task dependency graphs. Our scheduler adopts a simple and efficient strategy that adapts the number of working threads to the available task parallelism at any time during graph execution. The strategy is provably good at preventing resource underutilization while minimizing resource waste when tasks are scarce. We have evaluated our scheduler on both micro-benchmarks and a real-world circuit timing analysis workload, and demonstrated promising results over existing methods in terms of runtime, energy efficiency, and throughput.
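The basic work-stealing discipline on a task dependency graph can be sketched as follows. This is a single-threaded, round-based simulation written only to show the queue behaviour (owners pop the newest task, idle workers steal the oldest one from another worker); it is not the paper's scheduler, and it does not model the adaptive thread-count strategy. The function name `run_task_graph` and the input format are hypothetical.

```python
from collections import deque


def run_task_graph(deps, num_workers=2):
    """Simulate work-stealing over a task graph.

    deps maps each task to the list of tasks it depends on; every
    prerequisite must itself be a key of deps.
    Returns the execution order as a list of (worker_id, task) pairs.
    """
    remaining = {task: len(prereqs) for task, prereqs in deps.items()}
    successors = {task: [] for task in deps}
    for task, prereqs in deps.items():
        for p in prereqs:
            successors[p].append(task)

    queues = [deque() for _ in range(num_workers)]
    for task in deps:                      # seed worker 0 with the source tasks
        if remaining[task] == 0:
            queues[0].append(task)

    order = []
    while len(order) < len(deps):
        progressed = False
        for w in range(num_workers):
            if queues[w]:
                task = queues[w].pop()     # owner takes its newest task (LIFO)
            else:
                victim = next((v for v in range(num_workers)
                               if v != w and queues[v]), None)
                if victim is None:
                    continue               # nothing to steal this round
                task = queues[victim].popleft()  # thief takes the oldest task (FIFO)
            order.append((w, task))
            progressed = True
            for s in successors[task]:     # release tasks whose dependencies are done
                remaining[s] -= 1
                if remaining[s] == 0:
                    queues[w].append(s)
        if not progressed:
            raise ValueError("task graph contains a cycle")
    return order


order = run_task_graph({"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]})
```

A production scheduler would run workers on real threads with lock-free deques, which is where the concurrency subtleties mentioned in the abstract arise.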
ISBN: (print) 9781665497367
The proceedings contain 42 papers. The topics discussed include: evaluation of turbulence and non-Newtonian blood rheology models through an FDA nozzle; diagnosing of air compressor faults using frequency data driven approach; experimental analysis modelling of tower solar chimney; an experimental study to improve heat transfer rate in a double pipe heat exchanger using helical tape; study the effect of storage capacity on the performance of swimming pool heating system; improvement the heat dissipation by using different integral finned tubes for cross flow heat exchanger; design a new smart monitoring micro-grid photovoltaic system network based on mobile technology; a low-cost real-time monitoring system for the river level in Wasit province; cloud-based parallel computing system via single-client multi-hash single-server multi-thread; concatenated turbo polar codes: an overview; and practical work for a stand-alone photovoltaic system: efficient MPPT using neural network approach.
ISBN: (print) 9781728131290
As one of the basic components of digital signal processing, digital finite impulse response (FIR) filters are widely used in image processing, speech recognition, and many other fields. This paper proposes an improved distributed arithmetic (DA) algorithm to implement high-order digital FIR filters with less logic delay and hardware utilization. Firstly, the parallel DA is designed and then improved by look-up-table (LUT) decomposition. Secondly, the improved DA FIR filters are implemented on a Xilinx Kintex-7 FPGA chip and used in a high-speed ground penetrating radar (GPR) system to process radar signals. Finally, the performance of DA filters with different orders and structures is analyzed and compared, taking logic delay and hardware utilization as the key indicators. The paper concludes that the parallel DA with LUT decomposition can implement high-order filters more effectively than traditional structures.
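A software model of DA with LUT decomposition may help make the technique concrete. The sketch below assumes integer coefficients and two's-complement input samples, and splits the taps into groups so that each group uses a small LUT instead of one LUT of size 2^K; it models the arithmetic only, not the paper's hardware structure, and `build_lut`, `da_dot`, `da_fir`, `group_size`, and `bits` are hypothetical names and parameters.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of coeffs[k] for every bit k that is set in addr."""
    lut = [0] * (1 << len(coeffs))
    for addr in range(1, len(lut)):
        low = addr & -addr                  # lowest set bit of addr
        k = low.bit_length() - 1
        lut[addr] = lut[addr ^ low] + coeffs[k]
    return lut


def da_dot(samples, luts, group_size, bits):
    """Inner product via distributed arithmetic.

    samples are signed integers that fit in `bits`-bit two's complement.
    """
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]     # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        partial = 0
        for g, lut in enumerate(luts):      # one small LUT per tap group
            group = words[g * group_size:(g + 1) * group_size]
            addr = 0
            for k, w in enumerate(group):
                addr |= ((w >> b) & 1) << k
            partial += lut[addr]
        # The most significant bit carries negative weight in two's complement.
        weight = -(1 << b) if b == bits - 1 else (1 << b)
        acc += weight * partial
    return acc


def da_fir(signal, coeffs, group_size=2, bits=8):
    """Filter `signal` with `coeffs` using DA with decomposed LUTs."""
    luts = [build_lut(coeffs[i:i + group_size])
            for i in range(0, len(coeffs), group_size)]
    history = [0] * len(coeffs)             # history[k] holds x[n - k]
    out = []
    for x in signal:
        history = [x] + history[:-1]
        out.append(da_dot(history, luts, group_size, bits))
    return out


signal = [3, -5, 7, -128, 127, 0, 1]
coeffs = [2, -3, 5, 1, -4]
filtered = da_fir(signal, coeffs)
```

With five taps and `group_size=2`, the model uses LUTs of 4, 4, and 2 entries rather than a single 32-entry table, which is the memory saving that decomposition trades against extra adders.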