ISBN (Digital): 9798350352917
ISBN (Print): 9798350352924; 9798350352917
Inference of Large Language Models (LLMs) across computer clusters has become a focal point of research in recent times, with many acceleration techniques taking inspiration from CPU speculative execution. These techniques reduce bottlenecks associated with memory bandwidth, but also increase end-to-end latency per inference run, requiring high speculation acceptance rates to improve performance. Combined with a variable rate of acceptance across tasks, speculative inference techniques can result in reduced performance. Additionally, pipeline-parallel designs require many user requests to maintain maximum utilization. As a remedy, we propose PipeInfer, a pipelined speculative acceleration technique to reduce inter-token latency and improve system utilization for single-request scenarios while also improving tolerance to low speculation acceptance rates and low-bandwidth interconnects. PipeInfer exhibits up to a 2.15x improvement in generation speed over standard speculative inference. PipeInfer achieves its improvement through Continuous Asynchronous Speculation and Early Inference Cancellation, the former improving latency and generation speed by running single-token inference simultaneously with several speculative runs, while the latter improves speed and latency by skipping the computation of invalidated runs, even in the middle of inference.
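The interplay of speculation and early cancellation can be sketched in miniature. The `draft_model` and `target_model` below are hypothetical toy stand-ins, not PipeInfer's actual pipelined, asynchronous implementation: verification stops at the first rejected token, so any work that depends on an invalidated speculation is skipped.

```python
# Toy sketch of speculative decoding with early cancellation.
# draft_model and target_model are hypothetical stand-ins; PipeInfer's
# real design runs speculation asynchronously across a pipeline of nodes.

def draft_model(prefix, n):
    """Cheap draft: guess the next n tokens (deliberately wrong at i == 2)."""
    toks, last = [], prefix[-1]
    for i in range(n):
        last = (last + (2 if i == 2 else 1)) % 50
        toks.append(last)
    return toks

def target_model(prefix):
    """Expensive 'ground truth' model: next token is prev + 1 (toy rule)."""
    return (prefix[-1] + 1) % 50

def speculative_step(prefix, n_spec=4):
    """Verify drafted tokens in order; cancel the rest at the first mismatch."""
    drafted = draft_model(prefix, n_spec)
    accepted = []
    for tok in drafted:
        truth = target_model(prefix + accepted)
        if tok == truth:
            accepted.append(tok)
        else:
            # Early cancellation: all later speculative work built on this
            # rejected token is invalid, so it is skipped, and the correct
            # token from the target model is emitted instead.
            accepted.append(truth)
            break
    return accepted

print(speculative_step([0]))  # first two guesses accepted, third corrected
```

With a high acceptance rate the loop rarely breaks, which is why, as the abstract notes, acceptance rate governs whether speculation helps or hurts end-to-end latency.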
Medical Image AI Systems can assist doctors in making diagnoses, thereby improving diagnostic accuracy. These systems are now widely used in hospitals. However, current AI diagnostic methods typically rely on various ...
This paper presents an impedance-based non-iterative fault location algorithm for a two-terminal line considering unsynchronized measurements to account for the loss of Global Positioning System (GPS) signal. Based on the availability of pre-fault signals, two algorithms are proposed: one using pre-fault data and the other without it. The algorithms are formulated using fundamental-frequency phasor-based decoupled modal components of the signals measured at both ends of the line, together with a distributed line model. Applying decoupled modal components with the distributed model enhances the accuracy of the presented algorithms for both transposed and untransposed line configurations. The proposed algorithms have been tested and analyzed under different fault conditions simulated on the EMTP-RV platform. A comparative analysis with existing methods is also presented to establish the prominent features of the proposed methods. The algorithms have also been tested with practical data from Power Grid Corporation of India Limited (PGCIL), and the test results validate the accuracy of the developed fault location algorithms.
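The core idea behind impedance-based two-terminal fault location with unsynchronized measurements can be illustrated in a simplified lumped-parameter form (the paper itself uses the distributed line model with decoupled modal components). With δ the unknown synchronization angle between the two terminals and m the per-unit fault distance from the sending end, the fault-point voltage computed from each end must agree:

```latex
% Simplified lumped-parameter illustration (the paper's algorithms use the
% distributed-parameter line model instead):
V_S - m\, Z_L I_S = \left( V_R - (1 - m)\, Z_L I_R \right) e^{j\delta}
```

Separating real and imaginary parts yields two equations in the two unknowns m and δ; a non-iterative scheme eliminates δ (for example by equating magnitudes on both sides) so that m follows in closed form.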
Parallel processing involves challenges in data dependency. In this work, we present an investigation into the performance of image background removal using the Rembg algorithms, incorporating parallel computing techniqu...
ISBN (Digital): 9798350352917
ISBN (Print): 9798350352924; 9798350352917
As network bandwidth struggles to keep up with rapidly growing computing capabilities, the efficiency of collective communication has become a critical challenge for exascale distributed and parallel applications. Traditional approaches directly utilize error-bounded lossy compression to accelerate collective computation operations, but exhibit unsatisfactory performance due to the expensive decompression-operation-compression (DOC) workflow. To address this issue, we present a first-ever homomorphic compression-communication co-design, hZCCL, which enables operations to be performed directly on compressed data, saving the cost of time-consuming decompression and recompression. In addition to the co-design framework, we build a lightweight compressor optimized specifically for multi-core CPU platforms. We also present a homomorphic compressor with a run-time heuristic to dynamically select efficient compression pipelines, reducing the cost of DOC handling. We evaluate hZCCL on up to 512 nodes and across five application datasets. The experimental results demonstrate that our homomorphic compressor achieves a CPU throughput of up to 379.08 GB/s, surpassing the conventional DOC workflow by up to 36.53x. Moreover, our hZCCL-accelerated collectives outperform two state-of-the-art baselines, delivering speedups of up to 2.12x and 6.77x over the original MPI collectives in single-thread and multi-thread modes, respectively, while maintaining data accuracy.
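The benefit of operating directly on compressed data can be illustrated with a deliberately simple stand-in: element-wise addition of two run-length-encoded vectors, performed without decompressing either operand. This only sketches the homomorphic idea; hZCCL's actual error-bounded compressor and pipelines are far more sophisticated.

```python
# Sketch of a "homomorphic" reduction: adding two run-length-encoded
# vectors directly in compressed form, never materializing full arrays.
# Illustration of the idea only, not hZCCL's actual algorithm.

def rle(values):
    """Compress a list into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_add(a, b):
    """Element-wise sum of two RLE streams, computed run-by-run."""
    a = [list(r) for r in a]  # copy; run lengths are consumed in place
    b = [list(r) for r in b]
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        n = min(a[i][1], b[j][1])   # overlap of the two current runs
        s = a[i][0] + b[j][0]       # one addition covers the whole overlap
        if out and out[-1][0] == s:
            out[-1][1] += n
        else:
            out.append([s, n])
        a[i][1] -= n
        b[j][1] -= n
        if a[i][1] == 0:
            i += 1
        if b[j][1] == 0:
            j += 1
    return out

def rld(runs):
    """Decompress, used here only to check the result."""
    return [v for v, n in runs for _ in range(n)]

x, y = [1, 1, 1, 2], [3, 3, 4, 4]
print(rld(rle_add(rle(x), rle(y))))  # element-wise sum, computed compressed
```

In a reduction tree, each node can combine compressed contributions this way and forward the still-compressed result, which is the cost the DOC workflow pays to avoid.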
In order to improve the security and evaluation accuracy of power grid business transformation data, this paper proposes a data quality evaluation method for power grid business transformation based on isolation bound...
ISBN (Print): 9783031488023; 9783031488030
Federated Learning (FL) is experiencing substantial research interest, with many frameworks being developed to allow practitioners to build federations easily and quickly. Most of these efforts do not consider two aspects that are key to Machine Learning (ML) software: customizability and performance. This research addresses these issues by implementing an open-source FL framework named FastFederatedLearning (FFL). FFL is implemented in C/C++, focusing on code performance, and allows the user to specify any communication graph between the clients and servers involved in the federation, ensuring customizability. FFL is tested against Intel OpenFL, achieving consistent speedups across different computational platforms (x86-64, ARM-v8, RISC-V), ranging from 2.5x to 3.69x. We aim to wrap FFL with a Python interface to ease its use and to implement a middleware allowing different communication backends to be used. We also aim to build dynamic federations in which the relations between clients and servers are not static, giving life to an environment where federations can be seen as long-lived, evolving structures and exploited as services.
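The server-side aggregation step at the heart of most FL frameworks is weighted model averaging (FedAvg). The sketch below is a generic Python illustration of that step, not FFL's C/C++ API, which is not reproduced here:

```python
# Generic FedAvg aggregation sketch (illustrative only; FFL itself is a
# C/C++ framework with a user-specified client/server communication graph).

def fedavg(client_models, sizes):
    """Average client model vectors, weighted by local dataset sizes."""
    total = sum(sizes)
    dim = len(client_models[0])
    return [
        sum(m[k] * s for m, s in zip(client_models, sizes)) / total
        for k in range(dim)
    ]

# Two clients; the second holds 3x as much data, so it dominates the average.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], sizes=[1, 3])
print(global_model)
```

A framework that lets the user choose the communication graph, as FFL does, decides *where* this aggregation runs (a single server, a hierarchy, or peer-to-peer), while the arithmetic itself stays the same.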
As an effective means of power demand side management, demand response is of great significance to alleviate the pressure of power grid and maintain the safe and stable operation of power grid. With the rapid developm...
ISBN (Digital): 9781510688902
ISBN (Print): 9781510688896
The purpose of this paper is to propose a panoramic human-machine collaborative training system that can adapt to new-energy grid-connected operating conditions, simulate various complex situations, and provide regulators with simulation exercises, intelligent deductions, decision-making references, and Q&A services, offering intelligent reference solutions for precise regulation under access scenarios with a high percentage of new energy in a new type of power system. To enhance training effectiveness and operational efficiency, it is necessary to apply intelligent technology and data-driven models in place of experience and manual labor. Interactive Q&A applications such as ChatGPT have disruptively changed how training is delivered, making the traditional teaching of theory and basic knowledge easier and faster. However, these commercial Q&A systems still have limitations in accuracy, security, and professionalism, which makes them inefficient in professional learning areas. In this paper, we use the idea of parallel control and the transformer model to construct a human-computer cooperative training system adapted to new-energy grid-connected electric power systems, realizing a training model that tightly integrates Q&A services with the real trainer, the real trainees, and the computer simulation system. By constructing a large model of the human-computer system, a training computing experiment platform, and a system with a closed training-reality loop, parallel training will support training program planning, training teaching design, training arrangements, teaching interaction, and other key training links, enabling automated and intelligent training design and execution. As a training and management model adapted to the situation of artificial intelligence, parallel training will bring brand-new possibilities for the development of the training industry in the intellige...
Distributed machine learning (DML) has recently experienced widespread application. A major performance bottleneck is the costly communication required for gradient synchronization. Recently, researchers have explored the use...