ISBN (Digital): 9798350352917
ISBN (Print): 9798350352924; 9798350352917
Inference of Large Language Models (LLMs) across computer clusters has become a focal point of research in recent times, with many acceleration techniques taking inspiration from CPU speculative execution. These techniques reduce bottlenecks associated with memory bandwidth, but also increase end-to-end latency per inference run, requiring high speculation acceptance rates to improve performance. Combined with a variable rate of acceptance across tasks, speculative inference techniques can result in reduced performance. Additionally, pipeline-parallel designs require many user requests to maintain maximum utilization. As a remedy, we propose PipeInfer, a pipelined speculative acceleration technique to reduce inter-token latency and improve system utilization for single-request scenarios while also improving tolerance to low speculation acceptance rates and low-bandwidth interconnects. PipeInfer exhibits up to a 2.15x improvement in generation speed over standard speculative inference. PipeInfer achieves its improvement through Continuous Asynchronous Speculation and Early Inference Cancellation, the former improving latency and generation speed by running single-token inference simultaneously with several speculative runs, while the latter improves speed and latency by skipping the computation of invalidated runs, even in the middle of inference.
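The interplay of speculation and early cancellation can be sketched in miniature. The `draft_model` and `target_model` below are hypothetical toy stand-ins, not PipeInfer's actual pipelined, asynchronous implementation: verification stops at the first rejected token, so any work that depends on an invalidated speculation is skipped.

```python
# Toy sketch of speculative decoding with early cancellation.
# draft_model and target_model are hypothetical stand-ins; PipeInfer's
# real design runs speculation asynchronously across a pipeline of nodes.

def draft_model(prefix, n):
    """Cheap draft: guess the next n tokens (deliberately wrong at i == 2)."""
    toks, last = [], prefix[-1]
    for i in range(n):
        last = (last + (2 if i == 2 else 1)) % 50
        toks.append(last)
    return toks

def target_model(prefix):
    """Expensive 'ground truth' model: next token is prev + 1 (toy rule)."""
    return (prefix[-1] + 1) % 50

def speculative_step(prefix, n_spec=4):
    """Verify drafted tokens in order; cancel the rest at the first mismatch."""
    drafted = draft_model(prefix, n_spec)
    accepted = []
    for tok in drafted:
        truth = target_model(prefix + accepted)
        if tok == truth:
            accepted.append(tok)
        else:
            # Early cancellation: all later speculative work built on this
            # rejected token is invalid, so it is skipped, and the correct
            # token from the target model is emitted instead.
            accepted.append(truth)
            break
    return accepted

print(speculative_step([0]))  # first two guesses accepted, third corrected
```

With a high acceptance rate the loop rarely breaks, which is why, as the abstract notes, acceptance rate governs whether speculation helps or hurts end-to-end latency.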
Medical Image AI Systems can assist doctors in making diagnoses, thereby improving diagnostic accuracy. These systems are now widely used in hospitals. However, current AI diagnostic methods typically rely on various ...
This paper presents an impedance-based non-iterative fault location algorithm for a two-terminal line considering unsynchronized measurements to account for the loss of Global Positioning System (GPS) signal. Based on the availability of pre-fault signals, two algorithms are proposed: one using pre-fault data and the other without it. The algorithms are formulated using fundamental-frequency phasor-based decoupled modal components of the signals measured at both ends of the line, together with a distributed line model. Applying decoupled modal components with the distributed model enhances the accuracy of the presented algorithms for both transposed and untransposed line configurations. The proposed algorithms have been tested and analyzed under different fault conditions simulated on the EMTP-RV platform. A comparative analysis with existing methods is also presented to establish the prominent features of the proposed methods. The algorithms have also been tested with practical data from Power Grid Corporation of India Limited (PGCIL), and the test results validate the accuracy of the developed fault location algorithms.
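The core idea behind impedance-based two-terminal fault location with unsynchronized measurements can be illustrated in a simplified lumped-parameter form (the paper itself uses the distributed line model with decoupled modal components). With δ the unknown synchronization angle between the two terminals and m the per-unit fault distance from the sending end, the fault-point voltage computed from each end must agree:

```latex
% Simplified lumped-parameter illustration (the paper's algorithms use the
% distributed-parameter line model instead):
V_S - m\, Z_L I_S = \left( V_R - (1 - m)\, Z_L I_R \right) e^{j\delta}
```

Separating real and imaginary parts yields two equations in the two unknowns m and δ; a non-iterative scheme eliminates δ (for example by equating magnitudes on both sides) so that m follows in closed form.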
Parallel processing involves challenges in data dependency. In this work, we present an investigation into the performance of image background removal using the Rembg algorithms, incorporating parallel computing techniqu...
ISBN (Digital): 9798350352917
ISBN (Print): 9798350352924; 9798350352917
As network bandwidth struggles to keep up with rapidly growing computing capabilities, the efficiency of collective communication has become a critical challenge for exascale distributed and parallel applications. Traditional approaches directly utilize error-bounded lossy compression to accelerate collective computation operations, but exhibit unsatisfactory performance due to the expensive decompression-operation-compression (DOC) workflow. To address this issue, we present a first-ever homomorphic compression-communication co-design, hZCCL, which enables operations to be performed directly on compressed data, saving the cost of time-consuming decompression and recompression. In addition to the co-design framework, we build a lightweight compressor optimized specifically for multi-core CPU platforms. We also present a homomorphic compressor with a run-time heuristic to dynamically select efficient compression pipelines, reducing the cost of DOC handling. We evaluate hZCCL on up to 512 nodes and across five application datasets. The experimental results demonstrate that our homomorphic compressor achieves a CPU throughput of up to 379.08 GB/s, surpassing the conventional DOC workflow by up to 36.53x. Moreover, our hZCCL-accelerated collectives outperform two state-of-the-art baselines, delivering speedups of up to 2.12x and 6.77x over the original MPI collectives in single-thread and multi-thread modes, respectively, while maintaining data accuracy.
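The benefit of operating directly on compressed data can be illustrated with a deliberately simple stand-in: element-wise addition of two run-length-encoded vectors, performed without decompressing either operand. This only sketches the homomorphic idea; hZCCL's actual error-bounded compressor and pipelines are far more sophisticated.

```python
# Sketch of a "homomorphic" reduction: adding two run-length-encoded
# vectors directly in compressed form, never materializing full arrays.
# Illustration of the idea only, not hZCCL's actual algorithm.

def rle(values):
    """Compress a list into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_add(a, b):
    """Element-wise sum of two RLE streams, computed run-by-run."""
    a = [list(r) for r in a]  # copy; run lengths are consumed in place
    b = [list(r) for r in b]
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        n = min(a[i][1], b[j][1])   # overlap of the two current runs
        s = a[i][0] + b[j][0]       # one addition covers the whole overlap
        if out and out[-1][0] == s:
            out[-1][1] += n
        else:
            out.append([s, n])
        a[i][1] -= n
        b[j][1] -= n
        if a[i][1] == 0:
            i += 1
        if b[j][1] == 0:
            j += 1
    return out

def rld(runs):
    """Decompress, used here only to check the result."""
    return [v for v, n in runs for _ in range(n)]

x, y = [1, 1, 1, 2], [3, 3, 4, 4]
print(rld(rle_add(rle(x), rle(y))))  # element-wise sum, computed compressed
```

In a reduction tree, each node can combine compressed contributions this way and forward the still-compressed result, which is the cost the DOC workflow pays to avoid.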
In order to improve the security and evaluation accuracy of power grid business transformation data, this paper proposes a data quality evaluation method for power grid business transformation based on isolation bound...
ISBN (Print): 9783031488023; 9783031488030
Federated Learning (FL) is experiencing substantial research interest, with many frameworks being developed to allow practitioners to build federations easily and quickly. Most of these efforts do not consider two aspects that are key to Machine Learning (ML) software: customizability and performance. This research addresses these issues by implementing an open-source FL framework named FastFederatedLearning (FFL). FFL is implemented in C/C++, focusing on code performance, and allows the user to specify any communication graph between the clients and servers involved in the federation, ensuring customizability. FFL is tested against Intel OpenFL, achieving consistent speedups across different computational platforms (x86-64, ARM-v8, RISC-V), ranging from 2.5x to 3.69x. We aim to wrap FFL with a Python interface to ease its use and to implement a middleware allowing different communication backends to be used. We also aim to build dynamic federations in which the relations between clients and servers are not static, giving life to an environment where federations can be seen as long-lived, evolving structures and exploited as services.
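The server-side aggregation step at the heart of most FL frameworks is weighted model averaging (FedAvg). The sketch below is a generic Python illustration of that step, not FFL's C/C++ API, which is not reproduced here:

```python
# Generic FedAvg aggregation sketch (illustrative only; FFL itself is a
# C/C++ framework with a user-specified client/server communication graph).

def fedavg(client_models, sizes):
    """Average client model vectors, weighted by local dataset sizes."""
    total = sum(sizes)
    dim = len(client_models[0])
    return [
        sum(m[k] * s for m, s in zip(client_models, sizes)) / total
        for k in range(dim)
    ]

# Two clients; the second holds 3x as much data, so it dominates the average.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], sizes=[1, 3])
print(global_model)
```

A framework that lets the user choose the communication graph, as FFL does, decides *where* this aggregation runs (a single server, a hierarchy, or peer-to-peer), while the arithmetic itself stays the same.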
As an effective means of power demand side management, demand response is of great significance to alleviate the pressure of power grid and maintain the safe and stable operation of power grid. With the rapid developm...
ISBN (Digital): 9781510688902
ISBN (Print): 9781510688896
The purpose of this paper is to propose a panoramic human-machine collaborative training system that can adapt to new-energy grid-connected operating conditions, simulate various complex situations, and provide regulators with simulation exercises, intelligent deductions, decision-making references, and Q&A services, offering intelligent reference solutions for precise regulation under access scenarios with a high percentage of new energy in a new type of power system. To enhance training effectiveness and operational efficiency, it is necessary to apply intelligent technology and data-driven models in place of experience and manual labor. Interactive Q&A applications such as ChatGPT have disruptively changed how training is delivered, making the traditional teaching of theory and basic knowledge easier and faster. However, these commercial Q&A systems still have limitations in accuracy, security, and professionalism, which makes them inefficient in professional learning areas. In this paper, we use the idea of parallel control and the transformer model to construct a human-computer cooperative training system adapted to new-energy grid-connected electric power systems, realizing a training model that tightly integrates Q&A services with the real trainer, the real trainees, and the computer simulation system. By constructing a large model of the human-computer system, a training computing experiment platform, and a system with a closed training-reality loop, parallel training will support training program planning, training teaching design, training arrangements, teaching interaction, and other key training links, enabling automated and intelligent training design and execution. As a training and management model adapted to the situation of artificial intelligence, parallel training will bring brand-new possibilities for the development of the training industry in the intellige...
Distributed machine learning (DML) has recently experienced widespread application. A major performance bottleneck is the costly communication required for gradient synchronization. Recently, researchers have explored the use...