Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers

Authors: Liao, Haoyu; Liu, Tong-yu; Guo, Jianmei; Huang, Bo; Yang, Dingyu; Ding, Jonathan

Affiliations: East China Normal Univ, Sch Data Sci & Engn, Shanghai 200062, Peoples R China; TRE, Alibaba Grp, Hangzhou 311121, Zhejiang, Peoples R China; SATG, Intel, Shanghai 200131, Peoples R China

Publication: IEEE Transactions on Parallel and Distributed Systems (IEEE Trans Parallel Distrib Syst)

Year, volume, issue: 2025, Vol. 36, No. 1

Pages: 67-83

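Discipline classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees may be conferred in engineering or science)]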

Funding: National Natural Science Foundation of China; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

Subjects: Hardware; Data centers; Cloud computing; Servers; Processor scheduling; Program processors; Monitoring; Message systems; Benchmark testing; Accuracy; SMT interference; data center; QoS; microarchitecture; latency-sensitive applications

Abstract: This article addresses an understudied yet fundamental problem: existing methods typically evaluate the available CPU resources by averaging the utilization of multiple hardware threads. For Simultaneous Multi-Threading (SMT) processors, however, this approach can underestimate the actual usage of the underlying physical core and therefore overestimate the remaining resources. The overestimation propagates from the microarchitecture to operating systems and cloud schedulers, where it can misguide scheduling decisions, exacerbate CPU overcommitment, and increase Service Level Agreement (SLA) violations. To address this overestimation problem, we propose Remaining CPU (RCPU), an SMT-aware and purely data-driven approach that reserves more CPU resources to restrict CPU overcommitment and prevent SLA violations. RCPU requires only a few modifications to existing cloud infrastructure and can scale to large data centers. Extensive evaluations in the data center show that RCPU reduces SLA violations by 18% on average for 98% of all latency-sensitive applications. In a benchmarking experiment, RCPU improves accuracy by 69% in terms of Mean Absolute Error (MAE) compared to the state of the art.
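The direction of the error described above can be illustrated with a toy model. The sketch below is not RCPU (the abstract describes RCPU only as SMT-aware and data-driven); it assumes a deliberately simplified view in which a physical core counts as occupied in any time slot where at least one of its hardware threads is busy. The function names and the 4-slot trace are hypothetical, chosen only for illustration.

```python
from typing import List


def thread_utilization(busy: List[bool]) -> float:
    """Fraction of time slots in which one hardware thread is busy."""
    return sum(busy) / len(busy)


def averaged_utilization(threads: List[List[bool]]) -> float:
    """Conventional estimate: the mean of per-hardware-thread utilizations."""
    return sum(thread_utilization(t) for t in threads) / len(threads)


def core_occupancy(threads: List[List[bool]]) -> float:
    """Simplified physical-core view (an assumption of this sketch):
    fraction of slots in which at least one sibling thread is busy."""
    slots = len(threads[0])
    busy_slots = sum(1 for i in range(slots) if any(t[i] for t in threads))
    return busy_slots / slots


# Hypothetical trace: two SMT siblings on one core, busy in disjoint slots.
t0 = [True, True, False, False]
t1 = [False, False, True, True]

avg = averaged_utilization([t0, t1])
occ = core_occupancy([t0, t1])
print(f"averaged utilization: {avg:.2f}, headroom by average: {1 - avg:.2f}")
print(f"core occupancy: {occ:.2f}, headroom by core: {1 - occ:.2f}")
```

With the two siblings busy in disjoint slots, averaging reports 50% utilization and 50% headroom, while the core is occupied in every slot. Because the fraction of slots with any busy sibling is never smaller than the mean of the per-thread fractions, the averaged figure can only understate occupancy under this model, which matches the direction of the overestimation of remaining resources that the abstract describes. Real SMT cores share execution resources in more nuanced ways, so this is an intuition aid rather than a measurement method.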