Search Results - Inner Mongolia University Library

Multi-Dimensional Training Optimization for Efficient Federated Synergy Learning

IEEE Transactions on Mobile Computing, 2025, Vol. 24, No. 7, pp. 6243-6258

Authors: Fu, Shucun; Dong, Fang; Chen, Runze; Shen, Dian; Zhang, Jinghui; He, Qiang. Affiliations: Southeast University, School of Computer Science and Engineering, Nanjing, China; Huazhong University of Science and Technology, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Wuhan, China

Edge learning (EL) is an end-to-edge collaborative learning paradigm enabling devices to participate in model training and data analysis, opening countless opportunities for edge intelligence. As a promising EL framework, federated synergy learning (FSyL) mitigates the computation and communication overhead on resource-constrained devices by offloading partial model layers to the edge server for synergistic training. Nevertheless, due to system and statistical heterogeneity, naively using existing FSyL methods is significantly time-consuming and causes accuracy degradation. Motivated by this issue, this paper introduces a novel FSyL framework that integrates multi-dimensional training optimization and formulates the edge learning cost minimization (ELCM) problem. To tackle the ELCM problem efficiently, we design OL-MG, an OnLine Model Splitting and Resource Provisioning Game. Specifically, we first reformulate and decompose the original ELCM problem based on data quality evaluation. Then, given a model splitting decision, we determine the optimal resource provisioning in Sub-problem 1, based on which optimal model splitting in Sub-problem 2 is modeled as a potential game. Subsequently, we introduce a decentralized algorithm to find a Nash equilibrium (NE) solution. We further extend OL-MG to support a budget-aware multi-edge scenario. Extensive experiments demonstrate that the proposed mechanism significantly outperforms state-of-the-art methods in cost-saving and accuracy improvement. © 2025 IEEE.
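The idea of reaching a Nash equilibrium in a potential game through decentralized updates can be illustrated with a small sketch. This is not OL-MG: the cost model below (a per-layer local cost plus a congestion term on a shared edge server), the names `cost`, `best_response_dynamics`, `is_nash`, and all numeric values are assumptions made purely for illustration. With this toy cost the game has an exact potential function, so letting devices take turns switching to a strictly cheaper split point terminates after finitely many moves.

```python
def cost(i, choice, choices, local_cost, edge_price, num_layers):
    """Toy cost of device i when it offloads `choice` layers to the edge."""
    others = sum(c for j, c in enumerate(choices) if j != i)
    local = (num_layers - choice) * local_cost[i]    # layers kept on the device
    edge = edge_price * choice * (choice + others)   # congestion on the shared server
    return local + edge


def best_response_dynamics(local_cost, edge_price, num_layers, max_rounds=1000):
    """Devices take turns moving to a strictly cheaper split point."""
    choices = [0] * len(local_cost)                  # start with nothing offloaded
    for _ in range(max_rounds):
        changed = False
        for i in range(len(choices)):
            current = cost(i, choices[i], choices, local_cost, edge_price, num_layers)
            best = min(
                range(num_layers + 1),
                key=lambda s: cost(i, s, choices, local_cost, edge_price, num_layers),
            )
            if cost(i, best, choices, local_cost, edge_price, num_layers) < current:
                choices[i] = best
                changed = True
        if not changed:                              # no device can improve alone
            break
    return choices


def is_nash(choices, local_cost, edge_price, num_layers):
    """Check that no device can lower its own cost by deviating unilaterally."""
    return all(
        cost(i, choices[i], choices, local_cost, edge_price, num_layers)
        <= cost(i, s, choices, local_cost, edge_price, num_layers)
        for i in range(len(choices))
        for s in range(num_layers + 1)
    )


# Hypothetical per-layer local costs for three devices (illustrative values only).
hypothetical_local_cost = [5, 1, 3]
equilibrium = best_response_dynamics(hypothetical_local_cost, edge_price=1, num_layers=4)
print(equilibrium, is_nash(equilibrium, hypothetical_local_cost, 1, 4))
```

Each device only needs its own cost and the aggregate load of the others, which is what makes the update rule decentralized in this sketch. Integer costs are used so that the strict-improvement test is not affected by floating-point noise.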

Keywords: Federated learning

ReCSA: a dedicated sort accelerator using ReRAM-based content addressable memory

Frontiers of Computer Science, 2023, Vol. 17, No. 2, pp. 1-13

Authors: Huize LI; Hai JIN; Long ZHENG; Yu HUANG; Xiaofei LIAO. Affiliation: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

With the increasing amount of data, there is an urgent need for efficient sorting algorithms to process large data *** sorting algorithms have attracted much attention because they can take advantage of different hardware's *** the traditional hardware sort accelerators suffer “memory wall” problems since their multiple rounds of data transmission between the memory and the *** this paper, we utilize the in-situ processing ability of the ReRAM crossbar to design a new ReCAM array that can process the matrix-vector multiplication operation and the vector-scalar comparison in the same array *** this designed ReCAM array, we present ReCSA, which is the first dedicated ReCAM-based sort *** hardware designs, we also develop algorithms to maximize memory utilization and minimize memory exchanges to improve sorting *** sorting algorithm in ReCSA can process various data types, such as integer, float, double, and *** also present experiments to evaluate the performance and energy efficiency against the state-of-the-art sort *** experimental results show that ReCSA has 90.92×, 46.13×, 27.38×, 84.57×, and 3.36× speedups against CPU-, GPU-, FPGA-, NDP-, and PIM-based platforms when processing numeric data *** also has 24.82×, 32.94×, and 18.22× performance improvement when processing string data sets compared with CPU-, GPU-, and FPGA-based platforms.

Keywords: ReCAM; parallel sorting; architecture design; processing-in-memory

Scalable Transactional Stream Processing on Multicore Processors

IEEE Transactions on Knowledge and Data Engineering, 2025, Vol. 37, No. 7, pp. 4254-4269

Authors: Zhao, Jianjun; Mao, Yancan; Yang, Zhonghao; Liu, Haikun; Zhang, Shuhao. Affiliations: Huazhong University of Science and Technology, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Wuhan 430074, China; National University of Singapore, 119077, Singapore; Nanyang Technological University, 639798, Singapore

Transactional stream processing engines (TSPEs) are central to modern stream applications handling shared mutable states. However, their full potential, particularly in adaptive scheduling, remains largely unexplored. We present MorphStream, a TSPE designed to optimize parallelism and performance for transactional stream processing on multicores. Through a unique three-stage execution paradigm (i.e., planning, scheduling, and execution), MorphStream enables adaptive scheduling under varying workload characteristics. Building on this foundation, MorphStream is further enhanced with support for non-deterministic state access, employing a stateful task precedence graph to handle undefined read/write sets at runtime while guaranteeing transaction semantics. Additionally, MorphStream incorporates a generalized framework for managing window-based operations, enabling efficient tracking and maintenance of overlapping windows using multi-versioned state management. These extensions enhance the system's ability to process dynamic and irregular workloads. Experimental results demonstrate up to 3.4 times higher throughput and 69.1% lower latency compared to state-of-the-art TSPEs, validating its scalability and adaptability in real-world streaming scenarios. © 1989-2012 IEEE.

Keywords: Multicore programming

Massively parallel algorithms for fully dynamic all-pairs shortest paths

Frontiers of Computer Science, 2024, Vol. 18, No. 4, pp. 201-203

Authors: Chilei WANG; Qiang-Sheng HUA; Hai JIN; Chaodong ZHENG. Affiliations: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

1 Introduction In recent years, the Massively Parallel Computation (MPC) model has gained significant ***, most of distributed and parallel graph algorithms in the MPC model are designed for static graphs [1]. In fact, the graphs in the real world are constantly *** size of the real-time changes in these graphs is smaller and more *** graph algorithms [2,3] can deal with graph changes more efficiently [4] than the corresponding static graph ***, most studies on dynamic graph algorithms are limited to the single machine ***, a few parallel dynamic graph algorithms (such as the graph connectivity) in the MPC model [5] have been proposed and shown superiority over their parallel static counterparts.

Keywords: dynamic shortest gained

Media Power Measuring via Emotional Contagion

Journal of Social Computing, 2024, Vol. 5, No. 1, pp. 15-35

Authors: Xue Lin; Hong Huang; Zongya Li; Hai Jin. Affiliations: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; School of Journalism, Huazhong University of Science and Technology, Wuhan 430074, China

Media power, the impact that media have on public opinion and perspectives, plays a significant role in maintaining internal stability, exerting external influence, and shaping international dynamics for nations/***, prior research has primarily concentrated on news content and reporting time, resulting in limitations in evaluating media *** more accurately assess media power, we use news content, news reporting time, and news emotion simultaneously to explore the emotional contagion between *** use emotional contagion to measure the mutual influence between media and regard the media with greater impact as having stronger media *** propose a framework called Measuring Media Power via Emotional Contagion (MMPEC) to capture emotional contagion among media, enabling a more accurate assessment of media power at the media and national/regional *** also interprets experimental results through correlation and causality analyses, ensuring *** analyses confirm the higher accuracy of MMPEC compared to other baseline models, as demonstrated in the context of COVID-19-related news, yielding compelling and interesting insights.

Keywords: explainability; media power; emotional contagion

Evaluating RISC-V Vector Instruction Set Architecture Extension with Computer Vision Workloads

Journal of Computer Science & Technology, 2023, Vol. 38, No. 4, pp. 807-820

Authors: 李若时; 彭平; 邵志远; 金海; 郑然. Affiliations: National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan 430074, China; Services Computing Technology and System Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan 430074, China

Computer vision (CV) algorithms have been extensively used for a myriad of applications *** the multimedia data are generally well-formatted and regular, it is beneficial to leverage the massive parallel processing power of the underlying platform to improve the performances of CV *** Instruction Multiple Data (SIMD) instructions, capable of conducting the same operation on multiple data items in a single instruction, are extensively employed to improve the efficiency of CV *** this paper, we evaluate the power and effectiveness of RISC-V vector extension (RV-V) on typical CV algorithms, such as Gray Scale, Mean Filter, and Edge *** our examinations, we show that compared with the baseline OpenCV implementation using scalar instructions, the equivalent implementations using the RV-V (version 0.8) can reduce the instruction count of the same CV algorithm up to 24x, when processing the same input ***, the actual performances improvement measured by the cycle counts is highly related with the specific implementation of the underlying RV-V *** our evaluation, by using the vector co-processor (with eight execution lanes) of Xuantie C906, vector-version CV algorithms averagely exhibit up to 2.98x performances speedups compared with their scalar counterparts.

Keywords: RISC-V vector extension; single instruction multiple data (SIMD); computer vision; OpenCV

TIGA: Towards Efficient Near Data Processing in SmartNICs-based Disaggregated Memory Systems

61st ACM/IEEE Design Automation Conference, DAC 2024

Authors: Duan, Zhuohui; Yu, Zelin; Liu, Haikun; Liao, Xiaofei; Jin, Hai; Zheng, Shijie; Wu, Sihan. Affiliation: National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan 430074, China

ISBN: (print) 9798400706011

Memory disaggregation, facilitated by Smart Network Interface Cards (SmartNICs), has emerged as a cost-effective approach for sharing memory resources in data centers. However, current SoC-based SmartNICs face several challenges for supporting near-data processing (NDP) in disaggregated memory (DM) systems effectively, such as inefficient resource allocation for SmartNICs employed in NDP, and the lack of collaboration between SmartNICs on data nodes and CPUs on compute nodes. To address these issues, we propose TIGA, an efficient NDP framework for SmartNICs-based disaggregated memory systems. We propose an adaptive resource allocator to fully utilize the SoC cores among NDP engines automatically, and a SmartNIC-CPU cooperative computing mechanism to schedule NDP tasks among CPUs and SmartNICs. We prototype TIGA with FPGAs and evaluate it with several typical workloads. Experimental results show that TIGA significantly improves the efficiency of NDP tasks in DM systems compared with state-of-the-art SmartNIC-based co-processing schemes. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Keywords: Storage allocation (computer)

Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective

Journal of Computer Science & Technology, 2024, Vol. 39, No. 2, pp. 245-266

Authors: 廖小飞; 赵文举; 金海; 姚鹏程; 黄禹; 王庆刚; 赵进; 郑龙; 张宇; 邵志远. Affiliations: National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Zhejiang Lab, Hangzhou 311121, China

Graph processing has been widely used in many scenarios, from scientific computing to artificial *** processing exhibits irregular computational parallelism and random memory accesses, unlike traditional ***, running graph processing workloads on conventional architectures (e.g., CPUs and GPUs) often shows a significantly low compute-memory ratio with few performance benefits, which can be, in many cases, even slower than a specialized single-thread graph *** domain-specific hardware designs are essential for graph processing, it is still challenging to transform the hardware capability to performance boost without coupled software *** article presents a graph processing ecosystem from hardware to *** start by introducing a series of hardware accelerators as the foundation of this ***, the codesigned parallel graph systems and their distributed techniques are presented to support graph ***, we introduce our efforts on novel graph applications and hardware *** results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.

Keywords: graph processing; hardware accelerator; software system; high performance; ecosystem

Minimal Context-Switching Data Race Detection with Dataflow Tracking

Journal of Computer Science & Technology, 2024, Vol. 39, No. 1, pp. 211-226

Authors: 郑龙; 李洋; 辛杰; 刘海峰; 郑然; 廖小飞; 金海. Affiliations: National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Data race is one of the most important concurrent anomalies in multi-threaded *** constraint-based techniques are leveraged into race detection, which is able to find all the races that can be found by any other sound race ***, this constraint-based approach has serious limitations on helping programmers analyze and understand data ***, it may report a large number of false positives due to the unrecognized dataflow propagation of the ***, it recommends a wide range of thread context switches to schedule the reported race (including the false one) whenever this race is exposed during the constraint-solving *** ad hoc recommendation imposes too many context switches, which complicates the data race *** address these two limitations in the state-of-the-art constraint-based race detection, this paper proposes DFTracker, an improved constraint-based race detector to recommend each data race with minimal thread context ***, we reduce the false positives by analyzing and tracking the dataflow in the *** this means, DFTracker thus reduces the unnecessary analysis of false race *** further propose a novel algorithm to recommend an effective race schedule with minimal thread context switches for each data *** experimental results on the real applications demonstrate that 1) without removing any true data race, DFTracker effectively prunes false positives by 68% in comparison with the state-of-the-art constraint-based race detector; 2) DFTracker recommends as low as 2.6-8.3 (4.7 on average) thread context switches per data race in the real world, which is 81.6% fewer context switches per data race than the state-of-the-art constraint based race ***, DFTracker can be used as an effective tool to understand the data race for programmers.

Keywords: data race; satisfiability modulo theory; multi-threaded program; dynamic detection

AccelES: Accelerating Top-K SpMV for Embedding Similarity via Low-bit Pruning

31st IEEE International Symposium on High Performance Computer Architecture, HPCA 2025

Authors: Zhai, Jiaqi; Shi, Xuanhua; Huang, Kaiyi; Ye, Chencheng; Hu, Weifang; He, Bingsheng; Jin, Hai. Affiliations: Huazhong University of Science and Technology, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Wuhan 430074, China; National University of Singapore, School of Computing, 119077, Singapore

ISBN: (print) 9798331506476

In the realm of recommendation systems, achieving real-time performance in embedding similarity tasks is often hindered by the limitations of traditional Top-K sparse matrix-vector multiplication (SpMV) methods, which suffer from high latency due to inefficient memory access patterns. This paper identifies these critical gaps and introduces AccelES, a novel approach that significantly enhances the efficiency of Top-K SpMV. Our method employs a two-stage calculation scheme: the first stage utilizes a compact, low-bit dataset to quickly identify the most relevant entries, while the second stage performs full-precision calculations solely on this pruned subset, thereby minimizing computational overhead. Furthermore, AccelES incorporates innovative matrix representations, Ultra-CSR and Random-CSR, which optimize memory bandwidth utilization. Experimental results demonstrate that AccelES accelerates performance, surpassing state-of-the-art FPGA, GPU, and CPU solutions by factors of 3.4×, 2.5×, and 153.3×, respectively, under controlled conditions. These advancements not only enhance processing speed but also significantly improve real-time performance in recommendation systems, establishing AccelES as a pivotal contribution to the field of Top-K sparse matrix-vector multiplication. © 2025 IEEE.
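The two-stage scheme described in this abstract (a low-bit pass to shortlist rows, then full-precision scoring of the shortlist only) can be sketched in a few lines. This is a minimal illustration, not the AccelES implementation: the symmetric 4-bit quantizer, the `oversample` factor, the dictionary-based sparse rows, and all function names and data are assumptions, and the Ultra-CSR and Random-CSR formats are not modeled. Because stage 1 is approximate, a shortlist that is too small can miss a true top-K row.

```python
import heapq


def quantize(values, bits=4, scale=1.0):
    """Map floats in [-scale, scale] to signed integers of the given bit width."""
    levels = (1 << (bits - 1)) - 1                    # 7 for 4 bits
    return [max(-levels, min(levels, round(v / scale * levels))) for v in values]


def sparse_dot(row, dense):
    """Dot product of a sparse row {column: value} with a dense vector."""
    return sum(value * dense[col] for col, value in row.items())


def two_stage_top_k(rows, query, k, oversample=2, bits=4, scale=1.0):
    """Return the (row index, exact score) pairs of the approximate top-k rows."""
    # Low-bit copies; a real system would precompute and store these once.
    rows_low = [dict(zip(row.keys(), quantize(row.values(), bits, scale))) for row in rows]
    query_low = quantize(query, bits, scale)

    # Stage 1: cheap integer scores select a shortlist of k * oversample rows.
    approx = [(sparse_dot(row, query_low), i) for i, row in enumerate(rows_low)]
    shortlist = heapq.nlargest(min(len(rows), k * oversample), approx)

    # Stage 2: full-precision scores are computed for the shortlist only.
    exact = [(sparse_dot(rows[i], query), i) for _, i in shortlist]
    return [(i, score) for score, i in heapq.nlargest(k, exact)]


# Hypothetical embedding rows and query, chosen only to exercise the code.
example_rows = [
    {0: 0.9, 2: 0.1},
    {1: 0.8},
    {0: 0.5, 1: 0.5},
    {2: -0.7},
    {0: 0.95, 1: 0.3},
]
example_query = [1.0, 0.2, 0.0]
top2 = two_stage_top_k(example_rows, example_query, k=2)
print([i for i, _ in top2])
```

The `oversample` factor trades accuracy for work: a larger shortlist makes it less likely that quantization error drops a true top-K row, at the cost of more full-precision dot products in stage 2.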

Keywords: Embeddings
