Modern scientific applications predominantly run on large-scale computing platforms, necessitating collaboration between scientific domain experts and high-performance computing (HPC) experts. While domain experts are often skilled in customizing domain-specific scientific computing routines, which typically involve various matrix computations, HPC experts are essential for achieving efficient execution of these computations on large-scale platforms. This process often involves utilizing complex parallel computing libraries tailored to specific matrix computation scenarios. However, the intricate programming procedure and the need for a deep understanding of both the application domain and HPC pose significant challenges to the widespread adoption of scientific computing. In this research, we observe that matrix computations can be transformed into equivalent graph representations, and that by utilizing graph processing engines, HPC experts can be freed from the burden of implementing efficient scientific computations. Based on this observation, we introduce a graph engine-based scientific computing (Graph for Science) paradigm, which provides a unified graph programming interface, enabling domain experts to promptly implement various types of matrix computations. The proposed paradigm leverages the underlying graph processing engine to achieve efficient execution, eliminating the need for HPC expertise in programming large-scale scientific applications. We evaluate the performance of the developed graph compute engine for three typical scientific computing routines. Our results demonstrate that the graph engine-based scientific computing paradigm achieves performance comparable to the best-performing implementations based on existing parallel computing libraries and bespoke implementations. Importantly, the paradigm greatly simplifies the development of scientific computations on large-scale platforms, reducing the programming difficulty for scientists and facilitating broader adoption of scientific computing.
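To make the matrix-to-graph mapping concrete, the sketch below (all names invented for illustration; this is not the paper's actual interface) shows how y = A·x becomes one propagation round of a gather-style vertex program: each nonzero A[i][j] is an edge j → i with weight A[i][j], and the product is computed by summing weighted neighbor values at each destination vertex.

```python
# A minimal sketch of the matrix-to-graph mapping. All names here are
# invented for illustration and are not the paper's actual interface.
from collections import defaultdict

def matrix_to_graph(A):
    """Each nonzero A[i][j] becomes a weighted edge j -> i."""
    in_edges = defaultdict(list)  # dst vertex -> [(src vertex, weight), ...]
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            if a_ij != 0:
                in_edges[i].append((j, a_ij))
    return in_edges

def spmv_round(in_edges, x, n):
    """One propagation round of a gather-style vertex program: every
    destination vertex sums the weighted values of its in-neighbors,
    which is exactly y = A @ x."""
    y = [0.0] * n
    for dst, edges in in_edges.items():
        y[dst] = sum(w * x[src] for src, w in edges)
    return y

A = [[4.0, 1.0],
     [0.0, 2.0]]
x = [1.0, 3.0]
print(spmv_round(matrix_to_graph(A), x, len(A)))  # [7.0, 6.0] == A @ x
```

Iterative routines such as Jacobi or PageRank-style solvers then reduce to repeating this round until convergence, which is precisely the execution pattern graph processing engines are optimized for.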
The emergence of large-scale dynamic sets in real applications poses severe challenges for approximate set representation structures. A dynamic set with changing cardinality requires an elastic capacity of the approxi...
Stream Learning (SL) requires models that can quickly adapt to continuously evolving data, posing significant challenges in both computational efficiency and learning accuracy. Effective data selection is critical in ...
Serverless computing has gained significant traction for machine learning inference applications, which are often deployed as serverless workflows consisting of multiple CPU and GPU functions with data dependencies. How...
Although cloud computing saves enterprises and individuals the cost of managing data in various applications on public cloud servers, it also raises the problem of leakage of critical personal information and sen...
The rapid development of large language models (LLMs) has transformed many industries, including healthcare. However, previous medical LLMs have largely focused on leveraging general medical knowledge to provide respo...
ISBN (digital): 9798350387117
ISBN (print): 9798350387124
To improve the throughput of Byzantine Fault Tolerance (BFT) consensus protocols, the Directed Acyclic Graph (DAG) topology has been introduced to parallelize data processing, leading to the development of DAG-based BFT consensus. However, existing DAG-based works heavily rely on Reliable Broadcast (RBC) protocols for block broadcasting, which introduces significant latency due to the three communication steps involved in each RBC. For instance, DAG-Rider, a representative DAG-based protocol, exhibits a best latency of 12 steps, considerably higher than non-DAG protocols like PBFT, which requires only 3 steps. To tackle this issue, we propose LightDAG, which replaces RBC with lightweight broadcasting protocols such as Consistent Broadcast (CBC) and Plain Broadcast (PBC). Since CBC and PBC can be implemented in two and one communication steps, respectively, LightDAG achieves low latency. In our proposal, we present two variants of LightDAG, namely LightDAG1 and LightDAG2, each providing a trade-off between the best latency and the expected worst latency. In LightDAG1, every block is broadcast using CBC, which exhibits a best latency of 5 steps and an expected worst latency of 14 steps. Since CBC cannot guarantee the totality property, we design a block retrieval mechanism in LightDAG1 to assist replicas in retrieving missing blocks. LightDAG2 utilizes a combination of PBC and CBC for block broadcasting, resulting in a best latency of 4 steps and an expected worst latency of 12(t+1) steps, where t represents the number of actual Byzantine replicas. Since a Byzantine replica may equivocate through PBC, LightDAG2 prohibits blocks from directly referencing contradictory blocks. To ensure liveness, we propose a mechanism to identify and exclude Byzantine replicas if they engage in equivocation attacks. Extensive experiments have been conducted to evaluate LightDAG, and the results demonstrate its feasibility and efficiency.
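The latency figures quoted above follow directly from the per-broadcast step counts. The toy sketch below makes the accounting explicit; the step counts and latency bounds are taken from the abstract, while the function names and structure are illustrative and not part of the LightDAG paper.

```python
# A back-of-the-envelope sketch of the latency accounting. Numbers come
# from the abstract above; names and structure are my own assumptions.

BROADCAST_STEPS = {"RBC": 3, "CBC": 2, "PBC": 1}  # communication steps per broadcast

# Best-case end-to-end latencies (in communication steps) as reported:
BEST_LATENCY = {"PBFT": 3, "DAG-Rider": 12, "LightDAG1": 5, "LightDAG2": 4}

def lightdag1_worst_latency():
    """Expected worst latency of LightDAG1 as stated: a flat 14 steps."""
    return 14

def lightdag2_worst_latency(t):
    """Expected worst latency of LightDAG2 as stated: 12(t+1) steps,
    where t is the number of actual Byzantine replicas."""
    return 12 * (t + 1)

if __name__ == "__main__":
    for proto, steps in BEST_LATENCY.items():
        print(f"{proto:10s} best latency: {steps} steps")
    # With t = 2 equivocating replicas, LightDAG2's worst case (36 steps)
    # already exceeds LightDAG1's bounded 14-step worst case.
    print("LightDAG1 worst:", lightdag1_worst_latency())
    print("LightDAG2 worst (t=2):", lightdag2_worst_latency(2))
```

The crossover illustrates the trade-off the abstract describes: LightDAG2 wins in the common case (4 vs. 5 best-case steps) but degrades linearly with the number of equivocating replicas, while LightDAG1 pays one extra step up front for a bounded worst case.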
The proliferation of deep-learning-based mobile and IoT applications has driven the increasing deployment of edge datacenters equipped with domain-specific accelerators. The unprecedented computing power offered by th...
Approximate nearest neighbor search (ANNS) has emerged as a crucial component of database and AI infrastructure. Ever-increasing vector datasets pose significant challenges in terms of performance, cost, and accuracy f...
ISBN (print): 9781665462723
Hypergraph processing has emerged as a powerful approach for analyzing complex multilateral relationships among multiple entities. Past research on building hypergraph systems suggests that changing the scheduling order of bipartite edge tasks can improve the overlap-induced data locality in hypergraph processing. However, due to the complex intertwined connections between vertices and hyperedges, it is almost impossible to find a locality-optimal scheduling order. Thus, these task-centric hypergraph systems often suffer from substantial off-chip communication. In this paper, we first propose a novel data-centric Load-Trigger-Reduce (LTR) execution model to fully exploit the locality in hypergraph processing. Unlike a task-centric model that loads the required data along with a task, our LTR model invokes tasks according to the data that has been loaded. Specifically, once the hypergraph data is loaded into the on-chip memory, all of its relevant computation tasks will be triggered simultaneously to output intermediate results, which are finally reduced to update the final results. Our LTR model enables all hypergraph data to be accessed only once in each iteration. To fully exploit the LTR performance potential, we further architect an LTR-driven hypergraph accelerator, XuLin, which features an adaptive data loading mechanism to minimize the loading cost via chunk merging at runtime. XuLin is also equipped with a priority-based differential data reduction scheme to reduce the impact of conflicting updates on performance. We have implemented XuLin both on a Xilinx Alveo U250 FPGA card and using a cycle-accurate simulator. The results show that XuLin outperforms the state-of-the-art hypergraph processing solutions Hygra and ChGraph by 20.47× and 8.77× on average, respectively.
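The LTR execution model can be illustrated with a short software sketch (invented names; a software analogy rather than XuLin's actual hardware design): each data chunk is loaded exactly once per iteration, all tasks that touch it are triggered, and their partial results are reduced into the final values.

```python
# A minimal, illustrative sketch of the data-centric Load-Trigger-Reduce
# (LTR) idea. Names and data layout are assumptions, not XuLin's design.
from collections import defaultdict

def ltr_iteration(chunks, tasks_by_chunk, reduce_fn, init):
    """chunks: {chunk_id: data}; tasks_by_chunk: {chunk_id: [task_fn, ...]}.
    Each task_fn(data) yields (key, partial) pairs; reduce_fn folds them."""
    results = defaultdict(lambda: init)
    for cid, data in chunks.items():          # Load: each chunk read once
        for task in tasks_by_chunk[cid]:      # Trigger: all tasks on this chunk
            for key, partial in task(data):
                results[key] = reduce_fn(results[key], partial)  # Reduce
    return dict(results)

# Toy usage: two chunks of hyperedge data; the task accumulates per-vertex
# contributions, so each chunk is consumed by every task that needs it
# while resident, instead of being re-fetched per task.
chunks = {0: [(1, 2), (2, 3)], 1: [(1, 5)]}   # (vertex, contribution) pairs
degree_task = lambda data: iter(data)
tasks = {0: [degree_task], 1: [degree_task]}
print(ltr_iteration(chunks, tasks, lambda a, b: a + b, 0))  # {1: 7, 2: 3}
```

The contrast with a task-centric model is in the loop order: iterating over data and triggering tasks (rather than iterating over tasks and fetching data) is what bounds off-chip traffic to one pass over the hypergraph per iteration.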