A pull request (PR) is an event in Git where a contributor asks project maintainers to review code they want to merge into a project. The PR mechanism greatly improves the efficiency of distributed software development in the open-source community. Nevertheless, the massive number of PRs in an open-source software (OSS) project increases the workload of developers. To reduce this burden, many previous studies have investigated factors that affect the chance of a PR being accepted and built prediction models based on these factors. However, most prediction models rely on data that only becomes available some time after a PR is submitted (e.g., comments on the PR), which limits their usefulness in practice, because integrators must still spend considerable effort inspecting PRs in the meantime. In this study, we propose an approach named E-PRedictor (earlier PR predictor) to predict whether a PR will be merged at the moment it is created. E-PRedictor combines three dimensions of manual statistic features (i.e., contributor profile, specific pull request, and project profile) with deep semantic features generated by BERT models from the description and code changes of PRs. To evaluate the performance of E-PRedictor, we collect 475,192 PRs from 49 popular open-source projects on GitHub. The experimental results show that our proposed approach can effectively predict whether a PR will be merged or not. E-PRedictor significantly outperforms baseline models (e.g., Random Forest and VDCNN) built on manual features. In terms of F1@Merge, F1@Reject, and AUC (area under the receiver operating characteristic curve), E-PRedictor achieves 90.1%, 60.5%, and 85.4%, respectively.
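The feature-combination idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names (`prev_merged_ratio`, `changed_files`, etc.) are hypothetical examples of the three manual dimensions, and the BERT sentence embedding is replaced by a stub function so the sketch stays self-contained.

```python
# Illustrative sketch of an E-PRedictor-style feature pipeline (hypothetical
# names; not the paper's code): hand-crafted features from three dimensions
# are concatenated with a semantic embedding of the PR text.

def manual_features(contributor, pr, project):
    """Flatten the three hand-crafted feature dimensions into one vector."""
    return [
        contributor["prev_merged_ratio"],        # contributor profile
        pr["changed_files"], pr["added_lines"],  # specific pull request
        project["open_pr_count"],                # project profile
    ]

def semantic_features(text, dim=4):
    """Stand-in for a BERT embedding of the PR description/diff.
    A real system would call a fine-tuned BERT model here."""
    score = (sum(ord(c) for c in text) % 97) / 97.0
    return [score] * dim

def pr_feature_vector(contributor, pr, project):
    """Concatenate manual and semantic features for a downstream classifier."""
    return manual_features(contributor, pr, project) + \
        semantic_features(pr["description"])
```

The resulting vector would then be fed to a binary classifier that predicts merge vs. reject at PR creation time.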
LiDAR sensors measure the environment by emitting lasers and, when combined with deep neural networks (DNNs), can effectively identify surrounding obstacles such as vehicles and pedestrians. Given its crucial role in ...
The scarcity of training data restricts a neural network from capturing schema diversity and intricacies, hindering schema-matching models' generalization capabilities. In this paper, we propose ISResMat, a framew...
With the growing popularity of LLMs among the general public users, privacy-preserving and adversarial robustness have become two pressing demands for LLM-based services, which have largely been pursued separately but...
Modern advanced large language model (LLM) applications often prepend long contexts before user queries to improve model output quality. These contexts frequently repeat, either partially or fully, across multiple que...
Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features ...
Smart contracts are widely used on the blockchain to implement complex transactions, such as decentralized applications. The vulnerability detection of large-scale smart contracts is critical, as attacks on smart contracts often cause huge economic losses. Since it is difficult to repair and update smart contracts, it is necessary to find vulnerabilities before they are exploited. However, code analysis, which requires traversing paths, and learning methods, which require many features to be trained, are too time-consuming to detect large-scale on-chain contracts. Learning-based methods obtain detection models from a feature space, in contrast to code analysis methods such as symbolic execution. However, the existing features lack interpretability of the detection results and the training model; even worse, the large-scale feature space also affects the efficiency of detection. This paper focuses on improving detection efficiency by reducing the dimension of the features, combined with expert knowledge. In this paper, a feature extraction model named Block-gram is proposed to form low-dimensional knowledge-based features from bytecode. First, the metadata is separated and the runtime code is converted into a sequence of opcodes, which are divided into segments based on certain instructions (jumps, etc.). Then, scalable Block-gram features, including 4-dimensional block features and 8-dimensional attribute features, are mined for the learning-based model. Finally, feature contributions are calculated from SHAP values to measure the relationship between our features and the results of the detection model. In addition, six types of vulnerability labels are made on a dataset containing 33,885 contracts, and these knowledge-based features are evaluated using seven state-of-the-art learning algorithms, which show that the average detection latency speeds up by 25× to 650× compared with features extracted by N-gram, while also enhancing the interpretability of the detection model.
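The segmentation step described in this abstract (splitting the runtime opcode sequence into blocks at jump-like instructions) can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the exact set of terminator opcodes used by Block-gram is an assumption here.

```python
# Illustrative sketch of opcode segmentation at control-flow instructions
# (assumed terminator set; the paper's exact rules may differ): the runtime
# opcode sequence is cut into segments ("blocks"), from which block-level
# and attribute-level features could then be mined.

BLOCK_TERMINATORS = {"JUMP", "JUMPI", "STOP", "RETURN", "REVERT"}

def segment_opcodes(opcodes):
    """Split an opcode sequence into segments ending at terminator opcodes."""
    blocks, current = [], []
    for op in opcodes:
        current.append(op)
        if op in BLOCK_TERMINATORS:
            blocks.append(current)
            current = []
    if current:  # trailing opcodes with no explicit terminator
        blocks.append(current)
    return blocks
```

Each resulting segment would then be summarized into the low-dimensional block and attribute features that the learning-based detectors consume.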
Deep learning-based methods significantly advance the exploration of associations among triple-wise biological entities (e.g., drug-target protein-adverse reaction), thereby facilitating drug discovery and safeguardin...
Diffusion-based generative models have demonstrated their powerful performance across various tasks, but this comes at a cost of the slow sampling speed. To achieve both efficient and high-quality synthesis, various d...
Offline reinforcement learning endeavors to leverage offline datasets to craft effective agent policy without online interaction, which imposes proper conservative constraints with the support of behavior policies to ...