检索结果-内蒙古大学图书馆

E-PRedictor: an approach for early prediction of pull request acceptance

Science China(Information Sciences) 2025年第5期68卷 380-395页

作者： Kexing CHEN Lingfeng BAO Xing HU Xin XIA Xiaohu YANG State Key Laboratory of Blockchain and Data Security Zhejiang University Software Engineering Application Technology Lab

A pull request(PR) is an event in Git where a contributor asks project maintainers to review code he/she wants to merge into a project. The PR mechanism greatly improves the efficiency of distributed software development in the opensource community. Nevertheless, the massive number of PRs in an open-source software(OSS) project increases the workload of developers. To reduce the burden on developers, many previous studies have investigated factors that affect the chance of PRs getting accepted and built prediction models based on these factors. However, most prediction models are built on the data after PRs are submitted for a while(e.g., comments on PRs), making them not useful in practice. Because integrators still need to spend a large amount of effort on inspecting PRs. In this study, we propose an approach named E-PRedictor(earlier PR predictor) to predict whether a PR will be merged when it is created. E-PRedictor combines three dimensions of manual statistic features(i.e., contributor profile, specific pull request, and project profile) and deep semantic features generated by BERT models based on the description and code changes of PRs. To evaluate the performance of E-PRedictor, we collect475192 PRs from 49 popular open-source projects on GitHub. The experiment results show that our proposed approach can effectively predict whether a PR will be merged or not. E-PRedictor outperforms the baseline models(e.g., Random Forest and VDCNN) built on manual features significantly. In terms of F1@Merge, F1@Reject, and AUC(area under the receiver operating characteristic curve), the performance of E-PRedictor is 90.1%, 60.5%, and 85.4%, respectively.

关键词： pull request prediction model GitHub

来源：评论

学校读者我要写书评

暂无评论

AsyCo: an asymmetric dual-task co-training model for partial-label learning

引用

Science China(Information Sciences) 2025年第5期68卷 332-347页

作者： Beibei LI Yiyuan ZHENG Beihong JIN Tao XIANG Haobo WANG Lei FENG College of Computer Science Chongqing University State Key Laboratory of Computer Science Institute of Software Chinese Academy of Sciences University of Chinese Academy of Sciences School of Software Technology Zhejiang University School of Computer Science and Engineering Nanyang Technological University

Partial-label learning(PLL) is a typical problem of weakly supervised learning, where each training instance is annotated with a set of candidate labels. Self-training PLL models achieve state-of-the-art performance but suffer from error accumulation problems caused by mistakenly disambiguated instances. Although co-training can alleviate this issue by training two networks simultaneously and allowing them to interact with each other, most existing co-training methods train two structurally identical networks with the same task, i.e., are symmetric, rendering it insufficient for them to correct each other due to their similar limitations. Therefore, in this paper, we propose an asymmetric dual-task co-training PLL model called AsyCo,which forces its two networks, i.e., a disambiguation network and an auxiliary network, to learn from different views explicitly by optimizing distinct tasks. Specifically, the disambiguation network is trained with a self-training PLL task to learn label confidence, while the auxiliary network is trained in a supervised learning paradigm to learn from the noisy pairwise similarity labels that are constructed according to the learned label confidence. Finally, the error accumulation problem is mitigated via information distillation and confidence refinement. Extensive experiments on both uniform and instance-dependent partially labeled datasets demonstrate the effectiveness of AsyCo.

关键词： machine learning weakly supervised learning partial-label learning co-training models candidate label sets

来源：评论

学校读者我要写书评

暂无评论

Deep learning for code generation: a survey

引用

Science China(Information Sciences) 2024年第9期67卷 5-40页

作者： Huangzhao ZHANG Kechi ZHANG Zhuo LI Jia LI Jia LI Yongmin LI Yunfei ZHAO Yuqi ZHU Fang LIU Ge LI Zhi JIN Key Lab of High Confidence Software Technologies (Peking University) Ministry of Education School of Computer Science Peking University School of Computer Science and Engineering Beihang University

In the past decade, thanks to the powerfulness of deep-learning techniques, we have witnessed a whole new era of automated code generation. To sort out developments, we have conducted a comprehensive review of solutions to deep learning-based code generation. In this survey, we generally formalize the pipeline and procedure of code generation and categorize existing solutions according to taxonomy from perspectives of architecture, model-agnostic enhancing strategy, metrics, and tasks. In addition, we outline the challenges faced by current dominant large models and list several plausible directions for future research. We hope that this survey may provide handy guidance to understanding, utilizing, and developing deep learning-based code-generation techniques for researchers and practitioners.

关键词： code generation automated software engineering deep learning large model artificial intelligence

来源：评论

学校读者我要写书评

暂无评论

LPW: an efficient data-aware cache replacement strategy for Apache Spark

引用

Science China(Information Sciences) 2023年第1期66卷 77-96页

作者： Hui LI Shuping JI Hua ZHONG Wei WANG Lijie XU Zhen TANG Jun WEI Tao HUANG State Key Lab of Computer Science Institute of Software Chinese Academy of Sciences University of Chinese Academy of Sciences Nanjing Institute of Software Technology

Caching is one of the most important techniques for the popular distributed big data processing framework Spark. For this big data parallel computing framework, which is designed to support various applications based on in-memory computing, it is not possible to cache every intermediate result due to the memory size limitation. The arbitrariness of cache application programming interface(API) usage,the diversity of application characteristics, and the variability of memory resources constitute challenges to achieving high system execution performance. Inefficient cache replacement strategies may cause different performance problems such as long application execution time, low memory utilization, high replacement frequency, and even program execution failure resulting from out of memory. The cache replacement strategy currently adopted by Spark is the least recently used(LRU) strategy. Although LRU is a classical algorithm and has been widely used, it lacks consideration for the environment and workloads. As a result, it cannot achieve good performance under many scenarios. In this paper, we propose a novel cache replacement algorithm, least partition weight(LPW). LPW takes comprehensive consideration of different factors affecting system performance, such as partition size, computational cost, and reference count. The LPW algorithm was implemented in Spark and compared against the LRU as well as other state-of-the-art mechanisms. Our detailed experiments indicate that LPW obviously outperforms its counterparts and can reduce the execution time by up to 75% under typical workloads. Furthermore, the decreasing eviction frequency also shows the LPW algorithm can generate more reasonable predictions.

关键词： Spark memory cache replacement least partition weight data-aware

来源：评论

学校读者我要写书评

暂无评论

Deep learning-based software engineering: progress,challenges, and opportunities

引用

Science China(Information Sciences) 2025年第1期68卷 57-144页

作者： Xiangping CHEN Xing HU Yuan HUANG He JIANG Weixing JI Yanjie JIANG Yanyan JIANG Bo LIU Hui LIU Xiaochen LI Xiaoli LIAN Guozhu MENG Xin PENG Hailong SUN Lin SHI Bo WANG Chong WANG Jiayi WANG Tiantian WANG Jifeng XUAN Xin XIA Yibiao YANG Yixin YANG Li ZHANG Yuming ZHOU Lu ZHANG School of Journalism and Communication Sun Yat-sen University School of Software Technology Zhejiang University School of Software Engineering Sun Yat-sen University School of Software Dalian University of Technology School of Computer Science and Technology Beijing Institute of Technology Key Laboratory of High Confidence Software Technologies (Peking University) Ministry of EducationSchool of Computer Science Peking University State Key Laboratory for Novel Software Technology Nanjing University School of Computer Science and Engineering Beihang University Institute of Information Engineering Chinese Academy of Sciences School of Computer Science Fudan University State Key Laboratory of Complex & Critical Software Environment (CCSE) School of Software Beihang University School of Computer and Information Technology Beijing Jiaotong University School of Computer Science and Technology Harbin Institute of Technology School of Computer Science Wuhan University Huawei Technologies

Researchers have recently achieved significant advances in deep learning techniques, which in turn has substantially advanced other research disciplines, such as natural language processing, image processing, speech recognition, and software engineering. Various deep learning techniques have been successfully employed to facilitate software engineering tasks, including code generation, software refactoring, and fault localization. Many studies have also been presented in top conferences and journals, demonstrating the applications of deep learning techniques in resolving various software engineering tasks. However,although several surveys have provided overall pictures of the application of deep learning techniques in software engineering,they focus more on learning techniques, that is, what kind of deep learning techniques are employed and how deep models are trained or fine-tuned for software engineering tasks. We still lack surveys explaining the advances of subareas in software engineering driven by deep learning techniques, as well as challenges and opportunities in each subarea. To this end, in this study, we present the first task-oriented survey on deep learning-based software engineering. It covers twelve major software engineering subareas significantly impacted by deep learning techniques. Such subareas spread out through the whole lifecycle of software development and maintenance, including requirements engineering, software development, testing, maintenance, and developer collaboration. As we believe that deep learning may provide an opportunity to revolutionize the whole discipline of software engineering, providing one survey covering as many subareas as possible in software engineering can help future research push forward the frontier of deep learning-based software engineering more systematically. For each of the selected subareas,we highlight the major advances achieved by applying deep learning techniques with pointers to the available datasets i

关键词： deep learning software engineering software benchmark software artifact representation survey

来源：评论

学校读者我要写书评

暂无评论

An Intelligent Privacy Protection Scheme for Efficient Edge Computation Offloading in IoV

引用

Chinese Journal of Electronics 2024年第4期33卷 910-919页

作者： Liang YAO Xiaolong XU Wanchun DOU Muhammad Bilal School of Software Nanjing University of Information Science and Technology State Key Laboratory for Novel Software Technology Nanjing University Department of Computer and Electronics Systems Engineering Hankuk University of Foreign Studies

As a pivotal enabler of intelligent transportation system(ITS), Internet of vehicles(Io V) has aroused extensive attention from academia and industry. The exponential growth of computation-intensive, latency-sensitive,and privacy-aware vehicular applications in Io V result in the transformation from cloud computing to edge computing,which enables tasks to be offloaded to edge nodes(ENs) closer to vehicles for efficient execution. In ITS environment,however, due to dynamic and stochastic computation offloading requests, it is challenging to efficiently orchestrate offloading decisions for application requirements. How to accomplish complex computation offloading of vehicles while ensuring data privacy remains challenging. In this paper, we propose an intelligent computation offloading with privacy protection scheme, named COPP. In particular, an Advanced Encryption Standard-based encryption method is utilized to implement privacy protection. Furthermore, an online offloading scheme is proposed to find optimal offloading policies. Finally, experimental results demonstrate that COPP significantly outperforms benchmark schemes in the performance of both delay and energy consumption.

关键词： Industries Privacy Energy consumption Transportation Computational efficiency Encryption Protection

来源：评论

学校读者我要写书评

暂无评论

STAL3D: Unsupervised Domain Adaptation for 3D Object Detection via Collaborating Self-Training and Adversarial Learning

IEEE Transactions on Intelligent Vehicles

引用

IEEE Transactions on Intelligent Vehicles 2024年第11期9卷 1-12页

作者： Zhang, Yanan Zhou, Chao Huang, Di State Key Laboratory of Software Development Environment School of Computer Science and Engineering Beihang University Beijing China

Existing 3D object detection suffers from expensive annotation costs and poor transferability to unknown data due to the domain gap, Unsupervised Domain Adaptation (UDA) aims to generalize detection models trained in labeled source domains to perform robustly on unexplored target domains, providing a promising solution for cross-domain 3D object detection. Although Self-Training (ST) based cross-domain 3D detection methods with the assistance of pseudo-labeling techniques have achieved remarkable progress, they still face the issue of low-quality pseudo-labels when there are significant domain disparities due to the absence of a process for feature distribution alignment. While Adversarial Learning (AL) based methods can effectively align the feature distributions of the source and target domains, the inability to obtain labels in the target domain forces the adoption of asymmetric optimization losses, resulting in a challenging issue of source domain bias. To overcome these limitations, we propose a novel unsupervised domain adaptation framework for 3D object detection via collaborating ST and AL, dubbed as STAL3D, unleashing the complementary advantages of pseudo labels and feature distribution alignment. Additionally, a Background Suppression Adversarial Learning (BS-AL) module and a Scale Filtering Module (SFM) are designed tailored for 3D cross-domain scenes, effectively alleviating the issues of the large proportion of background interference and source domain size bias. Our STAL3D achieves state-of-the-art performance on multiple cross-domain tasks and even surpasses the Oracle results on Waymo $\rightarrow$ KITTI and Waymo $\rightarrow$ KITTI-rain. IEEE

关键词： Feature extraction

来源：评论

学校读者我要写书评

暂无评论

A sharding blockchain-based UAV system for search and rescue missions

引用

Frontiers of computer Science 2025年第3期19卷 103-118页

作者： Xihan ZHANG Jiashuo ZHANG Jianbo GAO Libin XIA Zhi GUAN Hao HU Zhong CHEN School of Computer Science Peking UniversityBeijing 100871China Peking University Chongqing Research Institute of Big Data Chongqing 401329China National Engineering Research Center for Software Engineering Peking UniversityBeijing 100871China State Key Lab for Novel Software Technology Nanjing UniversityNanjing 210023China

Sharding is a promising technique to tackle the critical weakness of scalability in blockchain-based unmanned aerial vehicle(UAV)search and rescue(SAR)*** breaking up the blockchain network into smaller partitions called shards that run independently and in parallel,shardingbased UAV systems can support a large number of search and rescue UAVs with improved scalability,thereby enhancing the rescue ***,the lack of adaptability and interoperability still hinder the application of sharded blockchain in UAV SAR *** refers to making adjustments to the blockchain towards real-time surrounding situations,while interoperability refers to making cross-shard interactions at the mission *** address the above challenges,we propose a blockchain UAV system for SAR missions based on dynamic sharding *** from the benefits in scalability brought by sharding,our system improves adaptability by dynamically creating configurable and mission-exclusive shards,and improves interoperability by supporting calls between smart contracts that are deployed on different *** implement a prototype of our system based on Quorum,give an analysis of the improved adaptability and interoperability,and conduct experiments to evaluate the *** results show our system can achieve the above goals and overcome the weakness of blockchain-based UAV systems in SAR scenarios.

关键词： blockchain sharding unmanned aerial vehicle search and rescue blockchain interoperability

来源：评论

学校读者我要写书评

暂无评论

FIFAWC:a dataset with detailed annotation and rich semantics for group activity recognition

引用

Frontiers of computer Science 2024年第6期18卷 271-272页

作者： Duoxuan PEI Di HUANG Yunhong WANG State Key Laboratory of Software Development Environment School of Computer Science and EngineeringBeihang UniversityBeijing 100191China Intelligent Recognition and Image Processing Lab. School of Computer Science and EngineeringBeihang UniversityBeijing 100191China

1 *** Activity Recognition(GAR),which aims to identify activities performed collectively in videos,has gained significant attention *** conventional action recognition centered on single individuals,GAR explores the c... 详细信息

关键词： has collective gained

来源：评论

学校读者我要写书评

暂无评论

Committed-programming reductions: formalizations,implications and relations

引用

Science China(Information Sciences) 2024年第10期67卷 151-171页

作者： Jiang ZHANG Yu YU Dengguo FENG Shuqin FAN Zhenfeng ZHANG State Key Laboratory of Cryptology Department of Computer Science and Engineering Shanghai Jiao Tong University Trusted Computing and Information Assurance Laboratory Institute of SoftwareChinese Academy of Sciences

In this work, we introduce a class of black-box(BB) reductions called committed-programming reduction(CPRed) in the random oracle model(ROM) and obtain the following interesting results:(1) we demonstrate that some well-known schemes, including the full-domain hash(FDH) signature(Eurocrypt1996) and the Boneh-Franklin identity-based encryption(IBE) scheme(Crypto 2001), are provably secure under CPReds;(2) we prove that a CPRed associated with an instance-extraction algorithm implies a reduction in the quantum ROM(QROM). This unifies several recent results, including the security of the Gentry-Peikert-Vaikuntanathan IBE scheme by Zhandry(Crypto 2012) and the key encapsulation mechanism(KEM) variants using the Fujisaki-Okamoto transform by Jiang et al.(Crypto 2018) in the ***, we show that CPReds are incomparable to non-programming reductions(NPReds) and randomly-programming reductions(RPReds) formalized by Fischlin et al.(Asiacrypt 2010).

关键词： provable security random oracle model quantum random oracle model black-box reduction/separation programmability

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：