Search Results - Inner Mongolia University Library

E-PRedictor: an approach for early prediction of pull request acceptance

Science China (Information Sciences), 2025, Vol. 68, No. 5, pp. 380-395

Authors: Kexing CHEN, Lingfeng BAO, Xing HU, Xin XIA, Xiaohu YANG. Affiliations: State Key Laboratory of Blockchain and Data Security, Zhejiang University; Software Engineering Application Technology Lab

A pull request (PR) is an event in Git where a contributor asks project maintainers to review code he/she wants to merge into a project. The PR mechanism greatly improves the efficiency of distributed software development in the open-source community. Nevertheless, the massive number of PRs in an open-source software (OSS) project increases the workload of developers. To reduce this burden, many previous studies have investigated factors that affect the chance of a PR being accepted and built prediction models based on these factors. However, most prediction models rely on data that only becomes available some time after a PR is submitted (e.g., comments on the PR), which limits their usefulness in practice, because integrators still need to spend a large amount of effort inspecting PRs. In this study, we propose an approach named E-PRedictor (earlier PR predictor) to predict whether a PR will be merged at the time it is created. E-PRedictor combines three dimensions of manual statistical features (i.e., contributor profile, specific pull request, and project profile) with deep semantic features generated by BERT models from the description and code changes of PRs. To evaluate the performance of E-PRedictor, we collect 475,192 PRs from 49 popular open-source projects on GitHub. The experimental results show that our proposed approach can effectively predict whether a PR will be merged or not. E-PRedictor significantly outperforms the baseline models (e.g., Random Forest and VDCNN) built on manual features. In terms of F1@Merge, F1@Reject, and AUC (area under the receiver operating characteristic curve), E-PRedictor achieves 90.1%, 60.5%, and 85.4%, respectively.

Keywords: pull request; prediction model; GitHub
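
The abstract reports F1 separately for each outcome class (F1@Merge, F1@Reject) alongside AUC. As an illustration only, here is a minimal Python sketch of how these standard metrics are computed for a binary merge/reject prediction; the labels, scores, and the 0.5 decision threshold are made-up assumptions, not data or settings from the paper.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class (1 = merged, 0 = rejected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random merged PR is scored above a
    random rejected PR (ties count as half), via pairwise comparison."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one merged and one rejected PR")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = merged, 0 = rejected; scores are predicted merge probabilities.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

print(f"F1@Merge  = {f1_for_class(y_true, y_pred, 1):.2f}")  # 0.75
print(f"F1@Reject = {f1_for_class(y_true, y_pred, 0):.2f}")  # 0.50
print(f"AUC       = {roc_auc(y_true, scores):.2f}")          # 0.75
```

Because rejected PRs are the minority class in this toy set, a single misclassified rejection pulls F1@Reject down more than F1@Merge; reporting both per-class scores keeps that asymmetry visible.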

Deep learning-based open API recommendation for Mashup development

Science China (Information Sciences), 2023, Vol. 66, No. 7, pp. 94-111

Authors: Ye WANG, Junwu CHEN, Qiao HUANG, Xin XIA, Bo JIANG. Affiliations: School of Computer and Information Engineering, Zhejiang Gongshang University; Software Engineering Application Technology Lab

Mashup developers often need to find open application programming interfaces (APIs) for developing their composite applications. Although most enterprises and service organizations have encapsulated their businesses or resources online as open APIs, finding the right high-quality open APIs in a library of open APIs is not an easy task. To solve this problem, this paper proposes a deep learning-based open API recommendation (DLOAR) approach. First, a topic model based on hierarchical density-based spatial clustering of applications with noise (HDBSCAN) is constructed for Mashup clusters. Second, developers' requirement keywords are extracted by the TextRank algorithm, and a language model is built. Third, a neural network-based three-level similarity calculation is performed to find the most relevant open APIs. Finally, we supplement the recommended list with relevant information about the open APIs to help developers make better choices. We evaluate the DLOAR approach on a real dataset and compare it with commonly used open API recommendation approaches: term frequency-inverse document frequency, latent Dirichlet allocation, Word2Vec, and Sentence-BERT. The results show that the DLOAR approach outperforms the other approaches in terms of precision, recall, F1-measure, mean average precision, and mean reciprocal rank.

Keywords: Mashup development; open API recommendation; deep learning; neural network; service discovery
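
The DLOAR abstract compares approaches using ranking metrics that include mean average precision (MAP) and mean reciprocal rank (MRR). A minimal Python sketch of these two standard metrics follows, assuming each query (a Mashup requirement) yields a ranked list of API names and has a known set of relevant APIs. The API names and data are hypothetical, and the paper's exact evaluation protocol (for example, any top-k cutoff) is not stated in this record.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant API in the ranked list, or 0 if none appears."""
    for rank, api in enumerate(ranked, start=1):
        if api in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the positions k that hold a relevant API,
    divided by the total number of relevant APIs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, api in enumerate(ranked, start=1):
        if api in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mrr_and_map(rankings: List[List[str]], relevants: List[Set[str]]) -> Tuple[float, float]:
    """Average reciprocal rank and average precision over all queries."""
    if not rankings or len(rankings) != len(relevants):
        raise ValueError("rankings and relevants must be non-empty and the same length")
    n = len(rankings)
    mrr = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)) / n
    map_ = sum(average_precision(r, rel) for r, rel in zip(rankings, relevants)) / n
    return mrr, map_


# Hypothetical recommendations for two Mashup requirements.
rankings = [["maps_api", "weather_api", "sms_api"],
            ["payment_api", "sms_api", "maps_api"]]
relevants = [{"weather_api"}, {"payment_api", "maps_api"}]

mrr, map_ = mrr_and_map(rankings, relevants)
print(f"MRR = {mrr:.3f}, MAP = {map_:.3f}")  # MRR = 0.750, MAP = 0.667
```

MRR only rewards the position of the first relevant API, while MAP also credits every later relevant API in the list, which is why the two can diverge for queries with several relevant APIs.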

Towards an Empirical Analysis of Code Cloning and Code Reuse in CI/CD Ecosystems

23rd Belgium-Netherlands Software Evolution Workshop, BENEVOL 2024

Author: Guillaume Cardoen, Software Engineering Lab, University of Mons, Belgium

Large open-source projects engage in collaborative software development through social coding platforms and use CI/CD practices to automate numerous repetitive tasks through workflows. Most CI/CD tools follow the configuration-as-code paradigm, specifying their workflow configurations as runnable workflow files. We posit that, just as is the case when maintaining regular source code, workflow configuration files are subject to good and bad practices of reusability and cloning. This paper presents the plan of my doctoral research, explaining the objectives and research questions and outlining the research method to reach these objectives. My research focuses on the empirical analysis of how code reuse and code cloning practices emerge and evolve in workflow files. The initial focus is on GitHub, taking GitHub Actions as a case study, given that it is by far the most popular CI/CD tool used on GitHub. © 2024 Copyright for this paper by its authors.

Keywords: Computer software reusability

Towards Understanding Open-Source Software Communities

23rd Belgium-Netherlands Software Evolution Workshop, BENEVOL 2024

Author: Youness Hourri, Software Engineering Lab, University of Mons, Belgium

Development bots are increasingly used in open-source software (OSS) projects on social coding platforms like GitHub to automate tasks, enforce coding standards, and streamline communication. Studies of specific bots have highlighted their influence on the software development process, both positively, by automating repetitive tasks and improving efficiency, and negatively, by introducing challenges such as workflow disruptions or excessive notifications. This research proposal provides an overview of my research context, goals, and current research progress. My research focuses on the roles of contributors during collaborative OSS development and the effects of development bots on such collaboration. The primary goals are to (i) understand the evolving roles and interactions of contributors and (ii) assess how bots affect productivity, efficiency, and communication. By investigating GitHub activities such as pull requests, issue tracking, and code reviews, I will classify contributors and study their communication patterns. By comparing projects that use bots with those that do not, I will assess the impact of bots on key aspects of team dynamics, such as collaboration frequency, decision-making efficiency, and role distribution among contributors. The analysis will focus on measurable outcomes, including the speed of pull request reviews, the number of successfully merged code changes, and the rate of issue resolution. Additionally, I will examine how the presence of bots affects long-term contributor engagement and overall project velocity. This research will provide insights into human-bot interactions, develop guidelines for effective bot design, and suggest strategies to enhance contributor integration and communication, ultimately improving OSS project success. © 2024 Copyright for this paper by its authors.

Keywords: Bot (Internet)

A sharding blockchain-based UAV system for search and rescue missions

Frontiers of Computer Science, 2025, Vol. 19, No. 3, pp. 103-118

Authors: Xihan ZHANG, Jiashuo ZHANG, Jianbo GAO, Libin XIA, Zhi GUAN, Hao HU, Zhong CHEN. Affiliations: School of Computer Science, Peking University, Beijing 100871, China; Peking University Chongqing Research Institute of Big Data, Chongqing 401329, China; National Engineering Research Center for Software Engineering, Peking University, Beijing 100871, China; State Key Lab for Novel Software Technology, Nanjing University, Nanjing 210023, China

Sharding is a promising technique to tackle the critical weakness of scalability in blockchain-based unmanned aerial vehicle (UAV) search and rescue (SAR)*** breaking up the blockchain network into smaller partitions called shards that run independently and in parallel, sharding-based UAV systems can support a large number of search and rescue UAVs with improved scalability, thereby enhancing the rescue ***, the lack of adaptability and interoperability still hinders the application of sharded blockchain in UAV SAR *** refers to adjusting the blockchain according to real-time surrounding situations, while interoperability refers to making cross-shard interactions at the mission *** address the above challenges, we propose a blockchain UAV system for SAR missions based on dynamic sharding *** from the benefits in scalability brought by sharding, our system improves adaptability by dynamically creating configurable and mission-exclusive shards, and improves interoperability by supporting calls between smart contracts that are deployed on different *** implement a prototype of our system based on Quorum, give an analysis of the improved adaptability and interoperability, and conduct experiments to evaluate the *** results show that our system can achieve the above goals and overcome the weakness of blockchain-based UAV systems in SAR scenarios.

Keywords: blockchain; sharding; unmanned aerial vehicle; search and rescue; blockchain interoperability

Deep learning for code generation: a survey

Science China (Information Sciences), 2024, Vol. 67, No. 9, pp. 5-40

Authors: Huangzhao ZHANG, Kechi ZHANG, Zhuo LI, Jia LI, Jia LI, Yongmin LI, Yunfei ZHAO, Yuqi ZHU, Fang LIU, Ge LI, Zhi JIN. Affiliations: Key Lab of High Confidence Software Technologies (Peking University), Ministry of Education; School of Computer Science, Peking University; School of Computer Science and Engineering, Beihang University

In the past decade, thanks to the power of deep learning techniques, we have witnessed a whole new era of automated code generation. To organize these developments, we have conducted a comprehensive review of deep learning-based code generation solutions. In this survey, we formalize the general pipeline and procedure of code generation and categorize existing solutions according to a taxonomy covering architecture, model-agnostic enhancement strategies, metrics, and tasks. In addition, we outline the challenges faced by current dominant large models and list several plausible directions for future research. We hope that this survey provides handy guidance for researchers and practitioners in understanding, utilizing, and developing deep learning-based code generation techniques.

Keywords: code generation; automated software engineering; deep learning; large model; artificial intelligence

An Overview and Catalogue of Dependency Challenges in Open Source Software Package Registries

23rd Belgium-Netherlands Software Evolution Workshop, BENEVOL 2024

Authors: Tom Mens, Alexandre Decan. Affiliations: Software Engineering Lab, University of Mons, Belgium; FRS-FNRS, Belgium

While open-source software has enabled significant levels of reuse that speed up software development, it has also given rise to the dreaded dependency hell that all software practitioners face on a regular basis. This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries. The catalogue is based on the scientific literature on empirical research conducted to understand, quantify, and overcome these challenges. Our overview of this very active research field of package dependency management can serve as a starting point for junior and senior researchers, as well as practitioners, who would like to learn more about research advances in dealing with the challenges that come with the dependency networks of large OSS package registries. © 2024 Copyright for this paper by its authors.

Keywords: component reuse; empirical analysis; package dependency network; software ecosystem; software library

Multi-Relational Graph Representation Learning for Financial Statement Fraud Detection

Big Data Mining and Analytics, 2024, Vol. 7, No. 3, pp. 920-941

Authors: Chenxu Wang, Mengqin Wang, Xiaoguang Wang, Luyue Zhang, Yi Long. Affiliations: School of Software Engineering, and MoE Key Lab of Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an 710049, China; School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China; Shenzhen Finance Institute, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518026, China

Financial statement fraud refers to malicious manipulations of financial data in listed companies' annual *** machine learning approaches focus on individual companies, overlooking the interactive relationships among companies that are crucial for identifying fraud ***, fraud detection is a typical imbalanced binary classification task with normal samples outnumbering fraud *** this paper, we propose a multi-relational graph convolutional network, named FraudGCN, for detecting financial statement fraud. A multi-relational graph is constructed to integrate industrial, supply chain, and accounting-sharing relationships, effectively encapsulating the multidimensional and complex interactions among *** then develop a multi-relational graph convolutional network to aggregate information within each relationship and employ an attention mechanism to fuse information across multiple *** attention mechanism enables the model to distinguish the importance of different relationships, thereby aggregating more useful information from key *** alleviate the class imbalance problem, we present a diffusion-based under-sampling strategy that strategically selects key nodes globally for model *** also employ focal loss to assign greater weights to harder-to-classify minority *** build a real-world dataset from the annual financial statements of listed companies in *** experimental results show that FraudGCN achieves improvements of 3.15% in Macro-recall, 3.36% in Macro-F1, and 3.86% in G-Mean compared to the second-best *** dataset and code are publicly available at: https://***/XNetlab/MRG-for-Finance.

Keywords: financial statement fraud; class imbalance; Graph Neural Networks (GNN); multi-relational graphs
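
The FraudGCN abstract uses focal loss to give harder-to-classify minority samples greater weight. The exact loss configuration (the focusing parameter and any class weight) is not given in this record, so the following is only a sketch of the widely used binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), in plain Python with hypothetical defaults (γ = 2, no α weighting). It is not the paper's implementation.

```python
import math
from typing import Optional, Sequence


def binary_focal_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    gamma: float = 2.0,
    alpha: Optional[float] = None,
    eps: float = 1e-7,
) -> float:
    """Mean binary focal loss  -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs:  predicted probability that each sample is fraud (label 1).
    gamma:  focusing parameter; gamma = 0 reduces to (weighted) cross-entropy.
    alpha:  optional weight for class 1 (class 0 gets 1 - alpha); None = no weighting.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
        a_t = 1.0 if alpha is None else (alpha if y == 1 else 1.0 - alpha)
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# Compare how much of plain cross-entropy survives for an easy vs. a hard fraud sample.
for p in (0.9, 0.1):  # 0.9 = confidently correct (easy), 0.1 = badly wrong (hard)
    ce = binary_focal_loss([p], [1], gamma=0.0)
    fl = binary_focal_loss([p], [1], gamma=2.0)
    print(f"p={p}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.2f}")
```

For these two samples the FL/CE ratio is (1 - p_t)^γ, i.e. 0.01 for the easy one and 0.81 for the hard one, so well-classified samples contribute far less to the loss and training effort shifts toward the hard, often minority-class, samples.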

ISM: intra-class similarity mixing for time series augmentation

Frontiers of Computer Science, 2024, Vol. 18, No. 6, pp. 273-275

Authors: Pin LIU, Rui WANG, Yongqiang HE, Yuzhu WANG. Affiliations: School of Information Engineering, China University of Geosciences, Beijing 100083, China; State Key Lab of Software Development Environment, Beihang University, Beijing 100191, China

1 *** superior performance of deep models in classification tasks relies heavily on large-scale supervised data with rich features [1]. Recent research has shown that improving feature diversity while expanding the data scale can improve classification performance [2,3]. Time series augmentation that pursues both goals is therefore essential for successfully applying deep models to time series classification.

Keywords: classification series mixing

16th Innovations in Software Engineering Conference, ISEC 2023

Authors: Nidhi Pandya, Saurabh Tiwari, Software Engineering Research Lab, DA-IICT, Gandhinagar, India

ISBN (print): 9798400700644

The large amount of rich data available in today's world creates many opportunities to analyze the data and identify valuable patterns in it. However, dealing with such data requires automated frameworks and knowledge of programming languages. The most common and widely used programming languages among developers are Java and Python, as evident from the questions and issues posted on Stack Overflow and GitHub. Despite the popularity of both Java and Python, transitioning from one technology to the other is hard for individuals and industries. In this paper, we aim to investigate similarities in the challenges faced by developers while working with both programming languages. To achieve this goal, we formulated two research questions (RQs) for understanding the topics and issues asked about and faced by developers. We also identified the temporal trend of new questions on Stack Overflow for the Java and Python programming languages (PLs). Based on the number of new posts on Stack Overflow, our results reveal a shift from Java towards Python from 2015 onwards. We analyzed 18,892 Stack Overflow questions related to Java and Python and 42,674 issues from 22 different GitHub repositories, 11 for each PL. Our results indicate that questions asked on Stack Overflow are correlated with issues posted by developers on GitHub during real-time development for the respective PL. © 2023 ACM.

Keywords: Python
