A pull request (PR) is an event in Git where a contributor asks project maintainers to review code that the contributor wants to merge into a project. The PR mechanism greatly improves the efficiency of distributed software development in the open-source community. Nevertheless, the massive number of PRs in an open-source software (OSS) project increases the workload of developers. To reduce the burden on developers, many previous studies have investigated factors that affect the chance of PRs getting accepted and built prediction models based on these factors. However, most prediction models are built on data that becomes available only after a PR has been open for a while (e.g., comments on PRs), which makes them of little use in practice because integrators still need to spend a large amount of effort inspecting PRs. In this study, we propose an approach named E-PRedictor (earlier PR predictor) to predict whether a PR will be merged when it is created. E-PRedictor combines three dimensions of manual statistical features (i.e., contributor profile, specific pull request, and project profile) with deep semantic features generated by BERT models based on the description and code changes of PRs. To evaluate the performance of E-PRedictor, we collect 475,192 PRs from 49 popular open-source projects on GitHub. The experimental results show that our proposed approach can effectively predict whether a PR will be merged or not. E-PRedictor significantly outperforms the baseline models (e.g., Random Forest and VDCNN) built on manual features. In terms of F1@Merge, F1@Reject, and AUC (area under the receiver operating characteristic curve), the performance of E-PRedictor is 90.1%, 60.5%, and 85.4%, respectively.
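The abstract reports F1@Merge and F1@Reject without defining them. A plausible reading, assumed here, is the ordinary F1 score computed once with "merged" as the positive class and once with "rejected" as the positive class. The sketch below illustrates that reading on made-up labels; the function name `f1_for_class` and the toy data are hypothetical and are not taken from the paper.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (invented for illustration): 1 = merged, 0 = rejected.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]

f1_merge = f1_for_class(y_true, y_pred, positive=1)   # assumed meaning of F1@Merge
f1_reject = f1_for_class(y_true, y_pred, positive=0)  # assumed meaning of F1@Reject
print(f1_merge, f1_reject)
```

Reporting the two scores separately matters when the classes are imbalanced: a model can score well on the majority class (merged PRs) while doing noticeably worse on the minority class (rejected PRs), which matches the gap between the 90.1% and 60.5% figures quoted above.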
Mashup developers often need to find open application programming interfaces (APIs) for their composition application development. Although most enterprises and service organizations have encapsulated their businesses or resources online as open APIs, finding the right high-quality open APIs within a library of open APIs is not an easy task. To solve this problem, this paper proposes a deep learning-based open API recommendation (DLOAR) approach. First, a hierarchical density-based spatial clustering of applications with noise (HDBSCAN) topic model is constructed to build topic models for Mashup clusters. Second, developers' requirement keywords are extracted by the TextRank algorithm, and the language model is built. Third, a neural network-based three-level similarity calculation is performed to find the most relevant open APIs. Finally, we complement the relevant information of open APIs in the recommended list to help developers make better choices. We evaluate the DLOAR approach on a real dataset and compare it with commonly used open API recommendation approaches: term frequency-inverse document frequency, latent Dirichlet allocation, Word2Vec, and Sentence-BERT. The results show that the DLOAR approach performs better than the other approaches in terms of precision, recall, F1-measure, mean average precision, and mean reciprocal rank.
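Two of the ranking metrics named in this abstract, mean reciprocal rank (MRR) and mean average precision (MAP), can be sketched in a few lines. The code below uses the standard textbook definitions; the API names and relevance sets are invented for illustration and do not come from the paper's dataset.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical recommendation lists and ground-truth APIs for two Mashups.
queries = [
    (["maps", "weather", "sms"], {"weather", "sms"}),
    (["pay", "maps"], {"pay"}),
]

mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)
mean_ap = sum(average_precision(r, rel) for r, rel in queries) / len(queries)
print(mrr, mean_ap)
```

MRR only rewards the position of the first correct API, while MAP also accounts for how early the remaining correct APIs appear, so the two can rank recommenders differently.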
Large open source projects are engaged in collaborative software development through social coding platforms, and use CI/CD practices to automate numerous repetitive tasks through workflows. Most CI/CD tools follow th...
Development bots are increasingly used in Open Source software (OSS) projects on social coding platforms like GitHub to automate tasks, enforce coding standards, and streamline communication. Studies of specific bots ...
Sharding is a promising technique to tackle the critical weakness of scalability in blockchain-based unmanned aerial vehicle (UAV) search and rescue (SAR) *** breaking up the blockchain network into smaller partitions called shards that run independently and in parallel, sharding-based UAV systems can support a large number of search and rescue UAVs with improved scalability, thereby enhancing the rescue ***, the lack of adaptability and interoperability still hinders the application of sharded blockchain in UAV SAR *** refers to making adjustments to the blockchain towards real-time surrounding situations, while interoperability refers to making cross-shard interactions at the mission *** address the above challenges, we propose a blockchain UAV system for SAR missions based on dynamic sharding *** from the benefits in scalability brought by sharding, our system improves adaptability by dynamically creating configurable and mission-exclusive shards, and improves interoperability by supporting calls between smart contracts that are deployed on different *** implement a prototype of our system based on Quorum, give an analysis of the improved adaptability and interoperability, and conduct experiments to evaluate the *** results show our system can achieve the above goals and overcome the weakness of blockchain-based UAV systems in SAR scenarios.
In the past decade, thanks to the power of deep-learning techniques, we have witnessed a whole new era of automated code generation. To sort out these developments, we have conducted a comprehensive review of solutions to deep learning-based code generation. In this survey, we formalize the general pipeline and procedure of code generation and categorize existing solutions according to a taxonomy covering architecture, model-agnostic enhancing strategy, metrics, and tasks. In addition, we outline the challenges faced by current dominant large models and list several plausible directions for future research. We hope that this survey provides researchers and practitioners with handy guidance for understanding, utilizing, and developing deep learning-based code-generation techniques.
While open-source software has enabled significant levels of reuse to speed up software development, it has also given rise to the dreadful dependency hell that all software practitioners face on a regular basis. This...
Financial statement fraud refers to malicious manipulations of financial data in listed companies' annual *** machine learning approaches focus on individual companies, overlooking the interactive relationships among companies that are crucial for identifying fraud ***, fraud detection is a typical imbalanced binary classification task with normal samples outnumbering fraud *** this paper, we propose a multi-relational graph convolutional network, named FraudGCN, for detecting financial statement fraud. A multi-relational graph is constructed to integrate industrial, supply chain, and accounting-sharing relationships, effectively encapsulating the multidimensional and complex interactions among *** then develop a multi-relational graph convolutional network to aggregate information within each relationship and employ an attention mechanism to fuse information across multiple *** attention mechanism enables the model to distinguish the importance of different relationships, thereby aggregating more useful information from key *** alleviate the class imbalance problem, we present a diffusion-based under-sampling strategy that strategically selects key nodes globally for model *** also employ focal loss to assign greater weights to harder-to-classify minority *** build a real-world dataset from the annual financial statement of listed companies in *** experimental results show that FraudGCN achieves an improvement of 3.15% in Macro-recall, 3.36% in Macro-F1, and 3.86% in GMean compared to the second-best *** dataset and codes are publicly available at: https://***/XNetlab/MRG-for-Finance.
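The abstract mentions focal loss as the mechanism for weighting hard-to-classify minority samples. As a rough illustration, the sketch below implements the commonly used binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The values `gamma=2.0` and `alpha=0.25` are common defaults assumed here, not settings reported for FraudGCN, and the function name is hypothetical.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one sample.

    p: predicted probability of the positive (fraud) class.
    y: true label, 1 for fraud and 0 for normal.
    gamma and alpha are assumed defaults, not values from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp to keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)   # fraud sample the model already gets right
hard = binary_focal_loss(0.3, 1)   # fraud sample the model gets wrong
print(easy, hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check for any implementation.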
1 *** superior performance of deep models in classification tasks relies heavily on large-scale supervision data with rich features [1]. Recent research has shown that improving the feature diversity while expanding the data scale can improve the classification performance [2,3]. Time series augmentation possessing the dual strategy is essential in successfully applying deep models in time series classification.
The large amount of rich data available in today's world offers many opportunities to analyze the data and identify valuable patterns in it. However, dealing with such data requires automated framew...