Search Results - Inner Mongolia University Library

E-PRedictor: an approach for early prediction of pull request acceptance

Science China (Information Sciences), 2025, Vol. 68, No. 5, pp. 380-395

Authors: Kexing CHEN, Lingfeng BAO, Xing HU, Xin XIA, Xiaohu YANG; State Key Laboratory of Blockchain and Data Security, Zhejiang University; Software Engineering Application Technology Lab

A pull request (PR) is an event in Git where a contributor asks project maintainers to review code he/she wants to merge into a project. The PR mechanism greatly improves the efficiency of distributed software development in the open-source community. Nevertheless, the massive number of PRs in an open-source software (OSS) project increases the workload of developers. To reduce the burden on developers, many previous studies have investigated factors that affect the chance of PRs getting accepted and built prediction models based on these factors. However, most prediction models are built on data available only after PRs have been submitted for a while (e.g., comments on PRs), which makes them of limited use in practice because integrators still need to spend a large amount of effort on inspecting PRs. In this study, we propose an approach named E-PRedictor (earlier PR predictor) to predict whether a PR will be merged when it is created. E-PRedictor combines three dimensions of manual statistical features (i.e., contributor profile, specific pull request, and project profile) and deep semantic features generated by BERT models based on the description and code changes of PRs. To evaluate the performance of E-PRedictor, we collect 475192 PRs from 49 popular open-source projects on GitHub. The experiment results show that our proposed approach can effectively predict whether a PR will be merged or not. E-PRedictor significantly outperforms the baseline models (e.g., Random Forest and VDCNN) built on manual features. In terms of F1@Merge, F1@Reject, and AUC (area under the receiver operating characteristic curve), the performance of E-PRedictor is 90.1%, 60.5%, and 85.4%, respectively.

Keywords: pull request; prediction model; GitHub

Large language model for table processing: a survey

Frontiers of Computer Science, 2025, Vol. 19, No. 2, pp. 71-87

Authors: Weizheng LU, Jing ZHANG, Ju FAN, Zihao FU, Yueguo CHEN, Xiaoyong DU; School of Information, Renmin University of China, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering, Beijing 100872, China; WPS Office, Kingsoft Co., Zhuhai 519080, China

Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet manipulations, Web table question answering, and image table information *** these table-centric tasks with Large Language Models (LLMs) or Visual Language Models (VLMs) offers significant public benefits, garnering interest from academia and *** survey provides a comprehensive overview of table-related tasks, examining both user scenarios and technical *** covers traditional tasks like table question answering as well as emerging fields such as spreadsheet manipulation and table data *** summarize the training techniques for LLMs and VLMs tailored for table ***, we discuss prompt engineering, particularly the use of LLM-powered agents, for various table-related ***, we highlight several challenges, including diverse user input when serving and slow thinking using chain-of-thought.

Keywords: data mining and knowledge discovery; table processing; large language model

Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques

Journal of Computer Science & Technology, 2024, Vol. 39, No. 2, pp. 346-368

Authors: 刘鹏举, 李翠平, 陈红; Distinguished Member CCF; School of Information, Renmin University of China, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education, Beijing 100872, China

Data partitioning techniques are pivotal for optimal data placement across storage devices, thereby enhancing resource utilization and overall system ***, the design of effective partition schemes faces multiple challenges, including considerations of the cluster environment, storage device characteristics, optimization objectives, and the balance between partition quality and computational ***, dynamic environments necessitate robust partition detection *** paper presents a comprehensive survey structured around partition deployment environments, outlining the distinguishing features and applicability of various partitioning strategies while delving into how these challenges are *** discuss partitioning features pertaining to database schema, table data, workload, and runtime *** then delve into the partition generation process, segmenting it into initialization and optimization stages. A comparative analysis of partition generation and update algorithms is provided, emphasizing their suitability for different scenarios and optimization ***, we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and *** survey aims to foster the implementation, deployment, and updating of high-quality partitions for specific system scenarios.

Keywords: data partitioning; survey; partitioning feature; partition generation; partition update

Study on the Characteristics of Cross-Domain Knowledge Diffusion from Science to Policy: Evidence from Overton Data

Proceedings of the Association for Information Science and Technology, 2023, Vol. 60, No. 1, pp. 368-378

Authors: Ren, Chao; Yang, Menghui; School of Information Resource Management, Renmin University of China, China; Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, China

The cross-domain knowledge diffusion from science to policy is a prevalent phenomenon that demands academic attention. To investigate the characteristics of cross-domain knowledge diffusion from science to policy, this study suggests using the citation of policies to scientific articles as a basis for quantifying the diffusion strength, breadth, and speed. The study reveals that the strength and breadth of cross-domain knowledge diffusion from scientific papers to policies conform to a power-law distribution, while the speed follows a log-normal distribution. Moreover, the papers with the highest diffusion strength, breadth, and fastest diffusion speed are predominantly from world-renowned universities, scholars, and top journals. The papers with the highest diffusion strength and breadth are mostly from social sciences, especially economics; those with the fastest diffusion speed are mainly from medical and life sciences, followed by social sciences. The findings indicate that cross-domain knowledge diffusion from science to policy follows the Matthew effect, whereby individuals or institutions with high academic achievements are more likely to achieve successful cross-domain knowledge diffusion. Furthermore, papers in the field of economics tend to have higher cross-domain knowledge diffusion strength and breadth, while those in medical and life sciences have faster cross-domain knowledge diffusion speed. 86 Annual Meeting of the Association for Information Science & Technology | Oct. 27 – 31, 2023 | London, United Kingdom. Author(s) retain copyright, but ASIS&T receives an exclusive publication license.

Keywords: Diffusion

Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy

Journal of Computer Science & Technology, 2023, Vol. 38, No. 6, pp. 1403-1422

Authors: 赵丹, 赵素云, 陈红, 刘睿瑄, 李翠平, 张晓莹; Institute of Scientific and Technical Information of China, Beijing 100038, China; Key Laboratory of Data Engineering and Knowledge Engineering (Ministry of Education); School of Information, Renmin University of China, Beijing 100872, China

Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee *** current approaches to FIM under LDP add "padding and sampling" steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of *** current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate ***, an unbiased FIM approach with lower variance is highly *** narrow this gap, we propose an Item-Level LDP frequency oracle approach, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed vector, and perturbation can be subsequently applied to the *** FIM approach, called optimized united itemset mining (O-UISM), is proposed to combine the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework for acquiring accurate frequent itemsets with their ***, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.

Keywords: local differential privacy; frequent itemset mining; frequency oracle

Leaving None Behind: Data-Free Domain Incremental Learning for Major Depressive Disorder Detection

IEEE Transactions on Affective Computing, 2024, Vol. 16, No. 2, pp. 758-770

Authors: Chen, Tao; Guo, Yanrong; Hao, Shijie; Hong, Richang; Hefei University of Technology, Key Laboratory of Knowledge Engineering with Big Data, China; Hefei University of Technology, Ministry of Education and School of Computer Science and Information Engineering, Hefei 230009, China

While deep learning techniques have shown promising performance in the Major Depressive Disorder (MDD) detection task, they still face limitations in real-world scenarios. Specifically, given the data scarcity, some efforts have resorted to aggregating data from different domains to expand the data volume. However, their effectiveness is currently limited by the domain gap and data privacy. Additionally, the class imbalance issue is particularly severe in our application, leading to biased classifying performance accordingly. To address these challenges, we propose Data-Free Domain Incremental Learning for the MDD detection (DIL-MDD) task, accommodating multiple feature distributions by only accessing well-trained models from previous domains and the data in the current domain. Specifically, DIL-MDD consists of two key modules: Adaptive Class-tailored Threshold Learning (ACTL) and Data-Free Domain Alignment (DFDA). The first module measures the discrepancy between the outputs of two sequential domains, based on which we learn a class-tailored threshold adaptively. Building on this, we differentiate between samples that either exhibit similarities or dissimilarities with the previous domain, where this similar sample set is identified to investigate the feature distribution of the historical data. The second module imposes an alignment constraint to narrow the gap between these two sample sets, thereby exploring the expertise of the previous domain. To validate the effectiveness of the proposed method, we conduct extensive experiments on the public MDD datasets, i.e., DAIC-WOZ, MODMA, and CMDC. We also apply our method to another mental health condition, Autism Spectrum Disorder (ASD), to further demonstrate its applicability. Finally, the ablation studies validate the superiority of the proposed modules. © 2010-2012 IEEE.

Keywords: Adversarial machine learning

Fine-Grained Cross-Modal Fusion Based Refinement for Text-to-Image Synthesis

Chinese Journal of Electronics, 2023, Vol. 32, No. 6, pp. 1329-1340

Authors: SUN Haoran, WANG Yang, LIU Haipeng, QIAN Biao; Department of Computer Science and Information Engineering, Hefei University of Technology; Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, Hefei University of Technology

Text-to-image synthesis refers to generating visual-realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to be high-resolution. Despite the remarkable progress, these methods are limited in fully utilizing the given texts and could generate text-mismatched images, especially when the text description is complex. We propose a novel fine-grained text-image fusion based generative adversarial network (FF-GAN), which consists of two modules: a fine-grained text-image fusion block (FF-Block) and global semantic refinement (GSR). The proposed FF-Block integrates an attention block and several convolution layers to effectively fuse the fine-grained word-context features into the corresponding visual features, in which the text information is fully used to refine the initial image with more details. The GSR is proposed to improve the global semantic consistency between linguistic and visual features during the refinement process. Extensive experiments on CUB-200 and COCO datasets demonstrate the superiority of FF-GAN over other state-of-the-art approaches in generating images with semantic consistency to the given texts.

Keywords: Visualization; Fuses; Convolution; Semantics; Linguistics; Benchmark testing; Generative adversarial networks

Representation learning: serial-autoencoder for personalized recommendation

Frontiers of Computer Science, 2024, Vol. 18, No. 4, pp. 61-72

Authors: Yi ZHU, Yishuai GENG, Yun LI, Jipeng QIANG, Xindong WU; School of Information Engineering, Yangzhou University, Yangzhou 225127, China; Key Laboratory of Knowledge Engineering with Big Data (Ministry of Education of the People’s Republic of China), Hefei University of Technology, Hefei 230009, China; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China

Nowadays, the personalized recommendation has become a research hotspot for addressing information *** this, generating effective recommendations from sparse data remains a ***, auxiliary information has been widely used to address data sparsity, but most models using auxiliary information are linear and have limited *** to the advantages of feature extraction and no-label requirements, autoencoder-based methods have become quite ***, most existing autoencoder-based methods discard the reconstruction of auxiliary information, which poses huge challenges for better representation learning and model *** address these problems, we propose Serial-Autoencoder for Personalized Recommendation (SAPR), which aims to reduce the loss of critical information and enhance the learning of feature ***, we first combine the original rating matrix and item attribute features and feed them into the first autoencoder for generating a higher-level representation of the ***, we use a second autoencoder to enhance the reconstruction of the data representation of the prediction rating *** output rating information is used for recommendation *** experiments on the MovieTweetings and MovieLens datasets have verified the effectiveness of SAPR compared to state-of-the-art models.

Keywords: personalized recommendation; autoencoder; representation learning; collaborative filtering

Multi-view Feature Learning for the Over-penalty in Adversarial Domain Adaptation

Data Intelligence, 2024, Vol. 6, No. 1, pp. 183-200

Authors: Yuhong Zhang, Jianqing Wu, Qi Zhang, Xuegang Hu; School of Computer and Information Engineering, Hefei University of Technology, Hefei 230601, China; Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), The Ministry of Education of China, Hefei 230009, China

Domain adaptation aims to transfer knowledge from the labeled source domain to an unlabeled target domain that follows a similar but different ***, adversarial-based methods have achieved remarkable success due to the excellent performance of domain-invariant feature presentation ***, the adversarial methods learn the transferability at the expense of the discriminability in feature representation, leading to low generalization to the target *** this end, we propose a Multi-view Feature Learning method for the Over-penalty in Adversarial Domain ***, multi-view representation learning is proposed to enrich the discriminative information contained in domain-invariant feature representation, which will counter the over-penalty for discriminability in adversarial ***, the class distribution in the intra-domain is proposed to replace that in the inter-domain to capture more discriminative information in the learning of transferrable *** experiments show that our method can improve the discriminability while maintaining transferability and exceeds the most advanced methods in the domain adaptation benchmark datasets.

Keywords: domain adaptation; adversarial learning; multi-view learning

Diversifying Question Generation over Knowledge Base via External Natural Questions

Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024

Authors: Guo, Shasha; Zhang, Jing; Ke, Xirui; Li, Cuiping; Chen, Hong; School of Information, Renmin University of China, Beijing, China; Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, China

ISBN: (print) 9782493814104

Previous methods on knowledge base question generation (KBQG) primarily focus on refining the quality of a single generated question. However, considering the remarkable paraphrasing ability of humans, we believe that diverse texts can express identical semantics through varied expressions. The above insights make diversifying question generation an intriguing task, where the first challenge is evaluation metrics for diversity. Current metrics inadequately assess the aforementioned diversity. They calculate the ratio of unique n-grams in the generated question, which tends to measure duplication rather than true diversity. Accordingly, we devise a new diversity evaluation metric, which measures the diversity among top-k generated questions for each instance while ensuring their relevance to the ground truth. Clearly, the second challenge is how to enhance diversifying question generation. To address this challenge, we introduce a dual model framework interwoven by two selection strategies to generate diverse questions leveraging external natural questions. The main idea of our dual framework is to extract more diverse expressions and integrate them into the generation model to enhance diversifying question generation. Extensive experiments on widely used benchmarks for KBQG show that our approach can outperform pre-trained language model baselines and text-davinci-003 in diversity while achieving comparable performance with ChatGPT. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Keywords: Semantics
