ISBN: (Print) 9798400702174
Recently, many large language models (LLMs) have been proposed, showing advanced proficiency in code generation. Meanwhile, many efforts have been dedicated to evaluating LLMs on code generation benchmarks such as HumanEval. Although very helpful for comparing different LLMs, existing evaluation focuses on a simple code generation scenario (i.e., function-level or statement-level code generation), which mainly asks LLMs to generate one single code unit (e.g., a function or a statement) for a given natural language description. Such evaluation focuses on generating independent and often small-scale code units, thus leaving it unclear how LLMs perform in real-world software development scenarios. To fill this knowledge gap, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e., class-level code generation. Compared with existing code generation benchmarks, it better reflects real-world software development because it comprises broader contextual dependencies and multiple, interdependent units of code. We first manually construct ClassEval, the first class-level code generation benchmark, consisting of 100 class-level Python code generation tasks and built with approximately 500 person-hours. Based on the new benchmark ClassEval, we then perform the first study of 11 state-of-the-art LLMs on class-level code generation. Based on our results, we find that all LLMs perform much worse on class-level code generation than on method-level code generation. While GPT models still dominate other LLMs on class-level code generation, the performance rankings of the other models on method-level code generation no longer hold for class-level code generation. Besides, most models (except GPT models) perform better when generating the class method by method; and they have limited ability to generate dependent code. Based on our findings, we call for software engineering (SE) researchers' expertise to build more LLM benchmarks based on practical and com...
With the development of Service Oriented Architecture (SOA), the number of Web services on the Internet is also growing rapidly. Classifying Web services accurately and efficiently is helpful to improve the quality of...
The Wuling Mountains Area(WMA) is one of the important ecological protection areas in central China, known for its rich biodiversity and unique ecological environment. To effectively protect the species resources in t...
The escalating parking space problem at Nile University of Nigeria poses a growing threat due to rising enrollments. The absence of a structured parking management system compounds this issue. This project aims to dev...
Fusing vocabulary features into a pre-trained model is the mainstream data feature processing method for sequence labelling tasks. In general, the feature fusion methods proposed so far either fuse features directly outside the pre-trained model or fuse lexical features using an attention mechanism. However, our study found that this style of vocabulary enhancement does not conform to the word formation rules of modern Chinese: with the above feature processing methods, it is easy to fuse irrelevant or even incorrect lexical features into the sequence, which harms the results of Chinese sequence labelling tasks. To solve these problems, we propose a Cosine Similarity Adapter to process lexical features in Chinese sequence labelling tasks. CSBERT, a hybrid model built on BERT using this structure, conforms to the word formation rules of modern Chinese to a certain extent. It can fuse the features of a word into a character, or eliminate the word's features from the character, according to the cosine similarity between the character vector and the word vector. The experimental results show that CSBERT labels Chinese sequences better than the benchmark model: it achieves the best F1-scores on 7 open datasets and the best multi-label classification ability, which demonstrates the model's practical value.
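The similarity-gated fusion described in this abstract can be sketched in a few lines. This is an illustrative toy only: the vectors, the fixed threshold, and the additive fusion rule are assumptions for demonstration, not the paper's actual learned adapter.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def gate_word_feature(char_vec, word_vec, threshold=0.5):
    """Fuse the word vector into the character vector when the two are
    similar enough; otherwise discard the word feature.

    Hypothetical stand-in for the Cosine Similarity Adapter: the real
    model learns this behaviour end-to-end; the threshold here is fixed
    purely for illustration.
    """
    sim = cosine_similarity(char_vec, word_vec)
    if sim >= threshold:
        # Fuse: similarity-weighted addition of the word feature.
        return [c + sim * w for c, w in zip(char_vec, word_vec)]
    # Eliminate: keep the character representation unchanged.
    return list(char_vec)
```

A word vector nearly parallel to the character vector is blended in; an orthogonal (irrelevant) word vector leaves the character representation untouched, which is the behaviour the abstract attributes to the adapter.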
Author:
Xie, Pengcheng
State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; University of Chinese Academy of Sciences, ZhongGuanCun East Road No. 55, Beijing, China
Optimization methods play a crucial role in various fields and applications. In some optimization problems, the derivative information of the objective function is unavailable. Such black-box optimization problems nee...
ISBN: (Print) 9783031791635; 9783031791642
In Tunisia, citizens use social media platforms as a space to exercise freedom of speech. However, unchecked and complete freedom of expression can fuel the spread of hateful speech, which is devastating not only for those targeted but also for society as a whole. This alarming situation calls for limiting the spread of hateful content by working on hate speech detection in "Derja", the Tunisian dialect. Used as a means of communication in daily life and on social media platforms, this dialect is a mixture of many languages, including Arabic, French, and Amazighi, and it can be written using Arabic letters. Due to the complexity of this language, there is a significant lack of publicly available, large, annotated datasets for hate speech detection in the Tunisian dialect written in Arabic letters, making "Tunisian Derja" an underrepresented dialect. In this paper, we introduce the largest publicly available dataset of its kind, consisting of more than 12k comments manually annotated as Hate or Neutral. We also provide an in-depth explanation of the data collection, annotation, and pre-processing processes. Moreover, we undertake a comprehensive evaluation of the dataset's efficacy with various machine learning models, including Support Vector Machines (SVM), Random Forest, and XGBoost.
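Evaluations of classifiers such as SVM, Random Forest, and XGBoost on a binary Hate/Neutral task are typically reported with per-class precision, recall, and F1. A minimal stdlib sketch of that metric (the label lists here are illustrative, not drawn from the dataset):

```python
def f1_score(y_true, y_pred, positive="Hate"):
    """F1 for the positive class, computed from scratch.

    tp: predicted positive and actually positive
    fp: predicted positive but actually negative
    fn: predicted negative but actually positive
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In practice one would use a library implementation (e.g. scikit-learn's `f1_score`); the point of the sketch is only to make the reported metric concrete.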
The dynamic knowledge graph is a data structure that adds temporal information to the nodes and edges of a traditional knowledge graph. It describes the changing processes of entities and relationships over time, ther...
Misleading headlines are part of the disinformation problem. A headline should give a concise summary of the news story that helps the reader decide whether to read the body text of the article, which is why headline accuracy is a crucial element of a news story. This work focuses on detecting misleading headlines through the automatic identification of contradiction between the headline and body text of a news item. When a contradiction is detected, the reader is alerted to the lack of precision or trustworthiness of the headline in relation to the body text. To facilitate the automatic detection of misleading headlines, a new Spanish dataset (ES_Headline_Contradiction) is created for the purpose of identifying contradictory information between a headline and its body text. This dataset annotates the semantic relationship between headlines and body text by categorising the relation between the texts as compatible, contradictory, or unrelated. Another novel aspect of this dataset is that it distinguishes between different types of contradiction, thereby enabling a more fine-grained identification of them. The dataset was built via a novel semi-automatic methodology, which resulted in a more cost-efficient development process. The results of the experiments show that pre-trained language models can be fine-tuned with this dataset, producing very encouraging results for detecting incongruity or lack of relation between headline and body text.
Federated Learning (FL) recently emerges as a paradigm to train a global machine learning model across distributed clients without sharing raw data. Knowledge Graph (KG) embedding represents KGs in a continuous vector...