In this paper we introduce our system for the task of determining whether or not an author spreads irony and stereotypes in English tweets, part of the PAN 2022 IROSTEREO task. For the irony-spreading author classifica...
ISBN (digital): 9798331537111
ISBN (print): 9798331537128
Large Language Models (LLMs) have shown significant potential in automating software engineering tasks, particularly in code generation. However, current evaluation benchmarks, which primarily focus on accuracy, fall short in assessing the quality of the code generated by these models, specifically their tendency to produce code smells. To address this limitation, we introduce CodeSmellEval, a benchmark designed to evaluate the propensity of LLMs for generating code smells. Our benchmark includes a novel metric, the Propensity Smelly Score (PSC), and a curated dataset of method-level code smells, CodeSmellData. To demonstrate the use of CodeSmellEval, we conducted a case study with two state-of-the-art LLMs, CodeLlama and Mistral. The results reveal that both models tend to generate code smells, such as simplifiable-condition and consider-merging-isinstance. These findings highlight the effectiveness of our benchmark in evaluating LLMs, providing valuable insights into their reliability and their propensity to introduce code smells in code generation tasks.
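The abstract does not give the exact definition of the PSC metric; one plausible form is the fraction of generated samples flagged for a given smell. The sketch below is an illustration under that assumption: the function names and the toy AST-based detector are hypothetical (real benchmarks typically run a full linter such as pylint, which reports smells like `simplifiable-condition`), but it shows how a per-smell propensity score could be computed:

```python
import ast

def has_simplifiable_condition(source: str) -> bool:
    """Toy smell detector: flag comparisons against the boolean
    literals True/False, e.g. `if flag == True:`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Compare):
            operands = [node.left, *node.comparators]
            if any(isinstance(o, ast.Constant) and isinstance(o.value, bool)
                   for o in operands):
                return True
    return False

def propensity_smelly_score(samples: list[str]) -> float:
    """Assumed PSC form: fraction of generated samples exhibiting the smell."""
    flagged = sum(has_simplifiable_condition(s) for s in samples)
    return flagged / len(samples)

samples = [
    "def ok(x):\n    return x > 0",
    "def smelly(flag):\n    if flag == True:\n        return 1\n    return 0",
]
print(propensity_smelly_score(samples))  # 0.5
```

A higher score means the model emits the smell more often; scoring each smell separately is what lets a benchmark report findings per smell type, as the abstract does.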
Post-training is known to be effective for boosting the performance of a pre-trained language model. However, in the task of question generation, question generators post-trained with a well-designed training objectiv...
详细信息
Software testing is a critical phase due to misconceptions about ambiguities in the requirements during specification, which affect the testing ***. It is difficult to identify all faults in ***. As requirement changes continuously, it increases the irrelevancy and redundancy during ***. Due to these challenges, fault detection capability decreases, and there arises a need to improve the testing process based on changes in requirements ***. In this research, we have developed a model to resolve testing challenges through requirement prioritization and prediction in an agile-based ***. The research objective is to identify the most relevant and meaningful requirements through semantic analysis for correct change ***. We compute the similarity of requirements through case-based reasoning, which predicts the requirements for reuse and is restricted to error-based ***. The apriori algorithm then maps out requirement frequency to select relevant test cases, based on frequently reused or not-reused test cases, to increase the fault detection ***. The proposed model was evaluated by conducting ***. Results showed that requirement redundancy and irrelevancy improved due to semantic analysis, which correctly predicted the requirements, increasing the fault detection rate and resulting in high user ***. Predicted requirements are mapped into test cases, increasing the fault detection rate after changes to achieve higher user ***. The model improves the redundancy and irrelevancy of requirements by more than 90% compared to other clustering methods and the analytical hierarchical process, achieving an 80% fault detection rate at an earlier ***. It also provides guidelines for practitioners and researchers in the modern ***. In the future, we will provide a working prototype of this model as a proof of concept.
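The abstract does not specify how requirement similarity is computed for case-based retrieval; a minimal sketch, assuming a simple bag-of-words cosine similarity (the function names, the similarity measure, and the threshold are hypothetical, not the paper's method), is:

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two requirement texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def retrieve_similar(new_req, past_reqs, threshold=0.5):
    """Case-based retrieval: return past requirements (with scores)
    similar enough to the new one to be candidates for reuse."""
    return [(r, s) for r in past_reqs
            if (s := cosine_similarity(new_req, r)) >= threshold]

past = ["user shall login with password",
        "system shall export report as pdf"]
print(retrieve_similar("user shall login with secure password", past))
```

Retrieved requirements would then feed the frequency analysis that selects test cases; a production system would use richer semantic representations than raw token overlap.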
Recently, Multimodal Learning (MML) has gained significant interest as it compensates for single-modality limitations through comprehensive complementary information within multimodal data. However, traditional MML me...
The Column Subset Selection (CSS) problem has been widely studied in dimensionality reduction and feature selection. The goal of the CSS problem is to output a submatrix S, consisting of k columns from an n × d input matrix A, that minimizes the residual error ‖A − SS†A‖_F^2, where S† is the Moore-Penrose inverse of S. Many previous approximation algorithms have non-linear running times in both n and d, while the existing linear-time algorithms have relatively larger approximation ratios. Additionally, the local search algorithms in existing results for solving the CSS problem are heuristic. To achieve linear running time while maintaining a better approximation using a local search strategy, we propose a local search-based approximation algorithm for the CSS problem with exactly k columns selected. A key challenge in achieving linear running time with the local search strategy is how to avoid exhaustive enumeration of candidate columns for constructing swap pairs in each local search step. To address this issue, we propose a two-step mixed sampling method that reduces the number of enumerations for swap pair construction from O(dk) to k in linear time. Although the two-step mixed sampling method reduces the search space of the local search strategy, bounding the residual error after swaps is a non-trivial task. To estimate the changes in residual error after swaps, we propose a matched swap pair construction method to bound the approximation loss, ensuring a constant probability of loss reduction in each local search step. In expectation, these techniques enable us to obtain a local search algorithm for the CSS problem with theoretical guarantees, where a 53(k + 1)-approximate solution can be obtained in linear running time O(ndk^4 log k). Empirical experiments show that our proposed algorithm achieves better quality and time compared to previous algorithms on both small and large datasets. Moreover, it is at least 10 times faster than state-of-the-art algorithms...
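Concretely, the residual the CSS problem minimizes can be computed directly with NumPy. The sketch below pairs it with a deliberately naive local search that accepts a random column swap only if the residual drops; it is illustrative only, and does not implement the paper's two-step mixed sampling or matched swap pair construction:

```python
import numpy as np

def css_residual(A: np.ndarray, cols: list[int]) -> float:
    """Residual ||A - S S^+ A||_F^2 for the column subset S = A[:, cols],
    where S^+ is the Moore-Penrose pseudoinverse of S."""
    S = A[:, cols]
    return float(np.linalg.norm(A - S @ np.linalg.pinv(S) @ A, "fro") ** 2)

def local_search(A, k, iters=50, seed=0):
    """Naive local search over column subsets: propose a random swap
    (one chosen column out, one unchosen column in) and keep it only
    if the residual error decreases."""
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    cols = list(rng.choice(d, size=k, replace=False))
    best = css_residual(A, cols)
    for _ in range(iters):
        i, j = rng.integers(k), rng.integers(d)
        if j in cols:
            continue
        cand = cols.copy()
        cand[i] = int(j)
        r = css_residual(A, cand)
        if r < best:
            cols, best = cand, r
    return cols, best

A = np.arange(12, dtype=float).reshape(3, 4) + np.random.default_rng(1).normal(size=(3, 4))
cols, err = local_search(A, k=2)
print(cols, err)
```

Each residual evaluation here costs a full pseudoinverse, and the enumeration of swap pairs is random rather than sampled; the paper's contribution is precisely avoiding both costs while keeping a provable approximation guarantee.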
Continual learning algorithms aim to learn from a sequence of tasks, making the training distribution non-stationary. The majority of existing continual learning approaches in the literature rely on heuristics and do ...
This paper presents an emergency response management system to tackle the problem of the absence of network connectivity during the time of a natural disaster. Network connectivity is often enabled by the base station...
Many real-world networks including the World Wide Web and the Internet of Things are graphs in their abstract forms. Graph neural networks (GNNs) have emerged as the main solution for deep learning on graphs. Recently...
Cloud storage is widely used by large companies to store vast amounts of data and files, offering flexibility, financial savings, and ***. However, information shoplifting poses significant threats, potentially leading to poor performance and privacy ***. Blockchain-based cognitive computing can help protect and maintain information security and privacy in cloud platforms, ensuring businesses can focus on business ***. To ensure data security in cloud platforms, this research proposed a blockchain-based Hybridized Data Driven Cognitive Computing (HD2C) ***. The proposed HD2C framework addresses breaches of the privacy information of mixed participants of the Internet of Things (IoT) in the ***. HD2C is developed by combining Federated Learning (FL) with a Blockchain consensus algorithm to connect smart contracts with Proof of ***. The “Data Island” problem can be solved by FL’s emphasis on privacy and lightning-fast processing, while Blockchain provides a decentralized incentive structure that is impervious to ***. FL with Blockchain allows quick consensus through smart member selection and ***. The HD2C paradigm significantly improves the computational processing efficiency of intelligent ***. Analysis results derived from IIoT datasets confirm HD2C ***. When compared to other consensus algorithms, the Blockchain PoA’s foundational cost is ***. Accuracy and memory utilization evaluation results predict the total benefits of the ***. In comparison to the values 0.004 and 0.04, the value of 0.4 achieves good ***. According to the experiment results, the number of transactions per second has minimal impact on memory ***. The findings of this study resulted in the development of a brand-new IIoT framework based on blockchain technology.