While many works address applications of machine learning, few of them try to provide a theoretical justification for the efficiency of these methods. In this work, overfitting control (or genera...
The two-dimensional problem of a viscous laminar flow around Zhukovsky airfoils at an angle of attack is considered. Based on the approach of local similarity, which was proposed by Kochin and Loytsyansky for the equa...
Online network crawling tasks require considerable effort from researchers to collect the data. One such task is the identification of important nodes, which has many applications, from viral marketing to the preven...
ISBN: 9798331526023 (digital)
ISBN: 9798331526030 (print)
Recently, large language models (LLMs), particularly those pretrained on code, have demonstrated strong capabilities in generating programs from informal natural language intent. However, LLM-generated code is prone to bugs. Developers interacting with LLMs seek trusted code and, ideally, clear indications of potential bugs and vulnerabilities. Verified code can mitigate the business risks associated with adopting generated code. We use CodePatchLLM, a model-agnostic framework that extends an LLM with feedback from the Svace static analyzer to enhance code generation quality. We evaluate CodePatchLLM on four popular LLMs across three datasets. Our experiments show an average absolute reduction of 19.1% in static analyzer warnings for Java across all datasets and models, while preserving pass@1 code generation accuracy.
Ships and offshore structures often encounter irregular waves in the ocean, so the simulation of irregular waves is both necessary and useful. The potential flow method is often used to simulate the irregu...
The typology of semantic shifts has been a focus of linguistic typology for the last 20 years. The emergence of cross-linguistic databases and linguistic platforms has taken the study of semantic change to a new level by enlarging the sample of languages under investigation. Yet the languages of Russia are only sparsely represented in the global databases and do not make a substantial contribution to this field. The LingvoDoc platform, which stores unique materials on the languages of Russia, can fill this gap after certain enhancements.
This paper focuses on the investigation of confidential document leaks in the form of screen photographs. The proposed approach does not try to prevent the leak in the first place but rather aims to determine the source of the leak....
Most Named Entity Recognition (NER) models operate under the assumption that training datasets are fully labelled. While this assumption holds for established datasets like CoNLL 2003 and OntoNotes, it is sometimes not feasible to obtain a complete annotation of a dataset. Such situations may occur, for instance, when only some entities are annotated in order to reduce cost. This work presents an approach to finetuning BERT on such partially labelled datasets using self-supervision and label preprocessing. Our approach outperforms the previous LSTM-based label preprocessing baseline, significantly improving performance on poorly labelled datasets. We demonstrate that, with our approach, finetuning RoBERTa on the CoNLL 2003 dataset with only 10% of all entities labelled is enough to reach the performance of the baseline trained on the same dataset with 50% of the entities labelled.
This paper presents a complete solution for extracting textual information and tables from PDF documents with a text layer. The presented solution consists of two parts: PyTabby is a tool for extracting text and tables from P...
The challenges of black box optimization arise due to imprecise responses and limited output information. This article describes new results on optimizing multivariable functions using an Order Oracle, which provides ...