Search Results - Inner Mongolia University Library

SEOSS-Queries - a software engineering dataset for text-to-SQL and question answering tasks

DATA IN BRIEF, 2022, vol. 42, p. 108211

Authors: Tomova, Mihaela Todorova; Hofmann, Martin; Maeder, Patrick (Tech Univ Ilmenau, D-98693 Ilmenau, Germany; Friedrich Schiller Univ, Fac Biol Sci, D-07745 Jena, Germany)

Stakeholders of software development projects have various information needs for making rational decisions during their daily work. Satisfying these needs requires substantial knowledge of where and how the relevant information is stored and consumes valuable time that is often not available. Easing the need for this knowledge is an ideal text-to-SQL benchmark problem, a field where public datasets are scarce and needed. We propose the SEOSS-Queries dataset consisting of natural language utterances and accompanying SQL queries extracted from previous studies, software projects, issue tracking tools, and through expert surveys to cover a large variety of information need perspectives. Our dataset consists of 1,162 English utterances translating into 166 SQL queries; each query has four precise utterances and three more general ones. Furthermore, the dataset contains 393,086 labeled utterances extracted from issue tracker comments. We provide pre-trained SQLNet and RatSQL baseline models for benchmark comparisons, a replication package facilitating a seamless application, and discuss various other tasks that may be solved and evaluated using the dataset. The whole dataset with paraphrased natural language utterances and SQL queries is hosted at ***/s/75ed49ef01ac2f83b3e2. (C) 2022 The Authors. Published by Elsevier Inc.

Keywords: Software and systems requirement engineering; Text-to-SQL; Dataset; Question answering; Natural language processing
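The utterance count reported in the SEOSS-Queries abstract can be checked with simple arithmetic. The following is a minimal sketch, assuming every one of the 166 queries has exactly four precise and three general utterances, as the abstract states; the constant and function names are illustrative and do not come from the dataset itself.

```python
# Figures quoted in the SEOSS-Queries abstract.
NUM_QUERIES = 166
PRECISE_PER_QUERY = 4
GENERAL_PER_QUERY = 3


def total_utterances(num_queries: int, precise: int, general: int) -> int:
    """Total utterances if every query carries the same number of paraphrases."""
    return num_queries * (precise + general)


total = total_utterances(NUM_QUERIES, PRECISE_PER_QUERY, GENERAL_PER_QUERY)
print(total)  # 166 * 7 = 1162, matching the 1,162 utterances reported
```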

HFD: Hierarchical feature decoupling for SQL generation from text

INTELLIGENT DATA ANALYSIS, 2024, vol. 28, no. 4, pp. 991-1005

Authors: Zhang, Xu; Hu, Xiaoyu; Liu, Zejie; Xiang, Yanzheng; Zhou, Deyu (Southeast Univ, Sch Comp Sci & Engn, Key Lab Comp Network Informat Integrat, Minist Educ, Nanjing, Jiangsu, Peoples R China)

Text-to-SQL, a computational linguistics task, seeks to facilitate the conversion of natural language queries into SQL queries. Recent methodologies have leveraged the concept of slot-filling in conjunction with predetermined SQL templates to effectively bridge the semantic gap between natural language questions and structured database queries, achieving commendable performance by harnessing the power of multi-task learning. However, employing identical features across diverse tasks is an ill-suited practice, fraught with inherent drawbacks. Firstly, based on our observation, there are clear boundaries in the natural language corresponding to SELECT and WHERE clauses. Secondly, the exclusive features integral to each subtask are inadequately emphasized and underutilized, thereby hampering the acquisition of discriminative features for each specific subtask. In an endeavor to rectify these issues, the present work introduces an innovative approach: the hierarchical feature decoupling model for SQL query generation from natural language. This novel approach involves the deliberate separation of features pertaining to subtasks within both SELECT and WHERE clauses, further dissociating these features at the subtask level to foster better model performance. Empirical results derived from experiments conducted on the WikiSQL benchmark dataset reveal the superiority of the proposed approach over several state-of-the-art baseline methods in the context of text-to-SQL query generation.

Keywords: text-to-SQL; multi-task learning; discriminative features; feature decoupling

The Role of Accuracy and Validation Effectiveness in Conversational Business Analytics

IEEE ACCESS, 2025, vol. 13, pp. 29279-29291

Authors: Alparslan, Adem (FOM Univ Appl Sci Econ & Management, Dept Business Analyt, D-45127 Essen, Germany)

This study examines how conversational business analytics can bridge the skill gap of end users that hinders traditional self-service analytics. By leveraging generative AI, conversational business analytics enables end users to independently retrieve data, process it, and generate information. Using text-to-SQL as an example, this study proposes theoretical models grounded in expected utility theory to examine two levels of AI support: partial support, where AI translates natural language requests into SQL and the generated information serves directly as the basis for decision-making, and full support, which includes an additional validation step. The models define conditions where AI-driven information generation surpasses human delegation. These conditions underscore the critical interplay between AI accuracy and validation effectiveness as pivotal factors for the successful integration of AI. The findings suggest that partial support is viable when the AI accuracy is sufficiently high. In contrast, full support necessitates both adequate accuracy and robust validation. Insufficient validation impairs decisions, highlighting the need for effective validation techniques to fully leverage conversational business analytics. Moreover, the dependence on user-driven validation introduces additional risks, as its effectiveness is contingent on the user's experience or familiarity with SQL and underlying data structures. This insight challenges conventional validation techniques for AI-generated information and highlights the need to use techniques that reduce the reliance on the technical expertise of end users.

Keywords: Business; Self-service; Structured Query Language; Natural languages; Data models; Analytical models; Artificial intelligence; Data retrieval; Conversational business analytics; self-service analytics; large language models; accuracy; validation effectiveness; text-to-SQL; expected utility theory; fine-tuning; retrieval augmented generation; business intelligence

A SQL automatic generation method based on prompt learning

5th International Conference on Telecommunications, Optics, and Computer Science, TOCS 2024

Authors: Wang, Ying; Zhou, Hongyan; Zhang, Cheng; Yu, Xuexia (State Grid Info & Telecom Group Co. Ltd., Information Industry Research Institute, Beijing, China; Marketing Department of State Grid Hubei Electric Power Co. Ltd., Hubei, Wuhan, China; Marketing Service Center (Metering Center), State Grid Hubei Electric Power Co. Ltd., Hubei, Wuhan, China; Beijing CETC Information Technology Co. Ltd., Beijing, China)

ISBN: (print) 9781510691667

The objective of the Structured Query Language (SQL) project is to transform spoken language into SQL commands that can be executed. Typically, creating models that generate SQL requires paired examples of SQL code and natural language queries, and these models are usually limited to specific domains. Traditional approaches struggle with broad applicability. This paper introduces a novel technique for SQL generation, known as TS-PL, which leverages prompt learning to produce functional SQL commands from input queries and data tables. The process begins with the analysis and extraction of essential details from the query; next, an SQL template is crafted and populated to create the SQL code; finally, the generated commands are executed to retrieve data. The study's findings indicate that TS-PL outperforms other methods on the Spider dataset, thereby validating its efficacy. The research underscores the potential of this method to enhance automation in natural language processing and database interactions. This advancement is crucial for easing the burden on database professionals, streamlining data operations, and fostering the evolution of intelligent querying systems. © 2025 SPIE.

Keywords: information extraction; pre-trained model; prompt learning; SQL statement generation; text-to-SQL
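The three stages named in the TS-PL abstract (extract details from the query, populate an SQL template, execute the result) can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: the rule-based `extract_slots` function stands in for the paper's prompt-learned extractor, and the template, table, column names and sample rows are all hypothetical.

```python
import re
import sqlite3

# Hypothetical single-condition template; not the template used in the paper.
TEMPLATE = "SELECT {column} FROM {table} WHERE {cond_column} {op} ?"


def extract_slots(question, table, columns):
    """Stage 1: pull the essential details out of the question.

    Rule-based stand-in for a learned extractor: the first schema column
    mentioned is the target, the second is the filter column.
    """
    lowered = question.lower()
    mentioned = sorted((c for c in columns if c in lowered), key=lowered.index)
    if len(mentioned) < 2:
        raise ValueError("need a target column and a condition column")
    number = re.search(r"\d+(?:\.\d+)?", question)
    if number is None:
        raise ValueError("no numeric value found in the question")
    op = ">" if re.search(r"\b(over|above|more than)\b", lowered) else "="
    return {"column": mentioned[0], "table": table, "cond_column": mentioned[1],
            "op": op, "value": float(number.group())}


def fill_template(slots):
    """Stage 2: populate the template; the value stays a bound parameter."""
    names = {k: slots[k] for k in ("column", "table", "cond_column", "op")}
    return TEMPLATE.format(**names), (slots["value"],)


def run(question, conn, table, columns):
    """Stage 3: execute the generated statement and return it with its rows."""
    sql, params = fill_template(extract_slots(question, table, columns))
    return sql, conn.execute(sql, params).fetchall()


# Hypothetical in-memory table used only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Ana", 6200, "ops"), ("Ben", 4100, "ops")])

sql, rows = run("What is the name of employees with salary over 5000?",
                conn, "employees", ["name", "salary", "dept"])
print(sql)   # SELECT name FROM employees WHERE salary > ?
print(rows)  # [('Ana',)]
```

Identifiers are taken only from the supplied schema list and the literal value is passed as a bound parameter, so the sketch does not splice user text into the statement.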

Enhanced Natural Language Interface for Web-Based Information Retrieval

IEEE ACCESS, 2021, vol. 9, pp. 4233-4241

Authors: Bai, Tian; Ge, Yan; Guo, Shuyu; Zhang, Zhenting; Gong, Leiguang (Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China; Jilin Univ, Coll Software, Changchun 130012, Peoples R China; Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China; Yantai Huashen Intelligent Technol Ltd, Yantai 264000, Peoples R China)

Database applications are at the core of most web application systems such as web-based email, source code repository management, public scientific data repository management, news portals, and publication repositories of various fields. However, the usage of these database systems for data and information retrieval is severely limited by the lack of support for processing search queries expressed in a natural language (NL). Most web interfaces for databases today only take search queries entered in some form of logical combination of keywords or text strings, which restricts the scope and depth of what a web user really wants to search for, even though natural language based data or information retrieval has made significant advances in recent years. To overcome or at least alleviate this limitation in web information services, we propose in this article an improved neural model based on an existing framework, IRNet, for NL query of databases, in which a Gated Graph Neural Network (GGNN) representation is introduced to encode the database entities and relations. We also represent and use the database values in the prediction model to identify and match table and column names in order to automatically synthesize a correct SQL statement from a query expressed in an NL sentence. Experiments with a public dataset demonstrate the promising potential of our approach.

Keywords: Neural network; natural language processing; text-to-SQL; gated graph neural network

SQLSketch-TVC: Type, value and compatibility based approach for SQL queries

APPLIED INTELLIGENCE, 2023, vol. 53, no. 4, pp. 3889-3898

Authors: Ahkouk, Karam; Machkour, Mustapha (Ibn Zohr Univ, Fac Sci, Agadir 80000, Souss Massa Dar, Morocco)

Understanding the complexity of translating Natural Language (NL) sentences to SQL queries is an essential part of the resolution process. The majority of the proposed models either focus on simple queries or suffer when exposed to unseen domains or new schema structures. This is understandable, as most solutions are based on limited datasets or treat the problem from an end-to-end perspective. Our previously proposed model, SQLSketch, which provides an intelligent method for handling complex queries, was able to outperform all the state-of-the-art models on the GreatSQL dataset. This paper addresses the problem of translating NL sentences to SQL queries in an effective way by extending our previous SQLSketch model with a type-aware layer, a value classification method and a compatibility-based module that enhance the quality of the predicted items (SQLSketch-TVC). We evaluate the new model using the Components and Exact matching metrics. The results show that SQLSketch-TVC outperforms the other models on all SQL components and provides a novel way of inferring values from the input question.

Keywords: Text-to-SQL; GreatSQL; Natural language processing; Relational databases; Neural networks

xDBTagger: explainable natural language interface to databases using keyword mappings and schema graph

VLDB JOURNAL, 2023, vol. 33, no. 2, pp. 301-321

Authors: Usta, Arif; Karakayali, Akifhan; Ulusoy, Ozgur (Univ Waterloo, Waterloo, ON, Canada; Cent Bank Republ Turkiye, Ankara, Turkiye; Bilkent Univ, Ankara, Turkiye)

Recently, numerous studies have been proposed to attack the natural language interfaces to databases (NLIDB) problem by researchers either as a conventional pipeline-based or an end-to-end deep-learning-based solution. Although each approach has its own advantages and drawbacks, regardless of the approach preferred, both approaches exhibit black-box nature, which makes it difficult for potential users to comprehend the rationale behind the decisions made by the intelligent system to produce the translated SQL. Given that NLIDB targets users with little to no technical background, having interpretable and explainable solutions becomes crucial, which has been overlooked in the recent studies. To this end, we propose xDBTagger, an explainable hybrid translation pipeline that explains the decisions made along the way to the user both textually and visually. We also evaluate xDBTagger quantitatively in three real-world relational databases. The evaluation results indicate that in addition to being lightweight, fast, and fully explainable, xDBTagger is also competitive in terms of translation accuracy compared to both pipeline-based and end-to-end deep learning approaches.

Keywords: Natural language interface for databases; NLIDB; Text-to-SQL; Multi-task learning; Explainable artificial intelligence; XAI

Towards Demand-Driven On-The-Fly Statistics

JOURNAL OF OFFICIAL STATISTICS, 2023, vol. 39, no. 3, pp. 351-379

Authors: Gelsema, Tjalling; van den Heuvel, Guido (Stat Netherlands, Res & Dev, Henri Faasdreef 312, NL-2492 JP The Hague, Netherlands)

A prototype of a question answering (QA) system, called Farseer, for the real-time calculation and dissemination of aggregate statistics is introduced. Using techniques from natural language processing (NLP), machine learning (ML), artificial intelligence (AI) and formal semantics, this framework is capable of correctly interpreting a written request for (aggregate) statistics and subsequently generating appropriate results. It is shown that the framework operates in a way that is independent of a specific statistical domain under consideration, by capturing domain specific information in a knowledge graph that is input to the framework. However, it is also shown that the prototype still has its limitations, lacking statistical disclosure control. Also, searching the knowledge graph is still time-consuming.

Keywords: Dissemination; artificial intelligence; question answering; text-to-SQL; information modeling

Domain-Specific Few-Shot Table Prompt Question Answering via Contrastive Exemplar Selection

ALGORITHMS, 2024, vol. 17, no. 7, p. 278

Authors: Mo, Tianjin; Xiao, Qiao; Zhang, Hongyi; Li, Ren; Wu, Yunsong (Chongqing Coll Elect Engn, Business Sch, Chongqing 401331, Peoples R China; Chongqing Jiaotong Univ, Sch Informat Sci & Engn, Chongqing 400074, Peoples R China; Chongqing Univ, Sch Big Data & Software Engn, Chongqing 400044, Peoples R China)

As a crucial task in natural language processing, table question answering has garnered significant attention from both the academic and industrial communities. It enables intelligent querying and question answering over structured data by translating natural language into corresponding SQL statements. Recently, there have been notable advancements in the general domain table question answering task, achieved through prompt learning with large language models. However, in specific domains, where tables often have a higher number of columns and questions tend to be more complex, large language models are prone to generating invalid SQL or NoSQL statements. To address this issue, this paper proposes a novel few-shot table prompt question answering approach. Specifically, we design a prompt template construction strategy for structured SQL generation. It uses prompt templates to restructure the input for each test instance and standardizes the model output, which can enhance the integrity and validity of the generated SQL. Furthermore, this paper introduces a contrastive exemplar selection approach based on the question patterns and formats in domain-specific contexts. This enables the model to quickly retrieve the relevant exemplars and learn the characteristics of the given question. Experimental results on two datasets in the domains of electric energy and structural inspection show that the proposed approach outperforms the baseline models across all comparison settings.

Keywords: table question answering; text-to-SQL; few-shot; large language models; prompt learning
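The general shape of few-shot prompting with exemplar selection, as outlined in the abstract above, can be sketched as follows. This is not the paper's method: a simple token-overlap (Jaccard) score is swapped in for the contrastive exemplar selector, and the prompt layout, schema string, exemplar pool and function names are all hypothetical.

```python
def tokens(text):
    """Lower-cased word set; a crude stand-in for a learned question encoder."""
    return set(text.lower().replace("?", " ").split())


def jaccard(a, b):
    """Token-overlap similarity between two questions."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_exemplars(question, pool, k=2):
    """Pick the k pool entries whose questions look most like the new one."""
    ranked = sorted(pool, key=lambda ex: jaccard(question, ex["question"]),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, schema, exemplars):
    """Fixed template: schema, worked exemplars, then the open question."""
    parts = ["Schema: " + schema]
    for ex in exemplars:
        parts.append("Question: " + ex["question"])
        parts.append("SQL: " + ex["sql"])
    parts.append("Question: " + question)
    parts.append("SQL:")
    return "\n".join(parts)


# Hypothetical exemplar pool, invented for illustration only.
POOL = [
    {"question": "What is the rated voltage of transformer T1?",
     "sql": "SELECT rated_voltage FROM transformers WHERE id = 'T1'"},
    {"question": "How many bridges were inspected in 2023?",
     "sql": "SELECT COUNT(*) FROM inspections WHERE year = 2023"},
    {"question": "What is the rated capacity of transformer T2?",
     "sql": "SELECT rated_capacity FROM transformers WHERE id = 'T2'"},
]

question = "What is the rated voltage of transformer T7?"
chosen = select_exemplars(question, POOL, k=2)
prompt = build_prompt(question, "transformers(id, rated_voltage, rated_capacity)",
                      chosen)
print(prompt)
```

The prompt ends with an open `SQL:` line so that the model output is constrained to continue with a single statement, which is one plausible reading of the abstract's "standardizes the model output".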

Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing

12th International Conference on Language Resources and Evaluation (LREC)

Authors: Yu, Xiaojing; Chen, Tianlong; Yu, Zhengjie; Li, Huiyu; Yang, Yang; Jiang, Xiaoqian; Jiang, Anxiao (Texas A&M Univ, College Stn, TX 77843, USA; Univ Sci & Technol China, Hefei, Anhui, Peoples R China; UT Southwestern Med Ctr, Dallas, TX, USA; Walmart Technol, Bentonville, AR, USA; Univ Texas Hlth Sci Ctr Houston, Houston, TX 77030, USA)

ISBN: (print) 9791095546344

Clinical trials often require that patients meet eligibility criteria (e.g., have specific conditions) to ensure the safety and the effectiveness of studies. However, retrieving eligible patients for a trial from the electronic health record (EHR) database remains a challenging task for clinicians since it requires not only medical knowledge about eligibility criteria, but also an adequate understanding of structured query language (SQL). In this paper, we introduce a new dataset that includes the first-of-its-kind eligibility-criteria corpus and the corresponding queries for criteria-to-SQL (Criteria2SQL), a task translating the eligibility criteria to executable SQL queries. Compared to existing datasets, the queries in the dataset here are derived from the eligibility criteria of clinical trials and include Order-sensitive, Counting-based, and Boolean-type cases which are not seen before. In addition to the dataset, we propose a novel neural semantic parser as a strong baseline model. Extensive experiments show that the proposed parser outperforms existing state-of-the-art general-purpose text-to-SQL models while highlighting the challenges presented by the new dataset. The uniqueness and the diversity of the dataset leave a lot of research opportunities for future improvement.

Keywords: Semantic Parsing; Text-to-SQL; Eligibility Criteria
