Search results - Inner Mongolia University Library

Text-to-SQL: A methodical review of challenges and models

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2024, Vol. 32, No. 3, pp. 403-419

Authors: Kanburoglu, Ali Bugra; Tek, F. Boray

Affiliations: Isik Univ, Dept Comp Engn, Istanbul, Turkiye; Istanbul Tech Univ, Dept Artificial Intelligence & Data Engn, Istanbul, Turkiye

This survey focuses on text-to-SQL, the automated translation of natural language queries into SQL queries. Initially, we describe the problem and its main challenges. Then, by following the PRISMA systematic review methodology, we survey the existing text-to-SQL review papers in the literature. We apply the same method to extract proposed text-to-SQL models and classify them with respect to the evaluation metrics and benchmarks used. We highlight the accuracies achieved by various models on text-to-SQL datasets and discuss execution-guided evaluation strategies. We present insights into model training times and implementations of different models. We also explore the availability of text-to-SQL datasets in non-English languages. Additionally, we focus on large language model (LLM) based approaches for the text-to-SQL task, where we examine LLM-based studies in the literature and subsequently evaluate the LLMs on the cross-domain Spider dataset. Finally, we conclude with a discussion of future directions for text-to-SQL research, identifying potential areas of improvement and advancements in this field.

Keywords: text-to-SQL; large language model; natural language processing; deep learning

Multi-pattern retrieval-augmented framework for text-to-SQL with Poincaré-Skeleton retrieval and meta-instruction reasoning

INFORMATION PROCESSING & MANAGEMENT, 2025, Vol. 62, No. 3

Authors: Guo, Chunxi; Tian, Zhiliang; Tang, Jintao; Li, Shasha; Wang, Ting

Affiliations: Natl Univ Def Technol, Coll Comp, 109 Deya Rd, Changsha 410073, Hunan, Peoples R China

Text-to-SQL transforms natural language text into SQL queries, a task complicated by SQL's complex syntax and the need for specialized knowledge. Retrieval from the SQL-generated case repository assists large language models (LLMs) by providing relevant examples. However, complex queries often involve complicated SQL syntax, which can confuse LLMs because they are designed for natural language rather than SQL. In this paper, we propose a multi-pattern retrieval-augmented framework for SQL generation, which dynamically selects relevant examples based on the query and reasoning patterns. To retrieve similar query patterns, we construct question skeletons in the Poincaré model, which better distinguishes entities aligned with a question's needs. To provide tailored examples of reasoning patterns for each logical step, especially for complex problems, we design meta-instruction-based retrieval repositories for multi-category chain-of-thought fragments. To mitigate biases from initial retrievals, we implement a revision strategy that leverages LLMs to interact with databases, enabling LLMs to self-correct errors during SQL generation. Experiments on four benchmarks show that our method outperforms strong baseline models, increasing execution accuracy by 13% to 23.1% with the same LLM. Ablation studies provide insights into the framework's performance sensitivity to different components and strategies. Further analysis reveals a correlation between the framework's effectiveness and the LLMs' quality and stability, and shows that the performance gains from modifications in the initial iteration are the most significant.

Keywords: Large language model; text-to-SQL; Prompt learning; Retrieval augmented

A Question-Aware Few-Shot Text-to-SQL Neural Model for Industrial Databases

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2025, Vol. 2025, No. 1

Authors: Li, Ren; Chen, Yu; Zhang, Hongyi; Yang, Jianxi; Xiao, Qiao; Jiang, Shixin

Affiliations: Chongqing Jiaotong Univ, Sch Informat Sci & Engn, Chongqing 400074, Peoples R China; Chongqing Jiaotong Univ, Sch Traff & Transportat, Chongqing 400074, Peoples R China

Intelligent question answering over industrial databases is a challenging task due to the multicolumn context and complex questions. The existing methods need to be improved in terms of SQL generation accuracy. In this paper, we propose a question-aware few-shot text-to-SQL approach based on the SDCUP pretrained model. Specifically, an attention-based filtering approach is proposed to reduce the redundant information from multiple columns in the industrial database scenario. We further propose an operator semantics enhancement method to improve the ability to identify complex conditions in queries. Experimental results on the industrial benchmarks in the fields of electric energy and structural inspection show that the proposed model outperforms the baseline models across all few-shot settings.

Keywords: few-shot; industrial databases; question answering; text-to-SQL

Assessing the utility of text-to-SQL approaches for satisfying software developer information needs

EMPIRICAL SOFTWARE ENGINEERING, 2024, Vol. 29, No. 1, pp. 15-15

Authors: Tomova, Mihaela; Hofmann, Martin; Huetterer, Constantin; Maeder, Patrick

Affiliations: Tech Univ Ilmenau, D-98693 Ilmenau, Germany; Friedrich Schiller Univ, Fac Biol Sci, D-07743 Jena, Germany

Software analytics integrated with complex databases can deliver project intelligence into the hands of software engineering (SE) experts for satisfying their information needs. A new and promising machine learning technique known as text-to-SQL automatically extracts information for users of complex databases without the need to fully understand the database structure or the accompanying query language. Users pose their request as a so-called natural language utterance, i.e., a question. Our goal was to evaluate the performance and applicability of text-to-SQL approaches on data derived from tools typically used in the workflow of software engineers for satisfying their information needs. We carefully selected and discussed five seminal as well as state-of-the-art text-to-SQL approaches and conducted a comparative assessment using the large-scale, cross-domain Spider dataset and the SE domain-specific SEOSS-Queries dataset. Furthermore, we study via a survey how SE professionals perform in satisfying their information needs and how they perceive text-to-SQL approaches. For the best performing approach, we observe a high accuracy of 94% in query prediction when training specifically on SE data. This accuracy is almost independent of the query's complexity. At the same time, we observe that SE professionals have substantial deficits in satisfying their information needs directly via SQL queries. Furthermore, SE professionals are open to utilizing text-to-SQL approaches in their daily work, considering them helpful and less time-consuming. We conclude that state-of-the-art text-to-SQL approaches are applicable in SE practice for day-to-day information needs.

Keywords: Software analytics; Database querying; Natural language processing; text-to-SQL; Machine learning; Complex queries

UniSAr: a unified structure-aware autoregressive language model for text-to-SQL semantic parsing

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, Vol. 14, No. 12, pp. 4361-4376

Authors: Dou, Longxu; Gao, Yan; Pan, Mingyang; Wang, Dingzirui; Che, Wanxiang; Lou, Jian-Guang; Zhan, Dechen

Affiliations: Harbin Inst Technol, Harbin 150001, Heilongjiang, Peoples R China; Microsoft Res Asia, Beijing 100089, Peoples R China

Existing text-to-SQL semantic parsers are typically designed for particular settings, such as handling queries that span multiple tables, domains, or turns, which makes them ineffective when applied to different settings. We present UniSAr (Unified Structure-Aware Autoregressive Language Model), which benefits from directly using an off-the-shelf language model architecture and demonstrates consistently high performance under different settings. Specifically, UniSAr extends existing autoregressive language models with two non-invasive extensions that make them structure-aware: (1) adding structure marks to encode the database schema, conversation context, and their relationships; (2) constrained decoding to decode well-structured SQL for a given database schema. On seven well-known text-to-SQL datasets covering multi-domain, multi-table, and multi-turn settings, UniSAr demonstrates highly comparable or better performance relative to the most advanced specifically designed text-to-SQL models.

Keywords: text-to-SQL; Semantic parsing; Natural language interfaces to databases; Natural language processing; Constrained decoding

A survey on deep learning approaches for text-to-SQL

VLDB JOURNAL, 2023, Vol. 32, No. 4, pp. 905-936

Authors: Katsogiannis-Meimarakis, George; Koutrika, Georgia

Affiliations: Athena Res Ctr, Athens, Greece

To bridge the gap between users and data, numerous text-to-SQL systems have been developed that allow users to pose natural language questions over relational databases. Recently, novel text-to-SQL systems are adopting deep learning methods with very promising results. At the same time, several challenges remain open, making this area an active and flourishing field of research and development. To make real progress in building text-to-SQL systems, we need to de-mystify what has been done, understand how and when each approach can be used, and, finally, identify the research challenges ahead of us. The purpose of this survey is to present a detailed taxonomy of neural text-to-SQL systems that will enable a deeper study of all the parts of such a system. This taxonomy will allow us to make a better comparison between different approaches, as well as highlight specific challenges in each step of the process, thus enabling researchers to better strategise their quest towards the "holy grail" of database accessibility.

Keywords: text-to-SQL; Deep learning; Natural language processing; Natural language interface for databases

Bravely Say I Don't Know: Relational Question-Schema Graph for Text-to-SQL Answerability Classification

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, Vol. 22, No. 4, pp. 1-18

Authors: Yu, Wei; Yang, Haiyan; Wang, Mengzhu; Wang, Xiaodong

Affiliations: 30th Res Inst China Elect Technol Grp Corp, Chuangye Rd, Chengdu 610041, Peoples R China; Sichuan Minzu Coll, Wenhua Rd, Kangding, Peoples R China; Natl Univ Def Technol, Deya Rd, Changsha, Peoples R China

Recently, the text-to-SQL task has received much attention. Many sophisticated neural models have been invented that achieve significant results. Most current work assumes that all the inputs are legal and the model should generate an SQL query for any input. However, in a real scenario, users are allowed to enter arbitrary text that may not be answerable by an SQL query. In this article, we focus on the issue of answerability classification for the text-to-SQL system, which aims to determine the answerability of the question according to the given database schema. Existing methods concatenate the question and the database schema into a sentence, then fine-tune the pre-trained language model on the answerability classification task. In this way, the database schema is regarded as sequence text, which may ignore the intrinsic structural relationships of the schema data, and the attention that represents the correlation between the question tokens and the database schema items is not well designed. To this end, we propose a relational Question-Schema graph framework that can effectively model the attention and relation between question and schema. In addition, a conditional layer normalization mechanism is employed to modulate the pre-trained language model to generate a better question representation. Experiments demonstrate that the proposed framework outperforms all existing models by large margins, achieving a new state of the art on the benchmark TriageSQL. Specifically, the model attains 88.41%, 78.24%, and 75.98% in Precision, Recall, and F1, respectively. Additionally, it outperforms the baseline by approximately 4.05% in Precision, 6.96% in Recall, and 6.01% in F1.

Keywords: text-to-SQL; answerability classification; relational graph

Finetuning LLMs for Text-to-SQL with Two-Stage Progressive Learning

13th International Conference on Natural Language Processing and Chinese Computing

Authors: Ling, Xiao; Liu, Jialin; Liu, Jindu; Wu, Jianhua; Liu, Jie

Affiliations: Nankai Univ, Coll Artificial Intelligence, Engn Res Ctr Trusted Behav Intelligence, Natl Key Lab Intelligent Tracking & Forecasting I, Tianjin, Peoples R China

ISBN: (print) 9789819794331; 9789819794348

With the widespread usage of large language models (LLMs), LLM-based methods have become the mainstream approach for text-to-SQL tasks, achieving leading performance on text-to-SQL leaderboards. However, generating complex SQL queries correctly has always been a main challenge. Current LLM-based models primarily utilize prompting-based methods on large-scale closed-source LLMs (e.g., GPT-4 and ChatGPT), which may raise concerns about usage costs and data privacy. For fine-tuning based methods, it is difficult to generate complex SQL accurately in only one fine-tuning step. Focusing on this, we propose TSP-SQL, a Two-Stage Progressive learning method for text-to-SQL. TSP-SQL decomposes the text-to-SQL task into two stages: an SQL elements generation auxiliary task, and an SQL query generation main task. The two tasks are progressively fine-tuned on a single model, effectively reducing the difficulty of SQL generation and improving accuracy. TSP-SQL achieves state-of-the-art performance among open-source fine-tuning based methods on the Spider dev set, and surpasses most of the methods based on large-scale closed-source LLMs.

Keywords: text-to-SQL; Large Language Models; Progressive Learning

Towards Text-to-SQL over Aggregate Tables

Data Intelligence, 2023, Vol. 5, No. 2, pp. 457-474

Authors: Shuqin Li; Kaibin Zhou; Zeyang Zhuang; Haofen Wang; Jun Ma

Affiliations: College of Design and Innovation, Tongji University, Shanghai 200092; School of Software, Tongji University, Shanghai 201804; School of Automotive Studies, Tongji University, Shanghai 201804

Text-to-SQL aims at translating textual questions into the corresponding SQL *** tables are widely created for high-frequent *** text-to-SQL has emerged as an important task, recent studies paid little attention to the task over aggregate *** increased aggregate tables bring two challenges: (1) mapping of natural language questions and relational databases will suffer from more ambiguity, (2) modern models usually adopt self-attention mechanism to encode database schema and *** mechanism is of quadratic time complexity, which will make inferring more time-consuming as input sequence length *** this paper, we introduce a novel approach named WAGG for text-to-SQL over aggregate *** effectively select among ambiguous items, we propose a relation selection mechanism for relation *** deal with high computation costs, we introduce a dynamical pruning strategy to discard unrelated items that are common for aggregate *** also construct a new large-scale dataset SpiderwAGG extended from Spider dataset for validation, where extensive experiments show the effectiveness and efficiency of our proposed method with 4% increase of accuracy and 15% decrease of inference time w.r.t. a strong baseline RAT-SQL.

Keywords: text-to-SQL; Question Answering; Business Intelligence; Deep Learning

Small, Medium, and Large Language Models for Text-to-SQL

43rd International Conference on Conceptual Modeling

Authors: Oliveira, Aiko; Nascimento, Eduardo; Pinheiro, Joao; Avila, Caio Viktor S.; Coelho, Gustavo; Feijo, Lucas; Izquierdo, Yenier; Garcia, Grettel; Paes Leme, Luiz Andre P.; Lemos, Melissa; Casanova, Marco A.

Affiliations: Pontificia Univ Catolica Rio de Janeiro, Tecgraf Inst, BR-22451900 Rio de Janeiro, RJ, Brazil; Pontificia Univ Catolica Rio de Janeiro, Dept Informat, BR-22451900 Rio de Janeiro, RJ, Brazil; Univ Fed Fluminense, Inst Comp, BR-24210310 Niteroi, RJ, Brazil; Univ Fed Ceara UFC, BR-60440900 Fortaleza, Ceara, Brazil

ISBN: (print) 9783031758713; 9783031758720

This paper investigates how the model size affects the ability of a Generative AI Language Model, or briefly a GLM, to support the text-to-SQL task for databases with large schemas typical of real-world applications. The paper first introduces a text-to-SQL framework that combines a prompt strategy and a Retrieval-Augmented Generation (RAG) technique, leaving as flexibilization points the GLM and the database. Then, it describes a benchmark based on an open-source database featuring a schema much larger than the schemas of most of the databases in familiar text-to-SQL benchmarks. The paper proceeds with experiments to assess the performance of the text-to-SQL framework instantiated with the benchmark database and GLMs of different sizes. The paper concludes with recommendations to help select which GLM size is appropriate for a text-to-SQL scenario, characterized by the difficulty of the expected NL questions and the data privacy requirements, among other characteristics.

Keywords: text-to-SQL; Generative AI Language Model; Retrieval-Augmented Generation; Prompt engineering; Real-World Databases
