检索结果-内蒙古大学图书馆

xDBTagger: explainable natural language interface to databases using keyword mappings and schema graph

VLDB JOURNAL 2023年第2期33卷 301-321页

作者： Usta, Arif Karakayali, Akifhan Ulusoy, Ozgur Univ Waterloo Waterloo ON Canada Cent Bank Republ Turkiye Ankara Turkiye Bilkent Univ Ankara Turkiye

Recently, numerous studies have been proposed to attack the natural language interfaces to data-bases (NLIDB) problem by researchers either as a conventional pipeline-based or an end-to-end deep-learning-based solution. Although each approach has its own advantages and drawbacks, regardless of the approach preferred, both approaches exhibit black-box nature, which makes it difficult for potential users to comprehend the rationale behind the decisions made by the intelligent system to produce the translated SQL. Given that NLIDB targets users with little to no technical background, having interpretable and explainable solutions becomes crucial, which has been overlooked in the recent studies. To this end, we propose xDBTagger, an explainable hybrid translation pipeline that explains the decisions made along the way to the user both textually and visually. We also evaluate xDBTagger quantitatively in three real-world relational databases. The evaluation results indicate that in addition to being lightweight, fast, and fully explainable, xDBTagger is also competitive in terms of translation accuracy compared to both pipeline-based and end-to-end deep learning approaches.

关键词： natural language interface for databases NLIDB Text-to-SQL Multi-task learning Explainable artificial intelligence XAI

来源：评论

学校读者我要写书评

暂无评论

Reliable Text-to-SQL with Adaptive Abstention

引用

Proceedings of the ACM on Management of Data 2025年第1期3卷 1-30页

作者： Kaiwen Chen Yueting Chen Nick Koudas Xiaohui Yu University of Toronto Toronto ON Canada Seattle University Seattle WA USA York University Toronto ON Canada

Large language models (LLMs) have revolutionized natural language interfaces for databases, particularly in text-to-SQL conversion. However, current approaches often generate unreliable outputs when faced with ambiguity or insufficient *** present Reliable Text-to-SQL (RTS), a novel framework that enhances query generation reliability by incorporating abstention and human-in-the-loop mechanisms. RTS focuses on the critical schema linking phase, which aims to identify the key database elements needed for generating SQL queries. It autonomously detects potential errors during the answer generation process and responds by either abstaining or engaging in user interaction. A vital component of RTS is the Branching Point Prediction (BPP) which utilizes statistical conformal techniques on the hidden layers of the LLM model for schema linking, providing probabilistic guarantees on schema linking *** validate our approach through comprehensive experiments on the BIRD benchmark, demonstrating significant improvements in robustness and reliability. Our findings highlight the potential of combining transparent-box LLMs with human-in-the-loop processes to create more robust natural language interfaces for databases. For the BIRD benchmark, our approach achieves near-perfect schema linking accuracy, autonomously involving a human when needed. Combined with query generation, we demonstrate that near-perfect schema linking and a small query generation model can almost match SOTA accuracy achieved with a model orders of magnitude larger than the one we use.

关键词： natural language interface for databases reliable generation test to sql

来源：评论

学校读者我要写书评

暂无评论

A survey on deep learning approaches for text-to-SQL

引用

VLDB JOURNAL 2023年第4期32卷 905-936页

作者： Katsogiannis-Meimarakis, George Koutrika, Georgia Athena Res Ctr Athens Greece

To bridge the gap between users and data, numerous text-to-SQL systems have been developed that allow users to pose natural language questions over relational databases. Recently, novel text-to-SQL systems are adopting deep learning methods with very promising results. At the same time, several challenges remain open making this area an active and flourishing field of research and development. To make real progress in building text-to-SQL systems, we need to de-mystify what has been done, understand how and when each approach can be used, and, finally, identify the research challenges ahead of us. The purpose of this survey is to present a detailed taxonomy of neural text-to-SQL systems that will enable a deeper study of all the parts of such a system. This taxonomy will allow us to make a better comparison between different approaches, as well as highlight specific challenges in each step of the process, thus enabling researchers to better strategise their quest towards the "holy grail" of database accessibility.

关键词： Text-to-SQL Deep learning natural language processing natural language interface for databases

来源：评论

学校读者我要写书评

暂无评论

CodeS: Towards Building Open-source language Models for Text-to-SQL

引用

Proceedings of the ACM on Management of Data 2024年第3期2卷 1-28页

作者： Haoyang Li Jing Zhang Hanbing Liu Ju Fan Xiaokang Zhang Jun Zhu Renjie Wei Hongyan Pan Cuiping Li Hong Chen Renmin University of China Beijing China BEIJING AI-FINANCE TECHNOLOGIES CO. LTD Beijing China

language models have shown promising performance on the task of translating natural language questions into SQL queries (Text-to-SQL). However, most of the state-of-the-art (SOTA) approaches rely on powerful yet closed-source large language models (LLMs), such as ChatGPT and GPT-4, which may have the limitations of unclear model architectures, data privacy risks, and expensive inference overheads. To address the limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task. CodeS is a fully open-source language model, which achieves superior accuracy with much smaller parameter sizes. This paper studies the research challenges in building CodeS. To enhance the SQL generation abilities of CodeS, we adopt an incremental pre-training approach using a specifically curated SQL-centric corpus. Based on this, we address the challenges of schema linking and rapid domain adaptation through strategic prompt construction and a bi-directional data augmentation technique. We conduct comprehensive evaluations on multiple datasets, including the widely used Spider benchmark, the newly released BIRD benchmark, robustness-diagnostic benchmarks such as Spider-DK, Spider-Syn, Spider-Realistic, and ***, as well as two real-world datasets created for financial and academic applications. The experimental results show that our CodeS achieves new SOTA accuracy and robustness on nearly all challenging text-to-SQL benchmarks.

关键词： language model natural language interface for databases text-to-SQL

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：