Search Results - Inner Mongolia University Library

A Multiple-Integration Encoder for Multi-Turn Text-to-SQL Semantic Parsing

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, vol. 29, pp. 1503-1513

Authors: Wang, Run-Ze; Ling, Zhen-Hua; Zhou, Jing-Bo; Hu, Yu. Affiliations: Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat, Hefei 230027, Peoples R China; Baidu Res, Business Intelligence Lab, Beijing 100084, Peoples R China; Univ Sci & Technol China, Hefei 230027, Peoples R China; iFLYTEK Res, Hefei 230088, Peoples R China

This paper studies multi-turn text-to-SQL generation, which is a new but important task in semantic parsing. In order to deal with its two challenges, i.e., multi-turn interaction and cross-domain evaluation, this paper proposes a multiple-integration encoder, which derives the vector representations of user utterances and database schemas using three custom-designed modules for information integration. First, an utterance representation enhancing module is built to integrate the information of history utterances into the representation of each token in the current utterance by attentive selection. Second, a schema discrepancy enhancing module is designed to integrate the previously predicted SQL query into the representation of schema items. Third, a latent schema linking module is employed to integrate schema information into utterance representations for better dealing with unseen database schemas. These three modules are all implemented based on a lightweight multi-head attention mechanism, which reduces the number of parameters in conventional multi-head attention. Experimental results on the SParC dataset show that our method achieved better accuracy of multi-turn text-to-SQL generation than the most advanced benchmarks. Further ablation studies and analysis also demonstrate the effectiveness of the three modules designed for information integration in the encoder.

Keywords: Task analysis; Structured Query Language; Decoding; Databases; Semantics; History; Bit error rate; text-to-SQL; cross-domain; multi-turn; encoder-decoder; lightweight multi-head attention
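
Note on "attentive selection": the abstract above does not give the details of the lightweight multi-head attention, so the following is only a minimal sketch of plain single-head scaled dot-product attention, shown to illustrate how history-utterance tokens could be weighted and folded into the representation of a current-utterance token. The function names, the toy two-dimensional vectors, and the choice to concatenate the context vector are all assumptions made for illustration, not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Returns the weighted sum of `values` and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Toy vectors (hypothetical): one token of the current utterance and
# three tokens taken from earlier utterances in the interaction.
current_token = [1.0, 0.0]
history_tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

context, weights = attend(current_token, history_tokens, history_tokens)

# Assumed integration step: concatenate the token with its history context.
enhanced = current_token + context
print(weights, enhanced)
```

The history token most similar to the current token receives the largest weight, which is the selection behaviour the abstract describes at a high level.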

A survey on deep learning approaches for text-to-SQL

VLDB JOURNAL, 2023, vol. 32, no. 4, pp. 905-936

Authors: Katsogiannis-Meimarakis, George; Koutrika, Georgia. Affiliation: Athena Res Ctr, Athens, Greece

To bridge the gap between users and data, numerous text-to-SQL systems have been developed that allow users to pose natural language questions over relational databases. Recently, novel text-to-SQL systems are adopting deep learning methods with very promising results. At the same time, several challenges remain open making this area an active and flourishing field of research and development. To make real progress in building text-to-SQL systems, we need to de-mystify what has been done, understand how and when each approach can be used, and, finally, identify the research challenges ahead of us. The purpose of this survey is to present a detailed taxonomy of neural text-to-SQL systems that will enable a deeper study of all the parts of such a system. This taxonomy will allow us to make a better comparison between different approaches, as well as highlight specific challenges in each step of the process, thus enabling researchers to better strategise their quest towards the "holy grail" of database accessibility.

Keywords: text-to-SQL; Deep learning; Natural language processing; Natural language interface for databases

A Comprehensive Exploration on Spider with Fuzzy Decision Text-to-SQL Model

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, vol. 16, no. 4, pp. 2542-2550

Authors: Li, Qing; Li, Lili; Li, Qi; Zhong, Jiang. Affiliations: Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China; Chongqing Univ, Sch Civil Engn, Chongqing 400044, Peoples R China; Shaoxing Univ, Dept Comp Sci & Engn, Shaoxing 312000, Peoples R China

A central challenge of natural language processing is translating natural language into a logical form (SQL). In this article, we present a fuzzy semantic to structured query language (F-SemtoSql) neural approach that is a fuzzy decision semantic deep network query model based on demand aggregation. It aims to address the problem of the complex and cross-domain text-to-SQL generation task. The corpus is trained as the input word vector of the model with LSTM and Word2Vec embedding technology. Combined with the dependency graph method, the problem of SQL statement generation is converted to slot filling. Complex tasks are divided into four levels via F-SemtoSql and constructed by the need of aggregation. At the same time, to effectively avoid the order problem in the traditional model, we have adopted the attention mechanism and used a fuzzy decision mechanism to improve the model decision. On the challenging text-to-SQL benchmark Spider and the other three datasets, F-SemtoSql achieves faster convergence and occupies the first position.

Keywords: Fuzzy decision; fuzzy semantic deep network; natural language processing (NLP); text-to-SQL

Assessing the utility of text-to-SQL approaches for satisfying software developer information needs

EMPIRICAL SOFTWARE ENGINEERING, 2024, vol. 29, no. 1, pp. 1-48

Authors: Tomova, Mihaela; Hofmann, Martin; Huetterer, Constantin; Maeder, Patrick. Affiliations: Tech Univ Ilmenau, D-98693 Ilmenau, Germany; Friedrich Schiller Univ, Fac Biol Sci, D-07743 Jena, Germany

Software analytics integrated with complex databases can deliver project intelligence into the hands of software engineering (SE) experts for satisfying their information needs. A new and promising machine learning technique known as text-to-SQL automatically extracts information for users of complex databases without the need to fully understand the database structure or the accompanying query language. Users pose their request as a so-called natural language utterance, i.e., a question. Our goal was to evaluate the performance and applicability of text-to-SQL approaches on data derived from tools typically used in the workflow of software engineers for satisfying their information needs. We carefully selected and discussed five seminal as well as state-of-the-art text-to-SQL approaches and conducted a comparative assessment using the large-scale, cross-domain Spider dataset and the SE domain-specific SEOSS-Queries dataset. Furthermore, we study via a survey how SE professionals perform in satisfying their information needs and how they perceive text-to-SQL approaches. For the best performing approach, we observe a high accuracy of 94% in query prediction when training specifically on SE data. This accuracy is almost independent of the query's complexity. At the same time, we observe that SE professionals have substantial deficits in satisfying their information needs directly via SQL queries. Furthermore, SE professionals are open to utilizing text-to-SQL approaches in their daily work, considering them helpful and less time-consuming. We conclude that state-of-the-art text-to-SQL approaches are applicable in SE practice for day-to-day information needs.

Keywords: Software analytics; Database querying; Natural language processing; text-to-SQL; Machine learning; Complex queries

Bravely Say I Don't Know: Relational Question-Schema Graph for Text-to-SQL Answerability Classification

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, vol. 22, no. 4, pp. 1-18

Authors: Yu, Wei; Yang, Haiyan; Wang, Mengzhu; Wang, Xiaodong. Affiliations: 30th Res Inst China Elect Technol Grp Corp, Chuangye Rd, Chengdu 610041, Peoples R China; Sichuan Minzu Coll, Wenhua Rd, Kangding, Peoples R China; Natl Univ Def Technol, Deya Rd, Changsha, Peoples R China

Recently, the text-to-SQL task has received much attention. Many sophisticated neural models have been invented that achieve significant results. Most current work assumes that all the inputs are legal and the model should generate an SQL query for any input. However, in the real scenario, users may enter arbitrary text that cannot be answered by an SQL query. In this article, we focus on answerability classification for the text-to-SQL system, which aims to distinguish the answerability of the question according to the given database schema. Existing methods concatenate the question and the database schema into a sentence, then fine-tune the pre-trained language model on the answerability classification task. In this way, the database schema is regarded as sequence text that may ignore the intrinsic structure relationship of the schema data, and the attention that represents the correlation between the question token and the database schema items is not well designed. To this end, we propose a relational Question-Schema graph framework that can effectively model the attention and relation between question and schema. In addition, a conditional layer normalization mechanism is employed to modulate the pre-trained language model to generate better question representation. Experiments demonstrate that the proposed framework outperforms all existing models by large margins, achieving new state of the art on the benchmark TRIAGESQL. Specifically, the model attains 88.41%, 78.24%, and 75.98% in Precision, Recall, and F1, respectively. Additionally, it outperforms the baseline by approximately 4.05% in Precision, 6.96% in Recall, and 6.01% in F1.

Keywords: text-to-SQL; answerability classification; relational graph

ER-SQL: Learning enhanced representation for text-to-SQL using table contents

NEUROCOMPUTING, 2021, vol. 465, pp. 359-370

Authors: Guo, Aibo; Zhao, Xiang; Ma, Wubin. Affiliation: Natl Univ Def Technol, Deya Rd 109, Changsha, Hunan, Peoples R China

Text-to-SQL emerges to play an important role in interactive data analysis, which provides a friendly interface for converting natural language into relational database language (i.e., SQL). In order to translate a user's query into an executable SQL statement, semantic parsing is essential to the transformation process. In particular, existing efforts provide some feasible solutions, and state-of-the-art models mainly adopt the sketch-based paradigm such that template values are to be filled. To this end, most methods extract values based on column representations. However, if the query contains multiple values that belong to different columns, these methods may fail to extract the values accurately. Moreover, it can be difficult to infer the right values when the query does not explicitly mention the corresponding column names. To bridge the gap, we propose a novel neural architecture, namely, ER-SQL, for learning enhanced representations for text-to-SQL. Based on the pre-trained model BERT, ER-SQL uses column contents to better extract features of columns. Moreover, ER-SQL harnesses the column representations to latently reformulate the query. Comprehensive experiments demonstrate that ER-SQL achieves better results than existing models on the benchmark dataset WikiSQL, as well as on a representative Chinese dataset TableQA. (C) 2021 The Author(s). Published by Elsevier B.V.

Keywords: text-to-SQL; Semantic parsing; Pre-trained language model; Enhanced representation

Towards Text-to-SQL over Aggregate Tables

Data Intelligence, 2023, vol. 5, no. 2, pp. 457-474

Authors: Shuqin Li; Kaibin Zhou; Zeyang Zhuang; Haofen Wang; Jun Ma. Affiliations: College of Design and Innovation, Tongji University, Shanghai 200092; School of Software, Tongji University, Shanghai 201804; School of Automotive Studies, Tongji University, Shanghai 201804

Text-to-SQL aims at translating textual questions into the corresponding SQL *** tables are widely created for high-frequent *** text-to-SQL has emerged as an important task, recent studies paid little attention to the task over aggregate *** increased aggregate tables bring two challenges: (1) mapping of natural language questions and relational databases will suffer from more ambiguity, (2) modern models usually adopt self-attention mechanism to encode database schema and *** mechanism is of quadratic time complexity, which will make inferring more time-consuming as input sequence length *** this paper, we introduce a novel approach named WAGG for text-to-SQL over aggregate *** effectively select among ambiguous items, we propose a relation selection mechanism for relation *** deal with high computation costs, we introduce a dynamical pruning strategy to discard unrelated items that are common for aggregate *** also construct a new large-scale dataset SpiderwAGG extended from Spider dataset for validation, where extensive experiments show the effectiveness and efficiency of our proposed method with 4% increase of accuracy and 15% decrease of inference time w.r.t. a strong baseline RAT-SQL.

Keywords: text-to-SQL; Question Answering; Business Intelligence; Deep Learning

N-BEST HYPOTHESES RERANKING FOR TEXT-TO-SQL SYSTEMS

IEEE Spoken Language Technology Workshop (SLT)

Authors: Zeng, Lu; Parthasarathi, Sree Hari Krishnan; Hakkani-Tur, Dilek. Affiliation: Amazon Alexa, Seattle, WA 98109, USA

ISBN: (print) 9798350396904

The text-to-SQL task maps natural language utterances to structured queries that can be issued to a database. State-of-the-art (SOTA) systems rely on finetuning large, pre-trained language models in conjunction with constrained decoding applying a SQL parser. On the well-established Spider dataset, we begin with Oracle studies: specifically, choosing an Oracle hypothesis from a SOTA model's 10-best list yields a 7.7% absolute improvement in both exact match (EM) and execution (EX) accuracy, showing significant potential improvements with reranking. Identifying coherence and correctness as reranking approaches, we design a model generating a query plan and propose a heuristic schema linking algorithm. Combining both approaches, with T5-Large, we obtain a consistent 1% improvement in EM accuracy, and a 2.5% improvement in EX, establishing a new SOTA for this task. Our comprehensive error studies on DEV data show the underlying difficulty in making progress on this task.

Keywords: text-to-SQL; Semantic parsing
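
Note on the oracle study: the gap between top-1 accuracy and oracle accuracy over an n-best list is what bounds the gain available from reranking. The sketch below illustrates that calculation on made-up data. The helper names and the four toy examples are hypothetical, and normalized string equality is used here only as a stand-in for the exact match and execution metrics named in the abstract.

```python
def normalize(sql):
    """Lowercase and collapse whitespace (a crude stand-in for a real matcher)."""
    return " ".join(sql.lower().split())


def top1_accuracy(nbest_lists, golds):
    """Fraction of examples whose first hypothesis matches the gold query."""
    hits = sum(
        1
        for hyps, gold in zip(nbest_lists, golds)
        if hyps and normalize(hyps[0]) == normalize(gold)
    )
    return hits / len(golds)


def oracle_accuracy(nbest_lists, golds):
    """Fraction of examples where any hypothesis in the list matches the gold query."""
    hits = sum(
        1
        for hyps, gold in zip(nbest_lists, golds)
        if normalize(gold) in {normalize(h) for h in hyps}
    )
    return hits / len(golds)


# Made-up n-best lists and gold queries, for illustration only.
nbest_lists = [
    ["SELECT name FROM singer", "SELECT name FROM song"],
    ["SELECT count(*) FROM song", "SELECT count(*) FROM singer"],
    ["SELECT age FROM singer"],
    ["SELECT name FROM song"],
]
golds = [
    "select name from singer",
    "SELECT count(*) FROM singer",
    "SELECT age FROM singer",
    "SELECT title FROM song",
]

headroom = oracle_accuracy(nbest_lists, golds) - top1_accuracy(nbest_lists, golds)
print(f"reranking headroom: {headroom:.2f}")
```

A perfect reranker would recover exactly this headroom; any practical reranker, such as the one the abstract describes, recovers only part of it.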

SQL-to-Schema Enhances Schema Linking in Text-to-SQL

35th International Conference on Database and Expert Systems Applications (DEXA)

Authors: Yang, Sun; Su, Qiong; Li, Zhishuai; Li, Ziyue; Mao, Hangyu; Liu, Chenxi; Zhao, Rui. Affiliations: Peking Univ, Beijing, Peoples R China; Guizhou Univ, Guiyang, Guizhou, Peoples R China; SenseTime Res, Shanghai, Peoples R China; Nanyang Technol Univ, Singapore, Singapore

ISBN: (print) 9783031683084; 9783031683091

Sophisticated text-to-SQL methods often face errors, such as schema-linking errors, join errors, nested errors, and group-by errors. To mitigate these, it's crucial to filter out unnecessary tables and columns, focusing the language model on relevant ones. Previous methods have attempted to sort tables and columns based on relevance or directly identify necessary elements, but these approaches suffer from long training times, high costs with GPT-4 tokens, or poor schema linking performance. We propose a two-step schema linking method: first, generate an initial SQL query using the full database schema; then, extract the relevant tables and columns to form a concise schema. This method, tested with Code Llama and GPT-4, shows optimal performance compared to mainstream methods on the Spider dataset, reducing errors and improving efficiency in SQL generation.

Keywords: text-to-SQL; Schema Linking; Large Language Model
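
Note on the two-step method: the second step, extracting the tables and columns that an initial SQL query mentions, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the function name, the toy schema, and the hard-coded initial query (which stands in for the output of step one, where a language model sees the full schema). The matching is a simple identifier lookup that does not resolve aliases, so it is not the authors' extraction procedure.

```python
import re


def extract_concise_schema(sql, full_schema):
    """Keep only the tables, and their columns, whose names appear in the SQL text.

    Limitation of this sketch: a column name is kept for every retained
    table that declares it, because table aliases are not resolved.
    """
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql.lower()))
    concise = {}
    for table, columns in full_schema.items():
        if table.lower() not in tokens:
            continue  # table never mentioned in the initial query
        concise[table] = [col for col in columns if col.lower() in tokens]
    return concise


# Hypothetical full schema: table name -> column names.
full_schema = {
    "singer": ["singer_id", "name", "age", "country"],
    "concert": ["concert_id", "singer_id", "venue", "year"],
    "stadium": ["stadium_id", "location", "capacity"],
}

# Stand-in for the step-one output (an initial query generated from the full schema).
initial_sql = (
    "SELECT T1.name FROM singer AS T1 "
    "JOIN concert AS T2 ON T1.singer_id = T2.singer_id "
    "WHERE T2.year = 2014"
)

concise = extract_concise_schema(initial_sql, full_schema)
print(concise)
```

The concise schema would then replace the full schema in the prompt for the final generation pass, which is where the abstract's claimed reduction in errors and cost would come from.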

Graph Reasoning Enhanced Language Models for Text-to-SQL

47th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)

Authors: Gong, Zheng; Sun, Ying. Affiliation: Hong Kong Univ Sci & Technol, Thrust Artificial Intelligence, Guangzhou, Guangdong, Peoples R China

ISBN: (print) 9798400704314

Text-to-SQL parsing has attracted substantial attention recently due to its potential to remove barriers for non-expert end users interacting with databases. A key challenge in text-to-SQL parsing is developing effective encoding mechanisms to capture the complex relationships between question words, database schemas, and their associated connections within the heterogeneous graph structure. Existing approaches typically introduce some useful multi-hop structures manually and then incorporate them into graph neural networks (GNNs) by stacking multiple layers, which (1) ignore the difficult-to-identify but meaningful semantics embedded in the multi-hop reasoning path, and (2) are limited by the expressive capability of GNN to capture long-range dependencies among the heterogeneous graph. To address these shortcomings, we introduce GRL-SQL, a graph reasoning enhanced language model, which innovatively applies structure encoding to capture the dependencies between node pairs, encompassing one-hop, multi-hop and distance information, subsequently enriched through self-attention for enhanced representational power over GNNs. Furthermore, GRL-SQL incorporates an interaction module that enables joint reasoning and fusion over the question-schema representations for enhancing global context modeling. Comprehensive experiments demonstrate the effectiveness and robustness of our proposed GRL-SQL.

Keywords: text-to-SQL; Graph neural network; Language model
