Stakeholders of software development projects have various information needs for making rational decisions during their daily work. Satisfying these needs requires substantial knowledge of where and how the relevant information is stored and consumes valuable time that is often not available. Easing the need for this knowledge is an ideal text-to-SQL benchmark problem, a field where public datasets are scarce and needed. We propose the SEOSSQueries dataset consisting of natural language utterances and accompanying SQL queries extracted from previous studies, software projects, issue tracking tools, and through expert surveys to cover a large variety of information need perspectives. Our dataset consists of 1,162 English utterances translating into 166 SQL queries; each query has four precise utterances and three more general ones. Furthermore, the dataset contains 393,086 labeled utterances extracted from issue tracker comments. We provide pre-trained SQLNet and RatSQL baseline models for benchmark comparisons, a replication package facilitating a seamless application, and discuss various other tasks that may be solved and evaluated using the dataset. The whole dataset with paraphrased natural language utterances and SQL queries is hosted at ***/s/75ed49ef01ac2f83b3e2. (C) 2022 The Authors. Published by Elsevier Inc.
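The stated sizes are consistent with each other: 166 queries with four precise and three general utterances each give 1,162 utterances. A minimal sketch of that bookkeeping follows; the `QueryEntry` record layout and the `is_well_formed` helper are hypothetical illustrations, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

# Counts taken from the abstract.
N_QUERIES = 166
PRECISE_PER_QUERY = 4
GENERAL_PER_QUERY = 3


@dataclass
class QueryEntry:
    """Hypothetical record: one SQL query with its paraphrased utterances."""
    sql: str
    precise: List[str]
    general: List[str]


def is_well_formed(entry: QueryEntry) -> bool:
    """Check that an entry carries the per-query utterance counts the abstract states."""
    return (len(entry.precise) == PRECISE_PER_QUERY
            and len(entry.general) == GENERAL_PER_QUERY)


def expected_total_utterances() -> int:
    """Total number of utterances implied by the per-query counts."""
    return N_QUERIES * (PRECISE_PER_QUERY + GENERAL_PER_QUERY)
```

Calling `expected_total_utterances()` reproduces the 1,162 figure reported above.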
Text-to-SQL, a computational linguistics task, seeks to convert natural language queries into SQL queries. Recent methods combine slot-filling with predetermined SQL templates to bridge the semantic gap between natural language questions and structured database queries, and achieve commendable performance through multi-task learning. However, using identical features across diverse tasks is ill-suited to the problem. Firstly, based on our observation, there are clear boundaries in the natural language corresponding to SELECT and WHERE clauses. Secondly, the features exclusive to each subtask are inadequately emphasized and underutilized, which hampers the acquisition of discriminative features for each specific subtask. To rectify these issues, the present work introduces the hierarchical feature decoupling model for SQL query generation from natural language. This approach deliberately separates the features pertaining to subtasks within the SELECT and WHERE clauses, and further dissociates these features at the subtask level to improve model performance. Experiments on the WikiSQL benchmark dataset show that the proposed approach outperforms several state-of-the-art baseline methods in text-to-SQL query generation.
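To make the slot-filling idea concrete, the following minimal sketch renders a WikiSQL-style template of the form SELECT [aggregator] column WHERE column operator value from already predicted slots. It illustrates only the template, not the proposed decoupling model; the class names, the `render` function, and the example table are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Slot vocabularies for the sketch (illustrative, WikiSQL-style).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]


@dataclass
class Condition:
    """One WHERE-clause condition: three slots filled by separate subtasks."""
    column: str
    op: str
    value: str


@dataclass
class Sketch:
    """SELECT-clause slots plus zero or more WHERE conditions."""
    select_column: str
    aggregator: str = ""
    conditions: List[Condition] = field(default_factory=list)


def render(sketch: Sketch, table: str) -> str:
    """Fill the fixed SQL template with the predicted slot values."""
    if sketch.aggregator not in AGG_OPS:
        raise ValueError(f"unknown aggregator: {sketch.aggregator!r}")
    select = sketch.select_column
    if sketch.aggregator:
        select = f"{sketch.aggregator}({select})"
    sql = f"SELECT {select} FROM {table}"
    parts = []
    for cond in sketch.conditions:
        if cond.op not in COND_OPS:
            raise ValueError(f"unknown operator: {cond.op!r}")
        # Double single quotes so the value stays a valid SQL string literal.
        value = cond.value.replace("'", "''")
        parts.append(f"{cond.column} {cond.op} '{value}'")
    if parts:
        sql += " WHERE " + " AND ".join(parts)
    return sql


example = Sketch("name", "COUNT", [Condition("country", "=", "France")])
print(render(example, "players"))
```

Each slot (aggregator, selected column, condition column, operator, value) corresponds to one prediction subtask, which is why sharing identical features across all of them can be limiting.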
This study examines how conversational business analytics can bridge the skill gap of end users that hinders traditional self-service analytics. By leveraging generative AI, conversational business analytics enables end users to independently retrieve data, process it, and generate information. Using text-to-SQL as an example, this study proposes theoretical models grounded in expected utility theory to examine two levels of AI support: partial support, where AI translates natural language requests into SQL and the generated information serves directly as the basis for decision-making, and full support, which includes an additional validation step. The models define conditions where AI-driven information generation surpasses human delegation. These conditions underscore the critical interplay between AI accuracy and validation effectiveness as pivotal factors for the successful integration of AI. The findings suggest that partial support is viable when the AI accuracy is sufficiently high. In contrast, full support necessitates both adequate accuracy and robust validation. Insufficient validation impairs decisions, highlighting the need for effective validation techniques to fully leverage conversational business analytics. Moreover, the dependence on user-driven validation introduces additional risks, as its effectiveness is contingent on the user's experience or familiarity with SQL and underlying data structures. This insight challenges conventional validation techniques for AI-generated information and highlights the need to use techniques that reduce the reliance on the technical expertise of end users.
The objective of the Structured Query Language (SQL) project is to transform spoken language into SQL commands that can be executed. Typically, creating models that generate SQL requires paired examples of SQL code and...
Database applications are at the core of most web application systems, such as web-based email, source code repository management, public scientific data repository management, news portals, and publication repositories of various fields. However, the use of these database systems for data and information retrieval is severely limited by the lack of support for processing search queries expressed in natural language (NL). Most web interfaces for databases today only accept search queries entered as some logical combination of keywords or text strings, which restricts the scope and depth of what a web user really wants to search for, even though natural-language-based data and information retrieval has made significant advances in recent years. To overcome, or at least alleviate, this limitation in web information services, we propose in this article an improved neural model for NL querying of databases, based on the existing framework IRNet, in which a Gated Graph Neural Network (GGNN) representation is introduced to encode the database entities and relations. We also represent and use the database values in the prediction model to identify and match table and column names, in order to automatically synthesize a correct SQL statement from a query expressed as an NL sentence. Experiments with a public dataset demonstrate the promising potential of our approach.
Understanding the complexity of the translation of Natural Language (NL) sentences to SQL queries becomes an essential part of the resolution process. The majority of the proposed models either focus on simple queries or suffer when exposed to unseen domains or new schema structures. This is understandable, as most solutions are based on limited datasets or treat the problem from an end-to-end perspective. Our previously proposed model, SQLSketch, which provides an intelligent method for handling complex queries, outperformed all the state-of-the-art models on the GreatSQL dataset. This paper addresses the problem of translating NL sentences to SQL queries effectively by extending our previous SQLSketch model with a type-aware layer, a values classification method, and a compatibility-based module that enhance the quality of the predicted items (SQLSketch-TVC). We evaluate the new model using the Components and Exact matching metrics. The results show that SQLSketch-TVC outperforms the other models on all SQL components and provides a novel way of inferring values from the input question.
Recently, researchers have proposed numerous studies that attack the natural language interfaces to databases (NLIDB) problem with either a conventional pipeline-based or an end-to-end deep-learning-based solution. Although each approach has its own advantages and drawbacks, both exhibit a black-box nature, which makes it difficult for potential users to comprehend the rationale behind the decisions the intelligent system makes to produce the translated SQL. Given that NLIDB targets users with little to no technical background, interpretable and explainable solutions are crucial, yet they have been overlooked in recent studies. To this end, we propose xDBTagger, an explainable hybrid translation pipeline that explains the decisions made along the way to the user both textually and visually. We also evaluate xDBTagger quantitatively on three real-world relational databases. The evaluation results indicate that, in addition to being lightweight, fast, and fully explainable, xDBTagger is also competitive in terms of translation accuracy compared to both pipeline-based and end-to-end deep learning approaches.
A prototype of a question answering (QA) system, called Farseer, for the real-time calculation and dissemination of aggregate statistics is introduced. Using techniques from natural language processing (NLP), machine learning (ML), artificial intelligence (AI) and formal semantics, this framework is capable of correctly interpreting a written request for (aggregate) statistics and subsequently generating appropriate results. It is shown that the framework operates in a way that is independent of a specific statistical domain under consideration, by capturing domain specific information in a knowledge graph that is input to the framework. However, it is also shown that the prototype still has its limitations, lacking statistical disclosure control. Also, searching the knowledge graph is still time-consuming.
As a crucial task in natural language processing, table question answering has garnered significant attention from both the academic and industrial communities. It enables intelligent querying and question answering over structured data by translating natural language into corresponding SQL statements. Recently, there have been notable advancements in the general-domain table question answering task, achieved through prompt learning with large language models. However, in specific domains, where tables often have a higher number of columns and questions tend to be more complex, large language models are prone to generating invalid SQL or NoSQL statements. To address this issue, this paper proposes a novel few-shot table prompt question answering approach. Specifically, we design a prompt template construction strategy for structured SQL generation. It uses prompt templates to restructure the input for each test instance and standardizes the model output, which can enhance the integrity and validity of the generated SQL. Furthermore, this paper introduces a contrastive exemplar selection approach based on the question patterns and formats in domain-specific contexts. This enables the model to quickly retrieve the relevant exemplars and learn the characteristics of a given question. Experimental results on two datasets in the domains of electric energy and structural inspection show that the proposed approach outperforms the baseline models across all comparison settings.
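The general shape of a few-shot prompt built from a fixed template and retrieved exemplars can be sketched as follows. This is not the paper's method: simple token-overlap (Jaccard) similarity stands in for the contrastive exemplar selection, and the template wording, function names, schema string, and example questions are all assumptions made for illustration.

```python
from typing import List, Tuple

Exemplar = Tuple[str, str]  # (question, SQL) pair


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two questions (stand-in for a learned selector)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_exemplars(question: str, pool: List[Exemplar], k: int = 2) -> List[Exemplar]:
    """Return the k exemplars whose questions are most similar to the input question."""
    ranked = sorted(pool, key=lambda ex: jaccard(question, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(schema: str, question: str, pool: List[Exemplar], k: int = 2) -> str:
    """Restructure the input into a fixed template ending in an open 'SQL:' slot."""
    lines = ["Translate the question into one SQL statement.", f"Schema: {schema}", ""]
    for ex_question, ex_sql in select_exemplars(question, pool, k):
        lines += [f"Question: {ex_question}", f"SQL: {ex_sql}", ""]
    lines += [f"Question: {question}", "SQL:"]
    return "\n".join(lines)


# Invented placeholder exemplars, not taken from any dataset.
pool = [
    ("how many meters are offline", "SELECT COUNT(*) FROM meters WHERE status = 'offline'"),
    ("list all inspectors", "SELECT name FROM inspectors"),
]
print(build_prompt("meters(id, status)", "how many meters are online", pool, k=1))
```

Ending every prompt with the same open `SQL:` slot is one simple way to standardize the model output; a real system would additionally validate the returned statement against the schema.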
ISBN: (print) 9791095546344
Clinical trials often require that patients meet eligibility criteria (e.g., have specific conditions) to ensure the safety and the effectiveness of studies. However, retrieving eligible patients for a trial from the electronic health record (EHR) database remains a challenging task for clinicians since it requires not only medical knowledge about eligibility criteria, but also an adequate understanding of structured query language (SQL). In this paper, we introduce a new dataset that includes the first-of-its-kind eligibility-criteria corpus and the corresponding queries for criteria-to-SQL (Criteria2SQL), a task translating the eligibility criteria to executable SQL queries. Compared to existing datasets, the queries in the dataset here are derived from the eligibility criteria of clinical trials and include Order-sensitive, Counting-based, and Boolean-type cases which are not seen before. In addition to the dataset, we propose a novel neural semantic parser as a strong baseline model. Extensive experiments show that the proposed parser outperforms existing state-of-the-art general-purpose text-to-SQL models while highlighting the challenges presented by the new dataset. The uniqueness and the diversity of the dataset leave a lot of research opportunities for future improvement.