ISBN: 9781665434416 (print)
As the complexity and volume of collected data continue to rise, it is becoming ever more important to develop systems with high interactivity. Businesses with an interest in big data continue to seek solutions that limit cost while providing effective, simplified answers to current issues in data retrieval. Combined analysis and application of a multi-factorial system will likely lead to promising results in the ease with which nontechnical end users can report on complex data. This survey focuses on natural language processing (NLP) implementations for data query systems, especially those related to massive data sets (1TB+) in OLTP databases, OLAP databases, and data warehouses. We seek the most up-to-date and effective uses of NLP for speech-to-SQL and text-to-SQL generation, and the most recent advancements in data warehousing to optimize ELT efficiency and data retrieval, focusing on the highest-performing code implementations on the Spider and WikiSQL datasets. Many models, including sequence-to-sequence (seq2seq), sequence-to-SQL (Seq2SQL), and fuzzy semantic to SQL (F-SemtoSql), among others, are briefly described and compared. In addition, recent advancements in data warehousing technology, such as multi-disk buffering in the ELT process and hybrid multi-dimensional and relational OLAP databases (HOLAPs), are discussed. The findings gathered here are applied to fill a gap in the current industrial knowledge base in service of increased efficiency in data access, retrieval, and reporting in a customer-facing environment.
ISBN: 9783030757656; 9783030757649 (print)
The key challenge of cross-domain, context-dependent text-to-SQL generation lies in capturing the relation between natural language utterances and SQL queries in different turns. A line of work attempts to address this challenge by capturing the overlaps among consecutively generated SQL queries. Existing models sequentially generate the SQL query for a single turn and model the SQL overlaps by copying tokens or segments generated in previous turns. However, they are not flexible enough to capture various overlapping granularities, e.g., columns, filters, or even the whole query, as they neglect the intrinsic structures inherent in SQL queries. In this paper, we employ tree-structured intermediate representations of SQL queries, i.e., SemQL, for SQL generation and propose a novel subtree-copy mechanism to characterize the SQL overlaps. At each turn, we encode the interaction questions and previously generated trees as context and decode the SemQL tree in a top-down fashion. Each node is either generated according to the SemQL grammar or copied from previously generated SemQL subtrees. Our model can capture various overlapping granularities by copying nodes at different levels of SemQL trees. We evaluate our approach on the SParC dataset, and the experimental results show the superior performance of our model compared with state-of-the-art baselines.
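The generate-or-copy decoding described in this abstract can be illustrated with a toy sketch. It is not the authors' model: the node labels are hypothetical stand-ins rather than real SemQL grammar symbols, and the decoder's decisions are supplied as a hand-written action list instead of being predicted by a neural network. The names `Node`, `subtrees`, `decode`, and `render` are assumptions introduced for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Node:
    """A tree node; labels here are hypothetical, not real SemQL symbols."""
    label: str
    children: List["Node"] = field(default_factory=list)


def subtrees(root: Node) -> List[Node]:
    """Return every subtree in pre-order; each one is a copy candidate."""
    out = [root]
    for child in root.children:
        out.extend(subtrees(child))
    return out


# ("gen", label, arity) creates a new node; ("copy", index) reuses a candidate.
Action = Union[Tuple[str, str, int], Tuple[str, int]]


def decode(actions: List[Action], candidates: List[Node]) -> Node:
    """Build a tree top-down (pre-order) from a scripted action sequence."""
    stream = iter(actions)

    def build() -> Node:
        action = next(stream)
        if action[0] == "copy":
            # Copying a candidate reuses the whole subtree rooted there,
            # so the granularity depends on how high the node sits.
            return candidates[action[1]]
        _, label, arity = action
        return Node(label, [build() for _ in range(arity)])

    return build()


def render(node: Node) -> str:
    """Flatten a tree into a readable bracketed string."""
    if not node.children:
        return node.label
    return f"{node.label}({', '.join(render(c) for c in node.children)})"


# Tree produced in the previous turn (assumed example).
previous = Node("Root", [
    Node("Select", [Node("col:name")]),
    Node("Filter", [Node("col:age > 30")]),
])
candidates = subtrees(previous)  # index 3 is the whole Filter subtree

# Current turn: generate a new Select, copy the earlier Filter unchanged.
actions = [("gen", "Root", 2), ("gen", "Select", 1),
           ("gen", "col:salary", 0), ("copy", 3)]
current = decode(actions, candidates)
print(render(current))
```

Copying index 4 instead of index 3 would reuse only the leaf condition, and copying index 0 would reuse the entire previous query, which is the sense in which one mechanism covers several overlap granularities.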
ISBN: 9783959773126 (print)
Back in the 1970s, E. F. Codd worked on a prototype of a natural language question-and-answer application that would sit on top of a relational database system. Soon, natural language interfaces for databases (NLIDBs) became the holy grail for the database community. Different approaches have been proposed by the database, machine learning, and NLP communities. Interest in the topic has had its peaks and valleys. After a long and adventurous journey of almost 50 years, interest in NLIDBs has been rekindled in recent years, fueled by the need to democratize data access and by recent advances in deep learning, and in natural language processing in particular. There has been a surge of work on natural language interfaces for databases using neural translation, and it has suddenly become hard to keep up with advancements in the field. Are we close to finding the holy grail of data access? What are the lurking challenges that we need to overcome, and what research opportunities arise? Finally, what is the role of the database community?
ISBN: 9798350358810; 9798350358803 (print)
This paper presents the development process of a natural-language-to-SQL model built on the T5 model. The models, developed in August 2022 for an online transaction processing system and a data warehouse, achieve exact-match accuracies of 73% and 84%, respectively. These models, in conjunction with other work completed in the research project, were implemented for several companies and used successfully on a daily basis. The approach used in the model development could be applied in a similar fashion to other database environments and with a more powerful pre-trained language model.
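For readers unfamiliar with the metric, exact-match accuracy can be sketched as the fraction of predicted queries that equal their reference query after light normalization. The abstract does not say how the authors normalized or compared queries, so the rules below (trimming, dropping a trailing semicolon, collapsing whitespace, lowercasing) are assumptions, and the function names and sample queries are hypothetical.

```python
import re
from typing import List


def normalize(sql: str) -> str:
    """Assumed normalization: trim, drop a trailing ';', collapse spaces, lowercase.

    Lowercasing also changes string literals, so this is only suitable
    for a rough comparison.
    """
    stripped = sql.strip().rstrip(";").strip()
    return re.sub(r"\s+", " ", stripped).lower()


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical model outputs and reference queries.
predictions = ["SELECT name FROM users;",
               "select  id from orders",
               "SELECT * FROM t"]
references = ["select name from users",
              "SELECT id FROM orders",
              "SELECT a FROM t"]
accuracy = exact_match_accuracy(predictions, references)
print(f"{accuracy:.2%}")
```

String-level matching of this kind is strict: two queries that return the same rows but are written differently count as a miss, which is one reason some benchmarks compare query components or execution results instead.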
ISBN: 9798350386813; 9798350386820 (print)
There is a need for an automated spacecraft health monitoring and mission maintenance system that can process natural-language queries and return results in the required format, a task for which the size of the space database is a hurdle. Hence, we propose an end-to-end, customizable, real-time pipeline for space mission health monitoring. The pipeline uses an LLM and addresses the issue of very large databases by extracting only the relevant columns in its initial stages, leveraging BERT for NER, an LLM for fetching the schema, and PandasAI to execute the queries on large datasets efficiently, producing user-friendly outputs. The pipeline is robust, space-efficient, and customizable, offering features such as cross-table referencing and handling of identical feature names in multiple tables. We achieved 70% real-time accuracy.
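The idea of pruning the schema to the relevant columns before any query is executed can be sketched as follows. This is not the authors' pipeline: simple keyword overlap stands in for the BERT NER and LLM schema-fetching steps, no PandasAI call is made, and the schema, table names, and the function `relevant_columns` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical telemetry schema: table name -> column names.
SCHEMA: Dict[str, List[str]] = {
    "power": ["timestamp", "battery_voltage", "temperature"],
    "thermal": ["timestamp", "temperature", "heater_state"],
}


def relevant_columns(question: str,
                     schema: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (table, column) pairs whose name parts appear in the question.

    Keyword overlap is a stand-in for entity recognition. Pairing each
    column with its table keeps identically named columns distinct.
    """
    tokens = set(re.findall(r"[a-z0-9]+", question.lower()))
    selected = []
    for table, columns in schema.items():
        for column in columns:
            if set(column.lower().split("_")) & tokens:
                selected.append((table, column))
    return selected


single = relevant_columns("What was the battery voltage yesterday?", SCHEMA)
shared = relevant_columns("Show the temperature trend", SCHEMA)
print(single)
print(shared)
```

Only the selected columns would then need to be loaded and passed to the query-execution stage, which is where the space savings on very large tables would come from.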
Language conveys meaning, so natural language processing (NLP) requires representations of meaning. This work addresses two broad questions: (1) What meaning representation should we use? and (2) How can we transform text to our chosen meaning representation? In the first part, we explore different meaning representations (MRs) of short texts, ranging from surface forms to deep-learning-based models. We show the advantages and disadvantages of a variety of MRs for summarization, paraphrase detection, and clustering. In the second part, we use SQL as a running example for an in-depth look at how we can parse text into our chosen MR. We examine the text-to-SQL problem from three perspectives (methodology, systems, and applications) and show how each contributes to a fuller understanding of the task.
With the continuous deepening of energy transformation and the continuous promotion of electricity marketization reform, the structural form, system characteristics, and operational organization of the power system ha...