ISBN (digital): 9798331541750
ISBN (print): 9798331541767
This paper explores combining large language models and retrieval-augmented generation (RAG) techniques to build a question-answering system for academic papers. The system is built upon Qwen2.5 models and open-source tools like LlamaIndex, utilizing the arXiv dataset to implement an end-to-end pipeline from user queries to relevant answers through modules including query routing, hybrid retrieval, and answer generation. In the retrieval phase, the system leverages both the BGE-M3 embedding model and the BGE-Reranker-V2-M3 reranking model. The generation stage integrates knowledge from multiple sources using a fine-tuned Qwen2.5 model. Experiments show significant improvements in retrieval accuracy and answer coherence when the reranker and RAG-Fusion are incorporated. This study highlights the potential of LLM and RAG technologies in academic applications and offers insights for enhancing paper question-answering systems.
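The abstract does not state how results from the different retrievers or query variants are merged. RAG-Fusion is commonly described as using reciprocal rank fusion (RRF), so the following is a minimal sketch under that assumption, not the authors' implementation. The document ids, the two hit lists, and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    k=60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense retriever and a sparse (keyword) retriever.
dense_hits = ["paper_b", "paper_a", "paper_c"]
sparse_hits = ["paper_a", "paper_d", "paper_b"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

In a pipeline like the one described, the fused list would typically be passed to the reranking model, which rescores only the top candidates.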
In the NLP field, the mainstream practice in processing a text is through the syntactic and semantic analysis of its componential ***, more and more researchers have come to realize that without the help of context, difficulties like ambiguity, ellipsis and anaphora can hardly get really *** this paper a new approach to context formalization is introduced: the Context Frame, which is part of the Hierarchical Network of Concepts (HNC) *** comparing Conceptual Dependency (CD) theory, Frame Semantics and HNC Context Frame theory, we analyzed the advantages of the HNC theory in text processing, introduced the application of the theory, and finally briefly introduced how the Context Frame is used to resolve ellipses in Chinese texts.
This paper describes MILER (Multi-modal data Logger for Evaluation and Report), a web-based multi-service monitoring, logging and reporting tool for advanced multi-modal dialog systems. MILER has been designed to directly arrange and synchronize logging data collected from live services and to provide real-time reports about service usage and system performance. Special attention has been given to the architecture design in order to achieve service and access-device independence and reliable synchronization of data from distributed logs. MILER allows researchers to analyze multimodal interactions, analyze the call flow, reconstruct the system/user dialogue turns, play the recorded user utterances, and provide a preliminary dialogue performance evaluation. It also supports labeling and annotation of the dialogue turns for further offline analysis. Once the user inputs (i.e. speech and other input modalities) are manually transcribed and labeled, along with detailed log events from each dialog, MILER derives a set of objective measures, which includes word and concept accuracy, number of attempts per concept, dialog turn counts and duration, and task completion rates. Subjective measures extracted from user surveys, including perceived task success and ease of use measures, can be combined with the objective measures and the results used later for accuracy combination
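The abstract lists word accuracy among MILER's objective measures but gives no formula. A common definition is 1 - (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions between the manual transcript and the recognizer output, and N is the number of reference words. The sketch below assumes that definition and is not taken from MILER itself; the function name and sample utterances are hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - (word-level edit distance / number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 1.0 - prev[-1] / len(ref)


# Hypothetical manual transcript versus recognizer output.
acc = word_accuracy(
    "call the billing department please",
    "call billing department please now",
)
print(round(acc, 2))
```

Under this definition the value can fall below zero when the hypothesis contains many insertions, which is one reason some evaluations report word correct rate instead.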
ISBN (digital): 9798350386349
ISBN (print): 9798350386356
Legal systems worldwide vary in structure and principles, reflecting the diverse legal traditions of different countries. The legal system, inherently complex and reliant on meticulous documentation, often faces challenges related to time-consuming manual processes and the potential for human error. The proposed system responds to these challenges by integrating advanced AI techniques. At its core, OpenAI embeddings take center stage, demonstrating proficiency in document generation, comprehension, and abnormality detection, addressing the complexities ingrained in legal documentation. In contrast to traditional approaches, this system maximizes the versatility of ChatGPT 3.5, allowing it to not only issue commands but also proficiently generate a diverse array of legal documents. By incorporating an understanding module equipped with PyPDF, Amazon Textract, and langchain utilities, the system handles document intricacies. The utilization of OpenAI Embeddings further enhances natural language understanding. Leveraging sentiment analysis and Named Entity Recognition (NER) in its natural language processing (NLP) toolkit, the system employs an intuitive web interface for irregularity detection. The exploration of AI for automated irregularity detection showcases its potential in ensuring document accuracy within the legal domain. This project therefore promises to reshape legal document processing by merging advanced AI capabilities with the unique demands of legal systems.
This paper presents some recent enhancements to the Bell Labs Speech Technology Integration Platform (BLSTIP), a common platform to integrate Bell Labs' speech, telephony, Internet, and dialogue technologies for spoken and multi-modal dialogue system research and prototyping. Last year, we introduced BLSTIP to our partners as a speech technology platform for collaborative research and new application development. BLSTIP software is packaged as a single, network-downloadable installation file. It supports a variety of speech applications such as natural language call routing/steering for call centers, natural language information systems, messaging systems with a voice user interface, speaker verification, speech application trials and data collection, etc. As an enhancement to BLSTIP, we also designed a VoiceXML (Voice eXtensible Markup Language) integration infrastructure to study emerging web-hosted speech applications such as voice portals, multimodal internet access, and wireless internet access.
This scholarly article conducts a comparative evaluation of prominent large-scale language models, specifically encompassing Google’s BARD, ChatGPT 3.5, and ChatGPT 4. It offers a comprehensive dissection of each mod...
ISBN (digital): 9798350353563
ISBN (print): 9798350353570
In response to the challenges of vast and scattered content, as well as the low efficiency of learning and searching within safety regulations in the field of electric power, an intelligent question-answering system has been designed based on large model technology. This system aims to organize knowledge related to safety regulations and improve search efficiency. The system employs natural language processing techniques to categorize entries in safety regulations. It utilizes a vector database for entry vectorization and storage. The integration of large models with safety regulation entries is achieved through a large model development framework, forming a question-answering system. A file management module is constructed using a frontend framework to provide interactive functionality for the question-answering system. Performance experiments show that the system's overall accuracy in answering questions on electric power safety rules and regulations is more than 60%, which is more professional than a general-purpose large model.
ISBN (digital): 9781728185019
ISBN (print): 9781728185026
Sri Lanka, like most countries, has numerous historically valuable places and monuments that record a rich history of survival and civilization. People searching for information about those places face a lack of information and of trusted information sources. Even where some information is available, there is no convenient and efficient way to retrieve it. The proposed system offers a solution to this problem using Artificial Intelligence (AI) and Deep Learning (DL) concepts. The proposed chatbot solution helps to enhance the user experience while retrieving the available information about archaeological places from the system. It automates the search task by chatting with the user via a conversational interface. The proposed voice detection module, natural language processing model and dialog management model achieve a high accuracy rate and showcase the power of search assistants. Furthermore, the paper shows how the chatbot can serve as an alternative to using the application and enhance the user experience.
In this paper, a dialogue system for natural language based call steering is described and studied. The system is based on natural language speech recognition and understanding within a mixed-initiative dialogue. The system is implemented on the Bell Labs Speech Technology Integration Platform (BLSTIP) using dialogue and natural language understanding components from BT Laboratories. A prototype system in the operator service domain [2] is described. In order to improve the acoustic and language modeling for natural language based dialogue applications, various approaches are described and studied. The structure of the dialogue manager, in which mixed-initiative dialogue can be supported efficiently, is also presented. Call classification and steering experiments were performed. The results confirm the efficacy of the proposed approach.
In order to help people obtain useful information from patent documents in different languages, this paper proposes a cross-language retrieval system to search Chinese and English patent documents simultaneously. This system consists of a query translation module, a document retrieval module and a user interaction module. The query translation module translates queries based on bilingual dictionaries. The document retrieval module consists of a monolingual retrieval system using the standard vector space model. In order to retrieve with a high degree of parallelism, we use the MapReduce model to calculate the similarity. The user interaction module provides users with an interactive mechanism used to improve the retrieval accuracy of the system. It contains two parts: second translation and relevance feedback. The experimental results show that our system has good performance.
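The first two modules can be illustrated with a minimal sketch: dictionary-based query translation followed by cosine similarity in a vector space. This is not the authors' implementation. The toy dictionary `ZH_EN`, the document ids, and the use of raw term counts as weights are assumptions made for illustration, and the MapReduce parallelization is not shown.

```python
import math
from collections import Counter

# Hypothetical toy bilingual dictionary (Chinese term -> English candidates).
ZH_EN = {
    "电池": ["battery"],
    "充电": ["charging", "charge"],
}


def translate_query(terms, dictionary):
    """Replace each query term with all of its dictionary translations."""
    translated = []
    for term in terms:
        # Terms missing from the dictionary are kept unchanged.
        translated.extend(dictionary.get(term, [term]))
    return translated


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of words using raw term counts."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical English patent snippets, already tokenized by whitespace.
docs = {
    "en_patent_1": "battery charging circuit for a battery pack".split(),
    "en_patent_2": "method for weaving textile fibers".split(),
}

query = translate_query(["电池", "充电"], ZH_EN)
ranked = sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)
print(query, ranked)
```

Each document's similarity is computed independently of the others, which is what makes this scoring step a natural fit for a map phase followed by a reduce phase that sorts the scores.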