As data exploration has grown rapidly in recent years, data storage and data processing have attracted increasing attention as means of extracting important information. Finding a scalable way to process large-scale data is a critical issue for both relational database systems and emerging NoSQL databases. With the inherent scalability and fault tolerance of Hadoop, MapReduce is an attractive way to process massive data in parallel. Most previous research focuses on developing SQL or SQL-like query translators on top of the Hadoop Distributed File System. However, updating data frequently in such a file system can be difficult. We therefore need a flexible datastore such as HBase, not only to place the data on a scale-out storage system but also to manipulate changing data transparently. The HBase interface, however, is not friendly enough for most users. A GUI composed of a SQL client application and a database connection to HBase eases the learning curve. In this paper, we propose the JackHare framework, which comprises a SQL query compiler, a JDBC driver, and a systematic method that uses the MapReduce framework to process unstructured data in HBase. After importing the JDBC driver into a SQL client GUI, we can use HBase as the underlying datastore to execute ANSI SQL queries. Experimental results show that our approaches perform well in both efficiency and scalability.
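The idea of compiling a SQL query into MapReduce steps can be illustrated with a minimal in-memory sketch. It is not JackHare's compiler and does not touch HBase or Hadoop. It only shows how a query such as `SELECT dept, SUM(salary) FROM emp GROUP BY dept` might decompose into map, shuffle, and reduce phases. The table contents, column names, and function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]

# Hypothetical rows standing in for a table stored in a datastore.
ROWS: List[Row] = [
    {"dept": "ops", "salary": 10},
    {"dept": "dev", "salary": 20},
    {"dept": "ops", "salary": 5},
]


def map_phase(rows: Iterable[Row], group_col: str, value_col: str) -> Iterator[Tuple[object, object]]:
    """Emit one (group key, value) pair per row, as a mapper would."""
    for row in rows:
        yield row[group_col], row[value_col]


def shuffle(pairs: Iterable[Tuple[object, object]]) -> Dict[object, list]:
    """Group the emitted values by key, mimicking the shuffle step."""
    groups: Dict[object, list] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[object, list]) -> Dict[object, object]:
    """Aggregate each group's values; here the aggregate is SUM."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_sum(rows: Iterable[Row], group_col: str, value_col: str) -> Dict[object, object]:
    """Run the three phases in sequence for a GROUP BY ... SUM query."""
    return reduce_phase(shuffle(map_phase(rows, group_col, value_col)))


if __name__ == "__main__":
    print(group_by_sum(ROWS, "dept", "salary"))
```

In a real deployment the map and reduce functions would run in parallel across nodes, and the rows would be read from the datastore rather than from a Python list.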
ISBN (print): 9781728119243
Customer service agents face many difficulties in finding suitable knowledge articles to resolve customers' queries. This paper introduces an Information Filtering System that helps customer service agents find suitable knowledge articles, as well as specific sections of text within those articles, by extracting keywords from customers' interactions with the agents. The paper describes two approaches to query resolution: a coarse-grain approach, which finds knowledge article IDs, and a fine-grain approach, which finds the knowledge article ID and the specific section of that article, with the steps to follow, using a Domain Ontology Builder. The paper also describes two methods for annotating knowledge articles, the dataset setup, experimental details and results for different algorithms, and the benefits of using the proposed system.
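The difference between the coarse-grain and fine-grain approaches can be sketched with plain keyword overlap. This is a simplification that stands in for the paper's method: it does not use a domain ontology, and the knowledge base, stop-word list, and function names below are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical knowledge base: article ID -> section title -> section text.
KB: Dict[str, Dict[str, str]] = {
    "KA-1": {
        "Reset password": "open settings choose reset password confirm email",
        "Unlock account": "contact support to unlock account after failed login",
    },
    "KA-2": {
        "Update billing": "open billing page update card details save",
    },
}

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS: Set[str] = {"i", "my", "to", "the", "a", "how", "do", "and", "after"}


def keywords(text: str) -> Set[str]:
    """Lower-case the text, split on whitespace, and drop stop words."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def coarse(query: str, kb: Dict[str, Dict[str, str]]) -> str:
    """Coarse grain: return the article ID with the largest keyword overlap."""
    query_words = keywords(query)

    def score(article_id: str) -> int:
        article_words: Set[str] = set()
        for text in kb[article_id].values():
            article_words |= keywords(text)
        return len(query_words & article_words)

    return max(kb, key=score)


def fine(query: str, kb: Dict[str, Dict[str, str]]) -> Tuple[str, str]:
    """Fine grain: return the article ID and its best-matching section title."""
    query_words = keywords(query)
    article_id = coarse(query, kb)
    sections = kb[article_id]
    section = max(sections, key=lambda title: len(query_words & keywords(sections[title])))
    return article_id, section


if __name__ == "__main__":
    print(coarse("how do I reset my password", KB))
    print(fine("how do I reset my password", KB))
```

The coarse-grain result is enough to point an agent at an article, while the fine-grain result narrows the answer to the section holding the steps to follow.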
Background: The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention. Objective: This study aimed to introduce a novel modular LLM pipeline that can semantically extract features from textual patient admission records. Methods: The pipeline was designed as a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, and it was tested with 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People's Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation, with annotation by a local expert. The pipeline was evaluated on accuracy, precision, null ratio, and time consumption. Additionally, we evaluated its performance with a quantized version of Qwen-14B-Chat on a consumer-grade GPU. Results: The pipeline demonstrated a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered enhanced performance, with 97.28% accuracy and a 0% null ratio. Conclusions: The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records.
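The abstract does not define its metrics, so the following is only a sketch under stated assumptions: each extracted feature is a yes/no value, a null is a case where the model returned no answer, accuracy is computed over answered cases, and precision is true positives over all positive predictions. The function name and sample data are hypothetical.

```python
from typing import Dict, List, Optional


def evaluate(predictions: List[Optional[bool]], labels: List[bool]) -> Dict[str, float]:
    """Compute null ratio, accuracy, and precision for binary features.

    Assumptions (not taken from the paper): None marks a null answer,
    accuracy is measured over answered cases only, and precision is
    TP / (TP + FP).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")

    total = len(predictions)
    answered = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    nulls = total - len(answered)

    correct = sum(1 for p, y in answered if p == y)
    true_pos = sum(1 for p, y in answered if p and y)
    false_pos = sum(1 for p, y in answered if p and not y)

    return {
        "null_ratio": nulls / total if total else 0.0,
        "accuracy": correct / len(answered) if answered else 0.0,
        "precision": true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical model outputs compared against expert annotations.
    metrics = evaluate([True, False, None, True], [True, False, True, False])
    print(metrics)
```

Whether nulls should count as errors in the accuracy figure is a design choice; this sketch excludes them and reports them separately through the null ratio.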
ISBN (digital): 9781728119243
ISBN (print): 9781728119250
Customer service agents face many difficulties in finding suitable knowledge articles to resolve customers' queries. This paper introduces an Information Filtering System that helps customer service agents find suitable knowledge articles, as well as specific sections of text within those articles, by extracting keywords from customers' interactions with the agents. The paper describes two approaches to query resolution: a coarse-grain approach, which finds knowledge article IDs, and a fine-grain approach, which finds the knowledge article ID and the specific section of that article, with the steps to follow, using a Domain Ontology Builder. The paper also describes two methods for annotating knowledge articles, the dataset setup, experimental details and results for different algorithms, and the benefits of using the proposed system.