From the sentiment analysis of social media to the monitoring of public opinion in the financial field, the application of natural language analysis continues to expand, and people's demand for natural language pr...
In recent years, natural language processing has become a crucial research direction in artificial intelligence. The Transformer model, in particular, has emerged as a foundational model in the field of natural langua...
Continual event extraction is a practical task in natural language processing that requires models to learn quickly from new event types and data sources without forgetting pre-existing knowledge. It is important sinc...
Large Language Models (LLMs) have emerged as strategic in advancing natural language processing (NLP), to the extent of enabling machines to translate languages and make logical deductions. The role of the o...
ISBN:
(Print) 9798350374353; 9798350374346
Large language models are crucial for processing social media data, aiding in recommendation systems, sentiment analysis, and more. This study proposes a method to enhance sentiment analysis accuracy by combining word embeddings with insights from a language-based model. Focusing on consumer reviews, the study categorizes sentiments by extreme polarities, excluding intermediate ratings. Empirical assessments show superior outcomes when combining opinions, summaries, and pre-processed summaries. Machine learning algorithms are applied to hybrid vectors to classify sentiments accurately, distinguishing between positive and negative viewpoints. This approach offers nuanced sentiment analysis of consumer feedback, facilitating understanding of product reviews. Overall, the method integrates feature extraction and machine learning to analyze binary polar data, providing sentiment estimates in a concise vector form, with promising implications for sentiment analysis and consumer sentiment understanding.
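As a rough illustration of the hybrid-vector idea described above, the sketch below concatenates averaged pre-trained word embeddings with an embedding from a language model and trains a binary classifier on extreme-polarity reviews only. The function names, the logistic-regression classifier, and the rating thresholds are assumptions for this sketch, not the paper's exact pipeline.

```python
# Hedged sketch: hybrid word-embedding + language-model features for binary sentiment.
# `word_vectors` is assumed to be a dict-like token -> vector mapping, and
# `lm_encoder` a callable returning a fixed-size sentence embedding.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def embed_words(text, word_vectors, dim=300):
    """Average pre-trained word embeddings over the tokens of a review."""
    tokens = [t for t in text.lower().split() if t in word_vectors]
    if not tokens:
        return np.zeros(dim)
    return np.mean([word_vectors[t] for t in tokens], axis=0)

def hybrid_vector(text, word_vectors, lm_encoder):
    """Concatenate averaged word embeddings with a language-model embedding."""
    return np.concatenate([embed_words(text, word_vectors), lm_encoder(text)])

def train_binary_sentiment(reviews, ratings, word_vectors, lm_encoder):
    # Keep only extreme polarities (assumed: 1-2 stars negative, 4-5 stars positive),
    # dropping intermediate ratings as in the described setup.
    pairs = [(r, 1 if s >= 4 else 0) for r, s in zip(reviews, ratings) if s != 3]
    X = np.stack([hybrid_vector(r, word_vectors, lm_encoder) for r, _ in pairs])
    y = np.array([label for _, label in pairs])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)
```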
ISBN:
(Digital) 9798400712487
(Print) 9798400712487
Large Language Models (LLMs) have demonstrated remarkable performance in various application domains, largely due to their self-supervised pre-training on extensive high-quality text datasets. However, despite the importance of constructing such datasets, many leading LLMs lack documentation of their dataset construction and training procedures, leaving LLM practitioners with a limited understanding of what makes a high-quality training dataset for LLMs. To fill this gap, we initially identified 18 characteristics of high-quality LLM training datasets, as well as 10 potential data pre-processing methods and 6 data quality assessment methods, through detailed interviews with 13 experienced LLM professionals. We then surveyed 219 LLM practitioners from 23 countries across 5 continents. We asked our survey respondents to rate the importance of these characteristics, provide a rationale for their ratings, specify the key data pre-processing and data quality assessment methods they used, and highlight the challenges encountered during these processes. From our analysis, we identified 13 crucial characteristics of high-quality LLM datasets that received high ratings, accompanied by the key rationales provided by respondents. We also identified some widely used data pre-processing and data quality assessment methods, along with 7 challenges encountered during these processes. Based on our findings, we discuss the implications for researchers and practitioners aiming to construct high-quality training datasets for optimizing LLMs.
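The abstract does not enumerate the specific pre-processing or quality assessment methods the respondents reported, so the sketch below is only a generic illustration of steps that such pipelines commonly include: length filtering, a simple character-ratio quality heuristic, and exact deduplication. The function name and thresholds are assumptions, not findings from the survey.

```python
# Hedged sketch: illustrative text-dataset pre-processing, not the paper's methods.
import hashlib

def preprocess_corpus(documents, min_chars=200, max_chars=100_000):
    """Apply simple quality filters and exact deduplication to raw text documents."""
    seen_hashes = set()
    kept = []
    for doc in documents:
        text = doc.strip()
        # Length filter: drop fragments and extremely long outliers.
        if not (min_chars <= len(text) <= max_chars):
            continue
        # Heuristic quality check: require a minimal ratio of alphabetic characters.
        alpha_ratio = sum(c.isalpha() for c in text) / len(text)
        if alpha_ratio < 0.6:
            continue
        # Exact deduplication via content hashing.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept.append(text)
    return kept
```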
ISBN:
(Print) 9798350344868; 9798350344851
Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual cues in human dialogues. Our method builds upon an acoustic-based speaker diarization system by adding lexical information from an LLM in the inference stage. We model the multi-modal decoding process probabilistically and perform joint acoustic and lexical beam searches to incorporate cues from both modalities: audio and text. Our experiments demonstrate that infusing lexical knowledge from the LLM into an acoustics-only diarization system improves the overall speaker-attributed word error rate (SA-WER). The experimental results show that LLMs can provide complementary information to acoustic models for the speaker diarization task via the proposed beam-search decoding approach, yielding up to a 39.8% relative delta-SA-WER improvement over the baseline system. We thus substantiate that the proposed technique is able to exploit contextual information that is inaccessible to acoustics-only systems, which rely on speaker-embedding representations alone. In addition, these findings point to the potential of using LLMs to improve speaker diarization and other speech-processing tasks by capturing semantic and contextual cues.
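To make the joint acoustic-lexical decoding idea concrete, the sketch below runs a beam search over per-word speaker labels, combining an acoustic log-probability with a weighted lexical log-probability at each step. The scoring interfaces (`acoustic_logp`, `lexical_logp`) and the interpolation weight `lam` are assumptions for illustration; the paper's exact probabilistic formulation may differ.

```python
# Hedged sketch of a joint acoustic-lexical beam search over per-word speaker labels.
def joint_beam_search(words, speakers, acoustic_logp, lexical_logp,
                      lam=0.5, beam_size=8):
    """
    words          : recognized words in time order
    speakers       : candidate speaker labels, e.g. ["spk0", "spk1"]
    acoustic_logp  : fn(word_index, speaker) -> log p(speaker | audio of that word)
    lexical_logp   : fn(word, speaker, history) -> log p(speaker | text context),
                     where history is the list of (word, speaker) pairs so far
    Returns the highest-scoring speaker sequence under the combined score.
    """
    beams = [([], 0.0)]  # (speaker sequence so far, cumulative log score)
    for i, word in enumerate(words):
        candidates = []
        for seq, score in beams:
            history = list(zip(words[:i], seq))
            for spk in speakers:
                s = score + acoustic_logp(i, spk) + lam * lexical_logp(word, spk, history)
                candidates.append((seq + [spk], s))
        # Prune to the top `beam_size` hypotheses before the next word.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]
```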
India has many different spoken languages, and the majority of the populace communicates in their native tongues. Support for these languages is available on every device. Hindi-language posts and material are becoming m...
In the field of space science and utilization, we have constructed a domain knowledge graph based on text data. However, a significant problem with this knowledge graph is the absence of a large number of domain-relev...
Neural networks can learn natural language representations at multiple levels of abstraction, and have recently shown much promise for natural language processing (NLP) applications. Kinds of neural network language models...