ISBN: (print) 9789819794393; 9789819794409
Supervised spelling error correction models have achieved outstanding performance on resource-rich languages. However, these models are difficult to apply directly to Vietnamese spelling correction because of corpus scarcity. To address this issue, we first construct a basic high-quality Vietnamese Spelling Correction (ViSC) corpus via automatic speech recognition (ASR) generation and human annotation. Then, we propose a part-of-speech and confusion-set double-constrained method to mimic the practical error distribution, and use these two constraints as external knowledge to guide large language models (LLMs) in constructing diverse pseudo data. Finally, we use the pseudo corpora to pre-train and the ViSC corpus to fine-tune spelling error correction models. Experiments on the benchmark dataset show that our proposed corpus construction method consistently outperforms various baselines, leading to state-of-the-art results on all Vietnamese-specific pre-trained language model-enhanced spelling correction models. Detailed analysis demonstrates that the part-of-speech and confusion-set constraints are complementary and significant in controlling stable and diverse corpus generation. In-depth comparison experiments reveal that proper utilization of the pseudo corpus is essential for improving Vietnamese spelling error correction. In addition, we release our code and constructed corpus at https://***/DarkFanta3y/VSEC_corpus to facilitate future research.
The proceedings contain 14 papers. The topics discussed include: LLM-based SPARQL query generation from natural language over federated knowledge graphs; a benchmark for the detection of metalinguistic disagreements between LLMs and knowledge graphs; assessing large language models for SPARQL query generation in scientific question answering; benchmarking ontology validation capabilities of LLMs; ontology corpora for LLM-based knowledge engineering research; information for conversation generation: proposals utilizing knowledge graphs; OAEI-LLM: a benchmark dataset for understanding large language model hallucinations in ontology matching; a comprehensive benchmark for evaluating LLM-generated ontologies; and hybrid evaluation of Socratic dialogue for teaching.
Transformer-based pre-trained language models have dominated the field of natural language processing (NLP) for quite some time now. However, the Nepali language, spoken by approximately 32 million people worldwide, r...
Anomaly-based detection is effective against evolving insider threats but still suffers from low precision. Current data processing can result in information loss, and models often struggle to distinguish between beni...
Large language models (LLMs) have shown remarkable abilities in different fields, including standard natural language processing (NLP) tasks. To elicit knowledge from LLMs, prompts play a key role, consisting of natur...
The integration of Geographic Information Systems (GIS) with natural language processing (NLP) has transformed how spatial data is accessed and utilized. By enabling GIS interfaces to interpret natural language querie...
Quality inspection in the production of modular construction (MC) is paramount for the success of MC projects (e.g., geometry errors are sensitive for installation). However, automatic and accurate inspection task gen...
Protein engineering is important for biomedical applications, but conventional approaches are often inefficient and resource-intensive. While deep learning (DL) models have shown promise, their training or implementat...
Food recommendation systems help consumers make sustainable and nutritionally complete choices, promoting healthy eating habits and addressing the growing interest in food sustainability and waste reduction. Large Lan...
Large language models (LLMs) are increasingly deployed for general problem-solving across various domains, yet they remain constrained to chaining immediate reasoning steps and depending solely on parametric knowledge. Inte...