ISBN (Print): 9789819794423; 9789819794430
Visual Chinese Character Checking (C3) aims to detect and correct errors in handwritten Chinese text images, including faked characters and misspelled characters. The task benefits downstream applications by making it more efficient to identify errors in handwritten text. Recent methods are mainly based on Optical Character Recognition (OCR) and Pre-trained Language Models (PLMs). Although Visual Chinese Character Checking is an emerging task and relevant research has made progress, we argue that existing work has not fully leveraged the inherent knowledge of pre-trained models and has not addressed the semantic bias between pre-trained models and the character checking task. These gaps lead to deficiencies in recognizing misspelled Chinese characters and correcting misused ones. We therefore propose multimodal contrastive learning methods based on image-to-image and image-to-text comparisons, applied throughout character recognition, error detection, and correction. By aligning the semantic feature representations of the different models, our approach makes them better suited to the Visual Chinese Character Checking task, thereby enhancing their capabilities.
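
The abstract does not specify the contrastive objective; the following minimal sketch only illustrates image-to-text alignment of the kind described, assuming paired handwriting-image and character-text embeddings and an InfoNCE-style loss (all names are hypothetical, not the authors' code):

import torch
import torch.nn.functional as F

def infonce_alignment_loss(img_emb, txt_emb, temperature=0.07):
    # img_emb, txt_emb: (batch, dim) embeddings from the two encoders;
    # row i of each tensor is assumed to describe the same character.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature                      # (batch, batch) similarities
    targets = torch.arange(img.size(0), device=img.device)  # matched pairs on the diagonal
    # Symmetric InfoNCE: classify the correct partner in both directions.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2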
ISBN (Print): 9789819794393; 9789819794409
An appropriate style can enhance the impact of social posts and comments. Although existing research transfers text styles accurately, it often incurs content loss that disrupts the original semantic information. To address content preservation in style transfer, we extend an existing normalizing flow model and propose a style editing module. Leveraging the transformation of latent states in the flow model, we model sentence content and style representations. On this basis, we perform style editing by replacing the original style representation with the target style. Additionally, to mitigate the impact of style editing on the content representation, we introduce adversarial learning on the latent states before and after style editing, further optimizing the flow model to enhance content preservation. Extensive experiments on various datasets demonstrate that our method improves content preservation by 3.9% over the latest research while attaining an average style accuracy of 90.1%, proving its capability to enhance content preservation while ensuring accurate style transfer. (The code is available at https://***/djqqiao/FST.)
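
The flow model itself is not detailed in the abstract; as a rough illustration of the style-editing step it describes, the sketch below splits a latent state into content and style parts and swaps in a target style code (the split and dimensions are assumptions):

import torch

def edit_style(z, target_style_code, style_dims=16):
    # z: (batch, dim) latent states from the flow's forward pass; the last
    # `style_dims` dimensions are assumed (hypothetically) to encode style.
    content = z[:, :-style_dims]                  # left untouched to preserve content
    edited = torch.cat([content, target_style_code], dim=-1)
    return edited  # run the flow's inverse pass on `edited` to generate styled text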
ISBN (Print): 9783031779602; 9783031779619
This paper investigates the performance of various end-to-end ASR models trained on low-resource Livvi-Karelian. Several Wav2Vec 2.0- and Whisper-based models were fine-tuned, tested, and compared with a hybrid TDNN-F/HMM system. In the course of the experiments, end-to-end Transformer-based models demonstrated good performance; however, the best results were obtained by combining N-gram and Transformer-based models. A WER of 19.83% on the test set was achieved using the Wav2Vec 2.0 large model with N-gram augmentation, on par with SOTA models for other low-resource languages. In addition, this paper presents a new Livvi-Karelian language corpus containing transcripts from radio broadcasts, featuring samples from 17 speakers (7 male and 10 female). Covering about 4.5 h of audio recordings and 32,037 words, it is a valuable resource for linguistic research. The findings of the presented work may be of considerable interest both for low-resource ASR and for Finno-Ugric field studies.
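
To make the reported WER figure concrete, here is a self-contained word error rate computation using the standard edit-distance definition (not code from the paper):

def wer(reference: str, hypothesis: str) -> float:
    # WER = (substitutions + insertions + deletions) / number of reference words.
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# e.g. wer("hyvää päivää kaikille", "hyvää päivä kaikille") == 1/3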
ISBN (Print): 9789819794423; 9789819794430
The development of Large Vision-Language Models (LVLMs) has been hindered by hallucinations. Existing methods often struggle to accurately infer the relationships between objects and their attributes, and frequently overlook the challenges posed by semantic duality. In this work, we develop METER, a novel multimodal hallucination detection method that employs a mixture of experts through tool-supported reasoning and ensembling. Specifically, our model rethinks and infers over decomposed reasoning steps derived from chain-of-thought prompts, which eliminates the need for additional manual templates and recognizes hallucination attributes step by step. We also use topics discovered from image-text pairs to disambiguate text, mitigating semantic duality. Furthermore, we investigate the effect of incorporating external tools into hallucination detection, exploring the variations and efficacy of tool ensembling in mitigating hallucinations. Additionally, we successfully alleviate hallucinations by incorporating METER's explanations into the prompt. Extensive experiments demonstrate the effectiveness of our model. Our code is available at https://***/lambdarw/METER.
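
METER's tool interface is not specified in the abstract; the sketch below only illustrates the general idea of ensembling external verification tools by majority vote (the callable interface is hypothetical, not METER's API):

from collections import Counter

def ensemble_hallucination_vote(claim, tools):
    # `tools`: callables mapping a claim to True (hallucinated) / False (grounded).
    votes = [tool(claim) for tool in tools]
    verdict, count = Counter(votes).most_common(1)[0]
    return verdict, count / len(votes)  # majority verdict and its vote share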
ISBN (Print): 9789819794300; 9789819794317
Dataset bias, i.e., over-reliance on dataset-specific literal heuristics, has attracted increasing attention for its detrimental effect on the generalization ability of NLU models. Existing works eliminate dataset bias by down-weighting problematic data during training, which discards valid feature information while mitigating bias. In this work, we analyze the causes of dataset bias from the perspective of causal inference and propose CausalAPM, a generalizable literal disentangling framework that ameliorates the bias problem at the feature granularity. The proposed approach projects literal and semantic information into independent feature subspaces and constrains the involvement of literal information in subsequent predictions. Extensive experiments on three NLP benchmarks (MNLI, FEVER, and QQP) demonstrate that our proposed framework significantly improves OOD generalization performance while maintaining ID performance.
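
As a rough illustration of projecting literal and semantic information into independent subspaces, the sketch below adds an orthogonality penalty between two learned projections (the penalty form is an assumption, not necessarily CausalAPM's objective):

import torch
import torch.nn as nn
import torch.nn.functional as F

class LiteralSemanticSplit(nn.Module):
    def __init__(self, dim, sub_dim):
        super().__init__()
        self.to_literal = nn.Linear(dim, sub_dim)   # projection to the literal subspace
        self.to_semantic = nn.Linear(dim, sub_dim)  # projection to the semantic subspace

    def forward(self, h):
        lit, sem = self.to_literal(h), self.to_semantic(h)
        # Penalize overlap so the two subspaces stay approximately independent.
        ortho_loss = (F.normalize(lit, dim=-1) * F.normalize(sem, dim=-1)).sum(-1).pow(2).mean()
        return lit, sem, ortho_loss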
ISBN (Print): 9789819794423; 9789819794430
Joint dialog sentiment classification and act recognition aims to simultaneously identify the sentiment and act categories of every utterance in a conversation. Current classification-based methods often rely on complex structures to model label dependencies, which can be sensitive to noisy features in dialogue contexts. In light of this, we introduce an end-to-end generative framework, termed Sentiment and Act T5 (SAT5), designed specifically for this joint task. SAT5 leverages the inherent ability of generative models to adequately model label dependencies without additional structures. Meanwhile, considering the order sensitivity of generative models, a set loss mechanism is further incorporated to remove this concern. Additionally, we develop a graph-based feature optimization strategy grounded in the information bottleneck principle, aimed at minimizing the impact of noisy features. Experiments on two public datasets demonstrate that SAT5 significantly outperforms previous models, and in-depth analysis further validates the efficacy and rationality of the set loss mechanism and feature optimization strategy.
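
The abstract does not give the set loss formulation; one common way to realize an order-insensitive loss is to take the minimum over all orderings of the target labels, sketched below (feasible only for small label sets, and not necessarily SAT5's exact design):

from itertools import permutations
import torch
import torch.nn.functional as F

def set_loss(logits_per_slot, target_labels):
    # logits_per_slot: (num_slots, num_classes); target_labels: (num_slots,) LongTensor.
    # Score every ordering of the targets and keep the cheapest one, so the
    # model is not punished for emitting correct labels in a different order.
    best = None
    for perm in permutations(range(len(target_labels))):
        loss = F.cross_entropy(logits_per_slot, target_labels[list(perm)])
        best = loss if best is None else torch.minimum(best, loss)
    return best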
ISBN (Print): 9789819794393; 9789819794409
News recommendation has emerged as a primary means for users to access content of interest from the vast amount of news. Title clickbait is pervasive in the news domain and makes it harder for news recommendation to offer satisfactory service to users. Fortunately, we find that the news abstract, as a critical field of a news item, aligns closely with news authenticity. To this end, we propose Title Debiasing News Recommendation with Cross-field Contrastive learning (TDNR-C²), which overcomes title bias by incorporating the news abstract. Specifically, a multi-field knowledge extraction module is devised to extract multi-view knowledge about news from various fields. We then present a cross-field contrastive learning module that removes bias by contrasting the knowledge learned from the title and abstract fields. Experimental results on a real-world dataset demonstrate the superiority of the proposed TDNR-C² over existing state-of-the-art methods. Further analysis also indicates the significance of the news abstract for title debiasing.
ISBN (Print): 9789819794362; 9789819794379
Semantic Dependency Graph is a framework for representing deep semantic knowledge through flexible graph structures. While recent work indicates that large language models (LLMs) have impressive language and knowledge understanding abilities, it remains unclear whether they can understand this deep semantic knowledge. To explore this question, we design four prompt-style probing tasks covering semantic structure and semantic relations, tailored to the inherent abilities of LLMs. To ensure thorough evaluation, we conduct extensive experiments in both in-context learning (ICL) and supervised fine-tuning (SFT) scenarios. Our findings indicate that understanding deep semantic knowledge requires a larger parameter scale, especially for high-order semantic structure knowledge and semantic relation knowledge. Furthermore, our experiments reveal that while LLMs perform well on the in-domain (ID) test set after SFT, their generalization ability on the out-of-domain (OOD) test set remains inadequate.
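
The exact probing templates are not given in the abstract; the sketch below shows one plausible way to assemble a prompt-style ICL probe for semantic relation knowledge (the template is illustrative, not the paper's design):

def build_icl_probe(demos, query):
    # demos: list of (sentence, head, dependent, relation) demonstrations;
    # query: (sentence, head, dependent) tuple to be labeled by the LLM.
    parts = [
        f"Sentence: {s}\nQ: What is the semantic relation from '{h}' to '{d}'?\nA: {r}"
        for s, h, d, r in demos
    ]
    s, h, d = query
    parts.append(f"Sentence: {s}\nQ: What is the semantic relation from '{h}' to '{d}'?\nA:")
    return "\n\n".join(parts)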
ISBN (Print): 9789819794362; 9789819794379
Hypotactic structure translation, essentially a discourse-level problem, poses a major challenge for Chinese-to-English (C-E) machine translation (MT). This paper explores the assessment of translation quality for hypotactic structures in C-E MT. We argue that translating a hypotactic structure in C-E translation essentially means transforming a Chinese discoursal semantic hypotactic structure (DS-hypotactic structure) into an English syntactic hypotactic structure (S-hypotactic structure). We propose a scheme to identify the Chinese DS-hypotactic structure and the corresponding English S-hypotactic structure, and formulate assessment criteria. We then assess the translation quality of current MT systems. The results show that MT technology is still far from satisfactory in translating hypotactic structures. Finally, we discuss the criterion for identifying the Chinese DS-hypotactic structure, as well as the reasons for this MT problem and possible solutions.
ISBN (Print): 9783031779602; 9783031779619
Designing expressive speech synthesis for child voices remains an unresolved problem. One of the major obstacles for child TTS systems is the scarcity of datasets for training data-hungry DNN-based models. Only a few datasets have been proposed for building child conversational AI agents, and many come with challenges such as noisy data and indiscernible speech. With this in mind, we introduce the ChildTinyTalks (CTT) dataset, comprising 2 h of speech collected from 25 children in the third and fourth grades, telling stories and sharing their experiences. The dataset contains 1,200 audio samples transcribed at the word level and spans four classes of voice expression. To verify the effectiveness of CTT in real-world situations, AutoVocoder models were trained on both the large-scale LJSpeech dataset and our CTT dataset, and synthesized samples were generated. Initial experimental results indicate that CTT yields results comparable to an acoustic model trained on a large-scale dataset, despite being less than 10% of its size.