ISBN (print): 9798350349405; 9798350349399
The objective of referring image segmentation is to extract referred entities from an image using a particular natural language sentence. The main idea for this task is the interaction of textual and visual features to build multi-modal relationships. Prior state-of-the-art methods mainly focus on local multi-level intermediate feature interaction or global text-to-image alignment, which might result in insufficient interaction for capturing global multi-modal information exchange or fine-grained referred object details, respectively. To overcome this issue, we introduce a referring image segmentation framework with two-stage multi-modal interaction. Specifically, we devise an innovative multi-level cross-modal fusion module to effectively facilitate the interaction of intermediate features of the linguistic and visual modalities for fine-grained details of referred objects. Besides, we further align the linguistic and visual information by introducing an elaborate global alignment module for accurately localizing the entire referred objects. Comprehensive experiments conducted on three referring image segmentation datasets illustrate that our proposed two-stage multi-modal interaction framework exhibits a marked superiority over contemporary state-of-the-art approaches.
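Below is a minimal sketch of what a cross-modal fusion step of this kind can look like in PyTorch: visual tokens from one backbone level attend to text token embeddings through cross-attention with a residual connection. The class name and dimension choices are illustrative assumptions, not the paper's actual module.

```python
# Minimal cross-modal fusion sketch (hypothetical names, not the paper's
# implementation): flattened visual tokens attend to text token embeddings.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, vis_dim: int, txt_dim: int, heads: int = 8):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, vis_dim)   # align text dim to visual dim
        self.attn = nn.MultiheadAttention(vis_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, H*W, C) flattened visual tokens; txt: (B, L, D) text tokens
        txt = self.txt_proj(txt)
        fused, _ = self.attn(query=vis, key=txt, value=txt)
        return self.norm(vis + fused)                  # residual fusion

# Toy usage: fuse three feature levels with the same sentence encoding.
txt = torch.randn(2, 20, 512)                          # 20 word tokens
fusion = CrossModalFusion(vis_dim=256, txt_dim=512)
levels = [torch.randn(2, n, 256) for n in (64 * 64, 32 * 32, 16 * 16)]
print([fusion(v, txt).shape for v in levels])
```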
ISBN (print): 9783031585340; 9783031585357
With the rise in the amount of news available today, the need for its classification has emerged. In this paper, we present methods for tagging news categories using different deep learning models, along with a comparison of their effects. These models include a single-channel CNN model, a multi-channel CNN model, and a multi-modal CNN model. This study integrates natural language understanding with convolutional methods that process descriptions, titles, and tags to enhance news ranking. The novel part of this approach is combining natural language understanding with transfer learning from the supplemental external features associated with images. The accuracy of the single-channel model was found to be 81.30%, that of the multi-channel model was 85.98%, and that of the multi-modal model was 85.39%. We used the N24 news dataset to validate the models.
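As background, a multi-channel text CNN of the kind compared here typically runs several 1-D convolutions with different kernel widths over word embeddings and concatenates the max-pooled features before classification. The sketch below is a generic classifier of that shape; the vocabulary size, kernel widths, and 24-class output are assumed placeholders, not the authors' configuration.

```python
# Generic multi-channel text CNN sketch (not the authors' exact architecture).
import torch
import torch.nn as nn

class MultiChannelTextCNN(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=300, n_classes=24,
                 kernel_sizes=(3, 4, 5), n_filters=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, token_ids):                  # (B, L) integer token ids
        x = self.emb(token_ids).transpose(1, 2)    # (B, emb_dim, L)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # (B, n_classes) logits

logits = MultiChannelTextCNN()(torch.randint(0, 30000, (4, 50)))
print(logits.shape)  # torch.Size([4, 24])
```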
Large language models (LLMs) have emerged as the dominant paradigm in natural language processing owing to their remarkable performance across various target tasks. However, naively fine-tuning them for specific downs...
ISBN (print): 9798400701979
Do regulatory guidance documents use binding language despite being purportedly non-binding? Regulatory agencies play a crucial role in modern societies by issuing regulations. While most regulations are promulgated as rules with public notice and comment procedures, administrative guidance documents are just as abundant but far less studied. They carry fewer formal requirements and are meant as non-binding guidelines, yet skeptics argue they are often used to evade judicial review, and courts turn to their text to inquire whether they are effectively binding. Recent advancements in text analysis methods have allowed scholars to analyze regulatory text, including the measurement of binding language. However, guidance documents have not been part of this trend, largely due to their inaccessibility. This article contributes to the field of empirical legal studies and administrative law by constructing a novel dataset of guidance documents, leveraging a unique policy change. It combines text analysis methods with qualitative insights from doctrinal court decisions, and finds that guidance documents are in fact less binding than rules, but that binding language has increased over time and that substantial portions of the available documents score higher than a document struck down by a court.
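A dictionary-based score like the one sketched below is one common way binding language is measured in regulatory text: count mandatory modal terms (e.g., "shall", "must") against permissive ones (e.g., "may", "should"). The word lists and the ratio used here are illustrative assumptions, not the article's exact measure.

```python
# Toy binding-language scorer (illustrative, not the article's measure):
# the share of modal terms in a document that are mandatory rather than
# permissive.
import re

MANDATORY = {"shall", "must", "required", "prohibited"}
PERMISSIVE = {"may", "should", "can", "encouraged"}

def binding_score(text: str) -> float:
    tokens = re.findall(r"[a-z]+", text.lower())
    mandatory = sum(t in MANDATORY for t in tokens)
    permissive = sum(t in PERMISSIVE for t in tokens)
    total = mandatory + permissive
    return mandatory / total if total else 0.0

print(binding_score("Applicants must file Form A; they may attach exhibits."))
# 0.5 -> one mandatory term ("must") and one permissive term ("may")
```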
We present a systematic evaluation of large language models' sensitivity to argument roles, i.e., who did what to whom, by replicating psycholinguistic studies on human argument role processing. In three experimen...
The amount of textual information that can be analyzed for meaningful insights has become a constraint as the volume of digital content produced every day increases. When it comes to m...
ISBN (print): 9798350344868; 9798350344851
Aspect term extraction (ATE) is an important natural language processing task, which aims to extract aspect terms from reviews. Recently, data augmentation has emerged as a reliable approach for relieving data sparsity in the NLP area. For ATE, self-labeling and semi-generation methods have been proposed to implement effective data augmentation. However, they rely on either external data or a pretrained generation model. In this paper, we propose a simple and self-contained augmentation method, which produces new instances for augmentation by context decoupling and infrequent term refilling, without using external data or generation models. We conduct experiments on four benchmark SemEval datasets. The test results show that our method yields substantial improvements and performs comparably to the state-of-the-art method that uses external data.
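A rough sketch of the general augmentation idea, under assumed details: strip the labelled aspect term out of a sentence (context decoupling) and refill the slot with a rarely seen aspect term from the training vocabulary (infrequent term refilling), keeping the BIO tags aligned. The function name, rarity threshold, and tag scheme are assumptions, not the paper's algorithm.

```python
# Hedged sketch of context decoupling + infrequent term refilling
# (not the paper's exact algorithm).
from collections import Counter
import random

def refill(tokens, tags, aspect_counts, rare_quantile=0.25):
    """tokens/tags: one BIO-labelled sentence; aspect_counts: Counter of
    training-set aspect terms. Returns an augmented (tokens, tags) pair."""
    starts = [i for i, t in enumerate(tags) if t == "B-ASP"]
    if not starts:
        return tokens, tags
    # pick an infrequent aspect term from the training vocabulary
    sorted_terms = sorted(aspect_counts, key=aspect_counts.get)
    rare_terms = sorted_terms[: max(1, int(len(sorted_terms) * rare_quantile))]
    new_term = random.choice(rare_terms).split()
    start = starts[0]
    end = start + 1
    while end < len(tags) and tags[end] == "I-ASP":
        end += 1
    new_tokens = tokens[:start] + new_term + tokens[end:]
    new_tags = tags[:start] + ["B-ASP"] + ["I-ASP"] * (len(new_term) - 1) + tags[end:]
    return new_tokens, new_tags

counts = Counter({"battery life": 40, "screen": 35, "touchpad": 2})
print(refill("the screen is great".split(), ["O", "B-ASP", "O", "O"], counts))
```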
ISBN (print): 9789819794331; 9789819794348
Large language models (LLMs) have shown exceptional performance in the domain of composite artificial intelligence tasks, offering a preliminary insight into the potential of general artificial intelligence. The fine-tuning process for LLMs necessitates significant computational resources, often surpassing those available from standard consumer-grade GPUs. To this end, we introduce Adaptive Quantization Low-Rank Adaptation fine-tuning (AQLoRA), a method that reduces memory demands during fine-tuning by utilizing quantization coupled with pruning techniques. This dual strategy not only reduces memory usage but also preserves accuracy. AQLoRA refines the original Low-Rank Adaptation (LoRA) fine-tuning method by efficiently quantizing LLM weights, prioritizing computational resource allocation based on weight importance, and effectively integrating the quantized model with the auxiliary weights after fine-tuning. Applying AQLoRA to the ChatGLM2-6B model, we demonstrate its effectiveness in both natural language generation (NLG) and natural language understanding (NLU) across diverse fine-tuning datasets and scenarios. Our findings reveal that AQLoRA achieves a balance between performance and memory efficiency, reducing memory consumption by 25% in NLG tasks. For NLU tasks, it enhances performance by 10% and reduces memory consumption by 10% compared to state-of-the-art methods.
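The core idea of pairing a frozen, quantized base weight with a small trainable low-rank update can be sketched as follows. This is a generic quantized-LoRA layer with made-up names and a simple per-channel int8 scheme, not the AQLoRA implementation (which additionally allocates resources by weight importance and applies pruning).

```python
# Generic quantized-LoRA sketch: frozen int8 base weight + trainable
# low-rank update; only the small A/B matrices receive gradients.
import torch
import torch.nn as nn

class QuantizedLoRALinear(nn.Module):
    def __init__(self, weight: torch.Tensor, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # per-output-channel symmetric int8 quantization of the frozen weight
        scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
        self.register_buffer("w_int8", torch.round(weight / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        out_f, in_f = weight.shape
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # trainable
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))        # trainable
        self.scaling = alpha / rank

    def forward(self, x):
        w = self.w_int8.float() * self.scale             # dequantize on the fly
        base = x @ w.t()
        update = (x @ self.lora_a.t()) @ self.lora_b.t()
        return base + self.scaling * update

layer = QuantizedLoRALinear(torch.randn(512, 512))
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```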
Aspect-based sentiment analysis (ABSA) represents a crucial field of natural language processing (NLP). It focuses on deriving detailed sentiment insights from textual content. Dialogue-level aspect-based sentiment quadruple extraction (DiaASQ) is specifically concerned with pinpointing target-aspect-opinion-emotion quadruples within conversations. DiaASQ is important in industries like e-commerce, social media analytics, and customer feedback. However, current ABSA approaches predominantly focus on single-text scenarios, often overlooking the complexities involved in sentiment analysis within conversational contexts. To fill this gap, this paper presents the IFusionQuad model, which is specifically designed for the DiaASQ task. Our contributions include the innovative integration of CloBlock in ABSA, enhancing feature representation with context-aware weights. The InteractiveNet Fusion Module further advances dialogue understanding by aggregating dialogue-specific features such as threads, speakers, and replies. Components such as CloBlock, the gating mechanism, and Biaffine attention effectively mitigate data-noise issues, improving the relevance of feature extraction. Empirical evaluation on standard datasets demonstrates that the IFusionQuad model outperforms baseline methods, achieving substantial improvements in quadruple extraction. Specifically, our model shows a 6.59% increase in micro F1 and a 7.05% increase in identification F1 for the Chinese datasets, and a 2.65% and 4.69% increase in micro F1 and identification F1, respectively, for the English datasets. These results demonstrate the efficacy of the IFusionQuad model, which consistently outperforms baseline models across all evaluation datasets on the DiaASQ task.
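For context, the Biaffine attention mentioned above is a standard way to score every pair of tokens for a set of relation labels. The sketch below shows such a scorer in isolation, with assumed dimensions and label count; it is not the IFusionQuad model.

```python
# Illustrative biaffine token-pair scorer (standard component, assumed sizes).
import torch
import torch.nn as nn

class Biaffine(nn.Module):
    def __init__(self, dim: int, n_labels: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(n_labels, dim + 1, dim + 1) * 0.01)

    def forward(self, head: torch.Tensor, dep: torch.Tensor) -> torch.Tensor:
        # head, dep: (B, L, D) token representations; returns (B, L, L, n_labels)
        ones = head.new_ones(*head.shape[:2], 1)
        h = torch.cat([head, ones], dim=-1)    # append bias feature
        d = torch.cat([dep, ones], dim=-1)
        return torch.einsum("bxi,rij,byj->bxyr", h, self.U, d)

scores = Biaffine(dim=256, n_labels=4)(torch.randn(2, 30, 256),
                                        torch.randn(2, 30, 256))
print(scores.shape)  # torch.Size([2, 30, 30, 4])
```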
ISBN (print): 9798350344868; 9798350344851
Conventional audio classification has relied on predefined classes, lacking the ability to learn from free-form text. Recent methods unlock learning joint audio-text embeddings from raw audio-text pairs describing audio in natural language. Despite recent advancements, there is little exploration of systematic methods to train models for recognizing sound events and sources in alternative scenarios, such as distinguishing fireworks from gunshots at outdoor events in similar situations. This study introduces causal reasoning and counterfactual analysis in the audio domain. We use counterfactual instances and incorporate them into our model across different aspects. Our model considers acoustic characteristics and sound-source information from human-annotated reference texts. To validate the effectiveness of our model, we conducted pre-training using multiple audio captioning datasets. We then evaluate on several common downstream tasks, demonstrating the merits of the proposed method as one of the first works leveraging counterfactual information in the audio domain. Specifically, the top-1 accuracy in the open-ended language-based audio retrieval task increased by more than 43%.
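As a reference point for the joint audio-text embedding training this work builds on, the sketch below shows a standard symmetric contrastive loss between paired audio and text embeddings. It omits the counterfactual terms the paper introduces; the function name, embedding sizes, and temperature value are assumptions.

```python
# CLAP-style symmetric contrastive objective between audio and text
# embeddings (background only; counterfactual terms are not reproduced).
import torch
import torch.nn.functional as F

def audio_text_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    # audio_emb, text_emb: (B, D) embeddings of paired audio clips / captions
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = audio_text_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```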