ISBN: 9798891760608 (print)
Despite the impressive performance of current AI models reported across various tasks, performance reports often do not include evaluations of how these models perform on the specific groups that will be impacted by these technologies. Among the minority groups under-represented in AI, low-income households are often overlooked in data collection and model evaluation. We evaluate the performance of a state-of-the-art vision-language model (CLIP) on a geo-diverse dataset containing household images associated with different income values (Dollar Street) and show that performance inequality exists among households of different income levels. Our results indicate that performance for the poorer groups is consistently lower than for the wealthier groups across various topics and countries. We highlight insights that can help mitigate these issues and propose actionable steps for economic-level inclusive AI development. Code is available at Analysis for Bridging the Digital Divide.
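The kind of per-group comparison this abstract describes can be illustrated with a minimal sketch. It is not the authors' code: the group names, labels, and the helpers `accuracy_by_group` and `accuracy_gap` are hypothetical, and the toy records stand in for model predictions on images tagged with an income group.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for (group, predicted, actual) records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_gap(per_group):
    """Difference between the best-served and worst-served group."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical toy data: (income group, predicted topic, true topic).
records = [
    ("poor", "stove", "stove"),
    ("poor", "sofa", "bed"),
    ("poor", "sink", "toilet"),
    ("poor", "roof", "roof"),
    ("rich", "stove", "stove"),
    ("rich", "bed", "bed"),
    ("rich", "toilet", "toilet"),
    ("rich", "sofa", "roof"),
]

per_group = accuracy_by_group(records)
gap = accuracy_gap(per_group)
print(per_group, gap)
```

Reporting the per-group scores together with the gap, rather than a single aggregate accuracy, is what makes a disparity between income levels visible.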
While large language models exhibit certain cross-lingual generalization capabilities, they suffer from performance degradation (PD) on unseen closely-related languages (CRLs) and dialects relative to their high-resou...
Large language models (LLMs) have brought a great breakthrough to the natural language processing (NLP) community, while posing the challenge of handling concurrent customer queries due to their high throughput deman...
ISBN: 9798891760608 (print)
Fact verification systems assess a claim's veracity based on evidence. An important consideration in designing them is faithfulness, i.e. generating explanations that accurately reflect the reasoning of the model. Recent works have focused on natural logic, which operates directly on natural language by capturing the semantic relation between spans of a claim and its aligned evidence via set-theoretic operators. However, these approaches rely on substantial resources for training, which are only available for high-resource languages. To this end, we propose to use question answering to predict natural logic operators, taking advantage of the generalization capabilities of instruction-tuned language models. Thus, we obviate the need for annotated training data while still relying on a deterministic inference system. In a few-shot setting on FEVER, our approach outperforms the best baseline by 4.3 accuracy points, where the baselines include a state-of-the-art pre-trained seq2seq natural logic system and a state-of-the-art prompt-based classifier. Our system demonstrates its robustness and portability, achieving competitive performance on a counterfactual dataset and surpassing all approaches without further annotation on a Danish verification dataset. A human evaluation indicates that our approach produces more plausible proofs with fewer erroneous natural logic operators than previous natural logic-based systems.
Identifying important neurons for final predictions is essential for understanding the mechanisms of large language models. Due to computational constraints, current attribution techniques struggle to operate at neuro...
To ensure large language models contain up-to-date knowledge, they need to be updated [...], model editing is challenging as it might also affect knowledge that is unrelated to the new [...] State-of-the-art methods identify pa...
Instruction tuning enables language models to more effectively generalize and better follow user intent. However, obtaining instruction data is costly and challenging. Prior work employs methods such as expensive huma...
Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding. However, processing extensive videos such as 24-hour CCTV foota...
Cross-lingual learning aims to transfer knowledge from one natural language to another. In zero-shot cross-lingual named entity recognition (NER), a model is trained on source languages and then used to identify named entities in other languages. Existing teacher-student knowledge distillation models leverage the unlabeled samples from the target languages and show their superiority in this setting. However, the valuable similarity information between tokens in the target language is ignored, and the teacher model trained solely on the source language generates low-quality pseudo-labels. These two issues hurt the performance of cross-lingual NER. To improve the reliability of the teacher model, in this study, we first introduce an additional, simple binary classification teacher model, trained by similarity learning, that measures whether the inputs are from the same class. We note that this binary classification auxiliary task is easier, and the two teachers simultaneously supervise the student model for better performance. Furthermore, given such a stronger student model, we propose a progressive knowledge distillation framework that extensively fine-tunes the teacher model on the target-language pseudo-labels generated by the student model. Empirical studies on three datasets across seven different languages show that our presented model outperforms state-of-the-art methods.
Since the release of ChatGPT, the field of natural language processing has experienced rapid advancements, particularly in Large Language Models (LLMs) and their multimodal counterparts, Large Multimodal Models (LMMs)...