ISBN: (print) 9783031776090; 9783031776106
Concept-based interpretability methods are a popular form of explanation for deep learning models, providing explanations in terms of high-level, human-interpretable concepts. These methods typically find concept activation vectors (CAVs) using a probe dataset of concept examples. This requires labelled data for these concepts, which is expensive to obtain in the medical domain. We introduce TextCAVs: a novel method which creates CAVs using vision-language models such as CLIP, allowing explanations to be created solely from text descriptions of the concept, as opposed to image exemplars. This reduced cost of testing concepts allows many concepts to be tested and lets users interact with the model, testing new ideas as they arise rather than waiting for image collection and annotation. In early experimental results, we demonstrate that TextCAVs produces reasonable explanations for a chest X-ray dataset (MIMIC-CXR) and natural images (ImageNet), and that these explanations can be used to debug deep learning-based models. Code: ***/AngusNicolson/textcavs.
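As a rough illustration of how a concept activation vector is typically used once it exists, the sketch below computes a generic CAV-style sensitivity: the directional derivative of a class logit along the concept direction. It is not the TextCAVs procedure, which the abstract does not spell out. The toy two-layer head, the function names, and the random placeholder vectors are all assumptions made for illustration; in practice the concept vector would come from a probe dataset or, as in the abstract, from a text description.

```python
import numpy as np

def relu_head_logit_grad(act, w1, w2, k):
    """Gradient of logit k w.r.t. `act` for the toy head logits = w2 @ relu(w1 @ act)."""
    pre = w1 @ act
    # Chain rule: only units with positive pre-activation pass gradient.
    return w1.T @ (w2[k] * (pre > 0))

def concept_sensitivity(act, cav, w1, w2, k):
    """Directional derivative of logit k along the unit-normalised concept vector."""
    cav = cav / np.linalg.norm(cav)
    return float(relu_head_logit_grad(act, w1, w2, k) @ cav)

def tcav_style_score(acts, cav, w1, w2, k):
    """Fraction of inputs whose logit k increases along the concept direction."""
    sens = np.array([concept_sensitivity(a, cav, w1, w2, k) for a in acts])
    return float(np.mean(sens > 0))

# Hypothetical placeholder data: random activations, weights, and concept vector.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(16, 8))
w2 = rng.normal(size=(3, 16))
acts = rng.normal(size=(20, 8))
cav = rng.normal(size=8)
score = tcav_style_score(acts, cav, w1, w2, k=1)
```

A score near 1 would suggest the concept direction consistently raises the class logit, and a score near 0 that it consistently lowers it; with random placeholders the value carries no meaning.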
Authors:
Nie, Yongsha (Anhui Univ, Anhui Prov Int Joint Res Ctr Adv Technol Med Imag, Sch Comp Sci & Technol, Hefei 230601, Peoples R China)
ISBN: (print) 9789819786848; 9789819786855
Image dehazing is an important and challenging task in image processing. Existing dehazing methods often produce color distortion in the dehazed results. To address this issue, we propose a novel approach named Color Enhanced Dehazing network (CED). It consists of two main branches: a dehazing branch and a color reconstruction branch. First, we employ the Fast Fourier Transform to separate the low-frequency sub-image, which contains substantial color information, from the hazy image. While the original hazy image is input to the dehazing branch, the low-frequency sub-image is fed concurrently into the color reconstruction branch. This allows us to extract and reconstruct the corresponding color information to augment the dehazing process. To thoroughly fuse the information from the two branches, we design a Selective Spatial-Channel adjustment fusion module (SSC) for feature fusion across branches. Extensive experiments on benchmark datasets demonstrate the effectiveness and superiority of the proposed method for image dehazing.
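The low-frequency separation step can be sketched with a plain Fourier low-pass filter. This is a minimal illustration, assuming a circular mask around the zero-frequency component; the abstract does not state which filter shape or cutoff CED uses, so the function name `low_frequency_subimage` and the `radius` parameter are hypothetical.

```python
import numpy as np

def low_frequency_subimage(image: np.ndarray, radius: int = 8) -> np.ndarray:
    """Return the part of `image` carried by frequencies within `radius` of DC.

    `image` is H x W or H x W x C; each channel is filtered independently.
    """
    h, w = image.shape[:2]
    # After fftshift the zero-frequency term sits at index (h // 2, w // 2).
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over channels
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    filtered = spectrum * mask
    low = np.fft.ifft2(np.fft.ifftshift(filtered, axes=(0, 1)), axes=(0, 1))
    # The imaginary part is numerical noise for a real input and symmetric mask.
    return low.real

# Hypothetical stand-in for a hazy RGB image with values in [0, 1].
hazy = np.random.default_rng(0).random((32, 32, 3))
low = low_frequency_subimage(hazy, radius=4)
```

Because the mask always keeps the zero-frequency term, the per-channel mean of the output matches the input, which is one reason a low-frequency sub-image retains coarse color information.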
ISBN: (print) 9789819600540; 9789819600557
Image-text retrieval is a critical challenge for understanding the semantic relationship between the vision and language domains. Previous studies have focused on analyzing either global or local features, neglecting the intrinsic connections between these two levels of granularity. In addition, current mainstream methods attempt to construct a unified semantic space by aggregating the weighted features of different segments, aiming to enhance interaction across different granularities of information. However, these methods may be disrupted by irrelevant segments, leading to semantic misalignment. To address these challenges, we propose a Bidirectional Focused Attention and Global-Local Matching Fusion Network (BFAGL). The proposed method integrates the similarity matching of global semantic contexts with the precision of local detail analysis, ensuring that global semantic alignment is achieved without compromising critical local information. Furthermore, it incorporates bidirectional focal attention into the local matching process to promote a nuanced understanding of the contextual semantic relationships between images and text. Experiments on the Flickr30K and MS-COCO datasets demonstrate the state-of-the-art performance of BFAGL in image-text retrieval tasks.
Underwater object detection is an important computer vision task that has been widely used in marine life identification and tracking. However, problems such as low contrast conditions, occlusion condition, unbalanced...
Against the backdrop of accelerating digital transformation, online collaboration platforms have become indispensable tools for remote work and learning. However, with the increase in the number of users, the problem ...
ISBN: (print) 9789819600540; 9789819600557
Student expression recognition has become an essential tool for assessing learning experiences and emotional states. This paper introduces xLSTM-FER, a novel architecture derived from the Extended Long Short-Term Memory (xLSTM), designed to enhance the accuracy and efficiency of student facial expression recognition through advanced sequence processing capabilities. xLSTM-FER processes input images by segmenting them into a series of patches and leveraging a stack of xLSTM blocks to handle these patches. xLSTM-FER can capture subtle changes in real-world students' facial expressions and improve recognition accuracy by learning spatial-temporal relationships within the sequence. Experiments on CK+, RAF-DF, and FERplus demonstrate the potential of xLSTM-FER in expression recognition tasks, showing better performance compared to state-of-the-art methods on standard datasets. The linear computational and memory complexity of xLSTM-FER makes it particularly suitable for handling high-resolution images. Moreover, the design of xLSTM-FER allows for efficient processing of non-sequential inputs such as images without additional computation.
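The patch segmentation step described above can be sketched as follows. This is a minimal illustration, assuming non-overlapping square patches flattened in row-major order; the abstract does not give the patch size or ordering used by xLSTM-FER, so `image_to_patches` and its `patch` parameter are hypothetical.

```python
import numpy as np

def image_to_patches(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an H x W x C image into a sequence of flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch * patch * C), ordered left to
    right and top to bottom, which a sequence model can consume one row at a time.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    # Split each spatial axis into (number of patches, pixels within a patch).
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    # Bring the two patch-index axes to the front, then flatten each patch.
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, patch * patch * c)

# Hypothetical input: a 224 x 224 RGB image cut into 16 x 16 patches.
face = np.random.default_rng(0).random((224, 224, 3))
sequence = image_to_patches(face, patch=16)
```

The sequence length grows with image area divided by patch area, which is why a model with linear cost in sequence length is attractive for high-resolution inputs.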
In the context of vocational English teaching, writing skills are crucial for students' professional communication ***, traditional teaching methods often face challenges such as delayed feedback and insufficient ...
ISBN: (print) 9783031808708
The proceedings contain 10 papers. The special focus in this conference is on Diabetic Foot Ulcers Grand Challenge. The topics include: Diabetic Foot Ulcer Unsupervised Segmentation with Vision Transformers Attention; Self-supervised Instance Segmentation of Diabetic Foot Ulcers via Feature Correspondence Distillation; Multi-stage Segmentation of Diabetic Foot Ulcers Using Self-supervised Learning; SSL Based Encoder Pretraining for Segmenting a Heterogeneous Chronic Wound Image Database with Few Annotations; Multi-scale Attention Network for Diabetic Foot Ulcer Segmentation Using Self-supervised Learning; A Supervised Segmentation Solution: Diabetic Foot Ulcers Challenge 2024; CDe: Focus on the Color Differences in Diabetic Foot Images; Diabetic Foot Ulcer Grand Challenge 2024: Overview and Baseline Methods.
To ensure the continuity of the thickness measurement process and the accuracy of data when using the electromagnetic ultrasonic thickness measurement module for automatic thickness measurement of membrane water-coole...
To prevent image distortion, this paper explores methods for enhancing and optimizing graphic design images using 3D laser vision technology. The process involves collecting graphic design image data, mapping 3D laser...