ISBN (print): 9783031821554; 9783031821561
Diabetic Retinopathy (DR) is an eye disease associated with chronic diabetes. It remains the primary cause of visual impairment and blindness among the global working-age population. Early detection of DR is crucial for ensuring timely diagnosis and effective treatment. This paper proposes a new homogeneous ensemble-based approach for referable DR detection, constructed from a set of hybrid architectures as base learners and two combination rules (weighted and hard voting), and evaluated on fundus images from the Messidor-2, Kaggle DR, and APTOS datasets. The hybrid architectures combine deep feature extraction techniques, dimensionality reduction techniques to reduce the size of the extracted features, and a decision tree (DT) algorithm for classification. The results showed the potential of the proposed approach, which achieved accuracies of 90.65%, 93.01%, and 83.32% on the APTOS, Kaggle DR, and Messidor-2 datasets, respectively. We therefore recommend the proposed approach for referable DR classification; it is a promising tool to assist ophthalmologists in diagnosing DR.
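The two combination rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tie-breaking rule, and the example weights are assumptions made here for illustration only.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over the class labels predicted by the base learners.

    Ties go to the tied label that appears first in `predictions`
    (an assumption; the abstract does not specify tie-breaking).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def weighted_vote(predictions, weights):
    """Add each learner's weight to the label it predicted; return the heaviest label."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per base learner is required")
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical outputs of three base learners for one fundus image
# (1 = referable DR, 0 = non-referable). The weights are made-up values.
votes = [1, 0, 0]
weights = [0.95, 0.40, 0.45]
print(hard_vote(votes))               # -> 0 (two of three learners say 0)
print(weighted_vote(votes, weights))  # -> 1 (0.95 outweighs 0.40 + 0.45)
```

The example shows how the two rules can disagree: a single highly weighted learner can override a simple majority.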
In an era dominated by the ubiquitous sharing of visual content on social media, the proliferation of fake photographs and digital fakes has emerged as a critical issue. This paper explores how fake images are detecte...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Text-to-image Person Retrieval (TIPR) aims to utilize natural language descriptions as queries to retrieve pedestrian images. However, existing methods concentrate only on aligning individual text-image pairs and ignore the specific self-representations within both visible images and textual descriptions of the same identity. This neglects the impact of intra-modal information distribution on TIPR. In this paper, a novel Relation-aware Semantic Alignment Network (RSAN) is proposed to learn reliable and comprehensive semantic visual-textual associations across different modalities. Specifically, a Global Semantic Alignment Matching (GSAM) loss is introduced to enhance the coherence of inter-modality features while preserving intra-modal representations for cross-modal matching. Additionally, an Adapter-assisted Information Aggregation (AIA) module is designed to further complement contextual information fusion between the image features and text embeddings. Extensive experiments conducted on two public benchmark datasets demonstrate the superiority of the proposed RSAN.
This paper presents an overview and analysis of numerous research projects on image fusion methods, with a particular emphasis on deep learning-based methods. The research analyses the inadequacies of current fusion m...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
In recent years, Transformers have achieved significant success in image fusion. These methods apply self-attention mechanisms across different spatial or channel dimensions and have demonstrated impressive performance. However, existing methods optimize along only a single dimension and struggle to simultaneously capture the complex dependencies between spatial and channel dimensions. To address this problem, we propose a novel multi-dimensional adaptive interaction transformer network, named MAITFuse, to enhance the multilevel information expression and detail retention capabilities of images. We design a Multi-Dimensional Feature Extraction (MDFE) module to extract features across spatial and channel dimensions in parallel, and introduce a novel weighted cross-attention fusion method to integrate multi-dimensional information effectively. Experimental results show that, compared to existing fusion methods, our proposed method achieves superior fusion performance across various datasets.
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Low-light image enhancement (LLIE) can be reformulated as an image-specific curve estimation (CE) problem. Traditional CE-based methods struggle with issues such as uniform processing across different regions, static parameter estimation, and lack of effective global semantic enhancement. To address these limitations, we propose a novel unsupervised learning framework, Patch-wise Dynamic Curve Estimation (PDCE), which dynamically adjusts and optimizes enhancement curves according to local patch brightness and the iteration process. Specifically, we present a Vision-Language Curve Discriminator (VLCD), which dynamically determines the curve type for each patch, avoiding uniformly applying the curve on the whole image. We introduce a Curve Parameter Estimator (CPE), which dynamically updates curve parameters and adjusts enhancement effects based on the output of the previous iteration. Furthermore, we design a visual State Space-based Semantic Enhancement Module (VSEM), which captures global receptive fields and enriches semantic features through the Mamba-based U-Net architecture. Extensive experimental results show the superiority of our PDCE over state-of-the-art methods for LLIE.
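To make the idea of patch-wise, iterative curve estimation concrete, here is a minimal sketch. It is not the PDCE method: it assumes the quadratic curve x + alpha * x * (1 - x) used in earlier curve-estimation work such as Zero-DCE, and it replaces the learned discriminator and estimator with a hypothetical hand-written rule (alpha = 1 - mean patch brightness). All function names are illustrative.

```python
def apply_curve(pixel, alpha, iterations):
    """Iteratively apply x <- x + alpha * x * (1 - x) to a pixel in [0, 1].

    For alpha in [0, 1] the result stays in [0, 1] and never darkens the pixel.
    """
    for _ in range(iterations):
        pixel = pixel + alpha * pixel * (1.0 - pixel)
    return pixel


def enhance_patchwise(image, patch, iterations=4):
    """Brighten each square patch with an alpha chosen from its mean brightness.

    `image` is a list of rows of floats in [0, 1]; `patch` is the patch side length.
    The rule alpha = 1 - mean is a hypothetical stand-in for a learned estimator.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            values = [image[r][c] for r in rows for c in cols]
            alpha = 1.0 - sum(values) / len(values)  # darker patch -> stronger curve
            for r in rows:
                for c in cols:
                    out[r][c] = apply_curve(image[r][c], alpha, iterations)
    return out


# Toy 2x4 image: a dark 2x2 patch on the left, a bright 2x2 patch on the right.
toy = [[0.1, 0.1, 0.8, 0.8],
       [0.1, 0.1, 0.8, 0.8]]
enhanced = enhance_patchwise(toy, patch=2)
print(enhanced[0][0] - toy[0][0] > enhanced[0][2] - toy[0][2])  # -> True
```

The dark patch receives a much larger brightness gain than the bright one, which is the behaviour that a single curve applied uniformly to the whole image cannot provide.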
ISBN (digital): 9791188428137
ISBN (print): 9798331507602
The visually impaired are unable to enjoy leisure activities as much as sighted people due to various limitations. To expand the scope of leisure activities for the visually impaired, we have developed a vibration glove-based system that helps with piano learning. Previous research used 88 infrared light-emitting diodes and gloves with infrared receivers to provide feedback to the user, but this method had many limitations, chiefly an inconvenient user experience and low accuracy. Our method addresses both problems using a camera and an image processing algorithm. In tests on 20 piano images, all keys were recognized correctly in 75% of cases, and the gloves could be used comfortably in practice. Thus, our method offers a simpler user experience for the visually impaired, without requiring any special modifications to the piano.
Volume 5150, Part 3 of 3, of the conference proceedings contains 70 papers. Topics discussed include image and video enhancement, image and video coding, motion estimation, image and video quality, image and video noise reduction, lossless coding, systems and architectures, image and video indexing, face detection and recognition, image and video security and watermarking, three-dimensional image processing, image and video segmentation, and image and video retrieval applications.
Volume 5150, Part 2 of 2, of the conference proceedings contains 73 papers. Topics discussed include coding standards, image and video security and watermarking, the MPEG video coding standard, error-resilient coding, image and video segmentation, visualization, systems and architectures, three-dimensional image processing, object-based coding, image compression beyond wavelets, semantic characterization of multimedia documents, image-based rendering and related technologies, image and video enhancement, and image and video coding.
ISBN (digital): 9798331518523
ISBN (print): 9798331518530
Image fusion is a method used in image processing to provide a more complete representation by combining features and data from multiple images. Multimodal medical image fusion integrates medical images from several imaging modalities, including computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI), into a single dataset. This integration enhances the visualisation of anatomical structures and clinical situations, improving diagnostic accuracy by leveraging the strengths of each modality. This study employs MRI, CT, and PET scans as experimental modalities. This review compares multimodal medical image fusion approaches based on the Stationary Wavelet Transform (SWT), Non-Subsampled Shearlet Transform (NSST), Convolutional Neural Network (CNN), and Non-Subsampled Contourlet Transform (NSCT). It examines the latest conventional and non-conventional research in these areas and evaluates the approaches using a range of image quality metrics and quantitative assessments. According to this comparison, CNN-based fusion gives the best results: the overall visual and parametric quality of its fusion outcomes surpasses that of the other approaches evaluated.