ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Text-to-image Person Retrieval (TIPR) aims to use natural language descriptions as queries to retrieve pedestrian images. However, existing methods concentrate only on aligning individual text-image pairs and ignore the specific self-representations within both the visible images and the textual descriptions of the same identity, neglecting the impact of intra-modal information distribution on TIPR. In this paper, a novel Relation-aware Semantic Alignment Network (RSAN) is proposed to learn reliable and comprehensive semantic visual-textual associations across modalities. Specifically, a Global Semantic Alignment Matching (GSAM) loss is introduced to enhance the coherence of inter-modality features while preserving intra-modal representations for cross-modal matching. Additionally, an Adapter-assisted Information Aggregation (AIA) module is designed to further complement contextual information fusion between image features and text embeddings. Extensive experiments on two public benchmark datasets demonstrate the superiority of the proposed RSAN.
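The abstract does not spell out the GSAM loss, but the symmetric cross-modal matching objective that such methods build on can be sketched as follows; the temperature value, the function names, and the assumption that row i of each batch forms a matched image-text pair are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def matching_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric image-text matching loss over a batch of paired features.

    Row i of each array is assumed to describe the same identity, so the
    diagonal of the similarity matrix holds the positive pairs.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    sim = img @ txt.T / temperature          # (B, B) similarity logits
    labels = np.arange(len(sim))
    # Cross-entropy in both retrieval directions (image->text, text->image).
    i2t = -np.log(softmax(sim, axis=1)[labels, labels]).mean()
    t2i = -np.log(softmax(sim, axis=0)[labels, labels]).mean()
    return (i2t + t2i) / 2
```

A trained model would minimize this over feature pairs produced by the image and text encoders; perfectly aligned features drive the loss toward zero.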
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
In recent years, Transformers have achieved significant success in image fusion. These methods apply self-attention mechanisms across different spatial or channel dimensions and have demonstrated impressive performance. However, existing methods optimize along only a single dimension and struggle to simultaneously capture the complex dependencies between the spatial and channel dimensions. To address this problem, we propose a novel multi-dimensional adaptive interaction transformer network, named MAITFuse, to enhance the multilevel information expression and detail retention capabilities of images. We design a Multi-Dimensional Feature Extraction (MDFE) module to extract features across the spatial and channel dimensions in parallel, and introduce a novel weighted cross-attention fusion method to integrate multi-dimensional information effectively. Experimental results show that, compared to existing fusion methods, our proposed method achieves superior fusion performance across various datasets.
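The weighted cross-attention fusion idea can be illustrated with a minimal single-head sketch; the fixed scalar weight `w` stands in for whatever learned gating MAITFuse actually uses, and the function names are ours:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    # Single-head cross-attention: tokens of one branch query the other.
    d = q_feats.shape[-1]
    attn = softmax(q_feats @ kv_feats.T / np.sqrt(d), axis=-1)
    return attn @ kv_feats

def weighted_cross_fusion(feat_a, feat_b, w=0.5):
    # Each branch is enriched with information attended from the other,
    # then the two enriched branches are blended by a scalar weight.
    a2b = cross_attention(feat_a, feat_b)
    b2a = cross_attention(feat_b, feat_a)
    return w * (feat_a + a2b) + (1 - w) * (feat_b + b2a)
```

In the paper's setting the two inputs would be the spatial-dimension and channel-dimension feature streams from the MDFE module.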
ISBN (digital): 9798331518523
ISBN (print): 9798331518530
Image fusion is a method used in image processing to provide a more complete representation by amalgamating features and data from many images. Multimodal medical image fusion involves the integration of medical images from multiple imaging modalities, including computed tomography (CT) scans, positron emission tomography (PET), and magnetic resonance imaging (MRI), into a single dataset. This integration enhances the visualisation of anatomical structures and clinical situations, hence improving diagnostic accuracy by leveraging the strengths of each modality. This study employs MRI, CT, and PET scans as experimental modalities. This review aims to compare multimodal medical image fusion approaches based on the Stationary Wavelet Transform (SWT), Non-Subsampled Shearlet Transform (NSST), Convolutional Neural Networks (CNN), and Non-Subsampled Contourlet Transform (NSCT). This study examines the latest conventional and non-conventional research conducted within these disciplines. It further evaluates these approaches according to diverse image quality metrics and several quantitative assessments. According to this comparison, CNN-based fusion demonstrates superior results, as the overall visual and parametric quality of its fusion outcomes surpasses that of the other approaches evaluated.
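As a concrete point of reference for the wavelet-family methods compared here, a one-level Haar DWT fusion with the classic average/max-absolute coefficient rule might look like this (a deliberate simplification: SWT, NSST, and NSCT use richer, shift-invariant decompositions):

```python
import numpy as np

def haar2d(img):
    # One-level 2D Haar transform; image sides must be even.
    a = (img[0::2] + img[1::2]) / 2         # row-pair average
    d = (img[0::2] - img[1::2]) / 2         # row-pair detail
    LL = (a[:, 0::2] + a[:, 1::2]) / 2
    LH = (a[:, 0::2] - a[:, 1::2]) / 2
    HL = (d[:, 0::2] + d[:, 1::2]) / 2
    HH = (d[:, 0::2] - d[:, 1::2]) / 2
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    # Exact inverse of haar2d.
    a = np.empty((LL.shape[0], LL.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    out = np.empty((a.shape[0] * 2, a.shape[1]))
    out[0::2], out[1::2] = a + d, a - d
    return out

def dwt_fuse(img1, img2):
    # Classic rule: average the approximation band, keep the stronger
    # (max-absolute) coefficient in each detail band.
    b1, b2 = haar2d(img1), haar2d(img2)
    fused = [(b1[0] + b2[0]) / 2]
    for c1, c2 in zip(b1[1:], b2[1:]):
        fused.append(np.where(np.abs(c1) >= np.abs(c2), c1, c2))
    return ihaar2d(*fused)
```

Because the transform has perfect reconstruction, fusing an image with itself returns the image unchanged, which makes the rule easy to sanity-check.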
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Low-light image enhancement (LLIE) can be reformulated as an image-specific curve estimation (CE) problem. Traditional CE-based methods struggle with issues such as uniform processing across different regions, static parameter estimation, and the lack of effective global semantic enhancement. To address these limitations, we propose a novel unsupervised learning framework, Patch-wise Dynamic Curve Estimation (PDCE), which dynamically adjusts and optimizes enhancement curves according to local patch brightness and the iteration process. Specifically, we present a Vision-Language Curve Discriminator (VLCD), which dynamically determines the curve type for each patch, avoiding uniform application of a single curve to the whole image. We introduce a Curve Parameter Estimator (CPE), which dynamically updates curve parameters and adjusts enhancement effects based on the output of the previous iteration. Furthermore, we design a Visual State Space-based Semantic Enhancement Module (VSEM), which captures global receptive fields and enriches semantic features through a Mamba-based U-Net architecture. Extensive experimental results show the superiority of our PDCE over state-of-the-art methods for LLIE.
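PDCE's exact curves are not given in the abstract, but CE-based LLIE methods commonly iterate the quadratic curve LE(x) = x + a·x·(1 − x); a minimal sketch, assuming intensities in [0, 1] and one scalar alpha per iteration (per-patch alphas would be arrays broadcast over the image in the same way):

```python
import numpy as np

def enhance_curve(img, alphas):
    """Iteratively apply the quadratic enhancement curve
    LE(x) = x + a * x * (1 - x), one alpha per iteration.

    `img` holds intensities in [0, 1]; keeping each alpha in [-1, 1]
    guarantees the output stays in range, since the curve maps [0, 1]
    onto itself for those alphas.
    """
    x = img.copy()
    for a in alphas:
        x = x + a * x * (1 - x)
    return x
```

Positive alphas brighten dark regions most strongly near mid-tones, while the fixed points at 0 and 1 prevent clipping, which is why this family of curves suits low-light enhancement.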
ISBN (digital): 9791188428137
ISBN (print): 9798331507602
The visually impaired are unable to enjoy leisure activities as much as sighted people due to various limitations. To expand the scope of leisure activities for the visually impaired, we have developed a vibration glove-based system that helps with piano learning. Previous research used 88 infrared light-emitting diodes and gloves with infrared receivers to provide feedback to the user, but this method had many limitations; in particular, the inconvenient user experience and low accuracy were the biggest problems. Our method solves both problems using a camera and an image processing algorithm. In tests of the model on 20 piano images, all keys were perfectly recognized in 75% of cases, and the gloves could be used comfortably in practice without any difficulty. Thus, our method offers a simpler user experience for the visually impaired, without requiring any special modifications to the piano.
ISBN (digital): 9798331518523
ISBN (print): 9798331518530
Image fusion is a technique used in image processing to create a more comprehensive representation by combining features and data from several images. Multi-modal medical image fusion incorporates medical images from several imaging modalities, such as computed tomography (CT) scans, positron emission tomography (PET), and magnetic resonance imaging (MRI), into a single dataset. This integration yields better visualization of anatomical structures and clinical conditions, increasing diagnostic accuracy by using the strengths of each modality. In this paper, MRI, CT, and PET scans are used as experimental modalities. This review aims to compare multi-modal medical image fusion approaches based on Multi-resolution Singular Value Decomposition (MSVD), Principal Component Analysis (PCA), the Discrete Wavelet Transform (DWT), and Wavelet Packet Decomposition (WPD). This paper explores the latest conventional and non-conventional research conducted in these domains. It also compares these methods based on various image quality parameters and quantitative checks. Based on this comparison, PCA shows the best results, as the overall visual and parametric quality of its fusion results is better than that of the compared methods.
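The PCA fusion scheme evaluated here is typically implemented by weighting each source image with the components of the leading eigenvector of their joint covariance; a minimal grayscale sketch (function name ours):

```python
import numpy as np

def pca_fuse(img1, img2):
    # Treat the two source images as two variables, take the leading
    # eigenvector of their 2x2 covariance matrix, and use its normalized
    # components as fusion weights.
    data = np.stack([img1.ravel(), img2.ravel()])
    cov = np.cov(data)
    vals, vecs = np.linalg.eigh(cov)
    v = np.abs(vecs[:, np.argmax(vals)])
    w = v / v.sum()
    return w[0] * img1 + w[1] * img2
```

The result is a convex combination of the inputs, so the image carrying more variance (typically more structural detail) automatically receives the larger weight.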
In this paper, we present a novel yet intuitive unsupervised feature learning approach, referred to as Minimizing Interframe Differences (MID). The idea is the following: as long as the unsupervised features successfu...
ISBN (digital): 9798331518523
ISBN (print): 9798331518530
Visually impaired people face problems with independent navigation due to limited visual information. Facing significant challenges, they often travel with an assistant or a relative. This project introduces an approach that increases the independence of blind users through new software solutions. The proposed system employs DenseNet201 for feature extraction and Long Short-Term Memory (LSTM) networks for generating accurate, context-aware captions. These captions are converted into real-time auditory descriptions using the gTTS library, enabling users to interpret and navigate their environment confidently. Evaluated on the Flickr8k dataset, the system achieved a BLEU score of 0.721, demonstrating its ability to generate high-quality captions. The system's architecture is designed to balance accuracy, efficiency, and user accessibility, and incorporates a modular design optimized for computational efficiency and scalability. Future work includes exploring wearable technology for continuous real-time feedback, integrating advanced natural language processing (NLP) models for richer contextual understanding, and enhancing applicability in complex indoor and outdoor environments. This approach represents a significant step toward empowering visually impaired individuals with improved mobility and environmental awareness.
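The reported BLEU score can be made concrete with a toy BLEU-1 computation: clipped unigram precision scaled by a brevity penalty, here for a single candidate/reference pair (full BLEU additionally combines higher-order n-gram precisions):

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """BLEU-1 for one candidate/reference pair of token lists:
    clipped unigram precision times the brevity penalty."""
    cand, ref = Counter(candidate), Counter(reference)
    # Clip each candidate word's count by its count in the reference,
    # so repeating a correct word cannot inflate the score.
    clipped = sum(min(n, ref[w]) for w, n in cand.items())
    precision = clipped / max(len(candidate), 1)
    # Brevity penalty discourages very short captions.
    bp = 1.0 if len(candidate) >= len(reference) else \
        math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * precision
```

For example, `bleu1(["a", "dog", "runs"], ["a", "dog", "sleeps"])` yields 2/3, since two of three candidate tokens match the reference.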
ISBN (digital): 9798331508685
ISBN (print): 9798331519476
Convolutional Neural Networks (CNNs) have grown into a powerful image recognition tool and are an important part of building Internet of Things (IoT) applications. In this regard, CNNs make it possible to analyse visual data gathered from networked devices, such as cameras and sensors, effectively and accurately. With an emphasis on their potential for real-time applications such as smart surveillance, healthcare monitoring, autonomous cars, and industrial automation, this study investigates the incorporation of CNN-based image recognition into IoT contexts. We tackle important issues such as the requirement for low-latency processing, energy efficiency, and the computing limitations of edge devices. To maximise CNN performance in the resource-constrained IoT ecosystem, strategies such as model compression, edge computing, and distributed architectures are covered. The ability to interpret large volumes of visual data is improved by the integration of CNNs with IoT, providing creative solutions for automation and intelligent decision-making across numerous industries.
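Model compression, mentioned above as a strategy for resource-constrained edge devices, often begins with post-training quantization; a minimal sketch of symmetric per-tensor int8 quantization of a weight array (the simplest variant, not any specific framework's implementation):

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: one float scale maps the
    # whole tensor to int8, cutting storage 4x versus float32.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale
```

The round-trip error per weight is bounded by half the scale, which is why quantizing a well-conditioned layer usually costs little accuracy.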
ISBN (digital): 9798331521394
ISBN (print): 9798331521400
Increasing electronic waste (e-waste) is a serious environmental and financial problem that calls for creative ideas for effective resource recovery and recycling. This paper proposes an IoT- and Blockchain-integrated e-waste management system based on Convolutional Neural Networks (CNNs) to boost automated waste classification and, thus, circular economy practices. While Blockchain guarantees open and safe tracking of waste transportation, IoT sensors and RFID tags allow real-time monitoring of e-waste. Trained on a collection of 50,000 e-waste images, the CNN-based image classification model attained an accuracy of 96.2% in classifying components into recyclables, reusables, and hazardous items. Using automated decision-making, the system showed a 28% reduction in processing time and a 32% gain in sorting efficiency over conventional approaches. Blockchain incorporation enhanced traceability through 100% secure transaction records, lowering fraud and illegal disposal. Experimental data show that this method improves resource recovery rates by 23% by promoting sustainable e-waste management. The proposed architecture minimizes environmental impact by encouraging ethical e-waste disposal and a closed-loop recycling system. These results show how Blockchain-secured, AI-driven IoT devices can help advance circular-economy principles in global e-waste management.
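The tamper-evidence that Blockchain contributes here rests on hash chaining; a minimal sketch of a hash-linked ledger of waste-tracking events (the event strings and field names are illustrative, not the paper's schema):

```python
import hashlib
import json

def record_hash(record):
    # Deterministic SHA-256 over the canonical JSON of a record.
    return hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()

def append_block(chain, event):
    # Each block stores the previous block's hash, so editing any
    # earlier event invalidates every later hash in the chain.
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"event": event, "prev": prev}
    block["hash"] = record_hash({"event": event, "prev": prev})
    chain.append(block)
    return chain

def verify(chain):
    # Walk the chain and recompute every hash link.
    prev = "0" * 64
    for b in chain:
        if b["prev"] != prev or \
           b["hash"] != record_hash({"event": b["event"], "prev": b["prev"]}):
            return False
        prev = b["hash"]
    return True
```

A production system would add signatures and distributed consensus on top, but the chained hashes alone already make silent edits to a waste-transport record detectable.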