Text-to-image person search is challenging because of cross-scale correspondences and information inequality between modalities. Specifically, images and text are linked in complex ways at different scales, and images are usually more informative and complete than text. It is therefore crucial to establish semantic correlations between modalities and to focus on task-relevant information in images. In this paper, we propose a novel Adaptive and Collaborative Multi-scale Alignment network (ACMA) for text-based person search that learns semantically consistent and information-aligned multi-modal representations. First, we introduce a novel joint embedding module that adaptively integrates features of different pixels and words, thereby extracting semantically consistent multi-modal features at different scales. Second, we design an auxiliary visual branch based on cross-modal fusion features to guide the extraction of key visual features that are beneficial for cross-modal matching. Extensive experiments validate that ACMA outperforms state-of-the-art methods.
Content-based image retrieval (CBIR) has become an essential tool for managing and searching large-scale image collections. However, the accuracy and performance of CBIR systems can be improved by combining data mining techniq...
In the Internet era, the explosive growth of media data processing poses significant challenges for research on Image Coding for Machines (ICM), which aims to improve the efficiency of AI models while reducing the burdens of data storage and transmission. Existing ICM methods struggle to generalize sufficiently when a single codec must handle diverse downstream tasks. To address this issue, we propose a unified ICM framework that supports diverse downstream tasks through a novel importance allocation mechanism. Building on a spatially variable-rate image compression codec, we introduce two options, updating the non-uniform quality map online or predicting it offline; this map governs the quality distribution of reconstructed images according to the specific downstream task. The proposed method is evaluated through extensive experiments on diverse fine-grained image classification datasets. The experimental results demonstrate that the proposed method achieves a superior rate-distortion trade-off for ICM.
Multi-focused plenoptic images possess many special characteristics related to the micro-image (MI) array, which are expected to be useful in further improving compression performance. These characteristics stem from the substantial overlap and the sharpness variation among the micro-images, and handling them properly can lead to better patch-based prediction. In this paper, we design a new prediction model for multi-focused plenoptic image data that takes into account the disparity shift constraint arising from the overlaps, as well as the sharpness variation. Experimental results show a coding gain of 21% over HEVC Intra, and of 27% when the proposed method is combined with the Intra Block Copy (IBC) tool, which is reported to be very effective in plenoptic image coding.
Generative models that use 3D information have recently become an active research topic. GIRAFFE, one of the latest 3D-aware generative models, shows better feature disentanglement than existing generative models because it generates an image through volume rendering of independently formed 3D neural feature fields. However, GIRAFFE still suffers from imperfect disentanglement of foreground and background. To achieve better disentanglement performance than GIRAFFE, we propose co-adversarial learning of the generative model at both the image and feature levels. Extensive simulation experiments show that the proposed generative model produces photo-realistic images with fewer parameters than existing 3D-aware generative models, along with excellent foreground-background disentanglement.
ISBN: 9789819708918 (print)
The proceedings contain 60 papers. The special focus in this conference is on Computing and Communication Networks. The topics include: Enhancing Security in Wireless Sensor Networks: A Broadcast/Multicast Authentication Framework with Identity-Based Signature Schemes; TSFTO: A Two-Stage Fuzzy-Based Tasks Orchestration Algorithm for Edge and Fog Computing Environments; Tracking Climatic Variations Through Smart IoT-Driven Approach: An Exploratory Analysis; Design and Manufacturing of an Efficient Low-Cost Fire Surveillance System Based on GSM Communications and IoT Technology; A Comparative Analysis of Optimized Routing Protocols for High-Performance Mobile Ad Hoc Networks; Integrating Sandpiper Optimization Algorithm and Secure Aware Routing Protocol for Efficient Cluster Head Selection in Wireless Sensor Networks; The Effects of Weather Conditions on Data Transmission in Free Space for Wireless Sensor Networks; Using Wireless Sensor Network Performance and Optimizing in Underground Mines with Virtual MIMO Antenna; A Novel Logistic Regression-Based Fire Detection Model Using IoT in Underground Coal Mines; An MQTT IoT Intrusion Detection System Using Deep-Learning; Multi-objective Optimal Feature Selection for Cyber Security Integrated with Deep Learning; Application of Gradient Boosting Classifier-Based Computational Intelligence to Detect Drug Addiction Threat in Society; Insurance 4.0: Smart Insurance Technologies and Challenges Ahead; Review on Vision Transformer for Satellite Image Classification; Primitive Roots and Euler Totient Function-Based Progressive Visual Cryptography; Approach for Fire Detection Using Image Processing; Review of Various Approaches for Authorship Identification in Digital Forensics; Machine Learning Model for Traffic Prediction and Pattern Extraction in High-Speed Optical Networks; Improved Markov Decision Process in Wireless Sensor Network for Optimal Energy Consumption.
Aiming at the deficiency that the traditional P-M diffusion model can't distinguish the flat area and the detailed area with similar gradient changes, a new diffusion model is proposed. In image engineering, the a...
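For orientation, the limitation mentioned in this abstract can be seen in a minimal sketch of the classic Perona-Malik (P-M) scheme, written here in 1D with pure Python. This is the traditional model only, not the new model the abstract proposes (its details are truncated above). The function names, the test signal, and the parameter values `k`, `lam`, and `steps` are illustrative assumptions.

```python
def conductance(grad_mag, k):
    """Classic P-M edge-stopping function c(s) = 1 / (1 + (s / k)^2).

    It depends only on the gradient magnitude, so a flat-but-noisy region and
    a fine-detail region with similar gradients receive the same smoothing.
    """
    return 1.0 / (1.0 + (grad_mag / k) ** 2)


def perona_malik_1d(signal, k=10.0, lam=0.25, steps=20):
    """Explicit 1D Perona-Malik diffusion; the two end samples are held fixed."""
    u = [float(v) for v in signal]
    for _ in range(steps):
        nxt = u[:]
        for i in range(1, len(u) - 1):
            g_left = u[i - 1] - u[i]    # difference towards the left neighbour
            g_right = u[i + 1] - u[i]   # difference towards the right neighbour
            nxt[i] = u[i] + lam * (
                conductance(abs(g_left), k) * g_left
                + conductance(abs(g_right), k) * g_right
            )
        u = nxt
    return u


# Hypothetical signal: small ripples on either side of one large step edge.
noisy_step = [0, 2, 0, 2, 0, 100, 102, 100, 102, 100]
smoothed = perona_malik_1d(noisy_step)
edge_jump = smoothed[5] - smoothed[4]  # the large step should largely survive
```

Because `conductance` sees only the gradient magnitude, it cannot tell a ripple in a flat area from genuine detail of the same amplitude, which is the deficiency the abstract sets out to address.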
ISBN: 9798331529543 (digital)
ISBN: 9798331529550 (print)
The imperceptible color vibration method, which embeds imperceptible patterns in display images through rapid chromatic changes, enables the presentation of information to machines while maintaining the viewing experience of users. However, when markers are embedded in video images, decoding becomes challenging during periods with significant interframe differences, which adds noise to the decoding. In this study, we enhanced the signal-point distance by embedding two quadrature chromatic changes within the color space across the entire image. Additionally, we employed soft-decision decoding, which involves accumulating multiple confidence values per pixel, thereby enabling robust decoding during periods with significant interframe differences in video images. Our evaluation confirmed that the proposed method significantly improves the decoding rate compared to conventional techniques. Furthermore, we measured the time required for decoding and verified the effectiveness of the proposed soft-decision decoding method.
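The contrast between hard and soft decisions can be illustrated with a small sketch. The abstract does not specify how confidence values are computed or combined, so the signed per-frame confidences below (positive for bit 1, negative for bit 0) and both function names are assumptions made purely for illustration.

```python
def hard_decision(confidences):
    """Decide each frame independently, then take a majority vote."""
    votes = sum(1 if c > 0 else -1 for c in confidences)
    return 1 if votes > 0 else 0


def soft_decision(confidences):
    """Accumulate the signed confidence values first, then decide once."""
    return 1 if sum(confidences) > 0 else 0


# Hypothetical confidences for one pixel whose embedded bit is 1.
# Three frames are disturbed by interframe differences and lean weakly
# towards 0; two clean frames point strongly towards 1.
confidences = [0.9, -0.1, 0.8, -0.2, -0.1]

hard_bit = hard_decision(confidences)  # majority of frames vote for 0
soft_bit = soft_decision(confidences)  # accumulated confidence favours 1
```

In this toy case the majority vote is misled by the three weak, noisy frames, whereas accumulating confidences lets the two reliable frames dominate.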
Recent advancements in learning-based image compression methods have shown promising results. The success of these methods relies heavily on the entropy model, which predicts the probability distribution of the quantized latent representation of the image based on available knowledge. However, most existing entropy models follow an estimate-then-merge pipeline, leading to two potential issues: limited flexibility in modeling spatial context and inadequate fusion of different prior sources. In this paper, we propose a novel approach called the Merge-Then-Estimate (MTE) entropy model. Our method addresses these issues by first uniformly merging the available priors into a "prior token" using a Prior Embedding Module for each spatial location in the quantized latent representation. Next, we introduce a Content-aware Context Model to dynamically capture the dependencies of the representation currently being coded on its neighboring available priors. Experiments on the Kodak dataset demonstrate the superiority of the proposed MTE entropy model.
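To make concrete why the entropy model matters, the sketch below computes the ideal code length of one quantized latent value under a discretized Gaussian, a parameterization commonly used in learned image compression. It is not the MTE model itself; the function names and the example values of `mu` and `sigma` are assumptions for illustration.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bits_for_symbol(y, mu, sigma):
    """Ideal code length, in bits, of integer symbol y.

    The probability of y is the Gaussian mass on [y - 0.5, y + 0.5];
    the code length is -log2 of that probability.
    """
    p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log(0)


# The same latent value costs far fewer bits when the predicted mean is accurate.
bits_good_prediction = bits_for_symbol(3, mu=3.0, sigma=0.5)
bits_poor_prediction = bits_for_symbol(3, mu=0.0, sigma=0.5)
```

The more accurately the priors predict each latent value, the fewer bits are spent, which is why better fusion of prior sources can translate directly into rate savings.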
ISBN: 9798331529543 (digital)
ISBN: 9798331529550 (print)
Generative models have significantly advanced generative AI, particularly in image and video generation. Recognizing their potential, researchers have begun exploring their application in image compression. However, existing methods face two primary challenges: limited performance improvement and high model complexity. In this paper, to address these two challenges, we propose a perceptual image compression solution based on a conditional diffusion model. Given that compression performance depends heavily on the decoder's generative capability, we base our decoder on the diffusion transformer architecture. To address the model complexity problem, we implement the diffusion transformer architecture with the Swin Transformer. Building on this enhanced generative capability, we further augment the decoder with informative features using a multi-scale feature fusion module. Experimental results demonstrate that our approach surpasses existing perceptual image compression methods while achieving lower model complexity.