Image captioning is a cross-modal task that combines computer vision and natural language processing. The model is required to generate an appropriate caption for the given image. To address this challenge, we propose...
ISBN: (Print) 9798350302936
The task of user identity linkage across social networks aims to predict whether users from different social networks refer to the same person. This task plays a crucial role in cross-social network information dissemination and intelligent recommendations. However, existing user identity linkage methods suffer from several challenges: 1) excessive reliance on social network topology, neglecting users' visual modality information; 2) inadequate handling of noise in user feature data; and 3) ineffective fusion of users' multimodal information. To address these issues, we investigated a method that utilizes heterogeneous multimodal posts, including user-generated text, images, and check-in messages, to achieve user identity linkage across social networks. We innovatively leveraged a pre-trained model for image-to-text conversion to further explore users' image data and proposed an adversarial learning model based on the multimodal self-attention mechanism (AMSA). The AMSA model consists of four components: user feature extraction, user feature processing, user feature fusion, and adversarial learning. Specifically, AMSA first employs advanced pre-trained models to extract features from multiple user modalities, including images and text. It then uses multiple mechanisms, such as multi-head self-attention, to process each modality's data separately and fuses them into user representation vectors. Finally, AMSA applies adversarial learning to enhance the model's learning capacity and mitigate semantic disparities in user information across different platforms. We conducted model performance evaluations on publicly available datasets, and experimental results demonstrated the superiority of the proposed AMSA model.
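A minimal sketch of how the per-modality self-attention, fusion, and adversarial components described above could fit together, assuming PyTorch and hypothetical pre-extracted feature tensors of shape (batch, tokens, dim) for each modality; this is an illustrative reconstruction, not the authors' code.

import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Processes one modality with multi-head self-attention, then pools it."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x)          # self-attention over the modality's tokens
        return self.norm(out).mean(dim=1)    # mean-pool into one vector per user

class AMSASketch(nn.Module):
    """Fuses per-modality features and exposes a platform discriminator head,
    which can be trained adversarially to guess the source social network."""
    def __init__(self, dim: int = 256, n_modalities: int = 3):
        super().__init__()
        self.encoders = nn.ModuleList(ModalityEncoder(dim) for _ in range(n_modalities))
        self.fusion = nn.Linear(dim * n_modalities, dim)
        self.platform_disc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, modalities):
        pooled = [enc(m) for enc, m in zip(self.encoders, modalities)]
        user_repr = self.fusion(torch.cat(pooled, dim=-1))   # fused user representation
        platform_logits = self.platform_disc(user_repr)      # adversarial head
        return user_repr, platform_logits

# Usage: fused user representations from two networks would be matched (e.g. by
# cosine similarity), while the discriminator loss is played adversarially to
# reduce cross-platform semantic disparities.
text = torch.randn(8, 20, 256)
image = torch.randn(8, 10, 256)
checkin = torch.randn(8, 5, 256)
user_repr, platform_logits = AMSASketch()([text, image, checkin])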
ISBN: (Digital) 9781665482509
ISBN: (Print) 9781665482509
Alzheimer's disease (AD) and Mild Cognitive Impairment (MCI) are neurodegenerative impairments with similar symptoms and risk factors. Sulcal width and depth are known biomarkers for discriminating between AD and MCI. This paper presents a novel 2D image representation for a brain mesh surface, called a height map. The basic idea behind the height map is to represent the surface as a function of the spherical coordinates of the mesh vertices. We present a method to derive a height map from a given neuroimage (MRI) and extract sulcal regions from the height map. We demonstrate the height map's utility for classifying a given neuroimage into healthy, MCI, and AD classes. Two approaches for extracting sulcal regions are explored. The proposed method is computationally light, and obtaining sulcal regions from a brain surface mesh takes about 24 seconds on a standard Intel i5-7200 CPU. The proposed method achieves 76.1% accuracy and a 76.3% F1-score for healthy/MCI/AD classification on a publicly available dataset.
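The following NumPy sketch illustrates the height-map idea under stated assumptions (the mesh is centered at its centroid, and each pixel stores the radial distance of the outermost vertex mapped to that (theta, phi) cell); the paper's exact construction may differ.

import numpy as np

def height_map(vertices: np.ndarray, h: int = 256, w: int = 512) -> np.ndarray:
    """vertices: (N, 3) mesh vertex coordinates; returns an (h, w) height image."""
    v = vertices - vertices.mean(axis=0)             # center the mesh at its centroid
    r = np.linalg.norm(v, axis=1)                    # radial distance = "height"
    theta = np.arccos(np.clip(v[:, 2] / np.maximum(r, 1e-9), -1, 1))  # polar angle in [0, pi]
    phi = np.arctan2(v[:, 1], v[:, 0]) + np.pi       # azimuth in [0, 2*pi)
    rows = np.clip((theta / np.pi * (h - 1)).astype(int), 0, h - 1)
    cols = np.clip((phi / (2 * np.pi) * (w - 1)).astype(int), 0, w - 1)
    img = np.zeros((h, w))
    np.maximum.at(img, (rows, cols), r)              # keep the outermost vertex per pixel
    return img

# Sulci would then appear as valleys (local minima) in this 2D image, which can be
# extracted with ordinary image processing before classification.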
The problem of generating textual descriptions for visual data has gained research attention in recent years. In contrast, the problem of generating visual data from textual descriptions is still very c...
ISBN: (Digital) 9781665496209
ISBN: (Print) 9781665496209
Image captioning is a challenging task that connects two major artificial intelligence fields: computer vision and natural language processing. Image captioning models use traditional images to generate a natural language description of the scene. However, the scene could contain private information that we want to hide while still generating the captions. Inspired by the trend of jointly designing optics and algorithms, this paper addresses the problem of privacy-preserving scene captioning. Our approach promotes privacy preservation by hiding faces in the images during the acquisition process with a designed refractive camera lens, while extracting useful features to perform image captioning. The refractive lens and an image captioning deep network architecture are optimized end-to-end to generate descriptions directly from the blurred images. Simulations show that our privacy-preserving approach degrades private visual attributes (e.g., face detection fails with our distorted images) while achieving captioning performance comparable to traditional non-private methods on the COCO dataset.
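As an illustration of the joint optics/algorithm idea, the sketch below stands in for the lens with a single learnable, normalized blur kernel applied before the captioning network, so the caption loss can shape the blur end-to-end; the kernel parameterization and shapes are assumptions, not the paper's actual refractive lens model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableLens(nn.Module):
    """Applies a trainable, energy-preserving blur kernel (a stand-in PSF) to RGB images."""
    def __init__(self, ksize: int = 15):
        super().__init__()
        self.psf = nn.Parameter(torch.rand(1, 1, ksize, ksize))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = F.softmax(self.psf.flatten(), dim=0).view_as(self.psf)  # normalize kernel energy to 1
        k = k.expand(3, 1, -1, -1)                                   # share the PSF across channels
        return F.conv2d(x, k, padding=self.psf.shape[-1] // 2, groups=3)

# During training, the captioning loss backpropagates through the lens, so the PSF
# can settle on a blur that destroys facial detail yet preserves caption-relevant structure.
lens = LearnableLens()
blurred = lens(torch.rand(2, 3, 224, 224))   # would be fed to the captioning network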
Given the requirements for robust target classification and accurate target state estimation in visual tracking, SiamFC++ proposes a set of practical guidelines for designing high-performance general-purpose trackers ...
ISBN: (Print) 9781665468916
The proceedings contain 475 papers. The topics discussed include: weight-based regularization for improving robustness in image classification; weakly supervised few-shot and zero-shot semantic segmentation with mean instance aware prompt learning; a retriever-reader framework with visual entity linking for knowledge-based visual question answering; 2S-DFN: dual-semantic decoding fusion networks for fine-grained image recognition; Action-GPT: leveraging large-scale language models for improved and generalized action generation; protecting intellectual property of EEG-based model with watermarking; making adversarial attack imperceptible in frequency domain: a watermark-based framework; content-adaptive adversarial embedding for image steganography using deep reinforcement learning; a robust generative image steganography method based on guidance features in image synthesis; adversarial audio watermarking: embedding watermark into deep feature; deniable diffusion generative steganography; and sea surface object detection based on background dynamic perception and cross-layer semantic interaction.
The goal of fine-grained image description generation techniques is to learn detailed information from images and simulate human-like descriptions that provide coherent and comprehensive textual details about the imag...
ISBN: (Print) 9789819916474; 9789819916481
Medical images have a vital role in the healthcare industry. The medical sector uses the internet to facilitate the distant sharing of medical information among hospitals and clinics and to provide patients with e-health services. A patient's report must be shared secretly so that intruders cannot steal the patient's data. The pixel value differencing (PVD) technique is utilised in this study to store a patient's medical information report in various medical images, such as ultrasound images, computed tomography scans, X-rays, magnetic resonance images, electrocardiographs, and microscopic images. The fundamental objective is to maintain the visual appearance of the medical images so that physicians can analyse them, give accurate results, and extract information reports precisely. This PVD scheme works on different image formats such as Portable Network Graphics (PNG), Joint Photographic Experts Group (JPG or JPEG), BitMaP (BMP), and Tag Image File Format (TIFF). Measurement metrics such as embedding capacity, the difference in histograms between the stego and cover images, and the peak signal-to-noise ratio (PSNR) are employed to evaluate the effectiveness of the suggested method. We have tested this new PVD approach on a series of medical images and found that it provides significant payload capacity with high visual quality of the stego image. The majority of PVD techniques described in the literature apply only to grayscale images, and those that apply to RGB images suffer from the falling-off-boundary problem: RGB pixel values span from 0 to 255, but when pixels are modified using the PVD technique, the values sometimes fall outside this range, which causes erroneous results during extraction. Additionally, using the difference between the histograms of the stego and cover images, an attacker can disclose the existence and length of the secret message in a typical PVD technique. This novel PVD methodology tackles these issues.
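For reference, the sketch below shows the classic Wu–Tsai style of pixel value differencing embedding for a single pixel pair; it deliberately omits the falling-off-boundary and histogram countermeasures that this paper contributes, and the quantization ranges are the commonly used defaults rather than the paper's own.

# One-pair PVD embedding: the difference between two adjacent pixels selects a range,
# the range width decides how many secret bits the pair can carry, and the pixels are
# adjusted so their new difference encodes those bits.
RANGES = [(0, 7), (8, 15), (16, 31), (32, 63), (64, 127), (128, 255)]

def embed_pair(p1: int, p2: int, bits: str):
    """Embed as many leading bits of `bits` as the pair's range allows."""
    d = abs(p2 - p1)
    lo, hi = next(r for r in RANGES if r[0] <= d <= r[1])
    t = (hi - lo + 1).bit_length() - 1            # bits this pair can hold
    b = int(bits[:t].ljust(t, "0"), 2)            # secret value to encode
    m = (lo + b) - d                              # required change in the difference
    # Split the change between the two pixels while preserving their order.
    if p1 >= p2:
        q1, q2 = p1 + (m + 1) // 2, p2 - m // 2
    else:
        q1, q2 = p1 - m // 2, p2 + (m + 1) // 2
    # Note: q1 or q2 may leave [0, 255] here; handling that is exactly the
    # falling-off-boundary issue addressed in the paper.
    return q1, q2, bits[t:]

stego1, stego2, remaining = embed_pair(120, 131, "1011001")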
ISBN: (Print) 9781728185514
This paper demonstrates a model-based reinforcement learning framework for training a self-flying drone. We implement the Dreamer proposed in a prior work as an environment model that responds to the action taken by the drone by predicting the next video frame as a new state signal. The Dreamer is a conditional video sequence generator. This model-based environment avoids the time-consuming interactions between the agent and the environment, greatly speeding up the training process. This demonstration showcases for the first time the application of the Dreamer to train an agent that can finish the racing task in the AirSim simulator.
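A hedged sketch of what training "in imagination" with a learned environment model looks like: the policy is rolled forward inside the model instead of the real simulator and is updated to maximize the predicted return. The latent dimensions, reward head, and network sizes are illustrative assumptions, not the Dreamer's actual architecture.

import torch
import torch.nn as nn

state_dim, action_dim = 32, 4

world_model = nn.Sequential(nn.Linear(state_dim + action_dim, 128), nn.ELU(),
                            nn.Linear(128, state_dim))       # predicts the next latent state
reward_head = nn.Linear(state_dim, 1)                         # predicts the reward of a state
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))

def imagined_return(start_state: torch.Tensor, horizon: int = 15) -> torch.Tensor:
    """Roll the policy forward inside the world model and sum predicted rewards."""
    s, total = start_state, 0.0
    for _ in range(horizon):
        a = torch.tanh(policy(s))                             # action from the current policy
        s = world_model(torch.cat([s, a], dim=-1))            # imagined next state (no simulator step)
        total = total + reward_head(s)
    return total.mean()

# The policy is updated by maximizing the imagined return, while the world model
# itself would be fit separately on real (frame, action, next frame) data.
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt.zero_grad()
loss = -imagined_return(torch.randn(8, state_dim))
loss.backward()
opt.step()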