To avoid accidental spraying of pedestrians and non-motorized vehicles by sprinkler trucks during operation, a method for identifying and ranging pedestrians and non-motorized vehicles based on monocular visi...
In this study, four deep learning-based image segmentation models were constructed to segment brain tumors from FLAIR and T1ce MRI modalities. The four models were 2D U-Net and three 2D U-Net-based variants which ...
ISBN: (print) 9781510664036
The proceedings contain 44 papers. The topics discussed include: application of deformable registration method of medical images based on unsupervised learning in radiotherapy; adaptive RetinexNet and fusion strategy for low-intensity image enhancement; multi-branch offset architecture for unaligned cross-view geo-localization; the effectiveness of image augmentation in pneumonia diagnosis using convolutional neural network; out of distribution detection for medical images; research on sign language gesture division and gesture extraction in complex background; a research on deep learning methods for 3D point cloud semantic segmentation; handwritten Chinese character text image correction method based on block similarity; a target distance estimation method through front-to-rear binocular vision inspired by head bobbing behavior of walking bird; and effectiveness of preprocessing strategies for work hours prediction based on machine learning model.
ResNet has emerged as a widely adopted backbone in the field of computer vision. This research introduces a novel approach to augment conventional Convolutional Neural Networks (CNNs) for image classification by incorpo...
This paper aims to address the real-time transmission of image information verification in industrial operations through the fusion of multi-source information. It combines channel and image feature extraction, employ...
In this paper, a transform-domain information hiding algorithm with visual security based on compressed sensing (CS) is proposed. To increase security, an improved coupled map lattice is used for keystream and me...
ISBN: (print) 9783031442223; 9783031442230
The visual dialog task requires a deep understanding of an image and a dialog history to answer multiple consecutive questions. Existing research focuses on enhancing cross-modal interaction and fusion but often overlooks the computational complexity and higher-level interaction between the two modalities. This paper proposes a hierarchical vision and language Transformer (HVLT) to address these issues. Specifically, HVLT employs a convolution-like design to learn the interaction and fusion of images and text at different levels. We employ a token merging module to aggregate four spatially adjacent image tokens and four temporally adjacent text tokens into one token and use the expanded [CLS] token to fuse image and text information in a new dimension. This hierarchical architecture allows the model to focus on feature maps of different sizes and dialog history at word, phrase, and sentence levels and reduces the time overhead. We tailor two training objectives for HVLT: masked language regression (MLR) and next sentence prediction (NSP), which help the model understand images and language and learn their relationships. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate the competitive performance of HVLT. Finally, we visualize the attention to gain insights into how HVLT works in practice, shedding light on its interpretability.
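The token merging step described above can be illustrated with a minimal sketch. The abstract does not say how the four tokens are combined, so concatenation is an assumption here, and the function names `merge_image_tokens` and `merge_text_tokens` are hypothetical. A learned projection would typically follow the merge and is omitted.

```python
# Illustrative sketch only: merging by concatenation is an assumption,
# not the method confirmed by the HVLT abstract.

def merge_image_tokens(grid):
    """Merge each 2x2 block of an H x W token grid into one token.

    `grid` is a list of rows; each token is a list of floats.
    The four spatially adjacent tokens are concatenated.
    """
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("grid height and width must be even")
    merged = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            row.append(
                grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]
            )
        merged.append(row)
    return merged


def merge_text_tokens(tokens, group=4):
    """Merge every `group` consecutive text tokens into one by concatenation."""
    if len(tokens) % group:
        raise ValueError("sequence length must be a multiple of the group size")
    return [sum(tokens[k:k + group], []) for k in range(0, len(tokens), group)]


# Usage: a 4x4 grid of 2-dimensional tokens becomes a 2x2 grid of 8-dimensional tokens.
image_grid = [[[float(i), float(j)] for j in range(4)] for i in range(4)]
merged_grid = merge_image_tokens(image_grid)
merged_text = merge_text_tokens([[float(k)] for k in range(8)])
```

Each merge reduces the token count by a factor of four, which is consistent with the reduced time overhead the abstract reports for the hierarchical design.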
Non-uniformities in pixels are prevalent in existing image sensors. In low-illumination environments, pixel non-uniformities result in undesirable fixed-pattern noise (FPN), which severely limits the imaging ca...
The encryption of images is an essential component of ensuring data security in the digital age. Delving into chaotic mappings, our study unveils their robust potential for image encryption. In this paper, we propose a novel scheme for encryption-decryption by merging three chaotic mappings - Logistic, Tent, and Intermittent - into a single scheme. Our rigorous empirical evaluations, spanning histogram uniformity, entropy metrics, and key sensitivity, establish the unparalleled efficacy of our amalgamated approach.
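As a rough illustration of how chaotic maps can drive an encryption-decryption scheme, the sketch below builds a keystream from the Logistic and Tent maps and XORs it with the image bytes. This is not the authors' scheme: the Intermittent map is omitted because its exact form is not given in the abstract, and the parameter values, the XOR combination, and all function names are assumptions chosen for the example.

```python
# Illustrative sketch only: parameters, the XOR combination, and function
# names are assumptions. Not suitable for real-world security use.

def logistic_map(x, r=3.99):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def tent_map(x, mu=1.99):
    """One step of the tent map with slope mu."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def keystream(n, x0=0.3141, y0=0.2718, burn_in=100):
    """Generate n keystream bytes; (x0, y0) act as the secret key."""
    x, y = x0, y0
    for _ in range(burn_in):  # discard transient iterations
        x, y = logistic_map(x), tent_map(y)
    out = bytearray()
    for _ in range(n):
        x, y = logistic_map(x), tent_map(y)
        out.append((int(x * 256) ^ int(y * 256)) & 0xFF)
    return bytes(out)


def xor_cipher(data, x0=0.3141, y0=0.2718):
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    ks = keystream(len(data), x0, y0)
    return bytes(d ^ k for d, k in zip(data, ks))


# Usage: a flat byte buffer stands in for the pixel values of an image.
pixels = bytes(range(64))
cipher = xor_cipher(pixels)
recovered = xor_cipher(cipher)
```

Applying the same function twice with the same initial values recovers the original bytes. Evaluations such as histogram uniformity, entropy, and key sensitivity would be run on the cipher bytes.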
White blood cell classification is of paramount importance in pathology for accurately diagnosing a wide range of ailments and diseases. Blood cell classification is essential for th...