This paper aims to address the real-time transmission of image information verification in industrial operations through the fusion of multi-source information. It combines channel and image feature extraction, employ...
The attention mechanism is an essential component in convolutional neural networks. Although attention modules like SE and CBAM have achieved tremendous success in fields such as image classification and object detection,...
ISBN: 9781510664036 (print)
The proceedings contain 44 papers. The topics discussed include: application of deformable registration method of medical images based on unsupervised learning in radiotherapy; adaptive RetinexNet and fusion strategy for low-intensity image enhancement; multi-branch offset architecture for unaligned cross-view geo localization; the effectiveness of image augmentation in pneumonia diagnosis using convolutional neural network; out of distribution detection for medical images; research on sign language gesture division and gesture extraction in complex background; a research on deep learning methods for 3D point cloud semantic segmentation; handwritten Chinese character text image correction method based on block similarity; a target distance estimation method through front-to-rear binocular vision inspired by head bobbing behavior of walking bird; and effectiveness of preprocessing strategies for work hours prediction based on machine learning model.
In this paper, a transform-domain information hiding algorithm with visual security based on compressed sensing (CS) is proposed. To increase the security, an improved coupled map lattice is used for keystream and me...
ISBN: 9798350342918 (print)
Distributed deep neural network (DNN) training is important to support artificial intelligence (AI) applications, such as image classification, natural language processing, and autonomous driving. Unfortunately, the distributed property makes DNN training vulnerable to system failures. Checkpointing is generally used to support failure tolerance, but it suffers from high runtime overheads. To enable high-performance and low-latency checkpointing, we propose a lightweight checkpointing system for distributed DNN training, called LightCheck. To reduce the checkpointing overheads, we leverage fine-grained asynchronous checkpointing by pipelining checkpointing in a layer-wise way. To further decrease the checkpointing latency, we leverage the software-hardware codesign methodology by coalescing new hardware devices into our checkpointing system via a persistent memory (PM) manager. Experimental results on six representative real-world DNN models demonstrate that LightCheck offers more than 10x higher checkpointing frequency with lower runtime overheads than state-of-the-art checkpointing schemes. We have released the open-source code for public use at https://***/LighT-chenml/ ***.
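The idea of pipelining checkpoints layer by layer can be illustrated with a minimal Python sketch. This is not LightCheck's implementation: the function name `train_with_layerwise_checkpoints`, the `update_fn` callback, and the in-memory `store` dictionary (standing in for a persistent memory manager) are all hypothetical, chosen only to show how a background writer can persist one layer's snapshot while training moves on to the next layer.

```python
import copy
import queue
import threading


def train_with_layerwise_checkpoints(layers, num_steps, update_fn, store):
    """Update each layer in turn and checkpoint it asynchronously.

    layers:    dict mapping layer name -> list of floats (toy weights)
    update_fn: hypothetical stand-in for one optimizer update of a layer
    store:     dict standing in for persistent storage (an assumption)
    """
    pending = queue.Queue()

    def writer():
        # Background thread: persists snapshots while training continues.
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more snapshots will arrive
                break
            step, name, snapshot = item
            store.setdefault(step, {})[name] = snapshot

    thread = threading.Thread(target=writer)
    thread.start()

    for step in range(num_steps):
        for name in layers:
            layers[name] = update_fn(layers[name])
            # Snapshot only this layer and hand it off; the training loop
            # does not wait for the write to finish.
            pending.put((step, name, copy.deepcopy(layers[name])))

    pending.put(None)
    thread.join()  # wait until every queued snapshot has been written
    return store
```

The queue decouples the training loop from the writer, so the cost paid on the critical path is one per-layer copy rather than a full-model save. A real system would also have to bound the queue and guarantee that a checkpoint is only considered valid once all of its layers have been persisted.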
To avoid accidental spraying of pedestrians and non-motorized vehicles by sprinkler trucks during operation, a method for identifying and ranging pedestrians and non-motorized vehicles based on monocular visi...
ResNet has emerged as a widely adopted backbone in the field of computer vision. This research introduces a novel approach to augment conventional Convolutional Neural Networks (CNNs) for image classification by incorpo...
The encryption of images is an essential component of ensuring data security in the digital age. Delving into chaotic mappings, our study unveils their robust potential for image encryption. In this paper, we propose a novel scheme for encryption-decryption by merging three chaotic mappings - Logistic, Tent, and Intermittent - into a single scheme. Our rigorous empirical evaluations, spanning histogram uniformity, entropy metrics, and key sensitivity, establish the unparalleled efficacy of our amalgamated approach.
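As a rough illustration of how chaotic maps can drive image encryption, the sketch below derives a keystream from a Logistic map and a Tent map and XORs it with the pixel bytes. It is not the scheme proposed in the paper: the Intermittent map is omitted because the abstract does not give its form, and the parameters `r`, `mu`, `burn_in`, the byte quantization, and all function names are assumptions made for this example.

```python
def logistic_map(x, r=3.99):
    # Logistic map x -> r * x * (1 - x); r = 3.99 is an assumed parameter.
    return r * x * (1.0 - x)


def tent_map(x, mu=1.99):
    # Tent map; mu slightly below 2 avoids collapse to 0 in floating point.
    return mu * x if x < 0.5 else mu * (1.0 - x)


def keystream(n, x0, y0, burn_in=100):
    """Generate n key bytes from the secret initial values x0, y0 in (0, 1)."""
    x, y = x0, y0
    for _ in range(burn_in):  # discard the transient iterations
        x, y = logistic_map(x), tent_map(y)
    out = bytearray()
    for _ in range(n):
        x, y = logistic_map(x), tent_map(y)
        # Quantize both orbits to a byte and mix them with XOR.
        out.append((int(x * 256) ^ int(y * 256)) & 0xFF)
    return bytes(out)


def xor_cipher(pixels, x0, y0):
    """Encrypt or decrypt a bytes object of pixel values (XOR is symmetric)."""
    ks = keystream(len(pixels), x0, y0)
    return bytes(p ^ k for p, k in zip(pixels, ks))
```

Because XOR is its own inverse, calling `xor_cipher` twice with the same `(x0, y0)` recovers the original pixels, while a tiny change in either initial value yields a different keystream, which is the key-sensitivity property the abstract evaluates.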
Non-uniformities in pixels are prevalent in existing image sensors. In low illumination environments, the pixel non-uniformities result in the undesirable fixed-pattern noise (FPN), which severely limits the imaging ca...
ISBN: 9783031442223; 9783031442230 (print)
The visual dialog task requires a deep understanding of an image and a dialog history to answer multiple consecutive questions. Existing research focuses on enhancing cross-modal interaction and fusion but often overlooks the computational complexity and higher-level interaction between the two modalities. This paper proposes a hierarchical vision and language Transformer (HVLT) to address these issues. Specifically, HVLT employs a convolution-like design to learn the interaction and fusion of images and text at different levels. We employ a token merging module to aggregate four spatially adjacent image tokens and four temporally adjacent text tokens into one token and use the expanded [CLS] token to fuse image and text information in a new dimension. This hierarchical architecture allows the model to focus on feature maps of different sizes and dialog history at word, phrase, and sentence levels and reduces the time overhead. We tailor two training objectives for HVLT: masked language regression (MLR) and next sentence prediction (NSP), which help the model understand images and language and learn their relationships. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate the competitive performance of HVLT. Finally, we visualize the attention to gain insights into how HVLT works in practice, shedding light on its interpretability.
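The token merging step can be pictured with a small pure-Python sketch. The abstract does not say how the four tokens are aggregated, so mean pooling is used here as an assumption, and "four spatially adjacent image tokens" is read as a 2x2 block; the function names are hypothetical and this is not the HVLT implementation.

```python
def merge_text_tokens(tokens, group=4):
    """Average each run of `group` consecutive token vectors into one.

    Trailing tokens that do not fill a complete group are dropped.
    """
    merged = []
    for i in range(0, len(tokens) - group + 1, group):
        chunk = tokens[i:i + group]
        merged.append([sum(col) / group for col in zip(*chunk)])
    return merged


def merge_image_tokens(grid):
    """Average each 2x2 block of an H x W grid of token vectors.

    A trailing odd row or column is dropped.
    """
    h, w = len(grid), len(grid[0])
    merged = []
    for r in range(0, h - 1, 2):
        row = []
        for c in range(0, w - 1, 2):
            block = [grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1]]
            row.append([sum(col) / 4 for col in zip(*block)])
        merged.append(row)
    return merged
```

Each merge divides the token count by four, so stacking such stages gives progressively coarser feature maps and text units (word, phrase, sentence) and shortens the sequences that later attention layers must process.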