ISBN: 9789819916443; 9789819916450 (print)
Existing attention-based image captioning approaches treat the local and global features of an image separately, neglecting the intrinsic interaction between them, which provides important guidance for caption generation. To alleviate this issue, we propose a novel Local-Global visual Interaction Network (LGVIN) that explores the interactions between local and global features. Specifically, we devise a new visual interaction graph network that mainly consists of a visual interaction encoding module and a visual interaction fusion module. The former implicitly encodes the visual relationships between local and global features to obtain an enhanced visual representation containing rich local-global feature relationships. The latter fuses the previously obtained multiple relationship features to further enrich relationship attribute information at different levels. In addition, we introduce a new relationship-attention-based LSTM module that guides word generation by dynamically focusing on the previously output fused relationship information. Extensive experimental results demonstrate the superiority of our LGVIN approach, and our model clearly outperforms current comparable relationship-based image captioning methods.
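The idea of a decoder that focuses dynamically on relationship features can be illustrated with a minimal attention sketch. This is not the LGVIN implementation: the scaled dot-product scoring, the function names `softmax` and `attend`, and the toy vectors are all assumptions chosen for illustration. The abstract does not specify how the attention scores are computed.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """Return (weights, context) for scaled dot-product attention.

    query    -- hypothetical decoder hidden state, a list of floats
    features -- hypothetical fused relationship features, one list per item
    """
    dim = len(query)
    # Score each feature by its similarity to the current decoder state.
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
              for feat in features]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the features.
    context = [sum(w * feat[i] for w, feat in zip(weights, features))
               for i in range(dim)]
    return weights, context

# Toy example: three relationship features of dimension 2.
weights, context = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
```

In this toy case the first feature is most aligned with the query, so it receives the largest weight. At each decoding step a new query would yield a new weighting.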
ISBN: 9781713899921 (print)
Text-conditioned image editing has emerged as a powerful tool for editing images. However, in many situations, language can be ambiguous and ineffective in describing specific image edits. When faced with such challenges, visual prompts can be a more informative and intuitive way to convey the desired edit. We present a method for image editing via visual prompting. Given example pairs that represent the "before" and "after" images of an edit, our approach learns a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that even with just one example pair, we can achieve competitive results compared to state-of-the-art text-conditioned image editing frameworks.
ISBN: 9798350310801 (print)
Target tracking is an important part of surveillance video analysis. Based on image processing technology, this paper studies a real-time and effective method to collect and recognize camera motion information. First, the influence of visual blind spots and illumination on recognition is analyzed. Second, according to the characteristics of the background light intensity, a corresponding algorithm is designed to implement a positioning and tracking control strategy for the target and the surrounding scenery. Finally, the correctness of the method is verified by simulation in MATLAB, yielding a better and scalable scheme that is more economical and feasible once the occlusion rate is minimized.
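The abstract does not describe the algorithm itself, so the following is only a generic sketch of intensity-based target localization. It uses simple background subtraction followed by a centroid computation. The function name `locate_target`, the threshold value, and the toy frames are assumptions. The paper reports MATLAB simulation, and Python is used here purely for illustration.

```python
def locate_target(frame, background, threshold=30):
    """Return the (row, col) centroid of pixels that differ from the background.

    frame, background -- grayscale images as 2D lists of equal size
    threshold         -- assumed minimum intensity difference for foreground
    """
    hits = [(r, c)
            for r, row in enumerate(frame)
            for c, value in enumerate(row)
            if abs(value - background[r][c]) > threshold]
    if not hits:
        return None  # no pixel differs enough from the background
    row_mean = sum(r for r, _ in hits) / len(hits)
    col_mean = sum(c for _, c in hits) / len(hits)
    return (row_mean, col_mean)

# Toy scene: a uniform dark background with a bright 2x2 target.
background = [[10] * 5 for _ in range(5)]
frame = [row[:] for row in background]
frame[1][2] = frame[1][3] = frame[2][2] = frame[2][3] = 200
position = locate_target(frame, background)
```

Tracking would then amount to repeating this localization frame by frame. A practical system would also need to update the background model as illumination changes, which this sketch omits.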
In this paper, we propose an innovative segmentation refinement method that improves the visual quality of the Video-based Point Cloud Compression (V-PCC) encoder. Recently standardized as an international standa...
ISBN: 9798350388978; 9798350388961 (print)
Blood cells play an essential role in various bodily functions, such as protecting against infections and supporting the body's defenses. The accurate classification of blood cells, generally grouped into red blood cells, white blood cells, and platelets, is important for clinical diagnosis and hematological analysis. However, identifying these cells is a specialized and time-consuming process. Therefore, there is active interest in high-precision automatic blood cell classification methods. Convolutional neural networks (CNNs) are deep learning models used for visual data analysis and are very powerful at extracting features from data. In this study, we propose a hybrid classification model that combines the feature extraction power of CNNs with the ensemble-based prediction capabilities of the Random Forest and XGBoost algorithms. The proposed hybrid model is compared with different methods on the BloodMNIST dataset in terms of classification performance and inference time. The results show that the tree-based methods outperform the CNN by up to 8.49 and 11.62 points and achieve up to 82.9 times faster inference than other methods.
ISBN: 9781665475921 (print)
Considering the retinal structure of the human eye and the composition characteristics of screen content images (SCIs), this paper proposes a multi-pathway convolutional neural network (CNN) with picture-text competition for SCI quality assessment. Following the visual mechanism of the human retina, we design a retinal structure simulation module, which uses multiple parallel convolution pathways to simulate the parallel transmission of visual signals by bipolar cells and uses a multi-pathway feature fusion (MPFF) module to allocate a weight to each channel, simulating horizontal cells' regulation of information transmission. In addition, we design an adaptive feature extraction and competition (AFEC) module to directly extract the features of textual and pictorial regions and distribute the weights. Furthermore, an attention module that combines deformable convolution and channel attention accurately extracts image edge features and reduces information redundancy. Experimental results show that the proposed method is superior to mainstream methods.
This study tackles the difficult issues of image captioning while negotiating the complexity of visual data processing. The complexity of visual data and the associated processing requirements make image captioning a ...
Computer vision technology plays a very important role in the development of artificial intelligence, but its key technologies often perform poorly in application. Traditional neural network algorithm...
ISBN: 9781665475921 (print)
With the rapid development of 3D technologies, effective no-reference stereoscopic image quality assessment (NR-SIQA) methods are in great demand. In this paper, we propose a parallel multi-scale feature extraction convolutional neural network (CNN) model combined with a novel binocular feature interaction consistent with the human visual system (HVS). To simulate the HVS's ability to sense multi-scale information simultaneously, a parallel multi-scale feature extraction module (PMSFM) followed by compensation information is proposed. A modified convolutional block attention module (MCBAM) with lower computational complexity is designed to generate visual attention maps for the multi-scale features extracted by the PMSFM. In addition, we employ a cross-stacked strategy for multi-level binocular fusion maps and binocular disparity maps to simulate the hierarchical perception characteristics of the HVS. Experimental results show that our method is superior to state-of-the-art metrics and achieves excellent performance.
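To make "parallel multi-scale feature extraction" concrete, the sketch below runs several branches over the same input, each summarizing it at a different spatial scale. It deliberately substitutes plain average pooling for the learned convolution branches of the PMSFM, whose details the abstract does not give. The names `average_pool` and `multi_scale_features` and the scale set `(1, 2, 4)` are assumptions.

```python
def average_pool(image, size):
    """Non-overlapping size x size average pooling on a 2D list."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            block = [image[r + dr][c + dc]
                     for dr in range(size) for dc in range(size)]
            pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled

def multi_scale_features(image, sizes=(1, 2, 4)):
    """Apply every branch to the same input and collect the results by scale."""
    return {size: average_pool(image, size) for size in sizes}

# Toy 4x4 image with pixel values 0..15.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = multi_scale_features(image)
```

Each branch sees the same image, so fine detail (scale 1) and coarse structure (scale 4) are available at the same time. That simultaneity is the property the abstract attributes to the HVS.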
ISBN: 9781665475921 (print)
Single image desnowing is an important and challenging task for many computer vision applications, such as visual tracking and video surveillance. Although existing deep learning-based methods have achieved promising results, most of them rely on local deep features and neglect global relationship information between local regions, which inevitably leads to over-smoothed results or loss of detail. To solve this issue, we design a UNet-based end-to-end architecture for image desnowing. Specifically, to better characterize global information and preserve image detail, we combine a Window-based Self-Attention (WSA) transformer block with Residue Spatial Attention (RSA) to build the basic unit of our network. In addition, to protect the structure of the image effectively, we introduce a Residue Channel (RC) loss to guide high-quality image restoration. Extensive experimental results on both synthetic and real-world datasets demonstrate that the proposed model achieves new state-of-the-art results.
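Window-based self-attention generally begins by splitting a feature map into non-overlapping windows, so that attention is computed among the tokens of each window rather than across the whole image. The sketch below shows only that partitioning step, under the assumption of square windows that evenly divide the map. The function name `window_partition` and the toy grid are illustrative and not taken from the paper.

```python
def window_partition(feature_map, window):
    """Split a 2D feature map into non-overlapping window x window token groups."""
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % window or cols % window:
        raise ValueError("feature map size must be divisible by the window size")
    windows = []
    for r in range(0, rows, window):
        for c in range(0, cols, window):
            # Flatten one window into a token list in row-major order.
            tokens = [feature_map[r + dr][c + dc]
                      for dr in range(window) for dc in range(window)]
            windows.append(tokens)
    return windows

# Toy 4x4 feature map split into four 2x2 windows.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = window_partition(grid, 2)
```

Restricting attention to each token group keeps the cost proportional to the number of windows instead of growing quadratically with the full image size. This is the usual motivation for the windowed design.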