ISBN: 9784885523434 (print)
This paper focuses on enhancing the captions generated by image captioning systems. We propose an approach for improving caption generation systems by choosing the output most closely related to the image rather than the most likely output produced by the model. Our model revises the beam search output of the language generator from a visual-context perspective. We employ a visual semantic measure at both the word and sentence level to match the proper caption to the related information in the image. This approach can be applied to any caption system as a post-processing method.
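The post-processing idea in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each beam candidate and the image's visual context have already been embedded as vectors by some external model (the vectors below are made-up toy values), and it covers only a sentence-level cosine similarity, not the word-level measure the abstract also mentions. The names `cosine` and `rerank` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(candidates, visual_vec):
    """Sort beam candidates by similarity to the visual context.

    candidates: list of (caption, log_prob, caption_vec) tuples.
    The model's log_prob is kept for reference but not used for ordering.
    """
    return sorted(candidates, key=lambda c: cosine(c[2], visual_vec), reverse=True)


# Toy beam: the first caption is the model's most likely output.
beam = [
    ("a man riding a wave", -1.2, [0.1, 0.9, 0.0]),
    ("a man on a surfboard in the ocean", -1.5, [0.8, 0.5, 0.1]),
]
visual_context = [0.9, 0.4, 0.1]  # assumed embedding of objects detected in the image

most_likely = max(beam, key=lambda c: c[1])[0]
best_match = rerank(beam, visual_context)[0][0]
print(most_likely)
print(best_match)
```

In this toy case the caption with the lower model probability is selected because its embedding lies closer to the visual context, which is the behaviour the abstract describes.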
ISBN: 9798350307184 (print)
There are two main approaches to object detection: CNN-based and Transformer-based. The former views object detection as a dense local matching problem, while the latter sees it as a sparse global retrieval problem. Research in neuroscience has shown that the recognition decision in the brain is based on two processes, namely familiarity and recollection. Based on this biological support, we propose an efficient and effective dual-processing object detection framework. It integrates CNN- and Transformer-based detectors into a comprehensive object detection system consisting of a shared backbone, an efficient dual-stream encoder, and a dynamic dual-decoder. To better integrate local and global features, we design a search space for the CNN-Transformer dual-stream encoder to find the optimal fusion solution. To enable better coordination between the CNN- and Transformer-based decoders, we provide the dual-decoder with a selective mask. This mask dynamically chooses the more advantageous decoder for each position in the image based on high-level representation. As demonstrated by extensive experiments, our approach shows flexibility and effectiveness in improving the mAP of the various source detectors by 3.0 to 3.7 without increasing FLOPs.
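The selective mask described in this abstract can be pictured with a small sketch. The abstract does not say how the mask is computed or applied, so everything here is an assumption for illustration: a per-position gate logit (hypothetically predicted from high-level features) is thresholded at zero, and each position keeps the output of whichever decoder the gate selects. The function name `select_decoder` is hypothetical.

```python
def select_decoder(gate_logits, cnn_out, transformer_out):
    """Route each spatial position to one of two decoders.

    A positive gate logit keeps the CNN decoder's output at that position;
    otherwise the Transformer decoder's output is kept.
    All three inputs are 2D lists of the same shape.
    """
    mask = [[logit > 0 for logit in row] for row in gate_logits]
    fused = [
        [c if m else t for m, c, t in zip(mask_row, cnn_row, trans_row)]
        for mask_row, cnn_row, trans_row in zip(mask, cnn_out, transformer_out)
    ]
    return mask, fused


# Toy 2x2 feature map with one scalar per position.
gate_logits = [[1.5, -0.3], [-2.0, 0.7]]
cnn_out = [[10, 11], [12, 13]]
transformer_out = [[20, 21], [22, 23]]

mask, fused = select_decoder(gate_logits, cnn_out, transformer_out)
print(mask)
print(fused)
```

Because only one decoder's output is kept per position, a hard mask of this kind is compatible with the abstract's claim of not increasing FLOPs, provided the unselected decoder is actually skipped at that position; this sketch does not model that saving.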
ISBN: 9798350307184 (print)
Deep networks have achieved great success in the image rescaling (IR) task, which seeks to learn the optimal downscaled representations, i.e., low-resolution (LR) images, to reconstruct the original high-resolution (HR) images. Compared with super-resolution methods that consider a fixed downscaling scheme, e.g., bicubic, IR often achieves significantly better reconstruction performance thanks to the learned downscaled representations. This highlights the importance of a good downscaled representation. Existing IR methods mainly learn the downscaled representation by jointly optimizing the downscaling and upscaling models. Unlike them, we seek to improve the downscaled representation in a different and more direct way: optimizing the downscaled image itself instead of the down-/upscaling models. Consequently, we propose a Hierarchical Collaborative Downscaling (HCD) method that performs gradient descent w.r.t. the reconstruction loss in both HR and LR domains to improve the downscaled representations, so as to boost IR performance. Extensive experiments show that our HCD significantly improves the reconstruction performance both quantitatively and qualitatively. Particularly, we improve over popular IR methods by >0.57 dB PSNR on Set5. Moreover, we also highlight the flexibility of our HCD since it can generalize well across diverse image rescaling models. The code is available at https://***/xubingna/HCD.
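The core idea of optimizing the downscaled image itself, rather than the model weights, can be shown on a toy 1D signal. This is a simplified sketch and not the HCD method: it assumes a fixed nearest-neighbour x2 upscaler in place of a learned one, uses only the HR-domain squared error, and omits the hierarchical LR-domain step. The names `upscale`, `recon_loss`, and `refine` are hypothetical.

```python
def upscale(lr):
    """Nearest-neighbour x2 upscaling: repeat each LR sample twice."""
    return [v for v in lr for _ in range(2)]


def recon_loss(lr, hr):
    """Sum of squared errors between the upscaled LR signal and the HR signal."""
    return sum((u - h) ** 2 for u, h in zip(upscale(lr), hr))


def refine(lr, hr, step=0.1, iters=200):
    """Gradient descent on the LR samples themselves; the upscaler stays fixed."""
    lr = list(lr)
    for _ in range(iters):
        residual = [u - h for u, h in zip(upscale(lr), hr)]
        # Each LR sample produces two HR samples, so its gradient sums both residuals.
        grad = [2 * (residual[2 * i] + residual[2 * i + 1]) for i in range(len(lr))]
        lr = [v - step * g for v, g in zip(lr, grad)]
    return lr


hr = [1.0, 3.0, 2.0, 6.0]
lr_init = hr[::2]              # fixed downscaling: keep every other sample
lr_opt = refine(lr_init, hr)   # directly optimized downscaled signal

print(recon_loss(lr_init, hr), recon_loss(lr_opt, hr))
```

For this particular upscaler the optimum is simply the mean of each pair of HR samples, so the refined LR signal reconstructs the HR signal with lower error than the fixed subsampling it started from.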
The incorporation of distributed deep learning for medical image processing in cloud settings is the subject of this study. The findings demonstrate the high viability and significant performance advantages realized b...
ISBN: 9798350325416 (print)
The proceedings contain 45 papers. The topics discussed include: skin cancer classification using levy stable based ensemble and its real-time implementation on OpenVINO toolkit; effects of region of interest location on osteoarthritis detection using deep feature learning; invariant convolutional networks; an order and difference local binary pattern for hyperspectral texture classification; leveraging transfer learning for analyzing cattle front teat placement; animated lightning bolt generation using machine learning; and CNN ensemble robust to rotation using radon transform.
In computer 3D graphics analysis technology, we can use image conversion and processing to intuitively convert text into 3D images. This paper introduces computer 3D graphics technology, focuses on the basic principle...
Most recent methods of deep image enhancement can be generally classified into two types: decompose-and-enhance and illumination estimation-centric. The former is usually less efficient, and the latter is constrained ...
In recent years, quantization technology has proven to be very effective in the field of supervised image retrieval, owing to its capacity to provide both high accuracy and swift retrieval speeds. However, the challen...
We discussed the proposed methodology for fire detection, which includes six stages: data acquisition, IoT transmitter, compression, recognition, enhancement, and analysis and evaluation. The methodology was implement...
Sugarcane is a significant crop with several uses in the food, bio-energy, and bio-based product sectors. Many elements, including climate, soil fertility, and plant diseases, can have an impact on the quality of suga...