Garbage classification is a key link in garbage recycling. Based on machine vision and the lightweight convolutional neural network MobileNet V2, a garbage collection system for municipal solid waste classification and ...
In the context of construction engineering, classifying decoration work and calculating its quantity has become a major problem. The structure of engineering drawings is complex and the drawing standards are no...
Visual Question Answering (VQA) systems, while advancing in intelligence, still face challenges in handling complex queries. Understanding the behavior of VQA models is crucial, especially in assessing their reli...
Image captioning is the task of generating a textual description that accurately represents the content of an image. This task involves combining computer vision techniques, such as object recognition and scene understanding, with natural language processing to produce a human-like description of an image. Over time, various models have been introduced to perform image captioning, all aiming to accurately describe the content of an image. These models have practical applications such as improving the accessibility of multimedia content, assisting individuals with visual impairments, medical image captioning, and enhancing image search and retrieval. This paper explores some of these models and studies their efficiency using different evaluation metrics.
Remote sensing image captioning has been widely applied to traffic management, geographic research, etc. Although neural network approaches have steadily improved the performance of remote sensing image captioning systems, the task still faces object identification challenges due to the small size of objects, their uneven distribution, and their high coupling with the surrounding image background. In this paper, we propose a novel encoder-decoder model for remote sensing image captioning, Hierarchical rearrangement-Multi-Layer Perceptron (HMLP), whose encoder adopts a hierarchical rearrangement multi-layer perceptron to improve object recognition capability. Extensive experiments were conducted to evaluate HMLP on three datasets: RSICD, UCM-caption, and NWPU-caption. Results show that HMLP outperforms many image captioning systems on the evaluation metrics BLEU4, METEOR, ROUGE-L, and CIDEr.
We introduce a multi-stage framework that uses local geometry changes on a hand surface and focuses on learning the interaction between a primary and an assistive hand/object for hand action recognition in videos from an egoc...
ISBN: (print) 9781450396103
The proceedings contain 16 papers. The topics discussed include: dangerous behavior recognition based on pose estimation and action analysis; simultaneous recognition algorithm of human activity and phone position based on multi-sensor data fusion; bilateral pose transformer for human pose estimation; GBSAR moving target detection capability evaluation and refocus based detection algorithm; a method of dual-spectrum feature fusion for face recognition under non-ideal lighting conditions; suppress target height induced false alarm in CSAR moving target detection; threshold selection on circular histogram using Renyi entropy; hierarchical vision transformer with channel attention for RGB-D image segmentation; uni-dimensional autoencoder reinforced multilayer perceptron network for individual behavior detection; and a lightweight video summarization method considering the subjective transition degree for online educational screen content videos.
The recent surge in criminal activity in public spaces underscores the need for an efficient system to promptly detect, recognize, and track criminals. Existing AI-based criminal detection literature, while insightful, ...
Diabetic retinopathy is a cause of avoidable vision loss that frequently affects working people around the world. Blood vessel segmentation aids in the early identification of retinal fundus pictures, which is crit...
In this digital era, a wide variety of image editing software is available that can be used to make malicious alterations to images. Hence, the evaluation of the authenticity of image contents and identification of malicious m...