Food categorization is pivotal in numerous aspects of everyday life, assisting in the selection of food, managing diets, and addressing essential survival requirements. By leveraging the complementary information of v...
Multi-view multi-person 3D human pose estimation is a hot topic in the field of human pose estimation due to its wide range of applications. With the introduction of end-to-end direct regression methods, the field has entered a new stage of development. However, the regression results for joints that are more heavily influenced by external factors are not accurate enough, even for the best-performing methods. In this paper, we propose an effective feature recalibration module based on the channel attention mechanism, together with a relative optimal calibration strategy, and apply it to the multi-view multi-person 3D human pose estimation task to improve detection accuracy for joints that are more severely affected by external factors. Specifically, the recalibration module and strategy achieve a relative optimal weight adjustment of joint feature information, which enables the model to learn the dependencies between joints and the dependencies between people and their corresponding joints. We call this method the Efficient Recalibration Network (ER-Net). Finally, experiments were conducted on two benchmark datasets for this task, Campus and Shelf, on which the PCP reached 97.3% and 98.3%, respectively.
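The abstract above does not describe the internals of the recalibration module, so the following is only a minimal sketch of generic squeeze-and-excitation style channel attention, not the ER-Net implementation. The function name `channel_attention`, the weight matrices `w1` and `w2`, the bottleneck size, and the random weights (standing in for learned parameters) are all assumptions made for illustration.

```python
import math
import random


def channel_attention(features, w1, w2):
    """Recalibrate a feature map given as C channels of N values each.

    Generic squeeze-and-excitation style sketch (an assumption, not the
    ER-Net module): w1 has shape [R][C], w2 has shape [C][R].
    """
    # Squeeze: summarize each channel by its mean activation.
    squeezed = [sum(channel) / len(channel) for channel in features]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: linear layer followed by a sigmoid, giving one
    # weight in (0, 1) per channel.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's weight.
    recalibrated = [[wt * v for v in channel] for wt, channel in zip(weights, features)]
    return recalibrated, weights


random.seed(0)
C, N, R = 4, 6, 2  # channels, values per channel, bottleneck width (hypothetical)
features = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(R)]  # untrained stand-in
w2 = [[random.uniform(-1, 1) for _ in range(R)] for _ in range(C)]  # untrained stand-in
recalibrated, weights = channel_attention(features, w1, w2)
print(weights)
```

In a trained network the two weight matrices would be learned, so that channels carrying less reliable joint information could receive smaller weights; here they are random and only show the data flow.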
Natural language processing (NLP) is rapidly developing. A series of Large Language Models (LLMs) have emerged, represented by ChatGPT, which have made significant breakthroughs in natural language understanding and g...
ISBN (digital): 9798350385557
ISBN (print): 9798350385564
The Variational Autoencoder (VAE), one of the main generative models, has a powerful representation learning capability. However, the hidden space representation learned by a VAE is a high-dimensional, complex vector space, which makes it difficult to explain how the model gradually learns different semantic features and composes them into the final generated results. To address this problem, this paper first increases the degree of decoupling between semantic features by increasing the independence between the hidden variables of the model's hidden space, and then explains the model's learning process on different hidden spaces through visualization based on the feature-decoupling model. In addition, this paper proposes a hidden variable contribution index to measure the influence of hidden variables of different dimensions on the generated results, so as to explain the learning process of the model.
Reinforcement learning has been successfully applied in software testing, but the existing testing methods cannot perform effective testing according to the characteristics of applications, and using outdated interact...
With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image. This limitation restricts the interpretative capacity of VQA models and their ability to explore specific image regions. To address this issue, this study proposes a grounded VQA model for robotic surgery that is capable of localizing a specific region during answer prediction. Drawing inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information interactions. Specifically, two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model. A visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate inference. The textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding textual information towards a more accurate inference of the answer. Additionally, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded answers. The experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.
Twitter has become a popular platform for receiving daily updates. The more people rely on it, the more critical it becomes to get genuine information out. False information can easily be shared on Twitter, which inf...
Wireless power transmission has been widely used to replenish energy for wireless sensor networks, where the energy consumption rate of sensor nodes is usually time varying and indefinite. However, few works have inve...
As AI workloads increase in scope, generalization capability becomes challenging for small task-specific models and their demand for large amounts of labeled training samples increases. On the contrary, Foundation Mod...
Random sample partition (RSP) is a newly developed data management and processing model for Big Data processing and analysis. To apply the RSP model for Big Data computation tasks, it is very important to measure the ...