With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the *** limitation restricts the interpretative capacity of the VQA models and their ability to explore specific image *** address this issue, this study proposes a grounded VQA model for robotic surgery, capable of localizing a specific region during answer *** inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information ***, two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model. A visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate *** textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding textual information towards a more accurate inference of the ***, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded *** experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.
Rapid urbanization has made road construction and maintenance imperative, but detecting road diseases has been time-consuming with limited accuracy. To overcome these challenges, we propose an efficient YOLOv7 road di...
People-centric activity recognition is one of the most critical technologies in a wide range of real-world applications, including intelligent transportation systems, healthcare services, and brain-computer interfaces. Large-scale data collection and annotation make the application of machine learning algorithms prohibitively expensive when adapting to new tasks. One way of circumventing this limitation is to train the model in a semi-supervised manner that uses a percentage of unlabeled data to reduce the labeling burden in prediction tasks. Despite their appeal, these models often assume that labeled and unlabeled data come from similar distributions, which leads to the domain shift problem when distribution gaps are present. To address these limitations, we propose a novel method for people-centric activity recognition, called domain generalization with semi-supervised learning (DGSSL), that effectively enhances the representation learning and domain alignment capabilities of a model. First, we design a new autoregressive discriminator for adversarial training between unlabeled and labeled source domains, extracting domain-specific features to reduce the distribution gaps. Second, we introduce two reconstruction tasks that capture task-specific features, to avoid losing information related to representation learning while maintaining task-specific consistency. Finally, benefiting from the collaborative optimization of these two tasks, the model can accurately predict both the domain and category labels of the source domains for the classification task. We conduct extensive experiments on three real-world sensing datasets. The experimental results show that DGSSL surpasses three state-of-the-art methods in both performance and generalization.
A mobile ad hoc network (MANET) is an independent, temporary wireless network established by a set of mobile nodes (e.g., laptops, smartphones, iPods) and suited to environments in which the network infrastructure is not fixed. The most common problems faced by MANETs are poor energy efficiency, high energy consumption, short network lifetime, and high traffic overhead, all of which affect the overall network topology. Hence, an energy-efficient cluster head (CH) election is needed to address these issues. This paper therefore proposes a novel model that enhances network lifetime and energy efficiency by performing a routing strategy in MANET. An optimal CH is selected by a novel Fuzzy Marine White Shark optimization (FMWSO) algorithm, obtained by integrating fuzzy operation with two optimization algorithms, namely the marine predator algorithm and the white shark optimizer. The proposed approach comprises three stages: generation of data, cluster generation, and CH selection. The FMWSO algorithm determines the CH selection in MANET, thereby enhancing the network topology and network lifetime while minimizing the overhead rate and energy consumption. Finally, the performance of the proposed FMWSO approach is compared with various existing techniques to determine the effectiveness of the system. The proposed FMWSO approach consumes a minimum energy of 0.62 mJ, which is lower than that of the other approaches.
Modeling urban mobility behaviours with microscopic traffic flow simulation is now crucial for studying intelligent urban decision-making algorithms, such as traffic light control and road congestion charging. Howev...
With the advent of the Web 3.0 era, the amount and types of data in the network have sharply increased, and the application scenarios of recommendation algorithms are continuously expanding. Location recommendation ha...
Multi-view multi-person 3D human pose estimation is a hot topic in the field of human pose estimation due to its wide range of application *** the introduction of end-to-end direct regression methods, the field has entered a new stage of ***, the regression results of joints that are more heavily influenced by external factors are not accurate enough even for the optimal *** this paper, we propose an effective feature recalibration module based on the channel attention mechanism and a relative optimal calibration strategy, which is applied to the multi-view multi-person 3D human pose estimation task to achieve improved detection accuracy for joints that are more severely affected by external ***, it achieves relative optimal weight adjustment of joint feature information through the recalibration module and strategy, which enables the model to learn the dependencies between joints and the dependencies between people and their corresponding *** call this method the Efficient Recalibration Network (ER-Net). Finally, experiments were conducted on two benchmark datasets for this task, Campus and Shelf, in which the PCP reached 97.3% and 98.3%, respectively.
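The abstract above names a feature recalibration module based on channel attention but does not describe its internals. As a rough illustration only, the sketch below shows a generic squeeze-and-excitation style channel attention step in plain Python: each channel is averaged, passed through a small bottleneck, and turned into a gate in (0, 1) that rescales that channel. This is not ER-Net's module; the function name `channel_attention`, the weight matrices `w1` and `w2`, and the tensor layout are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features, w1, w2):
    """Rescale each channel by a learned gate (generic SE-style sketch).

    features: list of C channels, each a list of N values.
    w1: r x C reduction weights, w2: C x r expansion weights.
    Both weight matrices are hypothetical stand-ins for trained parameters.
    """
    # Squeeze: summarize each channel by its mean value.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: reduce to r hidden units with a ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C gates, one per channel.
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Recalibration: multiply every value in a channel by that channel's gate.
    recalibrated = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return recalibrated, gates


if __name__ == "__main__":
    rng = random.Random(0)
    channels, length, reduced = 4, 6, 2
    features = [[rng.uniform(-1, 1) for _ in range(length)] for _ in range(channels)]
    w1 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(reduced)]
    w2 = [[rng.uniform(-1, 1) for _ in range(reduced)] for _ in range(channels)]
    _, gates = channel_attention(features, w1, w2)
    print("per-channel gates:", gates)
```

In a trained network the gates would be learned so that less reliable channels are down-weighted; here the weights are random and only the data flow is meaningful.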
Owing to the challenge of target occlusion leading to tracking failure during the target tracking process, achieving efficient and robust tracking of targets under occlusion scenarios has become a focal point of resea...
Factors have always played an important role in stock analysis, but they are only effective for specific problems in specific scenarios. Therefore, constructing factors promptly for different scenarios is an...
Purpose: This paper presents a theoretical analysis of the DynaTrans algorithm, a novel approach for dynamic optimization of urban transportation networks. Design/methodology/approach: We introduce an Adaptive Closene...