ISBN (Print): 9789819607884; 9789819607891
The integration of multiple pre-trained models in robotic navigation combines their diverse strengths, leading to robust and generalized performance. However, the effectiveness of these models is often limited by the path planning strategy, leaving room to improve navigation capability. To overcome this, we introduce Free-form Instruction Guided Robotic Navigation Path Planning with a Large Vision-Language Model (FIG-RN). The model extracts landmarks and directional cues from free-form instructions and uses a pre-trained vision-language model to associate these landmarks with map nodes, laying the groundwork for path planning. It then evaluates landmark-node match scores, node accessibility, and orientation to optimize the planned path. Compared to traditional models, FIG-RN offers significant benefits: (i) it requires no map annotations, thanks to its use of high-quality pre-trained models; (ii) it makes fuller use of the information in instructions, yielding more effective paths; and (iii) it refines the vision-language model's matching values for improved local navigation. Experimentally, FIG-RN outperforms LM-Nav in success rate, efficiency, and accuracy, with improvements of 0.2, 0.2143, and 0.208, respectively.
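As a rough illustration of the landmark-to-node matching step this abstract describes, the sketch below scores map-node images against landmark phrases with an off-the-shelf CLIP model and mixes the match value with accessibility and orientation cues. The choice of CLIP, the function names, and the weighting scheme are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch, assuming a CLIP-style vision-language model; the scoring
# heuristic combining match value, accessibility and orientation is illustrative.
import torch
import clip  # openai/CLIP package
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def match_landmarks_to_nodes(landmarks, node_image_paths):
    """Return a (num_landmarks x num_nodes) cosine-similarity matrix."""
    text = clip.tokenize(landmarks).to(device)
    imgs = torch.stack([preprocess(Image.open(p)) for p in node_image_paths]).to(device)
    with torch.no_grad():
        t = model.encode_text(text)
        v = model.encode_image(imgs)
    t = t / t.norm(dim=-1, keepdim=True)
    v = v / v.norm(dim=-1, keepdim=True)
    return t @ v.T  # similarity scores used to ground landmarks on the map

def node_score(match_value, accessible, heading_alignment, w=(1.0, 0.5, 0.5)):
    """Toy combination of match value, accessibility and orientation cues."""
    return w[0] * match_value + w[1] * float(accessible) + w[2] * heading_alignment
```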
ISBN (Digital): 9789819607921
ISBN (Print): 9789819607914; 9789819607921
Task-oriented dialog (TOD) systems use external knowledge sources to help users accomplish specific tasks. While most current TOD research focuses on simple information-collecting tasks in a slot-filling framework, multi-step reasoning tasks such as troubleshooting remain under-explored. Leveraging advances in large language models (LLMs), we propose a novel LLM-based multi-agent learning framework for building troubleshooting dialog systems and evaluate the effectiveness of various multi-agent learning settings in a TOD system. Our results show that LLMs designed for open-domain dialog struggle when directly applied to TOD systems, but with multi-agent cooperative enhancements they achieve commendable performance.
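The snippet below is a hedged sketch of one possible multi-agent cooperative turn for a troubleshooting dialog: one agent reasons over knowledge-base excerpts while a second turns the plan into a user-facing instruction. The agent roles, prompts, and the gpt-4o-mini model name are assumptions; the paper's actual framework and settings may differ.

```python
# Illustrative two-agent troubleshooting turn using an OpenAI-compatible chat API.
from openai import OpenAI

client = OpenAI()

def ask(system_prompt, history):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat model could be substituted
        messages=[{"role": "system", "content": system_prompt}, *history],
    )
    return resp.choices[0].message.content

def troubleshoot_turn(user_utterance, kb_snippets, history):
    history = history + [{"role": "user", "content": user_utterance}]
    # Agent 1: reason over the external knowledge source and propose next checks.
    plan = ask(
        "You are a diagnostician. Using these manual excerpts:\n"
        + "\n".join(kb_snippets)
        + "\nList the next checks to run, step by step.",
        history,
    )
    # Agent 2: turn the plan into a single, clear user-facing instruction.
    reply = ask(
        "You are a support agent. Give the user one clear next step "
        "based on this plan:\n" + plan,
        history,
    )
    return reply, history + [{"role": "assistant", "content": reply}]
```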
ISBN (Digital): 9798331544577
ISBN (Print): 9798331544584
To address the long training times of satellite remote sensing application models, this paper first introduces the main methods of parallel training and the support PyTorch provides for them. On this basis, the YOLOv4 object detection model and a U-Net-based land-sea segmentation model were trained in parallel, and experiments showed that the training time of both models was significantly reduced.
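Since the abstract points to PyTorch's parallel-training support, the minimal DistributedDataParallel sketch below shows the typical data-parallel setup such models would use. The toy network, random dataset, and train.py filename are placeholders, not the paper's YOLOv4 or U-Net pipelines.

```python
# Minimal DDP sketch; launch with: torchrun --nproc_per_node=<num_gpus> train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    # Placeholder network standing in for a detection or segmentation model.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
        torch.nn.Linear(8, 2),
    ).cuda(rank)
    model = DDP(model, device_ids=[rank])

    data = TensorDataset(torch.randn(256, 3, 64, 64), torch.randint(0, 2, (256,)))
    sampler = DistributedSampler(data)   # shards the data across ranks
    loader = DataLoader(data, batch_size=32, sampler=sampler)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(2):
        sampler.set_epoch(epoch)         # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(x.cuda(rank)), y.cuda(rank))
            loss.backward()              # gradients are all-reduced across GPUs here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```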
Advancements in artificial intelligence (AI) have transformed robotics by enabling systems to autonomously execute complex tasks with minimal human involvement. Traditional methods, however, often depend on costly har...
Accurate estimation of spatial gait parameters is crucial for assessing fall risk in older adults, helping to identify potential movement impairments and prevent falls. Traditionally, extracting stride length from wea...
Object detection technology has been widely applied in the conservation and monitoring of wildlife in recent years. However, due to the complexity of field environments, the collected images of wild animals often suff...
Underwater images are widely used in marine science, ocean engineering, and underwater robotics. However, challenges such as insufficient lighting, scattering, and absorption often degrade image quality, limiting thei...
Mobile robots are becoming increasingly ubiquitous in modern society, requiring more human-like interaction capabilities, such as following operator instructions and collaborating with humans. Conventional robot progr...
In the era of artificial intelligence generated content (AIGC), conditional multimodal synthesis technologies (e.g., text-to-image) are dynamically reshaping the natural content. Brain signals, serving as potential re...
ISBN (Print): 9789819607730; 9789819607747
Robot grasping is widely recognized as a crucial component of robotic manipulation. Several deep learning based grasping algorithms for planar and 6-degree-of-freedom grasping have been presented, and they have produced good results in simulation and in the real world. However, when these algorithms estimate grasping poses, the predicted poses do not always make sense for the grasp location, even when they cover the object in question. These algorithms tend to treat the object as a whole and act in ways that differ significantly from human behavior. To that end, we propose GI-Grasp, a novel strategy that lets the robot perceive the object to be grasped at a finer scale by introducing vision-language models (VLMs) to determine which part of the object is more suitable for grasping, guiding the robot to act more like a human. First, we segment the RGB image of the grasping scene into instances in order to detect and localize the objects to be grasped. Second, we provide the robot with prior knowledge of these objects through VLMs, helping it understand their compositional details and identify the spatial constraints relevant to the grasping task. Finally, the resulting grasp-position prediction is combined with the grasping algorithm to improve the robot's grasping accuracy. Our real-world experiments show that GI-Grasp's use of fine-grained object features helps robots grasp items in a more human-like (and reasonable) manner, increasing the grasp success rate.
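To make the three-stage pipeline concrete, the skeleton below re-ranks candidate grasps with a VLM-derived part mask. All helper objects (segmenter, vlm, grasp_net) and their interfaces are hypothetical stand-ins, not GI-Grasp's actual components.

```python
# Skeleton of a segmentation -> VLM part prior -> grasp re-ranking pipeline.
# segmenter, vlm and grasp_net are injected, hypothetical callables.
def gi_grasp_pipeline(rgb, depth, target_name, segmenter, vlm, grasp_net):
    # 1) Instance segmentation: detect and localize the target object.
    masks, labels = segmenter(rgb)                       # per-instance masks
    obj_mask = masks[labels.index(target_name)]

    # 2) VLM prior: ask which part of the object is suitable to grasp
    #    (e.g. the handle of a mug) and get a pixel mask for that part.
    part_mask = vlm.query_part(
        rgb, obj_mask,
        prompt=f"Which part of the {target_name} should a robot grasp?",
    )

    # 3) Grasp prediction: keep candidates whose contact pixel falls inside
    #    the VLM-preferred part, then pick the highest-scoring one.
    candidates = grasp_net(rgb, depth, obj_mask)          # (pose, score, pixel) items
    feasible = [g for g in candidates if part_mask[g.pixel[1], g.pixel[0]]]
    return max(feasible or candidates, key=lambda g: g.score).pose
```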