ISBN (Print): 9789819607884; 9789819607891
The integration of multiple pre-trained models in robotic navigation combines their diverse strengths, leading to robust and generalized performance. However, the effectiveness of these models is often limited by the path planning strategy, leaving room to improve navigation capability. To overcome this, we introduce Free-form Instruction Guided Robotic Navigation Path Planning with a Large Vision-Language Model (FIG-RN). The model extracts landmarks and directional cues from free-form instructions and uses a pre-trained vision-language model to associate these landmarks with map nodes, laying the groundwork for path planning. It then evaluates landmark-node match scores, node accessibility, and orientation to optimize the planned path. Compared to traditional models, FIG-RN offers significant benefits: (i) it requires no map annotations, thanks to its use of high-quality pre-trained models; (ii) it makes fuller use of the information in instructions, yielding more effective paths; and (iii) it refines the vision-language model's matching values for improved local navigation. Experimentally, FIG-RN outperforms LM-Nav in success rate, efficiency, and accuracy, with improvements of 0.2, 0.2143, and 0.208, respectively.
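As a rough illustration of the landmark-to-node matching step this abstract describes, the sketch below scores map-node images against landmark phrases with an off-the-shelf CLIP model and mixes the match value with accessibility and orientation cues. The choice of CLIP, the function names, and the weighting scheme are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch, assuming a CLIP-style vision-language model; the scoring
# heuristic combining match value, accessibility and orientation is illustrative.
import torch
import clip  # openai/CLIP package
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def match_landmarks_to_nodes(landmarks, node_image_paths):
    """Return a (num_landmarks x num_nodes) cosine-similarity matrix."""
    text = clip.tokenize(landmarks).to(device)
    imgs = torch.stack([preprocess(Image.open(p)) for p in node_image_paths]).to(device)
    with torch.no_grad():
        t = model.encode_text(text)
        v = model.encode_image(imgs)
    t = t / t.norm(dim=-1, keepdim=True)
    v = v / v.norm(dim=-1, keepdim=True)
    return t @ v.T  # similarity scores used to ground landmarks on the map

def node_score(match_value, accessible, heading_alignment, w=(1.0, 0.5, 0.5)):
    """Toy combination of match value, accessibility and orientation cues."""
    return w[0] * match_value + w[1] * float(accessible) + w[2] * heading_alignment
```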
ISBN (Digital): 9789819607921
ISBN (Print): 9789819607914; 9789819607921
Task-oriented dialog (TOD) systems use external knowledge sources to help users accomplish specific tasks. While most current TOD research focuses on simple information-collecting tasks in a slot-filling framework, multi-step reasoning tasks such as troubleshooting remain under-explored. Leveraging advances in large language models (LLMs), we propose a novel LLM-based multi-agent learning framework for building troubleshooting dialog systems and evaluate the effectiveness of various multi-agent learning settings in a TOD system. Our results show that LLMs designed for open-domain dialog struggle when directly applied to TOD systems, but with multi-agent cooperative enhancements they achieve commendable performance.
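The snippet below is a hedged sketch of one possible multi-agent cooperative turn for a troubleshooting dialog: one agent reasons over knowledge-base excerpts while a second turns the plan into a user-facing instruction. The agent roles, prompts, and the gpt-4o-mini model name are assumptions; the paper's actual framework and settings may differ.

```python
# Illustrative two-agent troubleshooting turn using an OpenAI-compatible chat API.
from openai import OpenAI

client = OpenAI()

def ask(system_prompt, history):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat model could be substituted
        messages=[{"role": "system", "content": system_prompt}, *history],
    )
    return resp.choices[0].message.content

def troubleshoot_turn(user_utterance, kb_snippets, history):
    history = history + [{"role": "user", "content": user_utterance}]
    # Agent 1: reason over the external knowledge source and propose next checks.
    plan = ask(
        "You are a diagnostician. Using these manual excerpts:\n"
        + "\n".join(kb_snippets)
        + "\nList the next checks to run, step by step.",
        history,
    )
    # Agent 2: turn the plan into a single, clear user-facing instruction.
    reply = ask(
        "You are a support agent. Give the user one clear next step "
        "based on this plan:\n" + plan,
        history,
    )
    return reply, history + [{"role": "assistant", "content": reply}]
```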
ISBN (Digital): 9798331544577
ISBN (Print): 9798331544584
To address the long training times of satellite remote sensing application models, this paper first introduces the main methods of parallel training and the support PyTorch provides for them. On this basis, the YOLOv4 object detection model and a U-Net-based land-sea segmentation model were trained in parallel, and experiments showed that the training time of both models was significantly reduced.
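Since the abstract points to PyTorch's parallel-training support, the minimal DistributedDataParallel sketch below shows the typical data-parallel setup such models would use. The toy network, random dataset, and train.py filename are placeholders, not the paper's YOLOv4 or U-Net pipelines.

```python
# Minimal DDP sketch; launch with: torchrun --nproc_per_node=<num_gpus> train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    # Placeholder network standing in for a detection or segmentation model.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
        torch.nn.Linear(8, 2),
    ).cuda(rank)
    model = DDP(model, device_ids=[rank])

    data = TensorDataset(torch.randn(256, 3, 64, 64), torch.randint(0, 2, (256,)))
    sampler = DistributedSampler(data)   # shards the data across ranks
    loader = DataLoader(data, batch_size=32, sampler=sampler)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(2):
        sampler.set_epoch(epoch)         # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(x.cuda(rank)), y.cuda(rank))
            loss.backward()              # gradients are all-reduced across GPUs here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```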
Advancements in artificial intelligence (AI) have transformed robotics by enabling systems to autonomously execute complex tasks with minimal human involvement. Traditional methods, however, often depend on costly har...
Accurate estimation of spatial gait parameters is crucial for assessing fall risk in older adults, helping to identify potential movement impairments and prevent falls. Traditionally, extracting stride length from wea...
Object detection technology has been widely applied in the conservation and monitoring of wildlife in recent years. However, due to the complexity of field environments, the collected images of wild animals often suff...
Underwater images are widely used in marine science, ocean engineering, and underwater robotics. However, challenges such as insufficient lighting, scattering, and absorption often degrade image quality, limiting thei...
Mobile robots are becoming increasingly ubiquitous in modern society, requiring more human-like interaction capabilities, such as following operator instructions and collaborating with humans. Conventional robot progr...
In the era of artificial intelligence generated content (AIGC), conditional multimodal synthesis technologies (e.g., text-to-image) are dynamically reshaping the natural content. Brain signals, serving as potential re...
ISBN (Print): 9789819607730; 9789819607747
Robot grasping is widely recognized as a crucial component of robotic manipulation. Several deep learning based grasping algorithms for planar and 6-degree-of-freedom grasping have been presented, and they have produced good results in simulation and in the real world. However, when these algorithms estimate grasping poses, the predicted poses do not always make sense for the grasp location, even when they cover the object in question. These algorithms tend to treat the object as a whole and act in ways that differ significantly from human behavior. To that end, we propose GI-Grasp, a novel strategy that lets the robot perceive the object to be grasped at a finer scale by introducing vision-language models (VLMs) to determine which part of the object is more suitable for grasping, guiding the robot to act more like a human. First, we segment the RGB image of the grasping scene into instances in order to detect and localize the objects to be grasped. Second, we provide the robot with prior knowledge of these objects through VLMs, helping it understand their compositional details and identify the spatial constraints relevant to the grasping task. Finally, the resulting grasp-position prediction is combined with the grasping algorithm to improve the robot's grasping accuracy. Our real-world experiments show that GI-Grasp's use of fine-grained object features helps robots grasp items in a more human-like (and reasonable) manner, increasing the grasp success rate.
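To make the three-stage pipeline concrete, the skeleton below re-ranks candidate grasps with a VLM-derived part mask. All helper objects (segmenter, vlm, grasp_net) and their interfaces are hypothetical stand-ins, not GI-Grasp's actual components.

```python
# Skeleton of a segmentation -> VLM part prior -> grasp re-ranking pipeline.
# segmenter, vlm and grasp_net are injected, hypothetical callables.
def gi_grasp_pipeline(rgb, depth, target_name, segmenter, vlm, grasp_net):
    # 1) Instance segmentation: detect and localize the target object.
    masks, labels = segmenter(rgb)                       # per-instance masks
    obj_mask = masks[labels.index(target_name)]

    # 2) VLM prior: ask which part of the object is suitable to grasp
    #    (e.g. the handle of a mug) and get a pixel mask for that part.
    part_mask = vlm.query_part(
        rgb, obj_mask,
        prompt=f"Which part of the {target_name} should a robot grasp?",
    )

    # 3) Grasp prediction: keep candidates whose contact pixel falls inside
    #    the VLM-preferred part, then pick the highest-scoring one.
    candidates = grasp_net(rgb, depth, obj_mask)          # (pose, score, pixel) items
    feasible = [g for g in candidates if part_mask[g.pixel[1], g.pixel[0]]]
    return max(feasible or candidates, key=lambda g: g.score).pose
```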