Search Results - Inner Mongolia University Library

2022 IEEE International Conference on Robotics and Biomimetics, ROBIO 2022

Authors: Yuan, Songyu; Xie, Tian; Zhu, Shiqiang; Chen, Yeheng; Li, Yuehua; Zheng, Tao; Gu, Jason. Affiliations: Research Center for Intelligent Robotics, Research Institute of Interdisciplinary Innovation, Zhejiang Lab, Hangzhou 311100, China; Zhejiang Engineering Research Center for Intelligent Robotics, Hangzhou 311100, China; Dalhousie University, Department of Electrical and Computer Engineering, Halifax, NS B3H 4R2, Canada

ISBN: (print) 9781665481090

As the basis of multi-sensor fusion, accurate extrinsic calibration among multiple sensors is vital for heterogeneous information fusion. However, most existing methods focus only on calibration between two specific heterogeneous sensors, such as camera-LiDAR or LiDAR-thermal, which may cause inevitable error accumulation when applied to a multi-sensor system. To address this problem, a novel calibration target and a high-accuracy method are proposed to simultaneously calibrate the intrinsic and extrinsic parameters of a 3D LiDAR, a thermal camera and a visible camera without any user intervention. The proposed method relies on only one calibration board to calibrate all the intrinsic and extrinsic parameters in one step, without any strict pose requirement or long-time optimization. Furthermore, 2D-3D corresponding features can be extracted with higher precision by considering the sensor model, compared with common feature extraction methods. Experiments with a Velodyne VLP-16 LiDAR, a ZED camera and a D843NT thermal camera demonstrate that the proposed method can complete calibration among the three automatically. Finally, this paper also presents a target-based method of uniformly expressing the calibration errors of a multi-sensor system, with competitive performance against most state-of-the-art methods. © 2022 IEEE.

Keywords: Calibration

Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models

arXiv, 2025

Authors: Ranasinghe, Yasiru; Vibashan, V.S.; Uplinger, James; De Melo, Celso; Patel, Vishal M. Affiliations: The Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, United States; The DEVCOM Army Research Laboratory, Adelphi, United States

Automatic target recognition (ATR) plays a critical role in tasks such as navigation and surveillance, where safety and accuracy are paramount. In extreme use cases, such as military applications, these factors are often challenged due to the presence of unknown terrains, environmental conditions, and novel object categories. Current object detectors, including open-world detectors, lack the ability to confidently recognize novel objects or operate in unknown environments, as they have not been exposed to these new conditions. However, Large Vision-Language Models (LVLMs) exhibit emergent properties that enable them to recognize objects in varying conditions in a zero-shot manner. Despite this, LVLMs struggle to localize objects effectively within a scene. To address these limitations, we propose a novel pipeline that combines the detection capabilities of open-world detectors with the recognition confidence of LVLMs, creating a robust system for zero-shot ATR of novel classes and unknown domains. In this study, we compare the performance of various LVLMs for recognizing military vehicles, which are often underrepresented in training datasets. Additionally, we examine the impact of factors such as distance range, modality, and prompting methods on the recognition performance, providing insights into the development of more reliable ATR systems for novel conditions and classes. © 2025, CC BY.

Keywords: Automatic target recognition

Task-Oriented Grasp Prediction with Visual-Language Inputs

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Authors: Chao Tang; Dehao Huang; Lingxiao Meng; Weiyu Liu; Hong Zhang. Affiliations: Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China; Stanford University, United States

To perform household tasks, assistive robots receive commands in the form of user language instructions for tool manipulation. The initial stage involves selecting the intended tool (i.e., object grounding) and grasping it in a task-oriented manner (i.e., task grounding). Nevertheless, prior research on visual-language grasping (VLG) focuses on object grounding while disregarding the fine-grained impact of tasks on object grasping. Task-incompatible grasping of a tool will inevitably limit the success of subsequent manipulation steps. Motivated by this problem, this paper proposes GraspCLIP, which addresses the challenge of task grounding in addition to object grounding to enable task-oriented grasp prediction with visual-language inputs. Evaluation on a custom dataset demonstrates that GraspCLIP achieves superior performance over established baselines with object grounding only. The effectiveness of the proposed method is further validated on an assistive robotic arm for grasping previously unseen kitchen tools given the task specification. Our presentation video is available at: https://***/watch?v=e1wfYQPeAXU.

Learning a Better Control Barrier Function

IEEE Conference on Decision and Control

Authors: Bolun Dai; Prashanth Krishnamurthy; Farshad Khorrami. Affiliation: Electrical & Computer Engineering Department, Control/Robotics Research Laboratory, Tandon School of Engineering, New York University, Brooklyn, NY

ISBN: (digital) 9781665467612

ISBN: (print) 9781665467629

Control barrier functions (CBFs) are widely used in safety-critical controllers. However, constructing a valid CBF is challenging, especially under nonlinear or non-convex constraints and for high relative degree systems. Meanwhile, finding a conservative CBF that only recovers a portion of the true safe set is usually possible. In this work, starting from a "conservative" handcrafted CBF (HCBF), we develop a method to find a CBF that recovers a reasonably larger portion of the safe set. Since the learned CBF controller is not guaranteed to be safe during training iterations, we use a model predictive controller (MPC) to ensure safety during training. Using the collected trajectory data containing safe and unsafe interactions, we train a neural network to estimate the difference between the HCBF and a CBF that recovers a closer solution to the true safe set. With our proposed approach, we can generate safe controllers that are less conservative and computationally more efficient. We validate our approach on two systems: a second-order integrator and a ball-on-beam.

Keywords: Training; Heuristic algorithms; Neural networks; Predictive models; Data collection; Data models; Safety
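
For readers unfamiliar with control barrier functions, the following is a minimal sketch of a handcrafted CBF safety filter. It is not the learned CBF or the MPC scheme from the abstract. It assumes a one-dimensional single integrator (x' = u) rather than the second-order integrator the authors test on, a hypothetical barrier h(x) = x_max - x, and an illustrative gain alpha. The CBF condition dh/dt >= -alpha * h then reduces to the bound u <= alpha * (x_max - x).

```python
def cbf_filter(x, u_nom, x_max=1.0, alpha=2.0):
    """Clip a nominal control so the CBF condition holds.

    Assumed system: x' = u. Assumed barrier: h(x) = x_max - x (safe when h >= 0).
    The condition dh/dt >= -alpha * h becomes -u >= -alpha * h, i.e. u <= alpha * h.
    """
    h = x_max - x
    return min(u_nom, alpha * h)


def simulate(x0=0.0, u_nom=1.0, dt=0.01, steps=1000):
    """Forward-Euler rollout of the filtered system; returns the final state."""
    x = x0
    for _ in range(steps):
        x += dt * cbf_filter(x, u_nom)
    return x


if __name__ == "__main__":
    # The nominal controller pushes toward the boundary x_max = 1.0;
    # the filter slows it down so the state approaches but does not cross it.
    print(simulate())
```

Far from the boundary the filter leaves the nominal control untouched. Near the boundary it scales the control down, which is the conservative behaviour that the abstract's learned CBF aims to relax.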

Multi-label advertising image classification using traditional deep neural networks and vision language models: dataset and annotation agreement method

Multimedia Tools and Applications, 2025, pp. 1-30

Authors: Mir, Tatheer Hussain; Wei, Jui-Cheng; Simanjuntak, Mutiara; Cheng, Wan-Shu; Lee, Yi-Hsun; Zhang, You-Chen; Chen, Jia-Bin; Dai, Hong-Jie. Affiliations: Intelligent System Laboratory, Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan; Department of Computer Science and Information Management, Providence University, Taichung, Taiwan; Research and Development Department, Linker AI Co. Ltd, Taipei, Taiwan; School of Post-Baccalaureate Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan; Center for Big Data Research, Kaohsiung Medical University, Kaohsiung, Taiwan

Effectively classifying advertising images is crucial in targeting the right audience and maximizing marketing performance. To address this problem, this paper presents a multi-label advertising image classification study using popular deep-learning architectures. First, we compile a dedicated dataset for this task and evaluate the performance of traditional deep learning-based models based on the convolutional neural network (CNN) and vision transformer architectures. To ensure the quality of dataset annotations, we introduce an extended Krippendorf’s Alpha (α) method based on the Jaccard index to provide a reliable measure of inter-annotation agreement which can address the missing annotations and multiple labels to establish the dataset’s annotation consistency. Our results demonstrate that transformer-based architectures like ViT and Swin outperform the CNN-based model’s baseline and differential learning rate settings. Through the visualization analysis of saliency maps, we gain insights into the model’s decision-making processes and identify the factors influencing their predictions. Furthermore, we assess the impact of annotation quality on model performance, comparing models trained on different annotation reliability levels. Our results indicate that higher annotation consistency, as quantified by α-Jaccard, leads to improved model performance, emphasizing the importance of high-quality datasets in advertising image classification. Beyond traditional deep learning models, we explore the effectiveness of vision language models (VLMs) in this task by employing prompt engineering and comparing their performance with fine-tuned deep learning models. Our findings indicate that while VLMs provide richer contextual annotations, they suffer from over-classification tendencies, subjective biases, and significantly higher computational costs. In contrast, deep learning models remain a more efficient and scalable solution for structured, large-scale advertising classi

Keywords: Advertising image; Convolution neural network; Krippendorf’s Alpha; Learning rate configuration; Marketing effectiveness; Multi-label classification; Vision language models; Vision transformer
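
The abstract bases its agreement measure on the Jaccard index. The sketch below shows only that building block: the Jaccard distance between two annotators' label sets for one image. It is not the authors' extended Krippendorff's alpha, and the function name and example labels are hypothetical.

```python
def jaccard_distance(labels_a, labels_b):
    """Jaccard distance between two label sets: 1 - |A ∩ B| / |A ∪ B|.

    Returns 0.0 for identical sets (including two empty sets)
    and 1.0 for disjoint non-empty sets.
    """
    a, b = set(labels_a), set(labels_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Two annotators agree on "food" but only one also tags "outdoor".
    print(jaccard_distance({"food", "outdoor"}, {"food"}))
```

A set-based distance like this gives partial credit when annotators overlap on some labels, which an exact-match comparison of multi-label annotations would not.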

A Computer Vision Approach to Plant Growth Monitoring in an Embedded CubeSat Module

IEEE Conference on Aerospace

Authors: Alexis Lopez; Olaoluwayimika Olugbenle; Michael C.F. Bazzocchi. Affiliations: Department of Mechanical and Aerospace Engineering, Astronautics and Robotics Laboratory (ASTRO Lab), Clarkson University, Potsdam, NY, USA; Department of Electrical and Computer Engineering, Astronautics and Robotics Laboratory (ASTRO Lab), Clarkson University, Potsdam, NY, USA; Department of Mechanical and Aerospace Engineering, Astronautics and Robotics Laboratory (ASTRO Lab), Clarkson University, Potsdam, NY; Department of Earth and Space Science and Engineering, York University, Toronto, Canada

The investigation and development of space-based food production systems are essential to improve the reliability and availability of fresh sustenance for astronauts. With their compact size and low-cost production, CubeSats can provide a unique platform for plant-based in-space experiments. Additionally, the combination of CubeSats and computer vision can allow for monitoring the health of plants during their growth cycles. This paper investigates the electronics and data handling of a crop growth module, otherwise referred to as an environmental monitoring and control subsystem (EMCS), as well as the integration of computer vision techniques for plant growth and development. Using the Otsu thresholding and holistically-nested edge detection algorithms, image segmentation and edge detection were performed, respectively. A support vector machine (SVM) was also employed to classify foliage and provide feedback on the plant’s health. The results from the system show that the computer vision approach can accurately predict the health of the plants based on color and texture. This study builds a foundation for future plant health monitoring research in deep space environments.
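
The abstract names Otsu thresholding as its segmentation step. As an illustration only, the sketch below implements the standard Otsu criterion (maximize between-class variance over a grayscale histogram) in plain Python. The function name, the flat pixel list, and the sample values are assumptions and do not come from the paper's pipeline.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class and the rest form the other.
    `pixels` is a flat iterable of integer intensities in [0, levels).
    """
    pixels = list(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


if __name__ == "__main__":
    # Hypothetical image: dark background (10) and bright foliage (200).
    image = [10] * 50 + [200] * 50
    t = otsu_threshold(image)
    mask = [p > t for p in image]  # True marks the brighter class
    print(t, sum(mask))
```

In practice a library routine would normally replace this loop. The plain version is shown so the criterion itself is visible.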

PierGuard: A Planning Framework for Underwater Robotic Inspection of Coastal Piers

arXiv, 2025

Authors: Wang, Pengyu; Lin, Hin Wang; Li, Jialu; Wang, Jiankun; Shi, Ling; Meng, Max Q.-H. Affiliations: Shenzhen Key Laboratory of Robotics Perception and Intelligence, Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong; Jiaxing Research Institute, Southern University of Science and Technology, Jiaxing, China; Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong; Department of Electrical and Computer Engineering, University of Alberta, Canada

Using underwater robots instead of humans for the inspection of coastal piers can enhance efficiency while reducing risks. A key challenge in performing these tasks lies in achieving efficient and rapid path planning within complex environments. Sampling-based path planning methods, such as Rapidly-exploring Random Tree* (RRT*), have demonstrated notable performance in high-dimensional spaces. In recent years, researchers have begun designing various geometry-inspired heuristics and neural network-driven heuristics to further enhance the effectiveness of RRT*. However, the performance of these general path planning methods still requires improvement when applied to highly cluttered underwater environments. In this paper, we propose PierGuard, which combines the strengths of bidirectional search and neural network-driven heuristic regions. We design a specialized neural network to generate high-quality heuristic regions in cluttered maps, thereby improving the performance of the path planning. Through extensive simulation and real-world ocean field experiments, we demonstrate the effectiveness and efficiency of our proposed method compared with previous research. Our method achieves approximately 2.6 times the performance of the state-of-the-art geometric-based sampling method and nearly 4.9 times that of the state-of-the-art learning-based sampling method. Our results provide valuable insights for the automation of pier inspection and the enhancement of maritime safety. The updated experimental video is available in the supplementary materials. Copyright © 2025, The Authors. All rights reserved.

Keywords: Motion planning

GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping

arXiv, 2023

Authors: Tang, Chao; Huang, Dehao; Ge, Wenqi; Liu, Weiyu; Zhang, Hong. Affiliations: Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China; Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, United States

Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic knowledge as priors into TOG pipelines. However, the existing semantic knowledge is typically constructed based on closed-world concept sets, restraining the generalization to novel concepts out of the pre-defined sets. To address this issue, we propose GraspGPT, a large language model (LLM) based TOG framework that leverages the open-end semantic knowledge from an LLM to achieve zero-shot generalization to novel concepts. We conduct experiments on Language Augmented TaskGrasp (LA-TaskGrasp) dataset and demonstrate that GraspGPT outperforms existing TOG methods on different held-out settings when generalizing to novel concepts out of the training set. The effectiveness of GraspGPT is further validated in real-robot experiments. Our code, data, appendix, and video are publicly available at https://***/view/graspgpt. Copyright © 2023, The Authors. All rights reserved.

Keywords: Zero-shot learning

Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Arun Reddy; William Paul; Corban Rivera; Ketul Shah; Celso M. de Melo; Rama Chellappa. Affiliations: Johns Hopkins University Applied Physics Laboratory; Department of Electrical & Computer Engineering, Johns Hopkins University; DEVCOM U.S. Army Research Laboratory

ISBN: (digital) 9798350353006

ISBN: (print) 9798350353013

In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pretraining to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results.

Keywords: Representation learning; Adaptation models; Computer vision; Computational modeling; Collaboration; Benchmark testing; Data models

PROSPECT: Precision Robot Spectroscopy Exploration and Characterization Tool

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Authors: Nathaniel Hanson; Gary Lvov; Vedant Rautela; Samuel Hibbard; Ethan Holand; Charles DiMarzio; Taşkın Padır. Affiliations: Institute for Experiential Robotics; Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts, USA; Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; Electrical and Computer Engineering Department, Northeastern University, Boston, Massachusetts, USA

ISBN: (digital) 9798350377705

ISBN: (print) 9798350377712

Near Infrared (NIR) spectroscopy is widely used in industrial quality control and automation to test the purity and grade of items. In this research, we propose a novel sensorized end effector and acquisition strategy to capture spectral signatures from objects and register them with a 3D point cloud. Our methodology first takes a 3D scan of an object generated by a time-of-flight depth camera and decomposes the object into a series of planned viewpoints covering the surface. We generate motion plans for a robot manipulator and end-effector to visit these viewpoints while maintaining a fixed distance and surface normal. This process is enabled by the spherical motion of the end-effector and ensures maximal spectral signal quality. By continuously acquiring surface reflectance values as the end-effector scans the target object, the autonomous system develops a four-dimensional model of the target object: position in an ℝ³ coordinate frame, and a reflectance vector denoting the associated spectral signature. We demonstrate this system in building spectral-spatial object profiles of increasingly complex geometries. We show the proposed system and spectral acquisition planning produce more consistent spectral signals than naïve point scanning strategies. Our work represents a significant step towards high-resolution spectral-spatial sensor fusion for automated quality assessment.

Keywords: Reflectivity; Point cloud compression; Geometry; Spectroscopy; Three-dimensional displays; Robot kinematics; Robot sensing systems; End effectors; Planning; Surface treatment
