As the basis of multi-sensor fusion, accurate extrinsic calibration among multiple sensors is vital for heterogeneous information fusion. However, most existing methods only focus on the calibration between two specific...
Automatic target recognition (ATR) plays a critical role in tasks such as navigation and surveillance, where safety and accuracy are paramount. In extreme use cases, such as military applications, these factors are of...
To perform household tasks, assistive robots receive commands in the form of user language instructions for tool manipulation. The initial stage involves selecting the intended tool (i.e., object grounding) and grasping it in a task-oriented manner (i.e., task grounding). Nevertheless, prior research on visual-language grasping (VLG) focuses on object grounding while disregarding the fine-grained impact of tasks on object grasping. Task-incompatible grasping of a tool will inevitably limit the success of subsequent manipulation steps. Motivated by this problem, this paper proposes GraspCLIP, which addresses the challenge of task grounding in addition to object grounding to enable task-oriented grasp prediction with visual-language inputs. Evaluation on a custom dataset demonstrates that GraspCLIP outperforms established baselines that perform object grounding only. The effectiveness of the proposed method is further validated on an assistive robotic arm grasping previously unseen kitchen tools given the task specification. Our presentation video is available at: https://***/watch?v=e1wfYQPeAXU.
ISBN (digital): 9781665467612
ISBN (print): 9781665467629
Control barrier functions (CBFs) are widely used in safety-critical controllers. However, constructing a valid CBF is challenging, especially under nonlinear or non-convex constraints and for high relative degree systems. Meanwhile, finding a conservative CBF that only recovers a portion of the true safe set is usually possible. In this work, starting from a "conservative" handcrafted CBF (HCBF), we develop a method to find a CBF that recovers a reasonably larger portion of the safe set. Since the learned CBF controller is not guaranteed to be safe during training iterations, we use a model predictive controller (MPC) to ensure safety during training. Using the collected trajectory data containing safe and unsafe interactions, we train a neural network to estimate the difference between the HCBF and a CBF that recovers a closer solution to the true safe set. With our proposed approach, we can generate safe controllers that are less conservative and computationally more efficient. We validate our approach on two systems: a second-order integrator and a ball-on-beam.
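The CBF condition behind this abstract can be made concrete on a second-order integrator. The sketch below is illustrative only and is not the paper's learned method: the barrier `h_brake` (distance to a position limit minus the braking distance), the gain `alpha`, the limits `p_max` and `u_max`, and the Euler simulation are all assumptions chosen for this example. It clips a nominal input so that dh/dt + alpha*h >= 0 holds.

```python
def h_brake(p, v, p_max=1.0, u_max=1.0):
    """Hypothetical barrier: distance to the limit minus the braking distance."""
    return p_max - p - v * max(v, 0.0) / (2.0 * u_max)


def safe_input(p, v, u_nom, alpha=5.0, p_max=1.0, u_max=1.0):
    """Clip u_nom so that dh/dt + alpha*h >= 0 (closed form for this 1D case)."""
    u = max(-u_max, min(u_max, u_nom))
    if v > 0.0:
        # For v > 0: dh/dt = -v - v*u/u_max, so the condition bounds u from above.
        u_bound = u_max * (alpha * h_brake(p, v, p_max, u_max) / v - 1.0)
        u = max(-u_max, min(u, u_bound))
    # For v <= 0: dh/dt = -v >= 0 and u does not appear, so u_nom is kept.
    return u


def simulate(filtered, steps=5000, dt=1e-3):
    """Euler-integrate p' = v, v' = u with a nominal input pushing toward the limit."""
    p, v, p_peak = 0.0, 0.0, 0.0
    for _ in range(steps):
        u = safe_input(p, v, 1.0) if filtered else 1.0
        p, v = p + v * dt, v + u * dt
        p_peak = max(p_peak, p)
    return p_peak
```

With the filter enabled, `simulate(True)` keeps the peak position at `p_max` up to a small discretization error, while `simulate(False)` overshoots it.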
Effectively classifying advertising images is crucial in targeting the right audience and maximizing marketing performance. To address this problem, this paper presents a multi-label advertising image classification study using popular deep-learning architectures. First, we compile a dedicated dataset for this task and evaluate traditional deep learning models built on convolutional neural network (CNN) and vision transformer architectures. To ensure the quality of dataset annotations, we introduce an extended Krippendorff’s alpha (α) method based on the Jaccard index. It provides a reliable measure of inter-annotator agreement that handles missing annotations and multiple labels, and we use it to establish the dataset’s annotation consistency. Our results demonstrate that transformer-based architectures such as ViT and Swin outperform the CNN-based models in both the baseline and differential learning rate settings. Through the visualization analysis of saliency maps, we gain insights into the models’ decision-making processes and identify the factors influencing their predictions. Furthermore, we assess the impact of annotation quality on model performance by comparing models trained on different annotation reliability levels. Our results indicate that higher annotation consistency, as quantified by α-Jaccard, leads to improved model performance, emphasizing the importance of high-quality datasets in advertising image classification. Beyond traditional deep learning models, we explore the effectiveness of vision language models (VLMs) in this task by employing prompt engineering and comparing their performance with fine-tuned deep learning models. Our findings indicate that while VLMs provide richer contextual annotations, they suffer from over-classification tendencies, subjective biases, and significantly higher computational costs. In contrast, deep learning models remain a more efficient and scalable solution for structured, large-scale advertising classi
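One way to read an α-Jaccard agreement score is as Krippendorff's alpha, α = 1 − D_o/D_e, computed with the Jaccard distance between label sets as the disagreement measure. The sketch below follows that generic formulation. The paper's exact extension is not given in the abstract, so the function names, the data layout (`None` for a missing annotation), and the edge-case handling are assumptions.

```python
from itertools import permutations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; two empty label sets count as identical."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def alpha_jaccard(units):
    """units: one list per image, holding one label set per annotator (None = missing)."""
    # Drop missing annotations, then keep only images with at least two annotations.
    pairable = [[frozenset(s) for s in u if s is not None] for u in units]
    pairable = [u for u in pairable if len(u) >= 2]
    values = [s for u in pairable for s in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two annotations on the same image")
    # Observed disagreement: ordered pairs within each image, weighted by 1/(m_u - 1).
    d_obs = sum(
        sum(jaccard_distance(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n
    # Expected disagreement: ordered pairs across all pairable annotations.
    d_exp = sum(jaccard_distance(a, b) for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_exp == 0.0:
        return 1.0  # every annotation is identical, so there is nothing to disagree on
    return 1.0 - d_obs / d_exp
```

For example, `alpha_jaccard([[{"car"}, {"car"}], [{"food"}, {"food"}]])` gives full agreement, while systematically different label sets on each image push the score below zero.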
The investigation and development of space-based food production systems are essential to improve the reliability and availability of fresh sustenance for astronauts. With their compact size and low-cost production, CubeSats can provide a unique platform for plant-based in-space experiments. Additionally, the combination of CubeSats and computer vision can allow for monitoring the health of plants during their growth cycles. This paper investigates the electronics and data handling of a crop growth module, otherwise referred to as an environmental monitoring and control subsystem (EMCS), as well as the integration of computer vision techniques for plant growth and development. Image segmentation and edge detection were performed using Otsu thresholding and holistically-nested edge detection, respectively. A support vector machine (SVM) was also employed to classify foliage and provide feedback on the plant’s health. The results from the system show that the computer vision approach can accurately predict the health of the plants based on color and texture. This study builds a foundation for future plant health monitoring research in deep space environments.
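The segmentation step named in this abstract, Otsu thresholding, picks the gray level that maximizes the between-class variance of the image histogram. The following is a minimal pure-Python sketch of that idea for 8-bit pixel values. The function name and the flat-list input are assumptions for illustration and say nothing about how the paper's pipeline is implemented.

```python
def otsu_threshold(pixels):
    """Return t in 0..255 maximizing between-class variance (class 0: value <= t)."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of gray levels in class 0 so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break  # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (the constant 1/total^2 is dropped).
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

A binary mask then follows from `[value > t for value in pixels]`, which would separate bright foliage pixels from a darker background in a suitably chosen channel.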
Using underwater robots instead of humans for the inspection of coastal piers can enhance efficiency while reducing risks. A key challenge in performing these tasks lies in achieving efficient and rapid path planning ...
Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods i...
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pretraining to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results.
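The abstract does not say how the student and teacher predictions are combined into pseudolabels. As one plausible illustration only, the sketch below averages the two models' class probabilities and keeps a label when the fused confidence clears a threshold. The averaging rule, the `threshold` value, and all function names are assumptions, not UNITE's actual procedure.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pseudolabels(student_logits, teacher_logits, threshold=0.8):
    """Return one (label, confidence) pair per clip; label is None below threshold."""
    results = []
    for s_logits, t_logits in zip(student_logits, teacher_logits):
        p_student = softmax(s_logits)
        p_teacher = softmax(t_logits)
        # Hypothetical fusion rule: plain average of the two distributions.
        fused = [(a + b) / 2.0 for a, b in zip(p_student, p_teacher)]
        confidence = max(fused)
        label = fused.index(confidence)
        results.append((label if confidence >= threshold else None, confidence))
    return results
```

Clips where the two models agree confidently receive a pseudolabel, and clips where they disagree fall below the threshold and are left unlabeled for that round.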
ISBN (digital): 9798350377705
ISBN (print): 9798350377712
Near Infrared (NIR) spectroscopy is widely used in industrial quality control and automation to test the purity and grade of items. In this research, we propose a novel sensorized end-effector and acquisition strategy to capture spectral signatures from objects and register them with a 3D point cloud. Our methodology first takes a 3D scan of an object generated by a time-of-flight depth camera and decomposes the object into a series of planned viewpoints covering the surface. We generate motion plans for a robot manipulator and end-effector to visit these viewpoints while maintaining a fixed distance and surface normal. This process is enabled by the spherical motion of the end-effector and ensures maximal spectral signal quality. By continuously acquiring surface reflectance values as the end-effector scans the target object, the autonomous system develops a four-dimensional model of the target object: position in an R^3 coordinate frame, and a reflectance vector denoting the associated spectral signature. We demonstrate this system in building spectral-spatial object profiles of increasingly complex geometries. We show that the proposed system and spectral acquisition planning produce more consistent spectral signals than naïve point scanning strategies. Our work represents a significant step towards high-resolution spectral-spatial sensor fusion for automated quality assessment.
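Keeping a fixed distance along the surface normal reduces, for a single surface sample, to a small geometric step: place the sensor at the point offset by the standoff along the unit normal and aim it back along that normal. The sketch below shows only this step. The function names and the `standoff` parameter are hypothetical, and the paper's viewpoint decomposition and motion planning are not reproduced here.

```python
import math


def normalize(vector):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in vector))
    if length == 0.0:
        raise ValueError("surface normal must be non-zero")
    return tuple(c / length for c in vector)


def plan_viewpoint(point, normal, standoff):
    """Return (sensor_position, approach_direction) for one surface sample."""
    unit_normal = normalize(normal)
    # Sensor sits `standoff` away from the surface along the outward normal.
    position = tuple(p + standoff * n for p, n in zip(point, unit_normal))
    # Sensor looks back at the surface, i.e. along the negated normal.
    approach = tuple(-n for n in unit_normal)
    return position, approach
```

Applying `plan_viewpoint` to every sampled surface point with the same `standoff` yields viewpoints that all lie at a constant distance from the object, which is the property the abstract ties to consistent spectral signal quality.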