ISBN (print): 9798350384581; 9798350384574
Advancements in machine learning, computer vision, and robotics have paved the way for transformative solutions in various domains, particularly in agriculture. For example, accurate identification and segmentation of fruits from field images plays a crucial role in automating tasks such as harvesting, disease detection, and yield estimation. However, achieving robust and precise in-field fruit segmentation remains challenging, since large amounts of labeled data are required to handle variations in fruit size, shape, color, and occlusion. In this paper, we develop a few-shot semantic segmentation framework for in-field fruits using transfer learning. Concretely, our work targets agricultural domains that lack publicly available labeled data. Motivated by similar success in urban scene parsing, we propose specialized pre-training on a public benchmark dataset for fruit transfer learning. By leveraging pre-trained neural networks, accurate semantic segmentation of fruit in the field is achieved with only a few labeled images. Furthermore, we show that pre-trained models learn to distinguish between fruit still on the trees and fruit that has fallen on the ground, and that they can effectively transfer this knowledge to the target fruit dataset.
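A minimal sketch of the kind of transfer-learning setup the abstract describes: a segmentation network pre-trained on a public benchmark is adapted to a few labeled fruit images by freezing the encoder and fine-tuning only a new head. The model choice, class labels, and hyperparameters below are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 3  # assumed labels: background, fruit-on-tree, fallen fruit

model = deeplabv3_resnet50(weights="DEFAULT")          # pre-trained backbone
model.classifier[-1] = nn.Conv2d(256, NUM_CLASSES, 1)  # new task-specific head

# Freeze the encoder so the few labeled images only adapt the decoder head.
for p in model.backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

def finetune(few_shot_loader, epochs=20):
    """few_shot_loader yields (image, mask) pairs from the few labeled images."""
    model.train()
    for _ in range(epochs):
        for images, masks in few_shot_loader:
            logits = model(images)["out"]           # (B, C, H, W)
            loss = criterion(logits, masks.long())  # masks: (B, H, W) class ids
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```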
ISBN (print): 9798350384581; 9798350384574
Model-based approaches to planning and control for bipedal locomotion have a long history of success. They can provide stability and safety guarantees while being effective in accomplishing many locomotion tasks. Model-free reinforcement learning, on the other hand, has gained much popularity in recent years due to computational advancements. It can achieve high performance in specific tasks, but it lacks physical interpretability and flexibility in re-purposing the policy for a different set of tasks. For instance, we can initially train a neural network (NN) policy using velocity commands as inputs. However, to handle new task commands such as desired hand or footstep locations at a desired walking velocity, we must retrain a new NN policy. In this work, we attempt to bridge the gap between these two bodies of work on a bipedal platform. We formulate a model-based reinforcement learning problem to learn a reduced-order model (ROM) within a model predictive control (MPC) framework. Results show a 49% improvement in viable task region size and a 21% reduction in motor torque cost. All videos and code are available at https://***/view/ymchen/research/rl-for-roms.
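To make the idea of a learned ROM inside MPC concrete, here is a hedged sketch (assumed dimensions and cost terms, not the paper's formulation): a small neural network serves as the reduced-order dynamics, and a short-horizon control sequence is optimized by gradient descent through it in a receding-horizon loop.

```python
import torch
import torch.nn as nn

class ROM(nn.Module):
    """Learned reduced-order dynamics: next_state = f(state, input)."""
    def __init__(self, state_dim=4, input_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + input_dim, 64), nn.Tanh(),
            nn.Linear(64, state_dim),
        )

    def forward(self, x, u):
        return x + self.net(torch.cat([x, u], dim=-1))  # residual update

def mpc_action(rom, x0, x_goal, horizon=10, iters=50, lr=0.1):
    """Solve a short-horizon tracking problem through the ROM; return first input."""
    u_seq = torch.zeros(horizon, 2, requires_grad=True)
    opt = torch.optim.Adam([u_seq], lr=lr)
    for _ in range(iters):
        x, cost = x0, 0.0
        for t in range(horizon):
            x = rom(x, u_seq[t])
            cost = cost + ((x - x_goal) ** 2).sum() + 1e-2 * (u_seq[t] ** 2).sum()
        opt.zero_grad()
        cost.backward()
        opt.step()
    return u_seq[0].detach()  # apply the first input, then re-plan (receding horizon)
```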
ISBN (print): 9798350384581; 9798350384574
Visual SLAM is an essential tool in diverse applications such as robot perception and extended reality, where feature-based methods are prevalent due to their accuracy and robustness. However, existing methods employ either hand-crafted or solely learnable point features and are thus limited by the attributes of those features. In this paper, we propose incorporating hybrid point features efficiently into a single system. By integrating hand-crafted and learnable features, we seek to capitalize on their complementary attributes in both key-point identification and descriptor expressiveness. To this end, we design a pre-processing module, which includes extraction, inter-class processing, and post-processing of hybrid point features. We present an efficient matching approach that performs data association exclusively within the same class of features. Moreover, we design a Hybrid Bag-of-Words (H-BoW) model to handle hybrid point features in matching and loop closure detection. By integrating the proposed framework into a modern feature-based system, we introduce HPF-SLAM. We evaluate the system on the EuRoC-MAV and TUM-RGBD benchmarks. The experimental results show that our method consistently surpasses the baseline at comparable speed.
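A rough sketch of per-class data association as described above (assumed interfaces, not HPF-SLAM code): binary hand-crafted descriptors and float learnable descriptors are matched only against descriptors of the same class, and the results are merged afterwards.

```python
import numpy as np

def hamming_match(desc_a, desc_b, max_dist=64):
    """Brute-force match binary descriptors (uint8 arrays, e.g. ORB) by Hamming distance."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.unpackbits(np.bitwise_xor(desc_b, d), axis=1).sum(axis=1)
        j = int(np.argmin(dists))
        if dists[j] < max_dist:
            matches.append((i, j))
    return matches

def l2_match(desc_a, desc_b, max_dist=0.7):
    """Brute-force match float descriptors (e.g. learned) by Euclidean distance."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest) if dists[i, j] < max_dist]

def match_hybrid(frame_a, frame_b):
    """Associate features strictly within the same class, then merge the results."""
    return (hamming_match(frame_a["orb"], frame_b["orb"]) +
            l2_match(frame_a["learned"], frame_b["learned"]))
```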
ISBN (print): 9798350384581; 9798350384574
A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that can also reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid variants) applied to four coordination-dependent multi-agent 2D task scenarios with increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations where the vision-to-text problem and dynamical errors are considered. See our project website for prompts, videos, and code.
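A conceptual sketch of one hybrid communication pattern (centralized task allocation followed by decentralized per-robot refinement) with simple token accounting. `call_llm` is a placeholder for any chat-completion client, and the prompts and bookkeeping are assumptions for illustration, not the paper's framework.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def hybrid_plan(task_description: str, robot_states: dict) -> dict:
    tokens_used = 0

    # Centralized step: a single planner decomposes the task and assigns sub-goals.
    alloc_prompt = (f"Task: {task_description}\nRobots: {list(robot_states)}\n"
                    "Assign one sub-goal per robot, one line each.")
    allocation = call_llm(alloc_prompt)
    tokens_used += count_tokens(alloc_prompt) + count_tokens(allocation)

    # Decentralized step: each robot plans its own actions from its local state.
    plans = {}
    for name, state in robot_states.items():
        agent_prompt = (f"You are {name}. Your state: {state}\n"
                        f"Assigned sub-goals:\n{allocation}\n"
                        "Output your next actions as a numbered list.")
        plans[name] = call_llm(agent_prompt)
        tokens_used += count_tokens(agent_prompt) + count_tokens(plans[name])

    return {"plans": plans, "tokens_used": tokens_used}
```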
ISBN (print): 9798331518509; 9798331518493
Wildfires not only pose a significant threat to human life and property but also have far-reaching impacts on communities and ecosystems. Effective prevention and mitigation strategies rely on accurate prediction of the path of these fires. This paper proposes the use of data obtained from Unmanned Aerial Vehicles (UAVs) to develop predictive models for fire spread. A comprehensive dataset is presented that includes key environmental variables meticulously captured using these advanced technologies. The dataset comprises images from which essential features for predicting fire spread have been extracted. The method detailed in this article identifies and incorporates crucial factors such as plant density, wind direction and speed, humidity, and geographical features. These key factors are then used to predict the spread of fires using Machine Learning (ML) techniques. After a thorough study and comparison, AdaBoost and Random Forest (RF) demonstrate superior predictive capabilities. Evaluation metrics such as Mean Absolute Error (MAE) and Mean Squared Error (MSE) confirm the high accuracy and reliability of the proposed approach, which achieves coefficient-of-determination (R²) values above 0.98. By combining advanced technological tools with analytical methodologies, this approach has the potential to enhance fire suppression and management, safeguarding lives and assets.
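A hedged sketch of the model-comparison step described above, using scikit-learn with assumed feature names and placeholder data; the actual UAV-derived dataset and extracted features are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# X: rows of [plant_density, wind_speed, wind_direction, humidity, slope];
# y: observed fire-spread rate. Both would come from the UAV image pipeline.
X, y = np.random.rand(500, 5), np.random.rand(500)  # placeholder data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("RandomForest", RandomForestRegressor(n_estimators=200)),
                    ("AdaBoost", AdaBoostRegressor(n_estimators=200))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name,
          "MAE:", mean_absolute_error(y_te, pred),
          "MSE:", mean_squared_error(y_te, pred),
          "R2:", r2_score(y_te, pred))
```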
ISBN (print): 9798350384581; 9798350384574
When planning with an inaccurate dynamics model, a practical strategy is to restrict planning to regions of state-action space where the model is accurate, also known as a model precondition. Empirical real-world trajectory data is valuable for defining data-driven model preconditions regardless of the model form (analytical, simulator, learned, etc.). However, real-world data is often expensive and dangerous to collect. To achieve data efficiency, this paper presents an algorithm for actively selecting trajectories to learn a model precondition for an inaccurate, pre-specified dynamics model. Our proposed techniques address challenges arising from the sequential nature of trajectories and the potential benefit of prioritizing task-relevant data. The experimental analysis shows how algorithmic properties affect performance in three planning scenarios: an icy gridworld, simulated plant watering, and real-world plant watering. Results demonstrate an improvement of approximately 80% after only four real-world trajectories when using our proposed techniques. More material can be found on our project website: https://***/view/active-mde.
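A simplified sketch of active trajectory selection in this spirit (the scoring rule is an assumption, not the paper's algorithm): the next trajectory to execute is the one whose predicted model error is most uncertain under an ensemble, weighted by a task-relevance term.

```python
import numpy as np

def select_trajectory(candidates, error_ensemble, relevance):
    """
    candidates:     list of trajectories, each an (T, d) array of state-action vectors
    error_ensemble: list of regressors predicting model error from a state-action vector
    relevance:      callable mapping a trajectory to a task-relevance weight
    """
    scores = []
    for traj in candidates:
        preds = np.stack([m.predict(traj) for m in error_ensemble])  # (M, T)
        disagreement = preds.std(axis=0).mean()   # epistemic-uncertainty proxy
        scores.append(relevance(traj) * disagreement)
    return int(np.argmax(scores))  # execute this trajectory, then retrain the ensemble
```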
ISBN (print): 9798350384581; 9798350384574
Surgical robot task automation has been a promising research topic for improving surgical efficiency and quality. Learning-based methods have been recognized as an interesting paradigm and are increasingly investigated. However, existing approaches encounter difficulties in long-horizon goal-conditioned tasks due to their intricate compositional structure, which requires decision-making over a sequence of sub-steps and an understanding of the inherent dynamics of goal-reaching tasks. In this paper, we propose a new learning-based framework that leverages the strong reasoning capability of the GPT-based architecture to automate surgical robotic tasks. The key to our approach is a goal-conditioned decision transformer that produces sequential representations with goal-aware future indicators in order to enhance temporal reasoning. Moreover, to exploit a general understanding of the dynamics inherent in manipulation and make the model's reasoning ability task-agnostic, we also design a cross-task pretraining paradigm that uses multiple training objectives associated with data from diverse tasks. We have conducted extensive experiments on 10 tasks using the surgical robot learning simulator SurRoL [1]. The results show that our new approach achieves promising performance and task versatility compared to existing methods. The learned trajectories can be deployed on the da Vinci Research Kit (dVRK) to validate its practicality in real surgical robot settings. Our project website is at: https://***/SurRoL.
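A minimal sketch of how a goal-conditioned decision-transformer input might be assembled (the token layout, dimensions, and masking are assumptions, not the paper's exact design): each timestep contributes a goal-aware indicator token, a state token, and an action token, and actions are predicted from the state positions.

```python
import torch
import torch.nn as nn

class GoalConditionedDT(nn.Module):
    def __init__(self, state_dim, act_dim, goal_dim, d_model=128, n_layers=3):
        super().__init__()
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_goal = nn.Linear(goal_dim, d_model)   # goal-aware indicator token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, states, actions, goals):
        # states: (B, T, state_dim), actions: (B, T, act_dim), goals: (B, T, goal_dim)
        tokens = torch.stack([self.embed_goal(goals),
                              self.embed_state(states),
                              self.embed_action(actions)], dim=2)  # (B, T, 3, D)
        B, T, K, D = tokens.shape
        h = self.transformer(tokens.reshape(B, T * K, D))          # causal mask omitted
        h = h.reshape(B, T, K, D)[:, :, 1]                         # read from state tokens
        return self.predict_action(h)                              # predicted actions
```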
ISBN (print): 9798350384581; 9798350384574
In minimally invasive endovascular interventional surgery, guidewire navigation is an indispensable process. However, even experienced physicians often encounter difficulties in manually manipulating the guidewire for branch selection, while also facing the risk of radiation exposure. In this study, we investigated robotic autonomous guidewire navigation methods. An electromagnetic system was used to track the real-time position and orientation of the guidewire tip, and a state space representing the guidewire within the vascular environment was constructed to guide the robot in precise guidewire manipulation. Experimental results demonstrated that the proposed trial-and-error and centerline-guided methods successfully completed navigation tasks in a static environment, outperforming human navigation performance in terms of trajectory smoothness, trajectory length, and incorrect branch entry counts. For navigation in dynamic environments, dynamic time warping (DTW), a technique for measuring the similarity between two temporal sequences, was integrated into the centerline-guided method. The proposed approaches eliminate the need for visual feedback, thereby minimizing the risk of radiation exposure for both patients and medical staff present in the operating room during the procedure.
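An illustrative dynamic time warping (DTW) sketch of the kind that could compare the guidewire-tip trajectory against a vessel centerline; the cost definition and its use inside the centerline-guided controller are assumptions, and the branch names in the usage comment are hypothetical.

```python
import numpy as np

def dtw_distance(traj, centerline):
    """traj: (N, 3) array of tip positions; centerline: (M, 3) array of 3-D points."""
    n, m = len(traj), len(centerline)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(traj[i - 1] - centerline[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Usage idea: pick the branch whose centerline best matches the tip motion so far.
# branches = {"branch_A": centerline_a, "branch_B": centerline_b}  # hypothetical
# best = min(branches, key=lambda k: dtw_distance(tip_trajectory, branches[k]))
```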
ISBN (print): 9798350384581; 9798350384574
Object tracking is central to robot perception and scene understanding, allowing robots to parse a video stream in terms of moving objects with names. Tracking-by-detection has long been a dominant paradigm for object tracking of specific object categories [1, 2]. Recently, large-scale pre-trained models have shown promising advances in detecting and segmenting objects and parts in 2D static images in the wild. This raises the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking? In this paper, we combine an open-vocabulary detector [3], segmenter [4], and dense optical flow estimator [5], into a model that tracks and segments any object in 2D videos. Given a monocular video input, our method predicts object and part mask tracks with associated language descriptions, rebuilding the pipeline of Tractor [6] with modern large pre-trained models for static image detection and segmentation: we detect open-vocabulary object instances and propagate their boxes from frame to frame using a flow-based motion model, refine the propagated boxes with the box regression module of the visual detector, and prompt an open-world segmenter with the refined box to segment the objects. We decide the termination of an object track based on the objectness score of the propagated boxes as well as forward-backward optical flow consistency. We re-identify objects across occlusions using deep feature matching. We show that our model achieves strong performance on multiple established benchmarks [7, 8, 9, 10], and can produce reasonable tracks in manipulation data [11]. In particular, our model outperforms previous state-of-the-art in UVO and BURST, benchmarks for open-world object tracking and segmentation, despite never being explicitly trained for tracking. We hope that our approach can serve as a simple and extensible framework for future research and enable imitation learning from videos with unconventional objects.
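A simplified sketch of the flow-based box propagation and track-termination check described above; the detector, segmenter, and flow estimator are treated as black-box callables elsewhere, and the thresholds plus the crude forward-backward approximation here are illustrative assumptions.

```python
import numpy as np

def propagate_box(box, flow_fwd):
    """Shift a box (x1, y1, x2, y2) by the median optical flow inside it."""
    x1, y1, x2, y2 = [int(v) for v in box]
    patch = flow_fwd[y1:y2, x1:x2]              # (h, w, 2) flow vectors
    dx, dy = np.median(patch.reshape(-1, 2), axis=0)
    return np.array([x1 + dx, y1 + dy, x2 + dx, y2 + dy])

def keep_track(box, flow_fwd, flow_bwd, objectness, fb_thresh=1.5, obj_thresh=0.5):
    """Terminate the track if flow is inconsistent or the objectness score is too low."""
    x1, y1, x2, y2 = [int(v) for v in box]
    fwd = flow_fwd[y1:y2, x1:x2].reshape(-1, 2)
    bwd = flow_bwd[y1:y2, x1:x2].reshape(-1, 2)
    # Crude forward-backward check: forward and backward flow should roughly cancel.
    fb_error = np.linalg.norm(np.median(fwd + bwd, axis=0))
    return fb_error < fb_thresh and objectness > obj_thresh
```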
ISBN (print): 9798350358513; 9798350358520
In dynamic operational environments, particularly in collaborative robotics, the inevitability of failures necessitates robust and adaptable recovery strategies. Traditional automated recovery strategies, while effective for predefined scenarios, often lack the flexibility required for on-the-fly task management and adaptation to unexpected failures. Addressing this gap, we propose a novel approach that models recovery behaviors as adaptable robotic skills, leveraging the Behavior Trees and Motion Generators (BTMG) framework for policy representation. This approach distinguishes itself by employing reinforcement learning (RL) to dynamically refine recovery behavior parameters, enabling a tailored response to a wide array of failure scenarios with minimal human intervention. We assess our methodology through a series of progressively challenging scenarios within a peg-in-a-hole task, demonstrating the approach's effectiveness in enhancing operational efficiency and task success rates in collaborative robotics settings. We validate our approach using a dual-arm KUKA robot.
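A conceptual sketch of refining recovery-skill parameters from episodic rewards (assumed interfaces, not the BTMG implementation): here a cross-entropy-style search stands in for the RL optimizer, and `execute_recovery` is a hypothetical hook that would run the parameterized recovery behavior and return a task-success reward.

```python
import numpy as np

def execute_recovery(params: np.ndarray) -> float:
    raise NotImplementedError("run the recovery skill with these parameters; return reward")

def refine_parameters(init_params, iters=50, pop=16, sigma=0.1):
    """Episodic search over recovery-behavior parameters (stand-in for the RL step)."""
    mean = np.array(init_params, dtype=float)
    for _ in range(iters):
        samples = mean + sigma * np.random.randn(pop, mean.size)
        rewards = np.array([execute_recovery(s) for s in samples])
        elite = samples[np.argsort(rewards)[-pop // 4:]]  # keep the best quarter
        mean = elite.mean(axis=0)                          # move toward the elites
    return mean
```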