Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods i...
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pretraining to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results.
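The abstract says the student and teacher are used together to produce pseudolabels but does not give the rule. The following is a minimal sketch of one common way to combine two classifiers for pseudolabeling: average their class probabilities and keep only confident predictions. It is not UNITE's actual procedure; the function name `fuse_pseudolabel`, the averaging rule, and the `threshold` value are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pseudolabel(student_logits, teacher_logits, threshold=0.8):
    """Average the two models' class probabilities (assumed fusion rule).

    Returns (label, confidence) when the fused confidence reaches the
    threshold, otherwise None so the sample is skipped in self-training.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    fused = [(s + t) / 2 for s, t in zip(p_student, p_teacher)]
    confidence = max(fused)
    label = fused.index(confidence)
    return (label, confidence) if confidence >= threshold else None
```

When the two models disagree strongly, the averaged distribution is split between classes and falls below the threshold, so under this assumed rule the sample receives no pseudolabel.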
ISBN (digital): 9798350377705
ISBN (print): 9798350377712
Near Infrared (NIR) spectroscopy is widely used in industrial quality control and automation to test the purity and grade of items. In this research, we propose a novel sensorized end effector and acquisition strategy to capture spectral signatures from objects and register them with a 3D point cloud. Our methodology first takes a 3D scan of an object generated by a time-of-flight depth camera and decomposes the object into a series of planned viewpoints covering the surface. We generate motion plans for a robot manipulator and end-effector to visit these viewpoints while maintaining a fixed distance and surface normal. This process is enabled by the spherical motion of the end-effector and ensures maximal spectral signal quality. By continuously acquiring surface reflectance values as the end-effector scans the target object, the autonomous system develops a four-dimensional model of the target object: position in an ℝ³ coordinate frame, and a reflectance vector denoting the associated spectral signature. We demonstrate this system in building spectral-spatial object profiles of increasingly complex geometries. We show the proposed system and spectral acquisition planning produce more consistent spectral signals than naïve point scanning strategies. Our work represents a significant step towards high-resolution spectral-spatial sensor fusion for automated quality assessment.
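The constraint of visiting each viewpoint at a fixed distance along the surface normal can be illustrated with a little vector arithmetic. This is a minimal sketch, not the authors' planner: the function name `viewpoint`, the `standoff` parameter, and the choice to point the sensor straight back along the normal are assumptions for illustration.

```python
import math


def viewpoint(point, normal, standoff):
    """Return (sensor_position, view_direction) for one surface sample.

    point    -- [x, y, z] surface point from the scanned point cloud
    normal   -- outward surface normal at that point (any non-zero length)
    standoff -- fixed sensor-to-surface distance (hypothetical parameter)
    """
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0:
        raise ValueError("normal must be non-zero")
    unit = [c / length for c in normal]
    # Offset the sensor from the surface along the unit normal.
    position = [p + standoff * c for p, c in zip(point, unit)]
    # Look back at the surface, opposite to the normal.
    direction = [-c for c in unit]
    return position, direction
```

For a surface point at the origin with normal `[0, 0, 2]` and a standoff of `0.1`, the sensor sits at `[0, 0, 0.1]` and looks along the negative z-axis.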
The Segment Anything Model (SAM) is a powerful vision foundation model that is revolutionizing the traditional paradigm of segmentation. Despite this, a reliance on prompting each frame and large computational cost li...
Gaze estimation is pivotal in human scene comprehension tasks, particularly in medical diagnostic analysis. Eye-tracking technology facilitates the recording of physicians’ ocular movements during image interpretatio...
The intricate and multi-stage task in dynamic public spaces like luggage trolley collection in airports presents both a promising opportunity and an ongoing challenge for automated service robots. Previous research ha...
Loop closure detection is a key technology for long-term robot navigation in complex environments. In this paper, we present a global descriptor, named Normal Distribution Descriptor (NDD), for 3D point cloud loop clo...
Multi-modal large language models (MLLMs) have demonstrated impressive performance in vision-language tasks across a wide range of ***, the large model scale and associated high computational cost pose significant challenges for training and deploying MLLMs on consumer-grade GPUs or edge devices, thereby hindering their widespread *** this work, we introduce Mini-InternVL, a series of MLLMs with parameters ranging from 1 billion to 4 billion, which achieves 90% of the performance with only 5% of the *** significant improvement in efficiency and effectiveness makes our models more accessible and applicable in various real-world *** further promote the adoption of our models, we are developing a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks, including autonomous driving, medical image processing, and remote *** believe that our models can provide valuable insights and resources to advance the development of efficient and effective MLLMs.
This paper describes the development and implementation of IoBT-MAX, a multimodal analytics experimentation testbed designed to support research and evaluation of Internet of Battlefield Things (IoBT) technologies. The testbed consists of a distributed set of edge nodes with multimodal sensing and compute capabilities, coupled with a high-precision GPS localization system and a remote monitoring and control platform. The testbed is designed to support research on multiple analytic tasks, including object classification, object detection, multi-object tracking, data compression, and communication-efficient inference and scheduling. The testbed has been deployed at the Robotics Research Collaboration Campus (R2C2), a DEVCOM Army Research Laboratory (ARL) facility, and is a key research instrumentation project of ARL’s Internet of Battlefield Things Collaborative Research Alliance.
Over a long period of evolution, human bones have developed a configuration that combines high strength with low weight. The femur, the longest and strongest bone in the human body, is a good example: it consists of two characte...