Search Results - Inner Mongolia University Library

Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Arun Reddy; William Paul; Corban Rivera; Ketul Shah; Celso M. de Melo; Rama Chellappa. Affiliations: Johns Hopkins University Applied Physics Laboratory; Department of Electrical & Computer Engineering, Johns Hopkins University; DEVCOM U.S. Army Research Laboratory

ISBN: (digital) 9798350353006

ISBN: (print) 9798350353013

In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pretraining to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results.
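The abstract says the video student and image teacher are used together to generate pseudolabels, but it does not say how their predictions are combined. The sketch below is one plausible illustration only, not the UNITE procedure: it assumes each model has already produced per-class logits for a target video, averages their softmax probabilities, and keeps the label only when the fused confidence clears a threshold. The function names, the `weight` parameter, and the `threshold` value are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def collaborative_pseudolabel(student_logits, teacher_logits,
                              weight=0.5, threshold=0.6):
    """Fuse student and teacher predictions into one pseudolabel.

    Assumption (not from the source): the fusion is a weighted average of
    class probabilities, and low-confidence videos are left unlabeled.
    Returns (label or None, fused confidence).
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("both models must score the same set of classes")
    student_probs = softmax(student_logits)
    teacher_probs = softmax(teacher_logits)
    fused = [weight * s + (1.0 - weight) * t
             for s, t in zip(student_probs, teacher_probs)]
    confidence = max(fused)
    label = fused.index(confidence)
    if confidence < threshold:
        return None, confidence  # skip this video during self-training
    return label, confidence
```

When the two models agree, the fused confidence is high and the video receives a label; when they disagree, the averaged distribution flattens and the video is skipped.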

Keywords: Representation learning; Adaptation models; Computer vision; Computational modeling; Collaboration; Benchmark testing; Data models

PROSPECT: Precision Robot Spectroscopy Exploration and Characterization Tool

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Authors: Nathaniel Hanson; Gary Lvov; Vedant Rautela; Samuel Hibbard; Ethan Holand; Charles DiMarzio; Taşkın Padır. Affiliations: Institute for Experiential Robotics; Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts, USA; Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; Electrical and Computer Engineering Department, Northeastern University, Boston, Massachusetts, USA

ISBN: (digital) 9798350377705

ISBN: (print) 9798350377712

Near Infrared (NIR) spectroscopy is widely used in industrial quality control and automation to test the purity and grade of items. In this research, we propose a novel sensorized end effector and acquisition strategy to capture spectral signatures from objects and register them with a 3D point cloud. Our methodology first takes a 3D scan of an object generated by a time-of-flight depth camera and decomposes the object into a series of planned viewpoints covering the surface. We generate motion plans for a robot manipulator and end-effector to visit these viewpoints while maintaining a fixed distance and surface normal. This process is enabled by the spherical motion of the end-effector and ensures maximal spectral signal quality. By continuously acquiring surface reflectance values as the end-effector scans the target object, the autonomous system develops a four-dimensional model of the target object: position in an R^3 coordinate frame, and a reflectance vector denoting the associated spectral signature. We demonstrate this system in building spectral-spatial object profiles of increasingly complex geometries. We show the proposed system and spectral acquisition planning produce more consistent spectral signals than naïve point scanning strategies. Our work represents a significant step towards high-resolution spectral-spatial sensor fusion for automated quality assessment.
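The abstract describes viewpoints that hold a fixed distance from the surface and stay aligned with the surface normal. A minimal geometric sketch of that constraint follows. It assumes each surface sample comes with an estimated outward normal, and the names `viewpoint` and `standoff` are illustrative rather than taken from the paper. Motion planning and the spherical end-effector mechanism are not modeled.

```python
import math


def normalize(vector):
    """Return the unit vector pointing in the same direction."""
    length = math.sqrt(sum(c * c for c in vector))
    if length == 0.0:
        raise ValueError("surface normal must be non-zero")
    return tuple(c / length for c in vector)


def viewpoint(surface_point, surface_normal, standoff):
    """Place the sensor `standoff` units above a surface point.

    The sensor sits along the outward normal and looks back along it,
    so the viewing axis stays perpendicular to the local surface.
    Returns (sensor_position, viewing_direction).
    """
    n = normalize(surface_normal)
    position = tuple(p + standoff * c for p, c in zip(surface_point, n))
    view_direction = tuple(-c for c in n)
    return position, view_direction
```

For example, `viewpoint((0.0, 0.0, 0.0), (0.0, 0.0, 2.0), 0.1)` places the sensor 0.1 units above the origin along +z, looking down the z axis.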

Keywords: Reflectivity; Point cloud compression; Geometry; Spectroscopy; Three-dimensional displays; Robot kinematics; Robot sensing systems; End effectors; Planning; Surface treatment

Augmenting Efficient Real-time Surgical Instrument Segmentation in Video with Point Tracking and Segment Anything

arXiv, 2024

Authors: Wu, Zijian; Schmidt, Adam; Kazanzides, Peter; Salcudean, Septimiu E. Affiliations: Robotics and Control Laboratory, Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, United States

The Segment Anything Model (SAM) is a powerful vision foundation model that is revolutionizing the traditional paradigm of segmentation. Despite this, a reliance on prompting each frame and a large computational cost limit its usage in robotically assisted surgery. Applications such as augmented reality guidance require little user intervention along with efficient inference to be usable clinically. In this study, we address these limitations by adopting lightweight SAM variants to meet the efficiency requirement and employing fine-tuning techniques to enhance their generalization in surgical scenes. Recent advancements in Tracking Any Point (TAP) have shown promising results in both accuracy and efficiency, particularly when points are occluded or leave the field of view. Inspired by this progress, we present a novel framework that combines an online point tracker with a lightweight SAM model that is fine-tuned for surgical instrument segmentation. Sparse points within the region of interest are tracked and used to prompt SAM throughout the video sequence, providing temporal consistency. The quantitative results surpass the state-of-the-art semi-supervised video object segmentation method XMem on the EndoVis 2015 dataset with 84.8 IoU and 91.0 Dice. Our method achieves promising performance that is comparable to XMem and transformer-based fully supervised segmentation methods on the ex vivo UCL dVRK and in vivo CholecSeg8k datasets. In addition, the proposed method shows promising zero-shot generalization ability on the label-free STIR dataset. In terms of efficiency, we tested our method on a single GeForce RTX 4060 GPU and a single GeForce RTX 4090 GPU, achieving inference speeds of over 25 and 90 FPS, respectively. Code is available at: https://***/wuzijian1997/SIS-PT-SAM. © 2024, CC BY.
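The abstract reports segmentation quality as IoU and Dice. For readers unfamiliar with the two metrics, the sketch below computes both for a single pair of binary masks, given here as flattened lists of 0/1 values. This is the standard definition of each metric, not the paper's evaluation code. The handling of two empty masks (both scores set to 1.0) is a convention chosen for this example.

```python
def iou_and_dice(predicted_mask, target_mask):
    """Compute (IoU, Dice) for two flattened binary masks of equal length.

    IoU  = |P ∩ T| / |P ∪ T|
    Dice = 2 |P ∩ T| / (|P| + |T|)
    """
    if len(predicted_mask) != len(target_mask):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(predicted_mask, target_mask) if p and t)
    predicted_area = sum(1 for p in predicted_mask if p)
    target_area = sum(1 for t in target_mask if t)
    union = predicted_area + target_area - intersection
    if union == 0:
        # Both masks are empty; treat as a perfect match (a convention).
        return 1.0, 1.0
    iou = intersection / union
    dice = 2.0 * intersection / (predicted_area + target_area)
    return iou, dice
```

For a single mask pair the two scores are linked by Dice = 2·IoU / (1 + IoU). That identity need not hold for scores averaged over many frames, which is why benchmark tables usually report both.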

Keywords: Surgical equipment

GEM: Context-Aware Gaze EstiMation with Visual Search Behavior Matching for Chest Radiograph

arXiv, 2024

Authors: Liu, Shaonan; Chen, Wenting; Liu, Jie; Luo, Xiaoling; Shen, Linlin. Affiliations: Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, China; Department of Electrical Engineering, City University of Hong Kong, Hong Kong; AI Research Center for Medical Image Analysis and Diagnosis, Shenzhen University, China; Guangdong Provincial Key Laboratory of Intelligent Information Processing, China

Gaze estimation is pivotal in human scene comprehension tasks, particularly in medical diagnostic analysis. Eye-tracking technology facilitates the recording of physicians’ ocular movements during image interpretation, thereby elucidating their visual attention patterns and information-processing strategies. In this paper, we initially define the context-aware gaze estimation problem in medical radiology report settings. To understand the attention allocation and cognitive behavior of radiologists during the medical image interpretation process, we propose a context-aware Gaze EstiMation (GEM) network that utilizes eye gaze data collected from radiologists to simulate their visual search behavior patterns throughout the image interpretation process. It consists of a context-awareness module, visual behavior graph construction, and visual behavior matching. Within the context-awareness module, we achieve intricate multimodal registration by establishing connections between medical reports and images. Subsequently, for a more accurate simulation of genuine visual search behavior patterns, we introduce a visual behavior graph structure, capturing such behavior through high-order relationships (edges) between gaze points (nodes). To maintain the authenticity of visual behavior, we devise a visual behavior-matching approach, adjusting the high-order relationships between them by matching the graph constructed from real and estimated gaze points. Extensive experiments on four publicly available datasets demonstrate the superiority of GEM over existing methods and its strong generalizability, which also provides a new direction for the effective utilization of diverse modalities in medical image interpretation and enhances the interpretability of models in the field of medical imaging. https://***/Tiger-SN/GEM. Copyright © 2024, The Authors. All rights reserved.

Keywords: Eye movements

Autonomous Multiple-Trolley Collection System with Nonholonomic Robots: Design, Control, and Implementation

arXiv, 2024

Authors: Xie, Peijia; Xia, Bingyi; Hu, Anjun; Zhao, Ziqi; Meng, Lingxiao; Sun, Zhirui; Gao, Xuheng; Wang, Jiankun; Meng, Max Q.-H. Affiliations: Shenzhen Key Laboratory of Robotics Perception and Intelligence, Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China; The Jiaxing Research Institute, Southern University of Science and Technology, Jiaxing, China; The Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong; The Department of Electrical and Computer Engineering, The University of Alberta, Canada

Intricate, multi-stage tasks in dynamic public spaces, such as luggage trolley collection in airports, present both a promising opportunity and an ongoing challenge for automated service robots. Previous research has primarily focused on handling a single trolley or individual functional components, creating a gap in providing cost-effective and efficient solutions for practical scenarios. In this paper, we propose a mobile manipulation robot integrated with an autonomy framework for the collection and transportation of multiple trolleys, which can significantly enhance operational efficiency. We address the key challenges in the trolley collection problem through the novel design of the mechanical system and the vision-based control strategy. We design a lightweight manipulator and docking mechanism, optimized for the sequential stacking and transportation of multiple trolleys. Additionally, based on the Control Lyapunov Function and Control Barrier Function, we propose a novel vision-based controller with online quadratic programming, which significantly improves the accuracy and efficiency of the collection process. The practical application of our system is demonstrated in real-world scenarios, where it successfully executes multiple-trolley collection tasks. Copyright © 2024, The Authors. All rights reserved.

Keywords: Quadratic programming

NDD: A 3D Point Cloud Descriptor Based on Normal Distribution for Loop Closure Detection

arXiv, 2022

Authors: Zhou, Ruihao; He, Li; Zhang, Hong; Lin, Xubin; Guan, Yisheng. Affiliations: The Department of Electromechanical Engineering, Guangdong University of Technology, China; The Department of Electronic and Electrical Engineering, Southern University of Science and Technology, China; Shenzhen Key Laboratory of Robotics and Computer Vision, China

Loop closure detection is a key technology for long-term robot navigation in complex environments. In this paper, we present a global descriptor, named Normal Distribution Descriptor (NDD), for 3D point cloud loop closure detection. The descriptor encodes both the probability density score and the entropy of a point cloud. We also propose a fast rotation alignment process and use the correlation coefficient as the similarity measure between descriptors. Experimental results show that our approach outperforms the state-of-the-art point cloud descriptors in both accuracy and efficiency. The source code is available and can be integrated into existing LiDAR odometry and mapping (LOAM) systems. © 2022, CC BY.
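The abstract names the correlation coefficient as the similarity between two descriptors. The sketch below shows the Pearson correlation coefficient for descriptors flattened to equal-length lists of floats. It is a generic illustration, not the NDD source code: the descriptor layout and the rotation alignment step are not specified in the abstract and are not modeled. Returning 0.0 for a constant descriptor is a convention chosen for this example.

```python
import math


def correlation_similarity(descriptor_a, descriptor_b):
    """Pearson correlation coefficient between two flattened descriptors.

    Returns a value in [-1, 1]; values near 1 suggest the two scans
    may come from the same place (a loop closure candidate).
    """
    if len(descriptor_a) != len(descriptor_b) or not descriptor_a:
        raise ValueError("descriptors must be non-empty and equal in length")
    count = len(descriptor_a)
    mean_a = sum(descriptor_a) / count
    mean_b = sum(descriptor_b) / count
    covariance = sum((a - mean_a) * (b - mean_b)
                     for a, b in zip(descriptor_a, descriptor_b))
    variance_a = sum((a - mean_a) ** 2 for a in descriptor_a)
    variance_b = sum((b - mean_b) ** 2 for b in descriptor_b)
    if variance_a == 0.0 or variance_b == 0.0:
        return 0.0  # a constant descriptor carries no correlation signal
    return covariance / math.sqrt(variance_a * variance_b)
```

Because the coefficient subtracts the mean and divides by the spread, it is unchanged when a descriptor is offset by a constant or rescaled by a positive factor.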

Keywords: Normal distribution

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance

Visual Intelligence, 2024, Vol. 2, No. 1, pp. 392-408

Authors: Zhangwei Gao; Zhe Chen; Erfei Cui; Yiming Ren; Weiyun Wang; Jinguo Zhu; Hao Tian; Shenglong Ye; Junjun He; Xizhou Zhu; Lewei Lu; Tong Lu; Yu Qiao; Jifeng Dai; Wenhai Wang. Affiliations: Shanghai AI Laboratory, Shanghai 200232, China; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; School of Computer Science, Nanjing University, Nanjing 210023, China; Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; School of Computer Science, Fudan University, Shanghai 200433, China; SenseTime Research, Shanghai 200233, China; Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China; Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China

Multi-modal large language models (MLLMs) have demonstrated impressive performance in vision-language tasks across a wide range of ***, the large model scale and associated high computational cost pose significant challenges for training and deploying MLLMs on consumer-grade GPUs or edge devices, thereby hindering their widespread *** this work, we introduce Mini-InternVL, a series of MLLMs with parameters ranging from 1 billion to 4 billion, which achieves 90% of the performance with only 5% of the *** significant improvement in efficiency and effectiveness makes our models more accessible and applicable in various real-world *** further promote the adoption of our models, we are developing a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks, including autonomous driving, medical image processing, and remote *** believe that our models can provide valuable insights and resources to advance the development of efficient and effective MLLMs.

Keywords: Lightweight multi-modal large language model; Vision-language model; Knowledge distillation; Visual instruction tuning

IoBT-MAX: a Multimodal Analytics eXperimentation Testbed for IoBT research

MILCOM, Military Communications Conference

Authors: Benjamin M. Marlin; Niranjan Suri; Shiwei Fang; Mani B. Srivastava; Colin Samplawski; Ziqi Wang; Maggie Wigness. Affiliations: College of Information and Computer Sciences, University of Massachusetts Amherst; DEVCOM Army Research Laboratory; Department of Computer & Cyber Sciences, Augusta University; Department of Electrical and Computer Engineering, University of California, Los Angeles

This paper describes the development and implementation of IoBT-MAX, a multimodal analytics experimentation testbed designed to support research and evaluation of Internet of Battlefield Things (IoBT) technologies. The testbed consists of a distributed set of edge nodes with multimodal sensing and compute capabilities coupled with a high-precision GPS localization system, and a remote monitoring and control platform. The testbed is designed to support research on multiple analytic tasks including object classification, object detection, multi-object tracking, data compression, and communication-efficient inference and scheduling. The testbed has been deployed at the Robotics Research Collaboration Campus (R2C2), a DEVCOM Army Research Laboratory (ARL) facility, and is a key research instrumentation project of ARL’s Internet of Battlefield Things Collaborative Research Alliance.

Research on Biomimetic Design Methods for Humanoid Robot Thigh

2023 IEEE International Conference on Robotics and Biomimetics, ROBIO 2023

Authors: Nie, Daming; Xie, Anhuan; Kong, Lingyu; Zhang, Yu; Zheng, Gang; Fu, Yili; Gu, Jason. Affiliations: Zhejiang Laboratory, Intelligent Robot Research Center, Hangzhou 311100, China; Zhejiang University of Science and Technology, School of Mechanical and Energy Engineering, Hangzhou 311100, China; Dalhousie University, Department of Electrical and Computer Engineering, Halifax B3M 1A2, Canada

ISBN: (print) 9798350325706

Through long evolution, human bones have developed a configuration that is both high-strength and lightweight. The femur, the longest and strongest bone in the human body, consists of two characteristic layers, i.e., the substantia compacta and the substantia spongiosa. This article imitates the structural characteristics of the human femur: the thigh of a humanoid robot is designed in the form of a "variable thickness shell + variable density lattice". The shell thickness and the lattice density are each adjusted according to the initial stress distribution. Results show that the weight of the shell and lattice of the thigh structure can be reduced by 20% under a reasonable mapping relationship of "stress - shell thickness" and "stress - lattice rod diameter", while the structural stiffness meets the application requirements. Finally, the limiting factors of the "variable thickness shell + variable density lattice" structure design approach are analyzed, and potential measures for optimizing the design method of the humanoid robot thigh in the future are described. © 2023 IEEE.
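The abstract refers to mappings from stress to shell thickness and from stress to lattice rod diameter, but it does not give their form. As a purely illustrative sketch, the function below assumes a clamped linear mapping: higher local stress gives a thicker shell or rod, bounded by manufacturable limits. The function name, the linear form, and every number in the usage note are hypothetical and not taken from the paper.

```python
def map_stress_to_size(stress, stress_low, stress_high, size_min, size_max):
    """Map a local stress value to a shell thickness or rod diameter.

    Assumption (not from the source): the mapping is linear between
    `stress_low` and `stress_high` and clamped outside that range, so
    the result always lies in [size_min, size_max].
    """
    if stress_high <= stress_low:
        raise ValueError("stress_high must exceed stress_low")
    ratio = (stress - stress_low) / (stress_high - stress_low)
    ratio = min(1.0, max(0.0, ratio))  # clamp to the manufacturable range
    return size_min + ratio * (size_max - size_min)
```

With hypothetical limits of 1.0 mm to 3.0 mm over a stress range of 0 to 100 MPa, `map_stress_to_size(50.0, 0.0, 100.0, 1.0, 3.0)` gives the midpoint thickness. The same function could serve for rod diameters with different bounds.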
