We introduce Riemannian Flow Matching Policies (RFMP), a novel model for learning and synthesizing robot sensorimotor policies. RFMP leverages the efficient training and inference capabilities of flow matching methods...
Video moment retrieval and highlight detection are two highly valuable tasks in video understanding, but only recently have they been studied jointly. Although existing studies have made impressive advancement recent...
ISBN (digital): 9798331505929
ISBN (print): 9798331505936
This paper introduces an innovative multi-agent path finding (MAPF) system specifically designed for navigating multi-Ackerman robotic systems in intricate environments. The Mars Planner, the proposed solution, enhances path planning by tackling collision-free path challenges encountered by groups of intelligent agents. Our contributions include the development of two key algorithms: the Fast Batch Path Finding (FBPF) and the Batch Spatio-Temporal Path Refinement (BSTPR). FBPF utilizes a hybrid A* approach to generate preliminary coarse paths within free configuration spaces, while BSTPR refines these paths using topological homotopy strategies to optimize time allocation and effectively resolve internal conflicts. Through simulations and physical experiments, we demonstrate significant enhancements in computational efficiency and path quality compared to existing methods. In conclusion, the Mars Planner stands as an efficient solution capable of managing large-scale complexity in real-world applications. It offers a robust and scalable framework suitable for diverse environments and scenarios.
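To make the two-stage structure concrete, here is a minimal, illustrative Python sketch of the pipeline the abstract describes: each agent in a batch first receives a coarse collision-free path on an occupancy grid (a plain grid A* standing in for FBPF's hybrid A*), after which a simple temporal pass delays agents at their start cell to remove vertex conflicts (a crude stand-in for BSTPR's spatio-temporal refinement). All function names, the grid model, and the conflict rule are hypothetical simplifications; the actual Mars Planner reasons over Ackermann kinematics and topological homotopy classes, which this sketch omits.

```python
import heapq

def astar(grid, start, goal):
    """Plain 4-connected A* on an occupancy grid (0 = free, 1 = blocked).
    Stand-in for FBPF's hybrid A*, which additionally models Ackermann kinematics."""
    rows, cols = len(grid), len(grid[0])
    open_set = [(abs(start[0] - goal[0]) + abs(start[1] - goal[1]), 0, start, [start])]
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0 and (r, c) not in seen:
                h = abs(r - goal[0]) + abs(c - goal[1])
                heapq.heappush(open_set, (g + 1 + h, g + 1, (r, c), path + [(r, c)]))
    return None

def resolve_conflicts(paths, max_delay=20):
    """Greedy temporal refinement: delay later agents at their start cell until no two
    agents occupy the same cell at the same timestep (vertex conflicts only).
    BSTPR instead optimizes time allocation over homotopy classes; this is a toy."""
    scheduled = []
    for path in paths:
        for delay in range(max_delay + 1):
            timed = [path[0]] * delay + list(path)
            clash = any(
                t < len(other) and timed[t] == other[t]
                for other in scheduled
                for t in range(len(timed))
            )
            if not clash:
                break
        scheduled.append(timed)  # keeps the last attempt if never conflict-free
    return scheduled

# Toy usage: two agents on a small free grid whose coarse paths may collide.
grid = [[0] * 5 for _ in range(5)]
coarse = [astar(grid, (0, 0), (0, 4)), astar(grid, (2, 2), (0, 3))]
for timed_path in resolve_conflicts(coarse):
    print(timed_path)
```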
Surveillance cameras play a pivotal role across various domains, encompassing public safety, crime deterrence, and facility maintenance. Nevertheless, these systems entail certain limitations, including high costs, se...
Deep learning-based models are at the top of most driver observation benchmarks due to their remarkable accuracies, but come with a high computational cost, while resources are often limited in real-world driving scenarios. This paper presents a lightweight framework for resource-efficient driver activity recognition. We enhance 3D MobileNet, a speed-optimized neural architecture for video classification, with two paradigms for improving the trade-off between model accuracy and computational efficiency: knowledge distillation and model quantization. Knowledge distillation prevents large drops in accuracy when reducing the model size by harvesting knowledge from a large teacher model (I3D) via soft labels instead of using the original ground truth. Quantization further drastically reduces the memory and computation requirements by representing the model weights and activations with lower-precision integers. Extensive experiments on a public dataset for in-vehicle monitoring during autonomous driving show that our proposed framework leads to a 3-fold reduction in model size and a 1.4-fold improvement in inference time compared to an already speed-optimized architecture. Our code is available at https://***/calvintanama/qd-driver-activity-reco.
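As an illustration of the two paradigms, the following PyTorch-style sketch shows a standard soft-label distillation loss (teacher logits softened with a temperature and blended with the hard-label cross-entropy) and post-training dynamic quantization of the resulting student. The `teacher` and `student` modules here are tiny placeholders; the paper's actual teacher is I3D and the student a 3D MobileNet, and the exact loss weighting and quantization scheme used there may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-label knowledge distillation: KL divergence between temperature-softened
    teacher and student distributions, blended with the hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Placeholder models standing in for I3D (teacher) and 3D MobileNet (student).
teacher = nn.Sequential(nn.Flatten(), nn.Linear(16, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(16, 10))

clips = torch.randn(8, 16)           # stand-in for a batch of video clips
labels = torch.randint(0, 10, (8,))

with torch.no_grad():
    t_logits = teacher(clips)        # teacher only provides soft targets
loss = distillation_loss(student(clips), t_logits, labels)
loss.backward()

# Post-training dynamic quantization: weights stored as int8, shrinking the model.
quantized_student = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)
```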
High-precision simultaneous localization and mapping (SLAM) is one of the core technologies of unmanned driving. LiDAR-based SLAM algorithms are often complex and computationally intensive, and are usually deployed on high-performance CPU or GPU computing architectures with high power consumption and a low energy-efficiency ratio, which is not conducive to vehicle-level applications. In this paper, we design and implement a low-power CPU and FPGA hybrid computing architecture for accelerating the key algorithms of a LiDAR-based localization scheme. More specifically, we propose a software and hardware co-design strategy: (1) we first propose the chain representation, a new type of map representation that segments the point cloud data at depth-discontinuity regions. Our method not only reduces the noise introduced by the down-sampling operation in point cloud representations, but also has the same computational and storage overhead as a conventional point cloud representation. (2) We further exploit the inherent parallelism in the algorithms to design a pipelined hardware architecture, which effectively improves the speed of the algorithm on the embedded platform. Deployed on the Xilinx ZCU102 platform, our system achieves 24.4x and 3.2x speedups compared to the ARM Cortex-A53 processor and the Intel i7-10700 processor, respectively, at 4.204 W power consumption, without severely degrading the final output quality.
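A minimal NumPy sketch of the chain idea: a scan line of ranges is split wherever the depth jump between consecutive points exceeds a threshold, and each resulting segment ("chain") is then down-sampled independently so that the retained samples never straddle a discontinuity. The threshold value and function names are illustrative only; the paper's actual representation and its FPGA pipeline are considerably more involved.

```python
import numpy as np

def split_into_chains(ranges, jump_threshold=0.5):
    """Split one LiDAR scan line into chains at depth discontinuities.

    ranges: 1-D array of range measurements along the scan line (metres).
    Returns a list of index arrays, one per chain.
    """
    jumps = np.abs(np.diff(ranges)) > jump_threshold   # discontinuity flags
    cut_points = np.flatnonzero(jumps) + 1             # first index of each new chain
    return np.split(np.arange(len(ranges)), cut_points)

def downsample_chains(ranges, chains, step=4):
    """Keep every `step`-th point within each chain, never crossing a chain boundary."""
    keep = np.concatenate([chain[::step] for chain in chains])
    return ranges[keep], keep

# Toy scan: a near wall, a depth jump, then a far wall.
scan = np.concatenate([np.full(20, 2.0), np.full(20, 8.0)]) + np.random.normal(0, 0.02, 40)
chains = split_into_chains(scan)
sampled, kept_idx = downsample_chains(scan, chains)
print(len(chains), "chains,", len(kept_idx), "points kept")
```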
In the era of big data, data trading significantly enhances data-driven technologies by facilitating data sharing. Despite the clear advantages often experienced by data users when incorporating multiple sources, the ...
Contrastive Learning (CL) has emerged as one of the most successful paradigms for unsupervised visual representation learning, yet it often depends on intensive manual data augmentations. With the rise of generative m...
Auto data augmentation has emerged as a promising alternative to the laborious manual parameter tuning involved in data augmentation policies. However, the existing approaches have limitations in terms of their applic...
Current Scene Change Detection (SCD) methods are widely used in various subject areas, with detection granularity mostly limited to the pixel level. However, for certain practical applications such as garbage detection and traffic monitoring, the overall changes of object-level instances are of greater concern, so fine-grained results may be unnecessary and only incur excessive computational redundancy and insufficient real-time performance. To address this issue, we propose a one-stage object-level change detection framework named Siamese Center-Based Detector with Transformer and Feature Fusion (SCTF-Det), aiming to use fewer computing resources while still obtaining object-level change information, such as the appearance or disappearance of objects. We adopt a Siamese Vision Transformer to efficiently capture global semantic features, and design differential feature fusion and multi-scale fusion to better fuse the features coming from image pairs. Instead of using a segmentation head like most SCD methods, we use a detection head to capture changed objects or regions. Moreover, we introduce a gating mechanism between image pairs and automatically mark the bounding box on the corresponding “Appear” change region. The experiments are conducted on the VL-CMU-CD and CDNet2014 datasets, with F1 scores of 78.6% and 83.6%, respectively. Our SCTF-Det substantially improves inference speed by 3–5 times compared to existing methods.
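To illustrate the fusion step the abstract describes, the sketch below computes Siamese features for an image pair with a shared backbone, forms a differential feature (absolute difference) fused with the concatenated pair features, and feeds the result to a lightweight center-style detection head. The backbone here is a tiny CNN rather than the Vision Transformer actually used by SCTF-Det, and the multi-scale fusion and gating mechanism are omitted; all layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class SiameseDiffFusionDetector(nn.Module):
    """Toy object-level change detector: shared backbone, differential + concat fusion,
    and a center/size detection head (a simplified stand-in for SCTF-Det)."""

    def __init__(self, channels=32, num_classes=1):
        super().__init__()
        # Shared (Siamese) backbone applied to both images of the pair.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Fuse |f0 - f1| with the concatenated pair features.
        self.fuse = nn.Conv2d(channels * 3, channels, 1)
        # Center-based head: per-location change heatmap plus box width/height.
        self.heatmap = nn.Conv2d(channels, num_classes, 1)
        self.size = nn.Conv2d(channels, 2, 1)

    def forward(self, img_t0, img_t1):
        f0, f1 = self.backbone(img_t0), self.backbone(img_t1)
        fused = torch.relu(self.fuse(torch.cat([torch.abs(f0 - f1), f0, f1], dim=1)))
        return torch.sigmoid(self.heatmap(fused)), self.size(fused)

# Usage on a dummy "before / after" image pair.
model = SiameseDiffFusionDetector()
before, after = torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128)
heat, wh = model(before, after)
print(heat.shape, wh.shape)   # (1, 1, 32, 32), (1, 2, 32, 32)
```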