ISBN: 9798350384574 (digital)
ISBN: 9798350384581 (print)
Understanding actions of other agents increases the efficiency of autonomous mobile robots (AMRs) since they encompass intention and indicate future movements. We propose a new method that allows us to infer vehicle actions using a shallow image-based classification model. The actions are classified via bird’s-eye view scene crops, where we project the detections of a 3D object detection model onto a context map. We learn map context information and aggregate temporal sequence information without requiring object tracking. This results in a highly efficient classification model that can easily be deployed on embedded AMR hardware. To evaluate our approach, we create new large-scale synthetic datasets showing warehouse traffic based on real vehicle models and geometry.
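The abstract does not give implementation details, so the following is only a minimal sketch of what projecting a detection onto a context map and cutting a bird's-eye view crop could look like. The names `world_to_pixel` and `crop_around`, the map origin, the resolution, and the zero padding are all assumptions made for illustration, not the authors' method.

```python
import math

def world_to_pixel(x_m, y_m, origin_m, resolution_m):
    """Map a ground-plane position in metres to (row, col) in the context map.

    origin_m is the world position of pixel (0, 0); resolution_m is metres per pixel.
    Both are hypothetical parameters chosen for this sketch.
    """
    col = math.floor((x_m - origin_m[0]) / resolution_m)
    row = math.floor((y_m - origin_m[1]) / resolution_m)
    return row, col

def crop_around(context_map, center, half_size, pad_value=0):
    """Cut a square crop of side 2 * half_size + 1 centred on a detection.

    Pixels that fall outside the map are filled with pad_value.
    """
    row_c, col_c = center
    height = len(context_map)
    width = len(context_map[0])
    crop = []
    for r in range(row_c - half_size, row_c + half_size + 1):
        crop_row = []
        for c in range(col_c - half_size, col_c + half_size + 1):
            inside = 0 <= r < height and 0 <= c < width
            crop_row.append(context_map[r][c] if inside else pad_value)
        crop.append(crop_row)
    return crop

# Toy 4x4 context map whose cell values encode their own position.
toy_map = [[r * 4 + c for c in range(4)] for r in range(4)]
center = world_to_pixel(1.2, 0.6, origin_m=(0.0, 0.0), resolution_m=0.5)
scene_crop = crop_around(toy_map, center, half_size=1)
```

A classifier would then consume one such crop per detection; how crops from several time steps are aggregated is not specified in the abstract and is left out here.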
While the majority of recent Multi-View Stereo networks estimate a depth map per reference image, their performance is then only evaluated on the fused 3D model obtained from all images. This approach makes sense, since the point cloud is ultimately the result we are most interested in. On the flip side, it often leads to a burdensome manual search for the right fusion parameters in order to score well on the public benchmarks. In this work, we tackle this problem with HAMMER, a Hierarchical And Memory-efficient MVSNet with Entropy-filtered Reconstructions. We propose to learn a filtering mask based on entropy, which, in combination with a simple two-view geometric verification, is sufficient to generate high-quality 3D models of any input scene. Unlike existing works, a tedious manual parameter search for the fusion step is not required. Furthermore, we take several precautions to keep the memory requirements of our method very low in both the training and the inference phase. Our method requires only 6 GB of GPU memory during training, while 3.6 GB are enough to process 1920×1024 images during inference. Experiments show that HAMMER ranks amongst the top published methods on the DTU and Tanks and Temples benchmarks in the official metrics, especially when keeping the fusion parameters fixed.
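To make the idea of entropy-based filtering concrete, here is a minimal sketch that computes the per-pixel Shannon entropy of a distribution over depth hypotheses and keeps only low-entropy (confident) pixels. Note the swap: the abstract describes a learned filtering mask, whereas this sketch uses a fixed hand-set threshold. The function names and the threshold value `0.5` are assumptions for illustration and do not come from the paper.

```python
import math

def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's distribution over depth hypotheses."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_mask(prob_volume, max_normalized_entropy=0.5):
    """Return a boolean mask that is True where the depth estimate looks confident.

    prob_volume[y][x] is a list of probabilities over D >= 2 depth hypotheses.
    The threshold is a hypothetical value; HAMMER learns its mask instead.
    """
    mask = []
    for row in prob_volume:
        mask_row = []
        for probs in row:
            if len(probs) < 2:
                raise ValueError("need at least two depth hypotheses per pixel")
            # Divide by the entropy of a uniform distribution to land in [0, 1].
            normalized = pixel_entropy(probs) / math.log(len(probs))
            mask_row.append(normalized < max_normalized_entropy)
        mask.append(mask_row)
    return mask

# One confident pixel (peaked distribution) and one ambiguous pixel (uniform).
toy_volume = [[[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]]]
toy_mask = entropy_mask(toy_volume)
```

Pixels that pass the mask would still go through the two-view geometric verification mentioned in the abstract before being fused; that step is not sketched here.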
State-of-the-art Multiple Object Tracking (MOT) approaches have shown remarkable performance when trained and evaluated on current benchmarks. However, these benchmarks primarily consist of clear weather scenarios, ov...
Magnetic resonance imaging (MRI) is a potent diagnostic tool, but suffers from long examination times. To accelerate the process, modern MRI machines typically utilize multiple coils that acquire sub-sampled data in p...
For efficient and safe autonomous driving, it is essential that autonomous vehicles can predict the motion of other traffic agents. While highly accurate, current motion prediction models often impose significant chal...
Temporal action segmentation in untrimmed videos has gained increased attention recently. However, annotating action classes and frame-wise boundaries is extremely time consuming and cost intensive, especially on larg...
The total generalized variation extends the total variation by incorporating higher-order smoothness. Thus, it can also suffer from similar discretization issues related to isotropy. Inspired by the success of novel d...
Point cloud registration aligns 3D point clouds using spatial transformations. It is an important task in computer vision, with applications in areas such as augmented reality (AR) and medical imaging. This work explo...
Stereo estimation has made many advancements in recent years with the introduction of deep learning. However, the traditional supervised approach to deep learning requires the creation of accurate and plentiful ground-...
Diffusion models have been successfully applied to many inverse problems, including MRI and CT reconstruction. Researchers typically re-purpose models originally designed for unconditional sampling without modificatio...