In Autonomous Driving (AD) systems, perception is both security and safety critical. Despite various prior studies on its security issues, all of them only consider attacks on camera- or LiDAR-based AD perception alone. However, production AD systems today predominantly adopt a Multi-Sensor Fusion (MSF) based design, which in principle can be more robust against these attacks under the assumption that not all fusion sources are (or can be) attacked at the same time. In this paper, we present the first study of security issues of MSF-based perception in AD systems. We directly challenge the basic MSF design assumption above by exploring the possibility of attacking all fusion sources simultaneously. This allows us for the first time to understand how much security guarantee MSF can fundamentally provide as a general defense strategy for AD perception. We formulate the attack as an optimization problem to generate a physically-realizable, adversarial 3D-printed object that misleads an AD system to fail in detecting it and thus crash into it. To systematically generate such a physical-world attack, we propose a novel attack pipeline that addresses two main design challenges: (1) non-differentiable target camera and LiDAR sensing systems, and (2) non-differentiable cell-level aggregated features popularly used in LiDAR-based AD perception. We evaluate our attack on MSF algorithms included in representative open-source industry-grade AD systems in real-world driving scenarios. Our results show that the attack achieves over 90% success rate across different object types and MSF algorithms. Our attack is also found to be stealthy, robust to victim positions, transferable across MSF algorithms, and physical-world realizable after being 3D-printed and captured by LiDAR and camera devices. To concretely assess the end-to-end safety impact, we further perform simulation evaluation and show that it can cause a 100% vehicle collision rate for an industry-grade AD system. We also evaluate and discuss defense strategies.
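To make the optimization formulation above concrete, here is a minimal sketch of the attack loop: vertex offsets of a 3D object are optimized by gradient descent to suppress a detector's confidence while a regularizer keeps the deformation small and printable. The `surrogate_detector` below stands in for the non-differentiable camera/LiDAR sensing pipeline and is purely an assumption for illustration, not the paper's actual attack pipeline.

```python
# Sketch: perturb the vertices of a 3D object so a (surrogate) detector's
# confidence in it drops, while a regularizer keeps the shape realizable.
# The real attack must handle non-differentiable sensing; this toy version
# assumes a differentiable surrogate instead.
import torch

torch.manual_seed(0)

num_vertices = 500
base_vertices = torch.randn(num_vertices, 3)          # benign object shape
offsets = torch.zeros_like(base_vertices, requires_grad=True)

# Hypothetical differentiable surrogate: maps a point set to a detection
# confidence in [0, 1]. A real attack would render/raycast the mesh into
# camera and LiDAR inputs of the target MSF perception stack.
surrogate_detector = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

def detection_confidence(points):
    # Mean-pool per-point logits, squash to a pseudo-confidence.
    return torch.sigmoid(surrogate_detector(points).mean())

optimizer = torch.optim.Adam([offsets], lr=1e-2)
for step in range(200):
    adv_vertices = base_vertices + offsets
    conf = detection_confidence(adv_vertices)
    smoothness = offsets.pow(2).mean()                # keep deformation small
    loss = conf + 0.1 * smoothness                    # suppress detection, stay realizable
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final surrogate confidence: {detection_confidence(base_vertices + offsets):.3f}")
```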
ISBN (digital): 9781728181288
ISBN (print): 9781728181295
Recently, significant progress has been achieved in analyzing 3D point clouds with deep learning techniques. However, existing networks suffer from poor generalization and robustness to arbitrary rotations applied to the input point cloud. Different from traditional strategies that improve rotation robustness with data augmentation or with specifically designed spherical representations or harmonics-based kernels, we propose to rotate the point cloud into a canonical viewpoint to boost the downstream target task, e.g., object classification and part segmentation. Specifically, the canonical viewpoint is predicted by the network RotPredictor in an unsupervised way, and the loss function is built only on the target task. Our RotPredictor approximately satisfies the rotation equivariance property in SO(3), and its prediction output has a linear relationship with the applied rotation transformation. In addition, RotPredictor is an independent plug-and-play module, which can be employed by any point-based deep learning framework without extra burden. Experimental results on the public model classification dataset ModelNet40 show that the performance of all baselines can be boosted by integrating the proposed module. In addition, by adding our proposed module, we achieve a state-of-the-art classification accuracy of 90.2% on the rotation-augmented ModelNet40 benchmark.
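As a rough illustration of this plug-and-play idea, the sketch below regresses a rotation from the point cloud (using the common 6D rotation parameterization, an assumption on our part) and rotates the input into the predicted canonical frame before any downstream network. The module name `CanonicalRotation` and all layer sizes are illustrative, not the paper's RotPredictor architecture.

```python
# Sketch: a small point network predicts a rotation (6D representation
# orthonormalized with Gram-Schmidt) and the cloud is rotated into that
# canonical frame; the training loss would live entirely on the downstream
# task, matching the unsupervised setup described above.
import torch
import torch.nn as nn

class CanonicalRotation(nn.Module):
    def __init__(self):
        super().__init__()
        # Per-point MLP + max pooling, PointNet-style.
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, 6)                # 6D rotation representation

    def forward(self, points):                       # points: (B, N, 3)
        feat = self.point_mlp(points).max(dim=1).values
        a, b = self.head(feat).chunk(2, dim=-1)      # two 3-vectors
        # Gram-Schmidt orthonormalization -> valid rotation matrix columns.
        x = nn.functional.normalize(a, dim=-1)
        y = nn.functional.normalize(b - (x * b).sum(-1, keepdim=True) * x, dim=-1)
        z = torch.cross(x, y, dim=-1)
        rot = torch.stack([x, y, z], dim=-1)         # (B, 3, 3)
        return points @ rot                          # rotate into canonical frame

# Usage: drop the module in front of any point-based classifier.
cloud = torch.randn(8, 1024, 3)
canonical = CanonicalRotation()(cloud)
print(canonical.shape)   # torch.Size([8, 1024, 3])
```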
Traditional neural architecture search (NAS) has had a significant impact on computer vision by automatically designing network architectures for various tasks. In this paper, binarized neural architecture search (BNAS), ...
Convolutional Neural Networks (CNNs) are producing state-of-the-art results in the object detection field. However, deep CNN topologies are computationally intensive and typically require excessive resources (i.e., high-end GPUs), which hinders their deployment on resource- and power-constrained UAVs. In this work, we present a high-throughput and power-efficient quantized object detection network, QuantYOLO, which is based on the Tiny-YOLOv2 topology. We conduct a detailed exploration of the trade-off between precision and filter pruning on one side and accuracy, throughput, and power consumption on the other for the object detection task. As a result of these explorations, we select a network with binarized weights and 4-bit activations (except the output layer), which is 21.8× smaller than Tiny-YOLOv2 while achieving a mean Average Precision (mAP) of 51.5% on the PASCAL-VOC dataset. Finally, we present an FPGA-based accelerator, which achieves 1.6× higher throughput (FPS) and is 3.1× more power efficient compared to prior FPGA architectures.
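For intuition on the quantization scheme described (binarized weights, 4-bit activations), here is a generic sketch using the straight-through estimator (STE); it is a textbook illustration of such quantizers, not QuantYOLO's actual training code.

```python
# Sketch: sign() binarization for weights and a uniform 4-bit quantizer for
# activations, both trained with the straight-through estimator so that
# gradients pass through the non-differentiable rounding.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)                     # weights in {-1, +1}

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                          # STE: identity gradient

def quantize_act_4bit(x):
    # Clip to [0, 1], then round onto the 15 uniform steps of 4 bits;
    # the detach() trick implements the STE for the rounding.
    x = x.clamp(0.0, 1.0)
    scale = 2 ** 4 - 1
    q = torch.round(x * scale) / scale
    return x + (q - x).detach()

w = torch.randn(16, requires_grad=True)
x = torch.rand(16, requires_grad=True)
y = (BinarizeSTE.apply(w) * quantize_act_4bit(x)).sum()
y.backward()                                     # gradients flow via the STE
print(w.grad.shape, x.grad.shape)
```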
Depth completion aims to recover a dense depth map from a sparse depth map with the corresponding color image as input. Recent approaches mainly formulate depth completion as a one-stage end-to-end learning task, whic...
Modern CNN-based object detectors focus on feature configuration during training but often ignore feature optimization during inference. In this paper, we propose a new feature optimization approach to enhance feature...
ISBN (digital): 9781728171685
ISBN (print): 9781728171692
Holistically understanding an object with its 3D movable parts is essential for the visual models of a robot to interact with the world. For example, only by understanding the many possible part dynamics of other vehicles (e.g., a door or trunk opening, taillights blinking for a lane change) can a self-driving vehicle succeed in dealing with emergency cases. However, existing visual models rarely tackle these situations and instead focus on bounding box detection. In this paper, we fill this important missing piece in autonomous driving by solving two critical issues. First, to deal with data scarcity, we propose an effective training data generation process that fits a 3D car model with dynamic parts to cars in real images. This allows us to directly edit the real images using the aligned 3D parts, yielding effective training data for learning robust deep neural networks (DNNs). Second, to benchmark the quality of 3D part understanding, we collected a large dataset of real driving scenarios with cars in uncommon states (CUS), i.e., with a door or trunk opened, etc., which demonstrates that our network trained with edited images largely outperforms other baselines in terms of 2D detection and instance segmentation accuracy.
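As a toy illustration of the geometric step that makes such image editing possible, the sketch below projects the vertices of an aligned 3D part into an image with a pinhole camera, giving the pixel footprint that can then be edited or re-rendered. The intrinsics, pose, and `part_vertices` are made-up values, and the paper's actual model fitting and rendering are far more involved.

```python
# Sketch: pinhole projection of an aligned 3D part into the image plane.
import numpy as np

K = np.array([[720.0, 0.0, 640.0],     # hypothetical camera intrinsics
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # car-to-camera rotation (assumed aligned)
t = np.array([0.0, 0.0, 10.0])         # car 10 m in front of the camera

# Toy "door" part: a small quad of 3D vertices in the car frame (meters).
part_vertices = np.array([[-0.8, 0.0, 0.0], [0.8, 0.0, 0.0],
                          [0.8, 1.2, 0.0], [-0.8, 1.2, 0.0]])

cam = (R @ part_vertices.T).T + t      # transform into the camera frame
uv = (K @ cam.T).T                     # pinhole projection
uv = uv[:, :2] / uv[:, 2:3]            # perspective divide -> pixel coords

print(np.round(uv, 1))                 # 2D footprint of the part in the image
```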
We build a virtual agent for learning language in a 2D maze-like world. The agent sees images of the surrounding environment, listens to a virtual teacher, and takes actions to receive rewards. It interactively learns...
In this paper, we first present an arc-based algorithm for fan-beam computed tomography (CT) reconstruction by applying Katsevich's helical CT formula to 2D fan-beam CT reconstruction. Then, we propose a new weighting ...
ISBN (digital): 9781728171685
ISBN (print): 9781728171692
Existing LiDAR-based 3D object detectors usually focus on single-frame detection, while ignoring the spatiotemporal information in consecutive point cloud frames. In this paper, we propose an end-to-end online 3D video object detector that operates on point cloud sequences. The proposed model comprises a spatial feature encoding component and a spatiotemporal feature aggregation component. In the former component, a novel Pillar Message Passing Network (PMPNet) is proposed to encode each discrete point cloud frame. It adaptively collects information for a pillar node from its neighbors by iterative message passing, which effectively enlarges the receptive field of the pillar feature. In the latter component, we propose an Attentive Spatiotemporal Transformer GRU (AST-GRU) to aggregate the spatiotemporal information, which enhances the conventional ConvGRU with an attentive memory gating mechanism. AST-GRU contains a Spatial Transformer Attention (STA) module and a Temporal Transformer Attention (TTA) module, which can emphasize the foreground objects and align the dynamic objects, respectively. Experimental results demonstrate that the proposed 3D video object detector achieves state-of-the-art performance on the large-scale nuScenes benchmark.
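To illustrate the flavor of attentive memory gating in a ConvGRU, here is a minimal cell that re-weights its hidden state with a learned spatial attention map before the standard GRU update. The channel sizes and single-layer attention are assumptions for illustration, not the paper's STA/TTA modules.

```python
# Sketch: a ConvGRU cell whose memory is modulated by a spatial attention
# map before the usual update/reset gating, run over a short BEV sequence.
import torch
import torch.nn as nn

class AttentiveConvGRUCell(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)
        self.cand = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.attn = nn.Conv2d(channels, 1, 1)      # spatial attention map

    def forward(self, x, h):
        h = h * torch.sigmoid(self.attn(h))        # attentive memory gating
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)                  # update / reset gates
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

# Usage over a 5-frame feature sequence of shape (batch, channels, H, W).
cell = AttentiveConvGRUCell(channels=64)
h = torch.zeros(2, 64, 32, 32)
for x in torch.randn(5, 2, 64, 32, 32):
    h = cell(x, h)
print(h.shape)   # torch.Size([2, 64, 32, 32])
```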