In Autonomous Driving (AD) systems, perception is both security and safety critical. Despite various prior studies on its security issues, all of them only consider attacks on camera- or LiDAR-based AD perception alone. However, production AD systems today predominantly adopt a Multi-Sensor Fusion (MSF) based design, which in principle can be more robust against these attacks under the assumption that not all fusion sources are (or can be) attacked at the same time. In this paper, we present the first study of security issues of MSF-based perception in AD systems. We directly challenge the basic MSF design assumption above by exploring the possibility of attacking all fusion sources simultaneously. This allows us for the first time to understand how much security guarantee MSF can fundamentally provide as a general defense strategy for AD perception. We formulate the attack as an optimization problem to generate a physically-realizable, adversarial 3D-printed object that misleads an AD system to fail in detecting it and thus crash into it. To systematically generate such a physical-world attack, we propose a novel attack pipeline that addresses two main design challenges: (1) non-differentiable target camera and LiDAR sensing systems, and (2) non-differentiable cell-level aggregated features popularly used in LiDAR-based AD perception. We evaluate our attack on MSF algorithms included in representative open-source industry-grade AD systems in real-world driving scenarios. Our results show that the attack achieves over 90% success rate across different object types and MSF algorithms. Our attack is also found to be stealthy, robust to victim positions, transferable across MSF algorithms, and physical-world realizable after being 3D-printed and captured by LiDAR and camera devices. To concretely assess the end-to-end safety impact, we further perform simulation evaluation and show that it can cause a 100% vehicle collision rate for an industry-grade AD system. We also evaluate and discuss defense strategies.
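To make the optimization formulation above concrete, here is a minimal sketch of the attack loop: vertex offsets of a 3D object are optimized by gradient descent to suppress a detector's confidence while a regularizer keeps the deformation small and printable. The `surrogate_detector` below stands in for the non-differentiable camera/LiDAR sensing pipeline and is purely an assumption for illustration, not the paper's actual attack pipeline.

```python
# Sketch: perturb the vertices of a 3D object so a (surrogate) detector's
# confidence in it drops, while a regularizer keeps the shape realizable.
# The real attack must handle non-differentiable sensing; this toy version
# assumes a differentiable surrogate instead.
import torch

torch.manual_seed(0)

num_vertices = 500
base_vertices = torch.randn(num_vertices, 3)          # benign object shape
offsets = torch.zeros_like(base_vertices, requires_grad=True)

# Hypothetical differentiable surrogate: maps a point set to a detection
# confidence in [0, 1]. A real attack would render/raycast the mesh into
# camera and LiDAR inputs of the target MSF perception stack.
surrogate_detector = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

def detection_confidence(points):
    # Mean-pool per-point logits, squash to a pseudo-confidence.
    return torch.sigmoid(surrogate_detector(points).mean())

optimizer = torch.optim.Adam([offsets], lr=1e-2)
for step in range(200):
    adv_vertices = base_vertices + offsets
    conf = detection_confidence(adv_vertices)
    smoothness = offsets.pow(2).mean()                # keep deformation small
    loss = conf + 0.1 * smoothness                    # suppress detection, stay realizable
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final surrogate confidence: {detection_confidence(base_vertices + offsets):.3f}")
```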
ISBN (digital): 9781728181288
ISBN (print): 9781728181295
Recently, significant progress has been achieved in analyzing 3D point clouds with deep learning techniques. However, existing networks suffer from poor generalization and robustness to arbitrary rotations applied to the input point cloud. Different from traditional strategies that improve rotation robustness with data augmentation or with specifically designed spherical representations or harmonics-based kernels, we propose to rotate the point cloud into a canonical viewpoint to boost the downstream target task, e.g., object classification and part segmentation. Specifically, the canonical viewpoint is predicted by the network RotPredictor in an unsupervised way, and the loss function is built only on the target task. Our RotPredictor approximately satisfies the rotation equivariance property in SO(3), and its prediction output has a linear relationship with the applied rotation transformation. In addition, RotPredictor is an independent plug-and-play module, which can be employed by any point-based deep learning framework without extra burden. Experimental results on the public model classification dataset ModelNet40 show that the performance of all baselines can be boosted by integrating the proposed module. In addition, by adding our proposed module, we achieve a state-of-the-art classification accuracy of 90.2% on the rotation-augmented ModelNet40 benchmark.
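As a rough illustration of this plug-and-play idea, the sketch below regresses a rotation from the point cloud (using the common 6D rotation parameterization, an assumption on our part) and rotates the input into the predicted canonical frame before any downstream network. The module name `CanonicalRotation` and all layer sizes are illustrative, not the paper's RotPredictor architecture.

```python
# Sketch: a small point network predicts a rotation (6D representation
# orthonormalized with Gram-Schmidt) and the cloud is rotated into that
# canonical frame; the training loss would live entirely on the downstream
# task, matching the unsupervised setup described above.
import torch
import torch.nn as nn

class CanonicalRotation(nn.Module):
    def __init__(self):
        super().__init__()
        # Per-point MLP + max pooling, PointNet-style.
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, 6)                # 6D rotation representation

    def forward(self, points):                       # points: (B, N, 3)
        feat = self.point_mlp(points).max(dim=1).values
        a, b = self.head(feat).chunk(2, dim=-1)      # two 3-vectors
        # Gram-Schmidt orthonormalization -> valid rotation matrix columns.
        x = nn.functional.normalize(a, dim=-1)
        y = nn.functional.normalize(b - (x * b).sum(-1, keepdim=True) * x, dim=-1)
        z = torch.cross(x, y, dim=-1)
        rot = torch.stack([x, y, z], dim=-1)         # (B, 3, 3)
        return points @ rot                          # rotate into canonical frame

# Usage: drop the module in front of any point-based classifier.
cloud = torch.randn(8, 1024, 3)
canonical = CanonicalRotation()(cloud)
print(canonical.shape)   # torch.Size([8, 1024, 3])
```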
Traditional neural architecture search (NAS) has had a significant impact on computer vision by automatically designing network architectures for various tasks. In this paper, binarized neural architecture search (BNAS), ...
Convolutional Neural Networks (CNNs) are producing state-of-the-art results in the object detection field. However, deep CNN topologies are computationally intensive and typically require excessive resources (i.e., high-end GPUs), which hinders their deployment on resource- and power-constrained UAVs. In this work, we present a high-throughput and power-efficient quantized object detection network, QuantYOLO, which is based on the Tiny-YOLOv2 topology. We conduct a detailed exploration of the trade-off between precision and filter pruning on one side and accuracy, throughput, and power consumption on the other for the object detection task. As a result of these explorations, we select a network with binarized weights and 4-bit activations (except the output layer), which is 21.8× smaller than Tiny-YOLOv2 while achieving a mean Average Precision (mAP) of 51.5% on the PASCAL-VOC dataset. Finally, we present an FPGA-based accelerator, which achieves 1.6× higher throughput (FPS) and is 3.1× more power efficient compared to prior FPGA architectures.
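For intuition on the quantization scheme described (binarized weights, 4-bit activations), here is a generic sketch using the straight-through estimator (STE); it is a textbook illustration of such quantizers, not QuantYOLO's actual training code.

```python
# Sketch: sign() binarization for weights and a uniform 4-bit quantizer for
# activations, both trained with the straight-through estimator so that
# gradients pass through the non-differentiable rounding.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)                     # weights in {-1, +1}

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                          # STE: identity gradient

def quantize_act_4bit(x):
    # Clip to [0, 1], then round onto the 15 uniform steps of 4 bits;
    # the detach() trick implements the STE for the rounding.
    x = x.clamp(0.0, 1.0)
    scale = 2 ** 4 - 1
    q = torch.round(x * scale) / scale
    return x + (q - x).detach()

w = torch.randn(16, requires_grad=True)
x = torch.rand(16, requires_grad=True)
y = (BinarizeSTE.apply(w) * quantize_act_4bit(x)).sum()
y.backward()                                     # gradients flow via the STE
print(w.grad.shape, x.grad.shape)
```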
Depth completion aims to recover a dense depth map from a sparse depth map with the corresponding color image as input. Recent approaches mainly formulate depth completion as a one-stage end-to-end learning task, whic...
Modern CNN-based object detectors focus on feature configuration during training but often ignore feature optimization during inference. In this paper, we propose a new feature optimization approach to enhance feature...
ISBN (digital): 9781728171685
ISBN (print): 9781728171692
Holistically understanding an object with its 3D movable parts is essential for the visual models of a robot to interact with the world. For example, only by understanding the many possible part dynamics of other vehicles (e.g., a door or trunk opening, taillights blinking for a lane change) can a self-driving vehicle succeed in dealing with emergency cases. However, existing visual models rarely tackle these situations and instead focus on bounding box detection. In this paper, we fill this important missing piece in autonomous driving by solving two critical issues. First, to deal with data scarcity, we propose an effective training data generation process that fits a 3D car model with dynamic parts to cars in real images. This allows us to directly edit the real images using the aligned 3D parts, yielding effective training data for learning robust deep neural networks (DNNs). Second, to benchmark the quality of 3D part understanding, we collected a large dataset of real driving scenarios with cars in uncommon states (CUS), i.e., with a door or trunk opened, etc., which demonstrates that our network trained with edited images largely outperforms other baselines in terms of 2D detection and instance segmentation accuracy.
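As a toy illustration of the geometric step that makes such image editing possible, the sketch below projects the vertices of an aligned 3D part into an image with a pinhole camera, giving the pixel footprint that can then be edited or re-rendered. The intrinsics, pose, and `part_vertices` are made-up values, and the paper's actual model fitting and rendering are far more involved.

```python
# Sketch: pinhole projection of an aligned 3D part into the image plane.
import numpy as np

K = np.array([[720.0, 0.0, 640.0],     # hypothetical camera intrinsics
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # car-to-camera rotation (assumed aligned)
t = np.array([0.0, 0.0, 10.0])         # car 10 m in front of the camera

# Toy "door" part: a small quad of 3D vertices in the car frame (meters).
part_vertices = np.array([[-0.8, 0.0, 0.0], [0.8, 0.0, 0.0],
                          [0.8, 1.2, 0.0], [-0.8, 1.2, 0.0]])

cam = (R @ part_vertices.T).T + t      # transform into the camera frame
uv = (K @ cam.T).T                     # pinhole projection
uv = uv[:, :2] / uv[:, 2:3]            # perspective divide -> pixel coords

print(np.round(uv, 1))                 # 2D footprint of the part in the image
```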
We build a virtual agent for learning language in a 2D maze-like world. The agent sees images of the surrounding environment, listens to a virtual teacher, and takes actions to receive rewards. It interactively learns...
In this paper, we first present an arc-based algorithm for fan-beam computed tomography (CT) reconstruction by applying Katsevich's helical CT formula to 2D fan-beam CT reconstruction. Then, we propose a new weighting ...
ISBN (digital): 9781728171685
ISBN (print): 9781728171692
Existing LiDAR-based 3D object detectors usually focus on single-frame detection, while ignoring the spatiotemporal information in consecutive point cloud frames. In this paper, we propose an end-to-end online 3D video object detector that operates on point cloud sequences. The proposed model comprises a spatial feature encoding component and a spatiotemporal feature aggregation component. In the former component, a novel Pillar Message Passing Network (PMPNet) is proposed to encode each discrete point cloud frame. It adaptively collects information for a pillar node from its neighbors by iterative message passing, which effectively enlarges the receptive field of the pillar feature. In the latter component, we propose an Attentive Spatiotemporal Transformer GRU (AST-GRU) to aggregate the spatiotemporal information, which enhances the conventional ConvGRU with an attentive memory gating mechanism. AST-GRU contains a Spatial Transformer Attention (STA) module and a Temporal Transformer Attention (TTA) module, which can emphasize the foreground objects and align the dynamic objects, respectively. Experimental results demonstrate that the proposed 3D video object detector achieves state-of-the-art performance on the large-scale nuScenes benchmark.
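To illustrate the flavor of attentive memory gating in a ConvGRU, here is a minimal cell that re-weights its hidden state with a learned spatial attention map before the standard GRU update. The channel sizes and single-layer attention are assumptions for illustration, not the paper's STA/TTA modules.

```python
# Sketch: a ConvGRU cell whose memory is modulated by a spatial attention
# map before the usual update/reset gating, run over a short BEV sequence.
import torch
import torch.nn as nn

class AttentiveConvGRUCell(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)
        self.cand = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.attn = nn.Conv2d(channels, 1, 1)      # spatial attention map

    def forward(self, x, h):
        h = h * torch.sigmoid(self.attn(h))        # attentive memory gating
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)                  # update / reset gates
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

# Usage over a 5-frame feature sequence of shape (batch, channels, H, W).
cell = AttentiveConvGRUCell(channels=64)
h = torch.zeros(2, 64, 32, 32)
for x in torch.randn(5, 2, 64, 32, 32):
    h = cell(x, h)
print(h.shape)   # torch.Size([2, 64, 32, 32])
```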