Search Results - Inner Mongolia University Library

Hierarchical Pyramid Diverse Attention Networks for Face Recognition

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Qiangchang Wang, Tianyi Wu, He Zheng, Guodong Guo. Affiliations: West Virginia University, Morgantown, USA; Institute of Deep Learning, Baidu Research, Beijing, China; National Engineering Laboratory for Deep Learning Technology and Application, Beijing, China

ISBN: (digital) 9781728171685

ISBN: (print) 9781728171692

Deep learning has achieved great success in face recognition (FR); however, few existing models take hierarchical multi-scale local features into consideration. In this work, we propose a hierarchical pyramid diverse attention (HPDA) network. First, it is observed that local patches play important roles in FR when the global face appearance changes dramatically. Some recent works apply attention modules to locate local patches automatically without relying on face landmarks. Unfortunately, without considering diversity, some learned attentions tend to have redundant responses around similar local patches, while neglecting other potentially discriminative facial parts. Meanwhile, local patches may appear at different scales due to pose variations or large expression changes. To alleviate these challenges, we propose a pyramid diverse attention (PDA) to learn multi-scale diverse local representations automatically and adaptively. More specifically, a pyramid attention is developed to capture multi-scale features. Meanwhile, a diverse learning is developed to encourage models to focus on different local patches and generate diverse local features. Second, almost all existing models focus on extracting features from the last convolutional layer, lacking the local details or small-scale face parts found in lower layers. Instead of simple concatenation or addition, we propose to use a hierarchical bilinear pooling (HBP) to fuse information from multiple layers effectively. Thus, the HPDA is developed by integrating the PDA into the HBP. Experimental results on several datasets show the effectiveness of the HPDA, compared to the state-of-the-art methods.

Keywords: Face; Feature extraction; Face recognition; Handheld computers; Machine learning; Fuses; Computational modeling

GINet: Graph interaction network for scene parsing

arXiv, 2020

Authors: Wu, Tianyi; Lu, Yu; Zhu, Yu; Zhang, Chuang; Wu, Ming; Ma, Zhanyu; Guo, Guodong. Affiliations: Institute of Deep Learning, Baidu Research, Beijing, China; National Engineering Laboratory for Deep Learning Technology and Application, Beijing, China; Beijing University of Posts and Telecommunications, Beijing, China

Recently, context reasoning using image regions beyond local convolution has shown great potential for scene parsing. In this work, we explore how to incorporate linguistic knowledge to promote context reasoning over image regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss). The GI unit is capable of enhancing feature representations of convolution networks over high-level semantics and learning the semantic coherency adaptively to each sample. Specifically, the dataset-based linguistic knowledge is first incorporated in the GI unit to promote context reasoning over the visual graph, then the evolved representations of the visual graph are mapped to each local representation to enhance the discriminative capability for scene parsing. The GI unit is further improved by the SC-loss to enhance the semantic representations over the exemplar-based semantic graph. We perform full ablation studies to demonstrate the effectiveness of each component in our approach. Particularly, the proposed GINet outperforms the state-of-the-art approaches on the popular benchmarks, including Pascal-Context and COCO Stuff. Copyright © 2020, The Authors. All rights reserved.

Keywords: Convolution

Invisible for both Camera and LiDAR: Security of multi-sensor fusion based perception in autonomous driving under physical-world attacks

arXiv, 2021

Authors: Cao, Yulong; Wang, Ningfei; Xiao, Chaowei; Yang, Dawei; Fang, Jin; Yang, Ruigang; Chen, Qi Alfred; Liu, Mingyan; Li, Bo. Affiliations: University of California, Irvine, United States; University of Michigan, United States; NVIDIA Research; Arizona State University; Inceptio; Baidu Research and National Engineering Laboratory of Deep Learning Technology and Application, China; University of Illinois at Urbana-Champaign

In Autonomous Driving (AD) systems, perception is both security and safety critical. Despite various prior studies on its security issues, all of them only consider attacks on camera- or LiDAR-based AD perception alone. However, production AD systems today predominantly adopt a Multi-Sensor Fusion (MSF) based design, which in principle can be more robust against these attacks under the assumption that not all fusion sources are (or can be) attacked at the same time. In this paper, we present the first study of security issues of MSF-based perception in AD systems. We directly challenge the basic MSF design assumption above by exploring the possibility of attacking all fusion sources simultaneously. This allows us for the first time to understand how much security guarantee MSF can fundamentally provide as a general defense strategy for AD perception. We formulate the attack as an optimization problem to generate a physically-realizable, adversarial 3D-printed object that misleads an AD system to fail in detecting it and thus crash into it. To systematically generate such a physical-world attack, we propose a novel attack pipeline that addresses two main design challenges: (1) non-differentiable target camera and LiDAR sensing systems, and (2) non-differentiable cell-level aggregated features popularly used in LiDAR-based AD perception. We evaluate our attack on MSF algorithms included in representative open-source industry-grade AD systems in real-world driving scenarios. Our results show that the attack achieves over 90% success rate across different object types and MSF algorithms. Our attack is also found stealthy, robust to victim positions, transferable across MSF algorithms, and physical-world realizable after being 3D-printed and captured by LiDAR and camera devices. To concretely assess the end-to-end safety impact, we further perform simulation evaluation and show that it can cause a 100% vehicle collision rate for an industry-grade AD system. We also evalu

Keywords: Autonomous vehicles

RotPredictor: Unsupervised Canonical Viewpoint Learning for Point Cloud Classification

International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT)

Authors: Jin Fang, Dingfu Zhou, Xibin Song, Shengze Jin, Ruigang Yang, Liangjun Zhang. Affiliations: Baidu Research; National Engineering Laboratory of Deep Learning Technology and Application, China; ETH Zürich, Switzerland; University of Kentucky

ISBN: (digital) 9781728181288

ISBN: (print) 9781728181295

Recently, significant progress has been achieved in analyzing 3D point clouds with deep learning techniques. However, existing networks suffer from poor generalization and robustness to arbitrary rotations applied to the input point cloud. Unlike traditional strategies that improve rotation robustness with data augmentation, specifically designed spherical representations, or harmonics-based kernels, we propose to rotate the point cloud into a canonical viewpoint to boost the downstream target task, e.g., object classification and part segmentation. Specifically, the canonical viewpoint is predicted by the network RotPredictor in an unsupervised way and the loss function is only built on the target task. Our RotPredictor approximately satisfies the rotation equivariance property in (3), and the prediction output has a linear relationship with the applied rotation transformation. In addition, the RotPredictor is an independent plug-and-play module, which can be employed by any point-based deep learning framework without extra burden. Experimental results on the public model classification dataset ModelNet40 show that the performance of all baselines can be boosted by integrating the proposed module. In addition, by adding our proposed module, we achieve state-of-the-art classification accuracy of 90.2% on the rotation-augmented ModelNet40 benchmark.

Keywords: Three-dimensional displays; Task analysis; Robustness; Convolution; Two dimensional displays; Deep learning; Solid modeling
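
The canonical-viewpoint idea in the RotPredictor abstract can be illustrated with a toy 2D sketch. This is not the authors' network: `predict_angle` is a hypothetical hand-written stand-in (the direction of the centroid) for the learned predictor, and the points are made-up values. It only shows the two properties the abstract names: the prediction shifts linearly with the applied rotation, and undoing the predicted rotation yields the same canonical cloud.

```python
import math

def rotate(points, angle):
    """Rotate 2D points counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def predict_angle(points):
    """Hypothetical stand-in for a learned predictor: angle of the centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.atan2(cy, cx)

def canonicalize(points):
    """Undo the predicted rotation so every pose maps to one viewpoint."""
    return rotate(points, -predict_angle(points))

cloud = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.5)]  # made-up toy point cloud
turned = rotate(cloud, 0.7)                   # same cloud in another pose

# The prediction moves by the applied rotation (equivariance in this toy case).
shift = predict_angle(turned) - predict_angle(cloud)

canon_a = canonicalize(cloud)
canon_b = canonicalize(turned)
```

In this toy setting `canon_a` and `canon_b` agree up to floating-point error, so a downstream classifier would see the same input for both poses. The paper works with 3D rotations and a learned module, which this sketch does not attempt to reproduce.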

SFedXL: Semi-Synchronous Federated Learning With Cross-Sharpness and Layer-Freezing

IEEE Internet of Things Journal, 2025

Authors: Zhao, Mingxiong; Zhao, Shihao; Feng, Chenyuan; Yang, Howard H.; Niyato, Dusit; Quek, Tony Q. S. Affiliations: Yunnan University, National Pilot School of Software, Kunming 650500, China; Ministry of Education, Engineering Research Center of Integration and Application of Digital Learning Technology, Beijing 100039, China; Ministry of Education, Engineering Research Center of Cyberspace, Kunming 650504, China; Yunnan University of Finance and Economics, Yunnan Key Laboratory of Service Computing, Kunming 650221, China; EURECOM, Sophia Antipolis 06410, France; Zhejiang University/University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining 314400, China; Nanyang Technological University, School of Computer Science and Engineering, 639798, Singapore; Singapore University of Technology and Design, Information Systems Technology and Design Pillar, 487372, Singapore

Federated learning (FL) emerges as a potential solution for enabling multiple terminal devices to collaboratively accomplish computational tasks within an Unmanned Aerial Vehicle (UAV) swarm. However, traditional FL approaches, predicated on synchronous data aggregation, are not feasible for a UAV swarm owing to the inherently variable and dynamic nature of their communication networks compared with terrestrial systems. Furthermore, the data procured by UAVs is often highly heterogeneous, attributable to disparities in deployment environments and device attributes. Considering the distinct flight paths and unique operational conditions encountered by different UAVs, a considerable amount of data remains unlabeled. To tackle the challenges associated with asynchronous operations and the prevalence of unlabeled data, we introduce a novel framework termed Semi-synchronous FL with Cross-Sharpness and Layer-Freezing (SFedXL), tailored for a UAV swarm. In particular, we devise a cross-sharpness model training strategy aimed at optimizing the utilization of both labeled and unlabeled datasets. Additionally, we propose an innovative semi-synchronous model aggregation protocol, complemented by client-specific layer-freezing and client cluster scheduling, designed to expedite the training process. Our simulation results indicate that the proposed algorithm surpasses current FL methods in terms of object recognition accuracy and communication efficiency, albeit with a trade-off of increased local computation latency. © 2014 IEEE.

Keywords: Federated learning

Incremental Learning of Object Detector with Limited Training Data

Proceedings of the Digital Image Computing: Techniques and Applications (DICTA)

Authors: Muhammad Abdullah Hafeez, Adnan Ul-Hasan, Faisal Shafait. Affiliations: School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan; Deep Learning Laboratory, National Center of Artificial Intelligence (NCAI), Islamabad, Pakistan

State-of-the-art deep learning models, despite being on par with the human level in some challenging tasks, still suffer badly when they have to learn over time. This open challenge of making deep learning models learn over time is referred to in the literature as lifelong learning, incremental learning, or continual learning. In each increment, new classes/tasks are introduced to the existing model, which is trained on them while maintaining the accuracy of the previously learned classes/tasks. But the accuracy of the deep learning model on the previously learned classes/tasks decreases with each increment. The main reason behind this accuracy drop is catastrophic forgetting, an inherent flaw in deep learning models, where weights learned during past increments get disturbed while learning the new classes/tasks from a new increment. Several approaches have been proposed to mitigate or avoid this catastrophic forgetting, such as the use of knowledge distillation, rehearsal over previous classes, or dedicated paths for different increments. In this work, we propose a novel approach based on transfer learning methodology, which uses a combination of a pre-trained, shared, and fixed network as a backbone, along with a dedicated network extension in the incremental setting for learning new tasks incrementally. The results show that our approach has better performance in two ways. First, our model has significantly better overall incremental accuracy than that of the best-in-class model in different incremental configurations. Second, our approach achieves better results while maintaining the properties of a true incremental learning algorithm, i.e., successful avoidance of the catastrophic forgetting issue and complete eradication of the need for saved exemplars or retraining phases, which are required by the current state-of-the-art model to maintain performance.

Keywords: Deep learning; Training; Knowledge engineering; Philosophical considerations; Digital images; Transfer learning; Training data

Transformer-based Annotation Bias-aware Medical Image Segmentation

arXiv, 2023

Authors: Liao, Zehui; Xie, Yutong; Hu, Shishuai; Xia, Yong. Affiliations: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an 710072, China; Australian Institute for Machine Learning, The University of Adelaide, Adelaide, SA, Australia; Ningbo Institute of Northwestern Polytechnical University, Ningbo 315048, China; Research and Development Institute, Northwestern Polytechnical University in Shenzhen, Shenzhen 518057, China

Manual medical image segmentation is subjective and suffers from annotator-related bias, which can be mimicked or amplified by deep learning methods. Recently, researchers have suggested that such bias is the combination of the annotator preference and stochastic error, which are modeled by convolution blocks located after the decoder and a pixel-wise independent Gaussian distribution, respectively. It is unlikely that convolution blocks can effectively model the varying degrees of preference at the full resolution level. Additionally, the independent pixel-wise Gaussian distribution disregards pixel correlations, leading to a discontinuous boundary. This paper proposes a Transformer-based Annotation Bias-aware (TAB) medical image segmentation model, which tackles the annotator-related bias by modeling annotator preference and stochastic errors. TAB employs the Transformer with learnable queries to extract the different preference-focused features. This enables TAB to produce segmentation with various preferences simultaneously using a single segmentation head. Moreover, TAB adopts a multivariate normal distribution assumption that models pixel correlations, and learns the annotation distribution to disentangle the stochastic error. We evaluated our TAB on an OD/OC segmentation benchmark annotated by six annotators. Our results suggest that TAB outperforms existing medical image segmentation models which take into account the annotator-related bias. © 2023, CC BY-NC-ND.

Keywords: Gaussian distribution

FCFR-Net: Feature fusion based coarse-to-fine residual learning for depth completion

arXiv, 2020

Authors: Liu, Lina; Song, Xibin; Lyu, Xiaoyang; Diao, Junwei; Wang, Mengmeng; Liu, Yong; Zhang, Liangjun. Affiliations: Institute of Cyber-Systems and Control, Zhejiang University, China; Baidu Research, China; National Engineering Laboratory of Deep Learning Technology and Application, China

Depth completion aims to recover a dense depth map from a sparse depth map with the corresponding color image as input. Recent approaches mainly formulate depth completion as a one-stage end-to-end learning task, which outputs dense depth maps directly. However, the feature extraction and supervision in one-stage frameworks are insufficient, limiting the performance of these approaches. To address this problem, we propose a novel end-to-end residual learning framework, which formulates depth completion as a two-stage learning task, i.e., a sparse-to-coarse stage and a coarse-to-fine stage. First, a coarse dense depth map is obtained by a simple CNN framework. Then, a refined depth map is obtained using a residual learning strategy in the coarse-to-fine stage with the coarse depth map and color image as input. Specifically, in the coarse-to-fine stage, a channel shuffle extraction operation is utilized to extract more representative features from the color image and coarse depth map, and an energy-based fusion operation is exploited to effectively fuse the features obtained by the channel shuffle operation, thus leading to more accurate and refined depth maps. We achieve SoTA performance in RMSE on the KITTI benchmark. Extensive experiments on other datasets further demonstrate the superiority of our approach over current state-of-the-art depth completion approaches. Copyright © 2020, The Authors. All rights reserved.

Keywords: Benchmarking

Binarized neural architecture search for efficient object recognition

arXiv, 2020

Authors: Chen, Hanlin; Zhuo, Li'an; Zhang, Baochang; Zheng, Xiawu; Liu, Jianzhuang; Ji, Rongrong; Doermann, David; Guo, Guodong. Affiliations: Beihang University, Beijing, China; Xiamen University, Fujian, China; Shenzhen Institutes of Advanced Technology; University at Buffalo; Institute of Deep Learning, Baidu Research; National Engineering Laboratory for Deep Learning Technology and Application, Shenzhen, China

Traditional neural architecture search (NAS) has a significant impact in computer vision by automatically designing network architectures for various tasks. In this paper, binarized neural architecture search (BNAS), with a search space of binarized convolutions, is introduced to produce extremely compressed models to reduce huge computational cost on embedded devices for edge computing. The BNAS calculation is more challenging than NAS due to the learning inefficiency caused by optimization requirements and the huge architecture space, and the performance loss when handling the wild data in various computing applications. To address these issues, we introduce operation space reduction and channel sampling into BNAS to significantly reduce the cost of searching. This is accomplished through a performance-based strategy that is robust to wild data, which is further used to abandon less potential operations. Furthermore, we introduce the Upper Confidence Bound (UCB) to solve 1-bit BNAS. Two optimization methods for binarized neural networks are used to validate the effectiveness of our BNAS. Extensive experiments demonstrate that the proposed BNAS achieves a comparable performance to NAS on both CIFAR and ImageNet databases. An accuracy of 96.53% vs. 97.22% is achieved on the CIFAR-10 dataset, but with a significantly compressed model, and a 40% faster search than the state-of-the-art PC-DARTS. Copyright © 2020, The Authors. All rights reserved.

Keywords: Edge computing
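
The Upper Confidence Bound mentioned in the BNAS abstract can be sketched with the generic UCB1 score (mean reward plus an exploration bonus). This is a minimal illustration under stated assumptions, not the paper's exact formulation: the operation names, reward values, and the `prune_weakest` helper are hypothetical, and dropping the lowest-scoring candidate is only one plausible reading of "abandon less potential operations".

```python
import math

def ucb_scores(mean_rewards, counts, total, c=1.0):
    """Generic UCB1 score per candidate: mean + c * sqrt(2 ln(total) / n)."""
    scores = []
    for mean, n in zip(mean_rewards, counts):
        if n == 0:
            # Unsampled candidates get an infinite score so they are tried first.
            scores.append(math.inf)
        else:
            scores.append(mean + c * math.sqrt(2.0 * math.log(total) / n))
    return scores

def prune_weakest(operations, mean_rewards, counts, total, c=1.0):
    """Hypothetical helper: drop the operation with the lowest UCB score."""
    scores = ucb_scores(mean_rewards, counts, total, c)
    worst = min(range(len(operations)), key=scores.__getitem__)
    return [op for i, op in enumerate(operations) if i != worst]

# Made-up candidate operations and validation accuracies, for illustration only.
operations = ["conv3x3", "conv5x5", "skip"]
mean_rewards = [0.9, 0.5, 0.7]
counts = [10, 10, 10]

kept = prune_weakest(operations, mean_rewards, counts, total=sum(counts))
print(kept)
```

With equal sample counts the exploration bonus is the same for every candidate, so the operation with the lowest mean reward is the one removed. With unequal counts, a rarely sampled operation keeps a larger bonus and is less likely to be pruned early.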