ISBN (digital): 9781728148038
ISBN (print): 9781728148045
Deep convolutional neural networks (DCNNs) have dominated recent developments in computer vision by producing various record-breaking models. However, it remains a great challenge to obtain powerful DCNNs in resource-limited environments, such as embedded devices and smartphones. Researchers have realized that 1-bit CNNs can be one feasible solution to this issue; however, they are hampered by inferior performance compared to full-precision DCNNs. In this paper, we propose a novel approach, called Bayesian optimized 1-bit CNNs (denoted as BONNs), which takes advantage of Bayesian learning, a well-established strategy for hard problems, to significantly improve the performance of extreme 1-bit CNNs. We incorporate the prior distributions of full-precision kernels and features into the Bayesian framework to construct 1-bit CNNs in an end-to-end manner, which has not been considered in any previous related method. The Bayesian losses are derived with theoretical support to optimize the network simultaneously in both continuous and discrete spaces, aggregating different losses jointly to improve the model capacity. Extensive experiments on the ImageNet and CIFAR datasets show that BONNs achieve the best classification performance compared to state-of-the-art 1-bit CNNs.
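The following is a minimal sketch, not the authors' reference implementation, of the general 1-bit CNN setup the abstract describes: kernels are binarized with a sign function while gradients pass straight through to the full-precision copies, and an assumed prior-style penalty pulls the full-precision kernels toward the binary modes +/-1, standing in for the Bayesian kernel loss. The layer name, the straight-through estimator, and the specific penalty are illustrative assumptions.

    # Hypothetical sketch of a 1-bit convolution with a prior-style kernel penalty.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BinaryConv2d(nn.Module):
        def __init__(self, in_ch, out_ch, k=3, padding=1):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.05)
            self.padding = padding

        def forward(self, x):
            # Binarize kernels with sign(), but let gradients flow to the
            # full-precision weights (straight-through estimator).
            w_bin = torch.sign(self.weight)
            w_bin = self.weight + (w_bin - self.weight).detach()
            return F.conv2d(x, w_bin, padding=self.padding)

        def kernel_prior_loss(self):
            # Assumed prior term: push each full-precision weight toward +1 or -1.
            return ((self.weight.abs() - 1.0) ** 2).mean()

    # Usage: total loss = task loss + lambda * kernel prior loss.
    layer = BinaryConv2d(3, 16)
    x = torch.randn(2, 3, 32, 32)
    loss = layer(x).mean() + 1e-4 * layer.kernel_prior_loss()
    loss.backward()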
We build a virtual agent for learning language in a 2D maze-like world. The agent sees images of the surrounding environment, listens to a virtual teacher, and takes actions to receive rewards. It interactively learns...
This paper presents a novel framework to recover detailed human body shapes from a single image. It is a challenging task due to factors such as variations in human shapes, body poses, and viewpoints. Prior methods ty...
Face anti-spoofing is essential to prevent face recognition systems from security breaches. Much of the progress has been made possible by the availability of face anti-spoofing benchmark datasets in recent years. However, ...
We interpret the variational inference of Stochastic Gradient Descent (SGD) as minimizing a new potential function named the quasi-potential. We analytically construct the quasi-potential function in the case when...
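For context, a hedged sketch of the standard continuous-time view of SGD from which this kind of analysis typically starts; the notation below is illustrative and not taken from the paper:

    % Continuous-time approximation of SGD with learning rate \eta, batch size B,
    % gradient-noise covariance \Sigma, and Wiener process W_t, together with the
    % stationary density written in terms of a potential \Phi.
    d\theta_t = -\nabla L(\theta_t)\,dt
                + \sqrt{\tfrac{\eta}{B}\,\Sigma(\theta_t)}\; dW_t,
    \qquad
    \rho_{\mathrm{ss}}(\theta) \;\propto\; \exp\!\bigl(-\,\Phi(\theta)/T\bigr).

Here \Phi plays the role of the (quasi-)potential governing the stationary distribution \rho_{\mathrm{ss}}; it coincides with the loss L only in special cases, for example when \Sigma is isotropic.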
Document structure analysis, such as zone segmentation and table recognition, is a complex problem in document processing and is an active area of research. The recent success of deep learning in solving various computer vision and machine learning problems has not been reflected in document structure analysis, since conventional neural networks are not well suited to the input structure of the problem. In this paper, we propose an architecture based on graph networks as a better alternative to standard neural networks for table recognition. We argue that graph networks are a more natural choice for these problems, and explore two gradient-based graph neural networks. Our proposed architecture combines the benefits of convolutional neural networks for visual feature extraction and graph networks for dealing with the problem structure. We empirically demonstrate that our method outperforms the baseline by a significant margin. In addition, we identify the lack of large-scale datasets as a major hindrance to deep learning research for structure analysis and present a new large-scale synthetic dataset for the problem of table recognition. Finally, we open-source our implementation of dataset generation and the training framework of our graph networks to promote reproducible research in this direction.
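As a rough illustration of the CNN-plus-graph-network combination described above (the vertex definition, message-passing form, and pair classifier here are assumptions, not the paper's implementation): visual features are extracted by a small CNN, gathered at word/cell positions to form vertex features, refined by message passing over the document graph, and pairs of vertices are then classified (e.g., same row vs. not).

    import torch
    import torch.nn as nn

    class ConvBackbone(nn.Module):
        # Small CNN standing in for the visual feature extractor.
        def __init__(self, out_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, out_dim, 3, padding=1), nn.ReLU(),
            )

        def forward(self, img):                       # img: (1, 1, H, W)
            return self.net(img)                      # (1, out_dim, H, W)

    class MessagePassing(nn.Module):
        # One round of sum-aggregation message passing over a word/cell graph.
        def __init__(self, dim):
            super().__init__()
            self.update = nn.Linear(2 * dim, dim)

        def forward(self, h, edges):                  # h: (N, dim), edges: (E, 2)
            agg = torch.zeros_like(h)
            agg.index_add_(0, edges[:, 0], h[edges[:, 1]])
            return torch.relu(self.update(torch.cat([h, agg], dim=-1)))

    # Gather per-vertex features at word/cell centres, run message passing,
    # then classify vertex pairs (e.g., same-row vs. not).
    backbone, mp = ConvBackbone(), MessagePassing(64)
    pair_head = nn.Linear(2 * 64, 2)

    img = torch.rand(1, 1, 128, 128)
    centers = torch.randint(0, 128, (10, 2))          # (y, x) of 10 word boxes
    fmap = backbone(img)
    h = fmap[0, :, centers[:, 0], centers[:, 1]].t()  # (10, 64) vertex features
    edges = torch.tensor([[i, (i + 1) % 10] for i in range(10)])
    h = mp(h, edges)
    pair_logits = pair_head(torch.cat([h[edges[:, 0]], h[edges[:, 1]]], dim=-1))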
Tables present summarized and structured information to the reader, which makes table structure extraction an important part of document understanding applications. However, table structure identification is a hard problem, not only because of the large variation in table layouts and styles, but also owing to variations in page layouts and noise contamination levels. A lot of research has been done to identify table structure, most of which is based on applying heuristics with the aid of optical character recognition (OCR) to hand-pick layout features of the tables. These methods fail to generalize well because of the variations in table layouts and the errors generated by OCR. In this paper, we propose a robust deep-learning-based approach to extract rows and columns from a detected table in document images with high precision. In the proposed solution, the table images are first pre-processed and then fed to a bi-directional Recurrent Neural Network with Gated Recurrent Units (GRU) followed by a fully-connected layer with softmax activation. The network scans the images from top to bottom as well as left to right and classifies each input as either a row-separator or a column-separator. We have benchmarked our system on the publicly available UNLV and ICDAR 2013 datasets, on which it outperforms state-of-the-art table structure extraction systems by a significant margin.
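A minimal sketch of the scanning idea, assuming each pixel row of the pre-processed table image is fed as one timestep to a bi-directional GRU whose per-timestep softmax marks row-separators (the left-to-right scan for column-separators works the same way on transposed input); sizes and layer choices are illustrative, not the authors' exact configuration.

    import torch
    import torch.nn as nn

    class RowSeparatorNet(nn.Module):
        def __init__(self, img_width, hidden=128):
            super().__init__()
            self.gru = nn.GRU(input_size=img_width, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden, 2)   # separator / not separator

        def forward(self, img):
            # img: (batch, H, W) grayscale table image; each pixel row is one timestep.
            out, _ = self.gru(img)                       # (batch, H, 2*hidden)
            return self.classifier(out)                  # (batch, H, 2) logits

    net = RowSeparatorNet(img_width=256)
    logits = net(torch.rand(1, 512, 256))                # one 512x256 table image
    row_is_separator = logits.softmax(-1).argmax(-1)     # (1, 512) per-row labels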
ISBN (print): 9781538660522
Helpful reviews play a pivotal role in recommending desirable goods and accelerating the purchase decisions of customers in e-commerce services. Given the large proportion of product reviews with unknown helpfulness/unhelpfulness, research on the automatic identification of helpful reviews has drawn much attention in recent years. However, state-of-the-art approaches still rely heavily on extracting heuristic text features from reviews with domain-specific knowledge. In this paper, we first introduce a multi-task neural learning (MTNL) architecture for identifying helpful reviews. The end-to-end neural architecture can learn to reconstruct effective features from the raw input of words and even characters, and the multi-task learning paradigm helps to make more accurate predictions of helpful reviews based on a secondary task which fits the star ratings of reviews. We also build two datasets containing helpful/unhelpful reviews from different product categories on Amazon, and compare the performance of MTNL with several mainstream methods on both datasets. Experimental results confirm that MTNL outperforms the state-of-the-art approaches by a significant margin.
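A hedged sketch of the multi-task idea (the actual MTNL architecture, which also operates on characters, differs in its details): a shared review encoder feeds a main head predicting helpful/unhelpful and an auxiliary head fitting star ratings, and the two losses are combined.

    import torch
    import torch.nn as nn

    class MTNL(nn.Module):
        def __init__(self, vocab_size, emb=128, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb)
            self.encoder = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)
            self.helpful_head = nn.Linear(2 * hidden, 2)   # main task: helpful / unhelpful
            self.rating_head = nn.Linear(2 * hidden, 5)    # auxiliary task: 1-5 stars

        def forward(self, tokens):
            h, _ = self.encoder(self.embed(tokens))        # (B, T, 2*hidden)
            pooled = h.mean(dim=1)                          # simple mean pooling
            return self.helpful_head(pooled), self.rating_head(pooled)

    model = MTNL(vocab_size=10000)
    tokens = torch.randint(0, 10000, (4, 50))
    helpful_logits, rating_logits = model(tokens)
    # Joint objective: main task plus a weighted auxiliary star-rating loss.
    loss = (nn.functional.cross_entropy(helpful_logits, torch.randint(0, 2, (4,)))
            + 0.5 * nn.functional.cross_entropy(rating_logits, torch.randint(0, 5, (4,))))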
Transfer learning has been frequently used to improve deep neural network training by incorporating the weights of pre-trained networks as the starting point of optimization for regularization. While deep transfer learning can usually boost performance with better accuracy and faster convergence, transferring weights from inappropriate networks hurts the training procedure and may lead to even lower accuracy. In this paper, we consider deep transfer learning as minimizing a linear combination of the empirical loss and a regularizer based on pre-trained weights, where the regularizer may restrict the training procedure from lowering the empirical loss when the two terms have conflicting descent directions (i.e., derivatives). Following this view, we propose a novel strategy making regularization-based deep Transfer learning Never Hurt (DTNH) that, for each iteration of the training procedure, computes the derivatives of the two terms separately, then re-estimates a new descent direction that does not hurt the empirical loss minimization while preserving the regularization effects of the pre-trained weights. Extensive experiments have been done using common transfer learning regularizers, such as L2-SP and knowledge distillation, on top of a wide range of deep transfer learning benchmarks including Caltech, MIT Indoor 67, CIFAR-10 and ImageNet. The empirical results show that the proposed descent-direction estimation strategy DTNH always improves the performance of deep transfer learning tasks based on all the above regularizers, even when transferring pre-trained weights from inappropriate networks. All in all, the DTNH strategy improves on state-of-the-art regularizers in all cases, with 0.1%-7% higher accuracy in all experiments.
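A minimal sketch of the general descent-direction re-estimation idea, under the assumption that the conflicting component of the regularizer gradient is simply projected out; the exact rule used by DTNH may differ.

    import numpy as np

    def reestimate_direction(g_emp, g_reg, lam=0.1):
        """g_emp, g_reg: flattened gradients of the empirical loss and the regularizer."""
        g_reg = lam * g_reg
        conflict = np.dot(g_reg, g_emp)
        if conflict < 0:                      # regularizer opposes empirical-loss descent
            g_reg = g_reg - conflict / (np.dot(g_emp, g_emp) + 1e-12) * g_emp
        return g_emp + g_reg                  # descent direction actually applied

    # Toy check: a conflicting regularizer no longer pushes against the loss term.
    g = reestimate_direction(np.array([1.0, 0.0]), np.array([-5.0, 1.0]))
    assert np.dot(g, np.array([1.0, 0.0])) >= 0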