Search results - Inner Mongolia University Library

Bayesian Optimized 1-Bit CNNs

International Conference on Computer Vision (ICCV)

Authors: Jiaxin Gu, Junhe Zhao, Xiaolong Jiang, Baochang Zhang, Jianzhuang Liu, Guodong Guo, Rongrong Ji

Affiliations: Beihang University, Beijing, China; Institute of Deep Learning, Baidu Research, Beijing, China; National Engineering Laboratory for Deep Learning Technology and Application; Huawei Noah's Ark Lab, China; School of Information Science and Engineering, Xiamen University, Fujian, China; Peng Cheng Lab, Shenzhen, China

ISBN: (digital) 9781728148038

ISBN: (print) 9781728148045

Deep convolutional neural networks (DCNNs) have dominated recent developments in computer vision by producing various record-breaking models. However, it is still a great challenge to achieve powerful DCNNs in resource-limited environments, such as embedded devices and smartphones. Researchers have realized that 1-bit CNNs can be one feasible solution to this issue; however, they are hampered by inferior performance compared to full-precision DCNNs. In this paper, we propose a novel approach, called Bayesian optimized 1-bit CNNs (denoted as BONNs), which takes advantage of Bayesian learning, a well-established strategy for hard problems, to significantly improve the performance of extreme 1-bit CNNs. We incorporate the prior distributions of full-precision kernels and features into the Bayesian framework to construct 1-bit CNNs in an end-to-end manner, which has not been considered in any previous related method. The Bayesian losses are derived with theoretical support to optimize the network simultaneously in both continuous and discrete spaces, aggregating different losses jointly to improve the model capacity. Extensive experiments on the ImageNet and CIFAR datasets show that BONNs achieve the best classification performance compared to state-of-the-art 1-bit CNNs.

Keywords: Bayes methods; Kernel; Quantization (signal); Convolution; Training; Frequency modulation; Indexes
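
As a rough illustration of what "1-bit" means in the abstract above, the sketch below binarizes a full-precision kernel with the sign function and a single scaling factor (the mean absolute weight), a common baseline in earlier binary-network work. It is not the Bayesian method of BONNs; the function names `binarize_kernel` and `reconstruct` and the sample weights are illustrative assumptions.

```python
def binarize_kernel(weights):
    """Approximate a full-precision kernel by alpha * sign(w).

    alpha is the mean absolute weight, a common scaling choice in
    binary-network baselines (this is NOT the BONN Bayesian loss).
    """
    if not weights:
        raise ValueError("kernel must contain at least one weight")
    alpha = sum(abs(w) for w in weights) / len(weights)
    # Map each weight to +1 or -1; zero is treated as +1 by convention here.
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def reconstruct(alpha, signs):
    """Rebuild the 1-bit approximation of the kernel."""
    return [alpha * s for s in signs]


kernel = [0.5, -1.5, 1.0, -1.0]  # hypothetical flattened 2x2 kernel
alpha, signs = binarize_kernel(kernel)
approx = reconstruct(alpha, signs)
```

Each weight is stored as a single sign bit plus one shared scale, which is where the memory savings come from; the gap between `kernel` and `approx` is the quantization error that methods such as the one described above try to reduce.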

Bayesian Optimized 1-Bit CNNs

arXiv, 2019

Authors: Gu, Jiaxin; Zhao, Junhe; Jiang, Xiaolong; Zhang, Baochang; Liu, Jianzhuang; Guo, Guodong; Ji, Rongrong

Affiliations: Beihang University, Beijing, China; Institute of Deep Learning, Baidu Research, Beijing, China; National Engineering Laboratory for Deep Learning Technology and Application; Huawei Noah's Ark Lab, China; School of Information Science and Engineering, Xiamen University, Fujian, China; Peng Cheng Lab, Shenzhen, China

Deep convolutional neural networks (DCNNs) have dominated recent developments in computer vision by producing various record-breaking models. However, it is still a great challenge to achieve powerful DCNNs in resource-limited environments, such as embedded devices and smartphones. Researchers have realized that 1-bit CNNs can be one feasible solution to this issue; however, they are hampered by inferior performance compared to full-precision DCNNs. In this paper, we propose a novel approach, called Bayesian optimized 1-bit CNNs (denoted as BONNs), which takes advantage of Bayesian learning, a well-established strategy for hard problems, to significantly improve the performance of extreme 1-bit CNNs. We incorporate the prior distributions of full-precision kernels and features into the Bayesian framework to construct 1-bit CNNs in an end-to-end manner, which has not been considered in any previous related method. The Bayesian losses are derived with theoretical support to optimize the network simultaneously in both continuous and discrete spaces, aggregating different losses jointly to improve the model capacity. Extensive experiments on the ImageNet and CIFAR datasets show that BONNs achieve the best classification performance compared to state-of-the-art 1-bit CNNs. Copyright © 2019, The Authors. All rights reserved.

Keywords: deep neural networks

Kham dialect speech synthesis based on deep learning

2019 International Joint Conference on Information, Media, and Engineering, IJCIME 2019

Authors: Zhang, Weizhao; Yang, Hongwu; Bu, Xiaolong

Affiliations: College of Physics and Electronic Engineering, Engineering Research Center of Gansu Province for Intelligent Information Technology and Application, Northwest Normal University, Lanzhou, China; School of Educational Technology, National and Provincial Joint Engineering Laboratory of Learning Analysis Technology in Online Education, College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, China

ISBN: (print) 9781728155869

In this paper, we constructed a speech synthesis corpus for the Kham dialect. We also designed SAMP-Kham, a machine-readable phonetic labeling scheme for the Kham dialect, and proposed a deep-learning-based framework for Kham dialect speech synthesis. The deep learning architectures include a deep neural network (DNN), a hybrid long short-term memory (LSTM) network, and a hybrid bidirectional long short-term memory (BLSTM) network. Objective and subjective evaluations showed that the quality of the synthesized Kham dialect speech was satisfactory. © 2019 IEEE.

Keywords: deep neural networks

Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent.

arXiv, 2019

Authors: Hu, Wenqing; Zhu, Zhanxing; Xiong, Haoyi; Huan, Jun

Affiliations: Department of Mathematics and Statistics, Missouri University of Science and Technology, University of Missouri, Rolla; Peking University; Beijing Institute of Big Data Research, Beijing, China; Big Data Lab, Baidu Inc.; National Engineering Laboratory of Deep Learning Application and Technology

We interpret the variational inference of Stochastic Gradient Descent (SGD) as minimizing a new potential function named the quasi-potential. We analytically construct the quasi-potential function in the case where the loss function is convex and admits only one global minimum point. We show in this case that the quasi-potential function is related to the noise covariance structure of SGD via a partial differential equation of Hamilton-Jacobi type. This relation helps us to show that anisotropic noise leads to faster escape than isotropic noise. We then consider the dynamics of SGD in the case where the loss function is non-convex and admits several different local minima. In this case, we present an example that shows how the noise covariance structure plays a role in "implicit regularization", a phenomenon in which SGD favors some particular local minimum points. This is done through the relation between the noise covariance structure and the quasi-potential function. Our analysis is based on Large Deviations Theory (LDT) and is validated by numerical experiments. Copyright © 2019, The Authors. All rights reserved.

Keywords: Stochastic systems

The speech synthesis of Yi language based on DNN

2019 International Joint Conference on Information, Media, and Engineering, IJCIME 2019

Authors: Bu, Xiaolong; Yang, Hongwu; Zhang, Weizhao

Affiliations: College of Physics and Electronic Engineering, Engineering Research Center of Gansu Province for Intelligent Information Technology and Application, Northwest Normal University, Lanzhou, China; School of Educational Technology, National and Provincial Joint Engineering Laboratory of Learning Analysis Technology in Online Education, Northwest Normal University, Lanzhou, China

ISBN: (print) 9781728155869

This paper presents a speech synthesis system for the Yi language, a minority language in China, based on a deep neural network (DNN) model. The system comprises relatively complete Yi text analysis, model training, and speech synthesis modules. In the front end, word segmentation, pause handling, word-to-phoneme conversion, and label processing are used to analyze Yi text. We designed the question set for the decision tree used in DNN model training and used the WORLD vocoder for synthesis. The system achieves a relatively good Mean Opinion Score (MOS) of 3.93, as rated by Yi undergraduates, compared with a MOS of 4.58 for the original speech. To investigate the factors affecting the quality of synthesized Yi speech, this paper also objectively evaluates the performance of different training sets and DNN models. The system synthesizes Yi speech for the first time, and the quality of the synthesized speech is relatively good for a complete minority-language speech synthesis system. © 2019 IEEE.

Keywords: deep neural networks
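
The Mean Opinion Score quoted in the abstract above is the arithmetic mean of listener ratings, conventionally on a 1 to 5 scale. A minimal sketch of that computation follows; the function name `mean_opinion_score` and the sample ratings are hypothetical and are not data from the paper.

```python
def mean_opinion_score(ratings, low=1, high=5):
    """Return the arithmetic mean of listener ratings on a [low, high] scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not low <= r <= high:
            raise ValueError(f"rating {r} is outside the {low}-{high} scale")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from five listeners for one synthesized utterance.
sample_ratings = [4, 4, 5, 3, 4]
mos = mean_opinion_score(sample_ratings)
```

In practice a MOS is averaged over many utterances and listeners, and is usually reported alongside the score for natural speech, as the abstract does.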

Detailed human shape estimation from a single image by hierarchical mesh deformation

arXiv, 2019

Authors: Zhu, Hao; Zuo, Xinxin; Wang, Sen; Cao, Xun; Yang, Ruigang

Affiliations: Nanjing University, Nanjing, China; University of Kentucky, Lexington, KY, United States; Northwestern Polytechnical University, Xi'an, China; Baidu Inc., Beijing, China; National Engineering Laboratory of Deep Learning and Technology and Application, China

This paper presents a novel framework to recover detailed human body shapes from a single image. It is a challenging task due to factors such as variations in human shapes, body poses, and viewpoints. Prior methods typically attempt to recover the human body shape using a parametric template that lacks surface details. As a result, the recovered body shape appears to be without clothing. In this paper, we propose a novel learning-based framework that combines the robustness of a parametric model with the flexibility of free-form 3D deformation. We use deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation (HMD) framework, utilizing the constraints from body joints, silhouettes, and per-pixel shading information. We are able to restore detailed human body shapes beyond skinned models. Experiments demonstrate that our method outperforms previous state-of-the-art approaches, achieving better accuracy in terms of both 2D IoU and 3D metric distance. The code is available at https://***/zhuhao-nju/***. Copyright © 2019, The Authors. All rights reserved.

Keywords: deep neural networks
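
The 2D IoU metric mentioned in the abstract above is the intersection-over-union of two binary masks, for example a projected silhouette and a ground-truth silhouette. The sketch below shows the standard definition on small nested lists; the function name `mask_iou` and the sample masks are illustrative assumptions and do not come from the paper's code.

```python
def mask_iou(pred, target):
    """Intersection-over-union of two equally sized binary masks (2D lists of 0/1)."""
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                intersection += 1  # pixel set in both masks
            if p or t:
                union += 1  # pixel set in at least one mask
    # Two empty masks are treated as a perfect match by convention here.
    return intersection / union if union else 1.0


# Hypothetical 2x3 silhouettes: 2 overlapping pixels, 4 pixels in the union.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
iou = mask_iou(pred_mask, true_mask)
```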

CASIA-SURF: A Large-scale Multi-modal Benchmark for Face Anti-spoofing

arXiv, 2019

Authors: Zhang, Shifeng; Liu, Ajian; Wan, Jun; Liang, Yanyan; Guo, Guogong; Escalera, Sergio; Escalante, Hugo Jair; Li, Stan Z.

Affiliations: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China; Macau University of Science and Technology, Macau, China; Institute of Deep Learning, Baidu Research and National Engineering Laboratory for Deep Learning Technology and Application; Universitat de Barcelona, Computer Vision Center, Barcelona, Catalonia; Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla 72840, Mexico

Face anti-spoofing is essential to protect face recognition systems from security breaches. Much progress has been made in recent years thanks to the availability of face anti-spoofing benchmark datasets. However, existing face anti-spoofing benchmarks have a limited number of subjects (≤170) and modalities (≤2), which hinders the further development of the academic community. To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest publicly available dataset for face anti-spoofing in terms of both subjects and modalities. Specifically, it consists of 1,000 subjects with 21,000 videos, and each sample has 3 modalities (i.e., RGB, Depth and IR). We also provide comprehensive evaluation metrics, diverse evaluation protocols, training/validation/testing subsets and a measurement tool, establishing a new benchmark for face anti-spoofing. Moreover, we present a novel multi-modal multi-scale fusion method as a strong baseline, which performs feature re-weighting to select the more informative channel features while suppressing the less useful ones for each modality across different scales. Extensive experiments have been conducted on the proposed dataset to verify its significance and generalization capability. The dataset is available at http://***/***/chalearnfacespoofingattackdete/. Copyright © 2019, The Authors. All rights reserved.

Keywords: Face recognition

Relaxed 2-D principal component analysis by Lp-norm for face recognition

arXiv, 2019

Authors: Chen, Xiao; Jia, Zhi-Gang; Cai, Yunfeng; Zhao, Mei-Xiang

Affiliations: School of Mathematics and Statistics, Jiangsu Key Laboratory of Education Big Data Science and Engineering, Jiangsu Normal University, Xuzhou 221116, China; Baidu Research, National Engineering Laboratory for Deep Learning Technology and Applications, Beijing 100193, China

A relaxed two-dimensional principal component analysis (R2DPCA) approach is proposed for face recognition. Unlike 2DPCA, 2DPCA-L1 and G2DPCA, R2DPCA utilizes the label information (if known) of training samples to calculate a relaxation vector and assigns a weight to each subset of training data. A new relaxed scatter matrix is defined, and the computed projection axes are able to increase the accuracy of face recognition. The optimal Lp-norms are selected in a reasonable range. Numerical experiments on practical face databases indicate that R2DPCA has high generalization ability and can achieve a higher recognition rate than state-of-the-art methods. Copyright © 2019, The Authors. All rights reserved.

Keywords: Face recognition
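
For readers unfamiliar with the Lp-norm referred to in the abstract above, the sketch below computes the standard definition, (sum of |x_i|^p)^(1/p), for a few values of p. It only illustrates the norm itself; it is not the R2DPCA algorithm, and the function name `lp_norm` and the sample vector are assumptions made for this example.

```python
def lp_norm(vector, p):
    """Return (sum_i |x_i|^p)^(1/p) for p > 0.

    For 0 < p < 1 this is only a quasi-norm (the triangle inequality fails),
    but the same formula is commonly used.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(x) ** p for x in vector) ** (1.0 / p)


v = [3.0, -4.0]       # hypothetical sample vector
l1 = lp_norm(v, 1)    # sum of absolute values
l2 = lp_norm(v, 2)    # Euclidean length
```

Smaller values of p weight large entries less heavily, which is one reason L1-style variants of PCA are often described as more robust to outliers than the L2 version.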

DeLS-3D: deep localization and segmentation with a 3D semantic map

arXiv, 2018

Authors: Wang, Peng; Yang, Ruigang; Cao, Binbin; Xu, Wei; Lin, Yuanqing

Affiliations: Baidu Research, National Engineering Laboratory for Deep Learning Technology and Applications

For applications such as augmented reality and autonomous driving, self-localization (camera pose estimation) and scene parsing are crucial technologies. In this paper, we propose a unified framework to tackle these two problems simultaneously. The uniqueness of our design is a sensor fusion scheme that integrates camera videos, motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robustness and efficiency. Specifically, we first obtain an initial coarse camera pose from consumer-grade GPS/IMU, based on which a label map can be rendered from the 3D semantic map. Then, the rendered label map and the RGB image are jointly fed into a pose CNN, yielding a corrected camera pose. In addition, to incorporate temporal information, a multi-layer recurrent neural network (RNN) is further deployed to improve the pose accuracy. Finally, based on the pose from the RNN, we render a new label map, which is fed together with the RGB image into a segment CNN that produces per-pixel semantic labels. In order to validate our approach, we build a dataset with registered 3D point clouds and video camera images. Both the point clouds and the images are semantically labeled. Each video frame has a ground-truth pose from highly accurate motion sensors. We show that, in practice, pose estimation relying solely on images, as in PoseNet [25], may fail due to street view confusion, and that it is important to fuse multiple sensors. Finally, various ablation studies are performed, which demonstrate the effectiveness of the proposed system. In particular, we show that scene parsing and pose estimation are mutually beneficial, yielding a more robust and accurate system. Copyright © 2018, The Authors. All rights reserved.

Keywords: Rendering (computer graphics)