Search results - Inner Mongolia University Library

CNN-Based Audio Front End Processing on Speech Recognition

International Conference on Audio, Language and Image Processing, ICALIP

Authors: Ruchao Fan, Gang Liu. Affiliation: Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing, China

Recently, Convolutional Neural Networks (CNNs) have become widely used in the field of speech recognition. Their role is to mitigate the spectral shifts caused by differences between speakers and environments. However, how best to use CNNs for speech recognition remains an open question. This paper studies the use of a CNN as the audio front-end processing network within an acoustic model architecture based on Long Short-Term Memory and Connectionist Temporal Classification (LSTM-CTC). First, by comparing different model inputs, a Full-Mel Spectrogram feature that accords with auditory characteristics is proposed. After determining the input features, this paper explores the principles of CNN parameter design and the CNN's role in speech recognition through comparative experiments with different parameters. Finally, when decoding with a phoneme-level language model, we achieve a test set error of 16.9% on the TIMIT phoneme recognition benchmark, which is the best known result using CTC as the loss function.

Keywords: Convolution, Kernel, Hidden Markov models, Spectrogram, Computer architecture, Speech recognition, Filter banks

Deep neural network for analysis of DNA methylation data

arXiv, 2018

Authors: Yu, Hong; Ma, Zhanyu. Affiliation: Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing, China

Many studies have demonstrated that DNA methylation, which occurs in the context of a CpG, is strongly correlated with diseases, including cancer. There is strong interest in analyzing DNA methylation data to find out how to distinguish different tumor subtypes. However, conventional statistical methods are not suitable for analyzing high-dimensional DNA methylation data with bounded support. In order to explicitly capture the properties of the data, we design a deep neural network, composed of several stacked binary restricted Boltzmann machines, to learn low-dimensional deep features of the DNA methylation data. Experiments show that these features perform best in cluster analysis of breast cancer DNA methylation data, compared with some state-of-the-art methods. Copyright © 2018, The Authors. All rights reserved.

Keywords: Deep neural networks

Classification of EEG signal based on non-Gaussian neutral vector

arXiv, 2018

Authors: Ma, Zhanyu. Affiliation: Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing, China

In the design of brain-computer interface systems, classification of electroencephalogram (EEG) signals is an essential part and a challenging task. Recently, because marginalized discrete wavelet transform (mDWT) representations can reveal features related to the transient nature of EEG signals, mDWT coefficients have been frequently used in EEG signal classification. In our previous work, we proposed a super-Dirichlet distribution-based classifier, which utilized the nonnegative and sum-to-one properties of the mDWT coefficients. The proposed classifier performed better than the state-of-the-art support vector machine-based classifier. In this paper, we further study the neutrality of the mDWT coefficients. Assuming the vector of mDWT coefficients to be a neutral vector, we transform it nonlinearly into a set of independent scalar coefficients. A feature selection strategy is proposed in the transformed feature domain. Experimental results show that the feature selection strategy helps improve the classification accuracy. Copyright © 2018, The Authors. All rights reserved.

Keywords: Electroencephalography
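
The abstract above does not spell out the nonlinear transformation it applies to the neutral vector. As an illustrative sketch only, the Python code below uses one classical construction for a completely neutral vector: each component is divided by the mass left over after the preceding components, and the resulting ratios are mutually independent. The function names and the choice of this serial form are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def serial_neutral_transform(x):
    """Map a positive, sum-to-one vector x of length K to K-1 ratios
    u_k = x_k / (1 - sum_{j<k} x_j).

    Assumption: this serial form is one standard construction for neutral
    vectors; the paper's exact transform may differ.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0) or not np.isclose(x.sum(), 1.0):
        raise ValueError("x must be strictly positive and sum to one")
    # remaining[k] is the mass not yet used by components before index k.
    remaining = 1.0 - np.concatenate(([0.0], np.cumsum(x[:-1])))
    # The last component is determined by the others, so it is dropped.
    return x[:-1] / remaining[:-1]


def inverse_serial_neutral_transform(u):
    """Rebuild the sum-to-one vector from the ratios u (length K-1)."""
    u = np.asarray(u, dtype=float)
    # stick[k] = prod_{j<k} (1 - u_j): the mass left before component k.
    stick = np.concatenate(([1.0], np.cumprod(1.0 - u)))
    return np.concatenate((u * stick[:-1], stick[-1:]))


if __name__ == "__main__":
    coeffs = [0.5, 0.25, 0.25]  # hypothetical normalized coefficients
    ratios = serial_neutral_transform(coeffs)
    print(ratios)
    print(inverse_serial_neutral_transform(ratios))
```

Because the map is invertible, no information is lost; the ratios are simply a different coordinate system in which per-feature selection becomes meaningful.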

Histogram transform-based speaker identification

arXiv, 2018

Authors: Ma, Zhanyu; Yu, Hong. Affiliation: Pattern Recognition and Intelligent System Lab, Beijing University of Posts and Telecommunications, Beijing, China

A novel text-independent speaker identification (SI) method is proposed. This method uses the Mel-frequency cepstral coefficients (MFCCs) and the dynamic information among adjacent frames as feature sets to capture speaker characteristics. In order to utilize dynamic information, we design super-MFCC features by cascading three neighboring MFCC frames together. The probability density function (PDF) of these super-MFCC features is estimated by the recently proposed histogram transform (HT) method, which generates more training data by random transforms to realize the histogram PDF estimation and reduces the commonly occurring discontinuity problem in multivariate histogram computation. Compared to conventional PDF estimation methods, such as Gaussian mixture models, the HT model shows promising improvement in SI performance. Copyright © 2018, The Authors. All rights reserved.

Keywords: Gaussian distribution
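
The super-MFCC construction described in the abstract above (cascading three neighboring MFCC frames) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the three frames are centered on the current frame and that the first and last frames, which lack a neighbor, are simply dropped. The function name `stack_super_mfcc` and the toy input are hypothetical.

```python
import numpy as np


def stack_super_mfcc(mfcc):
    """Concatenate each frame with its left and right neighbors.

    mfcc: array of shape (num_frames, num_coeffs).
    Returns an array of shape (num_frames - 2, 3 * num_coeffs) whose row i
    is [frame i, frame i + 1, frame i + 2] of the input.

    Assumption: edge frames are dropped rather than padded.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    if mfcc.ndim != 2 or mfcc.shape[0] < 3:
        raise ValueError("need a 2-D array with at least three frames")
    # Three shifted views of the same sequence, joined along the feature axis.
    return np.hstack((mfcc[:-2], mfcc[1:-1], mfcc[2:]))


if __name__ == "__main__":
    # Toy input: 4 frames with 2 coefficients each (real MFCCs have more).
    frames = np.arange(8).reshape(4, 2)
    print(stack_super_mfcc(frames))
```

The stacked vectors carry short-term dynamics directly, at the cost of tripling the dimensionality of the density that has to be estimated.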

Language identification with deep bottleneck features

arXiv, 2018

Authors: Ma, Zhanyu; Yu, Hong. Affiliation: Pattern Recognition and Intelligent System Lab, Beijing University of Posts and Telecommunications, Beijing, China

In this paper, we propose an end-to-end speech language identification (SLD) approach for short utterances, based on a long short-term memory (LSTM) neural network, which is especially suitable for SLD applications in intelligent vehicles. The features used for LSTM learning are generated by a transfer learning method: bottleneck features from a deep neural network (DNN) trained for Mandarin acoustic-phonetic classification are used for LSTM training. In order to improve SLD accuracy on short utterances, a phase vocoder-based time-scale modification (TSM) method is used to reduce and increase the speech rate of the test utterance. By splicing the normal, rate-reduced and rate-increased utterances, we extend the length of test utterances and thereby improve the performance of the SLD system. The experimental results on the AP17-OLR database show that the proposed methods can improve SLD performance, especially on short utterances with durations of 1 s and 3 s. Copyright © 2018, The Authors. All rights reserved.

Keywords: Deep neural networks

TA4REC: Recurrent Neural Networks with Time Attention Factors for Session-based Recommendations

International Joint Conference on Neural Networks

Authors: Yu Sun, Peize Zhao, Honggang Zhang. Affiliation: Pattern Recognition and Intelligent System Lab, Beijing University of Posts and Telecommunications, Beijing, China

Recommender systems are increasingly important with the development of e-commerce, news and multimedia applications. Traditional recommendation algorithms, such as collaborative-filtering-based methods and graph-based methods, mainly use items' original attributes and the relationships between items and users, ignoring the chronological order of items in browsing sessions. In recent years, RNN-based methods have shown their superiority in dealing with sequential data, and some modified RNN models have been proposed. However, these RNN models only use the sequence order of items and neglect items' browsing time information. It is widely accepted that users tend to spend more time on items that interest them, and these items are closely related to users' current target. On this view, items' browsing time is an important feature in recommendations. In this paper, we propose a modified RNN-based recommender system called TA4Rec, which recommends the item most likely to be clicked next. Our main contribution is a method that calculates time-attention factors from the browsing duration of items and adds these factors to the RNN-based model. We conduct experiments on the RecSys Challenge 2015 dataset, and the results show that the TA4Rec model clearly improves on the classic session-based recommender method in session-based recommendation.

Keywords: Logic gates, Training, Recommender systems, Standards, Recurrent neural networks, Logistics

Point adversarial self mining: A simple method for facial expression recognition

arXiv, 2020

Authors: Liu, Ping; Lin, Yuewei; Meng, Zibo; Lu, Lu; Deng, Weihong; Zhou, Joey Tianyi; Yang, Yi. Affiliations: Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore; Centre for Artificial Intelligence, University of Technology Sydney, Sydney, Australia; Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing, China; Brookhaven National Laboratory, Upton, NY, United States; InnoPeak Technology Inc., Palo Alto, CA, United States; Key Laboratory of Medical Molecular Virology, School of Basic Medical Sciences, Fudan University, Shanghai, China

In this paper, we propose a simple yet effective approach, named Point Adversarial Self Mining (PASM), to improve the recognition accuracy in facial expression recognition. Unlike previous works that focus on designing specific architectures or loss functions to solve this problem, PASM boosts the network capability by simulating human learning processes: providing updated learning materials and guidance from more capable teachers. Specifically, to generate new learning materials, PASM leverages a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task, generating harder learning samples to refine the network. The searched position is highly adaptive since it considers both the statistical information of each sample and the teacher network capability. Besides being provided with new learning materials, the student network also receives guidance from the teacher network. After the student network finishes training, it changes its role and acts as a teacher, generating new learning materials and providing stronger guidance to train a better student network. The adaptive generation of learning materials and the teacher/student update can be repeated more than once, improving the network capability iteratively. Extensive experimental results validate the efficacy of our method over existing state-of-the-art methods for facial expression recognition. Copyright © 2020, The Authors. All rights reserved.

Keywords: Iterative methods

Image Caption via Visual Attention Switch on DenseNet

IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC)

Authors: Yanlong Hao, Jiyang Xie, Zhiqing Lin. Affiliation: Pattern Recognition and Intelligent System Lab., Beijing University of Posts and Telecommunications, Beijing, China

We introduce a novel approach for converting images into corresponding language descriptions. This method follows the popular encoder-decoder architecture. The encoder uses the recently proposed densely connected convolutional network (DenseNet) to extract the feature maps. Meanwhile, the decoder uses long short-term memory (LSTM) to parse the feature maps into descriptions. We predict the next word of the description by effectively combining the feature maps with the word embedding of the current input word through a “visual attention switch”. Finally, we compare the performance of the proposed model with other baseline models and achieve good results.

Keywords: Feature extraction, Decoding, Visualization, Switches, Training, Dictionaries, Data models

Robust Back-Stepping Controller on SO(3) for a Quadrotor Attitude Tracking

International Conference on Modelling, Identification and Control (ICMIC)

Authors: Chao Liu, Shengyi Yang, Dacan Luo, Jianqiu Zhou. Affiliation: Faculty of Key Laboratory of Pattern Recognition and Intelligent System of Guizhou Province, Guizhou Minzu University

In this paper, a robust back-stepping controller evolving on SO(3) is developed for maneuvering attitude tracking of a quadrotor. The controller is developed on the configuration manifold SO(3), which avoids the singularity and ambiguity present in traditional methods. In this controller, the back-stepping method is formulated on SO(3) to handle the nonlinearity and complexity of the quadrotor system. In addition, a sliding-mode method is adopted to reject external disturbances and uncertainties, which improves the robustness of the system. Finally, stability analyses are discussed and simulation results are provided to show the performance of the controller.

Keywords: Mathematical model, Manifolds, Aerospace electronics, Attitude control, Uncertainty, Lyapunov methods, Pattern recognition
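
The abstract above does not state how the controller measures attitude error on SO(3). For orientation only, the Python sketch below computes an error vector that is common in geometric attitude control, e_R = 1/2 (R_d^T R - R^T R_d)^vee, which works directly on rotation matrices and so needs no Euler angles. Treat this as an assumption about the general approach, not as the paper's definition; the helper names `hat`, `vee`, `rot_z` and `attitude_error` are hypothetical.

```python
import numpy as np


def hat(v):
    """Map a 3-vector to the skew-symmetric matrix S with S @ w = v x w."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def vee(S):
    """Inverse of hat: extract the 3-vector from a skew-symmetric matrix."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])


def attitude_error(R, R_d):
    """Attitude error between actual rotation R and desired rotation R_d.

    Assumption: this is a commonly used error on SO(3), not necessarily the
    one used in the paper. It is zero when R equals R_d.
    """
    return 0.5 * vee(R_d.T @ R - R.T @ R_d)


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    # A yaw offset of 0.3 rad from the identity attitude.
    print(attitude_error(rot_z(0.3), np.eye(3)))
```

For a pure rotation about one axis, the error points along that axis with magnitude equal to the sine of the angle, so it behaves like the angle itself for small offsets.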