Speech emotion recognition (SER) is an active research topic in academia. A key issue in improving the performance of SER systems is the choice of speech emotion features. To build a robust SER system, it is essential to select features that faithfully represent speech emotion attributes. Researchers have done considerable work, proposing a variety of emotional features and making great progress. Although each kind of feature has proven effective, most methods are based on a single feature type. In this paper, we propose a deep-learning-based feature fusion method that combines spectral-based features with pitch-based hyper-prosodic features. The experiments show that this method improves the performance of the speech emotion recognition system.
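As a rough illustration of combining two feature types, the sketch below shows the simplest form of fusion: normalizing each feature group and concatenating them into one vector. This is an assumption made for illustration only; the abstract does not describe the fusion network itself, and the function names and feature values here are hypothetical.

```python
import math

def z_normalize(values):
    """Scale a feature vector to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for constant vectors
    return [(v - mean) / std for v in values]

def fuse_features(spectral, prosodic):
    """Early fusion (assumed form): normalize each group, then concatenate."""
    return z_normalize(spectral) + z_normalize(prosodic)

# Hypothetical utterance-level features; the numbers are made up.
spectral = [12.1, -3.4, 5.0, 0.7]   # e.g. averaged spectral coefficients
prosodic = [180.0, 35.0, 240.0]     # e.g. pitch mean, std, max in Hz
fused = fuse_features(spectral, prosodic)
print(len(fused), [round(x, 3) for x in fused])
```

Normalizing each group separately keeps the large-valued pitch statistics from dominating the spectral coefficients once the two are joined; a learned fusion layer would typically take such a vector as its input.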
Recently, convolutional neural networks (CNNs) have become widely used in the field of speech recognition. Their role is to mitigate the spectral shifts caused by differences between speakers and environments. However, how best to use CNNs for speech recognition remains an open question. This paper studies the use of a CNN as the audio front-end processing network for an acoustic model architecture based on Long Short-Term Memory and Connectionist Temporal Classification (LSTM-CTC). First, by comparing different model inputs, we propose a Full-Mel Spectrogram feature that accords with auditory characteristics. With the input features fixed, the paper explores the principles of CNN parameter design and the CNN's role in speech recognition through comparative experiments with different parameters. Finally, when decoding with a phoneme-level language model, we achieve a test set error of 16.9% on the TIMIT phoneme recognition benchmark, which is the best known result using CTC as the loss function.
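To make the CTC output and the reported error metric concrete, here is a minimal sketch of CTC best-path (greedy) decoding followed by a phoneme error rate computed with edit distance. It does not use the language-model decoding described in the abstract; the blank symbol `_`, the function names, and the toy label sequences are assumptions for illustration.

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse repeated labels, then drop blanks (CTC best-path rule)."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded

def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (single-row DP)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            prev_diag, row[j] = row[j], min(
                row[j] + 1,        # deletion
                row[j - 1] + 1,    # insertion
                prev_diag + cost,  # substitution or match
            )
    return row[len(hyp)]

def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    return edit_distance(ref, hyp) / len(ref)

# Toy per-frame argmax labels; a blank separates the two "iy" tokens.
frames = ["_", "sh", "sh", "_", "iy", "iy", "_", "_", "iy"]
hypothesis = ctc_greedy_decode(frames)
reference = ["sh", "iy", "hh"]
print(hypothesis, phoneme_error_rate(reference, hypothesis))
```

The blank between the two runs of `iy` is what lets CTC emit the same phoneme twice in a row; without it the repeats would collapse into one label.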
Many studies have demonstrated that DNA methylation, which occurs in the context of a CpG, correlates strongly with diseases, including cancer. There is strong interest in analyzing DNA methylation data to ...
In the design of brain-computer interface systems, classification of electroencephalogram (EEG) signals is an essential and challenging task. Recently, as the marginalized discrete wavelet transform (mDWT) rep...
A novel text-independent speaker identification (SI) method is proposed. This method uses the Mel-frequency Cepstral coefficients (MFCCs) and the dynamic information among adjacent frames as feature sets to capture sp...
In this paper, we propose an end-to-end speech language identification (SLD) approach for short utterances, based on a Long Short-Term Memory (LSTM) neural network, which is especially suitable for SLD applications in intelligen...
Recommender systems are increasingly important with the development of e-commerce, news, and multimedia applications. Traditional recommendation algorithms, such as collaborative-filtering-based methods and graph-based methods, mainly use items' original attributes and the relationships between items and users, ignoring the chronological order of items in browsing sessions. In recent years, RNN-based methods have shown their superiority in dealing with sequential data, and some modified RNN models have been proposed. However, these RNN models use only the sequence order of items and neglect how long each item was browsed. It is widely accepted that users tend to spend more time on items that interest them, and that these items are closely related to the users' current target. On this view, an item's browsing time is an important feature for recommendation. In this paper, we propose a modified RNN-based recommender system called TA4Rec, which recommends the item most likely to be clicked next. Our main contribution is a method for calculating time-attention factors from the time spent browsing each item and adding these factors to the RNN-based model. We conduct experiments on the RecSys Challenge 2015 dataset, and the results show that TA4Rec clearly improves on the classic session-based recommender method in session-based recommendation.
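The idea of turning browsing durations into per-item weights can be sketched as follows. The abstract does not give TA4Rec's actual formula, so the softmax form, the `temperature` parameter, and the function names below are assumptions chosen only to show how dwell time might modulate the input to a sequence model.

```python
import math

def time_attention_factors(dwell_seconds, temperature=30.0):
    """Softmax over dwell times (assumed form): longer views get larger weights."""
    scaled = [t / temperature for t in dwell_seconds]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_factors(item_embeddings, factors):
    """Scale each item's embedding by its time-attention factor."""
    return [[f * x for x in emb] for emb, f in zip(item_embeddings, factors)]

# Hypothetical session: three items with made-up dwell times and embeddings.
dwell = [5.0, 60.0, 20.0]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
factors = time_attention_factors(dwell)
weighted = apply_factors(embeddings, factors)
print([round(f, 3) for f in factors])
```

In a full model, the weighted embeddings would be fed to the RNN step by step, so that items the user lingered on contribute more to the hidden state than items skimmed past.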
In this paper, we propose a simple yet effective approach, named Point Adversarial Self Mining (PASM), to improve the recognition accuracy in facial expression recognition. Unlike previous works focusing on designing ...
We introduce a novel approach for converting images into corresponding language descriptions. The method follows the popular encoder-decoder architecture. The encoder uses the recently proposed densely connected convolutional network (DenseNet) to extract feature maps, while the decoder uses long short-term memory (LSTM) to parse the feature maps into descriptions. We predict the next word of a description by effectively combining the feature maps with the word embedding of the current input word through a “visual attention switch”. Finally, we compare the performance of the proposed model with baseline models and achieve good results.
In this paper, a robust back-stepping controller evolving on SO(3) is developed for maneuvering attitude tracking of a quadrotor. The controller is developed on the configuration manifold SO(3), which avoids the singularities and ambiguities of traditional methods. The back-stepping method is formulated on SO(3) to handle the nonlinearity and complexity of the quadrotor system. In addition, a sliding-mode method is adopted to counteract external disturbances and uncertainties, which improves the robustness of the system. Stability analyses are discussed, and simulation results are provided to show the performance of the controller.
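To illustrate what working directly on SO(3) looks like, the sketch below computes an attitude error from rotation matrices alone, with no Euler angles or quaternions involved. The abstract does not state the paper's error definition; the form used here, e_R = ½ (R_dᵀR − RᵀR_d)^∨, is a common choice in geometric attitude control and is an assumption, as are all function names. It is not the back-stepping or sliding-mode controller itself.

```python
import math

def transpose(m):
    """Transpose a 3x3 matrix given as nested lists."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def vee(s):
    """Map a skew-symmetric matrix to its 3-vector (inverse of the hat map)."""
    return [s[2][1], s[0][2], s[1][0]]

def attitude_error(R, R_d):
    """Assumed error form: e_R = 0.5 * vee(R_d^T R - R^T R_d)."""
    a = matmul(transpose(R_d), R)
    b = matmul(transpose(R), R_d)
    skew = [[a[i][j] - b[i][j] for j in range(3)] for i in range(3)]
    return [0.5 * x for x in vee(skew)]

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
error = attitude_error(rot_z(0.3), identity)
print([round(x, 4) for x in error])
```

For a pure yaw offset of θ from the desired attitude, this error reduces to (0, 0, sin θ), and it vanishes when the two attitudes coincide. Because it is built from matrix products only, it is defined for every attitude, which is the practical sense in which a controller on SO(3) sidesteps the singularities of angle-based parameterizations.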