Background: Common data models solve many challenges of standardizing electronic health record (EHR) data, but are unable to semantically integrate all the resources needed for deep phenotyping. Open Biological and Bi...
ISBN:
(Print) 9781538656280; 9781538656273
A complete emotional expression typically follows a complex temporal course in natural conversation. Related research on utterance-level, segment-level, and multi-level processing lacks an understanding of the underlying relations in emotional speech. In this work, a convolutional neural network (CNN) with audio word-based embedding is proposed for emotion modeling. Vector quantization with the k-means algorithm is first applied to convert the low-level features of each speech frame into audio words. Word2vec is then adopted to convert an input speech utterance into the corresponding audio word vector sequence. Finally, the audio word vector sequences of the emotion-annotated training speech data are used to construct the CNN-based emotion model. The NCKU-ES database, containing seven emotion categories (happiness, boredom, anger, anxiety, sadness, surprise, and disgust), was collected, and five-fold cross-validation was used to evaluate the performance of the proposed CNN-based method for speech emotion recognition. Experimental results show that the proposed method achieved an emotion recognition accuracy of 82.34%, an improvement of 8.7% over the Long Short-Term Memory (LSTM)-based method, which faced the challenging issue of long input sequences. Compared with raw features, the audio word-based embedding achieved an improvement of 3.4% for speech emotion recognition.
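The quantization-plus-embedding front end described above can be sketched compactly. The following Python fragment is an illustrative assumption of that pipeline (the feature dimension, codebook size, and embedding width are hypothetical), using scikit-learn for k-means and gensim for word2vec:

```python
# Hypothetical sketch of the audio-word pipeline: quantize frame-level
# features into discrete "audio words" with k-means, then train word2vec
# over the resulting audio-word sequences. Feature extraction, codebook
# size, and embedding dimension are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from gensim.models import Word2Vec

rng = np.random.default_rng(0)
# Stand-in for low-level frame features (e.g., MFCCs): a list of
# utterances, each an array of shape (num_frames, feature_dim).
utterances = [rng.normal(size=(rng.integers(80, 200), 39)) for _ in range(50)]

# 1) Build the audio-word codebook by clustering all frames.
all_frames = np.vstack(utterances)
codebook = KMeans(n_clusters=256, n_init=10, random_state=0).fit(all_frames)

# 2) Convert each utterance into a sequence of audio-word tokens.
def to_audio_words(frames):
    return [f"w{idx}" for idx in codebook.predict(frames)]

corpus = [to_audio_words(u) for u in utterances]

# 3) Train word2vec on the audio-word corpus; each utterance then maps
#    to a (num_frames, embedding_dim) vector sequence for the CNN input.
w2v = Word2Vec(sentences=corpus, vector_size=64, window=5, min_count=1)
sequence = np.stack([w2v.wv[w] for w in corpus[0]])
print(sequence.shape)  # (num_frames, 64)
```

Each utterance thus becomes a fixed-width embedding sequence suitable as input to the CNN-based emotion model.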
The Association for the Advancement of Artificial Intelligence 2020 Workshop program included twenty-three workshops covering a wide range of topics in artificial intelligence. This report contains the required report...
ISBN:
(Print) 9781538653128
This study proposes a long short-term memory (LSTM)-based approach to text emotion recognition based on the semantic word vector and emotional word vector of the input text. For each word in an input text, the semantic word vector is extracted from the word2vec model. In addition, each lexical word is projected onto all the emotional words defined in an affective lexicon to derive an emotional word vector. An autoencoder is then adopted to obtain bottleneck features from the emotional word vector for dimensionality reduction. The autoencoder bottleneck features are concatenated with the features of the semantic word vector to form the final textual features for emotion recognition. Finally, given the textual feature sequence of the entire sentence, the LSTM is used for emotion recognition by modeling the contextual emotion evolution of the input text. For evaluation, the NLPCC-MHMC-TE database, containing seven emotion categories (anger, boredom, disgust, anxiety, happiness, sadness, and surprise), was constructed and used. Five-fold cross-validation was employed to evaluate the performance of the proposed method. Experimental results show that the proposed LSTM-based method achieved a recognition accuracy of 70.66%, improving by 5.33% over the CNN-based method. Moreover, the integration of the semantic word vector and emotional word vector outperformed either feature vector used individually.
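A minimal PyTorch sketch of the described feature fusion and LSTM classifier follows; the word2vec dimension, lexicon size, bottleneck width, and hidden size are illustrative assumptions, since the abstract does not specify them:

```python
# A minimal sketch of the architecture in the abstract. The emotional
# word vector is taken to be per-word similarities to the affective-
# lexicon entries; all dimensions here are assumptions.
import torch
import torch.nn as nn

SEM_DIM, LEX_DIM, BOTTLENECK, HIDDEN, NUM_CLASSES = 300, 1000, 64, 128, 7

class EmotionLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        # Autoencoder that compresses the emotional word vector;
        # the encoder output is the bottleneck feature.
        self.encoder = nn.Sequential(nn.Linear(LEX_DIM, BOTTLENECK), nn.ReLU())
        self.decoder = nn.Linear(BOTTLENECK, LEX_DIM)  # for AE pretraining only
        self.lstm = nn.LSTM(SEM_DIM + BOTTLENECK, HIDDEN, batch_first=True)
        self.classifier = nn.Linear(HIDDEN, NUM_CLASSES)

    def forward(self, semantic, emotional):
        # semantic:  (batch, seq_len, SEM_DIM) word2vec vectors
        # emotional: (batch, seq_len, LEX_DIM) lexicon-projection vectors
        bottleneck = self.encoder(emotional)
        features = torch.cat([semantic, bottleneck], dim=-1)
        _, (h_n, _) = self.lstm(features)   # final hidden state
        return self.classifier(h_n[-1])     # (batch, NUM_CLASSES)

model = EmotionLSTM()
logits = model(torch.randn(4, 20, SEM_DIM), torch.randn(4, 20, LEX_DIM))
print(logits.shape)  # torch.Size([4, 7])
```

In practice the autoencoder would be pretrained with a reconstruction loss through self.decoder before the LSTM is trained on the concatenated features.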
ISBN:
(Print) 9781538653128
In clinical diagnosis of mood disorders, a large portion of bipolar disorder patients (BDs) are misdiagnosed as having unipolar depression (UD). Clinicians have confirmed that BDs generally show "reduced affect" during clinical treatment. It is therefore desirable to build an objective, one-time diagnostic system for diagnosis assistance using machine-learning techniques. In this study, facial expressions of BD, UD, and control (C) groups elicited by emotional video clips are collected to explore the temporal fluctuation characteristics of facial muscle expression intensities among the three groups. The differences in facial expressions among mood disorders are investigated by observing macroscopic fluctuations, and corresponding methods for feature extraction and modeling are proposed. From the viewpoint of macroscopic facial expression, action units (AUs) are applied to describe the temporal transformation of muscles. The modulation spectrum is then used to extract short-term variations of the AU intensities. A multilayer perceptron (MLP)-based disorder prediction model is applied to obtain the prediction results. For evaluation of the proposed method, 12 subjects per group were included in K-fold (K=12) cross-validation experiments. The experiments reached 61.1% classification accuracy, outperforming the other baseline methods.
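One plausible reading of this feature pipeline treats the modulation spectrum as the low-frequency magnitude FFT of each AU intensity trajectory, with the per-AU spectra concatenated and fed to an MLP. The AU count, clip length, and network size in this Python sketch are assumptions:

```python
# Hedged sketch: AU intensity trajectories -> per-AU modulation spectra
# (magnitude FFT) -> concatenated features -> MLP classifier.
# AU count, frame count, and spectrum bins are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
NUM_AUS, NUM_FRAMES, NUM_BINS = 17, 300, 32

def modulation_spectrum(au_track, num_bins=NUM_BINS):
    """Low-frequency magnitude spectrum of one AU intensity trajectory."""
    spectrum = np.abs(np.fft.rfft(au_track - au_track.mean()))
    return spectrum[:num_bins]

def clip_features(au_intensities):
    """Concatenate per-AU modulation spectra: (NUM_AUS, T) -> flat vector."""
    return np.concatenate([modulation_spectrum(t) for t in au_intensities])

# Stand-in data: 36 clips (12 per group); labels 0=BD, 1=UD, 2=C.
X = np.stack([clip_features(rng.normal(size=(NUM_AUS, NUM_FRAMES)))
              for _ in range(36)])
y = np.repeat([0, 1, 2], 12)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:3]))
```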
Generative adversarial networks (GANs) have achieved impressive success on cross-domain generation, but they face difficulty in cross-modal generation due to the lack of a common distribution between heterogeneous data. Most existing conditional cross-modal GANs adopt a one-directional transfer strategy and have achieved preliminary success on text-to-image transfer. Instead of learning the transfer between different modalities, we aim to learn a synchronous latent space representing the cross-modal common concept. A novel network component named the synchronizer is proposed in this work to judge whether paired data are synchronous (corresponding) or not, which constrains the latent space of the generators in the GAN. Our model, named SyncGAN, can successfully generate synchronous data (e.g., a pair of image and sound) from identical random noise. To transform data from one modality to another, we recover the latent code by inverting the mappings of a generator and use it to generate data in the other modality. In addition, the proposed model supports semi-supervised learning, which makes it more flexible for practical applications.
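The synchronizer idea can be illustrated with a small PyTorch sketch: two generators share one latent code, and a binary classifier scores whether an (image, sound) pair came from the same code. The architectures and dimensions below are assumptions, not the paper's exact design:

```python
# Illustrative synchronizer sketch (assumed architectures and sizes).
import torch
import torch.nn as nn

LATENT, IMG_DIM, SND_DIM = 64, 784, 128

g_image = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, IMG_DIM))
g_sound = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, SND_DIM))

# Synchronizer: scores concatenated cross-modal pairs; output near 1
# means "synchronous", near 0 means "mismatched".
synchronizer = nn.Sequential(
    nn.Linear(IMG_DIM + SND_DIM, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(8, LATENT)                      # one shared latent code
pair_sync = torch.cat([g_image(z), g_sound(z)], dim=-1)
# Mismatched pairs: sound generated from a shuffled latent batch.
pair_mis = torch.cat([g_image(z), g_sound(z[torch.randperm(8)])], dim=-1)

bce = nn.BCELoss()
# The synchronizer loss pushes shared-latent pairs toward 1 and
# mismatched pairs toward 0, constraining the generators' latent space.
loss = bce(synchronizer(pair_sync), torch.ones(8, 1)) + \
       bce(synchronizer(pair_mis), torch.zeros(8, 1))
print(loss.item())
```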
Natural Language Processing (NLP) systems often make use of machine learning techniques that are unfamiliar to end-users who are interested in analyzing clinical records. Although NLP has been widely used in extracting...
Existing opinion analysis techniques rely on the clues within the sentence that focus on the sentiment analysis task itself. However, the sentiment analysis task is not isolated from other NLP tasks (co-reference reso...