ISBN: 9781728193205 (digital)
ISBN: 9781728193236 (print)
Blind music source separation has been a popular and active subject of research in both the music information retrieval and signal processing communities. To counter the lack of available multi-track data for supervised model training, recent works have shown the usefulness of a data augmentation method that creates artificial mixtures by combining tracks from different songs. Following this line of work, this paper examines extended data augmentation methods that consider the more sophisticated mixing settings employed in modern music production, the relationship between the tracks to be combined, and factors of silence. As a case study, we consider the separation of violin and piano tracks in a violin-piano ensemble, evaluating performance in terms of the common metrics SDR, SIR, and SAR (signal-to-distortion, signal-to-interference, and signal-to-artifacts ratios). In addition to examining the effectiveness of these new data augmentation methods, we also study the influence of the amount of training data. Our evaluation shows that the proposed mixing-specific data augmentation methods can help improve the performance of a deep learning-based source separation model, especially when training data is limited.
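The basic cross-song mixing augmentation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the ±6 dB random gain (a crude stand-in for mixing-level choices), and the rule of discarding near-silent segments are all assumptions. The abstract does not specify the paper's mixing-specific settings or exactly how silence is treated.

```python
import random
from typing import List, Optional, Sequence, Tuple

Signal = List[float]


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a segment (0.0 for an empty segment)."""
    return (sum(v * v for v in x) / len(x)) ** 0.5 if x else 0.0


def random_mix(violin_stems: List[Signal],
               piano_stems: List[Signal],
               seg_len: int,
               gain_db_range: Tuple[float, float] = (-6.0, 6.0),
               silence_threshold: float = 1e-3,
               max_tries: int = 100,
               rng: Optional[random.Random] = None) -> Tuple[Signal, Signal, Signal]:
    """Build one artificial (mixture, violin, piano) training triple from
    segments drawn independently, so they usually come from different songs."""
    if rng is None:
        rng = random.Random()
    for _ in range(max_tries):
        v = rng.choice(violin_stems)
        p = rng.choice(piano_stems)
        if len(v) < seg_len or len(p) < seg_len:
            continue  # stem too short for the requested segment length
        vs = rng.randrange(len(v) - seg_len + 1)
        ps = rng.randrange(len(p) - seg_len + 1)
        v_seg, p_seg = v[vs:vs + seg_len], p[ps:ps + seg_len]
        # Hypothetical silence rule: reject pairs where either source is near-silent.
        if rms(v_seg) < silence_threshold or rms(p_seg) < silence_threshold:
            continue
        # Independent random gain per source, drawn in dB and converted to linear.
        gv = 10 ** (rng.uniform(*gain_db_range) / 20)
        gp = 10 ** (rng.uniform(*gain_db_range) / 20)
        v_seg = [gv * s for s in v_seg]
        p_seg = [gp * s for s in p_seg]
        mixture = [a + b for a, b in zip(v_seg, p_seg)]
        return mixture, v_seg, p_seg
    raise ValueError("no suitable non-silent segment pair found")
```

Because the returned sources are exactly the scaled components of the mixture, each triple can serve directly as a supervised (input, targets) example for a separation model.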
ISBN: 9781728128207 (digital)
ISBN: 9781728128214 (print)
Improving vocabulary is very important in language learning: if we can learn and remember words effectively, we can master a language more quickly. Many scholars have therefore proposed related research. Because of the learning mechanism of the human brain, people may forget new knowledge within a short time of learning it. To make the model more complete, after analyzing Hermann Ebbinghaus's forgetting curve experiment, we added two variables: the acceptance of each word by a given person, and the ability of different people to remember vocabulary. With these two parameters, we design a system that helps users review vocabulary they may have forgotten. The forgetting curve can thus be personalized, allowing a more accurate calculation of the best time for each user to review the vocabulary.
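A personalized review time of the kind described above can be sketched with an exponential forgetting curve. This is a hedged illustration only: the form R(t) = exp(-t/S), the multiplicative combination of a per-word acceptance factor and a per-user ability factor into the stability S, and the 0.8 retention threshold are assumptions, not the paper's stated model.

```python
import math


def retention(t_hours: float, stability: float) -> float:
    """Ebbinghaus-style exponential forgetting curve R(t) = exp(-t / S)."""
    return math.exp(-t_hours / stability)


def personalized_stability(base_stability: float,
                           word_acceptance: float,
                           user_ability: float) -> float:
    """Hypothetical model: scale a base memory stability (in hours) by a
    per-word acceptance factor and a per-user memory-ability factor."""
    if min(base_stability, word_acceptance, user_ability) <= 0:
        raise ValueError("all factors must be positive")
    return base_stability * word_acceptance * user_ability


def hours_until_review(stability: float, threshold: float = 0.8) -> float:
    """Time at which predicted retention drops to `threshold`:
    solving exp(-t / S) = threshold gives t = -S * ln(threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return -stability * math.log(threshold)
```

Under these assumptions, a word the user finds hard (acceptance below 1) gets a smaller stability and is therefore scheduled for review sooner than an easy word for the same user.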
ISBN: 9781538653128 (print)
This study proposes a long short-term memory (LSTM)-based approach to text emotion recognition that uses the semantic word vector and the emotional word vector of the input text. For each word in an input text, the semantic word vector is extracted from the word2vec model. In addition, each lexical word is projected onto all the emotional words defined in an affective lexicon to derive an emotional word vector. An autoencoder is then adopted to obtain bottleneck features from the emotional word vector for dimensionality reduction. The autoencoder bottleneck features are concatenated with the semantic word vector to form the final textual features for emotion recognition. Finally, given the textual feature sequence of the entire sentence, the LSTM performs emotion recognition by modeling the contextual emotion evolution of the input text. For evaluation, the NLPCC-MHMC-TE database, containing seven emotion categories (anger, boredom, disgust, anxiety, happiness, sadness, and surprise), was constructed and used. Five-fold cross-validation was employed to evaluate the performance of the proposed method. Experimental results show that the proposed LSTM-based method achieved a recognition accuracy of 70.66%, an improvement of 5.33% over the CNN-based method. Moreover, the proposed method integrating the semantic and emotional word vectors outperformed methods using either feature vector alone.
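The per-word feature construction described above (lexicon projection, bottleneck encoding, concatenation) can be sketched as follows. Several details are assumptions not given in the abstract: that the projection onto the lexicon is a cosine similarity between embedding vectors, that the encoder is a single tanh layer, and that its weights come from an already trained autoencoder. The LSTM classifier that consumes the feature sequence is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def emotional_word_vector(word_vec: Sequence[float],
                          lexicon_vecs: Sequence[Sequence[float]]) -> Vector:
    """Project one word onto every emotional word in the lexicon,
    giving one similarity score per lexicon entry (assumed projection)."""
    return [cosine(word_vec, e) for e in lexicon_vecs]


def encode(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> Vector:
    """Bottleneck layer of an (assumed already trained) autoencoder:
    tanh(W x + b), mapping len(x) inputs to len(bias) outputs."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def textual_feature(semantic_vec: Sequence[float], bottleneck: Sequence[float]) -> Vector:
    """Final per-word feature: semantic vector concatenated with bottleneck features."""
    return list(semantic_vec) + list(bottleneck)
```

Since the lexicon can contain many emotional words, the emotional word vector can be long; the bottleneck keeps the concatenated feature compact before it is fed, word by word, to the sequence model.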
Generative adversarial networks (GANs) have achieved impressive success in cross-domain generation, but they struggle with cross-modal generation due to the lack of a common distribution between heterogeneous data. Most existing conditional cross-modal GANs adopt a one-directional transfer strategy and have achieved preliminary success on text-to-image transfer. Instead of learning the transfer between different modalities, we aim to learn a synchronous latent space representing the cross-modal common concept. A novel network component named synchronizer is proposed in this work to judge whether paired data are synchronous (corresponding) or not, which constrains the latent space of the generators in the GANs. Our GAN model, named SyncGAN, can successfully generate synchronous data (e.g., a pair of image and sound) from identical random noise. To transform data from one modality to another, we recover the latent code by inverting the mapping of a generator and use it to generate data of the other modality. In addition, the proposed model supports semi-supervised learning, which makes it more flexible for practical applications.
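The cross-modal transfer step above (recover the latent code from one modality, then regenerate the other) can be illustrated with gradient-based generator inversion, one common way to invert a generator. The abstract does not say which inversion procedure SyncGAN uses, and the two linear "generators" below are hypothetical toys standing in for trained networks that share one latent space.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, z: Sequence[float]) -> Vector:
    return [sum(aij * zj for aij, zj in zip(row, z)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def invert_linear_generator(a: Matrix, x: Sequence[float],
                            steps: int = 500, lr: float = 0.1) -> Vector:
    """Recover z minimizing ||A z - x||^2 by gradient descent.
    The gradient is 2 A^T (A z - x); the iteration converges when
    lr < 1 / lambda_max(A^T A)."""
    z = [0.0] * len(a[0])
    a_t = transpose(a)
    for _ in range(steps):
        residual = [g - xi for g, xi in zip(matvec(a, z), x)]
        grad = [2.0 * g for g in matvec(a_t, residual)]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


# Hypothetical toy generators sharing one 2-D latent space.
G_SOUND: Matrix = [[2.0, 0.0], [0.0, 1.0]]
G_IMAGE: Matrix = [[1.0, 1.0], [0.0, 3.0]]


def sound_to_image(sound: Sequence[float]) -> Vector:
    """Invert the sound generator, then decode the latent code as an image."""
    z = invert_linear_generator(G_SOUND, sound)
    return matvec(G_IMAGE, z)
```

With real networks the same loop applies, but the gradient comes from backpropagation through the generator rather than from a closed-form matrix expression.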
ISBN: 9781538653128 (print)
In the clinical diagnosis of mood disorders, a large portion of bipolar disorder patients (BDs) are misdiagnosed with unipolar depression (UD). Clinicians have confirmed that BDs generally show "reduced affect" during clinical treatment. It is therefore desirable to build an objective, one-time diagnosis system that uses machine-learning techniques to assist diagnosis. In this study, facial expressions of the BD, UD, and control (C) groups elicited by emotional video clips are collected to explore the temporal fluctuation characteristics of facial muscle expression intensities among the three groups. The differences in facial expressions among mood disorders are investigated by observing macroscopic fluctuations, and corresponding methods for feature extraction and modeling are proposed. From the viewpoint of macroscopic facial expression, action units (AUs) are used to describe the temporal transformation of the muscles. The modulation spectrum is then used to extract the short-term variation of the AUs, and a multilayer perceptron (MLP)-based disorder prediction model is applied to obtain the prediction results. To evaluate the proposed method, 12 subjects from each of the three groups were included in K-fold (K=12) cross-validation experiments. The proposed method reached a classification accuracy of 61.1%, outperforming the other baseline methods.
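The modulation-spectrum step described above can be sketched for a single AU intensity trajectory. This is an assumption-laden illustration: the window and hop sizes, the per-window mean removal, and the use of a plain DFT magnitude are choices made here, since the abstract does not give the paper's exact configuration. The MLP classifier is not shown.

```python
import cmath
from typing import List, Sequence


def dft_magnitude(x: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N//2 (naive O(N^2) DFT, fine for short windows)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def modulation_spectrum(au_track: Sequence[float], win: int, hop: int) -> List[List[float]]:
    """Short-term modulation spectrum of one AU intensity trajectory:
    slide a window over the track, remove each window's mean, and take the
    DFT magnitude, so bin k reflects fluctuations at k cycles per window."""
    frames = []
    for start in range(0, len(au_track) - win + 1, hop):
        seg = au_track[start:start + win]
        mean = sum(seg) / win
        frames.append(dft_magnitude([v - mean for v in seg]))
    return frames
```

An AU that switches on and off every 8 frames, analyzed with a 16-frame window, produces its strongest bin at k = 2 (two cycles per window). A "reduced affect" pattern would, under this representation, show up as lower energy across the non-zero bins.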
ISBN: 9781538656280; 9781538656273 (print)
In natural conversation, a complete emotional expression typically follows a complex temporal course. Related research on utterance-level, segment-level, and multi-level processing lacks an understanding of the underlying relations in emotional speech. In this work, a convolutional neural network (CNN) with audio word-based embedding is proposed for emotion modeling. Vector quantization is first applied to convert the low-level features of each speech frame into audio words using the k-means algorithm. Word2vec is then adopted to convert an input speech utterance into the corresponding audio word vector sequence. Finally, the audio word vector sequences of the emotion-annotated training speech data are used to construct the CNN-based emotion model. The NCKU-ES database, containing seven emotion categories (happiness, boredom, anger, anxiety, sadness, surprise, and disgust), was collected, and five-fold cross-validation was used to evaluate the proposed CNN-based method for speech emotion recognition. Experimental results show that the proposed method achieved an emotion recognition accuracy of 82.34%, an improvement of 8.7% over the long short-term memory (LSTM)-based method, which struggles with long input sequences. Compared with raw features, the audio word-based embedding improved speech emotion recognition by 3.4%.
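The vector-quantization step described above (frame features to audio words via k-means) can be sketched as follows. This is a minimal illustration using plain Lloyd's k-means with a seeded random initialization, both assumptions. The codebook size, the frame features used, and the subsequent word2vec and CNN stages are not reproduced here.

```python
import random
from typing import List, Sequence

Point = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 50, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means; returns the codebook (one centroid per audio word)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty.
        new_centroids = [
            [sum(dim) / len(members) for dim in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids


def to_audio_words(frames: Sequence[Sequence[float]], codebook: List[Point]) -> List[int]:
    """Vector quantization: replace each frame by the index of its nearest centroid."""
    return [min(range(len(codebook)), key=lambda c: sq_dist(f, codebook[c])) for f in frames]
```

The resulting integer sequence plays the role of a "sentence" of audio words, which is what allows a text-style embedding model such as word2vec to be trained on speech.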
This study explores the spatial-temporal patterns of particulate matter (PM) in Taiwan, discussing probability maps of PM and its daily patterns. Data mining provides more detailed spatial-temporal information on PM variations and trends. The proposed model is intended to show that data mining offers a relatively high goodness of fit and sufficient space-time explanatory power, particularly regarding air pollution frequency and affected areas. In the proposed model, Dynamic Time Warping (DTW) is used to analyze the temporal similarity between monitoring stations, and drawing on the behavior of multiple stations allows the global effect on any single station to be eliminated. The proposed model will further be used to predict PM2.5, and the prediction results will be used to discuss the spatial-temporal relations between stations. This study will also investigate the distribution of PM and its cyclicality.
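The DTW-based station similarity mentioned above can be sketched with the classic dynamic-programming formulation. The absolute-difference local cost, the absence of a warping-window constraint, and the pairwise helper are assumptions, since the abstract does not specify the DTW variant or any normalization. The station names in the usage note are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping with absolute-difference local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def pairwise_dtw(series: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """DTW distance for every unordered pair of stations (smaller = more similar)."""
    names = sorted(series)
    return {(p, q): dtw_distance(series[p], series[q])
            for i, p in enumerate(names) for q in names[i + 1:]}
```

For example, two hypothetical stations whose PM series have the same shape but one is delayed by an hour get a DTW distance of 0, whereas a point-by-point (Euclidean) comparison would penalize the shift. This is why DTW suits comparing pollution episodes that reach different stations at different times.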
BigNeuron is an open community bench-testing platform with the goal of setting open standards for accurate and fast automatic neuron tracing. We gathered a diverse set of image volumes across several species that is representative of the data obtained in many neuroscience laboratories interested in neuron tracing. Here, we report gold-standard manual annotations generated for a subset of the available imaging datasets and quantify tracing quality for 35 automatic tracing algorithms. The goal of generating such a hand-curated, diverse dataset is to advance the development of tracing algorithms and enable generalizable benchmarking. Together with image quality features, we pooled the data in an interactive web application that enables users and developers to perform principal component analysis, t-distributed stochastic neighbor embedding, correlation and clustering, visualization of imaging and tracing data, and benchmarking of automatic tracing algorithms on user-defined data subsets. The image quality metrics explain most of the variance in the data, followed by neuromorphological features related to neuron size. We observed that diverse algorithms can provide complementary information for obtaining accurate results, and we developed a method to iteratively combine algorithms and generate consensus reconstructions. The resulting consensus trees provide estimates of the neuron structure ground truth that typically outperform single algorithms on noisy datasets. However, specific algorithms may outperform the consensus tree strategy under specific imaging conditions. Finally, to help users predict the most accurate automatic tracing results without manual annotations for comparison, we used support vector machine regression to predict reconstruction quality given an image volume and a set of automatic tracings.