This paper investigates a new voice conversion technique using phone-aware Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs). Most existing voice conversion methods, including Joint Density Gaussian Mixture Models (JDGMMs), Deep Neural Networks (DNNs) and Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs), use only the acoustic information of speech as features to train models. We propose to incorporate linguistic information into the voice conversion system by using monophones generated by a speech recognizer as linguistic features. The monophones and spectral features are combined to train LSTM-RNN based voice conversion models, reinforcing the context-dependency modelling of *** results of the 1st Voice Conversion Challenge show that our system achieves significantly higher performance than the baseline (a GMM method) and was among the most competitive systems in the similarity test. Meanwhile, the experimental results show that the phone-aware LSTM-RNN method obtains lower mel-cepstral distortion and higher MOS scores than the baseline LSTM-RNNs.
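The abstract does not say how the monophones and spectral features are combined. The following is a minimal sketch. It assumes frame-aligned monophone labels are one-hot encoded and appended to each spectral frame before the frame enters the LSTM-RNN. The phone inventory `PHONES` and all function names are hypothetical. The sketch also includes the standard frame-level mel-cepstral distortion (MCD) formula, the objective measure the abstract reports.

```python
import math
from typing import List, Sequence

# Hypothetical monophone inventory; a real recognizer defines its own phone set.
PHONES = ["sil", "a", "e", "i", "o", "u", "k", "s", "t"]


def one_hot(phone: str, inventory: Sequence[str]) -> List[float]:
    """Encode one monophone label as a one-hot vector over the inventory."""
    vec = [0.0] * len(inventory)
    vec[list(inventory).index(phone)] = 1.0
    return vec


def phone_aware_frames(spectral: Sequence[Sequence[float]],
                       phones: Sequence[str],
                       inventory: Sequence[str] = PHONES) -> List[List[float]]:
    """Append the frame-aligned monophone one-hot vector to every spectral frame."""
    if len(spectral) != len(phones):
        raise ValueError("need exactly one phone label per spectral frame")
    return [list(frame) + one_hot(p, inventory) for frame, p in zip(spectral, phones)]


def mel_cepstral_distortion(c_ref: Sequence[float], c_conv: Sequence[float]) -> float:
    """Frame-level MCD in dB: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2).

    Callers usually pass coefficients from d = 1 upward, leaving out the energy term c_0.
    """
    sq = sum((a - b) ** 2 for a, b in zip(c_ref, c_conv))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * sq)


# Two 2-dimensional spectral frames labelled "sil" and "a" become 11-dimensional inputs.
frames = phone_aware_frames([[0.1, 0.2], [0.3, 0.4]], ["sil", "a"])
```

With one-hot encoding, the network sees the linguistic identity of each frame without any ordering among phones. A learned phone embedding would be a drop-in alternative.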
Artificial neural networks (ANNs) have been used in many applications, such as handwriting recognition and speech recognition. It is well known that the learning rate is a crucial hyperparameter in the training of artificial neural networks. It has been shown that the initial value of the learning rate can strongly affect the final result, yet in practice this value is always set manually. A parameter called the beta stabilizer has been introduced to reduce the sensitivity to the initial learning rate, but this method has only been proposed for deep neural networks (DNNs) with the sigmoid activation function. In this paper, we extended the beta stabilizer to long short-term memory (LSTM) and investigated the effects of the beta stabilizer parameters on different models, including LSTM and DNNs with the ReLU activation function. It is concluded that the beta stabilizer parameters can reduce the sensitivity to the learning rate, with almost the same performance, on both ReLU DNNs and LSTM. However, the effects of the beta stabilizer on ReLU DNNs and LSTM are smaller than its effects on DNNs with the sigmoid activation function.
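The abstract does not define the beta stabilizer. The following is a minimal sketch. It assumes the stabilizer is a single learnable positive scalar β = softplus(α) that rescales a layer's input and is trained jointly with the weights. A one-weight linear model stands in for a full DNN or LSTM layer, and `train_with_stabilizer` is a hypothetical name. This is an illustration of the general idea, not the paper's formulation.

```python
import math
from typing import Sequence, Tuple


def softplus(a: float) -> float:
    # log(1 + e^a) > 0, so beta never flips the sign of the layer input
    return math.log1p(math.exp(a))


def sigmoid(a: float) -> float:
    # Derivative of softplus with respect to its argument
    return 1.0 / (1.0 + math.exp(-a))


def train_with_stabilizer(xs: Sequence[float], ys: Sequence[float],
                          lr: float = 0.01, steps: int = 2000,
                          w: float = 0.5, alpha: float = 0.0) -> Tuple[float, float, float]:
    """Fit y ~ w * beta * x by full-batch gradient descent on w and alpha (beta = softplus(alpha)).

    Returns (w, beta, final mean squared error).
    """
    n = len(xs)
    for _ in range(steps):
        beta = softplus(alpha)
        # g = dL/d(w*beta) for L = mean((w*beta*x - y)^2)
        g = sum(2.0 * (w * beta * x - y) * x for x, y in zip(xs, ys)) / n
        grad_w = g * beta
        grad_alpha = g * w * sigmoid(alpha)  # chain rule through beta = softplus(alpha)
        # Both gradients were computed from the pre-update values of w and alpha
        w -= lr * grad_w
        alpha -= lr * grad_alpha
    beta = softplus(alpha)
    loss = sum((w * beta * x - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, beta, loss


w, beta, loss = train_with_stabilizer([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Both w and β are learned, so the effective step on the product wβ depends on their current values rather than on `lr` alone. Under this assumed parameterization, that is the mechanism by which such a scalar can soften the dependence on the initial learning rate.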
ISBN: 9781467380270 (print)
In human-robot teaming, interpretation of human actions, recognition of new situations, and appropriate decision making are crucial abilities for cooperative robots ("co-robots") to interact intelligently with humans. Given an observation, co-robots must interpret human activities the same way human peers do, so that robot actions are appropriate to the activity at hand. A novel interpretability indicator is introduced to address this issue. When a robot encounters a new scenario, the pretrained activity recognition model, no matter how accurate in a known situation, may not produce the information needed to act appropriately and safely in the new situation. To interact with people effectively and safely, we introduce a new generalizability indicator that allows a co-robot to self-reflect and reason about when an observation falls outside its learned model. Based on topic modeling and the two novel indicators, we propose a new Self-reflective Risk-aware Artificial Cognitive (SRAC) model, which allows a robot to make better decisions by incorporating robot action risks and identifying new situations. Experiments using both real-world datasets and physical robots suggest that our SRAC model significantly outperforms the traditional methodology and enables better decision making in response to human behaviors.
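The abstract does not specify SRAC's topic model or its two indicators, so this sketch does not reproduce them. Under stated assumptions, it illustrates only the decision rule the abstract outlines. The robot picks the action with the lowest expected risk under the recognizer's activity posterior. It falls back to a safe action when a novelty score suggests the observation lies outside the learned model. All names, activities, thresholds, and risk values are hypothetical.

```python
from typing import Dict


def choose_action(activity_probs: Dict[str, float],
                  risk: Dict[str, Dict[str, float]],
                  novelty: float,
                  novelty_threshold: float = 0.5,
                  safe_action: str = "stop_and_ask") -> str:
    """Pick the minimum-expected-risk action, or a safe fallback for novel observations.

    activity_probs: P(activity | observation) from a hypothetical pretrained recognizer.
    risk[action][activity]: hypothetical cost of taking `action` while the human does `activity`.
    novelty: hypothetical score; higher means the observation fits the learned model less well.
    """
    if novelty > novelty_threshold:
        # The observation looks outside the learned model, so the posterior is not trusted.
        return safe_action
    expected = {
        action: sum(p * costs[activity] for activity, p in activity_probs.items())
        for action, costs in risk.items()
    }
    return min(expected, key=expected.get)


probs = {"handing_over": 0.7, "walking_away": 0.3}
risk = {
    "reach_out": {"handing_over": 0.1, "walking_away": 1.0},
    "wait": {"handing_over": 0.5, "walking_away": 0.2},
}
choose_action(probs, risk, novelty=0.1)  # -> "reach_out" (expected risk 0.37 vs 0.41)
choose_action(probs, risk, novelty=0.9)  # -> "stop_and_ask"
```

The two checks are kept separate: the novelty gate decides whether the recognizer's output can be used at all, and only then do the action risks weigh in.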
BACKGROUND: The increasing amount of available data in digital working environments raises considerable usability challenges. Beyond the trend for automation of such processes, strategic decisions still depend on human...
The development of the digital games industry has motivated game console makers to provide better gamepads for gamers. As gamepads provide the interaction between digital games and gamers, it is important to understan...
We present a movable spatial augmented reality (SAR) system that can be easily installed in a user workspace. The proposed system aims to dynamically cover a wider projection area using a portable projector attached t...
ISBN: 9781479999897 (print)
The CD-DNN-HMM system has become the state-of-the-art system for large vocabulary continuous speech recognition (LVCSR) tasks, in which deep neural networks (DNNs) play a key role. However, DNN training suffers from the vanishing gradient problem, which limits the training of deep models. In this work, we address this problem by incorporating into the DNN the successful long short-term memory (LSTM) structure, which was proposed to help recurrent neural networks (RNNs) remember long-term dependencies. We also propose a generalized formulation of the LSTM block, which we name general LSTM (GLSTM). Our experiments show that the proposed (G)LSTM-DNN scales well with more layers and achieves an 8.2% relative word error rate reduction on the 2000-hour Switchboard data set.
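The abstract does not give the (G)LSTM-DNN equations. The following is a minimal sketch. It assumes the LSTM recurrence runs over layer depth instead of time. Each layer receives the hidden output and memory cell of the layer below. The additive cell update c_l = f ⊙ c_{l-1} + i ⊙ g gives the signal a path through many layers. `DepthLSTMLayer` and `lstm_dnn_forward` are hypothetical names, the weights are random, and the GLSTM generalization is not shown.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class DepthLSTMLayer:
    """Hypothetical DNN layer whose output h and memory cell c both feed the next layer."""

    def __init__(self, dim: int, rng: random.Random) -> None:
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        self.w: Dict[str, Matrix] = {k: rand_matrix(dim, dim, rng) for k in "ifog"}
        self.b: Dict[str, Vector] = {k: [0.0] * dim for k in "ifog"}

    def forward(self, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        pre = {k: [z + b for z, b in zip(matvec(self.w[k], h), self.b[k])] for k in "ifog"}
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        g = [math.tanh(z) for z in pre["g"]]
        # Additive cell update: the cell from the layer below passes through, scaled by f.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def lstm_dnn_forward(x: Vector, layers: List[DepthLSTMLayer]) -> Vector:
    """Run the input up the stack; the memory cell starts at zero below the first layer."""
    h, c = list(x), [0.0] * len(x)
    for layer in layers:
        h, c = layer.forward(h, c)
    return h


rng = random.Random(0)
out = lstm_dnn_forward([0.5, -0.2, 0.1, 0.9], [DepthLSTMLayer(4, rng) for _ in range(12)])
```

A plain DNN layer computes h_l = σ(W h_{l-1}) with no cell. In the gated version, information can pass through the cell, scaled by the forget gate, without going through another squashing nonlinearity. That is the property the abstract relies on to train deeper stacks.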
Most existing bilingual embedding methods for Statistical Machine Translation (SMT) suffer from two obvious drawbacks. First, they focus only on simple context, such as word counts and co-occurrence within a document or sliding window, to build word embeddings, ignoring latent useful information from selected context. Second, word sense rather than word form should be the minimal semantic unit, yet most existing work still represents word forms. This paper presents the Bilingual Graph-based Semantic Model (BGSM) to alleviate these shortcomings. By using maximum complete sub-graphs (cliques) for context selection, BGSM can effectively model word sense representations instead of the word form itself. The proposed model is applied to phrase pair translation probability estimation and generation for SMT. Empirical results show that BGSM improves SMT in both performance (up to +1.3 BLEU) and efficiency compared with existing methods.
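The abstract does not detail BGSM's graph construction. The following is a minimal sketch. It assumes context words are nodes, co-occurrences are edges, and each maximal clique (a complete sub-graph that cannot be extended) is one candidate sense context. Cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting. The toy graph and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected co-occurrence graph as an adjacency-set dictionary."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def maximal_cliques(g: Graph) -> List[Set[str]]:
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting."""
    if not g:
        return []
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        # Pivot on the vertex with most neighbours in p to skip redundant branches.
        pivot = max(p | x, key=lambda u: len(g[u] & p))
        for v in list(p - g[pivot]):
            expand(r | {v}, p & g[v], x & g[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(g), set())
    return cliques


g = build_graph([("bank", "money"), ("bank", "loan"), ("money", "loan"),
                 ("bank", "river"), ("bank", "water"), ("river", "water")])
cliques = maximal_cliques(g)  # {bank, money, loan} and {bank, river, water}
```

On this toy graph, the two cliques separate a financial context of *bank* from a river context. This is the kind of sense distinction that clique-based context selection aims to capture.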
ISBN: 9781509018222 (print)
This work aims at an evaluation of vehicle-to-infrastructure (V2X) technology from the users' perspective. The technical opportunities of connected vehicles are affected by the acceptance of the technology and by possible drawbacks on the privacy and data-security side. Using a three-tiered research approach, this work first identified lines of argument in focus group discussions, which enabled a quantitative approach to evaluating positively and negatively perceived features of V2X technology. Gender-related differences are also reported. Further, the results of the second quantitative study indicate that although users who already have experience with driver assistance systems are more willing to share (personal) data to use V2X technology, the overall sample is very reserved with respect to sharing driver-related data. Future research on user diversity and cultural differences is outlined.