Nonprehensile manipulation is necessary for robots to operate in humans' daily lives. As nonprehensile manipulation should satisfy both kinematics and dynamics requirements simultaneously, it is difficult to manipulate objects along given paths. Previous studies have considered the problems with sequence-to-sequence models, which are neural networks for time-series conversion. However, they did not consider nonlinear contact models, such as friction models. When we train the seq2seq models using end-to-end backpropagation, the training gradients vanish owing to static friction. In this letter, we realize sequence-to-sequence models for trajectory planning of nonprehensile manipulation including contact models between the robots and target objects. This letter proposes a training curriculum that commences training without contact models to bring the seq2seq models outside of the gradient-vanishing zone. This letter discusses sliding manipulation, which includes a friction model between objects and tools, such as frying pans fixed onto the robots. We validated the proposed curriculum through a simulation. In addition, we observed that the trained seq2seq models could handle parameter fluctuations that were not present during training.
ISBN:
(Print) 9783030161484; 9783030161477
Named Entity Recognition (NER) is a basic task in Natural Language Processing (NLP). Recently, the sequence-to-sequence (seq2seq) model has been widely used in NLP tasks. Unlike general NLP tasks, about 60% of the sentences in the NER task do not contain entities. Traditional seq2seq methods cannot address this issue effectively. To solve this problem, we propose a novel seq2seq model, named SC-NER, for the NER task. We construct a classifier between the encoder and decoder; in particular, the classifier's input is the last hidden state of the encoder. Moreover, we present a restricted beam search to improve the performance of the proposed SC-NER. To evaluate the proposed model, we construct a corpus of patent documents in the communications field and conduct experiments on it. Experimental results show that our SC-NER model achieves better performance than other baseline methods.
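The core SC-NER idea, a binary classifier reading the encoder's last hidden state to skip entity-free sentences, can be sketched as follows. This is a minimal numpy toy, not the paper's implementation: the RNN encoder, the weight names, and the untrained random parameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(tokens, W_h, W_x):
    """Toy RNN encoder: returns the final hidden state of the sentence.
    In the SC-NER design, the classifier consumes exactly this vector."""
    h = np.zeros(W_h.shape[0])
    for x in tokens:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

def has_entity(h_last, w, b):
    """Binary classifier on the encoder's last hidden state:
    sigmoid(w . h + b) > 0.5 is read as "sentence contains an entity",
    so the decoder only needs to run for those sentences."""
    p = 1.0 / (1.0 + np.exp(-(w @ h_last + b)))
    return p > 0.5

# Hypothetical sizes and untrained random weights, for illustration only.
dim, emb = 8, 4
W_h = rng.normal(size=(dim, dim)) * 0.1
W_x = rng.normal(size=(dim, emb)) * 0.1
w, b = rng.normal(size=dim), 0.0

sentence = [rng.normal(size=emb) for _ in range(5)]
h = encode(sentence, W_h, W_x)
print(has_entity(h, w, b))  # True/False gate; weights here are untrained
```

In a trained system the gate's decision would route entity-free sentences past the decoder entirely, which is the mechanism the abstract credits for handling the 60% entity-free sentences.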
We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.
Ensuring the health and safety of independent-living senior citizens is a growing societal concern. Researchers have developed sensor-based systems to monitor senior citizens' Activities of Daily Living (ADL), a set of daily activities that can indicate their self-caring ability. However, most ADL monitoring systems are designed for one specific sensor modality, resulting in less generalizable models that are not flexible enough to account for variations in real-life monitoring settings. Current classic machine learning and deep learning methods do not provide a generalizable solution for recognizing complex ADLs across different sensor settings. This study proposes a novel sequence-to-sequence deep-learning framework that recognizes complex ADLs by leveraging an activity state representation. The proposed activity state representation integrates motion and environment sensor data without labor-intensive feature engineering. We evaluated our proposed framework against several state-of-the-art machine learning and deep learning benchmarks. Overall, our approach outperformed the baselines on most performance metrics and accurately recognized complex ADLs from different types of sensor input. This framework can generalize to different sensor settings and provides a viable approach to understanding senior citizens' daily activity patterns with smart home health monitoring systems.
ISBN:
(Print) 9781728154145
Load forecasting plays a critical part in grid operation and planning. In particular, multistep load forecasting for individual power customers is increasingly important. Owing to the strong volatility of individual consumers' electricity consumption behavior, traditional machine learning methods that cannot capture time dependence struggle to obtain good prediction results. A recurrent neural network (RNN) can capture the temporal correlations in load data, and the sequence-to-sequence (Seq2Seq) model, which combines two RNNs as encoder and decoder, is well suited to multistep prediction. A temporal pattern attention mechanism can additionally capture the periodic change patterns in historical load data, further improving time-series modeling. We combine these advantages in a new multistep individual load forecasting framework, called the temporal pattern attention based sequence-to-sequence (TPA-Seq2Seq) model, which overcomes the difficulty of multistep prediction and further captures load change patterns. The proposed framework was tested on real residential smart meter data; the results show that the proposed model has good prediction accuracy and is well suited to longer prediction sequences.
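The encoder-decoder multistep scheme this abstract builds on can be sketched in a few lines. This is a generic Seq2Seq rollout with untrained random weights, not TPA-Seq2Seq itself (the temporal pattern attention term is omitted); all parameter names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_step(h, x, W_h, W_x):
    """One tanh-RNN update, shared shape for encoder and decoder."""
    return np.tanh(W_h @ h + W_x @ x)

def seq2seq_forecast(history, horizon, params):
    """Encode a load history of arbitrary length, then decode `horizon`
    future steps one at a time, feeding each prediction back in
    (the standard Seq2Seq multistep rollout)."""
    W_eh, W_ex, W_dh, W_dx, w_out = params
    h = np.zeros(W_eh.shape[0])
    for x in history:                    # encoder consumes the history
        h = rnn_step(h, np.array([x]), W_eh, W_ex)
    preds, y = [], history[-1]
    for _ in range(horizon):             # decoder rolls out the forecast
        h = rnn_step(h, np.array([y]), W_dh, W_dx)
        y = float(w_out @ h)             # linear readout -> next load value
        preds.append(y)
    return preds

dim = 6
params = (rng.normal(size=(dim, dim)) * 0.2,
          rng.normal(size=(dim, 1)) * 0.2,
          rng.normal(size=(dim, dim)) * 0.2,
          rng.normal(size=(dim, 1)) * 0.2,
          rng.normal(size=dim) * 0.2)

forecast = seq2seq_forecast([0.3, 0.5, 0.4, 0.6], horizon=3, params=params)
print(len(forecast))  # -> 3 predicted steps (weights are untrained)
```

Because the decoder generates one step per iteration, the horizon is a free parameter, which is what makes this family of models natural for multistep forecasting.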
ISBN:
(Print) 9781538646588
Representing various sounds in language, such as sound words, or onomatopoeias, is not only useful as an auxiliary means for automatic speech recognition, but also essential in emerging fields such as natural human-machine communication, searching audio archives for acoustic events, and abnormality detection based on sounds. This paper proposes a novel method for sound word generation from audio signals. The method is based on an end-to-end, sequence-to-sequence framework to solve the audio segmentation problem to find an appropriate segment of audio signals along time that corresponds to a sequence of phonemes, and the ambiguity problem, where multiple words may correspond to the same sound, depending on the situations or listeners. Our tests show that the method worked efficiently and achieved a 2.8% mean phoneme error rate (MPER) and a 7.2% word error rate (WER) in a sound word generation task.
ISBN:
(Print) 9781538643341
A sequence-to-sequence model is a neural network module for mapping two sequences of different lengths. The sequence-to-sequence model has three core modules: encoder, decoder, and attention. Attention is the bridge that connects the encoder and decoder modules and improves model performance in many tasks. In this paper, we propose two ideas to improve sequence-to-sequence model performance by enhancing the attention module. First, we maintain the history of the location and the expected context from several previous time-steps. Second, we apply multiscale convolution from several previous attention vectors to the current decoder state. We utilized our proposed framework for sequence-to-sequence speech recognition and text-to-speech systems. The results reveal that our proposed extension can improve performance significantly compared to a standard attention baseline.
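The second idea above, convolving several previous attention vectors and letting the result influence the current step, can be sketched as a history-aware scoring rule. This numpy toy is an assumption-laden illustration, not the paper's model: the averaging kernels, the additive combination, and all weights are made up for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def history_aware_attention(enc, dec_state, align_history, kernels):
    """Attention sketch: convolve previous alignment vectors with several
    kernel widths (a "multiscale convolution" over the attention history)
    and let the result bias the content-based scores."""
    scores = enc @ dec_state                 # content-based scores per step
    for k in kernels:                        # multiscale history term
        kernel = np.ones(k) / k              # toy averaging kernel of width k
        for a in align_history:
            scores += np.convolve(a, kernel, mode="same")
    align = softmax(scores)
    context = align @ enc                    # weighted sum of encoder states
    return context, align

rng = np.random.default_rng(2)
enc = rng.normal(size=(7, 5))                # 7 encoder states, dim 5
dec_state = rng.normal(size=5)
history = [softmax(rng.normal(size=7)) for _ in range(3)]
context, align = history_aware_attention(enc, dec_state, history, kernels=(3, 5))
print(align.sum())  # ~1.0: the alignment stays a distribution over steps
```

The history term nudges the new alignment toward regions the model attended to recently, which is the kind of location continuity the abstract credits for the improvement.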
ISBN:
(Digital) 9781538646588
ISBN:
(Print) 9781538646595
Attention-based sequence-to-sequence models for automatic speech recognition jointly train an acoustic model, language model, and alignment mechanism. Thus, the language model component is only trained on transcribed audio-text pairs. This leads to the use of shallow fusion with an external language model at inference time. Shallow fusion refers to log-linear interpolation with a separately trained language model at each step of the beam search. In this work, we investigate the behavior of shallow fusion across a range of conditions: different types of language models, different decoding units, and different tasks. On Google Voice Search, we demonstrate that the use of shallow fusion with a neural LM with wordpieces yields a 9.1% relative word error rate reduction (WERR) over our competitive attention-based sequence-to-sequence model, obviating the need for second-pass rescoring.
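The log-linear interpolation that defines shallow fusion is simple enough to show directly. This is a one-step numpy sketch over a four-token toy vocabulary; the logit values and the fusion weight are invented for illustration and have nothing to do with the paper's systems.

```python
import numpy as np

def log_softmax(x):
    """Log-probabilities from raw logits, numerically stable."""
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def shallow_fusion_step(s2s_logits, lm_logits, lam=0.3):
    """One beam-search expansion with shallow fusion: score each candidate
    token by log p_s2s + lam * log p_lm, i.e. log-linear interpolation
    with an external LM (lam is the tunable fusion weight)."""
    combined = log_softmax(s2s_logits) + lam * log_softmax(lm_logits)
    return int(np.argmax(combined)), combined

# Toy vocabulary of 4 tokens: the seq2seq model slightly prefers token 1,
# but the external LM strongly prefers token 2 and tips the decision.
s2s = np.array([0.0, 1.0, 0.9, -1.0])
lm = np.array([-2.0, -1.0, 3.0, -2.0])
best, scores = shallow_fusion_step(s2s, lm, lam=0.5)
print(best)  # -> 2 (the LM flips the choice away from token 1)
```

In a real decoder this combined score is computed for every hypothesis in the beam at every step; no retraining of either model is needed, which is why shallow fusion is attractive for bolting a text-only LM onto a jointly trained seq2seq recognizer.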
ISBN:
(Print) 9781509059102
Encouraged by recent waves of successful applications of deep learning, some researchers have demonstrated the effectiveness of applying convolutional neural networks (CNNs) to time series classification problems. However, CNNs and other traditional methods require the input data to be of the same dimension, which prevents their direct application to data of various lengths and to multi-channel time series with different sampling rates across channels. Long short-term memory (LSTM), another tool in the deep learning arsenal, is by design more appropriate for problems involving time series, such as speech recognition and language translation. In this paper, we propose a novel model incorporating a sequence-to-sequence architecture that consists of two LSTMs, one encoder and one decoder. The encoder LSTM accepts input time series of arbitrary lengths and extracts information from the raw data, based on which the decoder LSTM constructs fixed-length sequences that can be regarded as discriminatory features. For better utilization of the raw data, we also introduce an attention mechanism into our model so that the feature generation process can peek at the raw data and focus its attention on the part most relevant to the feature under construction. We call our model S2SwA, short for Sequence-to-Sequence with Attention. We test S2SwA on both uni-channel and multi-channel time series datasets and show that our model is competitive with the state of the art on real-world tasks such as human activity recognition.
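The variable-length-in, fixed-length-out pattern described above can be sketched with plain tanh-RNNs standing in for the LSTMs. This is a structural illustration only, with untrained random weights and a simplified dot-product attention; none of it is the S2SwA implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fixed_length_features(series, n_features, dim, params):
    """S2SwA-style sketch: an encoder turns a time series of ANY length
    into hidden states; a decoder emits a FIXED number of feature
    vectors, each attending back over the encoder states ("peeking at
    the raw data")."""
    W_eh, W_ex, W_dh = params
    h = np.zeros(dim)
    enc_states = []
    for x in series:                        # arbitrary-length input
        h = np.tanh(W_eh @ h + W_ex @ np.array([x]))
        enc_states.append(h)
    enc = np.stack(enc_states)
    feats, d = [], h
    for _ in range(n_features):             # fixed-length output
        align = softmax(enc @ d)            # attention over encoder states
        context = align @ enc
        d = np.tanh(W_dh @ d + context)
        feats.append(d)
    return np.stack(feats)

dim = 5
params = (rng.normal(size=(dim, dim)) * 0.2,
          rng.normal(size=(dim, 1)) * 0.2,
          rng.normal(size=(dim, dim)) * 0.2)

short = fixed_length_features([0.1, 0.4, 0.2], 4, dim, params)
long_seq = fixed_length_features(list(rng.normal(size=50)), 4, dim, params)
print(short.shape, long_seq.shape)  # both (4, 5), regardless of input length
```

Because the decoder always runs a fixed number of steps, a 3-sample and a 50-sample series both yield a (4, 5) feature matrix, which is exactly what makes the downstream classifier insensitive to input length.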