Radar sounders (RSs) are nadir-looking sensors operating in the high-frequency (HF) or very-high-frequency (VHF) bands that profile subsurface targets to retrieve miscellaneous scientific information. Due to the complex electromagnetic interaction between backscattered returns, the interpretation of RS data is challenging. Investigations of ice-sheet subsurface structures require automatic techniques that account for both the sequential spatial distribution of subsurface targets and the relevant statistical properties embedded in RS signals. Existing automatic techniques for characterizing these targets are based either on probabilistic inference models or on convolutional neural network (CNN) deep-learning methods. Unfortunately, CNN-based methods capture the local spatial context but only weakly model the global spatial context. In contrast to CNNs, transformer-based models are reliable architectures for capturing long-range, sequence-to-sequence global spatial contextual priors. Motivated by this, we propose a novel transformer-based semantic segmentation architecture named TransSounder to effectively encode the sequential structures of RS signals. TransSounder is constructed on a hybrid TransUNet-TransFuse architectural framework that systematically augments the modules from the TransUNet and TransFuse architectures. Experimental results obtained using the Multichannel Coherent Radar Depth Sounder (MCoRDS) dataset confirm the robustness and capability of transformers to accurately characterize the different subsurface targets.
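The global-context claim above rests on self-attention, where every output position is a weighted sum over all input positions. The sketch below is a generic single-head scaled dot-product self-attention in NumPy, not the TransSounder architecture itself; the toy input and all names are illustrative.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention (single head, no learned weights).

    Every output row is a weighted sum over ALL input rows, so each
    position sees the full (global) spatial context in one step --
    unlike a convolution, whose receptive field is a local window.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                       # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over positions
    return weights @ x, weights

# A toy sequence of 5 positions (e.g. range bins), 4 features each
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
out, w = self_attention(x)
print(w.shape, np.allclose(w.sum(axis=-1), 1.0))        # each row is a distribution
```

Because the softmax weights are strictly positive, every position contributes to every output, which is the "global spatial context" the abstract contrasts with a CNN's local window.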
ISBN: (Print) 9783030182403; 9783030182397
A chatbot is a software application that can autonomously communicate with a human being through text and, due to its usefulness, an increasing number of businesses are implementing such tools to provide timely communication to their clients. Whilst past literature has focused on implementing innovative chatbots and evaluating such tools, few studies have critically compared such conversational systems. To address this gap, this study critically compares the Artificial Intelligence Mark-up Language (AIML) and sequence-to-sequence models for building chatbots. In this endeavor, two chatbots were developed, one per model, and evaluated using a mixture of glass-box and black-box evaluation based on three metrics, namely user satisfaction, information retrieval rate, and task completion rate. Results showed that the AIML chatbot achieved better user satisfaction and task completion rate, while the sequence-to-sequence model had a better information retrieval rate.
Neural sequence-to-sequence (seq2seq) models have been widely used in abstractive summarization tasks. One challenge of this task is that redundant content in the input document often confuses the models and leads to poor performance. An efficient way to solve this problem is to select salient information from the input document. In this paper, we propose an approach that incorporates word attention with multilayer convolutional neural networks (CNNs) to extend a standard seq2seq model for abstractive summarization. First, by concentrating on a subset of source words while encoding an input sentence, word attention is able to extract informative keywords from the input, which gives us the ability to interpret generated summaries. Second, these keywords are further distilled by multilayer CNNs to capture the coarse-grained contextual features of the input sentence. Thus, the combined word-attention and multilayer-CNN modules provide a better-learned representation of the input document, which helps the model generate interpretable, coherent, and informative summaries in an abstractive summarization task. We evaluate the effectiveness of our model on the English Gigaword and DUC2004 datasets and the Chinese summarization dataset LCSTS. Experimental results show the effectiveness of our approach.
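The word-attention idea above can be sketched as a softmax over dot-product scores between encoder states and a decoder query; the high-weight source words are the "keywords" that make the summary interpretable. This is a generic sketch, not the paper's exact formulation; the toy embeddings and query are invented for illustration.

```python
import numpy as np

def word_attention(enc_states, query):
    """Dot-product attention over source words.

    Returns the attention distribution (interpretable as keyword
    salience) and the attended context vector.
    """
    scores = enc_states @ query                 # (T,) one score per source word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over words
    context = weights @ enc_states              # weighted sum of encoder states
    return weights, context

enc = np.array([[0.1, 0.9],   # "the"        (toy stopword embedding)
                [0.8, 0.2],   # "earthquake" (toy content-word embedding)
                [0.7, 0.3]])  # "struck"
q = np.array([1.0, 0.0])      # toy decoder query favouring content words
w, ctx = word_attention(enc, q)
print(w.argmax())             # → 1 ("earthquake" is the top keyword)
```

Inspecting `w` after decoding is exactly how such a model's summaries can be interpreted: the distribution shows which source words drove the generated token.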
In this paper, we propose a framework for environmental sound synthesis from onomatopoeic words. As one way of expressing an environmental sound, we can use an onomatopoeic word, which is a character sequence for phonetically imitating a sound. An onomatopoeic word is effective for describing diverse sound features. Therefore, the use of onomatopoeic words as input for environmental sound synthesis will enable us to generate diverse sounds. To generate diverse sounds, we propose a method based on a sequence-to-sequence framework for synthesizing environmental sounds from onomatopoeic words. We also propose a method of environmental sound synthesis using onomatopoeic words and sound event labels. The use of sound event labels in addition to onomatopoeic words enables us to capture each sound event's feature depending on the input sound event label. Our subjective experiments show that our proposed methods achieve higher diversity and naturalness than conventional methods using sound event labels.
An important aspect of developing dialogue agents involves endowing a conversation system with emotion perception and interaction. Most existing emotion dialogue models lack adaptability and extensibility across different scenes because they require a specified emotion category or rely on a fixed emotional dictionary. To overcome these limitations, we propose a neural conversation generation with auxiliary emotional supervised model (nCG-ESM) comprising a sequence-to-sequence (Seq2Seq) generation model and an emotional classifier used as an auxiliary model. The emotional classifier was trained to predict the emotion distributions of the dialogues, which were then used as emotion-supervised signals to guide the generation model to generate diverse emotional responses. The proposed nCG-ESM is flexible enough to generate responses with emotional diversity, including specified or unspecified emotions, and can be adapted and extended to different scenarios. We conducted extensive experiments on the popular dataset of Weibo post-response pairs. Experimental results showed that the proposed model was capable of producing more diverse, appropriate, and emotionally rich responses, yielding substantial gains in diversity scores and human evaluations.
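The auxiliary supervision described above amounts to adding a weighted emotion-classification term to the generation loss, so responses whose predicted emotion distribution diverges from the classifier's target are penalized. A minimal sketch, assuming a simple cross-entropy form and a hypothetical mixing weight `alpha` (neither is taken from the paper):

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """CE between the target emotion distribution p and the model's prediction q."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def total_loss(gen_nll, emo_target, emo_pred, alpha=0.5):
    """Generation loss plus the auxiliary emotion-supervision term.

    alpha is an illustrative mixing weight, not a value from the paper.
    """
    return gen_nll + alpha * cross_entropy(emo_target, emo_pred)

# A response matching the target emotion distribution is penalized less
target = [0.7, 0.2, 0.1]                    # e.g. (happy, neutral, sad)
good = total_loss(2.0, target, [0.7, 0.2, 0.1])
bad = total_loss(2.0, target, [0.1, 0.2, 0.7])
print(good < bad)                           # → True
```

Because the target is a *distribution* rather than a single hard label, the same loss works whether the desired emotion is specified or left unspecified, which is the flexibility the abstract emphasizes.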
Soft sensors attempt to predict key quality variables that are only infrequently available using the sensor and manipulated variables that are readily available. Since only a limited amount of labeled data is available, there is always the concern of whether the underlying physics has been captured so that the model can reasonably be extrapolated. A sequence-to-sequence model in the form of a nonlinear state-observer/encoder and predictor/decoder is proposed. The observer can be trained using a large amount of unlabeled data, but in a supervised manner in which the process dynamics are tracked. The encoder output and manipulated variables are used to train the quality predictor. The model is applied to the product-impurity predictions of an industrial column. Results show that good predictions and excellent consistency in the sign of the estimated gains can be achieved even with a limited amount of data. These findings indicate that the proposed sequence-to-sequence data-driven approach is able to capture the underlying physics of the process.
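The two-stage idea above, an observer trained on abundant unlabeled dynamics and a quality predictor trained on the scarce labels, can be sketched with a linear AR(1) observer; the model form, variable names, and weights below are illustrative assumptions, not the paper's nonlinear architecture.

```python
import numpy as np

def fit_observer(x):
    """Fit a one-step-ahead AR(1) model x[t+1] ≈ a * x[t] on UNLABELED
    sensor data -- supervised by the process dynamics itself, no quality
    labels needed (stand-in for the state-observer/encoder)."""
    past, future = x[:-1], x[1:]
    return float(past @ future) / float(past @ past)   # least-squares slope

def predict_quality(a, x_t, u_t, w_state, w_input):
    """Quality predictor combining the observer state with the manipulated
    variable u_t; w_state / w_input would be fit on the few labeled samples."""
    state = a * x_t                                    # observer's state estimate
    return w_state * state + w_input * u_t

# Unlabeled dynamics: a geometric decay with true rate a = 0.5, no noise
x = np.array([8.0, 4.0, 2.0, 1.0])
a = fit_observer(x)
print(round(a, 3))                                     # → 0.5 (decay rate recovered)
print(predict_quality(a, x[-1], u_t=2.0, w_state=1.0, w_input=0.5))
```

The point of the split mirrors the abstract: the observer's parameter is identified from unlabeled sequences alone, so the scarce labeled data only has to pin down the small quality-prediction map on top of it.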
Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce high-quality speech directly from text or simple linguistic features such as phonemes. Unlike traditional pipeline TTS, neural sequence-to-sequence TTS does not require manually annotated and complicated linguistic features such as part-of-speech tags and syntactic structures for system training. However, it must be carefully designed and well optimized so that it can implicitly extract useful linguistic features from the input features. In this paper, we investigate under what conditions neural sequence-to-sequence TTS can work well in Japanese and English, along with comparisons with deep neural network (DNN) based pipeline TTS systems. Unlike past comparative studies, the pipeline systems here also use neural autoregressive (AR) probabilistic modeling and a neural vocoder, in the same way as the sequence-to-sequence systems do, for a fair and deep analysis. We investigated systems from three aspects: a) model architecture, b) model parameter size, and c) language. For the model-architecture aspect, we adopt modified Tacotron systems that we previously proposed and their variants using an encoder from Tacotron or Tacotron2. For the model-parameter-size aspect, we investigate two model parameter sizes. For the language aspect, we conduct listening tests in both Japanese and English to see if our findings generalize across languages. Our experiments on Japanese demonstrated that the Tacotron TTS systems with increased parameter size and input of phonemes and accentual-type labels outperformed the DNN-based pipeline systems using the complicated linguistic features, and that their encoder could learn to compensate for a lack of rich linguistic features. Our experiments on English demonstrated that, when using a suitable encoder, the Tacotron TTS system with characters as input can disambiguate pronunciations and produce speech as natural as that of the systems using phonemes as input.
ISBN: (Print) 9781509066315
Labanotation is an important notation system for recording dances. Automatically generating Labanotation scores from motion-capture data has attracted increasing interest in recent years. Current methods usually focus on individual movement segments and generate Labanotation symbols one by one, which requires segmenting the captured data sequence in advance. Manual segmentation consumes a lot of time and effort, while automatic segmentation may not be reliable enough. In this paper, we propose a sequence-to-sequence approach that can generate Labanotation scores from unsegmented motion-data sequences. First, we extract effective features from motion-capture data based on body-skeleton analysis. Then, we train a neural network under the encoder-decoder architecture to transform the motion feature sequences into the corresponding Labanotation symbols. As such, the dance score is generated. Experiments show that the proposed method performs favorably against state-of-the-art algorithms in the automatic Labanotation generation task.
ISBN: (Print) 9781713820697
Auto-regressive sequence-to-sequence models with attention mechanisms have achieved state-of-the-art performance in various tasks, including speech synthesis. Training these models can be difficult. The standard approach guides a model with the reference output history during training. However, during synthesis the generated output history must be used. This mismatch can impact performance. Several approaches have been proposed to handle this, normally by selectively using the generated output history. To make training stable, these approaches often require a heuristic schedule or an auxiliary classifier. This paper introduces attention forcing, which guides the model with the generated output history and the reference attention. This approach reduces the training-evaluation mismatch without the need for a schedule or a classifier. Additionally, for standard training approaches, the frame rate is often reduced to prevent models from copying the output history. As attention forcing does not feed the reference output history to the model, it allows using a higher frame rate, which improves the speech quality. Finally, attention forcing allows the model to generate output sequences aligned with the references, which is important for some downstream tasks, such as training neural vocoders. Experiments show that attention forcing allows doubling the frame rate and yields significant gains in speech quality.
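The contrast above between reference and generated output history can be sketched as two decoding regimes. `step_fn` below stands in for one decoder step, and the scalar "attention" values are toy stand-ins; this illustrates the conditioning difference only, not the paper's model.

```python
def run_decoder(step_fn, ref_outputs, ref_attention, mode):
    """Contrast teacher forcing with attention forcing (a sketch).

    mode == "teacher":   condition each step on the REFERENCE history
                         (standard training; mismatches inference).
    mode == "attention": condition on the model's own GENERATED history,
                         while the reference attention guides alignment
                         (matches how the model is run at synthesis time).
    """
    generated = []
    for t, att in enumerate(ref_attention):
        if mode == "teacher":
            prev = ref_outputs[t - 1] if t > 0 else None
        else:  # attention forcing: no train/inference history mismatch
            prev = generated[-1] if generated else None
        generated.append(step_fn(prev, att))
    return generated

# Toy decoder step: doubles the previous output and adds the attention value
step = lambda prev, att: 2 * (prev or 0) + att
refs = [1, 2, 3]   # reference output history
atts = [1, 1, 1]   # reference attention (scalar stand-in per step)

print(run_decoder(step, refs, atts, "teacher"))    # → [1, 3, 5]
print(run_decoder(step, refs, atts, "attention"))  # → [1, 3, 7]
```

The two trajectories diverge as soon as the model's own output differs from the reference, which is exactly the exposure the teacher-forced regime never sees and attention forcing trains through.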
ISBN: (Print) 9781450367684
Machine Learning models from other fields, like Computational Linguistics, have been transplanted to Software Engineering tasks, often quite successfully. Yet a transplanted model's initial success at a given task does not necessarily mean it is well suited for the task. In this work, we examine a common example of this phenomenon: the conceit that "software patching is like language translation". We demonstrate empirically that there are subtle but critical distinctions between sequence-to-sequence models and translation models: while program repair benefits greatly from the former's general modeling architecture, it actually suffers from design decisions built into the latter, both in terms of translation accuracy and diversity. Given these findings, we demonstrate how a more principled approach to model design, based on our empirical findings and general knowledge of software development, can lead to better solutions. Our findings also lend strong support to the recent trend towards synthesizing edits of code conditional on the buggy context to repair bugs. We implement such models ourselves as proof-of-concept tools and empirically confirm that they behave in a fundamentally different, more effective way than the studied translation-based architectures. Overall, our results demonstrate the merit of studying the intricacies of machine-learned models in software engineering: not only can this help elucidate potential issues that may be overshadowed by increases in accuracy; it can also help innovate on these models to raise the state of the art further. We will publicly release our replication data and materials at https://***/ARiSE-Lab/Patch-as-translation.