Background Human-machine dialog generation is an essential topic of research in the field of natural language processing. Generating high-quality, diverse, fluent, and emotional conversation is a challenging task. Based on continuing advancements in artificial intelligence and deep learning, new methods have come to the forefront in recent years. In particular, the end-to-end neural network model provides an extensible conversation generation framework that has the potential to enable machines to understand semantics and automatically generate responses. However, neural network models come with their own set of questions and challenges. The basic conversational model framework tends to produce universal, meaningless, and relatively "safe" responses. Methods Based on generative adversarial networks (GANs), a new emotional dialog generation framework called EMC-GAN is proposed in this study to address the task of emotional dialog generation. The proposed model comprises a generative model and three discriminative models. The generator is based on the basic sequence-to-sequence (Seq2Seq) dialog generation model, and the aggregate discriminative model for the overall framework consists of a basic discriminative model, an emotion discriminative model, and a fluency discriminative model. The basic discriminative model distinguishes generated fake sentences from real sentences in the training data. The emotion discriminative model evaluates whether the emotion conveyed via the generated dialog agrees with a pre-specified emotion, and directs the generative model to generate dialogs that correspond to the category of the pre-specified emotion. Finally, the fluency discriminative model assigns a score to the fluency of the generated dialog and guides the generator to produce more fluent sentences. Results Based on the experimental results, this study confirms the superiority of the proposed model over similar existing models with respect to emotional accuracy, fluency, and diversity. Conclusions The proposed EMC-GAN …
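As a rough illustration of how the three discriminators' outputs might be aggregated into a single reward signal for the generator (the weighting scheme and weights below are our assumption for illustration, not the paper's formula):

```python
# Illustrative sketch, NOT the paper's code: combine the basic, emotion,
# and fluency discriminator scores into one generator reward.
# The weights w_basic, w_emotion, w_fluency are hypothetical.

def combined_reward(p_real, p_emotion_match, fluency_score,
                    w_basic=0.4, w_emotion=0.3, w_fluency=0.3):
    """Weighted aggregate of the three discriminators' outputs.

    p_real          -- basic discriminator: probability the dialog is real
    p_emotion_match -- emotion discriminator: probability the dialog matches
                       the pre-specified emotion category
    fluency_score   -- fluency discriminator: fluency score in [0, 1]
    """
    return (w_basic * p_real
            + w_emotion * p_emotion_match
            + w_fluency * fluency_score)

# A dialog judged fully real, on-emotion, and fluent gets the maximum reward.
print(combined_reward(1.0, 1.0, 1.0))
```

In a GAN setup such a scalar would typically drive a policy-gradient update of the Seq2Seq generator; the exact update rule is outside this sketch.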
In last-mile delivery, drivers frequently deviate from planned delivery routes because of their tacit knowledge of the road and curbside infrastructure, customer availability, and other characteristics of the respective service areas. Hence, the actual stop sequences chosen by an experienced human driver may be preferable to the theoretical shortest-distance routing under real-life operational conditions. Thus, being able to predict the actual stop sequence that a human driver would follow can help to improve route planning in last-mile delivery. This paper proposes a pair-wise attention-based pointer neural network for this prediction task using drivers' historical delivery trajectory data. In addition to the commonly used encoder-decoder architecture for sequence-to-sequence prediction, we propose a new attention mechanism based on an alternative specific neural network to capture the local pair-wise information for each pair of stops. To further capture the global efficiency of the route, we propose a new iterative sequence generation algorithm that is used after model training to identify the first stop of a route that yields the lowest operational cost. Results from an extensive case study on real operational data from Amazon's last-mile delivery operations in the US show that our proposed method can significantly outperform traditional optimization-based approaches and other machine learning methods (such as the Long Short-Term Memory encoder-decoder and the original pointer network) in finding stop sequences that are closer to high-quality routes executed by experienced drivers in the field. Compared to benchmark models, the proposed model can increase the average prediction accuracy of the first four stops from around 0.229 to 0.312, and reduce the disparity between the predicted route and the actual route by around 15%.
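The iterative first-stop search can be sketched as follows. The greedy nearest-neighbor routine here is a hypothetical stand-in for the trained pointer network, and the distance matrix is toy data; only the outer loop (try each candidate first stop, keep the cheapest generated sequence) reflects the idea described above.

```python
# Sketch of scanning candidate first stops: generate a full stop sequence
# from each candidate start, then keep the lowest-cost sequence.

def greedy_sequence(start, stops, dist):
    """Visit remaining stops greedily by distance (stand-in for the model)."""
    route, remaining = [start], set(stops) - {start}
    while remaining:
        nxt = min(remaining, key=lambda s: dist[route[-1]][s])
        route.append(nxt)
        remaining.remove(nxt)
    return route

def route_cost(route, dist):
    """Total travel cost along consecutive stop pairs."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def best_first_stop(stops, dist):
    """Try every stop as the first stop; return the cheapest sequence."""
    candidates = [greedy_sequence(s, stops, dist) for s in stops]
    return min(candidates, key=lambda r: route_cost(r, dist))

# Toy asymmetric distance matrix between three stops.
dist = {"A": {"B": 1, "C": 5},
        "B": {"A": 2, "C": 3},
        "C": {"A": 6, "B": 4}}
print(best_first_stop(["A", "B", "C"], dist))  # -> ['A', 'B', 'C']
```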
Identifying travel modes from GPS tracks, as an essential technique to understand the travel behavior of a population, has received widespread interest over the past decade. While most previous Travel Mode Identification (TMI) methods separately identify the mode of each track segment of a GPS trajectory, in this paper, we propose a sequence-based TMI framework that constructs a feature sequence for each GPS trajectory and feeds it into a sequence-to-sequence (seq2seq) model to obtain the corresponding travel mode label sequence, named Trajectory-as-a-sequence (TaaS). The proposed seq2seq model consists of a Convolutional Encoder (CE) and a Recurrent Conditional Random Field (RCRF), where the CE extracts high-level features from the point-level trajectory features and the RCRF learns the context information of trajectories at both feature and label levels, thus outputting accurate and reasonable travel mode label sequences. To alleviate the lack of data, we adopt a two-stage model training strategy. Additionally, we design two novel bus-related features to assist the seq2seq model in distinguishing different high-speed travel modes (i.e., bus, car, and railway) in the sequence. Besides the classical performance metrics such as accuracy, we propose a new metric that evaluates the rationality of the travel mode label sequence at the trajectory level. Comprehensive evaluations corresponding to the real-world TMI applications show that the sequence-based TaaS outperforms the segment-based models in practice. Furthermore, the results of ablation studies demonstrate that the elements integrated into the TaaS framework help improve the efficiency and accuracy of TMI.
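A simplified, hypothetical version of such a trajectory-level rationality check (our illustration, not the paper's actual metric) could penalize label sequences that switch travel modes unrealistically often:

```python
# Hypothetical rationality-style metric: real trajectories rarely alternate
# modes point by point, so fewer contiguous mode runs is more plausible.

def num_mode_segments(labels):
    """Number of contiguous same-mode runs in a label sequence."""
    return sum(1 for i, m in enumerate(labels) if i == 0 or m != labels[i - 1])

def rationality(predicted, max_reasonable_segments=3):
    """1.0 when the sequence has few mode switches, decaying toward 0.

    max_reasonable_segments is an assumed threshold for illustration.
    """
    segs = num_mode_segments(predicted)
    return min(1.0, max_reasonable_segments / segs) if segs else 0.0

print(num_mode_segments(["walk", "walk", "bus", "bus", "walk"]))  # 3 runs
```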
Sequence-to-sequence (seq2seq) automatic speech recognition (ASR) has recently achieved state-of-the-art performance with fast decoding and a simple architecture. On the other hand, it requires a large amount of training data and cannot use text-only data for training. In our previous work, we proposed a method for applying text data to seq2seq ASR training by leveraging text-to-speech (TTS). However, we observed that the log Mel-scale filterbank (lmfb) features produced by a Tacotron 2-based model are blurry, particularly on the time dimension. This problem is mitigated by introducing the WaveNet vocoder to generate speech of better quality or spectrograms of better time resolution. This makes it possible to train waveform-input end-to-end ASR. Here we use CNN filters and apply a masking method similar to SpecAugment. We compare the waveform-input model with two kinds of lmfb-input models: (1) lmfb features are directly generated by TTS, and (2) lmfb features are converted from the waveform generated by TTS. Experimental evaluations show the combination of waveform-output TTS and the waveform-input end-to-end ASR model outperforms the lmfb-input models in two domain adaptation settings.
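The SpecAugment-style masking can be illustrated with a minimal time-masking sketch. The mask position and width are fixed here for reproducibility; real SpecAugment samples them randomly, and this toy operates on a small list-of-frames stand-in for the actual features.

```python
# Minimal sketch of SpecAugment-style time masking on a feature sequence.

def time_mask(features, start, width, mask_value=0.0):
    """Zero out `width` consecutive frames starting at `start`.

    Returns a new sequence; the input is left unchanged.
    """
    masked = list(features)
    for t in range(start, min(start + width, len(masked))):
        masked[t] = [mask_value] * len(masked[t])
    return masked

frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(time_mask(frames, start=1, width=2))
# frames 1 and 2 are zeroed: [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0], [7.0, 8.0]]
```

SpecAugment additionally masks frequency bands; the same pattern applied along the feature dimension would cover that case.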
Predicting Estimated Time of Arrival (ETA) for a Multi-Airport System (MAS) is much more challenging than for a single airport system because of complex air route structure, dense air traffic volume and vagaries of traffic conditions in an MAS. In this work, we propose a novel "Bubble" mechanism to accurately predict medium-term ETA for an MAS, in which the prediction of travel time of an origin-destination (OD) pair is decomposed into two stages, termed the out-MAS and in-MAS stages. For the out-MAS stage, Auto-Regressive Integrated Moving Average (ARIMA) is used to predict the travel time of a flight to reach the MAS boundary. For the in-MAS stage, we construct new spatio-temporal features based on clustering analysis of trajectory patterns facilitated by a novel data-driven hybrid polar sampling method. A sequence-to-sequence prediction model, Multi-variate Stacked Fully connected Bidirectional Long Short-Term Memory, is further developed to achieve multi-step-ahead predictions of in-MAS travel time for each trajectory pattern using the spatio-temporal features as input. Finally, the medium-term ETA prediction for an MAS is achieved by integrating the out-MAS and in-MAS prediction with the help of trajectory pattern prediction via random forest. A case study of predicting medium-term ETA for a typical MAS in China, the Guangdong-Hong Kong-Macao Greater Bay Area, is conducted to demonstrate the usage and promising performance of the proposed method in comparison to several commonly used end-to-end learning methods.
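At prediction time, the "Bubble" decomposition reduces to summing the two stages' travel-time estimates. A toy sketch with made-up numbers (the two inputs would come from the ARIMA model and the in-MAS sequence model, respectively):

```python
# Toy illustration of the two-stage ETA decomposition; values are made up.

def bubble_eta(out_mas_minutes, in_mas_minutes):
    """Total ETA = travel time to the MAS boundary + time inside the MAS."""
    return out_mas_minutes + in_mas_minutes

# Hypothetical flight: 95 min to reach the MAS boundary (out-MAS stage),
# then 22.5 min from boundary to runway (in-MAS stage).
eta = bubble_eta(out_mas_minutes=95.0, in_mas_minutes=22.5)
print(eta)  # -> 117.5
```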
ISBN (Print): 9781665437394
While end-to-end automatic speech recognition (ASR) has achieved high performance, it requires a huge amount of paired speech and transcription data for training. Recently, data augmentation methods have actively been investigated. One method is to use a text-to-speech (TTS) system to generate speech data from text-only data and use the generated speech for data augmentation, but it has been found that the synthesized log Mel-scale filterbank (lmfb) features could have a serious mismatch with the real speech features. In this study, we propose a data augmentation method via a discrete speech representation. The TTS model predicts discrete ID sequences instead of lmfb features, and the ASR also uses the ID sequences as training data. We expect that the use of a discrete representation based on vq-wav2vec not only makes TTS training easier but also mitigates the mismatch with real data. Experimental evaluations show that the proposed method outperforms the data augmentation method using the conventional TTS. We found that it reduces speaker dependency, and the generated features are distributed more closely to the real ones.
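The discrete-representation idea can be sketched as nearest-codebook quantization: each frame is mapped to the ID of its closest codebook vector. The codebook below is a toy stand-in, not a trained vq-wav2vec codebook.

```python
# Sketch of vq-style quantization: frames -> discrete ID sequence.

def quantize(frames, codebook):
    """Return the ID sequence of nearest codebook entries (squared L2)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: dist(f, codebook[i]))
            for f in frames]

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]   # toy learned codewords
frames = [[0.1, -0.1], [0.9, 1.2], [2.2, 1.8]]    # toy acoustic frames
print(quantize(frames, codebook))  # -> [0, 1, 2]
```

In the pipeline described above, TTS would predict such ID sequences directly, and ASR would consume them as inputs in place of lmfb features.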
ISBN (Print): 9781728158754
The number of people suffering from mental health issues like depression and anxiety has spiked enormously in recent times. Conversational agents like chatbots have emerged as an effective way for users to express their feelings and anxious thoughts and in turn obtain some empathetic reply that would relieve their anxiety. In our work, we construct two types of empathetic conversational agent models based on sequence-to-sequence modeling, with and without an attention mechanism. We implement the attention mechanism proposed by Bahdanau et al. for neural machine translation models. We train our models on the benchmark Facebook Empathetic Dialogue dataset and compute BLEU scores. Our empathetic conversational agent model incorporating the attention mechanism generates better quality empathetic responses and is better at capturing human feelings and emotions in the conversation.
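A minimal scalar sketch of Bahdanau-style additive attention, using toy one-dimensional states and hypothetical weights (real models use learned weight matrices and vector-valued states): scores come from a small nonlinear function of the decoder and encoder states and are softmax-normalized into weights over encoder positions.

```python
import math

# Toy scalar version of additive (Bahdanau) attention scoring.
def additive_score(dec, enc, w_dec=1.0, w_enc=1.0):
    """score(s, h) = tanh(w_dec*s + w_enc*h); weights are hypothetical."""
    return math.tanh(w_dec * dec + w_enc * enc)

def attention_weights(dec_state, enc_states):
    """Softmax over the scores of the decoder state against each encoder state."""
    scores = [additive_score(dec_state, h) for h in enc_states]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = attention_weights(0.5, [0.1, 0.9, -0.3])
print(round(sum(weights), 6))  # the weights form a distribution -> 1.0
```

The decoder then uses these weights to form a context vector as a weighted sum of encoder states; that step is omitted here for brevity.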
Machine translation is the process of translating one natural language into another natural language. In the experiment, machine translation tasks were performed on an English-to-German data set and an English-to-Thai data set using the sequence-to-sequence model, the sequence-to-sequence model with attention mechanism, and the transformer model. Analysis of the experimental data shows that the transformer model not only outperforms the first two models in machine translation performance, but also benefits from its structural characteristics: on the relatively scarce English-to-Thai data set, the transformer model's results degrade less than those of the first two models, which indicates that the transformer model improves the quality of machine translation.
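The transformer's core operation, scaled dot-product attention, can be shown in a toy single-query pure-Python form (real models use batched matrices, multiple heads, and learned projections; the vectors below are toy values):

```python
import math

def scaled_dot_attention(q, keys, values):
    """Single-query scaled dot-product attention over toy 2-D keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted sum of the value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

out = scaled_dot_attention([1.0, 0.0],
                           [[1.0, 0.0], [0.0, 1.0]],
                           [[1.0, 2.0], [3.0, 4.0]])
print([round(x, 3) for x in out])
```

The query aligns more strongly with the first key, so the output leans toward the first value vector rather than averaging the two equally.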
ISBN (Print): 9781728176055
We present an extended Parrotron model: a single, end-to-end network that enables voice conversion and recognition simultaneously. Input spectrograms are transformed to output spectrograms in the voice of a predetermined target speaker while also generating hypotheses in a target vocabulary. We study the performance of this novel architecture, which jointly predicts speech and text, on atypical (e.g. dysarthric) speech. We show that with as little as an hour of atypical speech, speaker adaptation can yield a 77% relative reduction in Word Error Rate (WER), measured by ASR performance on the converted speech. We also show that data augmentation using a customized synthesizer built on atypical speech can provide an additional 10% relative improvement over the best speaker-adapted model. Finally, we show how these methods generalize across 8 types of atypical speech for a range of speech impairment severities.
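As a quick sanity check of the arithmetic behind a "77% relative reduction in WER": relative reduction is the drop divided by the baseline. The baseline WER below is made up for illustration, not a figure from the paper.

```python
# Illustrating relative vs. absolute WER reduction with a hypothetical baseline.

def relative_reduction(baseline, improved):
    """Fraction of the baseline error eliminated by the improved system."""
    return (baseline - improved) / baseline

baseline_wer = 0.50                      # hypothetical pre-adaptation WER
adapted_wer = baseline_wer * (1 - 0.77)  # apply a 77% relative reduction
print(round(adapted_wer, 3))             # absolute WER after adaptation
print(round(relative_reduction(baseline_wer, adapted_wer), 2))
```

Note that a 77% relative reduction from a 50% baseline lands at 11.5% absolute WER, not at 50 - 77 percentage points; relative and absolute reductions are easy to conflate.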
ISBN (Print): 9781665437394
In this paper, we propose Textual Echo Cancellation (TEC) - a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings. Such a system can largely improve speech recognition performance and user experience for intelligent devices such as smart speakers, as the user can talk to the device while the device is still playing the TTS signal responding to the previous query. We implement this system by using a novel sequence-to-sequence model with multi-source attention that takes both the microphone mixture signal and source text of the TTS playback as inputs, and predicts the enhanced audio. Experiments show that the textual information of the TTS playback is critical to enhancement performance. Besides, the text sequence is much smaller in size compared with the raw acoustic signal of the TTS playback, and can be immediately transmitted to the device or ASR server even before the playback is synthesized. Therefore, our proposed approach effectively reduces Internet communication and latency compared with alternative approaches such as acoustic echo cancellation (AEC).
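A minimal sketch of the multi-source idea: the decoder attends to each source (the acoustic mixture and the TTS source text) separately, then fuses the per-source context vectors. Plain concatenation is our assumption for illustration; the paper's actual fusion layer may differ.

```python
# Hypothetical fusion of two per-source attention context vectors.

def fuse_contexts(audio_context, text_context):
    """Concatenate per-source attention contexts into one decoder input."""
    return list(audio_context) + list(text_context)

audio_ctx = [0.2, 0.8]   # toy attention context over the mixture signal
text_ctx = [0.5, 0.1]    # toy attention context over the TTS source text
print(fuse_contexts(audio_ctx, text_ctx))  # -> [0.2, 0.8, 0.5, 0.1]
```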