Code-switching, the combination of more than one language within a single utterance, is popular on social media sites for informal communication, generating a huge amount of data that remains largely unanalysed for knowledge extraction due to a lack of documentation. Moreover, code-mixed data is error-prone, and to detect and correct different types of errors a model requires a large amount of erroneous code-mixed language data, which is not publicly available. This paper first defines generic rules for writing Bengali-English code-mixed language in English script, considering the inherent complexity of the Bengali language. Different types of typographical and cognitive errors are induced to obtain a large erroneous corpus, based on human behaviour and perception as determined in consultation with language experts. The errors considered here are applicable to other code-mixed Indic languages and would be beneficial to researchers. To demonstrate the applicability of the model, we have also induced these errors in Hindi-English code-mixed data. An attention-based two-level deep network architecture (using LSTM as the basic unit) is employed for error detection, error correction, and translation of code-mixed sentences into monolingual sentences. Results are reported in terms of accuracy, ROUGE scores, and BLEU scores at the word and sentence levels for both the Bengali-English and Hindi-English code-mixed languages.
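The typographical errors described above can be induced programmatically. Below is a minimal sketch of one such noising step, assuming three common error types (adjacent-character swap, deletion, and keyboard-neighbour substitution); the adjacency map, function names, and the sample romanized Bengali-English sentence are illustrative, not taken from the paper.

```python
import random

# Hypothetical (partial) QWERTY adjacency map for substitution errors.
ADJACENT = {"a": "sq", "e": "wr", "i": "uo", "o": "ip", "n": "bm"}

def induce_typo(word, rng):
    """Apply one random typographical error: swap, delete, or substitute."""
    if len(word) < 2:
        return word
    op = rng.choice(["swap", "delete", "substitute"])
    i = rng.randrange(len(word) - 1)
    if op == "swap":                       # transpose adjacent characters
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "delete":                     # drop one character
        return word[:i] + word[i + 1:]
    repl = rng.choice(ADJACENT.get(word[i], word[i]))
    return word[:i] + repl + word[i + 1:]  # substitute a neighbouring key

rng = random.Random(0)
# Illustrative romanized code-mixed sentence, one induced error per word.
noisy = [induce_typo(w, rng) for w in "ami school jabo na".split()]
```

Pairing each clean sentence with many such noised variants yields the parallel (erroneous, correct) data needed to train the detection/correction model.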
Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). As the original AED models with global attention are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for a better user experience. However, a common limitation of the conventional softmax-based online attention approaches is that they introduce an additional hyperparameter related to the length of the attention window, requiring multiple rounds of model training to tune it. To deal with this problem, we propose a novel softmax-free attention method and its modified formulation for online attention, which needs no additional hyperparameter at the training phase. Through a number of ASR experiments, we demonstrate that the tradeoff between latency and performance of the proposed online attention technique can be controlled by merely adjusting a threshold at the test phase. Furthermore, the proposed methods showed performance competitive with the conventional global and online attentions in terms of word error rate (WER).
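The key idea, a test-time threshold instead of a trained window-length hyperparameter, can be illustrated with a toy softmax-free weighting. The paper's exact formulation differs; this sketch merely shows how raising a threshold at inference shrinks the attended window with no retraining. All names and the ReLU-plus-normalisation scheme are assumptions for illustration.

```python
import numpy as np

def relu_attention(scores, threshold=0.0):
    """Softmax-free attention sketch: ReLU the raw scores, zero out
    entries below `threshold`, then normalise by the sum.  Raising
    `threshold` at test time prunes low-scoring frames, trading
    accuracy for a shorter (lower-latency) attention window."""
    w = np.maximum(scores, 0.0)
    w[w < threshold] = 0.0
    total = w.sum()
    return w / total if total > 0 else w

scores = np.array([0.1, 0.9, 0.4, -0.2, 0.05])   # toy frame scores
w_full = relu_attention(scores)                   # full window
w_trunc = relu_attention(scores, threshold=0.3)   # pruned window
```

With `threshold=0.3` only two frames keep nonzero weight, so the decoder attends over a strictly smaller window than in the unthresholded case.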
Sequence-to-sequence (seq2seq) automatic speech recognition (ASR) has recently achieved state-of-the-art performance with fast decoding and a simple architecture. On the other hand, it requires a large amount of training data and cannot use text-only data for training. In our previous work, we proposed a method for applying text data to seq2seq ASR training by leveraging text-to-speech (TTS). However, we observe that the log Mel-scale filterbank (lmfb) features produced by the Tacotron 2-based model are blurry, particularly along the time dimension. This problem is mitigated by introducing the WaveNet vocoder to generate speech of better quality, or spectrograms of better time resolution, which makes it possible to train waveform-input end-to-end ASR. Here we use CNN filters and apply a masking method similar to SpecAugment. We compare the waveform-input model with two kinds of lmfb-input models: (1) lmfb features directly generated by TTS, and (2) lmfb features converted from the waveform generated by TTS. Experimental evaluations show that the combination of waveform-output TTS and the waveform-input end-to-end ASR model outperforms the lmfb-input models in two domain adaptation settings.
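A SpecAugment-style mask of the kind mentioned above simply zeroes a contiguous run of frames in the feature matrix. The sketch below shows time masking only (SpecAugment also masks frequency bins and warps time); function and parameter names are illustrative, not the paper's.

```python
import numpy as np

def time_mask(features, mask_width, start, mask_value=0.0):
    """Zero out `mask_width` consecutive frames starting at `start`.

    `features` is a (frames, bins) matrix, e.g. lmfb features.  In
    practice `start` and `mask_width` would be drawn at random per
    training example; fixed values are used here for clarity."""
    masked = features.copy()
    masked[start:start + mask_width, :] = mask_value
    return masked

feats = np.ones((100, 80))                       # 100 frames, 80 bins
out = time_mask(feats, mask_width=10, start=20)  # frames 20..29 masked
```

For the waveform-input model the same idea applies, with the mask zeroing a span of raw samples rather than feature frames.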
ISBN: (Print) 9781510872219
Acoustic-to-word speech recognition based on attention-based encoder-decoder models achieves better accuracy with much lower latency than conventional speech recognition systems. However, acoustic-to-word models require a very large amount of training data, and it is difficult to prepare such data for a new domain such as elderly speech. To address this problem, we propose domain adaptation based on transfer learning with layer freezing. Layer freezing first pre-trains a network on the source domain data, and then a subset of the parameters is re-trained for the target domain while the rest are fixed. In the attention-based acoustic-to-word model, the encoder part is frozen to maintain its generality, and only the decoder part is re-trained to adapt to the target domain. This effectively adapts the latent linguistic capability of the decoder to the target domain. Using a large-scale Japanese spontaneous speech corpus as the source, the proposed method is applied to three target domains: a call-center task and two voice search tasks, by adults and by the elderly. The models trained with the proposed method achieved better accuracy than the baseline models, which were trained from scratch or entirely re-trained on the target domain.
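The freezing scheme above amounts to excluding the encoder's parameters from the optimizer update during target-domain fine-tuning. A minimal numerical sketch (the "encoder"/"decoder" grouping mirrors the paper's scheme, but the model and SGD step are stand-ins, not the actual AED network):

```python
import numpy as np

def sgd_step(params, grads, frozen, lr=0.1):
    """One SGD update that skips any parameter group named in `frozen`.

    In a real framework this is done by disabling gradients for the
    frozen layers (e.g. requires_grad=False in PyTorch); here each
    group is just a flat numpy vector for illustration."""
    return {name: p if name in frozen else p - lr * grads[name]
            for name, p in params.items()}

params = {"encoder": np.ones(4), "decoder": np.ones(4)}
grads = {"encoder": np.full(4, 0.5), "decoder": np.full(4, 0.5)}

# Fine-tune on target-domain data with the encoder frozen.
adapted = sgd_step(params, grads, frozen={"encoder"})
```

After the step the encoder parameters are untouched while the decoder has moved, which is exactly the adaptation behaviour the abstract describes.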