Search Results - Inner Mongolia University Library

Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, Vol. 32, pp. 1636-1649

Authors: Chen, Zhengyang; Han, Bing; Wang, Shuai; Qian, Yanmin. Affiliations: Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Auditory Cognit & Computat Acoust Lab, Shanghai 200240, Peoples R China; Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China; Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China

Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers, while target speaker voice activity detection (TS-VAD) systems tend to be overly complex. In this paper, we propose a simple attention-based encoder-decoder network for end-to-end neural diarization (AED-EEND). In our training process, we introduce a teacher-forcing strategy to address the speaker permutation problem, leading to faster model convergence. For evaluation, we propose an iterative decoding method that outputs diarization results for each speaker sequentially. Additionally, we propose an Enhancer module to enhance the frame-level speaker embeddings, enabling the model to handle scenarios with an unseen number of speakers. We also explore replacing the transformer encoder with a Conformer architecture, which better models local information. Furthermore, we discovered that commonly used simulation datasets for speaker diarization have a much higher overlap ratio compared to real data. We found that using simulated training data that is more consistent with real data can achieve an improvement in consistency. Extensive experimental validation demonstrates the effectiveness of our proposed methodologies. Our best system achieved a new state-of-the-art diarization error rate (DER) performance on all the CALLHOME (10.08%), DIHARD II (24.64%), and AMI (13.00%) evaluation benchmarks when overlap is considered and no oracle voice activity detection (VAD) is used. Beyond speaker diarization, our AED-EEND system also shows remarkable competitiveness as a speech type detection model.

Keywords: Neural speaker diarization; attention-based encoder-decoder; CALLHOME; AMI; DIHARD; iterative decoding
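The abstract above reports diarization error rate (DER) with overlapped speech counted and no oracle VAD. As a rough illustration of what that metric measures, here is a minimal frame-level sketch. It is not the scoring tool used in the paper: the function name `frame_der`, the per-frame label sets, and the absence of a forgiveness collar are all assumptions made for this example, and the brute-force speaker mapping is only practical for a handful of speakers.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level DER for two equally long lists of per-frame speaker-label sets.

    Illustrative only (hypothetical helper): no collar, brute-force mapping.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same frames")
    total_ref = sum(len(r) for r in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    ref_spk = sorted(set().union(*ref))
    hyp_spk = sorted(set().union(*hyp))
    # Pad with None so that surplus hypothesis speakers can stay unmapped.
    slots = ref_spk + [None] * max(0, len(hyp_spk) - len(ref_spk))

    best = None
    for perm in permutations(slots, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        errors = 0
        for r, h in zip(ref, hyp):
            mapped = {mapping[s] for s in h} - {None}
            correct = len(r & mapped)
            miss = max(0, len(r) - len(h))           # missed speech
            false_alarm = max(0, len(h) - len(r))    # false-alarm speech
            confusion = min(len(r), len(h)) - correct  # speaker confusion
            errors += miss + false_alarm + confusion
        if best is None or errors < best:
            best = errors
    # DER = (miss + false alarm + confusion) / total reference speaker time
    return best / total_ref


# Toy example: five frames, two reference speakers, one overlapped frame.
reference = [{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"x"}, {"x"}, {"x"}, {"y"}, {"y"}]
der = frame_der(reference, hypothesis)
```

Because overlapped frames contribute more than one unit of reference speaker time, a system that outputs only one speaker per frame is penalised with missed speech on every overlapped frame.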

HYBRID ATTENTION-BASED ENCODER-DECODER MODEL FOR EFFICIENT LANGUAGE MODEL ADAPTATION

2024 Spoken Language Technology Workshop

Authors: Ling, Shaoshi; Ye, Guoli; Zhao, Rui; Gong, Yifan. Affiliation: Microsoft Cloud & AI, Redmond, WA 98052, USA

ISBN: (print) 9798350392265; 9798350392258

The attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years. However, the joint optimization of the acoustic model and language model in an end-to-end manner has created challenges for text adaptation. In particular, effective, quick and inexpensive adaptation with text input has become a primary concern for deploying AED systems in the industry. To address this issue, we propose a novel model, the hybrid attention-based encoder-decoder (HAED) speech recognition model, which preserves the modularity of conventional hybrid automatic speech recognition systems. Our HAED model separates the acoustic and language models, allowing for the use of conventional text-based language model adaptation techniques. We demonstrate that the proposed HAED model yields 23% relative Word Error Rate (WER) improvements when out-of-domain text data is used for language model adaptation, with only a minor degradation in WER on a general test set compared with the conventional AED model.

Keywords: speech recognition; attention-based encoder-decoder; language modeling; text adaptation

Investigating Methods to Improve Language Model Integration for Attention-Based Encoder-Decoder ASR Models 22

Interspeech Conference

Authors: Zeineldeen, Mohammad; Glushko, Aleksandr; Michel, Wilfried; Zeyer, Albert; Schlueter, Ralf; Ney, Hermann. Affiliations: Rhein Westfal TH Aachen, Comp Sci Dept, Human Language Technol & Pattern Recognit, D-52074 Aachen, Germany; AppTek GmbH, D-52062 Aachen, Germany

ISBN: (print) 9781713836902

Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. The integration with an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation as in the hybrid autoregressive transducer (HAT) suggests dividing by the prior of the discriminative acoustic model, which corresponds to this implicit LM, similarly as in the hybrid hidden Markov model approach. The implicit LM cannot be calculated efficiently in general, and it is not yet clear what the best methods to estimate it are. In this work, we compare different approaches from the literature and propose several novel methods to estimate the ILM directly from the AED model. Our proposed methods outperform all previous approaches. We also investigate other methods to suppress the ILM mainly by decreasing the capacity of the AED model, limiting the label context, and also by training the AED model together with a pre-existing LM.

Keywords: speech recognition; language model integration; attention-based encoder-decoder
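The "dividing by the prior" idea in the abstract above becomes a subtraction in the log domain: the external LM score is added and the estimated internal LM score is subtracted. The sketch below shows only that generic score combination on an n-best list. The function names, the dictionary keys, and the weight values are hypothetical, and none of the ILM estimation methods proposed in the paper are implemented here.

```python
def fused_score(log_p_aed, log_p_ext_lm, log_p_ilm, lm_weight=0.5, ilm_weight=0.3):
    """Log-linear combination: AED score + external LM - estimated internal LM.

    All three inputs are log-probabilities of the same hypothesis.
    The default weights are placeholders, not tuned values.
    """
    return log_p_aed + lm_weight * log_p_ext_lm - ilm_weight * log_p_ilm


def rescore(nbest, lm_weight=0.5, ilm_weight=0.3):
    """Return the hypothesis with the highest fused score."""
    return max(
        nbest,
        key=lambda h: fused_score(h["aed"], h["lm"], h["ilm"], lm_weight, ilm_weight),
    )


# Toy n-best list with made-up log-probabilities.
nbest = [
    {"text": "hypothesis A", "aed": -1.0, "lm": -8.0, "ilm": -1.0},
    {"text": "hypothesis B", "aed": -1.2, "lm": -3.0, "ilm": -2.0},
]
best_fused = rescore(nbest)
best_aed_only = rescore(nbest, lm_weight=0.0, ilm_weight=0.0)
```

In this toy list the AED score alone prefers hypothesis A, while the fused score prefers hypothesis B because the external LM rates it far higher and the ILM correction removes part of the AED model's own language prior.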

Enhancing E-commerce recommendations with sentiment analysis using MLA-EDTCNet and collaborative filtering

SCIENTIFIC REPORTS, 2025, Vol. 15, No. 1, pp. 1-16

Authors: Krishna, E. S. Phalguna; Ramu, T. Bhargava; Chaitanya, R. Krishna; Ram, M. Sitha; Balayesu, Narasimhula; Gandikota, Hari Prasad; Jagadesh, B. N. Affiliations: GITAM Univ, GITAM Sch Technol, Dept Comp Sci & Engn, Bengaluru Campus, Bengaluru, India; MLR Inst Technol, Dept Elect & Elect Engn, Hyderabad 500043, Telangana, India; SRKR Engn Coll, Dept ECE, Bhimavaram, India; SRM Univ, Sch Engn & Sci, Dept Comp Sci & Engn, Amaravati, Andhra Pradesh, India; Vasireddy Venkatadri Inst Technol, Dept Comp Sci & Engn AIML, Guntur, India; Koneru Lakshmaiah Educ Fdn, Dept Comp Sci & Engn, Hyderabad 500075, Telangana, India; VIT AP Univ, Sch Comp Sci & Engn, Vijayawada 522237, India

The rapid growth of e-commerce has made product recommendation systems essential for enhancing customer experience and driving business success. This research proposes an advanced recommendation framework that integrates sentiment analysis (SA) and collaborative filtering (CF) to improve recommendation accuracy and user satisfaction. The methodology involves feature-level sentiment analysis with a multi-step pipeline: data preprocessing, feature extraction using a log-term frequency-based modified inverse class frequency (LFMI) algorithm, and sentiment classification using a Multi-Layer Attention-based Encoder-Decoder Temporal Convolution Neural Network (MLA-EDTCNet). To address class imbalance issues, a Modified Conditional Generative Adversarial Network (MCGAN) generates balanced oversamples. Furthermore, the Ocotillo Optimization Algorithm (OcOA) fine-tunes the model parameters to ensure optimal performance by balancing exploration and exploitation during training. The integrated system predicts sentiment polarity (positive, negative, or neutral) and combines these insights with CF to provide personalized product recommendations. Extensive experiments conducted on an Amazon product dataset demonstrate that the proposed approach outperforms state-of-the-art models in accuracy, precision, recall, F1-score, and AUC. By leveraging SA and CF, the framework delivers recommendations tailored to user preferences while enhancing engagement and satisfaction. This research highlights the potential of hybrid deep learning techniques to address critical challenges in recommendation systems, including class imbalance and feature extraction, offering a robust solution for modern e-commerce platforms.

Keywords: Sentiment analysis; Modified conditional generative adversarial network; Modified inverse class frequency algorithm; Ocotillo optimization algorithm; attention-based encoder-decoder; Collaborative filtering

An End-to-End Transformer-based Automatic Speech Recognition for Qur’an Reciters

Computers, Materials & Continua, 2023, Vol. 74, No. 2, pp. 3471-3487

Authors: Mohammed Hadwan; Hamzah A. Alsayadi; Salah AL-Hagree. Affiliations: Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia; Department of Computer Science, College of Applied Sciences, Taiz University, Taiz 6803, Yemen; Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt; Computer Science Department, Faculty of Sciences, Ibb University, Yemen; Department of Computer Sciences & Information, Ibb University, Yemen

The attention-based encoder-decoder technique, known as the transformer, is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying ASR end-to-end transformer-based models for the Arabic language, as the researchers’ community pays little attention to *** Muslims Holy Qur’an book is written using Arabic diacritized *** this paper, an end-to-end transformer model to building a robust Qur’an *** is *** acoustic model was built using the transformer-based model as deep learning by the PyTorch framework. A multi-head attention mechanism is utilized to represent the encoder and decoder in the acoustic *** filter bank is used for feature *** build a language model (LM), the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) were used to train an n-gram word-based *** a part of this research, a new dataset of Qur’an verses and their associated transcripts were collected and processed for training and evaluating the proposed model, consisting of 10 h *** recitations performed by 60 *** experimental results showed that the proposed end-to-end transformer-based model achieved a significantly low character error rate (CER) of 1.98% and a word error rate (WER) of 6.16%. We have achieved state-of-the-art end-to-end transformer-based recognition for Qur’an reciters.

Keywords: attention-based encoder-decoder; recurrent neural network; long short-term memory; Qur’an reciters recognition; diacritized Arabic text
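The CER and WER figures quoted in the abstract above are edit-distance ratios. The following is a minimal sketch of how such rates are commonly computed, using Levenshtein distance over characters or words. It is not the authors' evaluation script: the helper names are made up for this example, the CER variant here ignores spaces (conventions differ between toolkits), and an English toy sentence stands in for diacritized Arabic text.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("empty reference")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, with spaces removed (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("empty reference")
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)


# Toy example: one substitution ("sat" -> "sit") and one deleted word ("the").
example_wer = wer("the cat sat on the mat", "the cat sit on mat")
example_cer = cer("abcd", "abed")
```

Both rates can exceed 1.0 when the hypothesis contains many insertions, since the denominator is the reference length only.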

An Efficient Unsupervised Approach for OCR Error Correction of Vietnamese OCR Text

IEEE ACCESS, 2023, Vol. 11, pp. 58406-58421

Authors: Nguyen, Quoc-Dung; Phan, Nguyet-Minh; Kromer, Pavel; Le, Duc-Anh. Affiliations: Van Lang Univ, Sch Technol, Fac Mech Elect & Comp Engn, Ho Chi Minh City 700000, Vietnam; Saigon Univ, Fac Informat Technol, Chi Minh City 749000, Vietnam; VSB Tech Univ Ostrava, Dept Comp Sci, Ostrava 70800, Czech Republic; Inst Stat Math, Tokyo 1018430, Japan; Nguyen Tat Thanh Univ, NTT Hitech Inst, Ho Chi Minh City 70000, Vietnam

Different types of OCR errors often occur in OCR texts due to the low quality of scanned document images or limitations in OCR software. In this paper, we propose a novel unsupervised approach for OCR error correction. Correction candidates for OCR errors are generated and explored in their neighborhoods using correction character edits controlled by an adapted hill-climbing algorithm. Correction characters are extracted from only original ground truth texts, which do not depend on OCR texts in training data. A weighted objective function used to score and rank correction candidates is heuristically tested to find optimal weight combinations. The proposed model is evaluated on an OCR text dataset originating from the Vietnamese handwritten database in the ICFHR 2018 Vietnamese online handwritten text recognition competition. The proposed model is also verified concerning its stability and complexity. The experimental results show that our model achieves competitive performance compared to the other models in the ICFHR 2018 competition.

Keywords: Optical character recognition; Error correction; Computational modeling; Adaptation models; Optimization; Linguistics; Training data; Encoding; OCR; character edit; error correction; attention-based encoder-decoder; hill climbing

Alignment Knowledge Distillation for Online Streaming Attention-Based Speech Recognition

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, Vol. 31, pp. 1371-1385

Authors: Inaguma, Hirofumi; Kawahara, Tatsuya. Affiliation: Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan

This article describes an efficient training method for online streaming attention-based encoder-decoder (AED) automatic speech recognition (ASR) systems. AED models have achieved competitive performance in offline scenarios by jointly optimizing all components. They have recently been extended to an online streaming framework via models such as monotonic chunkwise attention (MoChA). However, the elaborate attention calculation process is not robust against long-form speech utterances. Moreover, the sequence-level training objective and time-restricted streaming encoder cause a nonnegligible delay in token emission during inference. To address these problems, we propose CTC synchronous training (CTC-ST), in which CTC alignments are leveraged as a reference for token boundaries to enable a MoChA model to learn optimal monotonic input-output alignments. We formulate a purely end-to-end training objective to synchronize the boundaries of MoChA to those of CTC. The CTC model shares an encoder with the MoChA model to enhance the encoder representation. Moreover, the proposed method provides alignment information learned in the CTC branch to the attention-based decoder. Therefore, CTC-ST can be regarded as self-distillation of alignment knowledge from CTC to MoChA. Experimental evaluations on a variety of benchmark datasets show that the proposed method significantly reduces recognition errors and emission latency simultaneously. The robustness to long-form and noisy speech is also demonstrated. We compare CTC-ST with several methods that distill alignment knowledge from a hybrid ASR system and show that the CTC-ST can achieve a comparable tradeoff of accuracy and latency without relying on external alignment information.

Keywords: Decoding; Training; Context modeling; Transducers; Complexity theory; Speech processing; Predictive models; attention-based encoder-decoder; connectionist temporal classification; knowledge distillation; monotonic chunkwise attention; streaming automatic speech recognition

Dynamic Network Slice Scaling Assisted by Attention-Based Prediction in 5G Core Network

IEEE ACCESS, 2022, Vol. 10, pp. 72955-72972

Authors: Chien-Nguyen Nhu; Park, Minho. Affiliations: Soongsil Univ, Dept Informat Commun Convergence Technol, Seoul 156743, South Korea; Soongsil Univ, Sch Elect Engn, Seoul 156743, South Korea

Network slicing is a key technology in fifth-generation (5G) networks that allows network operators to create multiple logical networks over a shared physical infrastructure to meet the requirements of diverse use cases. Among core functions to implement network slicing, resource management and scaling are difficult challenges. Network operators must ensure the Service Level Agreement (SLA) requirements for latency, bandwidth, resources, etc. for each network slice while utilizing the limited resources efficiently, i.e., optimal resource assignment and dynamic resource scaling for each network slice. Existing resource scaling approaches can be classified into reactive and proactive types. The former makes a resource scaling decision when the resource usage of virtual network functions (VNFs) exceeds a predefined threshold, and the latter forecasts the future resource usage of VNFs in network slices by utilizing classical statistical models or deep learning models. However, both have a trade-off between assurance and efficiency. For instance, a lower threshold in the reactive approach or a more marginal prediction in the proactive approach can meet the requirements more certainly, but it may cause unnecessary resource wastage. To overcome the trade-off, we first propose a novel and efficient proactive resource forecasting algorithm. The proposed algorithm introduces an attention-based encoder-decoder model for multivariate time series forecasting to achieve high short-term and long-term prediction accuracies. It helps network slices be scaled up and down effectively and reduces the costs of SLA violations and resource overprovisioning. Using the attention mechanism, the model attends to every hidden state of the sequential input at every time step to select the most important time steps affecting the prediction results. We also designed an automated resource configuration mechanism responsible for monitoring resources and automatically adding or removing VNF instances

Keywords: Forecasting; Predictive models; Time series analysis; Network slicing; Costs; 5G mobile communication; Data models; auto scaling; resource prediction; deep learning; attention-based encoder-decoder
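The abstract above says the model attends to every encoder hidden state at each decoding step to pick out the most influential time steps. A minimal sketch of that mechanism follows, using plain dot-product scoring as an assumption; the paper may use a different scoring function, and the names `softmax` and `attend` and the toy vectors are invented for this illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, hidden_states):
    """Dot-product attention of one decoder query over all encoder hidden states.

    Returns the attention weights (one per input time step) and the context
    vector, i.e. the weighted sum of the hidden states.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Toy input: three time steps of a two-dimensional hidden state.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], hidden)
```

Time steps whose hidden state aligns with the query receive larger weights, which is how the most relevant parts of the input history come to dominate the context vector passed to the decoder.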

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-Decoder Speech Recognition 24

Interspeech Conference

Authors: Tsunoo, Emiru; Futami, Hayato; Kashiwagi, Yosuke; Arora, Siddhant; Watanabe, Shinji. Affiliations: Sony Grp Corp, Tokyo, Japan; Carnegie Mellon Univ, Pittsburgh, PA, USA

Although frame-based models, such as CTC and transducers, have an affinity for streaming automatic speech recognition, their decoding uses no future knowledge, which could lead to incorrect pruning. Conversely, label-based attention encoder-decoder mitigates this issue using soft attention to the input, while it tends to overestimate labels biased towards its training domain, unlike CTC. We exploit these complementary attributes and propose to integrate the frame- and label-synchronous (F-/L-Sync) decoding alternately performed within a single beam-search scheme. F-Sync decoding leads the decoding for block-wise processing, while L-Sync decoding provides the prioritized hypotheses using look-ahead future frames within a block. We maintain the hypotheses from both decoding methods to perform effective pruning. Experiments demonstrate that the proposed search algorithm achieves lower error rates compared to the other search methods, while being robust against out-of-domain situations.

Keywords: speech recognition; beam search; attention-based encoder-decoder; CTC

A Comparative Analysis of Generative Neural Attention-Based Service Chatbot

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, Vol. 13, No. 8, pp. 742-751

Authors: Suhaili, Sinarwati Mohamad; Salim, Naomie; Jambli, Mohamad Nazim. Affiliations: Pre Univ, Kota Samarahan, Sarawak, Malaysia; Univ Teknol Malaysia, Fac Comp, Skudai 81310, Johor, Malaysia; Univ Teknol Malaysia, Ibnu Sina Inst Sci & Ind Res, UTM Big Data Ctr, Skudai 81310, Johor, Malaysia; Univ Malaysia Sarawak, Fac Comp Sci & Informat Technol, Kota Samarahan, Sarawak, Malaysia

Companies constantly rely on customer support to deliver pre- and post-sale services to their clients through websites, mobile devices or social media platforms such as Twitter. In assisting customers, companies employ virtual service agents (chatbots) to provide support via communication devices. The primary focus is to automate the generation of conversational chat between a computer and a human by constructing virtual service agents that can predict appropriate and automatic responses to customers' queries. This paper aims to present and implement a seq2seq-based learning task model based on encoder-decoder architectural solutions by training generative chatbots on customer support Twitter datasets. The model is based on deep Recurrent Neural Networks (RNNs) structures which are uni-directional and bi-directional encoder types of Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). The RNNs are augmented with an attention layer to focus on important information between input and output sequences. Word level embedding such as Word2Vec, GloVe, and FastText are employed as input to the model. Incorporating the base architecture, a comparative analysis is applied where baseline models are compared with and without the use of attention as well as different types of input embedding for each experiment. Bilingual Evaluation Understudy (BLEU) was employed to evaluate the model's performance. Results revealed that while biLSTM performs better with GloVe, biGRU operates better with FastText. Thus, the finding significantly indicated that the attention-based, bi-directional RNNs (LSTM or GRU) model significantly outperformed baseline approaches in their BLEU score as a promising use in future works.

Keywords: Sequence-to-sequence; encoder-decoder; service chatbot; attention-based encoder-decoder; Recurrent Neural Network (RNN); Long Short-Term Memory (LSTM); Gated Recurrent Unit (GRU); word embedding
