ISBN (print): 9781728188089
Recently, a novel radical analysis network (RAN) has shown the capability of effectively recognizing unseen Chinese character classes and largely reducing the requirement for training data by treating a Chinese character as a hierarchical composition of radicals rather than a single character class. However, when dealing with more challenging issues, such as the recognition of complicated characters, low-frequency character categories, and characters in natural scenes, RAN still has considerable room for improvement. In this paper, we explore options to further improve the structure generalization and robustness of RAN with the Transformer architecture, which has achieved state-of-the-art results on many sequence-to-sequence tasks. More specifically, we propose to replace the original attention module in RAN with the Transformer decoder, yielding a transformer-based radical analysis network (RTN). The experimental results show that the proposed approach significantly outperforms RAN on both a printed Chinese character database and a natural-scene Chinese character database. Meanwhile, further analysis shows that RTN generalizes better to complex samples and low-frequency characters, and is more robust in recognizing Chinese characters with different attributes.
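The core idea behind RAN and RTN is that the decoder emits a character not as one class label but as a sequence describing a tree of radicals. A minimal sketch of such a tree-to-sequence serialization, where the spatial-operator names and bracket notation are illustrative rather than the papers' actual caption vocabulary:

```python
# Hypothetical sketch: a Chinese character as a hierarchical composition of
# radicals, serialized into the caption sequence a decoder would emit.
# Operator names and the brace notation are illustrative assumptions.

def serialize(node):
    """Flatten a radical tree into a flat token sequence."""
    if isinstance(node, str):      # leaf: a single radical
        return [node]
    op, children = node            # internal node: a spatial operator
    out = [op, "{"]
    for child in children:
        out += serialize(child)
    out.append("}")
    return out

# e.g. 好 as a left-right composition of the radicals 女 and 子
tree = ("left-right", ["女", "子"])
print(serialize(tree))  # ['left-right', '{', '女', '子', '}']
```

Because unseen characters reuse seen radicals and operators, a decoder trained on such sequences can compose captions for character classes absent from the training set.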
ISBN (print): 9781665437530
Accurate short-term traffic volume forecasting has become an increasingly important component of traffic management in intelligent transportation systems (ITS). A significant amount of related work on short-term traffic forecasting has been proposed based on traditional learning approaches, and deep learning-based approaches have also made significant strides in recent years. In this paper, we explore several deep learning models based on long short-term memory (LSTM) networks that automatically extract inherent features of traffic volume data for forecasting. A simple LSTM model, an LSTM encoder-decoder model, a CNN-LSTM model, and a Conv-LSTM model were designed and evaluated on a real-world traffic volume dataset over multiple prediction horizons. In the experimental analysis, the Conv-LSTM model produced the best performance, with a MAPE of 9.03% for a prediction horizon of 15 minutes. The paper also discusses the behavior of the models under the traffic volume anomalies caused by the Covid-19 pandemic.
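The headline result above is stated in MAPE, the mean absolute percentage error. A minimal sketch of this standard metric, with illustrative toy traffic volumes:

```python
# Mean absolute percentage error (MAPE), the metric the forecasting
# results above are reported in. Input values are illustrative only.

def mape(actual, predicted):
    """MAPE in percent; assumes no actual value is zero."""
    assert len(actual) == len(predicted)
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)

# toy traffic volumes (vehicles per 15-minute interval)
print(round(mape([100, 200, 400], [110, 190, 400]), 2))  # 5.0
```

Note that MAPE weights errors relative to the true volume, so low-traffic intervals (e.g. pandemic-era anomalies) can dominate the score even when absolute errors are small.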
ISBN (print): 9783030835262; 9783030835279
Named entities are heavily used in the field of spoken language understanding, which uses speech as input. The standard way of doing named entity recognition from speech involves a pipeline of two systems: first the automatic speech recognition system generates the transcripts, and then the named entity recognition system produces the named entity tags from the transcripts. In such cases, the automatic speech recognition and named entity recognition systems are trained independently, so the automatic speech recognition branch is not optimized for named entity recognition and vice versa. In this paper, we propose two attention-based approaches for extracting named entities from speech in an end-to-end manner that show promising results. We compare both attention-based approaches on Finnish, Swedish, and English data sets, underlining their strengths and weaknesses.
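Both end-to-end approaches rest on an attention mechanism: the decoder scores each encoder frame against its current state and forms a weighted context vector. A toy sketch of dot-product attention (dimensions and values are illustrative, not the paper's configuration):

```python
import math

# Toy dot-product attention over encoder frames. All vectors and their
# dimensionality are illustrative assumptions.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, encoder_states):
    """Score each encoder frame against the query, then mix the frames."""
    scores = [sum(q * k for q, k in zip(query, h)) for h in encoder_states]
    weights = softmax(scores)
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_states))
        for d in range(len(query))
    ]
    return weights, context

# frames 0 and 2 match the query equally well; frame 1 does not
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In the end-to-end setting, the same attention that drives transcription can learn to focus on the frames that carry entity-relevant cues, which is what removes the need for a separate tagging system.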
Accurate early detection of internal short circuits (ISCs) is indispensable for the safe and reliable application of lithium-ion batteries (LiBs). However, the major challenge is finding a reliable standard to judge whether a battery suffers from an ISC. In this work, a deep learning approach with multi-head attention and a multi-scale hierarchical learning mechanism based on an encoder-decoder architecture is developed to accurately forecast voltage and power series. Using the predicted ISC-free voltage as the standard and checking the consistency between the collected and predicted voltage series, we develop a method to detect ISCs quickly and accurately. In this way, we achieve an average percentage accuracy of 86% on the dataset, which includes different batteries and equivalent ISC resistances from 1,000 Ω to 10 Ω, indicating successful application of the ISC detection method.
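The detection logic described above treats the model's ISC-free forecast as a reference and flags a short when the measured voltage drifts away from it. A hedged sketch of one such consistency check; the threshold, window length, and voltage values are illustrative assumptions, not the paper's parameters:

```python
# Hypothetical consistency check between measured and predicted voltage.
# An ISC is flagged when `window` consecutive residuals exceed `threshold`
# volts; both parameters are illustrative, not from the paper.

def detect_isc(measured, predicted, threshold=0.05, window=3):
    """Return True if a sustained measured-vs-predicted gap is found."""
    run = 0
    for m, p in zip(measured, predicted):
        run = run + 1 if abs(m - p) > threshold else 0
        if run >= window:
            return True
    return False

forecast = [4.10, 4.09, 4.08, 4.07]               # ISC-free prediction
healthy  = [4.10, 4.09, 4.08, 4.07]
shorted  = [4.10, 4.02, 4.00, 3.98]               # drifting below forecast
print(detect_isc(healthy, forecast), detect_isc(shorted, forecast))  # False True
```

Requiring several consecutive out-of-band samples, rather than a single one, is one simple way to avoid triggering on sensor noise.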
In order to solve the problems of artifacts and noise in low-dose computed tomography (CT) images in clinical medical diagnosis, an improved image denoising algorithm under the architecture of a generative adversarial network (GAN) was proposed. First, a noise model based on StyleGAN2 was constructed to estimate the real noise distribution, and noise information similar to the real noise distribution was generated as the experimental noise data set. Second, a network model with an encoder-decoder architecture at its core, based on the GAN idea, was constructed, and the network model was trained with the generated noise data set until it reached its optimal state. Finally, the noise and artifacts in low-dose CT images could be removed by feeding the images into the denoising model. The experimental results showed that the constructed GAN-based network model improved the utilization of noise feature information and the stability of network training, removed image noise and artifacts, and reconstructed images with rich texture and a realistic visual effect.
Longitudinal properties of electron bunches are critical for the performance of a wide range of scientific facilities. In a free-electron laser, for example, the existing diagnostics provide only very limited longitudinal information about the electron bunch during online tuning and optimization. We leverage the power of artificial intelligence to build a neural network model from experimental data, in order to bring the destructive longitudinal phase space (LPS) diagnostics online virtually and to improve the existing online current-profile diagnostic, which uses a coherent transition radiation (CTR) spectrometer. The model can also serve as a digital twin of the real machine, on which algorithms can be tested efficiently and effectively. We demonstrate at the FLASH facility that an encoder-decoder model with more than one decoder can make highly accurate concurrent predictions of megapixel LPS images and coherent transition radiation spectra for electron bunches in a bunch train with broad ranges of LPS shapes and peak currents, obtained by scanning all the major control knobs for LPS manipulation. Furthermore, we propose a way to significantly improve the CTR spectrometer online measurement by combining the predicted and measured spectra. Our work showcases how to combine virtual and real diagnostics to provide heterogeneous and reliable mixed diagnostics for scientific facilities.
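The final idea above, fusing the model-predicted spectrum with the measured one, can be illustrated with a per-bin weighted average. Inverse-variance weighting is one plausible combination rule used here for illustration; the paper's actual scheme may differ, and all values are toy data:

```python
# Hypothetical per-bin fusion of a measured and a predicted CTR spectrum
# via inverse-variance weighting. The weighting rule and all numbers are
# illustrative assumptions, not the paper's method.

def fuse_spectra(measured, predicted, var_meas, var_pred):
    """Weight each bin toward whichever source is less uncertain."""
    fused = []
    for m, p, vm, vp in zip(measured, predicted, var_meas, var_pred):
        w = vp / (vm + vp)             # weight placed on the measurement
        fused.append(w * m + (1 - w) * p)
    return fused

print(fuse_spectra([1.0, 2.0], [0.0, 2.0], [1.0, 1.0], [1.0, 3.0]))
# [0.5, 2.0]
```

The appeal of such a fusion is that the virtual diagnostic fills in spectral regions where the physical spectrometer is noisy, while the measurement anchors the model where it is reliable.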
ISBN (print): 9781450386517
Effective contexts for separating shadows from non-shadow objects can appear at different scales due to different object sizes. This paper introduces a new module, Effective-Context Augmentation (ECA), to utilize these contexts for robust shadow detection with deep structures. Taking regular deep features as global references, ECA enhances the discriminative features from the parallelly computed fine-scale features and therefore obtains robust features embedded with effective object contexts by boosting them. We further propose a novel encoder-decoder style of shadow detection method in which ECA acts as the main building block of the encoder to extract strong feature representations and provides guidance to the classification process of the decoder. Moreover, the networks are optimized with only one loss, which makes them easy to train and avoids the instability caused by the extra losses superimposed on intermediate features in existing popular studies. Experimental results show that the proposed method can effectively eliminate fake shadows. Moreover, our method outperforms state-of-the-art methods, improving balance error rate by over 13.97% and 34.67% on the challenging SBU and UCF datasets, respectively.
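The results above are reported in balance error rate (BER), the standard shadow-detection metric that averages the error rates of the shadow and non-shadow pixel classes so that the much larger non-shadow class cannot dominate. A minimal sketch with illustrative pixel counts:

```python
# Balance error rate (BER), the metric the shadow detection results above
# are reported in. The confusion-matrix counts below are illustrative.

def balance_error_rate(tp, tn, fp, fn):
    """BER in percent: mean of shadow and non-shadow error rates."""
    shadow_err = fn / (tp + fn)        # fraction of shadow pixels missed
    nonshadow_err = fp / (tn + fp)     # fraction of non-shadow pixels flagged
    return 100.0 * 0.5 * (shadow_err + nonshadow_err)

# toy counts: 100 shadow pixels (10 missed), 100 non-shadow (5 flagged)
print(balance_error_rate(tp=90, tn=95, fp=5, fn=10))  # 7.5
```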
ISBN (print): 9781713836902
In our previous work we demonstrated that a single-headed attention encoder-decoder model is able to reach state-of-the-art results in conversational speech recognition. In this paper, we further improve the results on both Switchboard 300 and 2000. Through the use of an improved optimizer, speaker vector embeddings, and alternative speech representations, we reduce the recognition errors of our LSTM system on Switchboard-300 by 4% relative. Compensation of the decoder model with the probability ratio approach allows more efficient integration of an external language model, and we report 5.9% and 11.5% WER on the SWB and CHM parts of Hub5'00 with very simple LSTM models. Our study also considers the recently proposed conformer and more advanced self-attention based language models. Overall, the conformer shows performance similar to the LSTM; nevertheless, their combination and decoding with an improved LM reach a new record on Switchboard-300: 5.0% and 10.0% WER on SWB and CHM. Our findings are also confirmed on Switchboard-2000, where a new state of the art is reported, practically reaching the limit of the benchmark.
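The probability-ratio idea mentioned above can be sketched as a hypothesis-rescoring rule: add the external LM score but subtract an estimate of the decoder's own internal LM, so the external LM is not double-counted. The weights, scores, and hypotheses below are illustrative assumptions, not the paper's values:

```python
# Hedged sketch of probability-ratio LM integration during rescoring.
# All log-probabilities and interpolation weights are illustrative.

def rescore(log_p_am, log_p_ext_lm, log_p_int_lm, lam=0.5, mu=0.3):
    """Hypothesis score: acoustic/decoder score plus external LM,
    minus an estimate of the decoder's implicit internal LM."""
    return log_p_am + lam * log_p_ext_lm - mu * log_p_int_lm

# two competing hypotheses for the same audio (toy numbers)
hyps = {
    "i see": rescore(-2.0, log_p_ext_lm=-1.0, log_p_int_lm=-0.5),
    "icy":   rescore(-2.1, log_p_ext_lm=-3.0, log_p_int_lm=-0.4),
}
best = max(hyps, key=hyps.get)
print(best)  # i see
```

Subtracting the internal LM term is what distinguishes this from plain shallow fusion, where only the external LM score is added.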
ISBN (print): 9781713836902
This paper introduces two Transformer-based architectures for Mispronunciation Detection and Diagnosis (MDD). The first Transformer architecture (T-1) is a standard setup with an encoder, a decoder, a projection part, and the Cross Entropy (CE) loss. T-1 takes Mel-Frequency Cepstral Coefficients (MFCC) as input. The second architecture (T-2) is based on wav2vec 2.0, a pretraining framework. T-2 is composed of a CNN feature encoder, several Transformer blocks capturing contextual speech representations, a projection part, and the Connectionist Temporal Classification (CTC) loss. Unlike T-1, T-2 takes raw audio data as input. Both models are trained in an end-to-end manner. Experiments are conducted on the CU-CHLOE corpus, where T-1 achieves a Phone Error Rate (PER) of 8.69% and an F-measure of 77.23%, and T-2 achieves a PER of 5.97% and an F-measure of 80.98%. Both models significantly outperform the previously proposed AGPM and CNN-RNN-CTC models, whose PERs are 11.1% and 12.1%, and whose F-measures are 72.61% and 74.65%, respectively.
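The PER figures above follow the standard definition: the Levenshtein edit distance between the recognized and reference phone sequences, normalized by the reference length. A minimal sketch with illustrative phone labels:

```python
# Phone error rate (PER) via dynamic-programming edit distance, the metric
# the MDD results above are reported in. Phone labels are illustrative.

def phone_error_rate(ref, hyp):
    """PER in percent: edit distance / reference length * 100."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

# "cat" /k ae t/ recognized as /k ah t/: one substitution out of three phones
print(round(phone_error_rate(["k", "ae", "t"], ["k", "ah", "t"]), 2))  # 33.33
```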
ISBN (print): 9781713836902
Reducing prediction delay for streaming end-to-end ASR models with minimal performance regression is a challenging problem. Constrained alignment is a well-known existing approach that penalizes predicted word boundaries using external low-latency acoustic models. In contrast, the recently proposed FastEmit is a sequence-level delay regularization scheme that encourages vocabulary tokens over blanks without any reference alignments. Although all these schemes succeed in reducing delay, ASR word error rate (WER) often degrades severely after they are applied. In this paper, we propose a novel delay constraining method named self alignment. Self alignment does not require external alignment models. Instead, it utilizes Viterbi forced alignments from the trained model itself to find the lower-latency alignment direction. On the LibriSpeech evaluation, self alignment outperformed the existing schemes: 25% and 56% less delay compared to FastEmit and constrained alignment, respectively, at a similar word error rate. On the Voice Search evaluation, 12% and 25% delay reductions were achieved compared to FastEmit and constrained alignment, with more than 2% WER improvement.
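The delay comparisons above rest on measuring how late a streaming model emits each word relative to a reference alignment. A hedged sketch of that measurement; the frame indices are toy values and the exact delay definition used in the paper may differ:

```python
# Hypothetical emission-delay measurement for streaming ASR: the average
# gap between when the model emits each word and when the reference
# (e.g. forced) alignment says the word ended. Frame indices are toy data.

def mean_emission_delay(emit_times, ref_end_times):
    """Average per-word delay, in frames, of emissions vs. reference ends."""
    assert len(emit_times) == len(ref_end_times)
    return sum(e - r for e, r in zip(emit_times, ref_end_times)) / len(emit_times)

baseline    = mean_emission_delay([12, 25, 41], [10, 20, 35])  # ≈ 4.33 frames
low_latency = mean_emission_delay([11, 22, 37], [10, 20, 35])  # ≈ 1.67 frames
print(baseline > low_latency)  # True
```

A delay-regularized model such as one trained with FastEmit or self alignment should shift its emission times leftward, shrinking this average while (ideally) leaving WER unchanged.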