ISBN (print): 9781728188089
Recently, a novel radical analysis network (RAN) has shown the capability of effectively recognizing unseen Chinese character classes and largely reducing the requirement for training data by treating a Chinese character as a hierarchical composition of radicals rather than a single character class. However, when dealing with more challenging issues, such as the recognition of complicated characters, low-frequency character categories, and characters in natural scenes, RAN still has considerable room for improvement. In this paper, we explore options to further improve the structure generalization and robustness of RAN with the Transformer architecture, which has achieved state-of-the-art results on many sequence-to-sequence tasks. More specifically, we propose to replace the original attention module in RAN with the Transformer decoder, yielding a transformer-based radical analysis network (RTN). The experimental results show that the proposed approach significantly outperforms RAN on both a printed Chinese character database and a natural-scene Chinese character database. Meanwhile, further analysis shows that RTN generalizes better to complex samples and low-frequency characters, and is more robust in recognizing Chinese characters with different attributes.
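The core idea behind RAN and RTN is that the decoder emits a character not as one class label but as a sequence describing a tree of radicals. A minimal sketch of such a tree-to-sequence serialization, where the spatial-operator names and bracket notation are illustrative rather than the papers' actual caption vocabulary:

```python
# Hypothetical sketch: a Chinese character as a hierarchical composition of
# radicals, serialized into the caption sequence a decoder would emit.
# Operator names and the brace notation are illustrative assumptions.

def serialize(node):
    """Flatten a radical tree into a flat token sequence."""
    if isinstance(node, str):      # leaf: a single radical
        return [node]
    op, children = node            # internal node: a spatial operator
    out = [op, "{"]
    for child in children:
        out += serialize(child)
    out.append("}")
    return out

# e.g. 好 as a left-right composition of the radicals 女 and 子
tree = ("left-right", ["女", "子"])
print(serialize(tree))  # ['left-right', '{', '女', '子', '}']
```

Because unseen characters reuse seen radicals and operators, a decoder trained on such sequences can compose captions for character classes absent from the training set.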
ISBN (print): 9781665437530
Accurate short-term traffic volume forecasting has become an increasingly important component of traffic management in intelligent transportation systems (ITS). A significant amount of related work on short-term traffic forecasting has been proposed based on traditional learning approaches, and deep learning-based approaches have also made significant strides in recent years. In this paper, we explore several deep learning models based on long short-term memory (LSTM) networks that automatically extract inherent features of traffic volume data for forecasting. A simple LSTM model, an LSTM encoder-decoder model, a CNN-LSTM model, and a Conv-LSTM model were designed and evaluated on a real-world traffic volume dataset over multiple prediction horizons. In the experimental analysis, the Conv-LSTM model produced the best performance, with a MAPE of 9.03% for a prediction horizon of 15 minutes. The paper also discusses the behavior of the models under the traffic volume anomalies caused by the Covid-19 pandemic.
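The headline result above is stated in MAPE, the mean absolute percentage error. A minimal sketch of this standard metric, with illustrative toy traffic volumes:

```python
# Mean absolute percentage error (MAPE), the metric the forecasting
# results above are reported in. Input values are illustrative only.

def mape(actual, predicted):
    """MAPE in percent; assumes no actual value is zero."""
    assert len(actual) == len(predicted)
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)

# toy traffic volumes (vehicles per 15-minute interval)
print(round(mape([100, 200, 400], [110, 190, 400]), 2))  # 5.0
```

Note that MAPE weights errors relative to the true volume, so low-traffic intervals (e.g. pandemic-era anomalies) can dominate the score even when absolute errors are small.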
ISBN (print): 9783030835262; 9783030835279
Named entities are heavily used in the field of spoken language understanding, which uses speech as input. The standard way of doing named entity recognition from speech involves a pipeline of two systems: first the automatic speech recognition system generates the transcripts, and then the named entity recognition system produces the named entity tags from the transcripts. In such cases, the automatic speech recognition and named entity recognition systems are trained independently, so the automatic speech recognition branch is not optimized for named entity recognition and vice versa. In this paper, we propose two attention-based approaches for extracting named entities from speech in an end-to-end manner that show promising results. We compare both attention-based approaches on Finnish, Swedish, and English data sets, underlining their strengths and weaknesses.
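Both end-to-end approaches rest on an attention mechanism: the decoder scores each encoder frame against its current state and forms a weighted context vector. A toy sketch of dot-product attention (dimensions and values are illustrative, not the paper's configuration):

```python
import math

# Toy dot-product attention over encoder frames. All vectors and their
# dimensionality are illustrative assumptions.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, encoder_states):
    """Score each encoder frame against the query, then mix the frames."""
    scores = [sum(q * k for q, k in zip(query, h)) for h in encoder_states]
    weights = softmax(scores)
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_states))
        for d in range(len(query))
    ]
    return weights, context

# frames 0 and 2 match the query equally well; frame 1 does not
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In the end-to-end setting, the same attention that drives transcription can learn to focus on the frames that carry entity-relevant cues, which is what removes the need for a separate tagging system.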
Accurate early detection of internal short circuits (ISCs) is indispensable for the safe and reliable application of lithium-ion batteries (LiBs). However, the major challenge is finding a reliable standard to judge whether a battery suffers from an ISC. In this work, a deep learning approach with multi-head attention and a multi-scale hierarchical learning mechanism based on an encoder-decoder architecture is developed to accurately forecast voltage and power series. Using the predicted ISC-free voltage as the standard and checking the consistency between the collected and predicted voltage series, we develop a method to detect ISCs quickly and accurately. In this way, we achieve an average percentage accuracy of 86% on the dataset, which includes different batteries and equivalent ISC resistances from 1,000 Ω to 10 Ω, indicating successful application of the ISC detection method.
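The detection logic described above treats the model's ISC-free forecast as a reference and flags a short when the measured voltage drifts away from it. A hedged sketch of one such consistency check; the threshold, window length, and voltage values are illustrative assumptions, not the paper's parameters:

```python
# Hypothetical consistency check between measured and predicted voltage.
# An ISC is flagged when `window` consecutive residuals exceed `threshold`
# volts; both parameters are illustrative, not from the paper.

def detect_isc(measured, predicted, threshold=0.05, window=3):
    """Return True if a sustained measured-vs-predicted gap is found."""
    run = 0
    for m, p in zip(measured, predicted):
        run = run + 1 if abs(m - p) > threshold else 0
        if run >= window:
            return True
    return False

forecast = [4.10, 4.09, 4.08, 4.07]               # ISC-free prediction
healthy  = [4.10, 4.09, 4.08, 4.07]
shorted  = [4.10, 4.02, 4.00, 3.98]               # drifting below forecast
print(detect_isc(healthy, forecast), detect_isc(shorted, forecast))  # False True
```

Requiring several consecutive out-of-band samples, rather than a single one, is one simple way to avoid triggering on sensor noise.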
In order to solve the problems of artifacts and noise in low-dose computed tomography (CT) images in clinical medical diagnosis, an improved image denoising algorithm under the architecture of a generative adversarial network (GAN) was proposed. First, a noise model based on StyleGAN2 was constructed to estimate the real noise distribution, and noise information similar to the real noise distribution was generated as the experimental noise data set. Second, a network model with an encoder-decoder architecture at its core, based on the GAN idea, was constructed, and the network model was trained with the generated noise data set until it reached its optimal state. Finally, the noise and artifacts in low-dose CT images could be removed by feeding the images into the denoising model. The experimental results showed that the constructed GAN-based network model improved the utilization of noise feature information and the stability of network training, removed image noise and artifacts, and reconstructed images with rich texture and a realistic visual effect.
Longitudinal properties of electron bunches are critical for the performance of a wide range of scientific facilities. In a free-electron laser, for example, the existing diagnostics provide only very limited longitudinal information about the electron bunch during online tuning and optimization. We leverage the power of artificial intelligence to build a neural network model from experimental data, in order to bring the destructive longitudinal phase space (LPS) diagnostics online virtually and to improve the existing online current-profile diagnostic, which uses a coherent transition radiation (CTR) spectrometer. The model can also serve as a digital twin of the real machine, on which algorithms can be tested efficiently and effectively. We demonstrate at the FLASH facility that an encoder-decoder model with more than one decoder can make highly accurate concurrent predictions of megapixel LPS images and coherent transition radiation spectra for electron bunches in a bunch train with broad ranges of LPS shapes and peak currents, obtained by scanning all the major control knobs for LPS manipulation. Furthermore, we propose a way to significantly improve the CTR spectrometer online measurement by combining the predicted and measured spectra. Our work showcases how to combine virtual and real diagnostics to provide heterogeneous and reliable mixed diagnostics for scientific facilities.
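The final idea above, fusing the model-predicted spectrum with the measured one, can be illustrated with a per-bin weighted average. Inverse-variance weighting is one plausible combination rule used here for illustration; the paper's actual scheme may differ, and all values are toy data:

```python
# Hypothetical per-bin fusion of a measured and a predicted CTR spectrum
# via inverse-variance weighting. The weighting rule and all numbers are
# illustrative assumptions, not the paper's method.

def fuse_spectra(measured, predicted, var_meas, var_pred):
    """Weight each bin toward whichever source is less uncertain."""
    fused = []
    for m, p, vm, vp in zip(measured, predicted, var_meas, var_pred):
        w = vp / (vm + vp)             # weight placed on the measurement
        fused.append(w * m + (1 - w) * p)
    return fused

print(fuse_spectra([1.0, 2.0], [0.0, 2.0], [1.0, 1.0], [1.0, 3.0]))
# [0.5, 2.0]
```

The appeal of such a fusion is that the virtual diagnostic fills in spectral regions where the physical spectrometer is noisy, while the measurement anchors the model where it is reliable.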
ISBN (print): 9781450386517
Effective contexts for separating shadows from non-shadow objects can appear at different scales due to different object sizes. This paper introduces a new module, Effective-Context Augmentation (ECA), to utilize these contexts for robust shadow detection with deep structures. Taking regular deep features as global references, ECA enhances the discriminative features from the parallelly computed fine-scale features and therefore obtains robust features embedded with effective object contexts by boosting them. We further propose a novel encoder-decoder style of shadow detection method in which ECA acts as the main building block of the encoder to extract strong feature representations and provides guidance to the classification process of the decoder. Moreover, the networks are optimized with only one loss, which makes them easy to train and avoids the instability caused by the extra losses superimposed on intermediate features in existing popular studies. Experimental results show that the proposed method can effectively eliminate fake shadows. Moreover, our method outperforms state-of-the-art methods, improving balance error rate by over 13.97% and 34.67% on the challenging SBU and UCF datasets, respectively.
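The results above are reported in balance error rate (BER), the standard shadow-detection metric that averages the error rates of the shadow and non-shadow pixel classes so that the much larger non-shadow class cannot dominate. A minimal sketch with illustrative pixel counts:

```python
# Balance error rate (BER), the metric the shadow detection results above
# are reported in. The confusion-matrix counts below are illustrative.

def balance_error_rate(tp, tn, fp, fn):
    """BER in percent: mean of shadow and non-shadow error rates."""
    shadow_err = fn / (tp + fn)        # fraction of shadow pixels missed
    nonshadow_err = fp / (tn + fp)     # fraction of non-shadow pixels flagged
    return 100.0 * 0.5 * (shadow_err + nonshadow_err)

# toy counts: 100 shadow pixels (10 missed), 100 non-shadow (5 flagged)
print(balance_error_rate(tp=90, tn=95, fp=5, fn=10))  # 7.5
```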
ISBN (print): 9781713836902
In our previous work we demonstrated that a single-headed attention encoder-decoder model is able to reach state-of-the-art results in conversational speech recognition. In this paper, we further improve the results on both Switchboard 300 and 2000. Through the use of an improved optimizer, speaker vector embeddings, and alternative speech representations, we reduce the recognition errors of our LSTM system on Switchboard-300 by 4% relative. Compensation of the decoder model with the probability ratio approach allows more efficient integration of an external language model, and we report 5.9% and 11.5% WER on the SWB and CHM parts of Hub5'00 with very simple LSTM models. Our study also considers the recently proposed conformer and more advanced self-attention based language models. Overall, the conformer shows performance similar to the LSTM; nevertheless, their combination and decoding with an improved LM reach a new record on Switchboard-300: 5.0% and 10.0% WER on SWB and CHM. Our findings are also confirmed on Switchboard-2000, where a new state of the art is reported, practically reaching the limit of the benchmark.
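The probability-ratio idea mentioned above can be sketched as a hypothesis-rescoring rule: add the external LM score but subtract an estimate of the decoder's own internal LM, so the external LM is not double-counted. The weights, scores, and hypotheses below are illustrative assumptions, not the paper's values:

```python
# Hedged sketch of probability-ratio LM integration during rescoring.
# All log-probabilities and interpolation weights are illustrative.

def rescore(log_p_am, log_p_ext_lm, log_p_int_lm, lam=0.5, mu=0.3):
    """Hypothesis score: acoustic/decoder score plus external LM,
    minus an estimate of the decoder's implicit internal LM."""
    return log_p_am + lam * log_p_ext_lm - mu * log_p_int_lm

# two competing hypotheses for the same audio (toy numbers)
hyps = {
    "i see": rescore(-2.0, log_p_ext_lm=-1.0, log_p_int_lm=-0.5),
    "icy":   rescore(-2.1, log_p_ext_lm=-3.0, log_p_int_lm=-0.4),
}
best = max(hyps, key=hyps.get)
print(best)  # i see
```

Subtracting the internal LM term is what distinguishes this from plain shallow fusion, where only the external LM score is added.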
ISBN (print): 9781713836902
This paper introduces two Transformer-based architectures for Mispronunciation Detection and Diagnosis (MDD). The first Transformer architecture (T-1) is a standard setup with an encoder, a decoder, a projection part, and the Cross Entropy (CE) loss. T-1 takes Mel-Frequency Cepstral Coefficients (MFCC) as input. The second architecture (T-2) is based on wav2vec 2.0, a pretraining framework. T-2 is composed of a CNN feature encoder, several Transformer blocks capturing contextual speech representations, a projection part, and the Connectionist Temporal Classification (CTC) loss. Unlike T-1, T-2 takes raw audio data as input. Both models are trained in an end-to-end manner. Experiments are conducted on the CU-CHLOE corpus, where T-1 achieves a Phone Error Rate (PER) of 8.69% and an F-measure of 77.23%, and T-2 achieves a PER of 5.97% and an F-measure of 80.98%. Both models significantly outperform the previously proposed AGPM and CNN-RNN-CTC models, whose PERs are 11.1% and 12.1%, and whose F-measures are 72.61% and 74.65%, respectively.
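The PER figures above follow the standard definition: the Levenshtein edit distance between the recognized and reference phone sequences, normalized by the reference length. A minimal sketch with illustrative phone labels:

```python
# Phone error rate (PER) via dynamic-programming edit distance, the metric
# the MDD results above are reported in. Phone labels are illustrative.

def phone_error_rate(ref, hyp):
    """PER in percent: edit distance / reference length * 100."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

# "cat" /k ae t/ recognized as /k ah t/: one substitution out of three phones
print(round(phone_error_rate(["k", "ae", "t"], ["k", "ah", "t"]), 2))  # 33.33
```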
ISBN (print): 9781713836902
Reducing prediction delay for streaming end-to-end ASR models with minimal performance regression is a challenging problem. Constrained alignment is a well-known existing approach that penalizes predicted word boundaries using external low-latency acoustic models. In contrast, the recently proposed FastEmit is a sequence-level delay regularization scheme that encourages vocabulary tokens over blanks without any reference alignments. Although all these schemes succeed in reducing delay, ASR word error rate (WER) often degrades severely after they are applied. In this paper, we propose a novel delay constraining method named self alignment. Self alignment does not require external alignment models. Instead, it utilizes Viterbi forced alignments from the trained model itself to find the lower-latency alignment direction. On the LibriSpeech evaluation, self alignment outperformed the existing schemes: 25% and 56% less delay compared to FastEmit and constrained alignment, respectively, at a similar word error rate. On the Voice Search evaluation, 12% and 25% delay reductions were achieved compared to FastEmit and constrained alignment, with more than 2% WER improvement.
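The delay comparisons above rest on measuring how late a streaming model emits each word relative to a reference alignment. A hedged sketch of that measurement; the frame indices are toy values and the exact delay definition used in the paper may differ:

```python
# Hypothetical emission-delay measurement for streaming ASR: the average
# gap between when the model emits each word and when the reference
# (e.g. forced) alignment says the word ended. Frame indices are toy data.

def mean_emission_delay(emit_times, ref_end_times):
    """Average per-word delay, in frames, of emissions vs. reference ends."""
    assert len(emit_times) == len(ref_end_times)
    return sum(e - r for e, r in zip(emit_times, ref_end_times)) / len(emit_times)

baseline    = mean_emission_delay([12, 25, 41], [10, 20, 35])  # ≈ 4.33 frames
low_latency = mean_emission_delay([11, 22, 37], [10, 20, 35])  # ≈ 1.67 frames
print(baseline > low_latency)  # True
```

A delay-regularized model such as one trained with FastEmit or self alignment should shift its emission times leftward, shrinking this average while (ideally) leaving WER unchanged.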