This paper proposes an encoder-decoder based sequence-to-sequence model for Grapheme-to-Phoneme (G2P) conversion in Bangla (exonym: Bengali). G2P models are key components in speech recognition and speech synthesis systems because they describe how words are pronounced. Traditional rule-based models do not perform well in unseen contexts. We propose to adopt a neural machine translation (NMT) model to solve the G2P problem. We used a gated recurrent unit (GRU) recurrent neural network (RNN) to build our model. In contrast to joint-sequence based G2P models, our encoder-decoder based model does not require explicit grapheme-to-phoneme alignments, which are not straightforward to produce. We trained our model on a pronunciation dictionary of approximately 135,000 entries and obtained a word error rate (WER) of 12.49%, a significant improvement over the existing rule-based and machine-learning based Bangla G2P models.
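The abstract above reports a word error rate but does not define it. A minimal sketch, assuming the usual G2P convention that a word counts as an error when its predicted phoneme sequence differs from the reference in any position; the function name and the phoneme sequences are hypothetical and are not taken from the paper's dictionary:

```python
def word_error_rate(predicted, reference):
    """Fraction of words whose predicted phoneme sequence differs from the reference.

    Assumption: one reference pronunciation per word, exact match required.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one word is required")
    wrong = sum(1 for p, r in zip(predicted, reference) if list(p) != list(r))
    return wrong / len(reference)


# Hypothetical phoneme sequences, for illustration only.
reference = [["k", "a", "l"], ["b", "o", "i"], ["m", "a"], ["n", "o", "d", "i"]]
predicted = [["k", "a", "l"], ["b", "o", "e"], ["m", "a"], ["n", "o", "d", "i"]]

print(word_error_rate(predicted, reference))  # 0.25 (one of four words is wrong)
```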
State-of-charge (SOC) estimation of lithium-ion batteries based on deep learning techniques has been receiving considerable attention. However, most deep-learning-based methods focus on SOC estimation at fixed ambient temperatures and cannot provide useful indications for battery state in real-world scenarios because batteries usually experience varying temperatures during operation. In this study, an encoder-decoder with bidirectional long short-term memory (LSTM) is proposed for estimating the SOC at different temperature conditions. This end-to-end model can learn sequential information from the measurement sequences to characterize battery dynamics for sequence estimation. Introducing the bidirectional LSTMs into the encoder-decoder enables the model to capture the long-term dependencies of the measurement sequences from both past and future directions to increase the estimation accuracy. The proposed method is evaluated on public battery datasets under dynamic loading profiles. Validation with an experimental dataset shows that this method of considering the sequential contexts and bidirectional dependencies of battery measurement data can accurately estimate the SOC at different ambient temperatures. In particular, the mean absolute errors are as low as 1.07% at varying temperatures. The proposed method can improve the reliability and availability of battery management systems for monitoring the battery state under varying ambient conditions.
Cultivated land extraction is essential for sustainable development and *** this paper, the network we propose is based on the encoder-decoder structure; it is a semantic segmentation neural network that extracts cultivated land from satellite images and is used for agricultural automation *** encoder consists of two parts: the first is a modified Xception, which is used as the feature extraction network, and the second is atrous convolution, which is used to expand the receptive field and the context information to extract richer feature *** decoder part uses the conventional upsampling operation to restore the original *** addition, we use a combination of BCE and Lovász hinge as the loss function to optimize the Intersection over Union (IoU). Experimental results show that the proposed network structure can solve the problem of cultivated land extraction in Yinchuan City.
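The loss in the abstract above is chosen to optimize Intersection over Union. A minimal sketch of the IoU metric itself for binary masks, not of the paper's loss function; the function name, the empty-mask convention, and the toy masks are assumptions made for illustration:

```python
def iou(pred_mask, true_mask):
    """Intersection over Union for two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks agree perfectly, so that case is scored as 1.0.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks (1 = cultivated land, 0 = background), for illustration only.
pred = [[1, 1, 0],
        [0, 1, 0]]
true = [[1, 0, 0],
        [0, 1, 1]]

print(iou(pred, true))  # 0.5 (2 pixels in the intersection, 4 in the union)
```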
ISBN (print): 9781728119854
A novel encoder-decoder model based on deep neural networks is proposed for the prediction of remaining useful life (RUL) in this work. The proposed model consists of an encoder and a decoder. In the encoder, Bi-directional Long Short-Term Memory networks (Bi-LSTM) and Convolutional Neural Networks (CNN) are used to capture the long-term temporal dependencies and the important local features of the sequential data, respectively. In addition, a single 1×1 convolution filter in the last convolutional layer is used for dimensionality reduction. In the decoder, fully connected networks decode the feature information to predict the RUL. The proposed data-driven method achieves end-to-end prediction and does not need feature engineering. To evaluate the proposed model, experimental verification is carried out on the commonly used aero-engine C-MAPSS dataset. Comparison with other state-of-the-art approaches on the same dataset demonstrates the effectiveness and superiority of the proposed framework. For example, the scoring function value on the second subset is reduced by up to 64.99% compared with the best existing result.
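The abstract above does not spell out its scoring function. A minimal sketch, assuming it is the asymmetric exponential score commonly used with C-MAPSS (from the PHM08 challenge), which penalizes late predictions more heavily than early ones; the function name and the sample values are hypothetical:

```python
import math


def rul_score(predicted_rul, true_rul):
    """Asymmetric exponential score over engines; lower is better.

    Assumption: d = predicted - true, early predictions (d < 0) use the
    time constant 13 and late predictions (d >= 0) use 10.
    """
    total = 0.0
    for pred, true in zip(predicted_rul, true_rul):
        d = pred - true
        if d < 0:
            total += math.exp(-d / 13.0) - 1.0  # early prediction
        else:
            total += math.exp(d / 10.0) - 1.0   # late prediction
    return total


# Hypothetical values: the same 10-cycle error costs more when it is late.
early = rul_score([90], [100])
late = rul_score([110], [100])
print(early < late)  # True
```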
ISBN (print): 9783030283773; 9783030283766
Segmentation, the identification of regions of interest (ROIs) in medical images, is a very important step for image analysis in computer-aided diagnosis systems. Accurate segmentation of skin lesion images plays a vital role in the efficient diagnosis of melanoma skin cancer. Diagnosing melanoma through the segmentation of skin lesions is a challenging task because of the possible presence of noise and artefacts such as hairs and air or oil bubbles in the skin lesion images. Skin lesion images are also sometimes characterized by weak edges, irregular and fuzzy borders, marks, dark corners, skin lines and blood vessels on the lesions. Recently, segmentation methods based on the Fully Convolutional Encoder-Decoder Architecture (FCEDA) have achieved great success on medical images. This work presents an automatic skin lesion segmentation method based on FCEDA. Two types of FCEDA, namely the U-Net and SegNet architectures, have been examined and used for the segmentation of skin lesion images. The performance of the two architectures has been analysed, evaluated and compared. This work identifies and proposes possible improvements of these methods for the segmentation of skin lesions. It is also a systematic comparison of the U-Net and SegNet models on the segmentation of skin lesion images. The paper explores how deep learning methods can be used in a supervised approach to obtain accurate results with the least possible complexity. The models were evaluated on the skin lesion challenge dataset from the ISIC 2018 dermoscopic image archive.
ISBN (print): 9781728111988
Chinese couplets, as a part of traditional Chinese culture, are a treasure of Chinese civilization and an inheritance of Chinese history. Given a sentence (the antecedent clause), people reply with another sentence (the subsequent clause) of equal length. Because the semantic and grammatical rules of couplets are complex, it is not easy to create a suitable couplet that meets the requirements of sentence pattern, context, and tonal pattern. In this paper, given an antecedent clause, we automatically generate the subsequent clause with an encoder-decoder model. Moreover, to satisfy the special characteristics of couplets, we incorporate the attention mechanism into the encoding-decoding process, which greatly improves the accuracy of the automatically generated couplets.
ISBN (print): 9781538613115
Neural systems are complicated networks formed by a large number of neurons connected through gap junctions and synapses. At present, in electron microscopy connectomics research, neuron structure recognition algorithms mostly focus on synapses, dendrites, axons, mitochondria and similar structures. However, effective methods for the automatic recognition of neuronal cell bodies are rare. In this paper, we propose an effective encoder-decoder network that extracts segmentation features of neural cell bodies and cell nuclei with a modified residual network and a pyramid module. The framework is capable of merging multi-scale contextual information and generating efficient segmentation results by integrating multilevel features. We applied the proposed network to two segmentation tasks on electron microscope (EM) images and compared it with other promising methods such as U-Net and DeepLab v3+. The results demonstrate that our method achieves state-of-the-art performance on quality metrics. Finally, we visualized two intact neural cell bodies and their cell nuclei to provide a close look at these fine structures.
Neural networks have made significant achievements in the field of image restoration. To efficiently repair facial images with large damaged areas, an encoder-decoder structured convolutional neural network is used as the generative model, and skip connections are added between some of its layers to enhance the structure prediction ability of the generative model and to suppress the repair network's tendency to overfit. The global discriminator network mainly uses the image's edge structure and feature information to ensure that the repaired image output by the repair network is visually coherent, while the local discriminators not only check local consistency but also refine details. The network structure proposed in this paper combines the encoder-decoder, skip connections, and dual discriminator networks to improve face completion. Experimental results on CelebA show that the proposed method is superior to other methods in repairing images with large damaged areas.
ISBN (print): 9783030110185; 9783030110178
All existing image steganography methods use manually crafted features to hide binary payloads in cover images. This leads to small payload capacity and image distortion. Here we propose a convolutional neural network based encoder-decoder architecture for embedding images as the payload. To this end, we make the following three major contributions: (i) we propose a deep learning based generic encoder-decoder architecture for image steganography; (ii) we introduce a new loss function that ensures joint end-to-end training of the encoder-decoder networks; (iii) we perform an extensive empirical evaluation of the proposed architecture on a range of challenging publicly available datasets (MNIST, CIFAR10, PASCAL-VOC12, ImageNet, LFW) and report state-of-the-art payload capacity at high PSNR and SSIM values.
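PSNR, one of the two quality measures named in the abstract above, can be sketched from its standard definition, 10·log10(MAX² / MSE). The function name, the 8-bit peak value of 255, and the toy pixel values are assumptions for illustration and do not come from the paper:

```python
import math


def psnr(original, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Images are nested lists of pixel intensities; max_value is the assumed
    peak intensity (255 for 8-bit images).
    """
    flat_original = [v for row in original for v in row]
    flat_distorted = [v for row in distorted for v in row]
    if len(flat_original) != len(flat_distorted) or not flat_original:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_original, flat_distorted)) / len(flat_original)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 cover image and a slightly altered version, for illustration only.
cover = [[100, 150], [200, 250]]
stego = [[101, 149], [200, 252]]

print(round(psnr(cover, stego), 2))
```

A higher value means the image carrying the payload is closer to the cover image.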
ISBN (print): 9781728132488
Speech plays an important role in human-computer interaction. In many real applications, an annoying problem is that speech is often degraded by interfering noise. Extracting target speech from background interference is a meaningful and challenging task, especially when the interference is also a human voice. This work addresses the problem of extracting a target speaker from an interfering speaker using a short piece of anchor speech, which is used to obtain the target speaker's identity. We propose an encoder-decoder neural network architecture. Specifically, the encoder transforms the anchor speech into an embedding that represents the identity of the target speaker. The decoder uses the speaker identity to extract the target speech from the mixture. To make the speaker identity acoustically relevant, a dynamic-attention mechanism is used to build a time-varying embedding for each frame of the mixture. Systematic evaluation indicates that our approach improves the quality of speaker extraction.