ISBN (print): 9781510848764
End-to-end training of deep learning-based models allows for implicit learning of intermediate representations based on the final task loss. However, the end-to-end approach ignores the useful domain knowledge encoded in explicit intermediate-level supervision. We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches. We present experiments on conversational speech recognition where we use lower-level tasks, such as phoneme recognition, in a multitask training approach with an encoder-decoder model for direct character transcription. We compare multiple types of lower-level tasks and analyze the effects of the auxiliary tasks. Our results on the Switchboard corpus show that this approach improves recognition accuracy over a standard encoder-decoder model on the Eval2000 test set.
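A minimal sketch of the auxiliary-supervision idea described in this abstract. The layer sizes, the `aux_weight`, and the framewise output heads are assumptions for illustration (the character head stands in for the paper's full attention decoder), not the authors' configuration:

```python
import torch
import torch.nn as nn

class MultitaskEncoder(nn.Module):
    """Lower encoder layers feed an auxiliary phoneme head; upper layers feed the main head."""
    def __init__(self, n_mels=80, hidden=256, n_phones=45, n_chars=30):
        super().__init__()
        self.lower = nn.LSTM(n_mels, hidden, num_layers=2, batch_first=True, bidirectional=True)
        self.upper = nn.LSTM(2 * hidden, hidden, num_layers=2, batch_first=True, bidirectional=True)
        self.phone_head = nn.Linear(2 * hidden, n_phones)   # auxiliary lower-level supervision
        self.char_head = nn.Linear(2 * hidden, n_chars)     # stand-in for the attention decoder

    def forward(self, feats):
        low, _ = self.lower(feats)          # intermediate representation
        high, _ = self.upper(low)
        return self.phone_head(low), self.char_head(high)

def multitask_loss(phone_logits, char_logits, phone_tgt, char_tgt, aux_weight=0.3):
    """Final task loss plus a weighted auxiliary loss on the intermediate representation."""
    ce = nn.CrossEntropyLoss()
    main = ce(char_logits.flatten(0, 1), char_tgt.flatten())
    aux = ce(phone_logits.flatten(0, 1), phone_tgt.flatten())
    return main + aux_weight * aux
```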
NASA Technical Reports Server (NTRS) 19850019880: A Software Simulation Study of a (255,223) Reed-Solomon Encoder-Decoder, by NASA Technical Reports Server (NTRS).
ISBN (print): 9781509041183
This paper investigates the encoder-decoder with attention framework for sequence-labelling-based spoken language understanding. We introduce Bidirectional Long Short-Term Memory - Long Short-Term Memory networks (BLSTM-LSTM) as the encoder-decoder model to fully utilize the power of deep learning. In the sequence labelling task, the input and output sequences are aligned word by word, whereas the attention mechanism cannot provide the exact alignment. To address this limitation, we propose a novel focus mechanism for the encoder-decoder framework. Experiments on the standard ATIS dataset show that BLSTM-LSTM with the focus mechanism sets a new state of the art, outperforming a standard BLSTM and an attention-based encoder-decoder. Further experiments also show that the proposed model is more robust to speech recognition errors.
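A hedged sketch of the focus idea: because slot labels align one-to-one with input words, the decoder at step t can consume the encoder hidden state at step t directly instead of a soft attention context. All sizes and names below are assumptions, not the paper's exact model:

```python
import torch
import torch.nn as nn

class FocusDecoder(nn.Module):
    def __init__(self, word_dim=100, enc_hidden=256, dec_hidden=256, n_labels=128):
        super().__init__()
        self.encoder = nn.LSTM(word_dim, enc_hidden, batch_first=True, bidirectional=True)
        self.cell = nn.LSTMCell(2 * enc_hidden + n_labels, dec_hidden)
        self.out = nn.Linear(dec_hidden, n_labels)
        self.n_labels = n_labels

    def forward(self, word_vectors):
        enc, _ = self.encoder(word_vectors)             # (B, T, 2*enc_hidden)
        B, T, _ = enc.shape
        h = enc.new_zeros(B, self.cell.hidden_size)
        c = enc.new_zeros(B, self.cell.hidden_size)
        prev = enc.new_zeros(B, self.n_labels)
        logits = []
        for t in range(T):
            # focus: the aligned encoder state enc[:, t] replaces the attention context
            h, c = self.cell(torch.cat([enc[:, t], prev], dim=-1), (h, c))
            step = self.out(h)
            logits.append(step)
            prev = torch.softmax(step, dim=-1)
        return torch.stack(logits, dim=1)               # (B, T, n_labels)
```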
ISBN (print): 9783319464480; 9783319464473
We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning at both the spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to enable iterative coarse-to-fine face alignment using a single network model. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features, yielding better generalization and significantly more accurate results at test time. We perform a comprehensive experimental analysis, showing the importance of each component of our proposed model, as well as superior results over the state-of-the-art on standard datasets.
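A rough sketch of the spatial recurrence only: the predicted point-response map is fed back and concatenated with the input image for a few refinement passes. The channel counts, number of steps, and the tiny stand-in encoder-decoder are assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class SpatialRecurrentAligner(nn.Module):
    def __init__(self, n_points=68, steps=3):
        super().__init__()
        self.steps = steps
        self.n_points = n_points
        self.net = nn.Sequential(                     # tiny stand-in encoder-decoder
            nn.Conv2d(3 + n_points, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_points, 3, padding=1),
        )

    def forward(self, image):
        B, _, H, W = image.shape
        response = image.new_zeros(B, self.n_points, H, W)
        for _ in range(self.steps):                   # coarse-to-fine refinement with one network
            response = self.net(torch.cat([image, response], dim=1))
        return response
```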
ISBN (print): 9781479999880
Recently, there has been an increasing interest in end-to-end speech recognition using neural networks, with no reliance on hidden Markov models (HMMs) for sequence modelling as in the standard hybrid framework. The recurrent neural network (RNN) encoder-decoder is such a model, performing sequence-to-sequence mapping without any predefined alignment. This model first transforms the input sequence into a fixed-length vector representation, from which the decoder recovers the output sequence. In this paper, we extend our previous work on this model for large vocabulary end-to-end speech recognition. We first present a more effective stochastic gradient descent (SGD) learning rate schedule that can significantly improve the recognition accuracy. We then extend the decoder with long memory by introducing another recurrent layer that performs implicit language modelling. Finally, we demonstrate that using multiple recurrent layers in the encoder can reduce the word error rate. Our experiments were carried out on the Switchboard corpus using a training set of around 300 hours of transcribed audio data, and we achieved significantly higher recognition accuracy, thereby reducing the gap to the hybrid baseline.
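An illustrative sketch of the model shape described here (layer sizes assumed, and the learning rate schedule is not reproduced since the abstract does not specify it): a multi-layer recurrent encoder compresses the utterance into a fixed-length vector, and the decoder's extra recurrent layer acts as an implicit language model over output tokens:

```python
import torch
import torch.nn as nn

class RNNEncoderDecoder(nn.Module):
    def __init__(self, n_feats=40, hidden=320, n_tokens=32):
        super().__init__()
        self.encoder = nn.GRU(n_feats, hidden, num_layers=3, batch_first=True)
        self.embed = nn.Embedding(n_tokens, hidden)
        # second decoder layer ~ the extra recurrent layer performing implicit language modelling
        self.decoder = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_tokens)

    def forward(self, feats, prev_tokens):
        _, h_enc = self.encoder(feats)                 # fixed-length summary from the last step
        summary = h_enc[-1]                            # (B, hidden)
        dec_in = self.embed(prev_tokens) + summary.unsqueeze(1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)                       # (B, U, n_tokens)
```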
ISBN (print): 9781450340694
We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state-of-the-art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages.
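A loose sketch of the character-level CNN-LSTM encoder-decoder pattern (character vocabulary, filter width, and dimensions are assumptions, not Tweet2Vec's published configuration): a character CNN feeds an LSTM encoder whose final state is the tweet embedding, from which a decoder reconstructs the character sequence:

```python
import torch
import torch.nn as nn

class Tweet2VecSketch(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, conv_dim=64, embed_dim=256):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, conv_dim, kernel_size=5, padding=2)
        self.encoder = nn.LSTM(conv_dim, embed_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, n_chars)

    def encode(self, char_ids):
        x = self.char_embed(char_ids).transpose(1, 2)   # (B, char_dim, T)
        x = torch.relu(self.conv(x)).transpose(1, 2)    # (B, T, conv_dim)
        _, (h, _) = self.encoder(x)
        return h[-1]                                    # tweet embedding, shape (B, embed_dim)

    def forward(self, char_ids):
        z = self.encode(char_ids)
        # the decoder reconstructs the character sequence from the embedding alone
        dec_in = z.unsqueeze(1).expand(-1, char_ids.size(1), -1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)
```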
Automatic and accurate segmentation of the optic disk (OD) region has practical applications in the medical field. In this study, a novel encoder-decoder network is proposed to segment ODs automatically and accurately. The encoder consists of three parts: 1) a low-level feature extraction module composed of a dense connectivity block (Dense Block), which outputs rich low-level features; 2) a high-resolution block (HR Block), which extracts sufficient semantic information while reducing parameters; and 3) an atrous spatial pyramid pooling (ASPP) module, which is used to obtain high-level features. The network is therefore named DHANet. The proposed decoder takes advantage of the multiscale features from the encoder to predict OD regions. Comparisons with existing classic models such as U-Net, CE-Net, and DeepLabv3+, as well as the more recent U-Net++, Attention U-Net, and CrackSegNet, show that the proposed method generally achieves better segmentation performance at a lower cost. Ablation studies demonstrate the influence of each module on segmentation performance and justify the network structure. With fewer network parameters, DHANet achieves better prediction performance on intersection over union (IoU), dice similarity coefficient (DSC), and other evaluation metrics. DHANet is relatively lightweight and can use multiscale features to predict OD regions.
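A generic ASPP sketch, since the abstract names the module without detailing it (the dilation rates and channel counts are assumptions, not DHANet's published configuration): parallel atrous convolutions at several rates whose outputs are concatenated and projected into high-level features:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch=256, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # each branch sees a different effective receptive field; concatenate and fuse
        return self.project(torch.cat([branch(x) for branch in self.branches], dim=1))
```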
In froth flotation, the tailings grade and concentrate grade are the two key performance indexes. At present, the monitoring models of these two key grades mostly use the froth image or video from a single flotation cell. However, flotation cells are closely related and strongly coupled, so it is difficult for a froth image or video from one flotation cell to represent the concentrate or tailings grade. Therefore, an encoder-decoder and Siamese time series network (ES-net) is proposed. First, an encoder-decoder (ED) model is designed to predict the target grade (i.e., the zinc tailings or concentrate grade) from the video feature sequence of the first rougher and the measured target grade sequence. Meanwhile, a Siamese time series and difference network (STS-D net) is constructed to predict the target grade from the video feature sequences of the target flotation cell (i.e., the last scavenger or cleaner) at the current and previous moments and the previously measured target grade. After that, a multitask learning strategy is proposed to integrate the ED model and STS-D net. Experiments show that the proposed ES-net can effectively integrate multiple froth visual features from different flotation cells and obtain more accurate concentrate and tailings grades than the existing models.
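A very rough sketch of the Siamese time-series-and-difference idea (dimensions, the GRU encoder, and the way the difference and previous grade are combined are assumptions, not ES-net's exact design): a weight-shared encoder processes the froth feature sequences at the current and previous moments, and their difference plus the previously measured grade yields the current grade estimate:

```python
import torch
import torch.nn as nn

class SiameseGradeSketch(nn.Module):
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.shared = nn.GRU(feat_dim, hidden, batch_first=True)   # weight-shared branch
        self.head = nn.Linear(2 * hidden + 1, 1)

    def forward(self, seq_now, seq_prev, grade_prev):
        _, h_now = self.shared(seq_now)                            # current froth feature sequence
        _, h_prev = self.shared(seq_prev)                          # previous froth feature sequence
        diff = h_now[-1] - h_prev[-1]                              # change between the two moments
        x = torch.cat([h_now[-1], diff, grade_prev], dim=-1)       # grade_prev: (B, 1) measured grade
        return self.head(x)                                        # predicted target grade
```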
Electrical impedance tomography (EIT) is a noninvasive and radiation-free imaging method. As a "soft-field" imaging technique, in EIT the target signal in the center of the measured field is frequently swamped by the target signal at the edge, which restricts its further application. To alleviate this problem, this study presents an enhanced encoder-decoder (EED) method with an atrous spatial pyramid pooling (ASPP) module. The proposed method enhances the ability to detect central weak targets by constructing an ASPP module that integrates multiscale information in the encoder. The multilevel semantic features are fused in the decoder to improve the boundary reconstruction accuracy of the center target. The average absolute error of the imaging results obtained by the EED method was reduced by 82.0%, 83.6%, and 36.5% in simulation experiments and 83.0%, 83.2%, and 36.1% in physical experiments compared with the errors of the damped least-squares algorithm, the Kalman filtering method, and the U-Net-based imaging method, respectively. The average structural similarity improved by 37.3%, 42.9%, and 3.6% in the simulation experiments and 39.2%, 45.2%, and 3.8% in the physical experiments, respectively. The proposed method provides a practical and reliable means of extending the application of EIT by solving the problem of weak central target reconstruction under the effect of strong edge targets.
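A sketch of the decoder-side fusion only, since the ASPP module itself is illustrated above (the channel sizes and the bilinear upsampling choice are assumptions, not the EED method's published details): multilevel encoder features are upsampled and merged so that shallow-layer detail sharpens the boundary of the reconstructed central target:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionDecoder(nn.Module):
    def __init__(self, low_ch=64, high_ch=256, out_ch=1):
        super().__init__()
        self.reduce = nn.Conv2d(low_ch, 48, 1)         # slim down the shallow (high-resolution) features
        self.fuse = nn.Sequential(
            nn.Conv2d(48 + high_ch, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, out_ch, 1),                 # reconstructed conductivity map
        )

    def forward(self, low_feat, high_feat):
        # upsample the deep, semantically rich features to the shallow feature resolution
        high_up = F.interpolate(high_feat, size=low_feat.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([self.reduce(low_feat), high_up], dim=1))
```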
Accurate customer churn prediction is increasingly crucial for improving customer retention and corporate revenue. The collected customer churn data generally exhibits the classical multimodal property, i.e., different types of user behaviors. However, existing customer churn prediction methods fail to capture the more meaningful details of multimodal interaction, resulting in suboptimal customer churn prediction accuracy. Specifically, to better deal with the heterogeneity and consistency problems in the acquired multimodal data, in this paper we propose a multimodal autoencoder-decoder framework for customer churn prediction, referred to as MFCCP. By using Chat-GPT to analyze the detailed data of customers predicted as lost, we aim to customize targeted solutions to recover them. Specifically, the features under numerical and textual characteristics that reflect user behavior cues are characterized by a feature encoding network (FE-Net) module to condense the most relevant information for each modality. We then construct a multimodal fusion network (MF-Net) that effectively captures the cross-modal interactions to integrate modality-specific representations. Finally, a multimodal feature reconstruction network (MFR-Net) decodes the fused representations into the target modalities, ensuring that the reconstructed results closely resemble the original ones. The experimental results show that the proposed method achieves higher accuracy and better generalization than current customer churn prediction models. Integrating Chat-GPT into the MFCCP framework enables businesses to make informed decisions and take proactive measures to retain valuable customers, ultimately driving revenue growth.
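A coarse sketch of the encode-fuse-reconstruct pattern the abstract describes (all names, dimensions, the simple concatenation fusion, and the loss weighting are assumptions, not MFCCP's exact design): each modality is encoded, the fused code drives churn prediction, and per-modality decoders reconstruct the inputs to keep the fused code faithful:

```python
import torch
import torch.nn as nn

class MultimodalChurnSketch(nn.Module):
    def __init__(self, num_dim=20, text_dim=300, hidden=64):
        super().__init__()
        self.enc_num = nn.Sequential(nn.Linear(num_dim, hidden), nn.ReLU())    # numerical branch
        self.enc_text = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())  # textual branch
        self.fuse = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())    # cross-modal fusion
        self.dec_num = nn.Linear(hidden, num_dim)       # reconstruction heads
        self.dec_text = nn.Linear(hidden, text_dim)
        self.churn = nn.Linear(hidden, 1)               # churn prediction head

    def forward(self, x_num, x_text):
        z = self.fuse(torch.cat([self.enc_num(x_num), self.enc_text(x_text)], dim=-1))
        return self.churn(z), self.dec_num(z), self.dec_text(z)

def churn_plus_reconstruction_loss(churn_logit, rec_num, rec_text, y, x_num, x_text, alpha=0.5):
    """Binary churn loss plus weighted reconstruction losses for both modalities."""
    bce = nn.BCEWithLogitsLoss()(churn_logit.squeeze(-1), y)
    rec = nn.MSELoss()(rec_num, x_num) + nn.MSELoss()(rec_text, x_text)
    return bce + alpha * rec
```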