ISBN (print): 9781510848764
End-to-end training of deep learning-based models allows for implicit learning of intermediate representations based on the final task loss. However, the end-to-end approach ignores the useful domain knowledge encoded in explicit intermediate-level supervision. We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches. We present experiments on conversational speech recognition where we use lower-level tasks, such as phoneme recognition, in a multitask training approach with an encoder-decoder model for direct character transcription. We compare multiple types of lower-level tasks and analyze the effects of the auxiliary tasks. Our results on the Switchboard corpus show that this approach improves recognition accuracy over a standard encoder-decoder model on the Eval2000 test set.
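A minimal sketch of the auxiliary-supervision idea described in this abstract. The layer sizes, the `aux_weight`, and the framewise output heads are assumptions for illustration (the character head stands in for the paper's full attention decoder), not the authors' configuration:

```python
import torch
import torch.nn as nn

class MultitaskEncoder(nn.Module):
    """Lower encoder layers feed an auxiliary phoneme head; upper layers feed the main head."""
    def __init__(self, n_mels=80, hidden=256, n_phones=45, n_chars=30):
        super().__init__()
        self.lower = nn.LSTM(n_mels, hidden, num_layers=2, batch_first=True, bidirectional=True)
        self.upper = nn.LSTM(2 * hidden, hidden, num_layers=2, batch_first=True, bidirectional=True)
        self.phone_head = nn.Linear(2 * hidden, n_phones)   # auxiliary lower-level supervision
        self.char_head = nn.Linear(2 * hidden, n_chars)     # stand-in for the attention decoder

    def forward(self, feats):
        low, _ = self.lower(feats)          # intermediate representation
        high, _ = self.upper(low)
        return self.phone_head(low), self.char_head(high)

def multitask_loss(phone_logits, char_logits, phone_tgt, char_tgt, aux_weight=0.3):
    """Final task loss plus a weighted auxiliary loss on the intermediate representation."""
    ce = nn.CrossEntropyLoss()
    main = ce(char_logits.flatten(0, 1), char_tgt.flatten())
    aux = ce(phone_logits.flatten(0, 1), phone_tgt.flatten())
    return main + aux_weight * aux
```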
NASA Technical Reports Server (NTRS) 19850019880: A Software Simulation Study of a (255,223) Reed-Solomon Encoder-Decoder, by NASA Technical Reports Server (NTRS).
ISBN (print): 9781509041183
This paper investigates the encoder-decoder with attention framework for sequence-labelling-based spoken language understanding. We introduce Bidirectional Long Short-Term Memory - Long Short-Term Memory networks (BLSTM-LSTM) as the encoder-decoder model to fully utilize the power of deep learning. In the sequence labelling task, the input and output sequences are aligned word by word, whereas the attention mechanism cannot provide the exact alignment. To address this limitation, we propose a novel focus mechanism for the encoder-decoder framework. Experiments on the standard ATIS dataset show that BLSTM-LSTM with the focus mechanism sets a new state of the art, outperforming a standard BLSTM and an attention-based encoder-decoder. Further experiments also show that the proposed model is more robust to speech recognition errors.
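A hedged sketch of the focus idea: because slot labels align one-to-one with input words, the decoder at step t can consume the encoder hidden state at step t directly instead of a soft attention context. All sizes and names below are assumptions, not the paper's exact model:

```python
import torch
import torch.nn as nn

class FocusDecoder(nn.Module):
    def __init__(self, word_dim=100, enc_hidden=256, dec_hidden=256, n_labels=128):
        super().__init__()
        self.encoder = nn.LSTM(word_dim, enc_hidden, batch_first=True, bidirectional=True)
        self.cell = nn.LSTMCell(2 * enc_hidden + n_labels, dec_hidden)
        self.out = nn.Linear(dec_hidden, n_labels)
        self.n_labels = n_labels

    def forward(self, word_vectors):
        enc, _ = self.encoder(word_vectors)             # (B, T, 2*enc_hidden)
        B, T, _ = enc.shape
        h = enc.new_zeros(B, self.cell.hidden_size)
        c = enc.new_zeros(B, self.cell.hidden_size)
        prev = enc.new_zeros(B, self.n_labels)
        logits = []
        for t in range(T):
            # focus: the aligned encoder state enc[:, t] replaces the attention context
            h, c = self.cell(torch.cat([enc[:, t], prev], dim=-1), (h, c))
            step = self.out(h)
            logits.append(step)
            prev = torch.softmax(step, dim=-1)
        return torch.stack(logits, dim=1)               # (B, T, n_labels)
```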
ISBN (print): 9783319464480; 9783319464473
We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning at both the spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to enable iterative coarse-to-fine face alignment using a single network model. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features, yielding better generalization and significantly more accurate results at test time. We perform a comprehensive experimental analysis, showing the importance of each component of our proposed model, as well as superior results over the state-of-the-art on standard datasets.
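A rough sketch of the spatial recurrence only: the predicted point-response map is fed back and concatenated with the input image for a few refinement passes. The channel counts, number of steps, and the tiny stand-in encoder-decoder are assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class SpatialRecurrentAligner(nn.Module):
    def __init__(self, n_points=68, steps=3):
        super().__init__()
        self.steps = steps
        self.n_points = n_points
        self.net = nn.Sequential(                     # tiny stand-in encoder-decoder
            nn.Conv2d(3 + n_points, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_points, 3, padding=1),
        )

    def forward(self, image):
        B, _, H, W = image.shape
        response = image.new_zeros(B, self.n_points, H, W)
        for _ in range(self.steps):                   # coarse-to-fine refinement with one network
            response = self.net(torch.cat([image, response], dim=1))
        return response
```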
ISBN (print): 9781479999880
Recently, there has been an increasing interest in end-to-end speech recognition using neural networks, with no reliance on hidden Markov models (HMMs) for sequence modelling as in the standard hybrid framework. The recurrent neural network (RNN) encoder-decoder is such a model, performing sequence-to-sequence mapping without any predefined alignment. This model first transforms the input sequence into a fixed-length vector representation, from which the decoder recovers the output sequence. In this paper, we extend our previous work on this model for large vocabulary end-to-end speech recognition. We first present a more effective stochastic gradient descent (SGD) learning rate schedule that can significantly improve the recognition accuracy. We then extend the decoder with long memory by introducing another recurrent layer that performs implicit language modelling. Finally, we demonstrate that using multiple recurrent layers in the encoder can reduce the word error rate. Our experiments were carried out on the Switchboard corpus using a training set of around 300 hours of transcribed audio data, and we achieved significantly higher recognition accuracy, thereby reducing the gap to the hybrid baseline.
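An illustrative sketch of the model shape described here (layer sizes assumed, and the learning rate schedule is not reproduced since the abstract does not specify it): a multi-layer recurrent encoder compresses the utterance into a fixed-length vector, and the decoder's extra recurrent layer acts as an implicit language model over output tokens:

```python
import torch
import torch.nn as nn

class RNNEncoderDecoder(nn.Module):
    def __init__(self, n_feats=40, hidden=320, n_tokens=32):
        super().__init__()
        self.encoder = nn.GRU(n_feats, hidden, num_layers=3, batch_first=True)
        self.embed = nn.Embedding(n_tokens, hidden)
        # second decoder layer ~ the extra recurrent layer performing implicit language modelling
        self.decoder = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_tokens)

    def forward(self, feats, prev_tokens):
        _, h_enc = self.encoder(feats)                 # fixed-length summary from the last step
        summary = h_enc[-1]                            # (B, hidden)
        dec_in = self.embed(prev_tokens) + summary.unsqueeze(1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)                       # (B, U, n_tokens)
```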
ISBN (print): 9781450340694
We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state-of-the-art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages.
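A loose sketch of the character-level CNN-LSTM encoder-decoder pattern (character vocabulary, filter width, and dimensions are assumptions, not Tweet2Vec's published configuration): a character CNN feeds an LSTM encoder whose final state is the tweet embedding, from which a decoder reconstructs the character sequence:

```python
import torch
import torch.nn as nn

class Tweet2VecSketch(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, conv_dim=64, embed_dim=256):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, conv_dim, kernel_size=5, padding=2)
        self.encoder = nn.LSTM(conv_dim, embed_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, n_chars)

    def encode(self, char_ids):
        x = self.char_embed(char_ids).transpose(1, 2)   # (B, char_dim, T)
        x = torch.relu(self.conv(x)).transpose(1, 2)    # (B, T, conv_dim)
        _, (h, _) = self.encoder(x)
        return h[-1]                                    # tweet embedding, shape (B, embed_dim)

    def forward(self, char_ids):
        z = self.encode(char_ids)
        # the decoder reconstructs the character sequence from the embedding alone
        dec_in = z.unsqueeze(1).expand(-1, char_ids.size(1), -1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)
```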
Automatic and accurate segmentation of the optic disk (OD) region has practical applications in the medical field. In this study, a novel encoder-decoder network is proposed to segment ODs automatically and accurately. The encoder consists of three parts: 1) a low-level feature extraction module composed of a dense connectivity block (Dense Block), which outputs rich low-level features; 2) a high-resolution block (HR Block), which extracts sufficient semantic information while reducing parameters; and 3) an atrous spatial pyramid pooling (ASPP) module, which is used to obtain high-level features. The network is therefore named DHANet. The proposed decoder takes advantage of the multiscale features from the encoder to predict OD regions. Comparisons with existing classic models such as U-Net, CE-Net, and DeepLabv3+, as well as the more recent U-Net++, Attention U-Net, and CrackSegNet, show that the proposed method generally achieves better segmentation performance at a lower cost. Ablation studies demonstrate the influence of each module on segmentation performance and justify the network structure. With fewer network parameters, DHANet achieves better prediction performance on intersection over union (IoU), dice similarity coefficient (DSC), and other evaluation metrics. DHANet is relatively lightweight and can use multiscale features to predict OD regions.
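A generic ASPP sketch, since the abstract names the module without detailing it (the dilation rates and channel counts are assumptions, not DHANet's published configuration): parallel atrous convolutions at several rates whose outputs are concatenated and projected into high-level features:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch=256, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # each branch sees a different effective receptive field; concatenate and fuse
        return self.project(torch.cat([branch(x) for branch in self.branches], dim=1))
```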
In froth flotation, the tailings grade and concentrate grade are the two key performance indexes. At present, the monitoring models of these two key grades mostly use the froth image or video from a single flotation cell. However, flotation cells are closely related and strongly coupled, so it is difficult for a froth image or video from one flotation cell to represent the concentrate or tailings grade. Therefore, an encoder-decoder and Siamese time series network (ES-net) is proposed. First, an encoder-decoder (ED) model is designed to predict the target grade (i.e., the zinc tailings or concentrate grade) from the video feature sequence of the first rougher and the measured target grade sequence. Meanwhile, a Siamese time series and difference network (STS-D net) is constructed to predict the target grade from the video feature sequences of the target flotation cell (i.e., the last scavenger or cleaner) at the current and previous moments and the previously measured target grade. After that, a multitask learning strategy is proposed to integrate the ED model and STS-D net. Experiments show that the proposed ES-net can effectively integrate multiple froth visual features from different flotation cells and obtain more accurate concentrate and tailings grades than the existing models.
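A very rough sketch of the Siamese time-series-and-difference idea (dimensions, the GRU encoder, and the way the difference and previous grade are combined are assumptions, not ES-net's exact design): a weight-shared encoder processes the froth feature sequences at the current and previous moments, and their difference plus the previously measured grade yields the current grade estimate:

```python
import torch
import torch.nn as nn

class SiameseGradeSketch(nn.Module):
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.shared = nn.GRU(feat_dim, hidden, batch_first=True)   # weight-shared branch
        self.head = nn.Linear(2 * hidden + 1, 1)

    def forward(self, seq_now, seq_prev, grade_prev):
        _, h_now = self.shared(seq_now)                            # current froth feature sequence
        _, h_prev = self.shared(seq_prev)                          # previous froth feature sequence
        diff = h_now[-1] - h_prev[-1]                              # change between the two moments
        x = torch.cat([h_now[-1], diff, grade_prev], dim=-1)       # grade_prev: (B, 1) measured grade
        return self.head(x)                                        # predicted target grade
```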
Electrical impedance tomography (EIT) is a noninvasive and radiation-free imaging method. As a "soft-field" imaging technique, in EIT the target signal in the center of the measured field is frequently swamped by the target signal at the edge, which restricts its further application. To alleviate this problem, this study presents an enhanced encoder-decoder (EED) method with an atrous spatial pyramid pooling (ASPP) module. The proposed method enhances the ability to detect central weak targets by constructing an ASPP module that integrates multiscale information in the encoder. The multilevel semantic features are fused in the decoder to improve the boundary reconstruction accuracy of the center target. The average absolute error of the imaging results obtained by the EED method was reduced by 82.0%, 83.6%, and 36.5% in simulation experiments and 83.0%, 83.2%, and 36.1% in physical experiments compared with the errors of the damped least-squares algorithm, the Kalman filtering method, and the U-Net-based imaging method, respectively. The average structural similarity improved by 37.3%, 42.9%, and 3.6% in the simulation experiments and 39.2%, 45.2%, and 3.8% in the physical experiments, respectively. The proposed method provides a practical and reliable means of extending the application of EIT by solving the problem of weak central target reconstruction under the effect of strong edge targets.
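A sketch of the decoder-side fusion only, since the ASPP module itself is illustrated above (the channel sizes and the bilinear upsampling choice are assumptions, not the EED method's published details): multilevel encoder features are upsampled and merged so that shallow-layer detail sharpens the boundary of the reconstructed central target:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionDecoder(nn.Module):
    def __init__(self, low_ch=64, high_ch=256, out_ch=1):
        super().__init__()
        self.reduce = nn.Conv2d(low_ch, 48, 1)         # slim down the shallow (high-resolution) features
        self.fuse = nn.Sequential(
            nn.Conv2d(48 + high_ch, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, out_ch, 1),                 # reconstructed conductivity map
        )

    def forward(self, low_feat, high_feat):
        # upsample the deep, semantically rich features to the shallow feature resolution
        high_up = F.interpolate(high_feat, size=low_feat.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([self.reduce(low_feat), high_up], dim=1))
```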
Accurate customer churn prediction is increasingly crucial for improving customer retention and corporate revenue. The collected customer churn data generally exhibits the classical multimodal property, i.e., different types of user behaviors. However, existing customer churn prediction methods fail to capture the more meaningful details of multimodal interaction, resulting in suboptimal customer churn prediction accuracy. Specifically, to better deal with the heterogeneity and consistency problems in the acquired multimodal data, in this paper we propose a multimodal autoencoder-decoder framework for customer churn prediction, referred to as MFCCP. By using Chat-GPT to analyze the detailed data of customers predicted as lost, we aim to customize targeted solutions to recover them. Specifically, the features under numerical and textual characteristics that reflect user behavior cues are characterized by a feature encoding network (FE-Net) module to condense the most relevant information for each modality. We then construct a multimodal fusion network (MF-Net) that effectively captures the cross-modal interactions to integrate modality-specific representations. Finally, a multimodal feature reconstruction network (MFR-Net) decodes the fused representations into the target modalities, ensuring that the reconstructed results closely resemble the original ones. The experimental results show that the proposed method achieves higher accuracy and better generalization than current customer churn prediction models. Integrating Chat-GPT into the MFCCP framework enables businesses to make informed decisions and take proactive measures to retain valuable customers, ultimately driving revenue growth.
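A coarse sketch of the encode-fuse-reconstruct pattern the abstract describes (all names, dimensions, the simple concatenation fusion, and the loss weighting are assumptions, not MFCCP's exact design): each modality is encoded, the fused code drives churn prediction, and per-modality decoders reconstruct the inputs to keep the fused code faithful:

```python
import torch
import torch.nn as nn

class MultimodalChurnSketch(nn.Module):
    def __init__(self, num_dim=20, text_dim=300, hidden=64):
        super().__init__()
        self.enc_num = nn.Sequential(nn.Linear(num_dim, hidden), nn.ReLU())    # numerical branch
        self.enc_text = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())  # textual branch
        self.fuse = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())    # cross-modal fusion
        self.dec_num = nn.Linear(hidden, num_dim)       # reconstruction heads
        self.dec_text = nn.Linear(hidden, text_dim)
        self.churn = nn.Linear(hidden, 1)               # churn prediction head

    def forward(self, x_num, x_text):
        z = self.fuse(torch.cat([self.enc_num(x_num), self.enc_text(x_text)], dim=-1))
        return self.churn(z), self.dec_num(z), self.dec_text(z)

def churn_plus_reconstruction_loss(churn_logit, rec_num, rec_text, y, x_num, x_text, alpha=0.5):
    """Binary churn loss plus weighted reconstruction losses for both modalities."""
    bce = nn.BCEWithLogitsLoss()(churn_logit.squeeze(-1), y)
    rec = nn.MSELoss()(rec_num, x_num) + nn.MSELoss()(rec_text, x_text)
    return bce + alpha * rec
```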