State-of-the-art encoder-decoder models (e.g. for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g. a high-resourced decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures such that their parts can be applied to other tasks without any fine-tuning. To achieve this reusability, the interface between the encoder and decoder modules is grounded in a sequence of marginal distributions over a pre-defined discrete vocabulary. We present two approaches for ingesting these marginals: one is differentiable, allowing the flow of gradients across the entire network, and the other is gradient-isolating. To enable the portability of decoder modules between MT tasks for different source languages and across other tasks like ASR, we introduce a modality-agnostic encoder with a length-control mechanism that dynamically adapts the encoder's output length to match the expected input length range of pre-trained decoders. We present several experiments demonstrating the effectiveness of LegoNN models: a trained language-generation LegoNN decoder module from a German-English (De-En) MT task can be reused, without any fine-tuning, for the Europarl English ASR and Romanian-English (Ro-En) MT tasks, matching or beating the performance of the baselines. After fine-tuning, LegoNN models improve the Ro-En MT task by 1.5 BLEU points and achieve a 12.5% relative WER reduction on the Europarl ASR task. To show how the approach generalizes, we compose a LegoNN ASR model from three modules, each learned within a different end-to-end trained model on a different dataset, achieving an overall WER reduction of 19.5%.
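A minimal sketch (not the authors' released code) of the marginal-distribution interface the abstract describes: the encoder emits per-position distributions over a fixed vocabulary, and the decoder ingests them as expected embeddings, either differentiably or with gradients blocked at the interface. All module names and sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

VOCAB = 1000  # pre-defined discrete interface vocabulary (assumed size)

class MarginalEncoder(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.body = nn.GRU(d_model, d_model, batch_first=True)
        self.to_vocab = nn.Linear(d_model, VOCAB)

    def forward(self, x):                                # x: (B, T, d_model)
        h, _ = self.body(x)
        return torch.softmax(self.to_vocab(h), dim=-1)   # (B, T, VOCAB) marginals

class MarginalIngestingDecoder(nn.Module):
    def __init__(self, d_model=256, gradient_isolated=False):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.body = nn.GRU(d_model, d_model, batch_first=True)
        self.gradient_isolated = gradient_isolated

    def forward(self, marginals):
        if self.gradient_isolated:
            marginals = marginals.detach()    # block gradient flow at the interface
        # expected embedding under the marginals (the differentiable variant)
        x = marginals @ self.embed.weight     # (B, T, d_model)
        out, _ = self.body(x)
        return out

enc, dec = MarginalEncoder(), MarginalIngestingDecoder(gradient_isolated=False)
y = dec(enc(torch.randn(2, 7, 256)))          # decoder is swappable across tasks
```

Because the decoder only ever sees distributions over the shared vocabulary, any encoder producing the same interface could, in principle, be paired with it.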
Image matting is a technique used to extract the foreground and background from a given image. In the past, classical algorithms based on sampling, propagation, or a combination of the two were used to perform image matting; however, most of these produce poor results when applied to images with complex backgrounds. They are also unable to extract, with high accuracy, foregrounds composed of thin objects. In this context, the use of deep learning to solve the image matting problem has gained increasing popularity. In this paper, an encoder-decoder model for alpha matting of human portraits using deep learning is proposed. The model comprises two parts: the first is an encoder-decoder network, a deep convolutional network with 11 convolutional layers and 5 max-pooling layers in the encoder stage and 11 convolutional layers and 5 unpooling layers in the decoder stage. This portion of the model takes the image and trimap as input and produces a coarse alpha matte as output. The second part is a refinement stage with four convolutional layers, responsible for further refining the coarse alpha matte produced by the encoder-decoder stage to obtain an alpha matte of high accuracy. The model was trained using 43,100 images. When tested on the dataset, our model's output was comparable to the industry standard, yielding an average MSE of 0.023 and an average SAD loss of 66.5.
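A compressed sketch of the two-stage design (far shallower than the paper's 11-convolution encoder/decoder): a coarse encoder-decoder over the concatenated image and trimap, followed by a small four-convolution refinement head. Channel widths are assumptions.

```python
import torch
import torch.nn as nn

class CoarseMatting(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(4, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2, return_indices=True)
        self.conv2 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.unpool = nn.MaxUnpool2d(2)
        self.head = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, rgb, trimap):
        x = self.conv1(torch.cat([rgb, trimap], dim=1))  # RGB + trimap = 4 channels
        x, idx = self.pool(x)                 # max-pool, keep indices for unpooling
        x = self.conv2(x)
        x = self.unpool(x, idx)               # unpooling mirrors the pooling stage
        return torch.sigmoid(self.head(x))    # coarse alpha matte in [0, 1]

class Refinement(nn.Module):                  # four convs refining the coarse alpha
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(5, 64, 3, padding=1), nn.ReLU(),  # RGB + trimap + coarse alpha
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, rgb, trimap, coarse):
        return torch.sigmoid(self.net(torch.cat([rgb, trimap, coarse], dim=1)))
```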
Predictive monitoring is a subfield of process mining that aims to predict how a running case will unfold in the future. One of its main challenges is forecasting the sequence of activities that will occur from a given point in time (suffix prediction). Most approaches to the suffix prediction problem learn to predict the suffix by learning only how to predict the next activity, while disregarding structural information present in the process model. This paper proposes a novel architecture based on an encoder-decoder model with an attention mechanism that decouples the representation learning of the prefixes from the inference phase, predicting only the activities of the suffix. During the inference phase, this architecture is extended with a heuristic search algorithm that selects the most probable suffix according to both the structural information extracted from the process model and the information extracted from the log. Our approach has been tested on 12 public event logs against 6 different state-of-the-art proposals, significantly outperforming all of them.
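A minimal sketch of the kind of inference-time heuristic search described above: candidate suffixes are expanded activity by activity, and each is scored by the decoder's log-probability plus a process-model term. `decoder_step` and `model_score` are hypothetical stand-ins for the paper's components, and the combined scoring is an assumption.

```python
import heapq
import math

END = "<eos>"

def suffix_search(prefix, decoder_step, model_score, beam=5, max_len=50):
    """decoder_step(prefix, suffix) -> {activity: prob}; model_score(suffix) -> float."""
    frontier = [(0.0, [])]                        # (negative score, suffix so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for neg, suffix in frontier:
            for act, p in decoder_step(prefix, suffix).items():
                new = suffix + [act]
                # log-likelihood from the decoder + structural process-model score
                score = -neg + math.log(p) + model_score(new)
                (finished if act == END else candidates).append((-score, new))
        frontier = heapq.nsmallest(beam, candidates)  # keep the best `beam` partials
        if not frontier:
            break
    best = min(finished + frontier)               # lowest negative score wins
    return best[1]
```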
ISBN (Print): 9798350392265; 9798350392258
In this paper, we present an encoder-decoder model leveraging Flan-T5 for post-Automatic Speech Recognition (ASR) Generative Speech Error Correction (GenSEC), and we refer to it as FlanEC. We explore its application within the GenSEC framework to enhance ASR outputs by mapping n-best hypotheses into a single output sentence. By utilizing n-best lists from ASR models, we aim to improve the linguistic correctness, accuracy, and grammaticality of final ASR transcriptions. Specifically, we investigate whether scaling the training data and incorporating diverse datasets can lead to significant improvements in post-ASR error correction. We evaluate FlanEC using the HyPoradise dataset, providing a comprehensive analysis of the model's effectiveness in this domain. Furthermore, we assess the proposed approach under different settings to evaluate model scalability and efficiency, offering valuable insights into the potential of instruction-tuned encoder-decoder models for this task.
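A minimal sketch (not the released FlanEC code) of post-ASR correction with Flan-T5: the n-best hypotheses are packed into a single instruction-style prompt and the model generates one corrected transcription. The prompt wording and checkpoint choice here are assumptions, not the paper's exact setup.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def correct(nbest):
    # map the n-best list into one instruction-style input sequence
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    prompt = ("Given these ASR hypotheses, produce the single most likely "
              f"correct transcription:\n{hyps}")
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=64)
    return tok.decode(out[0], skip_special_tokens=True)

print(correct(["i scream for ice cream", "eye scream for ice cream"]))
```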
ISBN (Print): 9783031483110; 9783031483127
Text-to-speech (TTS) systems are an important component in voice-based e-commerce applications. These applications include end-to-end voice assistants and customer experience (CX) voice bots. Code-mixed TTS is also relevant in these applications since product names are commonly described in English while the surrounding text is in a regional language. In this work, we describe our approaches for production-quality code-mixed Hindi-English TTS systems built for e-commerce applications. We propose a data-oriented approach utilizing monolingual datasets in the individual languages. We leverage a transliteration model to convert the Roman text into a common Devanagari script and then combine both datasets for training. We show that such single-script bilingual training, without any code-mixing, works well for pure code-mixed test sets. We further present an exhaustive evaluation of single-speaker adaptation and multi-speaker training with a Tacotron2 + WaveGlow setup to show that the former approach works better. These approaches are also coupled with transfer learning and decoder-only fine-tuning to improve performance. We compare these approaches with Google TTS and report a positive CMOS score of 0.02 with the proposed transfer learning approach. We also perform low-resource voice adaptation experiments to show that a new voice can be onboarded with just 3 hours of data. This highlights the importance of our pre-trained models in resource-constrained settings. This subjective evaluation is performed on a large number of out-of-domain pure code-mixed sentences to demonstrate the high quality of the systems.
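A minimal sketch of the single-script data preparation step: English (Roman) text is transliterated into Devanagari so both monolingual corpora share one script before TTS training. `transliterate_to_devanagari` stands in for the paper's trained transliteration model and is hypothetical here.

```python
def transliterate_to_devanagari(roman_text: str) -> str:
    # stand-in for the trained Roman -> Devanagari transliteration model
    raise NotImplementedError

def build_single_script_corpus(hindi_pairs, english_pairs):
    """Each corpus is a list of (text, audio_path) pairs."""
    merged = list(hindi_pairs)                       # already in Devanagari
    for text, audio in english_pairs:
        merged.append((transliterate_to_devanagari(text), audio))
    return merged                                    # single-script TTS training set
```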
Image segmentation is a key task in computer vision and image processing with important applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among others, and numerous segmentation algorithms are found in the literature. Against this backdrop, the broad success of deep learning (DL) has prompted the development of new image segmentation approaches leveraging DL models. We provide a comprehensive review of this recent literature, covering the spectrum of pioneering efforts in semantic and instance segmentation, including convolutional pixel-labeling networks, encoder-decoder architectures, multiscale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the relationships, strengths, and challenges of these DL-based segmentation models, examine the widely used datasets, compare performances, and discuss promising research directions.
ISBN (Print): 9798350322811
On a daily basis, data centers process huge volumes of data backed by the proliferation of inexpensive hard disks. Data stored in these disks serve a range of critical functional needs, from financial and healthcare to aerospace. As such, premature disk failure and consequent loss of data can be catastrophic. To mitigate the risk of failures, cloud storage providers perform condition-based monitoring and replace hard disks before they fail. By estimating the remaining useful life of hard disk drives, one can predict the time-to-failure of a particular device and replace it at the right time, ensuring maximum utilization whilst reducing operational costs. In this work, large-scale predictive analyses are performed using severely skewed health statistics data by incorporating customized feature engineering and a suite of sequence learners. Past work suggests LSTMs as an excellent approach to predicting remaining useful life. To this end, we present an encoder-decoder LSTM model where the context gained from understanding health statistics sequences aids in predicting an output sequence of the number of days remaining before a disk potentially fails. The models developed in this work are trained and tested across the full 10 years of S.M.A.R.T. health data in circulation from Backblaze and on a wide variety of disk instances. This closes the knowledge gap on what full-scale training achieves on thousands of devices and advances the state-of-the-art by providing tangible metrics for evaluation and generalization for practitioners looking to extend their workflow to all years of health data in circulation across disk manufacturers. The encoder-decoder LSTM posted an RMSE of 0.83 during training and 0.86 during testing over the full 10-year data while generalizing competitively to other drives from the Seagate family.
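A minimal sketch (hyperparameters assumed) of such an encoder-decoder LSTM: the encoder summarizes a window of daily S.M.A.R.T. features, and the decoder autoregressively emits a sequence of remaining-useful-life estimates.

```python
import torch
import torch.nn as nn

class RULSeq2Seq(nn.Module):
    def __init__(self, n_features, hidden=64, horizon=7):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (B, T_in, n_features)
        _, state = self.encoder(x)           # context from the health-stat sequence
        step = torch.zeros(x.size(0), 1, 1)  # teacher forcing omitted for brevity
        outs = []
        for _ in range(self.horizon):        # decode one day-estimate at a time
            h, state = self.decoder(step, state)
            step = self.head(h)              # predicted days remaining
            outs.append(step)
        return torch.cat(outs, dim=1).squeeze(-1)   # (B, horizon)

model = RULSeq2Seq(n_features=20)
print(model(torch.randn(4, 30, 20)).shape)   # torch.Size([4, 7])
```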
Combinatorial optimization problems are an important class of problems often encountered in the real world, involving a combinatorially growing set of feasible solutions as the problem size increases. Since exact approaches can be computationally expensive, practitioners often use approximate approaches such as metaheuristics. However, sophisticated approximate methods that yield high-quality solutions require expert help to handcraft or fine-tune the solution process to suit a given problem distribution. In recent years, artificial intelligence (AI) approaches that involve learning from data without being explicitly programmed have shown tremendous success at various challenging tasks, like natural language processing and autonomous driving. Therefore, solving combinatorial optimization problems is an ideal use case for AI approaches. In this dissertation, we find answers to two key questions considering recent AI developments. 1) How can deep reinforcement learning (DRL) approaches be used to solve complex multi-vehicle combinatorial optimization problems? 2) Can combining machine learning, metaheuristics, and mixed-integer linear optimization solvers under a hybrid framework help quickly obtain certifiably high-quality solutions for combinatorial optimization problems? The answers to these questions broadly build on two key directions, DRL and hybrid approaches, to tackle challenging multi-vehicle combinatorial optimization problems considering the recent advancements, gaps, and drawbacks. Specifically, in Part I of this dissertation, DRL-based approximate approaches are developed to learn from complex edge features, reason over uncertain edges, and handle multi-vehicle decoding and collaboration to solve complex multi-vehicle combinatorial optimization problems. Additionally, we develop approaches to generate large-scale complex data on the fly for training. Upon experimental evaluation, we learn that DRL-based approaches can quickly generate high-quality solutions to such problems.
Real-time safety prediction models are vital in proactive road safety management strategies. This study develops models to predict traffic conflicts at signalized intersections at the signal cycle level, using advanced Bayesian deep learning techniques and efficient LiDAR points. The modeling framework contains three phases: data preprocessing, base deep learning model development, and Bayesian deep learning model development. The core of the framework is the long short-term memory (LSTM) network employed to predict the conflict frequency of a cycle using traffic features of the previous five cycles (e.g., dynamic traffic parameters, traffic conflict frequency). Four Bayesian deep learning models were developed: Bayesian-Standard LSTM, Bayesian-Hybrid-LSTM, Bayesian-Stacked-LSTM encoder-decoder, and Bayesian-Multi-head Stacked-LSTM encoder-decoder. The developed models were applied to traffic conflicts extracted from LiDAR points collected from a signalized intersection in Harbin, China, over a total duration of seven days. Traffic conflicts, measured by the modified time-to-collision conflict indicator, were identified using the peak-over-threshold approach. The models were thoroughly evaluated from the aspects of reliability, transferability, sensitivity, and robustness. The results show that the four developed models can predict traffic conflict frequency per cycle per lane together with its uncertainty. Moreover, the two Bayesian encoder-decoder models perform better than Bayesian-Standard LSTM and Bayesian-Hybrid-LSTM in the four tests. Bayesian-Multi-head Stacked-LSTM encoder-decoder is suggested as the optimal model for its high reliability under uncertainty, good transferability in three scenarios, low sensitivity to different parameters, and sound robustness against small noise. The proposed framework could benefit studies on state-of-the-art data-driven approaches for real-time safety prediction.
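A minimal sketch of one way to obtain predictive uncertainty from an LSTM via Monte Carlo dropout, a common approximation to Bayesian deep learning (the paper's exact Bayesian formulation may differ). Inputs are per-cycle traffic features for the previous five cycles; the output is next-cycle conflict frequency with an uncertainty estimate.

```python
import torch
import torch.nn as nn

class ConflictLSTM(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.drop = nn.Dropout(0.2)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (B, 5, n_features)
        h, _ = self.lstm(x)
        return self.head(self.drop(h[:, -1]))  # conflict frequency of next cycle

def predict_with_uncertainty(model, x, samples=50):
    model.train()                              # keep dropout active at test time
    preds = torch.stack([model(x) for _ in range(samples)])
    return preds.mean(0), preds.std(0)         # point estimate and its uncertainty
```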
ISBN (Print): 9783030801267; 9783030801250
The paper explores a novel methodology for source code obfuscation through the application of text-based recurrent neural network (RNN) encoder-decoder models for ciphertext generation and key generation. Sequence-to-sequence models are incorporated into the architecture to generate the obfuscated code, generate the deobfuscation key, and support live execution. Quantitative benchmark comparisons to existing obfuscation methods indicate significant improvement in stealth and execution cost for the proposed solution, and experiments on the model's properties yield positive results for its character variation, dissimilarity to the original codebase, and consistent length of obfuscated code.
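A minimal character-level seq2seq sketch in the spirit of the described obfuscator: an RNN encoder reads the source code and a decoder emits a "ciphertext" character sequence (the paired key-generation model would be analogous). Architecture sizes are assumptions; this is not the paper's implementation.

```python
import torch
import torch.nn as nn

class CodeObfuscator(nn.Module):
    def __init__(self, vocab=256, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, vocab)

    def forward(self, src_ids, tgt_ids):        # teacher-forced training pass
        _, h = self.encoder(self.embed(src_ids))     # encode the source code
        out, _ = self.decoder(self.embed(tgt_ids), h)
        return self.head(out)                   # logits over ciphertext characters

src = torch.tensor([[ord(c) for c in "print(1)"]])  # byte-level source tokens
tgt = torch.zeros_like(src)                          # dummy ciphertext targets
logits = CodeObfuscator()(src, tgt)                  # (1, 8, 256)
```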