Search results - Inner Mongolia University Library

CapNet: An encoder-decoder based Neural Network Model for Automatic Bangla Image Caption Generation

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS 2022, Vol. 13, No. 8, pp. 752-759

Authors: Rahman, Rashik; Saha, Aloke Kumar; Murad, Hasan; Al Masud, Shah Murtaza Rashid; Rahman, Nakiba Nuren; Momtaz, A. S. Zaforullah. Affiliations: Univ Asia Pacific, Comp Sci & Engn, Dhaka, Bangladesh; Chittagong Univ Engn & Technol, Comp Sci & Engn, Chattogram, Bangladesh

Automatic caption generation from images has become an active research topic in the fields of Computer Vision (CV) and Natural Language Processing (NLP). Machine-generated image captions play a vital role for visually impaired people: converting the caption to speech gives them a better understanding of their surroundings. Though a significant amount of research has been conducted on automatic caption generation in other languages, far too little effort has been devoted to Bangla image caption generation. In this paper, we propose an encoder-decoder based model which takes an image as input and generates the corresponding Bangla caption as output. The encoder network consists of a pretrained image feature extractor called ResNet-50, while the decoder network consists of Bidirectional LSTMs for caption generation. The model has been trained and evaluated using a Bangla image captioning dataset named BanglaLekhaImageCaptions. The proposed model achieved a training accuracy of 91% and BLEU-1, BLEU-2, BLEU-3, BLEU-4 scores of 0.81, 0.67, 0.57, and 0.51 respectively. Moreover, a comparative study of different pretrained feature extractors such as VGG-16 and Xception is presented. Finally, the proposed model has been deployed on an embedded device to analyse inference time and power consumption.

Keywords: Bangla image caption generation; encoder-decoder; bidirectional long short-term memory (LSTM); Bangla natural language processing (NLP)

Software Reliability Prediction through Encoder-Decoder Recurrent Neural Networks

INTERNATIONAL JOURNAL OF MATHEMATICAL ENGINEERING AND MANAGEMENT SCIENCES 2022, Vol. 7, No. 3, pp. 325-340

Authors: Li, Chen; Zheng, Junjun; Okamura, Hiroyuki; Dohi, Tadashi. Affiliations: Kyushu Inst Technol, Fac Comp Sci & Syst Engn, Dept Biosci & Bioinformat, Iizuka, Fukuoka 8208502, Japan; Ritsumeikan Univ, Dept Informat Sci & Engn, Kusatsu 5258577, Japan; Hiroshima Univ, Grad Sch Adv Sci Engn, Higashihiroshima 7398527, Japan

With the growing demand for high reliability and safety software, software reliability prediction has attracted more and more attention to identifying potential faults in software. Software reliability growth models (SRGMs) are the most commonly used prediction models in practical software reliability engineering. However, their unrealistic assumptions and environment-dependent applicability restrict their development. Recurrent neural networks (RNNs), such as the long short-term memory (LSTM), provide an end-to-end learning method, have shown a remarkable ability in time-series forecasting and can be used to solve the above problem for software reliability prediction. In this paper, we present an attention-based encoder-decoder RNN called EDRNN to predict the number of failures in the software. More specifically, the encoder-decoder RNN estimates the cumulative faults with the fault detection time as input. The attention mechanism improves the prediction accuracy in the encoder-decoder architecture. Experimental results demonstrate that our proposed model outperforms other traditional SRGMs and neural network-based models in terms of accuracy.

Keywords: Software reliability; Recurrent neural networks (RNNs); Long short-term memory (LSTM); encoder-decoder; Attention mechanism

A Decomposition-based Encoder-Decoder Framework for Multi-step Prediction of Burn-Through Point in Sintering Process

IEEE 6th International Conference on Industrial Cyber-Physical Systems (ICPS)

Authors: Xie, Yuhan; He, Bocun; Zhang, Xinmin; Song, Zhihuan. Affiliation: Zhejiang Univ, Coll Control Sci & Engn, State Key Lab Ind Control Technol, Hangzhou, Peoples R China

ISBN: (print) 9798350311259

Sintering process is a critical step in the ironmaking process. Burn-through point (BTP), as a key performance index of sintering ore, has a great influence on the quality of the sintering product. The existing prediction methods attempt to use a single model to establish the relationship between variables. However, due to the strong volatility, uncertainty, and multivariable coupling of sintering process, the traditional prediction model cannot produce reliable predictions. In order to deal with the complex characteristics of sintering process, this paper proposes a decomposition-based encoder-decoder modeling framework, in which a sequence decomposition module is designed to decompose the input time series into different sub-sequences. Then, these sub-sequences are constructed by the encoder-decoder models separately. The effectiveness of the proposed multi-step ahead prediction modeling framework was evaluated in a real-world sintering process. Compared with the traditional prediction modeling framework, the proposed modeling framework has more accurate results in multi-step ahead prediction.

Keywords: Sintering process; burn-through point; multi-step ahead prediction; encoder-decoder; series decomposition

Encoder-Decoder Architecture for 3D Seismic Inversion

SENSORS 2023, Vol. 23, No. 1, p. 61

Authors: Gelboim, Maayan; Adler, Amir; Sun, Yen; Araya-Polo, Mauricio. Affiliations: Braude Coll Engn, Elect Engn Dept, IL-2161002 Karmiel, Israel; TotalEnergies EP R&T, Houston, TX 77002, USA

Inverting seismic data to build 3D geological structures is a challenging task due to the overwhelming amount of acquired seismic data, and the very-high computational load due to iterative numerical solutions of the wave equation, as required by industry-standard tools such as Full Waveform Inversion (FWI). For example, in an area with surface dimensions of 4.5 km x 4.5 km, hundreds of seismic shot-gather cubes are required for 3D model reconstruction, leading to Terabytes of recorded data. This paper presents a deep learning solution for the reconstruction of realistic 3D models in the presence of field noise recorded in seismic surveys. We implement and analyze a convolutional encoder-decoder architecture that efficiently processes the entire collection of hundreds of seismic shot-gather cubes. The proposed solution demonstrates that realistic 3D models can be reconstructed with a structural similarity index measure (SSIM) of 0.9143 (out of 1.0) in the presence of field noise at 10 dB signal-to-noise ratio.

Keywords: 3D reconstruction; seismic inversion; seismic velocity; inverse problems; deep learning; transfer learning; encoder-decoder

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition

Interspeech Conference

Authors: Zhou, Xiaohuan; Wang, Jiaming; Cui, Zeyu; Zhang, Shiliang; Yan, Zhijie; Zhou, Jingren; Zhou, Chang. Affiliation: DAMO Acad, Alibaba Grp, Beijing, Peoples R China

In this paper, we propose a novel multi-modal multi-task encoder-decoder pre-training framework (MMSpeech) for Mandarin automatic speech recognition (ASR), which employs both unlabeled speech and text data. The main difficulty in speech-text joint pre-training comes from the significant difference between speech and text modalities, especially for Mandarin speech and text. Unlike English and other languages with an alphabetic writing system, Mandarin uses an ideographic writing system where character and sound are not tightly mapped to one another. Therefore, we propose to introduce the phoneme modality into pre-training, which can help capture modality-invariant information between Mandarin speech and text. In addition, a much larger amount of unsupervised text data 292G is utilized for pre-training, which brings significant improvements. Experiments on AISHELL-1 show that our proposed method achieves state-of-the-art performance, with a more than 40% relative improvement.

Keywords: ASR; pre-training; encoder-decoder

Multi-Scale Attention and Encoder-Decoder Network for Video Saliency Object Detection

PATTERN RECOGNITION AND IMAGE ANALYSIS 2022, Vol. 32, No. 2, pp. 340-350

Authors: Bi, Hongbo; Zhu, Huihui; Yang, Lina; Wu, Ranwan. Affiliation: NorthEast Petr Univ, Dept Commun Engn, Daqing 163318, Peoples R China

In recent years, video saliency object detection has received more and more attention, and many excellent algorithms have been proposed. In this paper, we propose a new approach to video saliency object detection, named MAED-Net. Our method is divided into two modules: a spatial module and a temporal module. In the spatial module, we use a set of parallel dilated convolutions and add channel attention to each dilated convolution. The multi-scale design mimics the characteristics of the human retina, and the attention imitates the human attention mechanism. We combine multi-scale information with attention information, which constitutes the pyramid multi-scale channel attention. Multi-scale channel attention allows us to obtain more precise saliency clues, laying a solid foundation for the temporal module that follows. In the temporal module, we use a set of encoder-decoder ConvLSTMs with different dilation rates, and we use dense connections and skip connections to blend information of different scales. We evaluate our results on four datasets and compare with twelve algorithms. The experimental results show that our algorithm achieves state-of-the-art performance.

Keywords: channel attention; multi-scale; encoder-decoder

ProDE: Interpretable APT Detection Method Based on Encoder-decoder Architecture

29th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2023

Authors: Zhou, Fengxi; Chang, Baoming; Wen, Yu; Meng, Dan. Affiliations: University of Chinese Academy of Sciences, Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, Beijing, China; Chinese Academy of Sciences, Institute of Information Engineering, Beijing, China

ISBN: (print) 9798350330717

The detection and analysis of Advanced Persistent Threats (APTs) are pivotal for contemporary network security. Provenance graphs, constructed from audit logs, offer a wealth of contextual information to identify and analyze threats and are popular in the APT detection field. However, existing approaches frequently fall short in offering explanatory capabilities for their detection results, placing an additional burden on security analysts. Confronted with coarse-grained detection outcomes, analysts must delve into provenance graphs or audit logs to precisely pinpoint attack entities and events, which can significantly delay the response to threats. In this paper, we propose ProDE, a novel approach that enhances APT detection by providing interpretable results using an encoder-decoder architecture. ProDE initiates the detection process by comparing the encoded representations of the true graph and the predicted graph. Upon detecting abnormalities, the encoder-decoder model is able to decode the encodings into provenance graphs, thereby revealing inconsistencies between the decoded graph and the real graph that serve as interpretable results. We evaluate ProDE on two widely used datasets. While taking detection performance into account, the results show that ProDE can provide more detailed detection results than existing approaches, giving analysts an interpretation of each detection. © 2023 IEEE.

Keywords: Advanced Persistent Threats (APTs); encoder-decoder; interpretable

Dynamic energy system modeling using hybrid physics-based and machine learning encoder–decoder models

Energy and AI 2022, Vol. 9, No. 3, pp. 128-138

Authors: Derek Machalek; Jake Tuttle; Klas Andersson; Kody M. Powell. Affiliations: Department of Chemical Engineering, University of Utah, Salt Lake City, UT, United States of America; Taber International LLC, United States of America; Department of Space, Earth and Environment, University of Chalmers, Gothenburg, Sweden; Department of Mechanical Engineering, University of Utah, Salt Lake City, UT, United States of America

Three model configurations are presented for multi-step time series predictions of the heat absorbed by the water and steam in a thermal power plant. The models predict over horizons of 2, 4, and 6 steps into the future, where each step is a 5-minute increment. The evaluated models are a pure machine learning model, a novel hybrid machine learning and physics-based model, and the hybrid model with an incomplete dataset. The hybrid model deconstructs the machine learning into individual boiler heat absorption units: economizer, water wall, superheater, and reheater. Each configuration uses a gated recurrent unit (GRU) or a GRU-based encoder–decoder as the deep learning architecture. Mean squared error is used to evaluate the models compared to target values. The encoder–decoder architecture is over 11% more accurate than the GRU only models. The hybrid model with the incomplete dataset highlights the importance of the manipulated variables to the *** hybrid model, compared to the pure machine learning model, is over 10% more accurate on average over 20 iterations of each model. Automatic differentiation is applied to the hybrid model to perform a local sensitivity analysis to identify the most impactful of the 72 manipulated variables on the heat absorbed in the boiler. The models and sensitivity analyses are used in a discussion about optimizing the thermal power plant.

Keywords: Hybrid model; encoder-decoder; Time series; Automatic differentiation; Thermal power plant

Innovative Disease Management: An Encoder-Decoder Solution for Tomato Black Mold Detection

4th IEEE Global Conference for Advancement in Technology, GCAT 2023

Authors: Vashisht, Divyam; Kukreja, Vinay; Sharma, Rishabh; Jain, Ayushi; Choudhury, Ankur. Affiliations: Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India; Graphic Era Hill University, Uttarakhand, Dehradun 248002, India

ISBN: (print) 9798350305258

The worldwide spread of tomato black mold disease is a major concern since it reduces crop output and quality. Effective disease control and environmentally responsible farming methods depend on rapid and precise disease identification. This study presents a novel method for detecting black mold disease in tomatoes, using a dense dilated convolution methodology based on an encoder-decoder pair. The well-known VGG-16 model is used for feature extraction, and 25,000 photos of tomatoes that the authors collected themselves are used to train and evaluate the model. Accurate segmentation and localization of infected areas are shown to have been achieved by the model, as evidenced by the high values of the Dice Similarity Coefficient (DSC) and the Intersection over Union (IoU). The model's success in approximating ground truth bounds is further supported by the small Hausdorff Distance. To better ensure the safety of our food supply, our studies aid in the development of precision agriculture by providing farmers with a reliable method of disease detection. Our suggested method offers a viable path for early and precise detection of tomato black mold disease by leveraging the power of modern computer vision techniques and deep learning (DL) models, which will help the agricultural community and promote sustainable crop management practices. © 2023 IEEE.

Keywords: Deep Learning; Dense Dilated Convolution; encoder-decoder; Segmentation; Tomato black mold disease

Encoder-Decoder Neural Network Architecture for solving Job Shop Scheduling Problems using Reinforcement Learning

IEEE Symposium Series on Computational Intelligence (IEEE SSCI)

Authors: Magalhaes, Ricardo; Martins, Miguel; Vieira, Susana; Santos, Filipe; Sousa, Joao. Affiliation: Inst Super Tecn, IDMEC, Lisbon, Portugal

ISBN: (print) 9781728190488

This paper proposes an encoder-decoder neural network architecture with Attention Mechanism for solving the DRC-FJSSP using Deep Q-Learning. In the DRC-FJSSP the number of operations to schedule is problem dependent. Current state-of-the-art reinforcement learning methods arbitrarily simplify the input information to a fixed-size feature input vector. This way, they end up losing relevant problem information for a large enough number of operations. Furthermore, on the one hand, human schedulers tend to optimize production schedules by moving operations individually into more adequate positions in the schedule. On the other hand, the aforementioned state-of-the-art methods apply heuristics recurrently as their optimization procedure. These limitations come as the cost of the neural network architecture, which is limited to fixed-size inputs and outputs. The architecture proposed in this paper is a Recurrent Neural Network, which enables it to work with inputs and outputs of variable sizes. This decisive feature makes it possible for the agent to move a specific operation to a more adequate position in the schedule and receive explicit problem information, such as the processing times of all operations. In the end, this approach proved to be competitive with a state-of-the-art metaheuristic method, the KGFOA. These promising results come despite limited computational resources, which allowed only a scarcely trained agent to be developed.

Keywords: Job-Shop Scheduling; Reinforcement Learning; Deep Q-Learning; encoder-decoder; Attention Mechanism; Pointer Networks
