Automatic caption generation from images has become an active research topic in Computer Vision (CV) and Natural Language Processing (NLP). Machine-generated image captions play a vital role for visually impaired people: converting the caption to speech gives them a better understanding of their surroundings. Though a significant amount of research has been conducted on automatic caption generation in other languages, far too little effort has been devoted to Bangla image caption generation. In this paper, we propose an encoder-decoder based model which takes an image as input and generates the corresponding Bangla caption as output. The encoder network consists of a pretrained image feature extractor, ResNet-50, while the decoder network consists of Bidirectional LSTMs for caption generation. The model has been trained and evaluated using a Bangla image captioning dataset named BanglaLekhaImageCaptions. The proposed model achieved a training accuracy of 91% and BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.81, 0.67, 0.57, and 0.51, respectively. Moreover, a comparative study of different pretrained feature extractors such as VGG-16 and Xception is presented. Finally, the proposed model has been deployed on an embedded device to analyse its inference time and power consumption.
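The BLEU-1 to BLEU-4 scores quoted above can be illustrated with a minimal sketch. It assumes a sentence-level score against a single reference with no smoothing, which may differ from the evaluation the authors actually ran; the function names and the English example tokens are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as often as it appears in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n):
    """Cumulative BLEU-max_n: geometric mean of the 1..max_n precisions
    multiplied by the brevity penalty (assumption: one reference, no smoothing)."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity * math.exp(log_mean)


# Hypothetical example sentences (not taken from the dataset).
reference = "a dog is running on the grass".split()
candidate = "a dog runs on the grass".split()
bleu_1 = bleu(candidate, reference, 1)
bleu_2 = bleu(candidate, reference, 2)
```

Because higher-order n-grams are harder to match, the cumulative score usually falls as n grows, which is consistent with the decreasing BLEU-1 to BLEU-4 values reported in the abstract.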
With the growing demand for highly reliable and safe software, software reliability prediction has attracted increasing attention as a way to identify potential faults in software. Software reliability growth models (SRGMs) are the most commonly used prediction models in practical software reliability engineering. However, their unrealistic assumptions and environment-dependent applicability restrict their development. Recurrent neural networks (RNNs), such as the long short-term memory (LSTM), provide an end-to-end learning method, have shown a remarkable ability in time-series forecasting, and can be used to address these problems in software reliability prediction. In this paper, we present an attention-based encoder-decoder RNN called EDRNN to predict the number of failures in the software. More specifically, the encoder-decoder RNN estimates the cumulative faults with the fault detection time as input. The attention mechanism improves the prediction accuracy in the encoder-decoder architecture. Experimental results demonstrate that our proposed model outperforms traditional SRGMs and other neural network-based models in terms of accuracy.
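The role of attention in an encoder-decoder can be sketched as follows. The abstract does not say which scoring function EDRNN uses, so this example assumes plain dot-product attention; `softmax` and `attend` are hypothetical names, and the numbers are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (an assumed scoring function):
    score each encoder state against the decoder state, normalise the
    scores, and return the weights plus the weighted-sum context vector."""
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[k] for w, state in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context


# Hypothetical hidden states for three encoder time steps.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.5]
weights, context = attend(decoder_state, encoder_states)
```

The decoder would consume `context` alongside its own state at each step, so the time steps most similar to the current decoder state contribute most to the prediction.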
ISBN: 9798350311259 (print)
The sintering process is a critical step in ironmaking. The burn-through point (BTP), a key performance index of sintering ore, has a great influence on the quality of the sintering product. Existing prediction methods attempt to use a single model to establish the relationship between variables. However, due to the strong volatility, uncertainty, and multivariable coupling of the sintering process, traditional prediction models cannot produce reliable predictions. To deal with the complex characteristics of the sintering process, this paper proposes a decomposition-based encoder-decoder modeling framework, in which a sequence decomposition module decomposes the input time series into different sub-sequences. These sub-sequences are then modeled separately by encoder-decoder models. The effectiveness of the proposed multi-step-ahead prediction modeling framework was evaluated on a real-world sintering process. Compared with the traditional prediction modeling framework, the proposed framework produces more accurate multi-step-ahead predictions.
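The idea of splitting an input series into sub-sequences before modeling can be sketched as below. The abstract does not describe the decomposition module, so this example assumes a simple moving-average split into a trend and a residual; `moving_average`, `decompose`, and the sample values are hypothetical.

```python
def moving_average(series, window):
    """Centred moving average; the window shrinks at the edges
    so the output has the same length as the input."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


def decompose(series, window=5):
    """Split a series into a smooth trend and the remaining residual
    (an assumed decomposition, not necessarily the one in the paper)."""
    trend = moving_average(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual


# Hypothetical readings; each sub-sequence would feed its own model.
readings = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 21.0]
trend, residual = decompose(readings, window=3)
```

In a framework of this kind, each sub-sequence would be forecast by its own encoder-decoder model and the forecasts summed to recover a prediction for the original series.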
Inverting seismic data to build 3D geological structures is a challenging task due to the overwhelming amount of acquired seismic data and the very high computational load of the iterative numerical solutions of the wave equation required by industry-standard tools such as Full Waveform Inversion (FWI). For example, in an area with surface dimensions of 4.5 km x 4.5 km, hundreds of seismic shot-gather cubes are required for 3D model reconstruction, leading to terabytes of recorded data. This paper presents a deep learning solution for the reconstruction of realistic 3D models in the presence of field noise recorded in seismic surveys. We implement and analyze a convolutional encoder-decoder architecture that efficiently processes the entire collection of hundreds of seismic shot-gather cubes. The proposed solution demonstrates that realistic 3D models can be reconstructed with a structural similarity index measure (SSIM) of 0.9143 (out of 1.0) in the presence of field noise at a 10 dB signal-to-noise ratio.
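To make the 10 dB signal-to-noise ratio concrete: 10 dB corresponds to a signal power ten times the noise power. The sketch below scales synthetic Gaussian noise to hit a target SNR exactly. The paper uses recorded field noise, so the Gaussian noise, the sine-wave trace, and the function names here are assumptions for illustration only.

```python
import math
import random


def add_noise_at_snr(signal, target_snr_db, seed=0):
    """Add Gaussian noise rescaled so that the signal-to-noise power
    ratio equals target_snr_db (assumption: synthetic noise)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    wanted_noise_power = signal_power / (10.0 ** (target_snr_db / 10.0))
    scale = math.sqrt(wanted_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measure_snr_db(clean, noisy):
    """SNR in decibels: 10 * log10(signal power / noise power)."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical one-dimensional trace standing in for seismic data.
clean_trace = [math.sin(0.05 * i) for i in range(1000)]
noisy_trace = add_noise_at_snr(clean_trace, target_snr_db=10.0)
measured = measure_snr_db(clean_trace, noisy_trace)
```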
In this paper, we propose a novel multi-modal multi-task encoder-decoder pre-training framework (MMSpeech) for Mandarin automatic speech recognition (ASR), which employs both unlabeled speech and text data. The main difficulty in speech-text joint pre-training comes from the significant difference between the speech and text modalities, especially for Mandarin. Unlike English and other languages with an alphabetic writing system, Mandarin uses an ideographic writing system in which characters and sounds are not tightly mapped to one another. Therefore, we propose to introduce the phoneme modality into pre-training, which can help capture modality-invariant information between Mandarin speech and text. In addition, a much larger amount of unsupervised text data (292G) is utilized for pre-training, which brings significant improvements. Experiments on AISHELL-1 show that our proposed method achieves state-of-the-art performance, with a relative improvement of more than 40%.
In recent years, video salient object detection has received more and more attention, and many excellent algorithms have been proposed. In this paper, we propose a new approach to video salient object detection, named MAED-Net. Our method is divided into two modules: a spatial module and a temporal module. In the spatial module, we use a set of parallel dilated convolutions and add channel attention to each dilated convolution. The multi-scale design mimics the characteristics of the human retina, and the attention imitates the human attention mechanism. We combine multi-scale information with attention information to form the pyramid multi-scale channel attention. Multi-scale channel attention allows us to obtain more precise saliency cues, laying a solid foundation for the temporal module that follows. In the temporal module, we use a set of encoder-decoder ConvLSTMs with different dilation rates, and we use dense connections and skip connections to blend information from different scales. We evaluate our results on four datasets and compare them with twelve algorithms. The experimental results show that our algorithm achieves state-of-the-art performance.
The detection and analysis of Advanced Persistent Threats (APTs) are pivotal for contemporary network security. Provenance graphs, constructed from audit logs, offer a wealth of contextual information to identify and ...
Three model configurations are presented for multi-step time series predictions of the heat absorbed by the water and steam in a thermal power plant. The models predict over horizons of 2, 4, and 6 steps into the future, where each step is a 5-minute increment. The evaluated models are a pure machine learning model, a novel hybrid machine learning and physics-based model, and the hybrid model with an incomplete dataset. The hybrid model deconstructs the machine learning into individual boiler heat absorption units: economizer, water wall, superheater, and reheater. Each configuration uses a gated recurrent unit (GRU) or a GRU-based encoder–decoder as the deep learning architecture. Mean squared error is used to evaluate the models compared to target values. The encoder–decoder architecture is over 11% more accurate than the GRU only models. The hybrid model with the incomplete dataset highlights the importance of the manipulated variables to the ***hybrid model, compared to the pure machine learning model, is over 10% more accurate on average over 20 iterations of each model. Automatic differentiation is applied to the hybrid model to perform a local sensitivity analysis to identify the most impactful of the 72 manipulated variables on the heat absorbed in the boiler. The models and sensitivity analyses are used in a discussion about optimizing the thermal power plant.
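The multi-step setup described above (horizons of 2, 4, and 6 steps at 5 minutes per step, scored with mean squared error) can be sketched as a sliding-window preparation step. The input length of 3 steps, the function names, and the toy series are assumptions; the abstract does not state how the training windows were built.

```python
STEP_MINUTES = 5  # each step is a 5-minute increment, per the abstract


def make_windows(series, input_len, horizon):
    """Slice a series into (inputs, targets) pairs: input_len past steps
    followed by the next `horizon` steps to be predicted."""
    pairs = []
    for start in range(len(series) - input_len - horizon + 1):
        split = start + input_len
        pairs.append((series[start:split], series[split:split + horizon]))
    return pairs


def mean_squared_error(predicted, target):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Toy series standing in for the heat-absorption signal.
series = [float(v) for v in range(10)]
windows_by_horizon = {h: make_windows(series, input_len=3, horizon=h)
                      for h in (2, 4, 6)}
lookahead_minutes = {h: h * STEP_MINUTES for h in (2, 4, 6)}
```

With these horizons the models look 10, 20, and 30 minutes ahead; longer horizons leave fewer training windows from the same series.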
The worldwide spread of tomato black mold disease is a major concern since it reduces crop output and quality. Effective disease control and environmentally responsible farming methods depend on rapid and precise dise...
ISBN: 9781728190488 (print)
This paper proposes an encoder-decoder neural network architecture with an attention mechanism for solving the DRC-FJSSP using Deep Q-Learning. In the DRC-FJSSP, the number of operations to schedule is problem dependent. Current state-of-the-art reinforcement learning methods arbitrarily simplify the input information to a fixed-size feature vector, and so lose relevant problem information when the number of operations is large enough. Furthermore, human schedulers tend to optimize production schedules by moving operations individually into more adequate positions in the schedule, whereas the aforementioned state-of-the-art methods apply heuristics recurrently as their optimization procedure. These limitations stem from the neural network architecture, which is limited to fixed-size inputs and outputs. The architecture proposed in this paper is a Recurrent Neural Network, which enables it to work with inputs and outputs of variable sizes. This decisive feature makes it possible for the agent to move a specific operation to a more adequate position in the schedule and to receive explicit problem information, such as the processing times of all operations. In the end, this approach proved to be competitive with a state-of-the-art metaheuristic method, the KGFOA. These promising results come despite a limitation in the available computational resources, which only allowed the development of a scarcely trained agent.