Drought forecasting is crucial for minimizing the effects of drought, alerting people to its dangers, and assisting decision-makers in taking preventative action. This article proposes an encoder-decoder framework for multivariate time series (EDFMTS) forecasting. EDFMTS is composed of three layers: a bidirectional gated recurrent unit (Bi-GRU)-based encoder component, a temporal attention context layer, and a gated recurrent unit (GRU)-based decoder component. The proposed framework was evaluated using multivariate data gathered from various sources in China (remote-sensing sensors, climate sensors, biophysical sensors, and so on). According to the experimental results, the proposed framework outperformed the baseline methods in both univariate and multivariate time series (TS) forecasting. The coefficient of determination (R2), root-mean-squared error (RMSE), and mean absolute error (MAE) were used to evaluate the framework's performance. For EDFMTS, the R2, RMSE, and MAE are 0.94, 0.20, and 0.13, respectively. In contrast, the RMSE values of autoregressive integrated moving average (ARIMA), PROPHET, long short-term memory (LSTM), GRU, and convolutional neural network (CNN)-LSTM are 0.72, 0.92, 0.36, 0.40, and 0.27, respectively.
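The three evaluation metrics reported for EDFMTS have standard definitions. The following minimal sketch computes them on plain Python lists; the toy values are made up, and the abstract does not say whether its scores were computed on normalized or raw drought-index values, so the reported numbers should not be expected to reproduce from this code.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```

For example, observations `[1, 2, 3, 4]` against forecasts `[1.1, 1.9, 3.2, 3.8]` give an MAE of 0.15 and an R2 of 0.98. Note that R2 is the coefficient of determination, not a correlation coefficient; it can even be negative when a model does worse than predicting the mean.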
Visual grounding is a fundamental task that bridges vision and language, aiming to accurately associate natural language queries with specific regions in an image. Existing approaches, predominantly based on Transformers or CNNs, struggle to balance computational efficiency and fine-grained semantic alignment. In this paper, we propose RWKV-VG, the first visual grounding framework built entirely on the RWKV architecture. Leveraging RWKV's unique ability to combine RNN-like sequential modeling with Transformer-like attention, our model efficiently achieves both intra-modal and cross-modal reasoning. The framework consists of an RWKV-based visual encoder, an RWKV-based linguistic encoder, and an RWKV-based visual-linguistic decoder, complemented by a learnable [REG] token designed for box regression. Comprehensive evaluations on benchmark datasets, including ReferItGame and the RefCOCO series, demonstrate the superiority of RWKV-VG, which achieves state-of-the-art performance with rapid convergence. Ablation studies further confirm the effectiveness of the RWKV modules and the [REG] token design. Our work establishes RWKV as a compelling alternative to conventional architectures for visual grounding tasks. To facilitate future research, the code and pre-trained models are released at https://***/nianfd/RWKV-VG.
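The claim that RWKV combines RNN-like recurrence with attention-like weighting can be made concrete with its core "WKV" operator. The sketch below follows the single-channel RWKV-4 formulation (decay `w > 0`, current-token bonus `u`); RWKV-VG may use a different RWKV version, and the function names are hypothetical. It computes the same quantity twice: once as an O(T) recurrence with two running states, and once as an explicit O(T^2) weighted average over past tokens, to show they agree.

```python
import math
from typing import List, Sequence


def wkv_recurrent(k: Sequence[float], v: Sequence[float], w: float, u: float) -> List[float]:
    """Single-channel RWKV-4-style WKV computed as an RNN (O(T))."""
    a = 0.0  # decayed running sum of exp(k_i) * v_i over past tokens
    b = 0.0  # decayed running sum of exp(k_i) over past tokens
    decay = math.exp(-w)
    out = []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)  # the current token gets bonus u instead of decay
        out.append((a + bonus * vt) / (b + bonus))
        a = decay * a + math.exp(kt) * vt
        b = decay * b + math.exp(kt)
    return out


def wkv_direct(k: Sequence[float], v: Sequence[float], w: float, u: float) -> List[float]:
    """The same quantity written as an attention-like weighted average (O(T^2))."""
    out = []
    for t in range(len(k)):
        # past token i is weighted by exp(-(t - 1 - i) * w + k_i)
        weights = [math.exp(-(t - 1 - i) * w + k[i]) for i in range(t)]
        weights.append(math.exp(u + k[t]))
        out.append(sum(wt * vi for wt, vi in zip(weights, v[: t + 1])) / sum(weights))
    return out
```

This naive version can overflow for large keys; production RWKV kernels track a running maximum exponent for numerical stability. How RWKV-VG feeds image patches, words, and the [REG] token through such layers is not described in the abstract.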
Deep learning faces challenges in the surface defect segmentation of strip steel. First, insufficient processing of feature maps leads to the loss of task-specific feature information. Second, defects with long-tail distributions are not segmented accurately enough. To address these issues, a pixel-level deep segmentation method called the task-specific encoder-decoder network (TSEDNet) is proposed to build an end-to-end defect segmentation model. TSEDNet uses an encoder-multi-decoder structure whose decoders are configured from domain knowledge for specific tasks, which achieves effective feature representation and significantly reduces the impact of imbalanced defect quantities. Additionally, a novel metric learning method is introduced to optimize decoder selection. Furthermore, a feature fusion module based on metric learning is proposed that uses general features to restore task-specific details, thereby improving pixel-level segmentation accuracy. Experiments and industrial validation show that the network outperforms other advanced segmentation methods and is applicable in practical scenarios.
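The abstract does not specify TSEDNet's metric learning method. As a hedged illustration of how a learned embedding space can drive decoder selection, the sketch below uses nearest-prototype routing, a common metric-learning pattern: each task-specific decoder is represented by the mean embedding of its training samples, and a new feature vector is routed to the decoder with the closest prototype. The task names, embeddings, and functions are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prototypes(embeddings: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Mean embedding per task; each task is assumed to own one decoder."""
    protos = {}
    for task, vecs in embeddings.items():
        dim = len(vecs[0])
        protos[task] = [sum(vec[d] for vec in vecs) / len(vecs) for d in range(dim)]
    return protos


def select_decoder(feature: Sequence[float], protos: Dict[str, List[float]]) -> str:
    """Route to the decoder whose prototype is nearest in embedding space."""
    return min(protos, key=lambda task: euclidean(feature, protos[task]))
```

In a real system the embedding network would be trained (for example with a contrastive or triplet loss) so that distances reflect defect similarity; here the embeddings are simply given.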
Current thermal image depth estimation methods struggle to efficiently extract fine multi-scale feature information from thermal images and suffer from blurred details at the edges of the estimated depth map. To address these challenges, this paper proposes MSDFNet, a multi-scale detail feature fusion encoder-decoder network for self-supervised monocular thermal image depth estimation. The model is based on a lightweight channel-expansion hourglass residual feature encoder, which captures rich, fine-grained multi-scale feature information at low computational cost. MSDFNet uses a detail feature weight evaluation decoder to fuse cross-scale features and re-evaluate the importance of each feature, thereby emphasizing critical edge information at multiple scales. Additionally, MSDFNet incorporates a depth consistency loss function, which provides self-supervised signals for the detailed features of thermal images and improves the optimization of network performance. Applied to the ViViD++ and MS2 datasets, the method outperforms existing state-of-the-art algorithms in depth estimation. In the Indoor Dark scenario of the ViViD++ dataset, MSDFNet reduces the Abs Rel, Sq Rel, RMSE, and RMSE log error metrics by 6.71%, 11.92%, 9.09%, and 5.73%, respectively, and improves the accuracy metrics δ < 1.25^i (i = 1, 2, 3) by 4.18%, 1.13%, and 0.2%, respectively. MSDFNet also demonstrates excellent generalization on the MS2 dataset. In the night scene, the Abs Rel and RMSE errors are reduced by 45.6% and 30.09%, respectively, and the accuracy δ < 1.25^i (i = 1, 3) is improved by 20.95% and 1.33%, respectively. In the rainy-day scenario, the Abs Rel and RMSE errors are reduced by 1.33% and 1.21%, respectively, and the accuracy δ < 1.25^i (i = 1, 3) is improved by 0.24% and 0.83%, respectively.
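The error and accuracy metrics quoted for MSDFNet are the standard monocular depth estimation metrics. The following minimal sketch computes them over lists of valid ground-truth and predicted depths. It assumes positive depths and omits the usual preprocessing (valid-pixel masking, depth capping, and median scaling for self-supervised models), which the abstract does not describe.

```python
import math
from typing import Dict, Sequence


def depth_metrics(gt: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Standard depth metrics over valid pixels (all depths assumed > 0)."""
    if len(gt) != len(pred) or not gt:
        raise ValueError("gt and pred must be non-empty and equally long")
    n = len(gt)
    pairs = list(zip(gt, pred))
    metrics = {
        "abs_rel": sum(abs(g - p) / g for g, p in pairs) / n,
        "sq_rel": sum((g - p) ** 2 / g for g, p in pairs) / n,
        "rmse": math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n),
        "rmse_log": math.sqrt(sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n),
    }
    # accuracy: fraction of pixels whose ratio max(g/p, p/g) is below 1.25^i
    ratios = [max(g / p, p / g) for g, p in pairs]
    for i in (1, 2, 3):
        metrics[f"delta<1.25^{i}"] = sum(r < 1.25 ** i for r in ratios) / n
    return metrics
```

Lower is better for the first four entries and higher is better for the three δ thresholds, which is why the abstract reports reductions for the former and improvements for the latter.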
Accurately estimating the state of health (SOH) of lithium-ion batteries is crucial for optimizing battery management systems, extending battery lifespan, and improving energy efficiency. This study proposes an encoder-decoder model based on feature enhancement, which improves estimation accuracy by introducing prior knowledge into directly measured data. Unlike models that rely solely on voltage data, this approach integrates valuable prior information from incremental capacity analysis with the intrinsic characteristics of voltage data, offering a novel approach to battery SOH estimation. Ablation experiments on three publicly available datasets demonstrate that feature enhancement significantly reduces the estimation error, achieving a root-mean-square error (RMSE) as low as 0.19%, which surpasses traditional models such as support vector regression (0.35%) and k-nearest neighbors (1.89%). The study shows that the model not only improves SOH estimation accuracy but also validates the effectiveness of the feature enhancement technique.
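Incremental capacity (IC) analysis derives the curve dQ/dV from measured charge (or discharge) data; its peaks shift and shrink as a cell ages, which is the kind of prior knowledge the abstract refers to. Below is a minimal finite-difference sketch, assuming voltage and cumulative capacity are sampled in sync. The `min_dv` threshold is a hypothetical guard against near-zero voltage steps; real pipelines usually also smooth the curve, and the paper's exact IC features and fusion scheme are not given in the abstract.

```python
from typing import List, Sequence, Tuple


def incremental_capacity(
    voltage: Sequence[float], capacity: Sequence[float], min_dv: float = 1e-4
) -> List[Tuple[float, float]]:
    """Return (midpoint voltage, dQ/dV) pairs via first differences."""
    curve = []
    for i in range(1, len(voltage)):
        dv = voltage[i] - voltage[i - 1]
        if abs(dv) < min_dv:
            continue  # skip flat voltage steps that would blow up dQ/dV
        dq = capacity[i] - capacity[i - 1]
        curve.append(((voltage[i] + voltage[i - 1]) / 2.0, dq / dv))
    return curve
```

The voltage and height of the largest dQ/dV peak (for example `max(curve, key=lambda p: p[1])`) are typical IC features that could be concatenated with raw voltage inputs.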
Solvent-based post-combustion carbon capture (PCC) technology is a promising near-term solution for decarbonizing power generation and industrial facilities. Model-based process simulation is crucial for the optimal design and operation of the PCC process. Recently, data-driven models have gained attention due to their adaptability, computational efficiency, and high accuracy. However, the nonlinearity, strong couplings, and multi-time-scale features of the PCC process pose significant challenges for model identification. To this end, this paper proposes a multi-gate mixture-of-experts incorporating a dual-stage attention-based encoder-decoder (MMoE-DAED) network for dynamic modeling of the PCC process under wide operating conditions. An encoder-decoder composed of long short-term memory (LSTM) networks is employed to extract features from the time-dependent input data and to learn the complex dynamic interactions caused by the inertia and delay properties of the process. A dual-stage attention mechanism is incorporated into the encoder and decoder, respectively, to select the most relevant input features and their correlations within the time series data. To improve multi-output prediction accuracy, a multi-gate mixture-of-experts (MMoE) framework that accounts for correlations among the learning tasks is implemented. Simulation results using operating data from a PCC experimental setup indicate that the proposed modeling approach accurately predicts the steady-state values and dynamic trends of the CO2 capture rate and stripper bottom temperature over a wide operating range. The RMSE, MAPE, and R2 values are 2.1592, 0.0295, and 0.9641, respectively, for the CO2 capture rate and 0.1491, 0.0003, and 0.9833, respectively, for the stripper bottom temperature. Validations on a PCC simulator further verify the accuracy and efficiency of the MMoE-DAED model, which reduces computation time by 80.87% compared with the simulator. This paper points to a new direction for the data-driven dyna
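In a multi-gate mixture-of-experts, all tasks share a pool of expert networks, but each task has its own softmax gate that decides how much to trust each expert. The sketch below shows only that gating step, assuming the expert outputs are already computed and each gate is a single linear layer over the input; the task names (`capture_rate`, `stripper_temp`) and weights are hypothetical, and in the paper the experts would sit on top of the attention-based LSTM encoder-decoder.

```python
import math
from typing import Dict, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_mix(
    x: Sequence[float],
    expert_outputs: List[List[float]],
    gate_weights: Dict[str, List[List[float]]],
) -> Dict[str, List[float]]:
    """Per-task gated mixture of shared expert outputs.

    gate_weights[task] is a (num_experts x len(x)) matrix; the gate for that
    task is softmax(W_task @ x).
    """
    dim = len(expert_outputs[0])
    mixed = {}
    for task, w in gate_weights.items():
        logits = [sum(a * b for a, b in zip(row, x)) for row in w]
        g = softmax(logits)
        mixed[task] = [sum(gi * e[d] for gi, e in zip(g, expert_outputs)) for d in range(dim)]
    return mixed
```

Each mixed vector would then pass through a task-specific output tower. Because the gates are separate, weakly related outputs can draw on different experts, which is how MMoE accounts for task correlations.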
The technology for estimating soil properties using visible and near-infrared spectroscopy has been maturing, with corresponding advances and breakthroughs in deep learning models. In this study, based on the large soil spectral library LUCAS, we explore the potential of encoder-decoder structures to improve convolutional neural network (CNN) regression predictions. By introducing an encoder-decoder structure into the feature channels of a six-layer CNN model (the TRNN model), we significantly improved the performance of shallow CNN models and successfully performed regression predictions for seven soil properties. We used the IntegratedGradients, DeepLift, GradientShap, and DeepLiftShap methods to interpret the output of the TRNN model. Our TRNN model, built on raw spectra, predicted multiple soil properties with high accuracy, outperforming residual architectures, LSTMs, various CNN architectures, and other traditional machine learning methods proposed in previous studies. We also investigated the impact of multi-task output structures (TRNN 1-M and TRNN M-M) and the single-task output structure (TRNN 1-1) on model performance. For the TRNN model with an encoder-decoder structure, multi-task output structures reduced performance. The TRNN achieved outstanding results in regression analysis of the seven soil properties selected in this study (cation exchange capacity, organic carbon content, calcium carbonate content, pH, clay content, silt content, and sand content), with R2 values exceeding 0.93 for all seven. Different soil properties correspond to different wavelengths, and multiple characteristic peaks are commonly observed. This research demonstrates the considerable potential of combining large model architectures with traditional deep learning approaches for predicting soil properties, which could significantly advance precision agriculture.
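The interpretation method names above match attribution classes in the Captum library, though the abstract does not say which implementation was used. To show what Integrated Gradients computes, here is a from-scratch sketch that attributes a model output to each input (for spectra, each wavelength band) by averaging gradients along the straight path from a baseline to the input. The toy model is hypothetical, and gradients are taken by central finite differences instead of autograd.

```python
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]


def numeric_grad(f: Model, x: List[float], eps: float = 1e-5) -> List[float]:
    """Central finite-difference gradient of a scalar function."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(
    f: Model, x: Sequence[float], baseline: Sequence[float], steps: int = 50
) -> List[float]:
    """IG_i = (x_i - x'_i) * average of df/dx_i along the path x' -> x."""
    total = [0.0] * len(x)
    for k in range(1, steps + 1):
        alpha = (k - 0.5) / steps  # midpoint Riemann rule
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, numeric_grad(f, point))]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]
```

A useful sanity check is completeness: the attributions sum to `f(x) - f(baseline)`. For the toy model `f(x) = 2*x[0] + x[1]**2` with input `[1, 3]` and a zero baseline, the attributions are approximately `[2, 9]`, which sum to 11.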
Video prediction, the task of predicting future video frames from past observations, remains challenging because of the complexity and high dimensionality of spatiotemporal dynamics. Several deep learning models have been proposed for spatiotemporal prediction, which is an important decision-making tool in various fields. Convolutional long short-term memory (ConvLSTM) captures spatial and temporal information simultaneously and has shown excellent performance in various applications, such as image and video prediction, object detection, and semantic segmentation. However, ConvLSTM is limited in capturing long-term temporal dependencies. To solve this problem, this study proposes an encoder-decoder structure using self-attention ConvLSTM (SA-ConvLSTM), which retains the advantages of ConvLSTM and effectively captures long-range dependencies through the self-attention mechanism. The effectiveness of the encoder-decoder structure using SA-ConvLSTM was validated through experiments on the MovingMNIST and KTH datasets.
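A ConvLSTM cell only mixes information within its convolution kernel, whereas self-attention lets every spatial position attend to every other position in a single step. The sketch below shows plain scaled dot-product self-attention over flattened spatial positions (each row of `features` is the channel vector at one pixel). It is a simplified illustration: the SA-ConvLSTM self-attention memory module also keeps attention-based memory across time steps, which is not reproduced here, and the projection matrices are hypothetical inputs.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: List[float]) -> List[float]:
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_self_attention(
    features: Matrix, wq: Matrix, wk: Matrix, wv: Matrix
) -> Tuple[Matrix, Matrix]:
    """features: N x C (N = H * W positions). Returns (outputs, attention)."""
    q = [matvec(wq, f) for f in features]
    k = [matvec(wk, f) for f in features]
    v = [matvec(wv, f) for f in features]
    scale = math.sqrt(len(q[0]))
    outputs, attention = [], []
    for qi in q:
        # each position scores every position, so dependencies are global
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        attention.append(weights)
        outputs.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return outputs, attention
```

The attention matrix is N x N, so cost grows quadratically with the number of pixels; this is one reason such modules are usually applied to downsampled feature maps inside the encoder-decoder.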
Developing deep learning models for accurate segmentation of biomedical Computed Tomography (CT) images is challenging due to complex structures, anatomical variations, noise, and the lack of sufficient labeled data to train the models. There are many models in the literature, but researchers are not yet satisfied with their performance in analyzing biomedical CT images. In this article, we pioneer a deep quasi-recurrent self-attention structure that works with a dual encoder-decoder. The proposed deep quasi-recurrent self-attention architecture reuses parameters, which provides consistent learning and quick convergence of the model. Furthermore, the quasi-recurrent structure leverages the features acquired at previous time points to improve segmentation quality. The model also efficiently addresses long-range dependencies through a selective focus on contextual information and hierarchical representation. Moreover, the dynamic, adaptive operation and the incremental, efficient information processing of the deep quasi-recurrent self-attention structure improve generalization across different scales and levels of abstraction. Along with the model, we devise a new training strategy suited to the proposed deep quasi-recurrent self-attention architecture. The model's performance is evaluated on various publicly available CT scan datasets and compared with state-of-the-art models. The results show that the proposed model outperforms them in segmentation quality and training speed. The model can help physicians improve the accuracy of medical diagnoses.
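The abstract does not define its quasi-recurrent structure precisely. For orientation, the sketch below shows "fo-pooling" from the original quasi-recurrent neural network (QRNN) formulation, assuming the candidate `z`, forget gate `f`, and output gate `o` have already been computed (in a QRNN they come from convolutions applied in parallel over all time steps). Only this cheap elementwise pooling is sequential, and it is where features from previous time points carry forward: `c_t = f_t * c_{t-1} + (1 - f_t) * z_t` and `h_t = o_t * c_t`.

```python
from typing import List, Optional, Sequence, Tuple

Steps = Sequence[Sequence[float]]  # T time steps, each a list of C channel values


def fo_pool(
    z: Steps, f: Steps, o: Steps, c0: Optional[Sequence[float]] = None
) -> Tuple[List[List[float]], List[float]]:
    """QRNN fo-pooling over already-activated gates (z in tanh range, f/o in sigmoid range)."""
    channels = len(z[0])
    c = list(c0) if c0 is not None else [0.0] * channels
    hidden = []
    for zt, ft, ot in zip(z, f, o):
        # forget gate blends the previous memory with the new candidate per channel
        c = [fi * ci + (1.0 - fi) * zi for fi, ci, zi in zip(ft, c, zt)]
        hidden.append([oi * ci for oi, ci in zip(ot, c)])
    return hidden, c
```

With `f = 0` the memory simply copies the current candidate, and with `f = 1` it ignores new input entirely; learned values in between control how much earlier context survives. How the paper combines this with self-attention and the dual encoder-decoder is not specified in the abstract.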
Bearings are foundational supporting components in diverse mechanical systems, and their real-time monitoring and precise health state assessment are essential for the reliable operation of these systems. However, vibration signals from bearings in practical equipment often contain excessive noise and redundant information, which complicates health state assessment. To address this challenge, this paper proposes a neural network-based method named parallel encoder-decoder (PED). Its encoder uses a parallel architecture that combines a long short-term memory network and a temporal convolutional network, and its decoder uses a self-attention module. PED learns the temporal representations hidden in the original signals and filters the vibration signals to remove noise and redundant information. Additionally, a multi-objective loss function is developed to improve the prediction results. A normalized Mahalanobis distance-based metric is then used to compare residual signals during bearing operation with those under normal conditions. A case study evaluates the PED observer's accuracy in predicting vibration signals and the performance of the resulting health indicator curves, demonstrating the PED observer's superiority over conventional networks.
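The health indicator compares residuals during operation with residuals from normal operation using a Mahalanobis distance, which scales each direction by the healthy covariance so that naturally noisy features do not dominate. The sketch below assumes residuals have already been summarized into small feature vectors. The normalization (dividing by the largest distance seen on the healthy reference set) is a hypothetical choice, since the abstract does not define how the paper normalizes the distance.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def mean_vec(data: Matrix) -> List[float]:
    return [sum(row[j] for row in data) / len(data) for j in range(len(data[0]))]


def covariance(data: Matrix, mu: Sequence[float]) -> Matrix:
    n, d = len(data), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in data) / (n - 1) for j in range(d)]
            for i in range(d)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting."""
    d = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [val / p for val in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis(x: Sequence[float], mu: Sequence[float], cov_inv: Matrix) -> float:
    diff = [a - b for a, b in zip(x, mu)]
    tmp = [sum(cov_inv[i][j] * diff[j] for j in range(len(diff))) for i in range(len(diff))]
    return math.sqrt(sum(di * ti for di, ti in zip(diff, tmp)))


def normalized_distance(x: Sequence[float], healthy: Matrix) -> float:
    """Hypothetical normalization: 1.0 equals the worst healthy reference sample."""
    mu = mean_vec(healthy)
    cov_inv = invert(covariance(healthy, mu))
    reference = max(mahalanobis(row, mu, cov_inv) for row in healthy)
    return mahalanobis(x, mu, cov_inv) / reference
```

Plotted over a bearing's run, values near or below 1 indicate behavior close to the healthy baseline, and a sustained rise indicates degradation.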