Search results - Inner Mongolia University Library

A Bi-GRU-based encoder-decoder framework for multivariate time series forecasting

SOFT COMPUTING, 2024, vol. 28, no. 9-10, p. 6775

Authors: Balti, Hanen; Ben Abbes, Ali; Farah, Imed Riadh. Affiliation: Univ Manouba, RIADI Lab, ENSI, Manouba 2010, Tunisia

Drought forecasting is crucial for minimizing the effects of drought, alerting people to its dangers, and assisting decision-makers in taking preventative action. This article proposes an encoder-decoder framework for multivariate time series (EDFMTS) forecasting. EDFMTS is composed of three layers: a temporal attention context layer, a gated recurrent unit (GRU)-based decoder component, and a bidirectional gated recurrent unit (Bi-GRU)-based encoder component. The proposed framework was evaluated using multivariate data gathered from various sources in China (remote-sensing sensors, climate sensors, biophysical sensors, and so on). According to experimental results, the proposed framework outperformed the baseline methods in univariate and multivariate time series (TS) forecasting. The coefficient of determination (R²), root-mean-squared error (RMSE), and mean absolute error (MAE) were used to evaluate the framework's performance. The R², RMSE, and MAE are 0.94, 0.20, and 0.13, respectively, for EDFMTS. In contrast, the RMSE values obtained by autoregressive integrated moving average (ARIMA), PROPHET, long short-term memory (LSTM), GRU, and convolutional neural network (CNN)-LSTM are 0.72, 0.92, 0.36, 0.40, and 0.27, respectively.

Keywords: Deep learning; Multivariate time series; encoder-decoder; Drought forecasting
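
The three scores reported in the abstract above (R², RMSE, MAE) can be illustrated with a minimal sketch. The function name `regression_metrics` and the toy numbers are assumptions made for illustration; they are not taken from the paper, whose data and evaluation code are not shown in this record.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root-mean-squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return r2, rmse, mae

# Toy values invented for illustration only (hypothetical, not from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

RMSE and MAE are expressed in the units of the target variable, so the values quoted in the abstract are only comparable across models evaluated on the same data.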

RWKV-VG: visual grounding with RWKV-driven encoder-decoder framework

MULTIMEDIA SYSTEMS, 2025, vol. 31, no. 2, pp. 1-12

Authors: Nian, Fudong; Gu, Yanhong; Wang, Wentao; Liu, Aoyu; Zhang, Dong; Li, Fanding. Affiliations: Hefei Univ, Sch Adv Mfg Engn, Hefei, Peoples R China; Hefei Univ, Anhui Prov Engn Technol Res Ctr Intelligent Vehicl, Hefei, Peoples R China; Unisound AI Technol Co Ltd, Beijing, Peoples R China; Quectel Wireless Solut Co Ltd, Shanghai, Peoples R China; Jilin Univ, State Key Lab Automot Simulat & Control, Changchun, Peoples R China; Jiangnan Univ, Sch Intelligent Mfg, Wuxi, Peoples R China

Visual grounding is a fundamental task that bridges vision and language, aiming to accurately associate natural language queries with specific regions in an image. Existing approaches, predominantly based on Transformers or CNNs, struggle with balancing computational efficiency and fine-grained semantic alignment. In this paper, we propose RWKV-VG, the first visual grounding framework entirely built on the RWKV architecture. Leveraging RWKV's unique ability to combine RNN-like sequential modeling with Transformer-like attention, our model efficiently achieves both intra-modal and cross-modal reasoning. The framework consists of a RWKV-based visual encoder, a RWKV-based linguistic encoder, and a RWKV-based visual-linguistic decoder, complemented by a learnable [REG] token designed for box regression. Comprehensive evaluations on benchmark datasets, including ReferItGame and the RefCOCO series, demonstrate the superiority of RWKV-VG, achieving state-of-the-art performance with rapid convergence. Ablation studies further confirm the effectiveness of the RWKV modules and the [REG] token design. Our work establishes RWKV as a compelling alternative to conventional architectures for visual grounding tasks. To facilitate future research, the code and pre-trained models are released at https://***/nianfd/RWKV-VG.

Keywords: Visual grounding; RWKV; encoder-decoder; Cross-modal learning

TSEDNet: Task-specific encoder-decoder network for surface defects of strip steel

MEASUREMENT, 2025, vol. 239

Authors: Guo, Yuyang; Wei, Jingliang; Feng, Xinglong. Affiliations: Shenyang Univ Technol, Sch Artificial Intelligence, Shenyang 110870, Peoples R China; BYD Automobile Ind Co Ltd China, 3009 BYD Rd, Shenzhen 518118, Guangdong, Peoples R China

Deep learning faces challenges in the surface defect segmentation of strip steel. Firstly, insufficient processing of feature maps leads to the loss of task-specific feature information. Secondly, the segmentation of defects with long-tail distributions is not accurate enough. To address these issues, a pixel-level deep segmentation method called task-specific encoder-decoder network (TSEDNet) is proposed to construct an end-to-end defect segmentation model. TSEDNet includes the encoder-multi-decoder structure based on domain knowledge settings tailored to specific tasks, which can achieve effective feature representation and significantly reduce the impact of imbalanced defect quantities. Additionally, a novel metric learning method is introduced to optimize decoder selection. Furthermore, the feature fusion module based on metric learning is proposed to utilize general features for restoring task-specific details, thereby enhancing pixel-level segmentation accuracy. Through experiments and industrial validation, the defect segmentation network demonstrates superior performance compared to other advanced segmentation methods and proves its applicability in practical scenarios.

Keywords: encoder-decoder; Metric learning; Domain knowledge; Surface defect detection

MSDFNet: multi-scale detail feature fusion encoder-decoder network for self-supervised monocular thermal image depth estimation

MEASUREMENT SCIENCE AND TECHNOLOGY, 2025, vol. 36, no. 1, p. 016039

Authors: Kong, Lingjun; Zheng, Qianhui; Wang, Wenju. Affiliations: Shanghai Publishing & Printing Coll, Shanghai 200093, Peoples R China; Univ Shanghai Sci & Technol, Coll Publishing, Shanghai 200093, Peoples R China

Currently available thermal image depth estimation methods struggle to efficiently extract fine multi-scale feature information from thermal images and suffer from blurred details at the edges of the estimated depth map. To address these challenges, this paper proposes MSDFNet, a multi-scale detail feature fusion encoder-decoder network, for self-supervised monocular thermal image depth estimation. The model is based on a channel expansion hourglass residual lightweight feature encoder, which can capture rich and fine-grained multi-scale feature information with low computational effort. MSDFNet utilizes a detail feature weight evaluation decoder to fuse cross-scale features and reevaluate the importance of each feature, thereby emphasizing critical edge information at multiple scales. Additionally, MSDFNet incorporates a depth consistency loss function, which provides self-supervised signals for the detailed features of thermal images and improves the optimization of network performance. The method is applied to the ViViD++ and MS2 datasets and achieves state-of-the-art depth estimation performance compared to existing algorithms. In the Indoor Dark scenario of the ViViD++ dataset, the Abs Rel, Sq Rel, RMSE, and RMSE log error metric values of MSDFNet are reduced by 6.71%, 11.92%, 9.09%, and 5.73%, respectively, while the accuracy metric values δ < 1.25^i, i = 1, 2, 3, are improved by 4.18%, 1.13%, and 0.2%, respectively. In addition, MSDFNet proves its excellent generalization ability on the MS2 dataset. The Abs Rel and RMSE error values in the night scene are reduced by 45.6% and 30.09%, respectively, and the accuracy δ < 1.25^i, i = 1, 3, is improved by 20.95% and 1.33%, respectively. The Abs Rel and RMSE values in the rainy day scenario are reduced by 1.33% and 1.21%, respectively, and the accuracy δ < 1.25^i, i = 1, 3, is improved by 0.24% and 0.83%, respectively.

Keywords: multi-scale features; edge detail; encoder-decoder; thermal image depth estimation
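
The error and accuracy metrics named in the abstract above (Abs Rel, Sq Rel, RMSE, RMSE log, and the threshold accuracy δ < 1.25^i) follow definitions that are standard in monocular depth estimation. The sketch below assumes those standard definitions; the function name `depth_metrics` and the toy depths are hypothetical, and the paper's own evaluation protocol (depth caps, masking) is not described in this record.

```python
import math

def depth_metrics(gt, pred):
    """Standard depth-estimation metrics over paired positive depth values."""
    if len(gt) != len(pred) or not gt:
        raise ValueError("inputs must be non-empty and of equal length")
    if min(gt) <= 0 or min(pred) <= 0:
        raise ValueError("depths must be strictly positive")
    n = len(gt)
    pairs = list(zip(gt, pred))
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
    )
    # Threshold accuracy: fraction of points whose ratio max(g/p, p/g)
    # falls below 1.25^i, for i = 1, 2, 3.
    ratios = [max(g / p, p / g) for g, p in pairs]
    delta = {i: sum(r < 1.25 ** i for r in ratios) / n for i in (1, 2, 3)}
    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta": delta,
    }

# Toy depths in arbitrary units, invented for illustration only.
metrics = depth_metrics([1.0, 2.0, 4.0, 8.0], [1.0, 2.2, 6.0, 8.0])
print(metrics)
```

Lower is better for the four error metrics, and higher is better for the δ accuracies, which is why the abstract reports the former as reductions and the latter as improvements.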

Feature-enhanced encoder-decoder model for accurate lithium-ion battery state of health estimation

JOURNAL OF ENERGY STORAGE, 2025, vol. 119

Authors: Wu, Ju; Wei, Zheng; Wu, Mingwei; Shen, Zhonghui; He, Qiu; Zhao, Yan. Affiliations: Wuhan Univ Technol, Int Sch Mat Sci & Engn, Sch Mat Sci & Engn, Wuhan 430070, Peoples R China; Sichuan Univ, Coll Mat Sci & Engn, Chengdu 610065, Peoples R China; Karlsruhe Inst Technol, Inst Nanotechnol, Hermann Von Helmholtz Pl 1, D-76344 Eggenstein Leopoldshafen, Germany; Wuhan Univ, Inst Technol Sci, Wuhan 430072, Peoples R China

Accurately estimating the state of health (SOH) of lithium-ion batteries is crucial for optimizing battery management systems, extending battery lifespan, and improving energy efficiency. This study proposes an encoder-decoder model based on feature enhancement to improve estimation accuracy by introducing prior knowledge into directly measured data. Unlike models that rely solely on voltage data, this approach integrates valuable prior information from incremental capacity analysis with the intrinsic characteristics of voltage data, presenting a novel approach for battery SOH estimation. Ablation experiments conducted on three publicly available datasets demonstrate that feature enhancement can significantly reduce the estimation error, achieving a remarkably low Root Mean Square Error (RMSE) of 0.19 %, which surpasses traditional models, such as support vector regression (0.35 %) and k-nearest neighbors (1.89 %). The study underscores that this model not only improves SOH estimation accuracy but also validates the effectiveness of the feature enhancement technique.

Keywords: Lithium-ion batteries; State of health estimation; Feature enhancement; encoder-decoder
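
Incremental capacity analysis, mentioned in the abstract above as the source of prior information, examines the curve dQ/dV derived from capacity and voltage measurements. The sketch below is only an assumption of how such a curve can be approximated with finite differences; the function name `incremental_capacity` and the sample readings are hypothetical, and the paper's actual feature-enhancement pipeline is not described in this record.

```python
def incremental_capacity(voltage, capacity, min_dv=1e-6):
    """Approximate dQ/dV between consecutive samples.

    Returns a list of (midpoint_voltage, dq_dv) tuples. Steps whose voltage
    change is smaller than `min_dv` are skipped to avoid division by
    (near) zero on voltage plateaus.
    """
    if len(voltage) != len(capacity):
        raise ValueError("voltage and capacity must have equal length")
    curve = []
    for k in range(1, len(voltage)):
        dv = voltage[k] - voltage[k - 1]
        dq = capacity[k] - capacity[k - 1]
        if abs(dv) < min_dv:
            continue
        v_mid = 0.5 * (voltage[k] + voltage[k - 1])
        curve.append((v_mid, dq / dv))
    return curve

# Hypothetical charging samples: voltage in V, capacity in Ah.
curve = incremental_capacity([3.0, 3.1, 3.2, 3.4], [0.0, 0.2, 0.5, 0.6])
for v_mid, dq_dv in curve:
    print(f"V={v_mid:.2f} dQ/dV={dq_dv:.2f}")
```

Raw finite differences amplify measurement noise, so practical pipelines usually smooth or resample the data first; that step is omitted here.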

Dynamic modeling of post-combustion carbon capture process based on multi-gate mixture-of-experts incorporating dual-stage attention-based encoder-decoder network

APPLIED THERMAL ENGINEERING, 2025, vol. 258

Authors: Zheng, Cheng; Sha, Peng; Mo, Zhengyang; Tang, Zihan; Wang, Meihong; Wu, Xiao. Affiliations: Southeast Univ, Natl Engn Res Ctr Power Generat Control & Safety, Sch Energy & Environm, Nanjing 210096, Peoples R China; Univ Sheffield, Dept Chem & Biol Engn, Sheffield S1 3JD, England

Solvent-based post-combustion carbon capture (PCC) technology is a promising, near-term solution for decarbonizing power generation and industrial facilities. Model-based process simulation is crucial for the optimal design and operation of the PCC process. Recently, data-driven models have gained attention due to their adaptability, efficient computation and high accuracy. However, the nonlinearity, strong couplings and multi-time-scale features of the PCC process pose significant challenges for model identification. To this end, this paper proposes a multi-gate mixture-of-experts incorporating dual-stage attention-based encoder-decoder (MMoE-DAED) network for dynamic modeling of the PCC process under wide operating conditions. An encoder-decoder composed of long short-term memory (LSTM) networks is employed to extract features from the time-dependent input data and learn the complex dynamic interactions caused by the inertial and delay properties of the process. A dual-stage attention mechanism is incorporated into the encoder and decoder respectively to select the most relevant input features and their correlations within the time series data. To enhance multi-output prediction accuracy, a multi-gate mixture-of-experts (MMoE) framework that considers correlations of multitask learning is implemented. Simulation results using operating data from a PCC experimental setup indicate that the proposed modeling approach accurately predicts the steady-state values and dynamic trends of the CO2 capture rate and stripper bottom temperature over a wide operating range. The RMSE, MAPE and R² indices for the CO2 capture rate are 2.1592, 0.0295, and 0.9641, respectively, and for the stripper bottom temperature are 0.1491, 0.0003, and 0.9833, respectively. Validations on a PCC simulator further verify the accuracy and efficiency of the MMoE-DAED model, which enables an 80.87% reduction in computation time compared to the simulator. This paper points to a new direction for the data-driven dyna

Keywords: Post-combustion carbon capture; Dynamic modeling; Multi-output prediction; Multi-gate mixture-of-experts; encoder-decoder; Dual-stage attention mechanism

Advancing soil property prediction with encoder-decoder structures integrating traditional deep learning methods in Vis-NIR spectroscopy

GEODERMA, 2024, vol. 449

Authors: Ke, Ziyi; Ren, Shilin; Yin, Liang. Affiliations: Beijing Univ Chem Technol, Coll Math & Phys, Beijing 100029, Peoples R China; Beijing Key Lab Environmentally Harmful Chem Anal, Beijing 100029, Peoples R China

The technology for estimating soil properties using visible and near-infrared spectroscopy has been maturing, with corresponding advances and breakthroughs in deep learning models. In this study, based on the large soil spectral library LUCAS, we explore the potential of encoder-decoder structures to improve convolutional neural network regression predictions. By introducing an encoder-decoder structure into the feature channels of a six-layer CNN model (TRNN model), we significantly enhanced the performance of shallow CNN models and successfully carried out regression predictions for seven soil properties. We employed IntegratedGradients, DeepLift, GradientShap, and DeepLiftShap methods to interpret the output of the TRNN model. Our TRNN model, built on raw spectra, demonstrated high accuracy in predicting multiple soil properties, outperforming residual architectures, LSTMs, various CNN architectures, and other traditional machine learning methods proposed in previous studies. We also investigated the impact of multi-task output structures (TRNN 1-M and TRNN M-M) and single-task output structures (TRNN 1-1) on model performance. For the TRNN model with an encoder-decoder structure, multi-task output structures resulted in a reduction in performance. The TRNN showed outstanding results in regression analysis of the seven soil properties selected in this study (cation exchange capacity, organic carbon content, calcium carbonate content, pH, clay content, silt content, and sand content), with R² values exceeding 0.93 for all seven properties. Different soil characteristics correspond to different wavelengths, with multiple characteristic peaks commonly observed. This research convincingly demonstrates the enormous potential of combining large model architectures with traditional deep learning approaches for predicting soil properties, which could significantly advance precision agriculture.

Keywords: Deep learning; encoder-decoder; Feature wavelengths; LUCAS topsoil dataset; Soil properties

Self-Attention (SA)-ConvLSTM encoder-decoder Structure-Based Video Prediction for Dynamic Motion Estimation

APPLIED SCIENCES-BASEL, 2024, vol. 14, no. 23, p. 11315

Authors: Kim, Jeongdae; Choo, Hyunseung; Jeong, Jongpil. Affiliations: Sungkyunkwan Univ, Dept AI Syst Engn, 2066 Seobu Ro, Suwon 16419, Gyeonggi Do, South Korea; Sungkyunkwan Univ, Dept Smart Factory Convergence, 2066 Seobu Ro, Suwon 16419, Gyeonggi Do, South Korea

Video prediction, which is the task of predicting future video frames based on past observations, remains a challenging problem because of the complexity and high dimensionality of spatiotemporal dynamics. To address the problems associated with spatiotemporal prediction, which is an important decision-making tool in various fields, several deep learning models have been proposed. Convolutional long short-term memory (ConvLSTM) can capture space and time simultaneously and has shown excellent performance in various applications, such as image and video prediction, object detection, and semantic segmentation. However, ConvLSTM has limitations in capturing long-term temporal dependencies. To solve this problem, this study proposes an encoder-decoder structure using self-attention ConvLSTM (SA-ConvLSTM), which retains the advantages of ConvLSTM and effectively captures long-range dependencies through the self-attention mechanism. The effectiveness of the encoder-decoder structure using SA-ConvLSTM was validated through experiments on the MovingMNIST and KTH datasets.

Keywords: SA-ConvLSTM; encoder-decoder; video prediction; spatiotemporal; self-attention memory module

Deep Quasi-Recurrent Self-Attention With Dual encoder-decoder in Biomedical CT Image Segmentation

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, vol. 28, no. 12, pp. 7195-7205

Authors: Agarwal, Rohit; Chowdhury, Arindam; Chatterjee, Rajib Kumar; Chel, Haradhan; Murmu, Chiranjib; Murmu, Narayan; Nandi, Debashis. Affiliations: Natl Inst Technol, Dept Comp Sci & Engn, Durgapur 713209, India; Cent Inst Technol, Dept Elect & Commun Engn, Kokrajhar 783370, India; Diamond Harbour Govt Med Coll, Diamond Harbour 743331, India

Developing deep learning models for accurate segmentation of biomedical CT images is challenging due to their complex structures, anatomy variations, noise, and unavailability of sufficient labeled data to train the models. There are many models in the literature, but the researchers are yet to be satisfied with their performance in analyzing biomedical Computed Tomography (CT) images. In this article, we pioneer a deep quasi-recurrent self-attention structure that works with a dual encoder-decoder. The proposed novel deep quasi-recurrent self-attention architecture evokes parameter reuse capability that offers consistency in learning and quick convergence of the model. Furthermore, the quasi-recurrent structure leverages the features acquired from the previous time points and elevates the segmentation quality. The model also efficiently addresses long-range dependencies through a selective focus on contextual information and hierarchical representation. Moreover, the dynamic and adaptive operation, incremental and efficient information processing of the deep quasi-recurrent self-attention structure leads to improved generalization across different scales and levels of abstraction. Along with the model, we innovate a new training strategy that fits with the proposed deep quasi-recurrent self-attention architecture. The model performance is evaluated on various publicly available CT scan datasets and compared with state-of-the-art models. The result shows that the proposed model outperforms them in segmentation quality and training speed. The model can assist physicians in improving the accuracy of medical diagnoses.

Keywords: deep learning; encoder-decoder; Biomedical CT image segmentation; self-attention; self-attention U-Net; U-Net

Real-time assessment on health state for bearing based on parallel encoder-decoder observer

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2024, vol. 40, no. 5, pp. 2276-2291

Authors: Li, Kunpeng; Mi, Jinhua; Wang, Zhiguo; Yin, Shengjie; Bai, Libing; Qiu, Gen. Affiliations: Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu, Peoples R China; Univ Elect Sci & Technol China, Ctr Syst Reliabil & Safety, Chengdu, Peoples R China; Shanghai Space Prop Technol Res Inst, Dept SRM Design, Shanghai, Peoples R China

Bearings are foundational supporting components in diverse mechanical systems, essential for the reliable operation of these systems through real-time monitoring and precise health state assessment. However, vibration signals from bearings in practical equipment often contain excessive noise and redundant information, complicating health state assessment. To address this challenge, this paper proposes a neural network-based method named parallel encoder-decoder (PED). This method features a parallel architecture that combines the long short-term memory network and the temporal convolutional network for the encoder, along with a self-attention module for the decoder. PED is adept at learning the temporal representations hidden in original signals and filtering vibration signals to remove noise and redundant information. Additionally, a multi-objective loss function is developed to enhance the prediction results. A normalized Mahalanobis distance-based metric is then employed to compare residual signals during bearing operation with those under normal conditions. The case study evaluates the PED observer's proficiency in accurately predicting vibration signals and assessing the performance of health indicator curves, demonstrating the proposed PED observer's superiority over conventional networks.

Keywords: bearing health state assessment; encoder-decoder; Mahalanobis distance; multi-objective loss function
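
The abstract above compares residual signals during operation with those under normal conditions using a Mahalanobis distance-based metric. The sketch below shows the plain (unnormalized) Mahalanobis distance with NumPy; the function name `mahalanobis_distances` and the toy feature vectors are assumptions for illustration, and the paper's normalization step and residual features are not specified in this record.

```python
import numpy as np

def mahalanobis_distances(reference, samples):
    """Distance of each row of `samples` from the distribution of `reference`.

    `reference` holds feature vectors observed under normal conditions
    (one row per observation); `samples` holds vectors to be scored.
    """
    reference = np.asarray(reference, dtype=float)
    samples = np.asarray(samples, dtype=float)
    mean = reference.mean(axis=0)
    # Sample covariance of the reference set (columns are variables).
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    # Pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    diff = samples - mean
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Guard against tiny negative values caused by rounding.
    return np.sqrt(np.maximum(squared, 0.0))

# Toy 2-D feature vectors, invented for illustration only.
normal = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
observed = [[1.0, 1.0], [3.0, 1.0]]
distances = mahalanobis_distances(normal, observed)
print(distances)
```

A sample at the mean of the reference set scores zero, and the score grows as the sample moves away from the normal-condition distribution, scaled by how much the reference data varies in that direction.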
