Search results - Inner Mongolia University Library

Enhancing multi-step-ahead algal bloom forecasts in river ecosystems by a hybrid recursive deep learning model

HYDROLOGY RESEARCH, 2025, vol. 56, no. 3, pp. 260-278

Authors: Xu, Hanbing; Wang, Xue; Zhou, Yanlai; Xia, Tianyu; Chang, Fi-John; Xu, Chong-Yu

Affiliations: Wuhan Univ State Key Lab Water Resources Engn & Management Wuhan 430072 Peoples R China; Changjiang Water Resources Commiss River & Lake Protect & Construction Safety Operat Wuhan 430010 Peoples R China; Natl Taiwan Univ Dept Bioenvironm Syst Engn Taipei 10617 Taiwan; Univ Oslo Dept Geosci POB1047 Blindern N-0316 Oslo Norway

River algal blooms pose a significant environmental threat, necessitating accurate forecasts and timely warnings for effective prevention. This study proposes a novel hybrid model, combining an external recursive long short-term memory neural network based on encoder-decoder (RLSTM-ED) with a backpropagation (BP) neural network, denoted as RLSTM-ED-BP. A dataset comprising 34,992 hydrological, climatic, and water quality (4-hourly) observations from the Hanjiang River Basin in China was divided for model training and testing. Comparative analysis with an RLSTM baseline demonstrated that the RLSTM-ED-BP model enhanced the Nash-Sutcliffe coefficient (NSE) by more than 5% and reduced the root mean square error by over 10% during the 24-h forecast horizon. The RLSTM-ED-BP model yielded NSE and threat score values exceeding 0.95 and efficiently provided early warnings for algal bloom events. The model's enhanced performance contributes to the generalizability of deep learning approaches in addressing the critical environmental challenge of algal blooms.

Keywords: algal bloom prediction; early warning; encoder-decoder architecture; Hanjiang River Basin; recursive strategy
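
The abstract above reports results as the Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), and threat score. The following is a minimal sketch of the standard definitions of these three metrics, not the authors' evaluation code. The function names, the plain-list inputs, and the event `threshold` argument are assumptions made for illustration; the abstract does not state which bloom threshold was used.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SS_res / SS_tot (1.0 is a perfect fit)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, simulated):
    """Root mean square error between observations and forecasts."""
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


def threat_score(observed, simulated, threshold):
    """Threat score: hits / (hits + misses + false alarms).

    An "event" is a value at or above `threshold` (hypothetical choice).
    """
    hits = misses = false_alarms = 0
    for o, s in zip(observed, simulated):
        obs_event, sim_event = o >= threshold, s >= threshold
        if obs_event and sim_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif sim_event:
            false_alarms += 1
    total = hits + misses + false_alarms
    # With no observed or forecast events the score is undefined.
    return hits / total if total else float("nan")
```

NSE compares forecast error with the variance of the observations, so a model that always predicts the observed mean scores 0. The threat score ignores correct non-events, which makes it suited to rare events such as blooms.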

A graph attention-based policy gradient method with an adaptive embedding strategy for k-center problems

APPLIED SOFT COMPUTING, 2025, vol. 173

Authors: Zhao, Zhonghao; Lee, Carman K. M.; Yan, Xiaoyuan

Affiliations: Hong Kong Polytech Univ Dept Ind & Syst Engn Hong Kong Peoples R China; Lab Artificial Intelligence Design Hong Kong Peoples R China

The k-center problem (KCP) is a well-known NP-hard combinatorial optimization problem in computer science and operations research that aims to determine optimal locations for k centers within a given set of nodes so as to minimize the maximum distance from each node to its nearest center. In contrast to conventional algorithms, which have inherent limitations in handling the trade-off between solution quality and computational efficiency, this study proposes a new method based on a graph attention mechanism with an encoder-decoder architecture to find high-quality solutions for KCPs by directly learning heuristics from the graph. Specifically, the encoder processes the input features of the graph and captures intricate spatial patterns and dependencies among nodes, whereas the decoder leverages the encoded information and attention weights to iteratively generate solutions for the KCP. Moreover, an adaptive embedding strategy is developed to handle the specific attributes and constraints inherent in different KCP instances. To find high-quality solutions, a policy gradient method with an exponential moving average baseline is developed to update and learn the optimal model parameters. A comprehensive set of experiments on multiple problem sizes is conducted to systematically compare the performance of the proposed method with a wide range of baseline methods across four types of KCPs, including the standard KCP, capacitated KCP, non-uniform KCP, and dynamic KCP. The experimental results demonstrate the competitive performance of the graph attention-based method in addressing KCPs.

Keywords: Graph attention; k-center problem; encoder-decoder architecture; Policy gradient method
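
The k-center objective described in the abstract above (minimize the maximum distance from any node to its nearest center) can be sketched in a few lines. The code below evaluates that objective and pairs it with the classical greedy farthest-first heuristic, a conventional 2-approximation baseline. It is not the graph attention method the paper proposes. The function names, 2D Euclidean points, and the choice of the first node as the starting center are assumptions made for illustration.

```python
import math


def kcenter_objective(nodes, centers):
    """Maximum over all nodes of the distance to the nearest center."""
    return max(min(math.dist(n, c) for c in centers) for n in nodes)


def greedy_farthest_first(nodes, k):
    """Classical farthest-first heuristic (illustrative baseline only).

    Starts from the first node (arbitrary choice) and repeatedly adds
    the node that is farthest from the centers chosen so far.
    """
    centers = [nodes[0]]
    while len(centers) < k:
        farthest = max(
            nodes,
            key=lambda n: min(math.dist(n, c) for c in centers),
        )
        centers.append(farthest)
    return centers


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (10, 0), (11, 0)]
    chosen = greedy_farthest_first(points, k=2)
    print(chosen, kcenter_objective(points, chosen))
```

A learned policy such as the one in the abstract would replace the selection rule inside the loop, while the objective function stays the same and can serve as the (negative) reward.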

Innovative multistep and synchronous soft sensing prediction of COD and NH3 in WWTPs via multimodal data and multiple attention mechanisms

WATER RESEARCH, 2025, vol. 278, p. 123405

Authors: Li, Junchen; Lin, Sijie; Zhang, Liang; Zhong, Lijin; Ding, Longzhen; Hu, Qing

Affiliations: Harbin Inst Technol Sch Environm Harbin 150090 Peoples R China; Southern Univ Sci & Technol Sch Environm Sci & Engn Shenzhen 518055 Peoples R China; Southern Univ Sci & Technol Engn Innovat Ctr SUSTech Beijing Beijing 100083 Peoples R China; Beijing Univ Technol Fac Environm & Life Beijing 100124 Peoples R China; Minist Educ China Engn Res Ctr Intelligence Percept & Autonomous Con Beijing 100124 Peoples R China

Accurate prediction of Chemical Oxygen Demand (COD) and ammonia nitrogen (NH3) is crucial for maintaining stable and effective wastewater treatment processes. Traditional methods rely on costly, high-maintenance sensors, limiting their application in resource-limited wastewater treatment plants. Soft sensing methods provide an alternative by reducing dependence on costly sensors. However, existing approaches cannot perform multitarget and multistep predictions, limiting their practical applicability. This study introduced a novel triple attention-enhanced encoder-decoder temporal convolutional network (TAED-TCN) to address this problem. The model used multimodal inputs, including easily accessible water quality parameters and wastewater surface images, for multistep and synchronous prediction of COD and NH3. When validated with real-world sequencing batch reactor wastewater data, the model demonstrated superior multistep prediction performance. Specifically, the R² for 1-h predictions of COD and NH3 was over 26.03% and 20.51% higher than that of the baseline model, respectively. By incorporating multiple attention mechanisms (feature, temporal, and cross-attention), TAED-TCN effectively captured essential features, modeled nonlinear relationships, and identified long-term dependencies, thus enabling consistent multitarget prediction results even under abnormal conditions. Additionally, economic analysis revealed that TAED-TCN could reduce COD and NH3 monitoring costs by 79% over the equipment life cycle. This study offers a cost-effective solution for water quality prediction, enhancing the operational efficiency of wastewater management.

Keywords: Water quality prediction; Soft sensing; Multiple attention mechanism; encoder-decoder architecture; Multimodal data

ST-3DView: Multi-Scale Contrast-Enhanced 3D Point Cloud Reconstruction of Single-View Objects From Video Scene Transition

IEEE ACCESS, 2025, vol. 13, pp. 69596-69618

Authors: Chakraborty, Dipanita; Chiracharit, Werapon; Chamnongthai, Kosin

Affiliations: King Mongkuts Univ Technol Thonburi Dept Elect & Telecommun Engn Bangkok 10140 Thailand

3D object tracking in monocular video relies on understanding the scene content to improve the continuity of the tracking signal. Reconstructing 3D shapes of single-view objects is essential for capturing object depth, orientation, and position within the scene. While existing deep learning-based methods excel in 3D reconstruction and tracking, they primarily focus on object feature semantics in normal frames, neglecting scene transition (ST) frames. This limitation leads to object information loss and discontinuity during tracking. This paper proposes a novel method for 3D reconstruction of single-view objects in monocular video scenes, focusing on fade scene transitions. First, large video datasets are pre-processed and segmented into sequences using cut transition detection via adaptive histogram equalization (AHE) and Euclidean distance estimation (EDE). Second, fade transition sequences are detected and classified into fade-in, fade-out, and mixed-fade scene transitions using a pixel intensity-based adaptive threshold. Third, contrast enhancement is applied to fade transition frames using contrast-limited adaptive histogram equalization (CLAHE) to improve object feature extraction. Fourth, a modified DeepLabv3+ network is employed to generate multi-scale features for semantic foreground object and background segmentation. Finally, the segmented objects are processed through the proposed point-wise multilayer perceptron (MLP) network, which reconstructs 3D object point clouds from segmented 2D single-view object pixels. Experimental evaluations on the object categories "Chair," "Car," and "Airplane" from the benchmark TRECVID, Pix3D, ShapeNet, and Multimedia datasets achieved an accuracy improvement of 6.52% for fade transition detection and satisfactory results in 3D point cloud reconstruction.

Keywords: 3D neural network; 3D reconstruction; encoder-decoder architecture; object segmentation; point cloud; scene transition detection; shot boundary detection; video scene content understanding

KIDBA-Net: A Multi-Feature Fusion Brain Tumor Segmentation Network Utilizing Kernel Inception Depthwise Convolution and Bi-Cross Attention

INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2025, vol. 35, no. 2

Authors: Min, Jie; Huang, Tongyuan; Huang, Boxiong; Hu, Chuanxin; Zhang, Zhixing

Affiliations: Chongqing Univ Technol Sch Artificial Intelligence Chongqing Peoples R China

Automatic brain tumor segmentation technology plays a crucial role in tumor diagnosis, particularly in the precise delineation of tumor subregions. It can assist doctors in accurately assessing the type and location of brain tumors, potentially saving patients' lives. However, the highly variable size and shape of brain tumors, along with their similarity to healthy tissue, pose significant challenges in the segmentation of multi-label brain tumor subregions. This paper proposes a network model, KIDBA-Net, based on an encoder-decoder architecture, aimed at solving the issue of pixel-level classification errors in multi-label tumor subregions. The proposed Kernel Inception Depthwise Block (KIDB) employs multi-kernel depthwise convolution to extract multi-scale features in parallel, accurately capturing the feature differences between tumor types to mitigate misclassification. To ensure the network focuses more on the lesion areas and excludes the interference of irrelevant tissues, this paper adopts Bi-Cross Attention as a skip connection hub to bridge the semantic gap between layers. Additionally, the Dynamic Feature Reconstruction Block (DFRB) exploits the complementary advantages of convolution and dynamic upsampling operators, effectively aiding the model in generating high-resolution prediction maps during the decoding phase. The proposed model surpasses other state-of-the-art brain tumor segmentation methods on the BraTS2018 and BraTS2019 datasets, particularly in the segmentation accuracy of the smaller and highly overlapping tumor core (TC) and enhanced tumor (ET), achieving DSC scores of 87.8%, 82.0%, and 90.2%, 88.7%, respectively, with Hausdorff distances of 2.8, 2.7 mm, and 2.7, 2.0 mm.

Keywords: Bi-Cross Attention; brain tumor segmentation; encoder-decoder architecture; Kernel Inception Depthwise Block; MRI
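
The DSC scores quoted in the abstract above are Dice similarity coefficients. As a minimal sketch of the standard definition, assuming binary masks flattened into equal-length lists of 0/1 values (the function name and the convention of returning 1.0 for two empty masks are illustrative assumptions, not taken from the paper):

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient: 2 * |A and B| / (|A| + |B|).

    `pred` and `target` are flattened binary masks of equal length.
    """
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 2.0 * intersection / total if total else 1.0


if __name__ == "__main__":
    predicted_mask = [1, 1, 0, 0]
    reference_mask = [1, 0, 1, 0]
    print(dice_coefficient(predicted_mask, reference_mask))
```

For multi-label tasks such as tumor subregions, the score is typically computed per region by binarizing each label in turn.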

ZMNet: feature fusion and semantic boundary supervision for real-time semantic segmentation

VISUAL COMPUTER, 2025, vol. 41, no. 3, pp. 1543-1554

Authors: Li, Ya; Li, Ziming; Liu, Huiwang; Wang, Qing

Affiliations: Guangzhou Univ Sch Comp Sci & Cyber Engn Guangzhou 510006 Peoples R China; Sun Yat Sen Univ Sch Comp & Engn Guangzhou 510275 Peoples R China

The feature fusion module is an essential component of real-time semantic segmentation networks, bridging the semantic gap among different feature layers. However, many networks are inefficient in multi-level feature fusion. In this paper, we propose a simple yet effective decoder that consists of a series of multi-level attention feature fusion modules (MLA-FFMs) aimed at fusing multi-level features in a top-down manner. Specifically, MLA-FFM is a lightweight attention-based module. Therefore, it can not only efficiently fuse features to bridge the semantic gap at different levels, but also be applied to real-time segmentation tasks. In addition, to solve the problem of low accuracy of existing real-time segmentation methods at semantic boundaries, we propose a semantic boundary supervision module (BSM) to improve the accuracy by supervising the prediction of semantic boundaries. Extensive experiments demonstrate that our network achieves a state-of-the-art trade-off between segmentation accuracy and inference speed on both Cityscapes and CamVid datasets. On a single NVIDIA GeForce 1080Ti GPU, our model achieves 77.4% mIoU with a speed of 97.5 FPS on the Cityscapes test dataset, and 74% mIoU with a speed of 156.6 FPS on the CamVid test dataset, which is superior to most state-of-the-art real-time methods.

Keywords: Real-time semantic segmentation; Multi-level feature fusion; Attention mechanism; encoder-decoder architecture; Boundary supervision
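
The mIoU figures in the abstract above refer to mean intersection over union. The sketch below shows the standard per-class computation on flattened label lists. The function name, the list inputs, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; benchmark toolkits such as the Cityscapes evaluation scripts accumulate counts over the whole dataset and may handle ignored labels differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes.

    `pred` and `target` are flattened per-pixel class labels.
    Classes that appear in neither list are skipped (assumed convention).
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(
            1 for p, t in zip(pred, target) if p == c and t == c
        )
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(intersection / union)
    # Undefined when no class is present at all.
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    reference = [0, 1, 1, 1]
    print(mean_iou(predicted, reference, num_classes=2))
```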

HEDN: multi-oriented hierarchical extraction and dual-frequency decoupling network for 3D medical image segmentation

MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2025, vol. 63, no. 1, pp. 267-291

Authors: Wang, Yu; Huang, Guoheng; Lu, Zeng; Wang, Ying; Chen, Xuhang; Yuan, Xiaochen; Li, Yan; Ni, Liujie; Huang, Yingping

Affiliations: Hunan Tradit Chinese Med Coll Publ Courses Dept Zhuzhou 412012 Hunan Peoples R China; Guangdong Univ Technol Sch Comp Sci Guangzhou 510006 Guangdong Peoples R China; Guangzhou Interesting Pill Network Technol Co Ltd Guangzhou 510630 Guangdong Peoples R China; Macao Polytech Univ Fac Appl Sci Taipa 999078 Peoples R China; Huizhou Univ Sch Comp Sci & Engn Huizhou 516001 Guangdong Peoples R China; Shenzhen Polytech Univ Shenzhen 518000 Guangdong Peoples R China; Ningxiang Tradit Chinese Med Hosp Dept Cardiol 8 Second Ring South Rd Ningxiang 410699 Hunan Peoples R China; Sun Yat sen Univ Collaborat Innovat Ctr Canc MedCanc Ctr Dept Radiat OncolGuangdong Key Lab Nasopharyngeal State Key Lab Oncol South China Guangzhou 510006 Guangdong Peoples R China

3D encoder-decoder segmentation architectures struggle with fine-grained feature decomposition, resulting in unclear feature hierarchies when features are fused across layers. Furthermore, the blurred nature of contour boundaries in medical imaging limits the focus on high-frequency contour features. To address these challenges, we propose a Multi-oriented Hierarchical Extraction and Dual-frequency Decoupling Network (HEDN), which consists of three modules: the Encoder-Decoder Module (E-DM), the Multi-oriented Hierarchical Extraction Module (Multi-HEM), and the Dual-frequency Decoupling Module (Dual-DM). The E-DM performs the basic encoding and decoding tasks, while Multi-HEM decomposes and fuses spatial and slice-level features in 3D, enriching the feature hierarchy by weighting them through 3D fusion. Dual-DM separates high-frequency features from the reconstructed network using self-supervision. Finally, the self-supervised high-frequency features separated by Dual-DM are inserted into the process following Multi-HEM, enhancing interactions and complementarities between contour features and hierarchical features, thereby mutually reinforcing both aspects. On the Synapse dataset, HEDN outperforms existing methods, boosting Dice Similarity Score (DSC) by 1.38% and decreasing 95% Hausdorff Distance (HD95) by 1.03 mm. Likewise, on the Automatic Cardiac Diagnosis Challenge (ACDC) dataset, HEDN achieves 0.5% performance gains across all categories.

Keywords: 3D medical image segmentation; encoder-decoder architecture; Multi-oriented hierarchical extraction; Dual-frequency decoupling

BMST-Net: bidirectional multi-scale spatiotemporal network for salient object detection in videos

SIGNAL IMAGE AND VIDEO PROCESSING, 2025, vol. 19, no. 1, pp. 1-9

Authors: Sharma, Gaurav; Singh, Maheep; Kumain, Sandeep Chand; Kumar, Kamal

Affiliations: Natl Inst Technol Dept Comp Sci & Engn Srinagar Uttarakhand India; Doon Univ Dept Comp Sci Dehra Dun India; UPES Sch Comp Sci Dehra Dun India; IGDTUW Dept Informat Technol Delhi India

Video saliency prediction aims to simulate human visual attention by locating the most pertinent and informative areas within a video frame or sequence. Setting the audio aspect aside, temporal and spatial data are essential when measuring video saliency, especially under challenging factors such as swift motion, changing backgrounds, and nonrigid deformation. Additionally, applying image saliency models directly to video is inappropriate because it neglects temporal information. To address these problems, this paper proposes a novel Bidirectional Multi-scale SpatioTemporal Network (BMST-Net) for identifying prominent video objects. The BMST-Net yields notable results for any given frame sequence, employing an encoder-decoder technique to learn and map features over time and space. The BMST-Net model consists of a bidirectional LSTM (Long Short-Term Memory) and a CNN (Convolutional Neural Network), where a single VGG16 (Visual Geometry Group) layer is used for feature extraction from the input video frames. In qualitative and quantitative evaluation on challenging, publicly available video datasets, the proposed approach achieved competitive performance relative to state-of-the-art saliency models.

Keywords: Video saliency; Spatiotemporal; encoder-decoder architecture; U-Net

Attribute-Driven Filtering: A new attributes predicting approach for fine-grained image captioning

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, vol. 137, Part A

Authors: Hossen, Md. Bipul; Ye, Zhongfu; Abdussalam, Amr; Ul Hassan, Shabih

Affiliations: Univ Sci & Technol China Sch Informat Sci & Technol Hefei 230027 Anhui Peoples R China

Fine-grained image captioning with attribute information has garnered significant attention in the realms of computer vision and natural language processing, demanding precise and contextually relevant descriptions of visual content. While previous attribute-driven image captioning models have shown improvements, challenges remain, such as the independence of attribute predictors and caption generators and the semantic gap between images and attributes. Another common issue is the inclusion of all attributes at every time step, despite most attributes being irrelevant to the word currently being generated. This can divert the model's attention toward erroneous semantic details, resulting in a performance decline. To address these issues, we propose a novel Attribute-Driven Filtering (ADF) captioning network designed to provide rich and nuanced descriptions. This model incorporates a unique Attribute Predictor Module (APM) that dynamically predicts the most pertinent attributes in accordance with the textual context, utilizing different attributes at various time steps. The novelty of this approach lies in recognizing that not all attributes hold equal relevance at each time step, and the APM filters out irrelevant attributes to generate precise and contextually relevant captions. Furthermore, this model features a fusion mechanism that integrates visual information from a conventional attention module with attribute information predicted by the APM, aiming to reduce the visual semantic gap between images and attributes. Extensive experimentation demonstrates that the ADF model outperforms advanced models, achieving impressive CIDEr-D scores of 72.0 (Flickr30K) and 123.3 (MS-COCO) through reinforcement learning optimization. It consistently surpasses baseline models across diverse evaluation metrics, highlighting its effectiveness and robustness.

Keywords: Fine-grained captioning; Fusion mechanism; encoder-decoder architecture; Attribute predictor module

Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, vol. 32, pp. 95-112

Authors: Xu, Xuenan; Xie, Zeyu; Wu, Mengyue; Yu, Kai

Affiliations: Shanghai Jiao Tong Univ AI Inst MoE Key Lab Artificial Intelligence X LANCE LabDept Comp Sci & Engn Shanghai 200240 Peoples R China

Automated audio captioning (AAC), a task that mimics human perception and innovatively links audio processing and natural language processing, has seen much progress over the last few years. AAC requires recognizing contents such as the environment, sound events, and the temporal relationships between sound events, and describing these elements with a fluent sentence. Currently, an encoder-decoder-based deep learning framework is the standard approach to tackle this problem. Many works have proposed novel network architectures and training schemes, including extra guidance, reinforcement learning, audio-text self-supervised learning, and diverse or controllable captioning. Effective data augmentation techniques, especially those based on large language models, have been explored. Benchmark datasets and AAC-oriented evaluation metrics also accelerate the improvement of this field. This article is a comprehensive survey covering the comparison between AAC and its related tasks, the existing deep learning techniques, datasets, and the evaluation metrics in AAC, with insights provided to guide potential future research directions.

Keywords: Automated audio captioning; audio recognition; encoder-decoder architecture; evaluation metrics; natural language generation; training schemes
