Search results - Inner Mongolia University Library

Exploration of dual-attention mechanism-based deep learning for multi-step-ahead flood probabilistic forecasting

JOURNAL OF HYDROLOGY, 2023, Vol. 622, Part B

Authors: Cui, Zhen; Guo, Shenglian; Zhou, Yanlai; Wang, Jun (Wuhan Univ State Key Lab Water Resources & Hydropower Engn Sc Wuhan 430072 Peoples R China)

To improve flood forecasting accuracy and reflect forecast uncertainty information in the Three Gorges Reservoir (TGR) interval-basin in China, this study integrates the feature and temporal dual-attention (DA) mechanism and the recursive encoder-decoder (RED) structure into the long short-term memory (LSTM) neural network to develop a DA-LSTM-RED model. The feature attention acts on the input variables of the encoder, and the temporal attention mechanism acts on the hidden layer states extracted by the LSTM neural network during the encoding process, prompting the proposed model to extract critical input information among different types and moments of input variables to improve the multi-step-ahead flood forecasting accuracy. In addition, the copula-based Hydrological Uncertainty Processor (copula-HUP) is used to quantify the forecast uncertainty of the proposed model while creating multi-step-ahead flood probabilistic forecasts. Combining the long-term 6 h hydrologic data series of the Xiangjiaba-TGR interval-basin and the forecasted precipitation from the European Centre for Medium-Range Weather Forecasts (ECMWF), the effectiveness of the proposed model, the effect of forecast precipitation on multi-step-ahead flood forecasting, and the effect of different copula functions on the probabilistic forecast of copula-HUP are investigated. The results show that the DA-LSTM-RED model can effectively improve the forecasting accuracy for long forecast horizons (3-7 d) compared to the LSTM-RED model, and the average absolute error metrics are reduced by 10%-17%. Meanwhile, the proposed model can identify input variables with a high correlation with the target output variables, which improves the interpretability of deep learning to a certain extent. The Student copula-HUP has lower RB and CRPS metrics than the Frank and Gaussian copula-HUP, and can better quantify the DA-LSTM-RED model's forecast uncertainty. Therefore, combining the proposed mo

Keywords: Dual-attention mechanism; encoder-decoder; Deep learning; Hydrological uncertainty processor; Copula function; Probabilistic forecasting
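
The temporal attention described in this abstract weights the encoder's hidden states before they are combined. The following is a minimal sketch of that general idea only: the dot-product scoring, the function names, and the `query` vector are assumptions made for illustration, since the abstract does not state how the DA-LSTM-RED model computes its attention scores.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, query):
    """Weight each time step's hidden state and return (weights, context).

    hidden_states: list of equal-length vectors, one per encoder time step.
    query: vector of the same length (hypothetical; e.g. a decoder state).
    Scoring is a plain dot product, which is an assumption of this sketch.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The context vector is the attention-weighted sum of the hidden states.
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context
```

Time steps whose hidden state aligns more closely with the query receive larger weights, which is also what makes the weights usable as a rough indicator of which moments the model relied on.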

Deep Learning-Based Standard Sign Language Discrimination

IEEE ACCESS, 2023, Vol. 11, pp. 125822-125834

Authors: Zhang, Menglin; Yang, Shuying; Zhao, Min (Tianjin Univ Technol Sch Comp Sci & Engn Tianjin 300384 Peoples R China Minist Educ Key Lab Comp Vis & Syst Tianjin 300384 Peoples R China)

General sign language recognition models are designed only for recognizing categories, i.e., such models do not discriminate between standard and nonstandard sign language actions made by learners. This makes them inadequate for use in sign language education software. To address this issue, this paper proposes a sign language category and standardization correctness discrimination model for sign language education. The proposed model is implemented with a hand detection method and a standard sign language discrimination method. For hand detection, the proposed method utilizes flow-guided features and acquires relevant proposals using stable and flow key frame detections. This model can resolve the inconsistency between the forward optical flow and the box center point offset. In addition, the proposed method employs an encoder-decoder model structure for sign language correctness discrimination. The encoder model combines 3D convolution and 2D deformable convolution results with residual structures, and it implements a sequence attention mechanism. A Sign Language Correctness Discrimination dataset (SLCD dataset) was also constructed in this study. In this dataset, each sign language video has two recognition labels, i.e., sign language category and standardization category. A semi-supervised learning method was employed to generate pseudo hand position labels. The hand detection model achieved sufficiently high hand detection results. The sign language correctness discrimination model was tested with hand patches or full images. The SLCD dataset is available at https://***/10.21227/p9sn-dz70.

Keywords: Sign language; Encoding; Video coding; Object detection; Three-dimensional displays; Convolution; Gesture recognition; Assistive technologies; Semisupervised learning; Educational courses; Software packages; Continuous sign language recognition; encoder-decoder; tubelet video object detection; 3D convolution

Multi-scale boundary neural network for gastric tumor segmentation

VISUAL COMPUTER, 2023, Vol. 39, No. 3, pp. 915-926

Authors: Wang, Pengfei; Li, Yunqi; Sun, Yaru; He, Dongzhi; Wang, Zhiqiang (Beijing Univ Technol Fac Informat Technol Beijing Peoples R China Chinese Peoples Liberat Army Gen Hosp Med Ctr 2 Dept Gastroenterol Beijing 100853 Peoples R China Chinese Peoples Liberat Army Gen Hosp Natl Clin Res Ctr Geriatr Dis Beijing 100853 Peoples R China Chinese Peoples Liberat Army Gen Hosp Med Ctr 1 Dept Gastroenterol Beijing 100853 Peoples R China)

At present, gastric cancer patients account for a large proportion of all tumor patients. Gastric tumor image segmentation can provide a reliable additional basis for the clinical analysis and diagnosis of gastric cancer. However, the existing gastric cancer image datasets have disadvantages such as small data sizes and difficulty in labeling. Moreover, most existing CNN-based methods are unable to generate satisfactory segmentation masks without accurate labels, which is due to the limited context information and insufficiently discriminative feature maps obtained after consecutive pooling and convolution operations. This paper presents a gastric cancer lesion dataset for gastric tumor image segmentation research. A multiscale boundary neural network (MBNet) is proposed to automatically segment the real tumor area in gastric cancer images. MBNet adopts an encoder-decoder architecture. In each stage of the encoder, a boundary extraction refinement module is first proposed to obtain and refine multi-granular edge information. Then, we build a selective fusion module to selectively fuse features from the different stages. By cascading the two modules, the richer context and fine-grained features of each stage are encoded. Finally, the atrous spatial pyramid pooling is improved to obtain the remote dependency relationship of the overall context and the fine spatial structure information. The experimental results show that the accuracy of the model reaches 92.3% and the similarity coefficient (DICE) reaches 86.9%, and that the proposed method also outperforms existing approaches on the CVC-ClinicDB and Kvasir-SEG datasets.

Keywords: Gastric tumor segmentation; encoder-decoder; Convolutional neural network; Deep learning
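
The similarity coefficient (DICE) reported in this abstract is the standard overlap measure between a predicted mask and a ground-truth mask, 2|A ∩ B| / (|A| + |B|). A minimal sketch for binary masks stored as nested lists follows; the function name and the convention of returning 1.0 when both masks are empty are assumptions of this example, not details taken from the paper.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity between two binary masks given as lists of rows.

    Any truthy pixel counts as foreground. Both masks are assumed to have
    the same shape.
    """
    pred = [p for row in pred_mask for p in row]
    true = [t for row in true_mask for t in row]
    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in true if t)
    if total == 0:
        # Both masks are empty; treating that as perfect agreement is an
        # assumption of this sketch.
        return 1.0
    return 2.0 * intersection / total
```

For example, a prediction with two foreground pixels that shares one of them with a one-pixel ground truth scores 2 * 1 / (2 + 1), or about 0.667.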

A forecast model of short-term wind speed based on the attention mechanism and long short-term memory

MULTIMEDIA TOOLS AND APPLICATIONS, 2023, Vol. 83, No. 15, p. 45603

Authors: Xing, Wang; Qi-liang, Wu; Gui-rong, Tan; Dai-li, Qian; Ke, Zhou (Nanjing Univ Informat Sci & Technol Natl Demonstrat Ctr Expt Atmospher Sci & Environm Nanjing Jiangsu Peoples R China Nanjing Univ Informat Sci & Technol Sch Artificial Intelligence Nanjing Jiangsu Peoples R China Nanjing Univ Aeronaut & Astronaut Coll Civil Aviat Nanjing Jiangsu Peoples R China)

Gales are a type of disastrous weather, and forecasting wind speed is a difficult point in operational weather forecasting. In this study, we propose a method to forecast the time series of wind speed over a future period at the target station by using the time series of wind speed over a past period at the target station and its adjacent stations. This method is established by using deep learning technology. Based on the encoder-decoder framework, the driving series at the adjacent stations and the target series at the target station are taken as the input of the encoder module and the decoder module, respectively. There are two attention layers in the encoder module. One is used to strengthen the contribution of each influence factor in the input driving series to the hidden state in the long short-term memory (LSTM) layer. The other is used to enable the encoder to adaptively select the hidden state output by the LSTM layer. A loss function based on the Gaussian kernel function is adopted in the forecast model of this study, and a dynamic weight is designed to optimize the attention paid to the errors of the output results at different forecast lead times during the training of the neural network model, thus improving the model forecast performance for longer forecast lead times. The results show that this method performs well in wind speed forecasts from T+1 to T+24. The mean absolute error and root mean squared error of the forecast results at T+24 are 0.796 m/s and 1.029 m/s, respectively, which are better than those of the other two models in the experiment. This shows that the method proposed in this study can not only be applied to wind speed forecasting but can also provide technical support for operational applications such as early warning of gale disasters and wind power prediction.

Keywords: Wind speed forecasting; Attention mechanism; encoder-decoder; Long short-term memory; Deep learning; Dynamic weight
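
The two scores quoted in this abstract for the T+24 forecast, mean absolute error and root mean squared error, follow their usual definitions. The sketch below shows how they could be computed for one lead time; the function names and the sample values in the usage note are illustrative assumptions and are not data from the paper.

```python
import math


def mean_absolute_error(forecast, observed):
    """Average magnitude of the forecast errors, in the data's own units."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)


def root_mean_squared_error(forecast, observed):
    """Square root of the mean squared error; penalises large misses more."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(forecast))
```

With made-up wind speeds `forecast = [3.0, 5.0, 4.0]` and `observed = [2.0, 5.0, 6.0]`, the errors are 1, 0, and -2, so the MAE is 1.0 and the RMSE is the square root of 5/3. RMSE is never smaller than MAE, which matches the ordering of the two values the abstract reports.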

OSLPNet: A neural network model for street lamp post extraction from street view imagery

EXPERT SYSTEMS WITH APPLICATIONS, 2023, Vol. 231, No. 1

Authors: Zhang, Tengda; Dai, Jiguang; Song, Weidong; Zhao, Ruishan; Zhang, Bing (Liaoning Tech Univ Sch Geomat Yulong St Fuxin 123000 Liaoning Provin Peoples R China Liaoning Tech Univ Inst Spatiotemporal Transportat Data Fuxin Peoples R China)

Quickly and accurately obtaining street lamp post information has great application value in smart city construction and automatic vehicle navigation. However, the existing deep learning methods are affected by factors such as the perspective effect, different objects with the same spectrum, and occlusion. There can also be some problems in the semantic segmentation results for street lamp posts, such as under-segmentation, misextraction, and discontinuity. In this paper, we present the OSLPNet model for the extraction of street lamp posts from street view imagery. According to the characteristics of the various scales of street lamp posts in the imagery, a multiscale phased controller (MPC) with multi-level receptive fields is proposed to reduce the under-segmentation problem for street lamp posts. According to the unique "elbow" structure of street lamp posts, deformable convolution is introduced to reduce the problem of misextraction of street lamp posts. According to the topological relationship of street lamp post context, a lightweight spatial context (LSC) module is proposed to solve the problem of discontinuous detection of street lamp posts caused by occlusion. We also propose two street lamp pole datasets, and experimental results show that our F1 values reach 85.2% and 82.4% on the two datasets, which is superior to existing state-of-the-art methods. The code and datasets are publicly available at https://***/ZzzTD/OSLPNet.

Keywords: Convolutional neural network; encoder-decoder; Street lamp post extraction; Engineering application; Urban component acquisition

Short-term traffic flow prediction model based on a shared weight gate recurrent unit neural network

PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2023, Vol. 618

Authors: Sun, Xiaoyong; Chen, Fenghao; Wang, Yuchen; Lin, Xuefen; Ma, Weifeng (Zhejiang Univ Sci & Technol Sch Informat & Elect Engn Hangzhou 310023 Peoples R China)

Accurate traffic flow prediction is critical for enhancing traffic network operational efficiency. With the continuous expansion of traffic networks, providing reliable and efficient multi-step traffic flow prediction for large-scale traffic networks with a large number of sensors deployed has become a challenging issue. In this paper, we propose a multi-step many-to-many traffic prediction model for large-scale traffic networks, called spatio-temporal Shared GRU (STSGRU), which receives inputs from multiple sensors and provides predictions for all sensors simultaneously. First, we model the weekly pattern of traffic flow, using periodicity to explore long-term temporal features and provide smooth traffic flow to reduce the impact of data volatility. Second, different from existing models, we propose a shared weight mechanism to achieve many-to-many prediction without mapping traffic networks to images or graph structures. The proposed model strikes a delicate balance between complexity and accuracy. We validate the effectiveness of the proposed method on the Caltrans Performance Measurement System (PeMS) dataset. The results show that our model achieves prediction performance similar to that of advanced graph neural networks and has higher flexibility. © 2023 Elsevier B.V. All rights reserved.

Keywords: Traffic flow prediction; Gated recurrent unit; encoder-decoder; Periodicity; Shared weight

Semantic Segmentation Based on Spatial Pyramid Pooling and Multilayer Feature Fusion

IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, Vol. 15, No. 3, pp. 1524-1535

Authors: Ji, Jian; Li, Sitong; Liao, Xianfu; Zhang, Fangrong (Xidian Univ Sch Comp Sci & Technol Xian 710071 Peoples R China)

In recent years, significant progress has been made in semantic segmentation methods. Traditional semantic segmentation methods based on convolutional neural networks (CNNs) are prone to losing spatial information in the feature extraction stage and pay less attention to global context information, especially in some lightweight real-time semantic segmentation networks. This is a major challenge for semantic segmentation tasks. In addition, although some methods have alleviated this problem to a certain extent, they are often embedded in specific networks and cannot be applied to other network models. To address these problems, a semantic segmentation method based on multilayer feature fusion is proposed. The flexible and lightweight squeeze-excitation module is used to improve the spatial pyramid pooling (SPP) network, and the accuracy of the semantic segmentation method is further improved by extracting network feature information at different levels. To verify the efficiency and generality of our methodology, we selected the ERFNet and Deeplabv3 networks for experiments on the Cityscapes and COCO data sets. Experiments show that our best method improves mIoU by 3.1% and mAcc by 3.2% on the Cityscapes data set relative to ERFNet, while achieving 61.93 FPS on 1024 x 512 resolution images, and that the best improvement achieved on the Deeplabv3 network was 0.9% mIoU and 1.4% mAcc. The experimental results show that the improved multilayer feature fusion structure can improve the accuracy of the semantic segmentation network.

Keywords: encoder-decoder; feature extraction; pyramid pooling; semantic segmentation; squeeze-excitation module
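
The mIoU figure in this abstract is the mean of per-class intersection-over-union scores. The sketch below computes it from flat lists of predicted and true class labels; the function name, the flat-list input, and the choice to skip classes that appear in neither list are assumptions of this example, and benchmark toolkits such as the one for Cityscapes may handle such details differently.

```python
def mean_iou(pred_labels, true_labels, num_classes):
    """Mean intersection-over-union across classes for per-pixel labels.

    pred_labels, true_labels: equal-length sequences of integer class ids.
    Classes absent from both sequences are skipped (an assumption of this
    sketch) so that they do not pull the mean towards zero.
    """
    pairs = list(zip(pred_labels, true_labels))
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in pairs if p == cls and t == cls)
        union = sum(1 for p, t in pairs if p == cls or t == cls)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For predictions `[0, 0, 1, 1]` against labels `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, giving a mean of 7/12.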

Multilevel attention and relation network based image captioning model

MULTIMEDIA TOOLS AND APPLICATIONS, 2023, Vol. 82, No. 7, pp. 10981-11003

Authors: Sharma, Himanshu; Srivastava, Swati (GLA Univ Mathura Dept Comp Engn & Applicat Mathura Chaumuhan India)

The aim of the image captioning task is to understand various semantic concepts, such as objects and their relationships in an image, and combine them to generate a natural language description. Thus, it needs an algorithm to understand the visual content of a given image and translate it into a sequence of output words. In this paper, a Local Relation Network (LRN) is designed over the objects and image regions, which not only discovers the relationship between the object and the image regions but also generates significant context-based features corresponding to every region in the image. Also, a multilevel attention approach is used to focus on a given image region and its related image regions, thus enhancing the image representation capability of the proposed method. Finally, a variant of the traditional long short-term memory (LSTM) that uses an attention mechanism is employed, which focuses on relevant contextual information, spatial locations, and deep visual features. With these measures, the proposed model encodes an image in an improved way, which gives the model significant cues and thus leads to improved caption generation. Extensive experiments have been performed on three benchmark datasets: Flickr30k, MSCOCO, and Nocaps. On Flickr30k, the obtained evaluation scores are 31.2 BLEU@4, 23.5 METEOR, 51.5 ROUGE, 65.6 CIDEr and 17.2 SPICE. On MSCOCO, the proposed model has attained 42.4 BLEU@4, 29.4 METEOR, 59.7 ROUGE, 125.7 CIDEr and 23.2 SPICE. The overall CIDEr score on the Nocaps dataset achieved by the proposed model is 114.3. The above scores clearly show the superiority of the proposed method over the existing methods.

Keywords: Relation network; Semantic; Attention; encoder-decoder

Performance evaluation of convolution neural networks in canopy height estimation using sentinel 2 data, application to Thailand

INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, Vol. 44, No. 5, pp. 1726-1748

Authors: ElGharbawi, Tamer; Susaki, Junichi; Chureesampant, Kamolratn; Arunplod, Chomchanok; Thanyapraneedkul, Juthasinee; Limlahapun, Ponthip; Suliman, Amany (Suez Canal Univ Fac Engn Civil Engn Dept Ismailia Egypt Kyoto Univ Dept Civil & Earth Resources Engn Kyoto Japan Elect Generating Author Thailand Nonthaburi Thailand Srinakharinwirot Univ Bangkok Thailand Thammasat Univ Bangkok Thailand Kasetsart Univ Bangkok Thailand Mansoura Univ Fac Engn Publ Works Dept Mansoura Egypt Suez Canal Univ Fac Engn New Campus Kilo 4-5Ring RdPO 41522 Ismailia Egypt)

Electric shorting induced by tall vegetation is one of the major hazards affecting power transmission lines extending through rural regions and rough terrain for tens of kilometres. This raises the need for an accurate, reliable, and cost-effective approach for continuous monitoring of canopy heights. This paper proposes and evaluates two deep convolution neural network (CNN) variants based on Seg-Net and Res-Net architectures, characterized by their small number of trainable weights (nearly 800,000) while maintaining high estimation accuracy. The proposed models utilize the freely available data from Sentinel-2 and a digital surface model to estimate forest canopy heights with high accuracy and a spatial resolution of 10 metres. Various factors affect canopy height estimation, including topography signature, dataset diversity, input layers, and model structure. The proposed models are applied separately to two powerline regions located in the northern and southern parts of Thailand. The application results show that the proposed encoder-decoder CNN Seg-Net model presents an average mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) of 1.38 m, 1.85 m, and 0.87, respectively, and is nearly 4.8 times faster than the CNN Res-Net model in conversion. These results demonstrate the proposed model's capability of estimating and monitoring canopy heights with high accuracy and fine spatial resolution.

Keywords: Forest Structure; encoder-decoder; High Resolution remote sensing; Feature Importance

Implicit regularization of a deep augmented neural network model for human motion prediction

APPLIED INTELLIGENCE, 2023, Vol. 53, No. 14, pp. 18027-18040

Authors: Yadav, Gaurav Kumar; Abdel-Nasser, Mohamed; Rashwan, Hatem A.; Puig, Domenec; Nandi, G. C. (Univ Rovira & Virgili Dept Comp Engn & Math Tarragona 43002 Spain Indian Inst Informat Technol Allahabad Dept Informat Technol Prayagraj 211012 Uttar Pradesh India Aswan Univ Fac Engn Elect Engn Dept Elect & Commun Engn Sect Aswan 81528 Egypt)

Predicting human motion based on past observed motion is one of the challenging issues in computer vision and graphics. Existing research works deal with this issue by using discriminative models and report results for cases that follow a homogeneous distribution (in distribution), without discussing the domain shift problem, where training and testing data follow heterogeneous distributions (out of distribution), which is the reality when such models are used in practice. However, recent research proposed addressing domain shift issues by augmenting the discriminative model with a generative model and obtained better results. In the present investigation, we propose regularizing the extended network by inserting linear layers to minimize the rank of the latent space and training the entire network end to end. We regularize the network to strengthen the model so that it deals effectively with domain shift scenarios. Training and testing data come from different distribution sets; to deal with this, we toughen our network by adding the extra linear layers to the network encoder. We tested our model with the benchmark datasets, CMU Motion Capture and Human3.6M, and showed that our model outperforms prior methods on 14 OoD actions of H3.6M and 7 OoD actions of CMU MoCap in terms of the Euclidean distance calculated between predicted and ground truth joint angle values. Our average results over the 14 OoD actions of H3.6M for short-term prediction (80, 160, 320, 400) are 0.34, 0.6, 0.96, 1.07, and over the 7 OoD actions of CMU MoCap for short-term and long-term prediction (80, 160, 320, 400, 1000) are 0.28, 0.45, 0.77, 0.89, 1.46. All these results are much better than the other state-of-the-art results.

Keywords: Linear layers; Human motion; Out-of-distribution; In-distribution; encoder-decoder; Graph convolution network
