To improve flood forecasting accuracy and convey forecast uncertainty in the Three Gorges Reservoir (TGR) interval-basin in China, this study integrates a feature and temporal dual-attention (DA) mechanism and a recursive encoder-decoder (RED) structure into the long short-term memory (LSTM) neural network to develop a DA-LSTM-RED model. First, the feature attention acts on the input variables of the encoder, and the temporal attention acts on the hidden states extracted by the LSTM during encoding, prompting the model to extract critical information across different types and time steps of input variables and thereby improve multi-step-ahead flood forecasting accuracy. Second, the copula-based Hydrological Uncertainty Processor (copula-HUP) is used to quantify the forecast uncertainty of the proposed model while producing multi-step-ahead probabilistic flood forecasts. Using the long-term 6 h hydrologic data series of the Xiangjiaba-TGR interval-basin and forecast precipitation from the European Centre for Medium-Range Weather Forecasts (ECMWF), the study investigates the effectiveness of the proposed model, the effect of forecast precipitation on multi-step-ahead flood forecasting, and the effect of different copula functions on the probabilistic forecasts of copula-HUP. The results show that the DA-LSTM-RED model effectively improves forecasting accuracy for long forecast horizons (3-7 d) compared with the LSTM-RED model, reducing the average absolute error metrics by 10%-17%. Meanwhile, the proposed model can identify input variables that correlate highly with the target output variables, which improves the interpretability of deep learning to a certain extent. The Student copula-HUP has lower RB and CRPS metrics than the Frank and Gaussian copula-HUP and therefore better quantifies the DA-LSTM-RED model's forecast uncertainty. Therefore, combining the proposed mo
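The temporal attention step described above can be illustrated with a minimal sketch. It assumes plain dot-product scoring between each encoder hidden state and a query vector; the abstract does not state the scoring function the authors use, and the names `softmax` and `temporal_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, query):
    """Weight encoder hidden states by their relevance to a query.

    hidden_states: list of T vectors, one per encoder time step.
    query: one vector of the same length (for example, a decoder state).
    Dot-product scoring is an assumption made for this sketch.
    Returns (weights, context); context is the weighted sum of the states.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Three time steps with two hidden units each (made-up numbers).
hidden = [[0.1, 0.0], [0.5, 0.2], [0.9, 0.4]]
weights, context = temporal_attention(hidden, query=[1.0, 1.0])
```

The weights sum to one, so the context vector is a convex combination of the hidden states, and inspecting the weights shows which time steps contributed most to a given forecast.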
General sign language recognition models are designed only to recognize categories; they do not discriminate between standard and nonstandard sign language actions made by learners. This makes them inadequate for sign language education software. To address this issue, this paper proposes a sign language category and standardization correctness discrimination model for sign language education. The proposed model is implemented with a hand detection method and a standard sign language discrimination method. For hand detection, the proposed method utilizes flow-guided features and acquires relevant proposals using stable and flow key frame detections. This approach resolves the inconsistency between the forward optical flow and the box center point offset. In addition, the proposed method employs an encoder-decoder model structure for sign language correctness discrimination. The encoder combines 3D convolution and 2D deformable convolution results with residual structures, and it implements a sequence attention mechanism. A Sign Language Correctness Discrimination dataset (SLCD dataset) was also constructed in this study. In this dataset, each sign language video has two recognition labels, i.e., sign language category and standardization category. A semi-supervised learning method was employed to generate pseudo hand position labels. The hand detection model achieved sufficiently accurate hand detection results. The sign language correctness discrimination model was tested with hand patches and with full images. The SLCD dataset is available at https://***/10.21227/p9sn-dz70.
At present, gastric cancer patients account for a large proportion of all tumor patients. Gastric tumor image segmentation can provide a reliable additional basis for the clinical analysis and diagnosis of gastric cancer. However, existing gastric cancer image datasets have disadvantages such as small data sizes and difficulty in labeling. Moreover, most existing CNN-based methods are unable to generate satisfactory segmentation masks without accurate labels, which is due to the limited context information and insufficiently discriminative feature maps obtained after consecutive pooling and convolution operations. This paper presents a gastric cancer lesion dataset for gastric tumor image segmentation research. A multiscale boundary neural network (MBNet) is proposed to automatically segment the real tumor area in gastric cancer images. MBNet adopts an encoder-decoder architecture. In each stage of the encoder, a boundary extraction refinement module first obtains multi-granular edge information and refines it. Then, a selective fusion module selectively fuses features from the different stages. By cascading the two modules, the richer context and fine-grained features of each stage are encoded. Finally, atrous spatial pyramid pooling is improved to capture the long-range dependencies of the overall context and the fine spatial structure information. The experimental results show that the accuracy of the model reaches 92.3% and the similarity coefficient (DICE) reaches 86.9%, and that the proposed method also outperforms existing approaches on the CVC-ClinicDB and Kvasir-SEG datasets.
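For reference, the Dice similarity coefficient reported above is conventionally computed as twice the overlap between the predicted and ground-truth masks divided by the sum of their sizes. The sketch below assumes flat binary masks stored as lists of 0/1 values; the function name and the toy masks are illustrative and are not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks: two overlapping foreground pixels, three foreground pixels each.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```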
Gales are a type of hazardous weather, and wind speed forecasting remains a difficult problem in operational weather forecasting. In this study, we propose a deep learning method that forecasts the future wind speed time series at a target station from the past wind speed time series at the target station and its adjacent stations. Based on an encoder-decoder architecture, the driving series at the adjacent stations and the target series at the target station are taken as the inputs of the encoder module and the decoder module, respectively. The encoder module contains two attention layers. One strengthens the contribution of each influence factor in the input driving series to the hidden state in the long short-term memory (LSTM) layer. The other enables the encoder to adaptively select the hidden states output by the LSTM layer. The forecast model adopts a loss function based on the Gaussian kernel function, and a dynamic weight is designed to optimize the attention paid to output errors at different forecast lead times during training, thus improving forecast performance at longer lead times. The results show that the method performs well in wind speed forecasting from T+1 to T+24. The mean absolute error and root mean squared error of the forecast results at T+24 are 0.796 m/s and 1.029 m/s, respectively, which are better than those of the other two models in the experiment. These results show that the proposed method can be applied not only to wind speed forecasting but also to operational applications such as early warning of gale disasters and wind power prediction.
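The two error metrics quoted above follow their standard definitions, sketched below. The sample values are made up for illustration and are not data from the study.

```python
import math


def mae(forecast, observed):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def rmse(forecast, observed):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(observed))


# Hypothetical wind speeds in m/s at four verification times.
forecast = [3.0, 5.0, 4.0, 6.0]
observed = [2.0, 5.0, 6.0, 6.0]
mae_value = mae(forecast, observed)
rmse_value = rmse(forecast, observed)
```

RMSE is never smaller than MAE because squaring penalizes large errors more heavily, which is why the two are usually reported together.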
Quickly and accurately obtaining street lamp post information has great application value in smart city construction and automatic vehicle navigation. However, existing deep learning methods are affected by factors such as the perspective effect, different objects with the same spectrum, and occlusion. Semantic segmentation results for street lamp posts can also suffer from under-segmentation, misextraction, and discontinuity. In this paper, we present the OSLPNet model for the extraction of street lamp posts from street view imagery. Because street lamp posts appear at various scales in the imagery, a multiscale phased controller (MPC) with multi-level receptive fields is proposed to reduce under-segmentation. Because street lamp posts have a unique "elbow" structure, deformable convolution is introduced to reduce misextraction. Based on the topological relationship of the street lamp post context, a lightweight spatial context (LSC) module is proposed to solve the discontinuous detection of street lamp posts caused by occlusion. We also present two street lamp post datasets, and experimental results show that our F1 values reach 85.2% and 82.4% on the two datasets, respectively, which is superior to the existing state-of-the-art method. The code and datasets are publicly available at https://***/ZzzTD/OSLPNet.
Accurate traffic flow prediction is critical for enhancing traffic network operational efficiency. With the continuous expansion of traffic networks, providing reliable and efficient multi-step traffic flow prediction for large-scale traffic networks with a large number of deployed sensors has become a challenging issue. In this paper, we propose a multi-step many-to-many traffic prediction model for large-scale traffic networks, called spatio-temporal Shared GRU (STSGRU), which receives inputs from multiple sensors and provides predictions for all sensors simultaneously. First, we model the weekly pattern of traffic flow, using periodicity to explore long-term temporal features and provide smooth traffic flow to reduce the impact of data volatility. Second, unlike existing models, we propose a shared weight mechanism to achieve many-to-many prediction without mapping traffic networks to images or graph structures. The proposed model strikes a delicate balance between complexity and accuracy. We validate the effectiveness of the proposed method on the Caltrans Performance Measurement System (PeMS) dataset. The results show that our model achieves prediction performance comparable to that of advanced graph neural networks while offering higher flexibility. © 2023 Elsevier B.V. All rights reserved.
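One simple way to picture the weekly pattern mentioned above is to average the flow observed at the same time-of-week slot across several weeks. This is only a sketch under that assumption; the abstract does not specify how STSGRU builds its periodic component, and `weekly_profile` and `slots_per_week` are hypothetical names.

```python
def weekly_profile(flow, slots_per_week):
    """Average flow at each time-of-week slot across all observed weeks.

    flow: readings in chronological order, starting at slot 0 of a week.
    slots_per_week: number of readings per week (e.g. 2016 at 5-minute steps).
    """
    sums = [0.0] * slots_per_week
    counts = [0] * slots_per_week
    for i, value in enumerate(flow):
        slot = i % slots_per_week
        sums[slot] += value
        counts[slot] += 1
    # Slots never observed fall back to 0.0 rather than dividing by zero.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Toy series: two "weeks" of three slots each.
flow = [10, 20, 30, 12, 22, 34]
profile = weekly_profile(flow, slots_per_week=3)
```

Averaging across weeks smooths out one-off fluctuations, which matches the stated goal of reducing the impact of data volatility.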
In recent years, significant progress has been made in semantic segmentation methods. Traditional semantic segmentation methods based on convolutional neural networks (CNNs) are prone to losing spatial information in the feature extraction stage and pay little attention to global context information, especially in some lightweight real-time semantic segmentation networks. This is a major challenge for semantic segmentation tasks. In addition, although some methods have mitigated this problem to a certain extent, they are often embedded in specific networks and cannot be applied to other network models. To address these problems, a semantic segmentation method based on multilayer feature fusion is proposed. The flexible and lightweight squeeze-excitation module is used to improve the spatial pyramid pooling (SPP) network, and the accuracy of the semantic segmentation method is further improved by extracting network feature information at different levels. To verify the efficiency and generality of our method, we selected the ERFNet and Deeplabv3 networks for experiments on the Cityscapes and COCO data sets. Experiments show that our best method improves mIoU by 3.1% and mAcc by 3.2% on the Cityscapes data set relative to ERFNet while achieving 61.93 FPS on 1024 x 512 resolution images, and that the best improvement on the Deeplabv3 network is 0.9% mIoU and 1.4% mAcc. The experimental results show that the improved multilayer feature fusion structure can improve the accuracy of the semantic segmentation network.
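The mIoU figure quoted above is the per-class intersection over union averaged across classes. The sketch below assumes flat lists of integer class labels and skips classes absent from both prediction and ground truth; the function name and toy labels are illustrative only.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for flat label lists."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        # Skip classes that appear in neither mask to avoid 0/0.
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Six pixels, three classes: per-class IoU is 1/3, 2/3 and 1/2.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, target, num_classes=3)
```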
The aim of the image captioning task is to understand various semantic concepts, such as objects and their relationships in an image, and combine them to generate a natural language description. Thus, it needs an algorithm that understands the visual content of a given image and translates it into a sequence of output words. In this paper, a Local Relation Network (LRN) is designed over the objects and image regions; it not only discovers the relationships between the objects and the image regions but also generates significant context-based features corresponding to every region in the image. In addition, a multilevel attention approach is used to focus on a given image region and its related image regions, thus enhancing the image representation capability of the proposed method. Finally, a variant of the traditional long short-term memory (LSTM) with an attention mechanism is employed to focus on relevant contextual information, spatial locations, and deep visual features. With these measures, the proposed model encodes an image in an improved way, which gives the model significant cues and thus leads to improved caption generation. Extensive experiments have been performed on three benchmark datasets: Flickr30k, MSCOCO, and Nocaps. On Flickr30k, the obtained evaluation scores are 31.2 BLEU@4, 23.5 METEOR, 51.5 ROUGE, 65.6 CIDEr and 17.2 SPICE. On MSCOCO, the proposed model attains 42.4 BLEU@4, 29.4 METEOR, 59.7 ROUGE, 125.7 CIDEr and 23.2 SPICE. The overall CIDEr score achieved by the proposed model on the Nocaps dataset is 114.3. These scores show the superiority of the proposed method over existing methods.
Electric shorting induced by tall vegetation is one of the major hazards affecting power transmission lines extending through rural regions and rough terrain for tens of kilometres. This raises the need for an accurate, reliable, and cost-effective approach for continuous monitoring of canopy heights. This paper proposes and evaluates two deep convolutional neural network (CNN) variants based on the Seg-Net and Res-Net architectures, characterized by a small number of trainable weights (nearly 800,000) while maintaining high estimation accuracy. The proposed models use freely available data from Sentinel-2 and a digital surface model to estimate forest canopy heights with high accuracy and a spatial resolution of 10 metres. Various factors affect canopy height estimation, including topography signature, dataset diversity, input layers, and model structure. The proposed models are applied separately to two powerline regions located in the northern and southern parts of Thailand. The application results show that the proposed encoder-decoder CNN Seg-Net model achieves an average mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) of 1.38 m, 1.85 m, and 0.87, respectively, and is nearly 4.8 times faster than the CNN Res-Net model in conversion. These results demonstrate the proposed model's capability of estimating and monitoring canopy heights with high accuracy and fine spatial resolution.
Predicting human motion from past observed motion is a challenging problem in computer vision and graphics. Existing work addresses it with discriminative models and reports results for the in-distribution case, where training and testing data follow the same distribution, without discussing domain shift, where training and testing data follow different distributions (out of distribution, OoD), which is the reality when such models are used in practice. Recent research, however, proposed addressing domain shift by augmenting the discriminative model with a generative model and obtained better results. In the present investigation, we propose regularizing this extended network by inserting linear layers that minimize the rank of the latent space, and we train the entire network end to end. Because training and testing data come from different distributions, we add the extra linear layers to the network encoder so that the model deals effectively with domain shift scenarios. We tested our model on the benchmark datasets CMU Motion Capture and Human3.6M and showed that it outperforms other state-of-the-art methods on 14 OoD actions of H3.6M and 7 OoD actions of CMU MoCap in terms of the Euclidean distance between predicted and ground-truth joint angle values. Our average results over the 14 OoD actions of H3.6M for short-term prediction (80, 160, 320, 400) are 0.34, 0.6, 0.96, 1.07, and over the 7 OoD actions of CMU MoCap for short-term and long-term prediction (80, 160, 320, 400, 1000) are 0.28, 0.45, 0.77, 0.89, 1.46. All these results are much better than the other state-of-the-art results.
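The evaluation metric named above, the Euclidean distance between predicted and ground-truth joint angles, can be sketched for a single frame as follows. The function name and the sample angle vectors are hypothetical, and the paper's exact averaging over frames and actions is not reproduced here.

```python
import math


def joint_angle_error(pred, truth):
    """Euclidean distance between two joint-angle vectors of equal length."""
    if len(pred) != len(truth):
        raise ValueError("angle vectors must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)))


# Made-up joint angles for one frame; differences are 0, 3 and -4.
pred = [0.0, 3.0, 1.0]
truth = [0.0, 0.0, 5.0]
error = joint_angle_error(pred, truth)
```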