Zero-shot video object segmentation (ZS-VOS) aims to segment foreground objects in a video sequence without prior knowledge of these objects. However, existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios. The common practice of introducing motion information, such as optical flow, can lead to overreliance on optical flow estimation. To address these challenges, we propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects. Specifically, our model is built upon multiple collaborative evolutions of the parallel co-attention module (PCM) and the cross co-attention module (CCM). PCM captures common foreground regions among adjacent appearance and motion features, while CCM further exploits and fuses cross-modal motion features returned by PCM. Our method is progressively trained to achieve hierarchical spatio-temporal feature propagation across the entire video. Experimental results demonstrate that our HCPN outperforms all previous methods on public benchmarks, showcasing its effectiveness for ZS-VOS.
Air pollution can have detrimental effects on human health as well as the environment. Particulate Matter (PM), a global issue, is a type of air pollution that consists of small particles suspended in the air. It is therefore crucial to estimate and monitor PM levels in the air in order to protect public health and the environment. This study proposes a novel hybrid method that combines two deep learning models, namely an encoder-decoder convolutional neural network and a Long Short-Term Memory (LSTM) model, for PM10 prediction. The first model was used as a data augmentation method to enhance dataset diversity, and the LSTM model employed meteorological parameters and spatiotemporal factors to estimate PM10 levels. The proposed technique achieved a coefficient of determination of 0.88 and a mean absolute error of 7.24. The results confirm that the developed hybrid method is an effective tool for PM prediction and can be used to inform decision-making about policies and actions to reduce PM levels.
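The two scores reported in the abstract above, the coefficient of determination and the mean absolute error, can be computed as in the following minimal sketch. The function names and the sample values are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical PM10 observations and predictions, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
mae = mean_absolute_error(observed, predicted)
r2 = r2_score(observed, predicted)
```

The mean absolute error is expressed in the units of the target variable, while the coefficient of determination is unitless, which is why the two are usually reported together.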
Knowledge Graphs (KGs) consist of interlinked information in the form of entities and relations between them in a particular domain and provide the backbone for many applications. However, KGs are often incomplete because links between entities are missing. Link prediction is the task of predicting these missing links in a KG based on the existing links. Recent years have witnessed many studies on link prediction using KG embeddings, which is one of the mainstream tasks in KG completion. Most existing methods learn latent representations of the entities and relations, whereas only a few of them consider contextual information as well as the textual descriptions of the entities. This paper introduces an attentive encoder-decoder-based link prediction approach that considers both the structural information of the KG and the textual entity descriptions. A random-walk-based path selection method is used to encapsulate the contextual information of an entity in a KG. The model uses a bidirectional Gated Recurrent Unit (GRU) based encoder-decoder to learn the representation of the paths, whereas SBERT is used to generate the representation of the entity descriptions. The proposed approach outperforms most of the state-of-the-art models and achieves comparable results with the rest when evaluated on the FB15K, FB15K-237, WN18, WN18RR, and YAGO3-10 datasets.
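The idea of collecting an entity's context through random walks, mentioned in the abstract above, can be sketched as follows. This is a generic uniform random walk over outgoing edges, not the paper's path selection method; the function name `sample_paths`, its parameters, and the toy triples are all hypothetical.

```python
import random


def sample_paths(triples, start, num_walks=3, walk_length=2, seed=0):
    """Sample relation paths from `start` by uniform random walks over outgoing edges."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    out_edges = {}
    for head, relation, tail in triples:
        out_edges.setdefault(head, []).append((relation, tail))

    paths = []
    for _ in range(num_walks):
        path, node = [start], start
        for _ in range(walk_length):
            if node not in out_edges:
                break  # dead end: the entity has no outgoing links
            relation, node = rng.choice(out_edges[node])
            path.extend([relation, node])
        paths.append(path)
    return paths


# Toy knowledge graph, for illustration only.
toy_triples = [
    ("Berlin", "capital_of", "Germany"),
    ("Berlin", "located_in", "Europe"),
    ("Germany", "member_of", "EU"),
    ("Germany", "located_in", "Europe"),
]
paths = sample_paths(toy_triples, "Berlin")
```

Each path alternates entities and relations, so it can be fed to a sequence encoder such as a GRU as an ordered list of tokens.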
To improve teaching quality, this paper proposes a multi-modal feature fusion-based abnormal behavior detection method that targets the problems of false detection, missed detection, and imbalance of positive and negative samples in detecting abnormal student behavior in class. The new method consists of an encoder module, a detection module, and a decoder module. The encoder module extracts the feature information of the student behavior image and passes it to the detection module. The behavior detection module obtains more image information through the feature fusion group to reduce the color distortion and artifacts of the behavior image, and passes the obtained image information to the deep normalization correction convolution block to reduce covariate shift and make the model easier to train. The multi-path feature convolution block can obtain image information with richer texture details. Finally, the decoder module converts the low-dimensional feature mapping back to the high-dimensional original input space through deconvolution and up-sampling operations to obtain the behavior detection image.
To improve flood forecasting accuracy and reflect forecast uncertainty information in the Three Gorges Reservoir (TGR) interval-basin in China, this study integrates the feature and temporal dual-attention (DA) mechanism and the recursive encoder-decoder (RED) structure into the long short-term memory (LSTM) neural network to develop a DA-LSTM-RED model. The feature attention acts on the input variables of the encoder, and the temporal attention mechanism acts on the hidden layer states extracted by the LSTM neural network during the encoding process, prompting the proposed model to extract critical input information among different types and moments of input variables to improve the multi-step-ahead flood forecasting accuracy. In addition, the copula-based Hydrological Uncertainty Processor (copula-HUP) is used to quantify the forecast uncertainty of the proposed model while creating multi-step-ahead probabilistic flood forecasts. Combining the long-term 6 h hydrologic data series of the Xiangjiaba-TGR interval-basin and the forecasted precipitation from the European Centre for Medium-Range Weather Forecasts (ECMWF), the study investigates the effectiveness of the proposed model, the effect of forecast precipitation on multi-step-ahead flood forecasting, and the effect of different copula functions on the probabilistic forecast of copula-HUP. The results show that the DA-LSTM-RED model can effectively improve the forecasting accuracy for long forecast horizons (3-7 d) compared to the LSTM-RED model, and the average absolute error metrics are reduced by 10%-17%. Meanwhile, the proposed model can identify input variables with a high correlation with the target output variables, which improves the interpretability of deep learning to a certain extent. The Student copula-HUP has lower RB and CRPS metrics than the Frank and Gaussian copula-HUP, and can therefore better quantify the DA-LSTM-RED model's forecast uncertainty. Therefore, combining the proposed mo
General sign language recognition models are designed only for recognizing categories, i.e., such models do not discriminate between standard and nonstandard sign language actions made by learners. They are therefore inadequate for use in sign language education software. To address this issue, this paper proposes a sign language category and standardization correctness discrimination model for sign language education. The proposed model is implemented with a hand detection method and a standard sign language discrimination method. For hand detection, the proposed method utilizes flow-guided features and acquires relevant proposals using stable and flow key frame detections. This model can resolve the inconsistency between the forward optical flow and the box center point offset. In addition, the proposed method employs an encoder-decoder model structure for sign language correctness discrimination. The encoder model combines 3D convolution and 2D deformable convolution results with residual structures, and it implements a sequence attention mechanism. A Sign Language Correctness Discrimination dataset (SLCD dataset) was also constructed in this study. In this dataset, each sign language video has two recognition labels, i.e., sign language category and standardization category. A semi-supervised learning method was employed to generate pseudo hand position labels. The hand detection model achieved sufficiently high hand detection results. The sign language correctness discrimination model was tested with hand patches or full images. The SLCD dataset is available at https://***/10.21227/p9sn-dz70.
To solve the problem of cage whirl motion capture and evaluation, this paper develops an efficient non-contact measurement method based on semantic segmentation technology. An encoder-decoder network whose backbone is U-Net is constructed by introducing residual learning and an attention mechanism for cage motion state segmentation. A random move augmentation strategy is used to simulate the random movement of the cage mass center. The network is trained with 1368 high-speed cage rotational images using the augmentation strategy. Additionally, 150 images form the validation set, and 5000 images under different operating conditions form the test set. The trained network is applied to cage whirl motion capture under different operating conditions by matching the suitable parameters during the training phase. The results show that our method effectively predicts the trend of cage whirl motion, with the predicted cage whirl orbit used for the accurate analysis of cage rotational stability.
At present, gastric cancer patients account for a large proportion of all tumor patients. Gastric tumor image segmentation can provide a reliable additional basis for the clinical analysis and diagnosis of gastric cancer. However, the existing gastric cancer image datasets have disadvantages such as small data sizes and difficulty in labeling. Moreover, most existing CNN-based methods are unable to generate satisfactory segmentation masks without accurate labels, which is due to the limited context information and insufficiently discriminative feature maps obtained after consecutive pooling and convolution operations. This paper presents a gastric cancer lesion dataset for gastric tumor image segmentation research. A multiscale boundary neural network (MBNet) is proposed to automatically segment the real tumor area in gastric cancer images. MBNet adopts an encoder-decoder architecture. In each stage of the encoder, a boundary extraction refinement module is first proposed for obtaining and refining multi-granular edge information. Then, we build a selective fusion module to selectively fuse features from the different stages. By cascading the two modules, the richer context and fine-grained features of each stage are encoded. Finally, the atrous spatial pyramid pooling is improved to capture the long-range dependencies of the overall context and the fine spatial structure information. The experimental results show that the accuracy of the model reaches 92.3% and the similarity coefficient (DICE) reaches 86.9%, and the proposed method also outperforms existing approaches on the CVC-ClinicDB and Kvasir-SEG datasets.
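The Dice similarity coefficient cited in the abstract above measures the overlap between a predicted mask and a ground-truth mask as twice the intersection divided by the sum of the two mask areas. The sketch below shows that standard formula on binary masks; the function name, the smoothing term `eps`, and the toy masks are assumptions for illustration and are not drawn from the paper.

```python
def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for binary masks given as nested lists of 0/1."""
    pred = [v for row in pred_mask for v in row]
    true = [v for row in true_mask for v in row]
    intersection = sum(p * t for p, t in zip(pred, true))
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (sum(pred) + sum(true) + eps)


# Toy 2x2 masks: the prediction covers two pixels, one of which is correct.
predicted_mask = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
dice = dice_coefficient(predicted_mask, ground_truth)
```

With the smoothing term, two empty masks score 1.0, which treats "nothing to segment, nothing predicted" as a perfect match; other conventions are possible.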
Gales are a kind of disastrous weather, and wind speed forecasting is a difficult point in operational weather forecasting. In this study, we propose a method to forecast the future wind speed time series at a target station by using the past wind speed time series at the target station and its adjacent stations. The method is built with deep learning technology. Based on the encoder-decoder framework, the driving series at the adjacent stations and the target series at the target station are taken as the input of the encoder module and the decoder module, respectively. There are two attention layers in the encoder module. One is used to strengthen the contribution of each influence factor in the input driving series to the hidden state in the long short-term memory (LSTM) layer. The other is used to enable the encoder to adaptively select the hidden state output by the LSTM layer. A loss function based on the Gaussian kernel function is adopted in the forecast model, and a dynamic weight is designed to optimize the attention paid to the errors of the output results at different forecast lead times during training of the neural network model, thus improving the model's forecast performance for longer lead times. The results show that the method performs very well in wind speed forecasting from T+1 to T+24. The mean absolute error and root mean squared error of the forecast results at T+24 are 0.796 m s⁻¹ and 1.029 m s⁻¹, respectively, which are better than those of the other two models in the experiment. This demonstrates that the proposed method can not only be applied to wind speed forecasting but can also provide technical support for operational applications such as early warning of gale disasters and wind power prediction.
Accurate traffic flow prediction is critical for enhancing traffic network operational efficiency. With the continuous expansion of traffic networks, providing reliable and efficient multi-step traffic flow prediction for large-scale traffic networks with a large number of deployed sensors has become a challenging issue. In this paper, we propose a multi-step many-to-many traffic prediction model for large-scale traffic networks, called spatio-temporal Shared GRU (STSGRU), which receives inputs from multiple sensors and provides predictions for all sensors simultaneously. First, we model the weekly pattern of traffic flow, using periodicity to explore long-term temporal features and provide smooth traffic flow to reduce the impact of data volatility. Second, unlike existing models, we propose a shared weight mechanism to achieve many-to-many prediction without mapping traffic networks to images or graph structures. The proposed model strikes a delicate balance between complexity and accuracy. We validate the effectiveness of the proposed method on the Caltrans Performance Measurement System (PeMS) dataset. The results show that our model achieves prediction performance similar to that of advanced graph neural networks and has higher flexibility. © 2023 Elsevier B.V. All rights reserved.