Search Results - Inner Mongolia University Library

Hierarchical Co-Attention Propagation Network for Zero-Shot Video Object Segmentation

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, vol. 32, pp. 2348-2359

Authors: Pei, Gensheng; Yao, Yazhou; Shen, Fumin; Huang, Dan; Huang, Xingguo; Shen, Heng-Tao

Affiliations: Nanjing Univ Sci & Technol Sch Comp Sci & Engn Nanjing 210094 Peoples R China; Univ Elect Sci & Technol China Sch Comp Sci & Engn Chengdu 611731 Peoples R China; Chinese Acad Ordnance Sci Beijing 100089 Peoples R China; Jilin Univ Coll Instrumentat & Elect Engn Changchun 130061 Peoples R China

Zero-shot video object segmentation (ZS-VOS) aims to segment foreground objects in a video sequence without prior knowledge of these objects. However, existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios. The common practice of introducing motion information, such as optical flow, can lead to overreliance on optical flow estimation. To address these challenges, we propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects. Specifically, our model is built upon multiple collaborative evolutions of the parallel co-attention module (PCM) and the cross co-attention module (CCM). PCM captures common foreground regions among adjacent appearance and motion features, while CCM further exploits and fuses cross-modal motion features returned by PCM. Our method is progressively trained to achieve hierarchical spatio-temporal feature propagation across the entire video. Experimental results demonstrate that our HCPN outperforms all previous methods on public benchmarks, showcasing its effectiveness for ZS-VOS.

Keywords: Video object segmentation; hierarchical co-attention; encoder-decoder; cross-modal

Prediction of hourly PM10 concentration through a hybrid deep learning-based method

EARTH SCIENCE INFORMATICS, 2024, vol. 17, no. 1, pp. 37-49

Authors: Molaei, Sahar Nasabpour; Salajegheh, Ali; Khosravi, Hassan; Nasiri, Amin; Abadi, Abbas Ranjbar Saadat

Affiliations: Univ Tehran Fac Nat Resources Dept Arid & Mt Reg Reclamat Karaj Iran; Univ Tennessee Dept Biosyst Engn & Soil Sci Knoxville TN USA; Atmospher Sci & Meteorol Res Ctr ASMERC Dept Meteorol Tehran Iran

Air pollution can have detrimental effects on human health as well as the environment. Particulate Matter (PM), a global issue, is a type of air pollution that consists of small particles suspended in the air. It is therefore crucial to estimate and monitor levels of PM in the air in order to protect public health and the environment. This study proposed a novel hybrid method that combines the capabilities of two different deep learning models, namely the encoder-decoder convolutional neural network and the Long Short-Term Memory (LSTM) model, for PM10 prediction. The first model was utilized as a data augmentation method to enhance dataset diversity, and the LSTM model employed meteorological parameters and spatiotemporal factors to estimate the PM10 levels. The proposed technique achieved a coefficient of determination of 0.88 and a mean absolute error of 7.24. The results confirm that the developed hybrid method is an effective tool for PM prediction and can be used to inform decision-making about policies and actions to reduce PM levels.
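The two scores quoted in this abstract, the coefficient of determination and the mean absolute error, can be illustrated with a minimal sketch. The sample values and function names below are assumptions made up for illustration; they are not the paper's data or code.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observations and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly PM10 values, invented for illustration only.
observed = [40.0, 55.0, 62.0, 48.0]
predicted = [42.0, 50.0, 60.0, 51.0]

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MAE = {mae:.2f}, R^2 = {r2:.3f}")
```

A higher coefficient of determination and a lower mean absolute error both indicate a closer fit, which is how the reported 0.88 and 7.24 should be read.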

Keywords: Particulate matter; encoder-decoder; CNN; LSTM

MADLINK: Attentive multihop and entity descriptions for link prediction in knowledge graphs

SEMANTIC WEB, 2024, vol. 15, no. 1, pp. 83-106

Authors: Biswas, Russa; Sack, Harald; Alam, Mehwish

Affiliations: Karlsruhe Inst Technol FIZ Karlsruhe Leibniz Inst Informat Infrastruct Karlsruhe Germany; Karlsruhe Inst Technol Inst Appl Informat & Formal Descript Syst AIFB Karlsruhe Germany

Knowledge Graphs (KGs) consist of interlinked information in the form of entities and relations between them in a particular domain and provide the backbone for many applications. However, KGs are often incomplete because links between entities are missing. Link prediction is the task of predicting these missing links in a KG based on the existing links. Recent years have witnessed many studies on link prediction using KG embeddings, which is one of the mainstream tasks in KG completion. Most existing methods learn latent representations of the entities and relations, whereas only a few of them also consider contextual information and the textual descriptions of the entities. This paper introduces an attentive encoder-decoder based link prediction approach that considers both the structural information of the KG and the textual entity descriptions. A random-walk-based path selection method is used to encapsulate the contextual information of an entity in a KG. The model explores a bidirectional Gated Recurrent Unit (GRU) based encoder-decoder to learn the representation of the paths, whereas SBERT is used to generate the representation of the entity descriptions. The proposed approach outperforms most of the state-of-the-art models and achieves comparable results with the rest when evaluated on the FB15K, FB15K-237, WN18, WN18RR, and YAGO3-10 datasets.
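The abstract does not spell out how paths are selected, so the following is only a generic sketch of sampling entity-relation paths by uniform random walks. The toy graph, the function name `random_walk_paths`, and its parameters are all hypothetical and are not taken from MADLINK.

```python
import random


def random_walk_paths(graph, start, num_walks, max_hops, seed=0):
    """Sample paths from `start` by uniform random walks.

    `graph` maps an entity to a list of (relation, neighbour) pairs.
    Each path alternates entity, relation, entity, ...
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    paths = []
    for _ in range(num_walks):
        path, node = [start], start
        for _ in range(max_hops):
            edges = graph.get(node, [])
            if not edges:  # dead end: stop this walk early
                break
            relation, node = rng.choice(edges)
            path.extend([relation, node])
        paths.append(path)
    return paths


# Hypothetical toy knowledge graph, for illustration only.
toy_kg = {
    "Berlin": [("capital_of", "Germany")],
    "Germany": [("member_of", "EU"), ("has_capital", "Berlin")],
    "EU": [],
}

paths = random_walk_paths(toy_kg, "Berlin", num_walks=3, max_hops=2)
for p in paths:
    print(" -> ".join(p))
```

In a pipeline of the kind described, paths like these would then be fed to a sequence encoder; that step is omitted here.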

Keywords: Knowledge Graph embedding; encoder-decoder; link prediction; path selection

Behavior Anomaly Detection Based On Multi-modal Feature Fusion And Its Application In English Teaching

JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2025, vol. 28, no. 9, pp. 1657-1666

Authors: Kan, Lei; Wang, Man

Affiliations: Zhengzhou Vocat Coll Intelligent Technol Zhengzhou 451161 Peoples R China; Natl Acad Governance Cent Comm CPC Grad Sch Party Sch Beijing 100089 Peoples R China; Shenzhen MSU BIT Univ Coll Marxism Shenzhen 518000 Peoples R China

To improve teaching quality, this paper proposes a multi-modal feature fusion-based abnormal behavior detection method that targets false detections, missed detections, and the imbalance of positive and negative samples in detecting abnormal student behavior in class. The new method consists of an encoder module, a detection module, and a decoder module. The encoder module extracts the feature information of students' behavior images and transfers it to the detection module. The behavior detection module obtains more image information through the feature fusion group to reduce the color distortion and artifacts of the behavior image, and transfers the obtained image information to the deep normalization correction convolution block to reduce the covariate shift and make the model easier to train. The multi-path feature convolution block can obtain image information with richer texture details. Finally, the decoder module converts the low-dimensional feature mapping back to the high-dimensional original input space through deconvolution and up-sampling operations to obtain the behavior detection image.

Keywords: Abnormal behavior detection; Multimodal feature fusion; encoder-decoder; Multi-path feature convolution block

Exploration of dual-attention mechanism-based deep learning for multi-step-ahead flood probabilistic forecasting

JOURNAL OF HYDROLOGY, 2023, vol. 622, Part B

Authors: Cui, Zhen; Guo, Shenglian; Zhou, Yanlai; Wang, Jun

Affiliations: Wuhan Univ State Key Lab Water Resources & Hydropower Engn Sc Wuhan 430072 Peoples R China

To improve flood forecasting accuracy and reflect forecast uncertainty information in the Three Gorges Reservoir (TGR) interval-basin in China, this study integrates the feature and temporal dual-attention (DA) mechanism and recursive encoder-decoder (RED) structure into the long short-term memory (LSTM) neural network to develop a DA-LSTM-RED model. The feature attention acts on the input variables of the encoder, and the temporal attention mechanism acts on the hidden layer states extracted by the LSTM neural network during the encoding process, prompting the proposed model to extract critical input information among different types and moments of input variables to improve the multi-step-ahead flood forecasting accuracy. Second, the copula-based Hydrological Uncertainty Processor (copula-HUP) is used to quantify the forecast uncertainty of the proposed model while creating multi-step-ahead flood probabilistic forecasts. Combining the long-term 6 h hydrologic data series of the Xiangjiaba-TGR interval-basin and the forecasted precipitation from the European Centre for Medium-Range Weather Forecasts (ECMWF), the effectiveness of the proposed model, the effect of forecast precipitation on multi-step-ahead flood forecasting, and the effect of different copula functions on the probabilistic forecast of copula-HUP are investigated. The results show that the DA-LSTM-RED model can effectively improve the forecasting accuracy for long forecast horizons (3-7 d) compared to the LSTM-RED model, and the average absolute error metrics are reduced by 10%-17%. Meanwhile, the proposed model can identify input variables with a high correlation with the target output variables, which improves the interpretability of deep learning to a certain extent. The Student copula-HUP has lower RB and CRPS metrics than the Frank and Gaussian copula-HUP and can better quantify the DA-LSTM-RED model's forecast uncertainty. Therefore, combining the proposed mo

Keywords: Dual-attention mechanism; encoder-decoder; Deep learning; Hydrological uncertainty processor; Copula function; Probabilistic forecasting

Deep Learning-Based Standard Sign Language Discrimination

IEEE ACCESS, 2023, vol. 11, pp. 125822-125834

Authors: Zhang, Menglin; Yang, Shuying; Zhao, Min

Affiliations: Tianjin Univ Technol Sch Comp Sci & Engn Tianjin 300384 Peoples R China; Minist Educ Key Lab Comp Vis & Syst Tianjin 300384 Peoples R China

General sign language recognition models are designed only for recognizing categories, i.e., such models do not discriminate between standard and nonstandard sign language actions made by learners. They are therefore inadequate for use in sign language education software. To address this issue, this paper proposed a sign language category and standardization correctness discrimination model for sign language education. The proposed model is implemented with a hand detection method and a standard sign language discrimination method. For hand detection, the proposed method utilizes flow-guided features and acquires relevant proposals using stable and flow key frame detections. This model can resolve the inconsistency between the forward optical flow and the box center point offset. In addition, the proposed method employs an encoder-decoder model structure for sign language correctness discrimination. The encoder model combines 3D convolution and 2D deformable convolution results with residual structures, and it implements a sequence attention mechanism. A Sign Language Correctness Discrimination dataset (SLCD dataset) was also constructed in this study. In this dataset, each sign language video has two recognition labels, i.e., sign language category and standardization category. A semi-supervised learning method was employed to generate pseudo hand position labels. The hand detection model achieved sufficiently high hand detection results. The sign language correctness discrimination model was tested with hand patches or full images. The SLCD dataset is available at https://***/10.21227/p9sn-dz70.

Keywords: Sign language; Encoding; Video coding; Object detection; Three-dimensional displays; Convolution; Gesture recognition; Assistive technologies; Semisupervised learning; Educational courses; Software packages; Continuous sign language recognition; encoder-decoder; tubelet; video object detection; 3D convolution

A novel method for cage whirl motion capture of high-precision bearing inspired by U-Net

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, vol. 117, Part A, p. 105552

Authors: Niu, Xiaoliang; Yang, Zhaohui; Zhou, Ningning; Li, Chonghe

Affiliations: Northwestern Polytech Univ Sch Aeronaut Xian 710072 Peoples R China; Northwestern Polytech Univ Shenzhen Res & Dev Inst Shenzhen 518057 Peoples R China; Beijing Inst Control Engn Beijing Key Lab Long Life Technol Precise Rotat & Beijing 100094 Peoples R China; Aviat Ind Dev Res Ctr China AVICADR Res Inst Management Engn Beijing Peoples R China

To solve the problem of cage whirl motion capture and evaluation, this paper developed an efficient non-contact measurement method based on semantic segmentation technology. An encoder-decoder network whose backbone is U-Net is constructed by introducing residual learning and an attention mechanism for cage motion state segmentation. A random move augmentation strategy is used to simulate the random movement of the cage mass center. The network is trained with 1368 high-speed cage rotational images using the augmentation strategy. Additionally, 150 images form the validation set, and 5000 images under different operating conditions form the test set. The trained network is applied to cage whirl motion capture under different operating conditions by matching the suitable parameters during the training phase. The results show that our method effectively predicts the trend of cage whirl motion, with the predicted cage whirl orbit used for the accurate analysis of cage rotational stability.

Keywords: High-precision bearing; Semantic segmentation; encoder-decoder; Random move augmentation; Cage whirl motion

Multi-scale boundary neural network for gastric tumor segmentation

VISUAL COMPUTER, 2023, vol. 39, no. 3, pp. 915-926

Authors: Wang, Pengfei; Li, Yunqi; Sun, Yaru; He, Dongzhi; Wang, Zhiqiang

Affiliations: Beijing Univ Technol Fac Informat Technol Beijing Peoples R China; Chinese Peoples Liberat Army Gen Hosp Med Ctr 2 Dept Gastroenterol Beijing 100853 Peoples R China; Chinese Peoples Liberat Army Gen Hosp Natl Clin Res Ctr Geriatr Dis Beijing 100853 Peoples R China; Chinese Peoples Liberat Army Gen Hosp Med Ctr 1 Dept Gastroenterol Beijing 100853 Peoples R China

At present, gastric cancer patients account for a large proportion of all tumor patients. Gastric tumor image segmentation can provide a reliable additional basis for the clinical analysis and diagnosis of gastric cancer. However, the existing gastric cancer image datasets have disadvantages such as small data sizes and difficulty in labeling. Moreover, most existing CNN-based methods are unable to generate satisfactory segmentation masks without accurate labels, which is due to the limited context information and insufficiently discriminative feature maps obtained after consecutive pooling and convolution operations. This paper presents a gastric cancer lesion dataset for gastric tumor image segmentation research. A multiscale boundary neural network (MBNet) is proposed to automatically segment the real tumor area in gastric cancer images. MBNet adopts an encoder-decoder architecture. In each stage of the encoder, a boundary extraction refinement module is first proposed to obtain and refine multi-granular edge information. Then, we build a selective fusion module to selectively fuse features from the different stages. By cascading the two modules, the richer context and fine-grained features of each stage are encoded. Finally, the atrous spatial pyramid pooling is improved to obtain the remote dependency relationship of the overall context and the fine spatial structure information. The experimental results show that the accuracy of the model reaches 92.3%, the similarity coefficient (DICE) reaches 86.9%, and the performance of the proposed method on the CVC-ClinicDB and Kvasir-SEG datasets also outperforms existing approaches.
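The Dice similarity coefficient and accuracy figures reported above are standard segmentation scores. A minimal sketch of how they are computed for binary masks follows; the masks and function names are made up for illustration and are unrelated to the paper's dataset or code.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |P & T| / (|P| + |T|) for flat binary masks (0/1 values)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly, so Dice is defined as 1.0 here.
    return 1.0 if total == 0 else 2.0 * intersection / total


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


# Hypothetical flattened masks (1 = tumor pixel), for illustration only.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]

dice = dice_coefficient(pred_mask, true_mask)
acc = pixel_accuracy(pred_mask, true_mask)
print(f"Dice = {dice:.3f}, accuracy = {acc:.3f}")
```

Dice only counts foreground overlap, so it is usually the stricter of the two scores when the lesion occupies a small part of the image.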

Keywords: Gastric tumor segmentation; encoder-decoder; Convolutional neural network; Deep learning

A forecast model of short-term wind speed based on the attention mechanism and long short-term memory

MULTIMEDIA TOOLS AND APPLICATIONS, 2023, vol. 83, no. 15, p. 45603

Authors: Xing, Wang; Qi-liang, Wu; Gui-rong, Tan; Dai-li, Qian; Ke, Zhou

Affiliations: Nanjing Univ Informat Sci & Technol Natl Demonstrat Ctr Expt Atmospher Sci & Environm Nanjing Jiangsu Peoples R China; Nanjing Univ Informat Sci & Technol Sch Artificial Intelligence Nanjing Jiangsu Peoples R China; Nanjing Univ Aeronaut & Astronaut Coll Civil Aviat Nanjing Jiangsu Peoples R China

Gales are a kind of disastrous weather, and wind speed forecasting is a difficult point in operational weather forecasting. In this study, we propose a method to forecast the time series of wind speed in a future period at the target station by using the time series of wind speed in the past period at the target station and its adjacent stations. This method is established by using deep learning technology. Based on the encoder-decoder architecture, the driving series at the adjacent stations and the target series at the target station are taken as the input of the encoder module and the decoder module, respectively. There are two attention layers in the encoder module. One is used to strengthen the contribution of each influence factor in the input driving series to the hidden state in the long short-term memory (LSTM) layer. The other is used to enable the encoder to adaptively select the hidden state output by the LSTM layer. A loss function based on the Gaussian kernel function is adopted in the forecast model of this study, and a dynamic weight is designed to optimize the attention to the errors of the output results at different forecast lead times in the training process of the neural network model, thus improving the model forecast performance for longer forecast lead times. The results show that the performance of this method is excellent in wind speed forecasts from T+1 to T+24. The mean absolute error and root mean squared error of the forecast results at T+24 are 0.796 m/s and 1.029 m/s, respectively, which are better than those of the other two models in the experiment. This shows that the method proposed in this study can not only be applied to wind speed forecasting but can also provide technical support for operational applications such as early warning of gale disasters and wind power prediction.

Keywords: Wind speed forecasting; Attention mechanism; encoder-decoder; Long short-term memory; Deep learning; Dynamic weight

Short-term traffic flow prediction model based on a shared weight gate recurrent unit neural network

PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2023, vol. 618

Authors: Sun, Xiaoyong; Chen, Fenghao; Wang, Yuchen; Lin, Xuefen; Ma, Weifeng

Affiliations: Zhejiang Univ Sci & Technol Sch Informat & Elect Engn Hangzhou 310023 Peoples R China

Accurate traffic flow prediction is critical for enhancing traffic network operational efficiency. With the continuous expansion of traffic networks, providing reliable and efficient multi-step traffic flow prediction for large-scale traffic networks with a large number of deployed sensors has become a challenging issue. In this paper, we propose a multi-step many-to-many traffic prediction model for large-scale traffic networks, called spatio-temporal Shared GRU (STSGRU), which receives inputs from multiple sensors and provides predictions for all sensors simultaneously. First, we model the weekly pattern of traffic flow, using periodicity to explore long-term temporal features and provide smooth traffic flow to reduce the impact of data volatility. Second, unlike existing models, we propose a shared weight mechanism to achieve many-to-many prediction without mapping traffic networks to images or graph structures. The proposed model strikes a delicate balance between complexity and accuracy. We validate the effectiveness of the proposed method on the Caltrans Performance Measurement System (PeMS) dataset. The results show that our model achieves prediction performance similar to that of advanced graph neural networks and has higher flexibility. © 2023 Elsevier B.V. All rights reserved.

Keywords: Traffic flow prediction; Gated recurrent unit; encoder-decoder; Periodicity; Shared weight
