Search Results - Inner Mongolia University Library

49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Song, Kai; Wang, Zhengtan; Dai, Huhe; Zheng, Yuan (Inner Mongolia Univ, Hohhot, Peoples R China)

ISBN: (print) 9798350344868; 9798350344851

Attention mechanisms, especially spatial self-attention, are widely adopted in existing scene parsing methods because of their excellent performance. However, spatial self-attention has high computational complexity, which limits the practical application of scene parsing methods on resource-constrained mobile devices. In view of this, we propose a simple yet effective spatial attention module, namely the Content-Aware Attention Module (CA2M). Compared with various spatial self-attention modules, CA2M is lightweight, consisting of several convolution and pooling operations. Moreover, it can adaptively select the spatial pixel information that is helpful for the scene parsing task. With CA2M, we present a Content-aware Enhanced Network for scene parsing (CENet), in which CA2M is introduced into the lateral connections at four different scales, resulting in semantic alignment at adjacent scales and effective semantic propagation. To validate the proposed CA2M and CENet, we conduct extensive experiments and achieve consistently improved performance on three popular benchmarks. Furthermore, we verify their generalization ability with different baseline models and backbone networks. Code is available at https://***/ZY-IMU-CV/CENET_SK_2023.

Keywords: Spatial attention; image semantic segmentation; encoder-decoder
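The CA2M abstract says only that the module is built from convolution and pooling operations rather than self-attention; it does not give the exact layer layout. As a rough illustration, the sketch below assumes a CBAM-style spatial attention: average- and max-pool across channels at every pixel, mix the two maps with a small convolution, squash with a sigmoid, and rescale every channel by the resulting mask. The helper names and kernel weights are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List

Map = List[List[float]]   # one H x W map
Tensor = List[Map]        # C x H x W feature tensor


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conv2d_same(maps: List[Map], kernels: List[Map], bias: float = 0.0) -> Map:
    """Sum of per-input k x k convolutions with zero 'same' padding (one output map)."""
    h, w = len(maps[0]), len(maps[0][0])
    k = len(kernels[0])
    r = k // 2
    out = [[bias] * w for _ in range(h)]
    for m, ker in zip(maps, kernels):
        for i in range(h):
            for j in range(w):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - r, j + dj - r
                        if 0 <= y < h and 0 <= x < w:
                            out[i][j] += ker[di][dj] * m[y][x]
    return out


def spatial_attention(feat: Tensor, kernels: List[Map], bias: float = 0.0) -> Tensor:
    """Reweight every pixel by a content-dependent mask in (0, 1)."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # Pool across channels: one average map and one max map.
    avg_map = [[sum(feat[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    max_map = [[max(feat[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    # A small convolution mixes the two pooled maps into one logit per pixel.
    logits = conv2d_same([avg_map, max_map], kernels, bias)
    mask = [[sigmoid(v) for v in row] for row in logits]
    # Broadcast the same spatial mask over all channels.
    return [[[feat[ch][i][j] * mask[i][j] for j in range(w)] for i in range(h)] for ch in range(c)]
```

Because the mask needs one pass over the pixels plus a k x k convolution, its cost grows linearly with H x W, whereas spatial self-attention compares every pixel with every other pixel (quadratic in H x W); that is the complexity gap the abstract points to.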

A stroke of genius: Predicting the next move in badminton

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ibh, Magnus; Grasshof, Stella; Hansen, Dan Witzner (IT Univ Copenhagen, Machine Learning Grp, Copenhagen, Denmark)

ISBN: (print) 9798350365474

This paper presents RallyTemPose, a transformer encoder-decoder model for predicting future badminton strokes based on previous rally actions. The model uses court position, skeleton poses, and player-specific embeddings to learn stroke- and player-specific latent representations in a spatiotemporal encoder module. The representations are then used to condition the subsequent strokes in a decoder module through rally-aware fusion blocks, which provide additional relevant strategic and technical considerations to make more informed predictions. RallyTemPose shows improved forecasting accuracy compared with traditional sequential methods on two real-world badminton datasets. The performance boost can also be attributed to improved stroke embeddings extracted from the latent representation of a pre-trained large language model given detailed text descriptions of the strokes. In the discussion, the latent representations learned by the encoder module show useful properties for player analysis and comparison. The code can be found at: This https url.

Keywords: Action Forecasting; Computer Vision; encoder-decoder; Skeleton-data; Sports Application
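RallyTemPose predicts future strokes from previous rally actions with a transformer decoder. In a standard transformer decoder this ordering is enforced by a causal mask, so the representation for stroke t can only attend to strokes 0..t. The minimal sketch below shows masked scaled dot-product attention on toy vectors; it does not reproduce the paper's spatiotemporal encoder or rally-aware fusion blocks, and the shapes are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per stroke in the rally


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]  # exp(-inf) == 0.0 for masked entries
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention in which stroke t sees only strokes 0..t."""
    d = len(q[0])
    out = []
    for t, q_t in enumerate(q):
        scores = []
        for s, k_s in enumerate(k):
            if s > t:
                scores.append(float("-inf"))  # future stroke: masked out
            else:
                scores.append(sum(a * b for a, b in zip(q_t, k_s)) / math.sqrt(d))
        weights = softmax(scores)
        out.append([sum(weights[s] * v[s][j] for s in range(len(v))) for j in range(len(v[0]))])
    return out
```

Changing a later stroke's value never changes the outputs for earlier strokes, which is what lets the decoder be trained on whole rallies while still forecasting one stroke ahead.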

A Federated Graph to Embedding Approach for Knowledge Graph Completion

49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Sun, Hongliang; Bi, Xiaofeng; Sui, Dianbo; Tu, Zhiying (Harbin Inst Technol, Weihai, Peoples R China)

ISBN: (print) 9798350344868; 9798350344851

Knowledge graph completion (KGC) tasks have been developed to address the inherent incompleteness of KGs. Recently, knowledge graph embedding (KGE) methods, which embed entities and relations, have gained popularity and proven effective for KGC. However, privacy concerns make it challenging to collect private KG data from different institutions in real applications. Federated learning has emerged as a solution for training models on decentralized data without collecting private data. However, existing federated KGE methods overlook the implicit graph structural information of entities and relations, resulting in fragmented and incomplete representations within federated clients. Moreover, these methods often struggle to capture multiple relational representations. To address these challenges, we propose a Federated Graph to Embedding (FedGE) approach based on an encoder-decoder architecture to capture interactions among entities and relations. Extensive experiments on two common KG datasets demonstrate the superiority of our method. The code is available at https://***/s460305450/***.

Keywords: knowledge graph completion; federated learning; privacy protection; encoder-decoder; graph convolutional network
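The FedGE abstract does not say how client updates are combined on the server. A common federated-learning baseline is FedAvg-style weighted averaging, sketched below for entity embeddings under assumed conditions: each client shares only embedding vectors (never its private triples), entities are matched across clients by a shared identifier, and each client's weight is, for example, its number of local triples. This is a swapped-in standard technique, not FedGE's encoder-decoder method, and the function name is hypothetical.

```python
from typing import Dict, List

Embedding = List[float]


def fedavg_embeddings(clients: List[Dict[str, Embedding]],
                      weights: List[float]) -> Dict[str, Embedding]:
    """Weighted average of each entity's embedding over the clients that hold it."""
    sums: Dict[str, Embedding] = {}
    totals: Dict[str, float] = {}
    for table, w in zip(clients, weights):
        for entity, vec in table.items():
            if entity not in sums:
                sums[entity] = [0.0] * len(vec)
                totals[entity] = 0.0
            # Accumulate the weighted vector; raw triples never leave the client.
            sums[entity] = [s + w * x for s, x in zip(sums[entity], vec)]
            totals[entity] += w
    return {e: [s / totals[e] for s in sums[e]] for e in sums}
```

An entity held by only one client keeps that client's embedding unchanged, while shared entities are pulled toward a common representation.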

Deep Compressed Sensing-Based Cascaded Channel Estimation for RIS-Aided Communication Systems

IEEE WIRELESS COMMUNICATIONS LETTERS, 2022, Vol. 11, No. 4, pp. 846-850

Authors: Xie, Wenwu; Xiao, Jian; Zhu, Peng; Yu, Chao; Yang, Liang (Hunan Inst Sci & Technol, Sch Informat Sci & Engn, Yueyang 414006, Peoples R China; Adv Cryptog & Syst Secur Key Lab Sichuan Prov, Chengdu 610103, Peoples R China)

To reduce the pilot overhead of cascaded channel estimation for RIS-aided massive MIMO communication systems, we propose a deep compressed sensing-based channel estimation scheme in which a U-shaped network (U-Net), an encoder-decoder with skip connections, recovers the high-dimensional cascaded channel matrix from limited pilot overhead. The skip connections between the encoder and decoder fuse features of different scales and semantics by concatenating feature maps, which enhances the reconstruction performance for the cascaded channel. To further improve the feature extraction ability of U-Net, we design a ResU-Net architecture with stacked residual units to increase the network depth. Simulation results show that ResU-Net estimates the channel more accurately than conventional algorithms and other network models. Meanwhile, ResU-Net generalizes well and is robust to different pilot lengths and phase quantization errors.

Keywords: Channel estimation; Azimuth; Quantization (signal); Estimation; Sensors; Compressed sensing; Chaos; Reconfigurable intelligent surface; channel estimation; deep compressed sensing; encoder-decoder
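The two building blocks named in the ResU-Net abstract, residual units (y = x + F(x)) and encoder-decoder skip connections that concatenate feature maps, can be sketched in plain Python. For brevity the sketch assumes 1-D feature maps and a per-channel (depthwise) F made of convolution, ReLU and convolution; the actual network works on 2-D feature maps of the channel matrix, and the kernel values are hypothetical.

```python
from typing import List

Feature = List[List[float]]  # channels x length (1-D maps for brevity)


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1-D convolution (cross-correlation) with zero 'same' padding."""
    r, n = len(kernel) // 2, len(x)
    return [
        sum(kernel[k] * x[i + k - r] for k in range(len(kernel)) if 0 <= i + k - r < n)
        for i in range(n)
    ]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def residual_unit(x: Feature, k1: List[float], k2: List[float]) -> Feature:
    """y = x + F(x), where F = conv -> ReLU -> conv, applied channel by channel."""
    return [
        [a + b for a, b in zip(ch, conv1d_same(relu(conv1d_same(ch, k1)), k2))]
        for ch in x
    ]


def skip_concat(encoder_feat: Feature, decoder_feat: Feature) -> Feature:
    """U-Net skip connection: stack encoder and decoder maps along the channel axis."""
    assert all(len(ch) == len(decoder_feat[0]) for ch in encoder_feat), "spatial sizes must match"
    return encoder_feat + decoder_feat
```

If F learns nothing useful (all-zero second kernel), the residual unit reduces to the identity, which is why stacking such units can deepen the network without making it harder to optimize.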

Efficient Lung Segmentation for Tumour Detection

19th International Symposium on Visual Computing

Authors: Hiraman, Anura; Viriri, Serestina; Gwetu, Mandlenkosi (Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Durban, South Africa)

ISBN: (print) 9783031773914; 9783031773921

Over the past decade, deep learning has significantly impacted medical imaging, particularly segmentation tasks. Encoder-decoder methods have advanced medical image segmentation by restoring feature map resolution and minimizing information loss during decoding. This paper introduces three key contributions: (1) a region of interest (RoI) detection method that isolates the lung region by classifying CT scan slices using two convolutional neural networks, one for slices above the lungs; (2) an architecture inspired by U-Net that employs a convolutional encoder-decoder-encoder-decoder pattern to enhance learning by feeding the output feature maps of the first decoder into the second encoder; and (3) concatenation layers between the encoder-decoder networks that reintroduce important features lost in the initial structure, improving the learning of complex features. Evaluation results demonstrate the effectiveness of these methods for lung segmentation, with an average DSC of 92.68%.

Keywords: CNN; encoder-decoder; Lung segmentation; Region of interest
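The lung segmentation results are reported as an average DSC (Dice similarity coefficient), DSC = 2|P ∩ T| / (|P| + |T|) for a predicted mask P and a ground-truth mask T. A minimal sketch on binary masks follows; averaging per case is an assumption, since the abstract does not say what the average runs over, and the small eps term (also an assumption) only keeps two empty masks from dividing by zero.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask: 1 = lung, 0 = background


def dice(pred: Mask, truth: Mask, eps: float = 1e-7) -> float:
    """DSC = 2|P & T| / (|P| + |T|); two empty masks score 1.0."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return (2.0 * inter + eps) / (total + eps)


def mean_dice(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average DSC over (prediction, ground truth) pairs, e.g. one pair per case."""
    scores = [dice(p, t) for p, t in pairs]
    return sum(scores) / len(scores)
```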

Split, Embed and Merge: An accurate table structure recognizer

PATTERN RECOGNITION, 2022, Vol. 126, p. 108565

Authors: Zhang, Zhenrong; Zhang, Jianshu; Du, Jun; Wang, Fengren (Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, 96 JinZhai Rd, Hefei, Anhui, Peoples R China; IFLYTEK Res, Hefei, Anhui, Peoples R China)

Table structure recognition is an essential part of making machines understand tables. Its main task is to recognize the internal structure of a table. However, due to the complexity and diversity of table structures and styles, it is very difficult to parse tabular data into a structured format that machines can understand, especially for complex tables. In this paper, we introduce Split, Embed and Merge (SEM), an accurate table structure recognizer. SEM is mainly composed of three parts: a splitter, an embedder and a merger. In the first stage, we apply the splitter to predict the potential regions of the table row/column separators and obtain the fine grid structure of the table. In the second stage, taking full account of the textual information in the table, we fuse the output features for each table grid from both the vision and text modalities. Moreover, providing additional textual features yields higher precision in our experiments. Finally, we merge these basic table grids in a self-regression manner, with the merging results learned through the attention mechanism. In our experiments, SEM achieves an average F1-Measure of 97.11% on the SciTSR dataset, outperforming other methods by a large margin. We also won first place for complex tables and third place for all tables in Task-B of the ICDAR 2021 Competition on Scientific Literature Parsing. Extensive experiments on other publicly available datasets further demonstrate the effectiveness of our proposed approach. (c) 2022 Elsevier Ltd. All rights reserved.

Keywords: Table structure recognition; Self-regression; Attention mechanism; encoder-decoder
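SEM's headline number is an F1-Measure on SciTSR. A minimal sketch, assuming the metric is computed over sets of cell-adjacency relations (our reading of the SciTSR evaluation protocol; the tuple encoding of a relation below is hypothetical): precision = |P ∩ G| / |P|, recall = |P ∩ G| / |G|, and F1 is their harmonic mean.

```python
from typing import Set, Tuple

# (cell_a, cell_b, direction): hypothetical encoding of one adjacency relation
Relation = Tuple[str, str, str]


def f1_measure(pred: Set[Relation], gold: Set[Relation]) -> float:
    """Harmonic mean of precision and recall over predicted vs. gold relations."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, 3 predicted relations of which 2 are correct, against 4 gold relations, give P = 2/3, R = 1/2 and F1 = 4/7.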

Improving offline handwritten Chinese text recognition with glyph-semanteme fusion embedding

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, Vol. 13, No. 2, pp. 485-496

Authors: Zhan, Hongjian; Lyu, Shujing; Lu, Yue (Shanghai Key Lab Multidimens Informat Proc, Shanghai 200241, Peoples R China; East China Normal Univ, Sch Commun & Elect Engn, Shanghai 200062, Peoples R China)

In this paper, we propose the Glyph-Semanteme fusion Embedding (GSE) for Chinese characters and apply it to Offline Handwritten Chinese Text Recognition (offline-HCTR). It is well known that the number of Chinese characters is very large and that their glyphs are complex, but few researchers realize that the underlying reason is that Chinese is an ideographic script, which implies correlations between the glyph and the semanteme of a character. To exploit this property and create better representations for Chinese characters, we first extract the glyph embedding and semanteme embedding of each Chinese character; we then propose a parameterized gated fusion strategy that automatically computes the Glyph-Semanteme fusion Embedding of each character by fusing its glyph and semanteme embeddings. We apply the proposed GSE to an attention-based encoder-decoder network for the offline-HCTR task. Furthermore, two kinds of GSE, Character-level GSE (CGSE) and Text-level GSE (TGSE), are applied in the decoder phase to yield the predictions. On the standard ICDAR-2013 HCTR competition benchmark, the proposed method achieves 96.65% character-level recognition accuracy, which demonstrates the effectiveness of the proposed glyph-semanteme fusion embedding.

Keywords: Offline Handwritten Chinese text recognition; Glyph embedding; Semanteme embedding; Embedding fusion; encoder-decoder
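The GSE abstract describes a parameterized gated fusion of a character's glyph embedding and semanteme embedding but not its exact form. One common gated-fusion formulation, assumed here, computes a gate g = sigmoid(W_g e_glyph + W_s e_sem + b) and outputs g * e_glyph + (1 - g) * e_sem element-wise, so each dimension learns how much to trust each source. The weight names w_g, w_s and b are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gated_fusion(glyph: Vector, sem: Vector, w_g: Matrix, w_s: Matrix, b: Vector) -> Vector:
    """Element-wise convex mix of glyph and semanteme embeddings via a learned gate."""
    pre = [x + y + z for x, y, z in zip(matvec(w_g, glyph), matvec(w_s, sem), b)]
    gate = [1.0 / (1.0 + math.exp(-p)) for p in pre]  # each gate value lies in (0, 1)
    return [g * x + (1.0 - g) * y for g, x, y in zip(gate, glyph, sem)]
```

With all parameters at zero the gate is 0.5 and the result is the plain average; a large positive bias drives the output toward the glyph embedding alone.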

PhyCRNet: Physics-informed convolutional-recurrent network for solving spatiotemporal PDEs

COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2022, Vol. 389, p. 114399

Authors: Ren, Pu; Rao, Chengping; Liu, Yang; Wang, Jian-Xun; Sun, Hao (Northeastern Univ, Dept Civil & Environm Engn, Boston, MA 02115, USA; Northeastern Univ, Dept Mech & Ind Engn, Boston, MA 02115, USA; Univ Notre Dame, Dept Aerosp & Mech Engn, Notre Dame, IN 46556, USA; Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing 100872, Peoples R China; Beijing Key Lab Big Data Management & Anal Method, Beijing 100872, Peoples R China; MIT, Dept Civil & Environm Engn, 77 Massachusetts Ave, Cambridge, MA 02139, USA)

Partial differential equations (PDEs) play a fundamental role in modeling and simulating problems across a wide range of disciplines. Recent advances in deep learning have shown the great potential of physics-informed neural networks (PINNs) to solve PDEs as a basis for data-driven modeling and inverse analysis. However, the majority of existing PINN methods, based on fully-connected NNs, pose intrinsic limitations to low-dimensional spatiotemporal parameterizations. Moreover, since the initial/boundary conditions (I/BCs) are softly imposed via penalty, the solution quality heavily relies on hyperparameter tuning. To this end, we propose novel physics-informed convolutional-recurrent learning architectures (PhyCRNet and PhyCRNet-s) for solving PDEs without any labeled data. Specifically, an encoder-decoder convolutional long short-term memory network is proposed for low-dimensional spatial feature extraction and temporal evolution learning. The loss function is defined as the aggregated discretized PDE residuals, while the I/BCs are hard-encoded in the network to ensure forcible satisfaction (e.g., periodic boundary padding). The networks are further enhanced by autoregressive and residual connections that explicitly simulate time marching. The performance of the proposed methods has been assessed by solving three nonlinear PDEs (i.e., the 2D Burgers' equations and the lambda-omega and FitzHugh-Nagumo reaction-diffusion equations) and compared against state-of-the-art baseline algorithms. The numerical results demonstrate the superiority of the proposed methodology in terms of solution accuracy, extrapolability and generalizability. (C) 2021 Elsevier B.V. All rights reserved.

Keywords: Convolutional-recurrent learning; Partial differential equations; encoder-decoder; Physics-informed deep learning; Residual connection; Hard-encoding of I/BCs
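Three ideas from the PhyCRNet abstract (hard-encoding a periodic boundary by padding, a loss built from discretized PDE residuals instead of labels, and residual connections that mimic time marching) can be illustrated on a simple model problem. The sketch below assumes the 1-D heat equation u_t = nu * u_xx with a central-difference stencil, and uses a forward-Euler update as a stand-in for the learned ConvLSTM output; it is not the authors' architecture, and all function names are hypothetical.

```python
from typing import List

Field = List[float]  # 1-D grid values u(x_i) at one time step


def periodic_pad(u: Field, width: int = 1) -> Field:
    """Hard-encode a periodic boundary by wrapping values from the opposite edge."""
    return u[-width:] + u + u[:width]


def laplacian(u: Field, dx: float) -> Field:
    """Second-order central difference for u_xx on the periodically padded grid."""
    p = periodic_pad(u, 1)
    return [(p[i - 1] - 2.0 * p[i] + p[i + 1]) / dx**2 for i in range(1, len(u) + 1)]


def euler_step(u: Field, nu: float, dt: float, dx: float) -> Field:
    """Residual time marching u_next = u + dt * update.

    Here update = nu * u_xx is a known stand-in for the learned network output.
    """
    return [a + dt * nu * b for a, b in zip(u, laplacian(u, dx))]


def pde_residual_loss(u_prev: Field, u_next: Field, nu: float, dt: float, dx: float) -> float:
    """Mean squared discretized residual of u_t - nu * u_xx = 0 (no labeled data needed)."""
    lap = laplacian(u_prev, dx)
    r = [(b - a) / dt - nu * lap_i for a, b, lap_i in zip(u_prev, u_next, lap)]
    return sum(x * x for x in r) / len(r)
```

An update consistent with the discretized PDE drives this loss to (numerically) zero, while a wrong update, such as leaving u unchanged, is penalized; that is how a residual loss can train a network without labeled solutions.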

A review of sign language recognition research

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, Vol. 43, No. 4, pp. 3879-3898

Authors: Yu, Ming; Jia, Jingli; Xue, Cuihong; Yan, Gang; Guo, Yingchun; Liu, Yuehao (Hebei Univ Technol, Sch Artificial Intelligence, Tianjin, Peoples R China; Tianjin Univ Technol, Tech Coll Deaf, Tianjin, Peoples R China)

Sign language is the primary means of communication between hard-of-hearing and hearing people, and sign language recognition helps deaf and hard-of-hearing people integrate better into society. We reviewed 95 studies on sign language recognition technology from 1993 to 2021, analyzing and comparing algorithms from three aspects (gesture, isolated word, and continuous sentence recognition) and describing the evolution of sign language acquisition equipment. We also summarized the datasets and evaluation criteria used in sign language recognition research. Finally, the main technology trends are discussed and future challenges are analyzed.

Keywords: Sign language recognition; convolutional neural network; encoder-decoder; dataset

Recurrent Attention and Semantic Gate for Remote Sensing Image Captioning

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, Vol. 60, p. 1

Authors: Li, Yunpeng; Zhang, Xiangrong; Gu, Jing; Li, Chen; Wang, Xin; Tang, Xu; Jiao, Licheng (Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China; Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Xian 710049, Peoples R China)

Remote sensing image captioning has attracted widespread attention in the remote sensing field because of its application potential. However, most existing approaches model only limited interactions between image content and sentences and fail to exploit the special characteristics of remote sensing images. In this article, we introduce a novel recurrent attention and semantic gate (RASG) framework for remote sensing image captioning, which integrates competitive visual features and a recurrent attention mechanism to generate a better context vector for the image at every time step and enhances the representation of the current word state. Specifically, we first project each image into competitive visual features by taking advantage of both static visual features and multiscale features. Then, a novel recurrent attention mechanism is developed to extract high-level attentive maps from encoded features and nonvisual features, which helps the decoder recognize and focus on the effective information for understanding the complex content of remote sensing images. Finally, the hidden states from the long short-term memory (LSTM) and other semantic references are incorporated into a semantic gate, which contributes to more comprehensive and precise semantic understanding. Comprehensive experiments on three widely used datasets, Sydney-Captions, UCM-Captions, and Remote Sensing Image Captioning Dataset, demonstrate the superiority of the proposed RASG over a series of attention-based image captioning methods.

Keywords: Feature extraction; Semantics; Visualization; Remote sensing; Logic gates; Decoding; Neural networks; Attention mechanism; encoder-decoder; remote sensing image captioning; semantic understanding
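The RASG abstract builds on the attention step common to encoder-decoder captioners: at each time step the decoder's hidden state scores every encoded image region, the scores are softmax-normalized, and the weighted sum of region features becomes the context vector. The sketch below assumes simple dot-product scoring and toy vectors; RASG's recurrent attention and semantic gate add machinery that is not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def soft_attention(hidden: Vector, regions: List[Vector]) -> Tuple[List[float], Vector]:
    """Return (attention weights, context vector) for one decoding step."""
    scores = [sum(h * a for h, a in zip(hidden, r)) for r in regions]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]
    # Context = attention-weighted sum of region features.
    context = [sum(w * r[j] for w, r in zip(alpha, regions)) for j in range(len(regions[0]))]
    return alpha, context
```

A hidden state aligned with one region concentrates almost all weight there, which is the "focus on the effective information" behaviour the abstract describes.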
