ISBN (print): 9798350344868; 9798350344851
Attention mechanisms, especially spatial self-attention, are widely adopted in scene parsing methods because of their strong performance. However, spatial self-attention has high computational complexity, which limits the practical use of scene parsing methods on resource-limited mobile devices. In view of this, we propose a simple yet effective spatial attention module, the Content-Aware Attention Module (CA2M). Unlike the various spatial self-attention modules, CA2M is a lightweight spatial attention module that consists of several convolution and pooling operations. Moreover, it adaptively selects the spatial pixel information that is helpful for the scene parsing task. With CA2M, we present a Content-aware Enhanced Network for scene parsing (CENet), in which CA2M is introduced into the lateral connections at four different scales, resulting in semantic alignment between adjacent scales and effective semantic propagation. To validate the proposed CA2M and CENet, we conduct extensive experiments and achieve consistently improved performance on three popular benchmarks. Furthermore, we verify their generalization ability with different baseline models and backbone networks. Code is available at https://***/ZY-IMU-CV/CENET_SK_2023.
ISBN (print): 9798350365474
This paper presents RallyTemPose, a transformer encoder-decoder model for predicting future badminton strokes based on previous rally actions. The model uses court position, skeleton poses, and player-specific embeddings to learn stroke-specific and player-specific latent representations in a spatiotemporal encoder module. The representations are then used to condition the subsequent strokes in a decoder module through rally-aware fusion blocks, which supply additional strategic and technical context for more informed predictions. RallyTemPose shows improved forecasting accuracy compared to traditional sequential methods on two real-world badminton datasets. The performance boost can also be attributed to improved stroke embeddings extracted from the latent representation of a pre-trained large language model given detailed text descriptions of the strokes. In the discussion, the latent representations learned by the encoder module show useful properties for player analysis and comparison. The code can be found at: This https url.
ISBN (print): 9798350344868; 9798350344851
Knowledge graph completion (KGC) tasks have been developed to address the inherent incompleteness of knowledge graphs (KGs). Recently, knowledge graph embedding (KGE) methods have gained popularity for embedding entities and relations and have proved effective in KGC. However, privacy concerns make it difficult to collect private KG data from different institutions in practical applications. Federated learning has emerged as a solution for training models on decentralized data, eliminating the need to collect private data. However, existing federated KGE methods overlook the implicit graph structural information of entities and relations, resulting in fragmented and incomplete representations within federated clients. Moreover, these methods often struggle to capture multiple relational representations. To address these challenges, we propose Federated Graph to Embedding (FedGE), an encoder-decoder-based approach that captures interactions among entities and relations. Extensive experiments on two common KG datasets demonstrate the superiority of our method. The code is available at https://***/s460305450/***.
To reduce the pilot overhead of cascaded channel estimation in RIS-aided massive MIMO communication systems, we propose a deep compressed sensing-based channel estimation scheme in which a U-shaped network (U-Net), an encoder-decoder with skip connections, recovers the high-dimensional cascaded channel matrix with limited pilot overhead. The skip connections between the encoder and decoder fuse features of different scales and semantics by concatenating the feature maps, which enhances the reconstruction performance for the cascaded channel. To further improve the feature extraction ability of U-Net, we design a ResU-Net architecture with stacked residual units that increase the depth of the network. Simulation results show that channel estimation with ResU-Net is more accurate than with conventional algorithms and other network models. ResU-Net also generalizes well and is robust to different pilot lengths and phase quantization errors.
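The concatenation-based skip connection described above can be sketched as follows. This is a minimal NumPy illustration, not the ResU-Net from the abstract: the function names `upsample2x` and `skip_concat`, the nearest-neighbour upsampling, and the toy feature-map shapes are all assumptions made for the example.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (channels, height, width) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_concat(decoder_feat, encoder_feat):
    """Upsample the decoder features and concatenate them with the
    same-resolution encoder features along the channel axis."""
    up = upsample2x(decoder_feat)
    if up.shape[1:] != encoder_feat.shape[1:]:
        raise ValueError("spatial sizes must match after upsampling")
    return np.concatenate([encoder_feat, up], axis=0)

# Hypothetical shapes: 8 encoder channels at 16x16, 16 decoder channels at 8x8.
encoder_feat = np.ones((8, 16, 16))
decoder_feat = np.zeros((16, 8, 8))
fused = skip_concat(decoder_feat, encoder_feat)
print(fused.shape)
```

Concatenation keeps the fine-scale encoder features and the coarse-scale decoder features side by side, so the following layers can weigh both when reconstructing the output.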
ISBN (print): 9783031773914; 9783031773921
Over the past decade, deep learning has significantly impacted medical imaging, particularly in segmentation tasks. Encoder-decoder methods have advanced medical image segmentation by restoring feature map resolution and minimizing information loss during decoding. This paper introduces three key contributions: a method for region of interest (RoI) detection that isolates the lung region by classifying CT scan slices using two convolutional neural networks, one for slices above the lungs; an architecture inspired by U-Net, which employs a convolutional encoder-decoder-encoder-decoder pattern to enhance learning by feeding output feature maps from the first decoder into the second encoder; and concatenation layers between the encoder-decoder networks that reintroduce important features lost in the initial structure, improving the learning of complex features. Evaluation results demonstrate the effectiveness of these methods for lung segmentation, achieving an average Dice similarity coefficient (DSC) of 92.68%.
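For readers unfamiliar with the reported metric, the DSC of two binary masks is 2|A ∩ B| / (|A| + |B|). The sketch below computes it with NumPy; the function name `dice_coefficient`, the convention of returning 1.0 for two empty masks, and the toy masks are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total

# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
score = dice_coefficient(pred, target)
print(round(score, 4))  # 0.6667
```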
Table structure recognition is essential for making machines understand tables. Its main task is to recognize the internal structure of a table. However, because tables are complex and diverse in structure and style, it is very difficult to parse tabular data into a structured format that machines can understand, especially for complex tables. In this paper, we introduce Split, Embed and Merge (SEM), an accurate table structure recognizer. SEM is composed of three main parts: a splitter, an embedder and a merger. In the first stage, we apply the splitter to predict the potential regions of the table row/column separators and obtain the fine grid structure of the table. In the second stage, taking full account of the textual information in the table, we fuse the output features for each table grid from both the vision and text modalities. Moreover, providing these additional textual features yields higher precision in our experiments. Finally, we merge these basic table grids in a self-regression manner. The corresponding merging results are learned through the attention mechanism. In our experiments, SEM achieves an average F1-Measure of 97.11% on the SciTSR dataset, which outperforms other methods by a large margin. We also won first place for complex tables and third place for all tables in Task-B of the ICDAR 2021 Competition on Scientific Literature Parsing. Extensive experiments on other publicly available datasets further demonstrate the effectiveness of our proposed approach. (c) 2022 Elsevier Ltd. All rights reserved.
In this paper, we propose the Glyph-Semanteme fusion Embedding (GSE) for Chinese characters and apply it to Offline Handwritten Chinese Text Recognition (offline-HCTR). It is well known that the number of Chinese characters is very large and that their glyphs are complex, but few researchers realize that the underlying reason is that Chinese is a form of ideogram, which indicates that there are correlations between the glyph and the semanteme of a character. To exploit this feature and create better representations for Chinese characters, we first extract the glyph embedding and the semanteme embedding for each Chinese character; we then propose a parameterized gated fusion strategy that automatically calculates the Glyph-Semanteme fusion Embedding for each character by fusing its glyph embedding and semanteme embedding. We apply the proposed GSE to an attention-based encoder-decoder network for the offline-HCTR task. Furthermore, two kinds of GSE, Character-level GSE (CGSE) and Text-level GSE (TGSE), are applied in the decoder phase to yield the predictions. On the standard ICDAR-2013 HCTR competition benchmark dataset, the proposed method achieves 96.65% character-level recognition accuracy, which demonstrates the effectiveness of the proposed glyph-semanteme fusion embedding.
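The abstract does not give the form of the parameterized gated fusion, so the following is only a sketch of one common gating scheme, assumed here for illustration: a sigmoid gate computed from the concatenated embeddings mixes the two embeddings element-wise. The names `gated_fusion`, `W` and `b`, the embedding size, and the random values are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(glyph, semanteme, W, b):
    """Element-wise convex combination of two embeddings, with the mixing
    gate computed from their concatenation (an assumed formulation)."""
    gate = sigmoid(W @ np.concatenate([glyph, semanteme]) + b)
    return gate * glyph + (1.0 - gate) * semanteme

rng = np.random.default_rng(0)
d = 4                                    # hypothetical embedding size
glyph = rng.normal(size=d)               # stand-in glyph embedding
semanteme = rng.normal(size=d)           # stand-in semanteme embedding
W = rng.normal(size=(d, 2 * d)) * 0.1    # gate weights (learned in practice)
b = np.zeros(d)                          # gate bias (learned in practice)
fused = gated_fusion(glyph, semanteme, W, b)
print(fused.shape)
```

Because the gate lies in (0, 1), each component of the fused vector lies between the corresponding glyph and semanteme components, so the model can favor one source per dimension.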
Partial differential equations (PDEs) play a fundamental role in modeling and simulating problems across a wide range of disciplines. Recent advances in deep learning have shown the great potential of physics-informed neural networks (PINNs) to solve PDEs as a basis for data-driven modeling and inverse analysis. However, most existing PINN methods are based on fully-connected NNs and are intrinsically limited to low-dimensional spatiotemporal parameterizations. Moreover, since the initial/boundary conditions (I/BCs) are softly imposed via penalty, the solution quality relies heavily on hyperparameter tuning. To this end, we propose novel physics-informed convolutional-recurrent learning architectures (PhyCRNet and PhyCRNet-s) for solving PDEs without any labeled data. Specifically, an encoder-decoder convolutional long short-term memory network is proposed for low-dimensional spatial feature extraction and temporal evolution learning. The loss function is defined as the aggregated discretized PDE residuals, while the I/BCs are hard-encoded in the network to ensure forcible satisfaction (e.g., periodic boundary padding). The networks are further enhanced by autoregressive and residual connections that explicitly simulate time marching. The performance of our proposed methods has been assessed by solving three nonlinear PDEs (2D Burgers' equations and the lambda-omega and FitzHugh-Nagumo reaction-diffusion equations) and compared against state-of-the-art baseline algorithms. The numerical results demonstrate the superiority of our proposed methodology in terms of solution accuracy, extrapolability and generalizability. (C) 2021 Elsevier B.V. All rights reserved.
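To make "periodic boundary padding" and "discretized PDE residual" concrete, here is a minimal NumPy sketch. It is not PhyCRNet: it uses a linear 2D diffusion equation u_t = nu * laplacian(u) as a stand-in for the nonlinear PDEs in the abstract, a forward-Euler time difference, and a five-point central-difference Laplacian. All function names and parameter values are assumptions for the example.

```python
import numpy as np

def periodic_pad(u, width=1):
    """Wrap-around padding, so periodic boundary conditions hold by construction."""
    return np.pad(u, width, mode="wrap")

def laplacian_periodic(u, dx):
    """Five-point central-difference Laplacian on a periodic 2D grid."""
    p = periodic_pad(u, 1)
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1]) / dx**2

def diffusion_residual(u_now, u_next, dt, dx, nu):
    """Discretized residual of u_t = nu * laplacian(u) between two time steps."""
    return (u_next - u_now) / dt - nu * laplacian_periodic(u_now, dx)

n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")
u = np.sin(X)                      # analytic Laplacian is -sin(X)
lap = laplacian_periodic(u, dx)
print(np.max(np.abs(lap + u)))     # small discretization error

# A physics-informed loss would aggregate the squared residual over the grid.
u_next = u + 0.1 * 0.5 * lap       # one explicit step with dt=0.1, nu=0.5
loss = np.mean(diffusion_residual(u, u_next, dt=0.1, dx=dx, nu=0.5) ** 2)
```

Because the padding wraps values from the opposite edge, no boundary penalty term is needed in the loss, which is the contrast with soft I/BC enforcement drawn in the abstract.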
Sign language is the primary way of communication between hard-of-hearing and hearing people. Sign language recognition helps promote the better integration of deaf and hard-of-hearing people into society. We reviewed 95 studies on sign language recognition technology from 1993 to 2021, analyzing and comparing algorithms for gesture, isolated-word, and continuous-sentence recognition and tracing the evolution of sign language acquisition equipment. We also summarize the datasets and evaluation criteria used in sign language recognition research. Finally, the main technology trends are discussed, and future challenges are analyzed.
Remote sensing image captioning has attracted widespread attention in the remote sensing field because of its application potential. However, most existing approaches model only limited interactions between image content and sentence and fail to exploit the special characteristics of remote sensing images. In this article, we introduce a novel recurrent attention and semantic gate (RASG) framework for remote sensing image captioning, which integrates competitive visual features and a recurrent attention mechanism to generate a better context vector for the image at each step and to enhance the representation of the current word state. Specifically, we first project each image into competitive visual features by taking advantage of both static visual features and multiscale features. Then, a novel recurrent attention mechanism is developed to extract high-level attentive maps from encoded features and nonvisual features, which helps the decoder recognize and focus on the information needed to understand the complex content of remote sensing images. Finally, the hidden states from the long short-term memory (LSTM) and other semantic references are incorporated into a semantic gate, which contributes to a more comprehensive and precise semantic understanding. Comprehensive experiments on three widely used datasets, Sydney-Captions, UCM-Captions, and Remote Sensing Image Captioning Dataset, demonstrate the superiority of the proposed RASG over a series of attention-based image captioning methods.