Search Results - Inner Mongolia University Library

DAEA-Net: Dual Attention and Elevation-Aware Networks for Airborne LiDAR Point Cloud Semantic Segmentation

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, vol. 62, p. 1

Authors: Zhu, Yurong; Liu, Zhihui; Liu, Changhong. Affiliations: China Univ Geosci, Sch Math & Phys, Wuhan 430074, Peoples R China; China Univ Geosci, Hubei Key Lab Intelligent Geoinformat Proc, Wuhan 430078, Peoples R China

Semantic segmentation of airborne laser scanning (ALS) point clouds remains a challenging task due to the complexity and diversity of 3-D scenes in the real world. Currently, most deep learning-based airborne LiDAR point cloud segmentation methods prioritize designing local feature extraction operators while overlooking the long-range dependencies among neighborhoods and the inherently diverse properties of point cloud data. To address these issues, this article introduces a dual-attention and elevation-aware airborne LiDAR point cloud semantic segmentation network (DAEA-Net) built upon an encoding-decoding architecture. First, we develop a cross multiple anti-affine attention (CMAAA) module that effectively captures global contextual information across different neighborhoods through interactive learning of multiple features. Second, we introduce an elevation awareness (EA) module that uses normal vectors to establish a geometric similarity discriminant for each neighboring point. It incorporates an autoencoder architecture to fuse elevation information, enhancing the horizontal structural dissimilarity between objects of similar height while enriching the representation of elevation data. Additionally, to compensate for the potential information loss in the encoding-decoding hierarchical structure, we design a lightweight U-global attention (UGA) module to link decoding and encoding hierarchical levels. It merges features of different resolutions and levels during downsampling and upsampling through pooling while utilizing the self-attention mechanism to enhance the network's global expression capability. The proposed DAEA-Net enhances ALS semantic segmentation performance by enabling interactive learning of multiple features and effectively representing elevation information. Extensive experiments conducted on two datasets demonstrate that our method delivers superior semantic segmentation performance compared to several existing advanced techniques.

Keywords: Point cloud compression; Feature extraction; Semantic segmentation; Three-dimensional displays; Laser radar; Geology; Encoding; Attention mechanism; elevation information; encoder-decoder structure; global perception; point cloud semantic segmentation
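
The DAEA-Net abstract above says its elevation awareness (EA) module uses normal vectors to build a geometric similarity discriminant for each neighboring point, but it does not give the formula. As a minimal sketch under that caveat, one plausible score is the absolute cosine between the normal of a center point and the normals of its neighbors. The function names `unit` and `normal_similarity` and the sample normals are hypothetical and are not taken from the paper.

```python
import math


def unit(v):
    """Return v scaled to unit length (v must be non-zero)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def normal_similarity(center_normal, neighbor_normals):
    """Absolute cosine between the centre normal and each neighbour normal.

    Taking the absolute value treats n and -n as the same surface
    orientation. This choice is an assumption, not a detail from the abstract.
    """
    c = unit(center_normal)
    return [abs(sum(a * b for a, b in zip(c, unit(n))))
            for n in neighbor_normals]


# Hypothetical neighbourhood of a flat-roof point whose normal points up:
# another roof point, a wall point (horizontal normal), a flipped roof normal.
scores = normal_similarity(
    [0.0, 0.0, 1.0],
    [[0.0, 0.0, 2.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]],
)
print(scores)
```

A score near 1 marks a neighbor lying on a similarly oriented surface, while a score near 0 marks a perpendicular surface such as a wall next to a roof.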

Enhanced pixel-level crack detection using Att-SegCrack: an improved CNN with feature fusion and large receptive field

ROAD MATERIALS AND PAVEMENT DESIGN, 2025

Authors: Su, Liangliang; Huang, Hao; Yang, Yalong; Liu, Yunlin; Niu, Zhen; Hu, Qizhi. Affiliations: Anhui Jianzhu Univ, Anhui Prov Key Lab Intelligent Bldg & Bldg Energy, Hefei, Peoples R China; Anhui Jianzhu Univ, Anhui Inst Strateg Study Carbon Dioxide Emiss Peak, Hefei, Peoples R China; Anhui Jianzhu Univ, Sch Elect & Informat Engn, Hefei, Peoples R China; Anhui Jianzhu Univ, Coll Civil Engn, Hefei, Peoples R China

Deep convolutional neural networks have demonstrated significant advancements in pavement crack detection. Nevertheless, challenges persist in achieving satisfactory performance due to discontinuous crack edges and low background contrast. This paper presents Att-SegCrack, an enhanced encoder-decoder network that addresses these limitations through three key components. First, a simple yet effective feature fusion scheme restores crack details by bilinearly up-sampling encoder features and integrating them with outputs from the penultimate decoder layer, which subsequently serves as input to the final decoding layer. Second, dilated convolutions expand the receptive field to capture comprehensive contextual information for complete crack profiles. Third, the convolutional block attention module enhances crack-background differentiation in low-level features. Evaluations on two benchmark datasets (Crack500 and DeepCrack) demonstrate that our method outperforms other state-of-the-art methods in crack detection performance.

Keywords: Crack detection; multi-scale feature fusion; attention mechanism; dilated convolution; encoder-decoder structure
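
The Att-SegCrack abstract above states that dilated convolutions expand the receptive field. A small worked calculation can make that concrete: a kernel of size k with dilation d spans d(k - 1) + 1 input positions, and stacking layers accumulates these spans. The layer configurations below are illustrative assumptions, not the network's actual settings.

```python
def effective_kernel(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers):
    """Receptive field of a stack of convolutions.

    layers: list of (kernel_size, dilation, stride) tuples, input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel_size, dilation, stride in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Three plain 3x3 layers versus three 3x3 layers with dilations 1, 2, 4
# (hypothetical settings chosen only for illustration).
plain = receptive_field([(3, 1, 1), (3, 1, 1), (3, 1, 1)])
dilated = receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)])
print(plain, dilated)  # 7 15
```

With the same number of weights per layer, the dilated stack sees 15 pixels along each axis instead of 7, which is the kind of context gain the abstract relies on for recovering complete crack profiles.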

Ship Segmentation via encoder-decoder Network With Global Attention in High-Resolution SAR Images

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, vol. 19, p. 1

Authors: Li, Jichao; Gou, Shuiping; Li, Ruimin; Chen, Jia-Wei; Sun, Xiaolong. Affiliations: Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China; Xidian Univ, Acad Adv Interdisciplinary Res, Xian 710071, Shaanxi, Peoples R China; China Elect Technol Grp Corp, 20th Res Inst, Xian 710068, Shaanxi, Peoples R China

Ship detection in the synthetic aperture radar (SAR) image is of great significance in the fields of military and coastal defense. Most ship detection methods are designed based on the object detection framework, which can only provide the vertices' coordinates of the bounding box covering the ship targets but cannot provide more detailed contour information. Target segmentation can further explore the shape and edge information of the objects, which can be used as a blazing novel means for automatic object detection. In this letter, a 3-D atrous encoder-decoder neural network with global attention modules (GAM-EDNet) is proposed to achieve ship segmentation in SAR images. The encoder-decoder structure with atrous convolution is developed as the network body to fully exploit the structural information of the ship targets with various sizes. To increase the structural information of the single-polarization SAR images, a 3-D image cube is designed as the input of the GAM-EDNet. A global attention module is proposed to further improve the segmentation performance by integrating the high-level semantic features with the low-level location features. Besides, an SAR ship segmentation dataset (SAR-HR4) is built to evaluate the segmentation performance, and the experimental results show that the proposed GAM-EDNet achieves better performance than other state-of-the-art methods.

Keywords: Marine vehicles; Image segmentation; Radar polarimetry; Semantics; Synthetic aperture radar; Feature extraction; Wavelet transforms; encoder-decoder structure; global attention module; high-resolution synthetic aperture radar (SAR) images; ship segmentation

Design of oral English teaching model based on multi-modal perception of the Internet of Things and improved conventional neural networks

PEERJ COMPUTER SCIENCE, 2023, vol. 9, e1503

Author: Qin, Haitao. Affiliation: Hubei Normal Univ, Coll Foreign Studies, Huangshi, Hubei, Peoples R China

Oral English instruction plays a pivotal role in educational endeavors. The emergence of online teaching in response to the epidemic has created an urgent demand for a methodology to evaluate and monitor oral English instruction. In the post-epidemic era, distance learning has become indispensable for educational pursuits. Given the distinct teaching modality and approach of oral English instruction, it is imperative to explore an intelligent scoring technique that can effectively oversee the content of English teaching. With this objective in mind, we have devised a scoring approach for oral English instruction based on multi-modal perception utilizing the Internet of Things (IoT). Initially, a trained convolutional neural network (CNN) model is employed to extract and quantify visual information and audio features from the IoT, reducing them to a fixed dimension. Subsequently, an external attention model is proposed to compute spoken English and image characteristics. Lastly, the content of English instruction is classified and graded based on the quantitative attributes of oral dialogue. Our findings illustrate that our scoring model for oral English instruction surpasses others, achieving the highest rankings and an accuracy of 88.8%, outperforming others by more than 2%.

Keywords: Oral English teaching; Multi-modal perception; encoder-decoder structure; CNN

AI-Enhanced Digital Creativity Design: Content-Style Alignment for Image Stylization

IEEE ACCESS, 2023, vol. 11, pp. 143964-143979

Authors: Yu, Lanting; Zheng, Qiang. Affiliations: Chongqing Business Vocat Coll, Publishing & Media Dept, Chongqing 401331, Peoples R China; CISDI Info, Chongqing 401122, Peoples R China

This paper presents an AI(Artificial Intelligence)-powered method for enhancing digital creative design through image stylization. To achieve this, we introduce the Content-Style Alignment Module (CSAM), which includes the Dual-Stream Content-Style Processing Block (DS-CSPB), Content-Style Matching Attention Block (CS-MAB), and Content-Style Space-Aware Interpolation Block (CS-SAIB). DS-CSPB removes style information from content descriptors using whitening transformation while preserving semantic structures. CS-MAB reorganizes each content descriptor with its most relevant style descriptor, ensuring optimal style adaptation for content semantics. CS-SAIB aligns content and style descriptors in the same space, enabling diverse semantic distributions in content images to match various style patterns. Moreover, we introduce the Multifaceted Optimization Loss (MOL). This loss comprises multiple components: The relaxed Earth Mover Distance (rEMD) loss enhances color and texture distributions on content images. The Moment Matching (MM) loss reduces visual artifacts caused by cosine distance. The differentiable Color Histogram (CH) loss efficiently addresses color blending issues, preserving image naturalness. The content loss ensures no significant deformation or distortion during stylization. The reconstruction loss constrains all encoder-decoder features to the VGG feature space, maintaining shared spaces between content and style descriptors. We conducted extensive comparative and ablation experiments, which demonstrated superior performance in image stylization, resulting in high-quality stylized images. Additionally, we provide a comprehensive review of current research in image stylization, effectively bridging the gap in this area.

Keywords: Deep learning; stylization; encoder-decoder structure; VGG

Rethinking Transformers for Semantic Segmentation of Remote Sensing Images

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, vol. 61, p. 1

Authors: Liu, Yuheng; Zhang, Yifan; Wang, Ye; Mei, Shaohui. Affiliation: Northwestern Polytech Univ, Sch Elect & Informat, Xian 710129, Peoples R China

Transformer has been widely applied in image processing tasks as a substitute for convolutional neural networks (CNNs) for feature extraction due to its superiority in global context modeling and flexibility in model generalization. However, the existing transformer-based methods for semantic segmentation of remote sensing (RS) images are still with several limitations, which can be summarized into two main aspects: 1) the transformer encoder is generally combined with CNN-based decoder, leading to inconsistency in feature representations; and 2) the strategies for global and local context information utilization are not sufficiently effective. Therefore, in this article, a global-local transformer segmentor (GLOTS) framework is proposed for the semantic segmentation of RS images to acquire consistent feature representations by adopting transformers for both encoding and decoding, in which a masked image modeling (MIM) pretrained transformer encoder is adopted to learn semantic-rich representations of input images and a multiscale global-local transformer decoder is designed to fully exploit the global and local features. Specifically, the transformer decoder uses a feature separation-aggregation module (FSAM) to utilize the feature adequately at different scales and adopts a global-local attention module (GLAM) containing global attention block (GAB) and local attention block (LAB) to capture the global and local context information, respectively. Furthermore, a learnable progressive upsampling strategy (LPUS) is proposed to restore the resolution progressively, which can flexibly recover the fine-grained details in the upsampling process. The experiment results on the three benchmark RS datasets demonstrate that the proposed GLOTS is capable of achieving better performance with some state-of-the-art methods, and the superiority of the proposed framework is also verified by ablation studies. The code will be available at https://***/lyhnsn/GLOTS.

Keywords: encoder-decoder structure; global-local transformer; remote sensing (RS); semantic segmentation
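
The GLOTS abstract above distinguishes a global attention block from a local attention block without describing their internals. The sketch below is a generic scaled dot-product attention over a 1-D token sequence in which an optional window restricts each token to nearby tokens. It only illustrates the global versus local distinction and is not the paper's GAB or LAB. The `window` parameter and the toy tokens are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, window=None):
    """Scaled dot-product attention.

    window=None: every token attends to every token (global).
    window=w:    token i attends only to tokens j with |i - j| <= w (local).
    """
    dim = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        idx = [j for j in range(len(keys))
               if window is None or abs(i - j) <= window]
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
                  for j in idx]
        weights = softmax(scores)
        out.append([sum(w * values[j][c] for w, j in zip(weights, idx))
                    for c in range(len(values[0]))])
    return out


# Toy self-attention: queries, keys and values are the same four tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
global_out = attention(tokens, tokens, tokens)
local_out = attention(tokens, tokens, tokens, window=1)
```

The global call mixes information from all four tokens into each output, whereas the local call with `window=1` mixes only immediate neighbors, which costs less and keeps fine spatial detail.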

Cascaded transformer U-net for image restoration

SIGNAL PROCESSING, 2023, vol. 206, no. 1

Authors: Yan, Longbin; Zhao, Min; Liu, Shumin; Shi, Shuaikai; Chen, Jie. Affiliations: Northwestern Polytech Univ, Shenzhen Res & Dev Inst, Shenzhen, Peoples R China; Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China; Blueye Intelligence, Zhenjiang, Peoples R China

Image restoration is one of the most important computer vision tasks, aiming at recovering high-quality images from degraded or low-quality observations. The restoration methods based on convolutional neural networks (CNNs) have achieved attractive performance, however, as convolutions only intake local information, CNN-based methods have limitations in modeling objects in long ranges and extracting global information. In addition, existing one-stage methods damage the performance due to lacking diversified receptive fields. In this paper, we propose a multi-stage cascaded transformer architecture for image restoration. Firstly, the Swin transformer based encoder relying on self-attention is used to improve the modeling ability for long-range objects and outputs hierarchical multi-level semantic features. Then, a shape perceiving module is designed and embedded in the decoder to enhance the representation of irregular objects. Moreover, a multi-stage cascaded encoder-decoder architecture possessing diversified receptive fields is proposed to progressively obtain fine restoration results and thus boost the performance. We conduct extensive experiments, including image deraining, underwater image enhancement, near infrared image colorization and low-light image enhancement. The results show that our proposed method can achieve comparable or better performance than state-of-the-art methods while with less training and inference costs. (c) 2022 Published by Elsevier B.V.

Keywords: Image deraining; Underwater image enhancement; Near infrared image colorization; encoder-decoder structure; Long-range dependence modeling

Attention Aggregation encoder-decoder Network Framework for Stereo Matching

IEEE SIGNAL PROCESSING LETTERS, 2020, vol. 27, pp. 760-764

Authors: Zhang, Yaru; Li, Yaqian; Kong, Yating; Liu, Bin. Affiliations: Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao 066004, Hebei, Peoples R China; Yanshan Univ, Sch Elect Engn, Qinhuangdao 066004, Hebei, Peoples R China

In the stereo matching networks based on deep learning, current cost aggregation networks lack the means to aggregate cost volume to the utmost extent. Therefore, different from the standard encoder-decoder structures, we propose an attention aggregation encoder-decoder network framework for stereo matching that contains three modules. Specifically, we design a sub-branch and cross-stage aggregation encoding module, which aggregate context information of different sub-branches and cross-stages to achieve the mutual utilization of different deep cost volumes. Meanwhile, we introduce a three-dimensional attention recoding module to obtain the robust discriminative cost volume through recalibrating the high-level semantic information of the sub-branches. In addition, we construct a stepwise aggregation decoding module to decode the cost volume via the stepwise fusion upsampling strategy, which further enhances the learning ability of the network model. The experimental results on Scene Flow and KITTI benchmark datasets show that the proposed network framework is superior to other similar methods in aggregating information.

Keywords: Three-dimensional displays; Encoding; Training; Semantics; Decoding; Feature extraction; Convolution; Attention mechanism; deep learning; encoder-decoder structure; stereo matching
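
The stereo matching abstract above centers on aggregating a cost volume but does not say how the volume is built. For readers unfamiliar with the term, the sketch below builds the simplest possible version on a single scanline: the cost of disparity d at pixel x is the absolute difference between the left pixel at x and the right pixel at x - d, and the disparity is read off by winner-take-all. This is a textbook baseline, not the learned aggregation the paper proposes, and the scanline values are made up.

```python
def cost_volume(left, right, max_disp):
    """cost[d][x] = |left[x] - right[x - d]|, infinite where x - d < 0."""
    inf = float("inf")
    return [[abs(left[x] - right[x - d]) if x - d >= 0 else inf
             for x in range(len(left))]
            for d in range(max_disp + 1)]


def winner_take_all(cost):
    """For each pixel, pick the disparity with the lowest matching cost."""
    width = len(cost[0])
    return [min(range(len(cost)), key=lambda d: cost[d][x])
            for x in range(width)]


# Hypothetical scanlines: the left view is the right view shifted by 2 pixels.
right = [5, 9, 2, 7, 4, 8]
left = [0, 0, 5, 9, 2, 7]
disparity = winner_take_all(cost_volume(left, right, max_disp=3))
print(disparity)  # [0, 1, 2, 2, 2, 2]
```

The first two pixels have no valid match at disparity 2 because the shifted index falls outside the image, which is one reason real pipelines aggregate costs over a neighborhood instead of trusting each pixel alone.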

A batch-wise LSTM-encoder decoder network for batch process monitoring

CHEMICAL ENGINEERING RESEARCH & DESIGN, 2020, vol. 164, pp. 102-112

Authors: Ren, Jiayang; Ni, Dong. Affiliation: Zhejiang Univ, Coll Control Sci & Engn, Hangzhou 310027, Peoples R China

Process monitoring is essential to keep quality consistency and operation safety in the batch process. However, the existence of multiphase, nonlinearity and dynamic features in the batch process makes the batch process monitoring a complicated task. In this work, a multi-layer recurrent neural network in the encoder-decoder structure called batch-wise LSTM-encoder decoder network is proposed to solve the difficulties mentioned above in batch process monitoring. The LSTM-encoder extracts the nonlinear dynamic features in both between and within batch direction, then projects the high dimensional input space to a low dimensional hidden state space. The decoder part regenerates the samples from hidden states. Control statistics H2 and SPE are designed for process monitoring, and the corresponding control limits are estimated by kernel density estimation. A case study on an extensive reference penicillin fermentation dataset suggests that the proposed method can detect the fault samples more effectively than previous methods while keeping the same robustness in normal conditions. (c) 2020 Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

Keywords: Nonlinear batch processes; Process monitoring; Multi-layer LSTM; encoder-decoder structure; Kernel density estimation
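
The batch process monitoring abstract above describes a control statistic SPE whose control limit is estimated by kernel density estimation. A minimal sketch of that idea follows, assuming SPE is the squared norm of the reconstruction residual and that the limit is the 99% quantile of a Gaussian KDE fitted to SPE values from normal operation. The bandwidth rule, the confidence level and the noisy "reconstruction" that stands in for a trained LSTM encoder-decoder are all assumptions made for illustration.

```python
import math
import random


def spe(sample, reconstruction):
    """Squared prediction error: squared L2 norm of the residual."""
    return sum((a - b) ** 2 for a, b in zip(sample, reconstruction))


def kde_cdf(x, data, bandwidth):
    """CDF of a Gaussian kernel density estimate, evaluated at x."""
    return sum(0.5 * (1.0 + math.erf((x - xi) / (bandwidth * math.sqrt(2.0))))
               for xi in data) / len(data)


def kde_control_limit(data, confidence=0.99):
    """Threshold at which the KDE-estimated CDF reaches `confidence`."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    bandwidth = 1.06 * std * n ** (-0.2)  # normal-reference rule (assumption)
    lo, hi = min(data) - 10 * bandwidth, max(data) + 10 * bandwidth
    for _ in range(60):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, data, bandwidth) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


random.seed(0)
# Stand-in for a trained model: reconstruction = sample + small noise.
normal_spe = []
for _ in range(200):
    x = [random.gauss(0.0, 1.0) for _ in range(4)]
    x_hat = [v + random.gauss(0.0, 0.1) for v in x]
    normal_spe.append(spe(x, x_hat))

limit = kde_control_limit(normal_spe)
fault_spe = spe([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])  # residual of 4.0
is_fault = fault_spe > limit
```

A sample is flagged when its SPE exceeds the limit. Using a density estimate avoids assuming that the statistic follows a particular distribution, which matters when residuals come from a nonlinear model.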

LIGHTWEIGHT MESH CRACK DETECTION ALGORITHM BASED ON EFFICIENT ATTENTION MECHANISM

INTERNATIONAL JOURNAL OF ROBOTICS & AUTOMATION, 2023, vol. 38, no. 3, pp. 170-179

Authors: Hang, Die; Yang, Jianxi; Jiang, Shixin; Li, Hao; Zou, Xiaoxue; Tang, Chuncheng; Liu, Die. Affiliations: Chongqing Jiaotong Univ, Sch Informat Sci & Engn, Chongqing, Peoples R China; Chongqing Jiaotong Univ, Coll Traff & Transportat, Chongqing, Peoples R China; Chongqing Jiaotong Univ, Sch Civil Engn, Chongqing, Peoples R China

Cracks are one of the most common anomalies in concrete structures, affecting their safety, and thus have received much attention. However, most of the previous studies have focused on regular cracks, while fewer studies have analysed mesh cracks. Due to the characteristics of early appearance and high complexity, mesh cracks cause severe damage to concrete structures. Therefore, the automatic detection of mesh cracks is crucial to the safety of concrete structures. As mesh cracks consist of many fine branches, which can cause discontinuous results, this paper proposes a lightweight mesh crack detection model (MCM-Net) based on an efficient attention mechanism. The proposed network adopts an encoder-decoder structure and introduces improved efficient channel attention that assigns high weights to crack pixels. The introduction of lightweight convolutional modules into the proposed network reduces the computational cost, while the superposition of max-pooling and mean-pooling enables the extraction of more minutiae pixels. The proposed network is verified by experiments on the crack-detection (CD) and bridge-crack-image (BCI) datasets. The experimental results show that the proposed network can improve the stability and computational efficiency of mesh crack detection.

Keywords: Mesh crack detection; encoder-decoder structure; lightweight convolutional module; efficient channel attention; max-pooling; mean-pooling
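
The MCM-Net abstract above combines efficient channel attention with a superposition of max-pooling and mean-pooling, without stating how the pieces connect. The sketch below shows one way such a block could look: each channel is summarized by the sum of its global max and global mean, a small 1-D convolution mixes neighboring channel descriptors, and a sigmoid turns the result into per-channel weights. Summing the two pooled values and using a fixed uniform kernel in place of learned weights are both assumptions, and the function names are hypothetical.

```python
import math


def channel_attention(feature_map, kernel_size=3):
    """Return one weight in (0, 1) per channel.

    feature_map: list of channels, each a flat list of pixel values.
    kernel_size: odd width of the 1-D convolution across channels.
    """
    # Channel descriptor: global max-pooling plus global mean-pooling.
    desc = [max(ch) + sum(ch) / len(ch) for ch in feature_map]
    # 1-D convolution over neighbouring channels with zero padding.
    # A uniform kernel stands in for learned weights (assumption).
    pad = kernel_size // 2
    padded = [0.0] * pad + desc + [0.0] * pad
    kernel = [1.0 / kernel_size] * kernel_size
    mixed = [sum(k * padded[i + j] for j, k in enumerate(kernel))
             for i in range(len(desc))]
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


def reweight(feature_map, weights):
    """Scale every pixel of each channel by that channel's weight."""
    return [[w * v for v in ch] for ch, w in zip(feature_map, weights)]


# Hypothetical 4-channel map with 4 pixels per channel; only channel 1
# contains a strong (crack-like) response.
features = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
weights = channel_attention(features)
attended = reweight(features, weights)
```

Channels adjacent to the active one receive higher weights than the distant channel, and the whole block adds only `kernel_size` parameters, which is what makes this family of attention modules attractive for lightweight networks.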
