Semantic segmentation of airborne laser scanning (ALS) point clouds remains a challenging task due to the complexity and diversity of 3-D scenes in the real world. Currently, most deep learning-based airborne LiDAR point cloud segmentation methods prioritize designing local feature extraction operators while overlooking the long-range dependencies among neighborhoods and the inherently diverse properties of point cloud data. To address these issues, this article introduces a dual-attention and elevation-aware airborne LiDAR point cloud semantic segmentation network (DAEA-Net) built upon an encoding-decoding architecture. First, we develop a cross multiple anti-affine attention (CMAAA) module that effectively captures global contextual information across different neighborhoods through interactive learning of multiple features. Second, we introduce an elevation awareness (EA) module that uses normal vectors to establish a geometric similarity discriminant for each neighboring point. It incorporates an autoencoder architecture to fuse elevation information, enhancing the horizontal structural dissimilarity between objects of similar height while enriching the representation of elevation data. Additionally, to compensate for the potential information loss in the encoding-decoding hierarchical structure, we design a lightweight U-global attention (UGA) module to link decoding and encoding hierarchical levels. It merges features of different resolutions and levels during downsampling and upsampling through pooling while utilizing the self-attention mechanism to enhance the network's global expression capability. The proposed DAEA-Net enhances ALS semantic segmentation performance by enabling interactive learning of multiple features and effectively representing elevation information. Extensive experiments conducted on two datasets demonstrate that our method delivers superior semantic segmentation performance compared to several existing advanced techniques.
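The abstract describes the elevation awareness (EA) module as using normal vectors to build a geometric similarity discriminant for each neighboring point, but it gives no formula. The sketch below makes two assumptions. First, the discriminant is taken to be the absolute cosine similarity between the center point's normal and each neighbor's normal. Second, elevation is encoded as each neighbor's height relative to the center. The function `neighbor_geometry_weights` is hypothetical.

```python
import math

def _unit(v):
    """Normalize a non-zero 3-D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def neighbor_geometry_weights(center, center_normal, neighbors, neighbor_normals):
    """Return (similarity, relative_elevation) for each neighbor of `center`.

    Assumption: similarity = |cos| of the angle between normals, so a neighbor
    on the same surface scores 1.0 and one on a perpendicular surface
    (e.g. a wall next to a roof) scores 0.0. Relative elevation is z_j - z_i.
    """
    cn = _unit(center_normal)
    result = []
    for p, n in zip(neighbors, neighbor_normals):
        nn = _unit(n)
        cos = abs(sum(a * b for a, b in zip(cn, nn)))
        result.append((cos, p[2] - center[2]))
    return result
```

Consider a roof point whose neighbor lies on the same roof. That neighbor gets similarity 1.0. A neighbor on an adjoining wall gets 0.0 even if it is horizontally close, which is the kind of structural dissimilarity the abstract says the EA module emphasizes.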
Deep convolutional neural networks have demonstrated significant advancements in pavement crack detection. Nevertheless, challenges persist in achieving satisfactory performance due to discontinuous crack edges and low background contrast. This paper presents Att-SegCrack, an enhanced encoder-decoder network that addresses these limitations through three key components. First, a simple yet effective feature fusion scheme restores crack details by bilinearly up-sampling encoder features and integrating them with outputs from the penultimate decoder layer, which subsequently serves as input to the final decoding layer. Second, dilated convolutions expand the receptive field to capture comprehensive contextual information for complete crack profiles. Third, the convolutional block attention module enhances crack-background differentiation in low-level features. Evaluations on two benchmark datasets (Crack500 and DeepCrack) demonstrate that our method outperforms other state-of-the-art methods in crack detection performance.
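Dilated convolutions widen the receptive field without adding weights, and a receptive-field calculation makes this concrete. For a stack of stride-1 convolutions, each layer with kernel size k and dilation d extends the field by (k − 1)·d pixels. The dilation rates below are illustrative only; the abstract does not state which rates Att-SegCrack uses.

```python
def receptive_field(layers):
    """Receptive field (pixels along one axis) of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) tuples; each layer extends
    the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Three plain 3x3 convolutions vs. three 3x3 convolutions with dilation 1, 2, 4.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])    # 7
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])  # 15
```

Both stacks have the same number of weights. The dilated stack, however, sees more than twice as much context along each axis, which is the motivation the abstract gives for capturing complete crack profiles.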
Ship detection in synthetic aperture radar (SAR) images is of great significance for military applications and coastal defense. Most ship detection methods are built on object detection frameworks. These provide only the vertex coordinates of the bounding box enclosing each ship, not its detailed contour. Target segmentation can further recover the shape and edge information of objects, offering a promising new approach to automatic object detection. In this letter, a 3-D atrous encoder-decoder neural network with global attention modules (GAM-EDNet) is proposed for ship segmentation in SAR images. An encoder-decoder structure with atrous convolution forms the network body, fully exploiting the structural information of ship targets of various sizes. To enrich the structural information available from single-polarization SAR images, a 3-D image cube is designed as the input of GAM-EDNet. A global attention module is proposed to further improve segmentation performance by integrating high-level semantic features with low-level location features. In addition, an SAR ship segmentation dataset (SAR-HR4) is built to evaluate segmentation performance, and the experimental results show that the proposed GAM-EDNet outperforms other state-of-the-art methods.
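The abstract says the global attention module integrates high-level semantic features with low-level location features but does not say how. The sketch below assumes one simple design, a squeeze-and-excitation-style gate:

1. Global-average-pool the high-level map per channel.
2. Pass the result through a sigmoid to obtain channel weights.
3. Reweight the low-level channels with those weights.
4. Add the high-level features back.

The learned layers a real module would contain are omitted. Both maps are assumed to have the same number of channels and the same spatial size (i.e. the high-level map has already been upsampled), stored as nested lists `[channel][row][col]`.

```python
import math

def global_attention_fuse(high, low):
    """Fuse same-shaped [channel][row][col] feature maps (assumed design).

    Each channel of `low` is scaled by sigmoid(global average of the matching
    channel of `high`), and the result is added to `high`.
    """
    fused = []
    for h_ch, l_ch in zip(high, low):
        values = [v for row in h_ch for v in row]
        weight = 1.0 / (1.0 + math.exp(-sum(values) / len(values)))
        fused.append([[hv + weight * lv for hv, lv in zip(h_row, l_row)]
                      for h_row, l_row in zip(h_ch, l_ch)])
    return fused
```

A high-level channel with strong average activation passes more of the corresponding low-level detail through. A weak channel suppresses that detail.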
Oral English instruction plays a pivotal role in education. The shift to online teaching in response to the epidemic has created an urgent need for a method to evaluate and monitor oral English instruction, and in the post-epidemic era distance learning has become indispensable. Because oral English instruction has a distinct teaching modality and approach, an intelligent scoring technique that can effectively oversee the content of English teaching is needed. To this end, we devise a scoring approach for oral English instruction based on multi-modal perception using the Internet of Things (IoT). First, a trained convolutional neural network (CNN) extracts and quantifies visual information and audio features collected by IoT devices, reducing them to a fixed dimension. Next, an external attention model is proposed to compute spoken English and image features. Finally, the English instruction content is classified and graded based on the quantitative attributes of the oral dialogue. Our results show that the proposed scoring model outperforms the compared models, ranking highest with an accuracy of 88.8%, more than 2% above the others.
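In its commonly cited formulation, external attention replaces the keys and values of self-attention with two small memories, M_k and M_v, shared across all inputs. It then normalizes the attention map twice: a softmax over the token axis, followed by L1 normalization over the memory axis. The abstract does not describe the authors' variant, so what follows is only a sketch of the generic formulation on plain lists. The memories are fixed toy values rather than learned parameters.

```python
import math

def matmul(a, b):
    """Multiply an (n x k) list-of-lists by a (k x m) list-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def external_attention(features, mem_k, mem_v):
    """Generic external attention.

    features: N x d (one row per token, e.g. an audio or image feature vector)
    mem_k:    S x d memory used like keys
    mem_v:    S x d memory used like values
    """
    # N x S affinities between each token and each memory slot.
    logits = matmul(features, [list(r) for r in zip(*mem_k)])
    n, s = len(logits), len(logits[0])
    # First normalization: softmax over the token axis (each column).
    attn = [[0.0] * s for _ in range(n)]
    for j in range(s):
        col_max = max(logits[i][j] for i in range(n))
        exps = [math.exp(logits[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            attn[i][j] = exps[i] / total
    # Second normalization: L1 over the memory axis (each row sums to 1).
    attn = [[a / sum(row) for a in row] for row in attn]
    return matmul(attn, mem_v)
```

Because M_k and M_v are shared across the whole dataset rather than computed from each input, the cost is linear in the number of tokens N. This is the usual argument for external attention over self-attention.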
This paper presents an artificial intelligence (AI)-powered method for enhancing digital creative design through image stylization. To achieve this, we introduce the Content-Style Alignment Module (CSAM), which comprises the Dual-Stream Content-Style Processing Block (DS-CSPB), the Content-Style Matching Attention Block (CS-MAB), and the Content-Style Space-Aware Interpolation Block (CS-SAIB). DS-CSPB removes style information from content descriptors using a whitening transformation while preserving semantic structures. CS-MAB reorganizes each content descriptor with its most relevant style descriptor, so that the style adapts closely to the content semantics. CS-SAIB aligns content and style descriptors in the same space, enabling the diverse semantic distributions in content images to match various style patterns. Moreover, we introduce the Multifaceted Optimization Loss (MOL), which comprises multiple components:

- The relaxed Earth Mover Distance (rEMD) loss enhances color and texture distributions on content images.
- The Moment Matching (MM) loss reduces visual artifacts caused by cosine distance.
- The differentiable Color Histogram (CH) loss efficiently addresses color blending issues, preserving image naturalness.
- The content loss prevents significant deformation or distortion during stylization.
- The reconstruction loss constrains all encoder-decoder features to the VGG feature space, maintaining a shared space for content and style descriptors.

Extensive comparative and ablation experiments demonstrate superior image stylization performance and high-quality stylized images. Additionally, we provide a comprehensive review of current research in image stylization, helping to fill a gap in this area.
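In style transfer work such as STROTSS, the relaxed Earth Mover Distance is usually defined using a cost matrix C of cosine distances between content and style feature vectors. The loss is the larger of two one-sided nearest-neighbor costs over C. Assuming this paper follows that standard definition, the calculation can be sketched as follows. The short vectors stand in for VGG feature descriptors.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def relaxed_emd(content_feats, style_feats):
    """max(mean_i min_j C_ij, mean_j min_i C_ij) with C = cosine distance."""
    cost = [[cosine_distance(f, g) for g in style_feats] for f in content_feats]
    r_a = sum(min(row) for row in cost) / len(cost)             # each content feature to its nearest style feature
    r_b = sum(min(col) for col in zip(*cost)) / len(cost[0])    # each style feature to its nearest content feature
    return max(r_a, r_b)
```

Taking the maximum of both directions penalizes style features that no content feature matches. For example, content `[[1, 0]]` against style `[[1, 0], [0, 1]]` gives 0.5 rather than 0.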
Transformers have been widely applied in image processing tasks as a substitute for convolutional neural networks (CNNs) for feature extraction, owing to their superiority in global context modeling and their flexibility in model generalization. However, existing transformer-based methods for semantic segmentation of remote sensing (RS) images still have limitations in two main aspects: 1) the transformer encoder is generally combined with a CNN-based decoder, leading to inconsistent feature representations; and 2) the strategies for exploiting global and local context information are not sufficiently effective. Therefore, in this article, a global-local transformer segmentor (GLOTS) framework is proposed for the semantic segmentation of RS images. GLOTS obtains consistent feature representations by adopting transformers for both encoding and decoding. A masked image modeling (MIM) pretrained transformer encoder learns semantically rich representations of input images, and a multiscale global-local transformer decoder is designed to fully exploit global and local features. Specifically, the transformer decoder uses a feature separation-aggregation module (FSAM) to make full use of features at different scales. It also adopts a global-local attention module (GLAM), containing a global attention block (GAB) and a local attention block (LAB), to capture global and local context information, respectively. Furthermore, a learnable progressive upsampling strategy (LPUS) is proposed to restore the resolution progressively, flexibly recovering fine-grained details during upsampling. Experimental results on three benchmark RS datasets demonstrate that the proposed GLOTS achieves better performance than some state-of-the-art methods, and ablation studies further verify the superiority of the framework. The code will be available at https://***/lyhnsn/GLOTS.
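The split between a global attention block and a local attention block can be illustrated with scaled dot-product attention over a 1-D token sequence. In the global variant every token attends to every other token. In the local variant each token attends only to a fixed window around itself. The window rule and the use of each token vector directly as query, key, and value (no learned projections) are simplifying assumptions, not the GLAM design.

```python
import math

def attention(tokens, window=None):
    """Scaled dot-product self-attention with queries = keys = values = tokens.

    window=None gives global attention; window=w restricts token i to tokens j
    with |i - j| <= w, a simple stand-in for local attention.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        idx = [j for j in range(len(tokens)) if window is None or abs(i - j) <= window]
        scores = [sum(a * b for a, b in zip(q, tokens[j])) / math.sqrt(d) for j in idx]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * tokens[j][k] for w, j in zip(weights, idx)) / total
                    for k in range(d)])
    return out
```

With `window=0` each token attends only to itself and is returned unchanged. With `window=None`, a distant high-magnitude token can dominate the output of the first token, which is the long-range interaction that a local block alone cannot capture.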
Image restoration is one of the most important computer vision tasks, aiming to recover high-quality images from degraded or low-quality observations. Restoration methods based on convolutional neural networks (CNNs) have achieved attractive performance. However, because convolutions only take in local information, CNN-based methods are limited in modeling objects over long ranges and extracting global information. In addition, existing one-stage methods suffer degraded performance because they lack diverse receptive fields. In this paper, we propose a multi-stage cascaded transformer architecture for image restoration. First, a Swin transformer-based encoder relying on self-attention improves the ability to model long-range objects and outputs hierarchical multi-level semantic features. Then, a shape-perceiving module is designed and embedded in the decoder to enhance the representation of irregular objects. Moreover, a multi-stage cascaded encoder-decoder architecture with diverse receptive fields is proposed to progressively obtain fine restoration results and thus boost performance. We conduct extensive experiments on image deraining, underwater image enhancement, near-infrared image colorization, and low-light image enhancement. The results show that the proposed method achieves performance comparable to or better than state-of-the-art methods with lower training and inference costs. (c) 2022 Published by Elsevier B.V.
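The Swin transformer encoder computes self-attention inside non-overlapping M×M windows. In alternating blocks it shifts the window grid by ⌊M/2⌋, implemented as a cyclic shift, so that information crosses window boundaries. The helper below sketches only this partitioning step on a 2-D grid of token indices. It omits the attention mask that Swin applies to shifted windows, and it assumes the grid height and width are multiples of M. The function name is hypothetical.

```python
def partition_windows(grid, m, shift=0):
    """Split an H x W grid (list of rows) into m x m windows, row-major order.

    If shift > 0, the grid is first cyclically rolled up and left by `shift`,
    mimicking Swin's shifted-window step.
    """
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "H and W must be multiples of m"
    rolled = [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)] for r in range(h)]
    windows = []
    for r0 in range(0, h, m):
        for c0 in range(0, w, m):
            windows.append([row[c0:c0 + m] for row in rolled[r0:r0 + m]])
    return windows
```

On a 4×4 grid numbered 0 to 15 with m = 2 and shift = 1, the last window is `[[15, 12], [3, 0]]`. It mixes tokens from all four image corners, which is why Swin masks attention between tokens that were not originally adjacent.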
In deep learning-based stereo matching networks, current cost aggregation networks lack the means to aggregate the cost volume fully. Therefore, unlike standard encoder-decoder structures, we propose an attention aggregation encoder-decoder network framework for stereo matching that contains three modules. Specifically, we design a sub-branch and cross-stage aggregation encoding module, which aggregates context information across different sub-branches and stages so that cost volumes from different network depths can be mutually exploited. Meanwhile, we introduce a three-dimensional attention recoding module that obtains a robust, discriminative cost volume by recalibrating the high-level semantic information of the sub-branches. In addition, we construct a stepwise aggregation decoding module that decodes the cost volume via a stepwise fusion upsampling strategy, further enhancing the learning ability of the network. Experimental results on the Scene Flow and KITTI benchmark datasets show that the proposed framework aggregates information better than other similar methods.
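The abstract focuses on aggregating the cost volume. For context, a common way to turn an aggregated cost volume into a disparity map is the soft argmin popularized by GC-Net. At each pixel it takes a softmax over negated matching costs and returns the expected disparity. The abstract does not say which regression head this network uses, so treat the sketch below as background rather than part of the proposed method.

```python
import math

def soft_argmin(costs):
    """Sub-pixel disparity from matching costs c_0..c_{D-1} at one pixel.

    Lower cost means a better match, so the softmax is taken over -cost and
    the result is the probability-weighted mean disparity.
    """
    m = max(-c for c in costs)  # shift for numerical stability
    weights = [math.exp(-c - m) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total
```

For costs `[0, 0, 10]` the estimate is about 0.5, a sub-pixel value that a hard argmin cannot produce. The operation is also differentiable, so the cost aggregation layers can be trained end to end from disparity labels.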
Process monitoring is essential for maintaining quality consistency and operational safety in batch processes. However, the multiphase, nonlinear, and dynamic characteristics of batch processes make their monitoring a complicated task. In this work, a multi-layer recurrent neural network with an encoder-decoder structure, called the batch-wise LSTM encoder-decoder network, is proposed to address these difficulties in batch process monitoring. The LSTM encoder extracts nonlinear dynamic features in both the between-batch and within-batch directions and projects the high-dimensional input space to a low-dimensional hidden state space. The decoder regenerates the samples from the hidden states. The control statistics H2 and SPE are designed for process monitoring, and the corresponding control limits are estimated by kernel density estimation. A case study on an extensive penicillin fermentation reference dataset suggests that the proposed method detects faulty samples more effectively than previous methods while remaining equally robust under normal conditions. (c) 2020 Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.
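The SPE (squared prediction error) statistic compares each sample with its decoder reconstruction. Its control limit can be taken as a high quantile of a kernel density estimate fitted to SPE values from normal operating data. The sketch below makes three assumptions that the abstract does not state: a Gaussian kernel, a normal-reference rule-of-thumb bandwidth, and a 99% confidence level. The H2 statistic is not reproduced because the abstract does not define it.

```python
import math
import statistics

def spe(x, x_hat):
    """Squared prediction error between a sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def kde_control_limit(values, alpha=0.99):
    """alpha-quantile of a Gaussian KDE fitted to `values`, found by bisection."""
    n = len(values)
    # Normal-reference rule-of-thumb bandwidth (assumed choice).
    h = 1.06 * statistics.stdev(values) * n ** (-1 / 5)

    def cdf(t):
        # The KDE's CDF is the average of each kernel's normal CDF.
        return sum(0.5 * (1 + math.erf((t - v) / (h * math.sqrt(2)))) for v in values) / n

    lo, hi = min(values) - 10 * h, max(values) + 10 * h
    for _ in range(100):
        mid = (lo + hi) / 2
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return hi
```

In use, the limit is fitted once on SPE values from fault-free batches. A new sample is flagged when `spe(x, x_hat)` exceeds it. Because KDE makes no Gaussian assumption about the SPE distribution itself, it suits the nonlinear features the abstract describes.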
Cracks are among the most common anomalies in concrete structures and affect their safety, and thus have received much attention. However, most previous studies have focused on regular cracks, while fewer have analysed mesh cracks. Because mesh cracks appear early and are highly complex, they can cause severe damage to concrete structures. Their automatic detection is therefore crucial to the safety of concrete structures. Since mesh cracks consist of many fine branches, which can lead to discontinuous detection results, this paper proposes a lightweight mesh crack detection model (MCM-Net) based on an efficient attention mechanism. The proposed network adopts an encoder-decoder structure and introduces an improved efficient channel attention that assigns high weights to crack pixels. Lightweight convolutional modules reduce the computational cost, while the superposition of max-pooling and mean-pooling enables the extraction of more fine-detail pixels. The proposed network is verified by experiments on the crack-detection (CD) and bridge-crack-image (BCI) datasets. The experimental results show that the proposed network improves the stability and computational efficiency of mesh crack detection.
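Efficient channel attention (ECA), in its published form, replaces the fully connected layers of squeeze-and-excitation with a 1-D convolution across channels. The kernel size of that convolution grows with the channel count C. The sketch below follows that form with one added assumption about how max-pooling and mean-pooling are superposed: the two pooled descriptors are summed before the convolution. The abstract does not specify how MCM-Net's improved ECA differs from the original, and the convolution here uses a fixed averaging kernel rather than learned weights.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """ECA's adaptive kernel size |log2(C)/gamma + b/gamma|, made odd as in the reference code."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def channel_attention(feature_map):
    """Per-channel weights for a [channel][row][col] feature map (assumed variant).

    Global mean pooling + global max pooling (summed), then a zero-padded 1-D
    convolution across channels with a toy averaging kernel, then a sigmoid.
    """
    c = len(feature_map)
    k = eca_kernel_size(c)
    kernel = [1.0 / k] * k  # stand-in for learned weights
    pooled = []
    for ch in feature_map:
        vals = [v for row in ch for v in row]
        pooled.append(sum(vals) / len(vals) + max(vals))
    pad = k // 2
    weights = []
    for i in range(c):
        s = sum(kernel[j] * pooled[i + j - pad]
                for j in range(k) if 0 <= i + j - pad < c)
        weights.append(1.0 / (1.0 + math.exp(-s)))
    return weights
```

For C = 64 the kernel size is 3, and for C = 256 it is 5. Each channel therefore interacts only with a few neighboring channels, which keeps the attention lightweight. Max pooling is included so that thin, high-response crack pixels are not averaged away.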