ISBN (print): 9783030394318; 9783030394301
Synthetic Aperture Radar (SAR) image segmentation is an important step in SAR image interpretation. Common patch-based methods treat all the pixels within a patch as a single category and do not take the label consistency between neighboring patches into consideration, which makes the segmentation results less accurate. In this paper, we use an encoder-decoder network to conduct pixel-wise segmentation. Then, in order to make full use of the contextual information between patches, we use a fully connected conditional random field to optimize the combined probability map output by the encoder-decoder network. The test results on our SAR data set show that our method can effectively maintain the contextual information of pixels and achieve better segmentation results.
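As a rough illustration of this pipeline, the sketch below refines a patch-assembled softmax probability map with a fully connected CRF. It assumes the third-party pydensecrf package; the kernel widths and compatibility weights are illustrative defaults, not values from the paper.

# Hedged sketch: refine an encoder-decoder softmax map with a fully connected CRF.
# Assumes the pydensecrf package; CRF weights below are illustrative, not the paper's.
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(sar_rgb, softmax_probs, n_classes, iters=5):
    """sar_rgb: HxWx3 uint8 image; softmax_probs: (n_classes, H, W) float array."""
    h, w = sar_rgb.shape[:2]
    d = dcrf.DenseCRF2D(w, h, n_classes)
    d.setUnaryEnergy(unary_from_softmax(softmax_probs))   # -log probabilities as unary terms
    d.addPairwiseGaussian(sxy=3, compat=3)                # smoothness kernel
    d.addPairwiseBilateral(sxy=60, srgb=10,               # appearance kernel over the image
                           rgbim=np.ascontiguousarray(sar_rgb), compat=5)
    q = d.inference(iters)
    return np.argmax(np.array(q).reshape(n_classes, h, w), axis=0)   # refined label map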
ISBN (print): 9798350334449
With the increasing growth of road infrastructure in recent decades, road surface damage is becoming more prevalent. The rapid advance of neural networks and their intelligent technologies can scale up efforts to help deal with this problem. One of the technologies that can be applied in this context is computer vision with semantic segmentation, which can help automatically identify road surface damage. While a naive implementation of semantic segmentation often sacrifices running time and speed, in this study we propose a lightweight encoder-decoder network model to overcome this issue. Numerical experiments show that this method runs in 110 minutes at 26 fps, nearly a 2x improvement over the baseline model's running time and speed for automated road surface damage identification, and it can be extended to automatically measure the area of road damage and provide more meaningful information for decision-makers.
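To make the "lightweight encoder-decoder" idea concrete, here is a minimal sketch (not the authors' model) of a small segmentation network built from depthwise-separable convolutions, the usual way to trade a little accuracy for a large cut in parameters and inference time. The class name, channel counts, and depth are illustrative.

# Hedged sketch: a lightweight encoder-decoder for road-damage segmentation.
import torch
import torch.nn as nn

def ds_conv(cin, cout, stride=1):
    # depthwise + pointwise convolution: the standard lightweight building block
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class LiteSegNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1 = ds_conv(3, 32, stride=2)     # 1/2 resolution
        self.enc2 = ds_conv(32, 64, stride=2)    # 1/4
        self.enc3 = ds_conv(64, 128, stride=2)   # 1/8
        self.dec2 = ds_conv(128 + 64, 64)
        self.dec1 = ds_conv(64 + 32, 32)
        self.head = nn.Conv2d(32, n_classes, 1)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d2 = self.dec2(torch.cat([self.up(e3), e2], 1))   # skip connection from encoder
        d1 = self.dec1(torch.cat([self.up(d2), e1], 1))
        return self.head(self.up(d1))                     # logits at input resolution

# e.g. LiteSegNet()(torch.randn(1, 3, 256, 256)).shape -> (1, 2, 256, 256)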
In recent years, deep learning models have been employed for speech enhancement. Most of the existing deep learning methods use fully convolutional neural networks (CNNs) to capture the time-frequency information of input features. Compared with CNNs, it is more reasonable to use a Long Short-Term Memory (LSTM) network to capture contextual information along the time axis of the features. However, the computational load of a fully LSTM structure is heavy. To balance model complexity and the capability of capturing time-frequency features, we present an LSTM-Convolutional-BLSTM encoder-decoder (LCLED) network for speech enhancement. The LCLED additionally incorporates transposed convolution and skip connections. The key idea is to use the two LSTM parts and the convolutional layers to model the contextual information and the frequency-dimension features, respectively. Furthermore, in order to achieve higher quality of the enhanced speech, the a priori Signal-to-Noise Ratio (SNR) is used as the learning target of the LCLED. The Minimum Mean-Square Error (MMSE) approach is used for postprocessing. The results indicate that the proposed LCLED not only reduces model complexity and training time but also improves the quality and intelligibility of the enhanced speech compared with the fully LSTM structure. (C) 2020 Elsevier Ltd. All rights reserved.
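A minimal sketch of the LSTM-Conv-BLSTM layout described above, assuming magnitude-spectrogram input of shape (batch, frames, frequency bins); the single convolution/transposed-convolution pair, the hidden sizes, and the even frequency dimension (160) are simplifying assumptions rather than the paper's configuration.

# Hedged sketch of an LSTM -> Conv -> transposed Conv (+ skip) -> BLSTM pipeline.
import torch
import torch.nn as nn

class LCLEDSketch(nn.Module):
    def __init__(self, n_freq=160, hidden=160):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, batch_first=True)
        self.enc = nn.Conv1d(1, 16, kernel_size=4, stride=2, padding=1)            # frequency downsample
        self.dec = nn.ConvTranspose1d(16, 1, kernel_size=4, stride=2, padding=1)   # frequency upsample
        self.blstm = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_freq)

    def forward(self, spec):                      # spec: (batch, frames, n_freq)
        b, t, _ = spec.shape
        h, _ = self.lstm(spec)                    # temporal context, (b, t, hidden)
        z = self.enc(h.reshape(b * t, 1, -1))     # per-frame frequency features
        d = self.dec(z).reshape(b, t, -1) + h     # skip connection from the LSTM output
        y, _ = self.blstm(d)
        return torch.relu(self.out(y))            # non-negative a-priori-SNR-style target

# est = LCLEDSketch()(torch.randn(2, 100, 160))   # -> (2, 100, 160)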
Purpose: As a portable and radiation-free imaging modality, ultrasound can be easily used to image various types of tissue structures. It is important to develop a method that supports co-segmentation of multi-type ultrasound images. However, state-of-the-art ultrasound segmentation methods commonly focus only on single-type images or ignore type-aware information. Methods: To solve this problem, this work proposes a novel type-aware encoder-decoder network (TypeSeg) for multi-type ultrasound image co-segmentation. First, we develop a type-aware metric learning module to find an optimal latent feature space in which ultrasound images of the same type are close and those of different types are separated by a certain margin. Second, depending on the extracted features, a decision module decides whether the input ultrasound images share a common tissue type, and the encoder-decoder network produces a segmentation mask accordingly. Results: We evaluate the performance of the proposed TypeSeg model on an ultrasound dataset that contains four types of tissue. The proposed TypeSeg model achieves the overall best results, with a mean IoU of 87.51% +/- 3.93% on the multi-type ultrasound images. Conclusion: The experimental results indicate that the proposed method outperforms all the compared state-of-the-art algorithms on the multi-type ultrasound image co-segmentation task. (C) 2021 Elsevier B.V. All rights reserved.
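A hedged sketch of the type-aware part only: a margin-based contrastive loss that clusters embeddings of same-type images and separates different types, plus a distance-threshold stand-in for the decision module. The function names, margin, and threshold are illustrative assumptions, not the paper's formulation.

# Hedged sketch of type-aware metric learning and a simple type decision.
import torch
import torch.nn.functional as F

def type_contrastive_loss(emb_a, emb_b, same_type, margin=1.0):
    """emb_*: (batch, dim) embeddings from a shared encoder; same_type: (batch,) 0/1 labels."""
    dist = F.pairwise_distance(emb_a, emb_b)
    pull = same_type * dist.pow(2)                             # same type: pull embeddings together
    push = (1 - same_type) * F.relu(margin - dist).pow(2)      # different type: push past the margin
    return (pull + push).mean()

def share_type(emb_a, emb_b, threshold=0.5):
    # decision-module stand-in: a small embedding distance is read as "same tissue type"
    return F.pairwise_distance(emb_a, emb_b) < threshold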
Automatic ultrasound image segmentation plays an important role in the early diagnosis of human diseases. This paper introduces a novel and efficient encoder-decoder network, called the Lightweight Attention Encoder-Decoder Network (LAEDNet), for automatic ultrasound image segmentation. In contrast to previous encoder-decoder networks that involve complicated architectures with numerous parameters, our LAEDNet adopts a lightweight version of EfficientNet as the encoder, and a Lightweight Residual Squeeze-and-Excitation (LRSE) block is employed in the decoder. To achieve a trade-off between segmentation accuracy and implementation efficiency, we also present a family of models, from light to heavy (denoted LAEDNet-S, LAEDNet-M, and LAEDNet-L, respectively), with varying lightweight EfficientNet backbones. To evaluate LAEDNet, we have conducted extensive experiments on the Brachial Plexus dataset (BP), the Breast Ultrasound Images dataset (BUSI), and the Head Circumference Ultrasound Images dataset (HCUS), whose ultrasound images suffer from high noise, blurred borders, and low contrast. The experiments show that, compared with U-Net and its variants, e.g., M-Net, U-Net++ and TransUNet, our LAEDNet achieves better results in terms of Dice coefficient (DSC) and running speed. In particular, LAEDNet-M has only 10.75M parameters and runs at 40.7 FPS, yet obtains 73.0%, 73.8% and 91.3% DSC on the BP, BUSI and HCUS datasets, respectively.
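A hedged sketch of what an LRSE-style decoder block could look like: channel attention from global pooling combined with a cheap depthwise convolution on a residual path. This is our reading of the idea, not the authors' implementation; the reduction ratio and layer choices are assumptions.

# Hedged sketch of a lightweight residual squeeze-and-excitation (LRSE) style block.
import torch
import torch.nn as nn

class LRSEBlock(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                         # squeeze: global spatial context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())                                    # excitation: per-channel weights
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)   # cheap depthwise conv

    def forward(self, x):
        return x + self.conv(x) * self.se(x)                 # residual path + re-weighted features

# LRSEBlock(64)(torch.randn(1, 64, 32, 32)).shape -> (1, 64, 32, 32)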
Wind energy is a clean energy source that is characterised by significant uncertainty. The electricity generated from wind power also exhibits strong unpredictability, which, once integrated, can have a substantial impact on the security of the power grid. In the context of integrating wind power into the grid, accurate prediction of wind power generation is crucial in order to minimise damage to the grid system. This paper proposes a novel composite model (MLL-MPFLA) that combines a multilayer perceptron (MLP) and an LSTM-based encoder-decoder network for short-term prediction of wind power generation. In this model, the MLP first extracts multidimensional features from the wind power data. Subsequently, an LSTM-based encoder-decoder network explores the temporal characteristics of the data in depth, combining the multidimensional and temporal features for effective prediction. During decoding, an improved focused linear attention mechanism, called multi-point focused linear attention, is employed; it enhances prediction accuracy by weighting predictions from different subspaces. A comparative analysis against the MLP, LSTM, LSTM-Attention-LSTM, LSTM-Self_Attention-LSTM, and CNN-LSTM-Attention models demonstrates that the proposed MLL-MPFLA model outperforms the others in terms of MAE, RMSE, MAPE, and R2, thereby validating its predictive performance.
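A hedged sketch of the overall MLP -> LSTM encoder-decoder pipeline, with a generic kernelised linear-attention step standing in for the paper's multi-point focused linear attention; the feature dimension, hidden size, forecast horizon, and class name are illustrative assumptions.

# Hedged sketch: MLP feature extraction, LSTM encoder-decoder, linear attention in the decoder.
import torch
import torch.nn as nn

class WindForecaster(nn.Module):
    def __init__(self, n_features=8, hidden=64, horizon=12):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))       # per-step feature extraction
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)
        self.horizon = horizon

    def forward(self, x):                         # x: (batch, history_steps, n_features)
        enc, state = self.encoder(self.mlp(x))
        dec_in = enc[:, -1:, :].repeat(1, self.horizon, 1)
        dec, _ = self.decoder(dec_in, state)
        # linear attention: non-negative feature maps avoid the quadratic softmax
        q, k, v = torch.relu(self.q(dec)), torch.relu(self.k(enc)), self.v(enc)
        ctx = q @ (k.transpose(1, 2) @ v) / (q @ k.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6)
        return self.head(dec + ctx).squeeze(-1)   # (batch, horizon) power forecast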
The extraction of water streams based on synthetic aperture radar (SAR) is of great significance for surface water monitoring, flood monitoring, and the management of water resources. In recent years, however, research has mainly used the backscattering feature (BF) to extract water bodies. In this paper, a feature-fused encoder-decoder network is proposed to delineate water streams more completely and precisely using both the BF and polarimetric features (PFs) from SAR images. Firstly, the standard BFs were extracted and the PFs were obtained using model-based decomposition. Specifically, a new model-based decomposition, more suitable for dual-pol SAR images, was selected to acquire three different PFs of surface water streams for the first time. Five groups of candidate feature combinations were formed from the two BFs and three PFs. Then, a new feature-fused encoder-decoder network (FFEDN) was developed for mining and fusing both BFs and PFs. Finally, several typical areas were selected to evaluate the performance of the different combinations for water stream extraction. To further verify the effectiveness of the proposed method, two machine learning methods and four state-of-the-art deep learning algorithms were used for comparison. The experimental results showed that the proposed method with the optimal feature combination achieved the highest accuracy, with a precision of 95.21%, recall of 91.79%, intersection over union (IoU) of 87.73%, overall accuracy (OA) of 93.35%, and average accuracy (AA) of 93.41%. The results also showed that performance was higher when BFs and PFs were combined. In short, this study verified the effectiveness of PFs for water stream extraction, and the proposed FFEDN can further improve the accuracy of water stream extraction.
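A hedged sketch of the dual-branch feature-fusion idea (an illustration, not the FFEDN architecture): backscattering-feature and polarimetric-feature channels are encoded separately, concatenated, and decoded into a water/non-water mask. Channel counts and depth are assumptions.

# Hedged sketch: two-branch BF/PF fusion followed by a small decoder.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class FusionSegSketch(nn.Module):
    def __init__(self, n_bf=2, n_pf=3):
        super().__init__()
        self.bf_enc = nn.Sequential(block(n_bf, 32), nn.MaxPool2d(2), block(32, 64))   # backscatter branch
        self.pf_enc = nn.Sequential(block(n_pf, 32), nn.MaxPool2d(2), block(32, 64))   # polarimetric branch
        self.fuse = block(128, 64)                        # channel-wise fusion of the two branches
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.head = nn.Conv2d(64, 1, 1)                   # water-probability logits

    def forward(self, bf, pf):                            # bf: (B, 2, H, W), pf: (B, 3, H, W)
        fused = self.fuse(torch.cat([self.bf_enc(bf), self.pf_enc(pf)], dim=1))
        return self.head(self.up(fused))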
In this work, a new method is proposed that allows the use of a single RGB camera for real-time detection of objects that could be potential collision sources for Unmanned Aerial Vehicles. For this purpose, a new network with an encoder-decoder architecture has been developed that allows rapid distance estimation from a single image by performing RGB-to-depth mapping. In a comparison with other existing RGB-to-depth mapping methods, the proposed network achieved a satisfactory trade-off between complexity and accuracy. With only 6.3 million parameters, it achieved efficiency close to that of models with more than five times as many parameters, which allows the proposed network to operate in real time. A dedicated algorithm makes use of the distance predictions made by the network and compensates for measurement inaccuracies. The entire solution has been implemented and tested in practice in an indoor environment using a micro-drone equipped with a front-facing RGB camera. All data, source code, and pretrained network weights are available for download, so the results can easily be reproduced and the resulting solution can be tested and quickly deployed in practice.
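The abstract does not spell out the compensation algorithm, so the sketch below is only an illustrative stand-in: take a robust low percentile of the predicted depth inside a central region of interest and smooth it over time to damp single-frame errors. All parameter values and the function name are assumptions.

# Hedged stand-in for distance post-processing on a predicted depth map.
import numpy as np

def obstacle_distance(depth_map, prev_estimate=None, roi_frac=0.5, pct=5, alpha=0.3):
    """depth_map: (H, W) metric depth predicted by the RGB-to-depth network."""
    h, w = depth_map.shape
    dh, dw = int(h * roi_frac / 2), int(w * roi_frac / 2)
    roi = depth_map[h // 2 - dh:h // 2 + dh, w // 2 - dw:w // 2 + dw]   # central flight corridor
    measurement = np.percentile(roi, pct)          # robust "nearest obstacle" estimate
    if prev_estimate is None:
        return measurement
    return alpha * measurement + (1 - alpha) * prev_estimate            # exponential smoothing over frames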
To meet the need for multispectral images with high spatial resolution in practical applications, we propose a dense encoder-decoder network with feedback connections for pan-sharpening. Our network consists of four parts. The first part consists of two identical subnetworks that extract features from the PAN and MS images, respectively. The second part is an efficient feature-extraction block. Because we want the network to focus on features at different scales, we propose innovative multiscale feature-extraction blocks that fully extract effective features from networks of various depths and widths by using three multiscale feature-extraction blocks and two long-skip connections. The third part is the feature fusion and recovery network. Inspired by work on U-Net improvements, we propose a brand-new encoder-decoder network structure with dense connections that improves performance through effective connections between encoders and decoders at different scales. The fourth part is a continuous feedback connection operation that refines shallow features, which enables the network to obtain better reconstruction capability earlier. To demonstrate the effectiveness of our method, we performed several experiments. Experiments on various satellite datasets show that the proposed method outperforms existing methods; our results show significant improvements over those of other models in terms of the evaluation indices used to measure the spectral quality and spatial details of the generated images.
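A hedged sketch of the feedback-connection idea alone (a simplified stand-in for the dense encoder-decoder): a small fusion block is unrolled a few times, and the previous pan-sharpened estimate is fed back as extra input so earlier features get refined. Band counts, widths, and the number of unrolled steps are assumptions.

# Hedged sketch: pan-sharpening with an unrolled feedback loop over the fusion block.
import torch
import torch.nn as nn

class FeedbackPanSharpen(nn.Module):
    def __init__(self, ms_bands=4, feats=32, steps=3):
        super().__init__()
        self.steps = steps
        self.body = nn.Sequential(
            nn.Conv2d(ms_bands + 1 + ms_bands, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, ms_bands, 3, padding=1))

    def forward(self, pan, ms):                       # pan: (B, 1, H, W), ms: (B, C, H/4, W/4)
        ms_up = nn.functional.interpolate(ms, size=pan.shape[-2:],
                                          mode='bilinear', align_corners=False)
        est = ms_up                                   # initial high-resolution estimate
        for _ in range(self.steps):                   # feedback: refine using the previous estimate
            est = ms_up + self.body(torch.cat([pan, ms_up, est], dim=1))
        return est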
As an indispensable component of intelligent monitoring systems, crowd counting plays a crucial role in many fields, particularly crowd management and control during the COVID-19 pandemic. Despite the promising achievements of many methods, crowd scale variations and noise interference in congested crowd scenes remain urgent problems to be solved. In this paper, we propose a novel Hierarchical Scale-aware Encoder-Decoder Network (HSED-Net) for single-image crowd counting that handles scale variations and noise interference, thereby generating high-quality density maps. The HSED-Net is designed as an encoder-decoder architecture that contains two core networks: the Scale-Aware Encoding network (SAEnet) and the Multi-path Aggregation Decoding network (MADnet). The SAEnet focuses on extracting rich multi-scale crowd features and employs cascaded scale-aware encoding branches to collaboratively obtain high-resolution feature representations. During the encoding phase, two adaptive weight generators are proposed to filter the crowd features along different dimensions and resist noise interference. Instead of fusing the multi-scale and multi-level features indiscriminately, the MADnet adopts a multi-path adaptive fusion strategy and selectively emphasizes the more appropriate features through spatial and channel guidance modules, further improving the quality of the density maps and the robustness of the network. Extensive experiments on four challenging datasets strongly demonstrate the superiority of our HSED-Net.
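A hedged sketch of scale-aware feature extraction with an adaptive channel-weight generator, illustrating the idea rather than HSED-Net itself: parallel dilated branches capture different crowd scales and a learned channel weighting filters the fused features before a density head. Dilation rates and channel sizes are assumptions.

# Hedged sketch: multi-scale dilated branches with adaptive channel weighting for crowd counting.
import torch
import torch.nn as nn

class ScaleAwareBlock(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(cin, cout, 3, padding=d, dilation=d) for d in (1, 2, 4)])   # small/medium/large scales
        self.weight = nn.Sequential(                        # adaptive channel-weight generator
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(3 * cout, 3 * cout, 1),
            nn.Sigmoid())
        self.fuse = nn.Conv2d(3 * cout, cout, 1)

    def forward(self, x):
        multi = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.fuse(multi * self.weight(multi))        # noise-suppressed, scale-aware features

# a density head would follow, e.g. nn.Conv2d(cout, 1, 1) producing a crowd density map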