Images captured under low-light or backlight conditions can suffer from several types of degradation, such as low visibility, strong noise, and color distortion. In this paper, to address the degradation of low-light images, we propose a Two-stage Perceptual Enhancement Transformer Network (TPET) for low-light image enhancement that combines the local spatial perception of convolutional neural networks with the global spatial perception of transformers. The method comprises two stages: a feature extraction stage and a detail fusion stage. First, in the feature extraction stage, an encoder composed of transformers performs global feature extraction and expands the receptive field. Since the transformer lacks the ability to capture local features, we introduce a perceptual enhancement module (PEM) to improve the interaction of local and global feature information. Second, between the corresponding encoding and decoding blocks in each layer, a feature fusion block (FFB) is introduced to compensate for feature information at different scales, improving feature reusability and enhancing the stability of the network. In addition, between the two stages, a self-calibration module (SCM) redistributes local feature information and improves the network's supervision capability. In the detail fusion stage, to further preserve the textural details of the image, we design a detail enhancement unit (DEU) to recover high-resolution enhanced images. In qualitative comparisons and quantitative analysis, our method outperforms other low-light image enhancement methods in both subjective visual quality and objective metric values.
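To make the local-global interaction concrete, here is a minimal PyTorch sketch of a perceptual-enhancement-style block: a depthwise convolution branch supplies local texture while self-attention supplies global context. The module name, internals, and shapes are illustrative assumptions, not the paper's exact PEM design.

```python
import torch
import torch.nn as nn

class PEM(nn.Module):
    """Hypothetical perceptual-enhancement block: a depthwise-conv branch
    captures local texture, self-attention captures global context."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.GELU())
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):                                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return x + self.fuse(torch.cat([local, glob], dim=1))

print(PEM(32)(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```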
Pansharpening is a significant branch of remote sensing image processing; its goal is to fuse panchromatic (PAN) and multispectral (MS) images according to certain rules to generate high-resolution MS (HRMS) images. Improving the spatial and spectral resolution of the fused image is therefore an urgent problem. In this article, a multistage remote sensing image fusion network (MRFNet) is proposed, based on in-depth study of PAN and MS image fusion, to obtain a clear fused image that reflects ground features more comprehensively and completely. The proposed network consists of three stages connected by cross-stage fusion. The first two stages extract the features of the PAN and MS images, using an encoder-decoder structure and a channel attention module to extract remote sensing image features in the channel domain. The third stage is the image reconstruction stage, which fuses the extracted features with the original images to improve the spatial and spectral resolution of the fused result. A series of experiments is conducted on the benchmark WorldView II, GF-2, and QuickBird datasets. Qualitative analysis and quantitative comparison show the superiority of MRFNet in both visual quality and evaluation metric values.
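Channel-domain feature weighting of the kind described is typically realized with squeeze-and-excitation-style channel attention. The sketch below shows such a block, together with a hypothetical input-preparation step (upsampling the MS image to PAN resolution and concatenating); both are assumptions for illustration, not MRFNet's exact layout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Linear(channels, max(1, channels // reduction)),
            nn.ReLU(inplace=True),
            nn.Linear(max(1, channels // reduction), channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.mlp(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight channels by learned importance

# Hypothetical input preparation: upsample the 4-band MS image to PAN
# resolution and concatenate it with the single-band PAN image.
pan = torch.randn(1, 1, 256, 256)
ms = torch.randn(1, 4, 64, 64)
ms_up = F.interpolate(ms, size=pan.shape[-2:], mode="bilinear",
                      align_corners=False)
stacked = torch.cat([pan, ms_up], dim=1)       # (1, 5, 256, 256)
print(ChannelAttention(5)(stacked).shape)
```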
Semantic segmentation can address the perception needs of autonomous driving and micro-robots and is one of the challenging tasks in computer vision. From an application standpoint, the difficulty of semantic segmentation lies in satisfying inference speed, network parameter count, and segmentation accuracy at the same time. This paper proposes a lightweight multi-dimensional dynamic convolutional network (LMDCNet) for real-time semantic segmentation to address this problem. At the core of our architecture is multi-dimensional dynamic convolution (MDy-Conv), which uses an attention mechanism and factorized convolutions to remain efficient while maintaining remarkable accuracy. Specifically, LMDCNet is an asymmetric network architecture. We therefore design an encoder module built on MDy-Conv, MS-DAB, whose success is attributed to MDy-Conv's increased utilization of local and contextual feature information. Furthermore, we design a decoder module, SC-FP, that combines a feature pyramid with attention and performs multi-scale feature fusion accompanied by feature selection. On the Cityscapes and CamVid datasets, LMDCNet achieves 73.8 mIoU at 71.2 FPS and 69.6 mIoU at 92.4 FPS, respectively, without pre-training or post-processing; it is trained and evaluated on a single 1080Ti GPU. Our experiments show that LMDCNet strikes a good balance between segmentation accuracy and network size with only 1.05 M parameters.
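Dynamic convolution of this kind usually mixes several parallel kernel banks with input-dependent attention weights. The sketch below implements that general recipe in PyTorch; the kernel count and layout are assumptions, not the published MDy-Conv definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Dynamic convolution in the spirit of MDy-Conv: K parallel kernels
    mixed per sample by attention weights, run as one grouped conv."""
    def __init__(self, in_ch, out_ch, k=3, num_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(
            0.02 * torch.randn(num_kernels, out_ch, in_ch, k, k))
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_kernels), nn.Softmax(dim=1))
        self.pad = k // 2

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = self.attend(x)                                   # (B, K)
        # Per-sample kernel: weighted sum over the K kernel banks.
        weight = torch.einsum("bk,koihw->boihw", alpha, self.weight)
        weight = weight.reshape(-1, c, *self.weight.shape[-2:])
        out = F.conv2d(x.reshape(1, b * c, h, w), weight,
                       padding=self.pad, groups=b)
        return out.reshape(b, -1, h, w)

print(DynamicConv(16, 32)(torch.randn(2, 16, 24, 24)).shape)
```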
Wind power can effectively alleviate the energy crisis. However, its integration into the grid affects power quality and grid stability, and accurate wind speed prediction is a key factor in the efficient use of wind power. Because of its intermittent and nonstationary nature, wind speed forecasting is difficult and is the topic of much research, especially long-horizon multistep forecasting. In this paper, the multistep wind speed prediction problem is treated as a sequence-to-sequence mapping problem, and a Transformer-based multistep wind speed prediction model is proposed. The model follows an encoder-decoder architecture: the encoder generates representations of historical wind speed sequences of any length, the decoder generates arbitrarily long future wind speed sequences, and the two are associated through an attention mechanism; both the encoder and decoder of the Transformer are built entirely on multi-head attention. For easier modeling, the 1-dimensional original wind speed sequence is transformed into a 16-dimensional sequence by ensemble empirical mode decomposition (EEMD), and the multidimensional wind speed data are modeled directly with the Transformer. We trained the model on a very large dataset (19 years) of wind speeds averaged at 10-minute intervals and evaluated it on one year of wind speed data. Results show that our one-step forecast model achieves an average mean absolute error (MAE) and root mean square error (RMSE) of 0.167 and 0.221, respectively. To the best of our knowledge, our 3-, 6-, 12-, and 24-hour multistep forecast models achieve a new state of the art in wind speed forecasting, with respective MAEs of 0.243, 0.290, 0.362, and 0.453, and RMSEs of 0.326, 0.401, 0.513, and 0.651. We believe performance can be further improved with better model parameter optimization.
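A minimal PyTorch sketch of the described setup, assuming the 16 EEMD components are fed to a standard nn.Transformer encoder-decoder with teacher forcing; the layer sizes and the 3-hour (18-step) horizon are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WindTransformer(nn.Module):
    """Seq2seq wind model: encode 16-dim EEMD history, decode future steps."""
    def __init__(self, dim=16, d_model=64, nhead=4):
        super().__init__()
        self.embed = nn.Linear(dim, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.head = nn.Linear(d_model, dim)

    def forward(self, history, future_in):
        # history: (B, T_in, dim); future_in: (B, T_out, dim), teacher-forced
        t = future_in.size(1)
        # Causal mask so each future step only attends to earlier steps.
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        out = self.transformer(self.embed(history), self.embed(future_in),
                               tgt_mask=mask)
        return self.head(out)

model = WindTransformer()
hist = torch.randn(2, 144, 16)   # 24 h of 10-min-averaged EEMD components
fut = torch.randn(2, 18, 16)     # 3 h ahead = 18 ten-minute steps
print(model(hist, fut).shape)    # torch.Size([2, 18, 16])
```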
ISBN (print): 9781450397810
In this paper, we improve natural scene text detection and recognition technology based on 2D attention and an encoder-decoder framework. First, related work on text detection and recognition in different natural scenes is discussed. Second, we build on the encoder-decoder framework and a two-dimensional attention module, improving them through aggregation and hybridization. Finally, we discuss and analyze the results and identify the possible shortcomings of the model.
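For reference, one decoding step of 2D attention over an encoder feature map, the mechanism this line of work builds on, might look like the following sketch; the scoring network and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TwoDAttention(nn.Module):
    """One decoding step of 2D attention over a conv feature map."""
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(feat_dim + hidden, hidden, 3, padding=1),
            nn.Tanh(),
            nn.Conv2d(hidden, 1, 1))

    def forward(self, feats, state):
        # feats: (B, C, H, W) encoder map; state: (B, hidden) decoder state
        b, c, h, w = feats.shape
        s = state[:, :, None, None].expand(-1, -1, h, w)
        e = self.score(torch.cat([feats, s], dim=1))        # (B, 1, H, W)
        a = torch.softmax(e.view(b, -1), dim=1).view(b, 1, h, w)
        glimpse = (a * feats).sum(dim=(2, 3))               # (B, C) context
        return glimpse, a

g, attn = TwoDAttention()(torch.randn(2, 256, 8, 32), torch.randn(2, 256))
print(g.shape, attn.shape)
```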
ISBN (print): 9781728176055
We propose a semi-supervised singing synthesizer that can learn new voices from audio data alone, without annotations such as phonetic segmentation. Our system is an encoder-decoder model with two encoders, linguistic and acoustic, and one (acoustic) decoder. In the first step, the system is trained in a supervised manner on a labeled multi-singer dataset. Here, we ensure that the embeddings produced by the two encoders are similar, so that the model can later be used with either acoustic or linguistic input features. To learn a new voice in an unsupervised manner, the pretrained acoustic encoder is used to train a decoder for the target singer. Finally, at inference, the pretrained linguistic encoder is used together with the decoder of the new voice to produce acoustic features from linguistic input. We evaluate our system with a listening test and show that the results are comparable to those obtained with an equivalent supervised approach.
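A toy sketch of the shared-embedding constraint described above: both encoders are trained so their outputs stay close (here via an MSE alignment term), so either one can later drive the decoder. All shapes, features, and the specific losses are assumptions, not the paper's training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two encoders mapped into one embedding space, plus one acoustic decoder.
ling_enc = nn.GRU(input_size=40, hidden_size=64, batch_first=True)
acou_enc = nn.GRU(input_size=80, hidden_size=64, batch_first=True)
decoder = nn.GRU(input_size=64, hidden_size=80, batch_first=True)

ling_feats = torch.randn(2, 100, 40)   # e.g. frame-level phoneme features
acou_feats = torch.randn(2, 100, 80)   # e.g. mel-spectrogram frames
target = torch.randn(2, 100, 80)

z_ling, _ = ling_enc(ling_feats)
z_acou, _ = acou_enc(acou_feats)
recon, _ = decoder(z_acou)

recon_loss = F.l1_loss(recon, target)
align_loss = F.mse_loss(z_ling, z_acou)  # keep the two embeddings close
(recon_loss + align_loss).backward()
```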
Routine visual inspection of concrete structures is essential to maintaining safe conditions, so concrete crack segmentation using deep learning has been studied extensively in recent years. However, insufficient performance remains a major challenge in diverse field-inspection scenarios. In this study, a novel SegCrack model for pixel-level crack segmentation is proposed, using a hierarchically structured Transformer encoder to output multiscale features and a top-down pathway with lateral connections to progressively upsample and fuse features from the deepest layer of the encoder. Furthermore, an online hard example mining strategy is adopted to strengthen the detection of hard samples and improve model performance. The effect of dataset size on segmentation performance is then investigated. The results indicate that SegCrack achieves a precision, recall, F1 score, and mean intersection over union of 96.66%, 95.46%, 96.05%, and 92.63%, respectively, on the test set.
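Online hard example mining is commonly implemented by averaging the loss only over the hardest fraction of pixels. A generic sketch, with the keep ratio as an assumed hyperparameter rather than the paper's setting:

```python
import torch
import torch.nn.functional as F

def ohem_ce_loss(logits, labels, keep_ratio=0.25, ignore_index=255):
    """Online hard example mining: average cross-entropy only over the
    hardest keep_ratio fraction of pixels."""
    loss = F.cross_entropy(logits, labels,
                           ignore_index=ignore_index, reduction="none")
    loss = loss.flatten()
    n_keep = max(1, int(loss.numel() * keep_ratio))
    hard, _ = loss.topk(n_keep)          # highest-loss (hardest) pixels
    return hard.mean()

logits = torch.randn(2, 2, 64, 64, requires_grad=True)  # crack vs background
labels = torch.randint(0, 2, (2, 64, 64))
print(ohem_ce_loss(logits, labels).item())
```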
In this study, we aimed to develop and assess a hydrological model using a deep learning algorithm for improved water management. Single-output long short-term memory (LSTM SO) and encoder-decoder long short-term memory (LSTM ED) models were developed, and their performance was compared using different input variables. We used water-level and rainfall data from 2018 to 2020 for the Takayama Reservoir (Nara Prefecture, Japan) to train, test, and assess both models, estimating the root-mean-squared error and Nash-Sutcliffe efficiency to compare model performance. The results showed that the LSTM ED model was more accurate, and analyzing both water levels and water-level changes produced better results than analyzing water levels alone. However, model accuracy was significantly lower when predicting water levels outside the range of the training datasets. Within this range, the developed model could be used for water management to reduce the risk of downstream flooding while ensuring sufficient water storage for irrigation, because of its ability to determine an appropriate amount of water to release from the reservoir before rainfall events.
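A minimal PyTorch sketch of an LSTM encoder-decoder of the kind described: it encodes past rainfall and water level, then decodes a multi-step water-level forecast autoregressively. Input variables, layer sizes, and the forecast horizon are assumptions.

```python
import torch
import torch.nn as nn

class LSTMEncoderDecoder(nn.Module):
    """Encode a past window of [rainfall, water level]; decode a
    multi-step water-level forecast one step at a time."""
    def __init__(self, n_in=2, hidden=32, horizon=6):
        super().__init__()
        self.encoder = nn.LSTM(n_in, hidden, batch_first=True)
        self.decoder = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
        self.horizon = horizon

    def forward(self, past):
        # past: (B, T, 2) = [rainfall, water level]
        _, state = self.encoder(past)      # hand encoder state to decoder
        y = past[:, -1:, 1:2]              # last observed water level
        outs = []
        for _ in range(self.horizon):      # autoregressive decoding
            o, state = self.decoder(y, state)
            y = self.head(o)
            outs.append(y)
        return torch.cat(outs, dim=1)      # (B, horizon, 1)

print(LSTMEncoderDecoder()(torch.randn(4, 48, 2)).shape)
```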
Water body segmentation is an important tool for hydrological monitoring of the Earth. With the rapid development of convolutional neural networks, semantic segmentation techniques have been applied to remote sensing images to extract water bodies. However, several difficulties must be overcome to achieve good water body segmentation, such as complex backgrounds, large scale variation, water connectivity, and rough edges. In this study, a water body segmentation model (DUPnet) with dense connectivity and multi-scale pyramid pooling is proposed to rapidly and accurately extract water bodies from Gaofen satellite and Landsat 8 OLI (Operational Land Imager) images. The proposed method has three parts: (1) a multi-scale spatial pyramid pooling module (MSPP) is introduced to combine shallow and deep features for small water bodies and to compensate for the feature loss caused by the sampling process; (2) dense blocks in DUPnet's backbone extract more spatial features, increasing feature propagation and reuse; (3) a regression loss function is proposed to train the network on the unbalanced datasets caused by small water bodies. The experimental results show that the F1, MIoU, and FWIoU of DUPnet on the 2020 Gaofen dataset are 97.67%, 88.17%, and 93.52%, respectively, and on the Landsat River dataset they are 96.52%, 84.72%, and 91.77%, respectively.
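A sketch of what a multi-scale spatial pyramid pooling block like MSPP might look like: pool at several scales, project, upsample, and fuse with the input. The scales and channel split are assumptions, not DUPnet's published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSPP(nn.Module):
    """Multi-scale spatial pyramid pooling: pooled branches at several
    grid sizes are projected, upsampled, and fused with the input."""
    def __init__(self, channels, scales=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(s),
                          nn.Conv2d(channels, channels // 4, 1))
            for s in scales)
        self.project = nn.Conv2d(channels + len(scales) * (channels // 4),
                                 channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [x] + [
            F.interpolate(b(x), size=(h, w), mode="bilinear",
                          align_corners=False)
            for b in self.branches]
        return self.project(torch.cat(feats, dim=1))

print(MSPP(64)(torch.randn(1, 64, 32, 32)).shape)
```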
Face parsing refers to labeling each facial component in a face image and has been employed in facial simulation, expression recognition, and makeup applications, effectively providing a basis for further analysis, computation, animation, modification, and numerous other uses. Although existing face parsing methods have demonstrated good performance, they fail to extract rich features and recover accurate segmentation maps, particularly for faces with high variation in expression and highly similar appearances. Moreover, these approaches neglect the semantic gaps and dependencies between facial categories and their boundaries. To address these drawbacks, we propose an efficient dilated convolution network with different aspect ratios that exploits its feature extraction capability to attain accurate face parsing. The proposed multiscale dilated encoder-decoder convolution model obtains rich component information and efficiently improves the capture of global information by extracting low- and high-level semantic features. To achieve fine parsing of the face components along their borders and to analyze the connections between face categories and their boundary edges, a semantic edge map is learned using a conditional random field, which distinguishes border and non-border pixels during modeling. We conducted experiments on three well-known publicly available face databases. The results demonstrate the high accuracy and capability of the proposed method compared with previous state-of-the-art methods. Our model achieves a mean accuracy of 90% on the CelebAMask-HQ dataset for the category case and 81.43% for the accessory case, and accuracies of 91.58% and 92.44% on the HELEN and LaPa datasets, respectively, demonstrating its effectiveness.
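Dilated convolutions "with different aspect ratios" can be read as anisotropic dilation rates. The sketch below runs parallel 3x3 convolutions with isotropic and anisotropic dilations and fuses them; it illustrates the idea only and is not the paper's network.

```python
import torch
import torch.nn as nn

class DilatedMixBlock(nn.Module):
    """Parallel dilated 3x3 convs, including anisotropic (different
    aspect ratio) dilation rates, fused by a 1x1 conv."""
    def __init__(self, channels):
        super().__init__()
        rates = [(1, 1), (2, 2), (1, 3), (3, 1)]   # (height, width) rates
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates)
        self.fuse = nn.Conv2d(len(rates) * channels, channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

print(DilatedMixBlock(16)(torch.randn(1, 16, 32, 32)).shape)
```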