Images captured under low-light or backlight conditions can suffer from several types of degradation, such as low visibility, strong noise, and color distortion. In this paper, to address the degradation of low-light images, we propose a Two-stage Perceptual Enhancement Transformer Network (TPET) for low-light image enhancement that combines the local spatial perception of convolutional neural networks with the global spatial perception of transformers. The method comprises two stages: a feature extraction stage and a detail fusion stage. First, in the feature extraction stage, an encoder composed of transformers performs global feature extraction and expands the receptive field. Since the transformer lacks the ability to capture local features, we introduce a perceptual enhancement module (PEM) to improve the interaction of local and global feature information. Second, between the corresponding encoding and decoding blocks in each layer, a feature fusion block (FFB) is introduced to compensate for feature information at different scales, improving feature reusability and enhancing the stability of the network. In addition, between the two stages, a self-calibration module (SCM) redistributes local feature information and improves the network's supervision capability. In the detail fusion stage, to further preserve the textural details of the image, we design a detail enhancement unit (DEU) to recover high-resolution enhanced images. In qualitative comparisons and quantitative analysis, our method outperforms other low-light image enhancement methods in both subjective visual quality and objective metric values.
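To make the local-global interaction concrete, here is a minimal PyTorch sketch of a perceptual-enhancement-style block: a depthwise convolution branch supplies local texture while self-attention supplies global context. The module name, internals, and shapes are illustrative assumptions, not the paper's exact PEM design.

```python
import torch
import torch.nn as nn

class PEM(nn.Module):
    """Hypothetical perceptual-enhancement block: a depthwise-conv branch
    captures local texture, self-attention captures global context."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.GELU())
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):                                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return x + self.fuse(torch.cat([local, glob], dim=1))

print(PEM(32)(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```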
Pansharpening is a significant branch of remote sensing image processing; its goal is to fuse panchromatic (PAN) and multispectral (MS) images according to certain rules to generate high-resolution MS (HRMS) images. Improving the spatial and spectral resolution of the fused image is therefore an urgent problem. In this article, a multistage remote sensing image fusion network (MRFNet) is proposed, based on in-depth study of PAN and MS image fusion, to obtain a clear fused image that reflects ground features more comprehensively and completely. The proposed network consists of three stages connected by cross-stage fusion. The first two stages extract the features of the PAN and MS images, using an encoder-decoder structure and a channel attention module to extract remote sensing image features in the channel domain. The third stage is the image reconstruction stage, which fuses the extracted features with the original images to improve the spatial and spectral resolution of the fused result. A series of experiments is conducted on the benchmark WorldView II, GF-2, and QuickBird datasets. Qualitative analysis and quantitative comparison show the superiority of MRFNet in both visual quality and evaluation metric values.
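Channel-domain feature weighting of the kind described is typically realized with squeeze-and-excitation-style channel attention. The sketch below shows such a block, together with a hypothetical input-preparation step (upsampling the MS image to PAN resolution and concatenating); both are assumptions for illustration, not MRFNet's exact layout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Linear(channels, max(1, channels // reduction)),
            nn.ReLU(inplace=True),
            nn.Linear(max(1, channels // reduction), channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.mlp(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight channels by learned importance

# Hypothetical input preparation: upsample the 4-band MS image to PAN
# resolution and concatenate it with the single-band PAN image.
pan = torch.randn(1, 1, 256, 256)
ms = torch.randn(1, 4, 64, 64)
ms_up = F.interpolate(ms, size=pan.shape[-2:], mode="bilinear",
                      align_corners=False)
stacked = torch.cat([pan, ms_up], dim=1)       # (1, 5, 256, 256)
print(ChannelAttention(5)(stacked).shape)
```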
Semantic segmentation can address the perception needs of autonomous driving and micro-robots and is one of the challenging tasks in computer vision. From an application standpoint, the difficulty of semantic segmentation lies in satisfying inference speed, network parameter count, and segmentation accuracy at the same time. This paper proposes a lightweight multi-dimensional dynamic convolutional network (LMDCNet) for real-time semantic segmentation to address this problem. At the core of our architecture is multi-dimensional dynamic convolution (MDy-Conv), which uses an attention mechanism and factorized convolutions to remain efficient while maintaining remarkable accuracy. Specifically, LMDCNet is an asymmetric network architecture. We therefore design an encoder module built on MDy-Conv, MS-DAB, whose success is attributed to MDy-Conv's increased utilization of local and contextual feature information. Furthermore, we design a decoder module, SC-FP, that combines a feature pyramid with attention and performs multi-scale feature fusion accompanied by feature selection. On the Cityscapes and CamVid datasets, LMDCNet achieves 73.8 mIoU at 71.2 FPS and 69.6 mIoU at 92.4 FPS, respectively, without pre-training or post-processing; it is trained and evaluated on a single 1080Ti GPU. Our experiments show that LMDCNet strikes a good balance between segmentation accuracy and network size with only 1.05 M parameters.
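Dynamic convolution of this kind usually mixes several parallel kernel banks with input-dependent attention weights. The sketch below implements that general recipe in PyTorch; the kernel count and layout are assumptions, not the published MDy-Conv definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Dynamic convolution in the spirit of MDy-Conv: K parallel kernels
    mixed per sample by attention weights, run as one grouped conv."""
    def __init__(self, in_ch, out_ch, k=3, num_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(
            0.02 * torch.randn(num_kernels, out_ch, in_ch, k, k))
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_kernels), nn.Softmax(dim=1))
        self.pad = k // 2

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = self.attend(x)                                   # (B, K)
        # Per-sample kernel: weighted sum over the K kernel banks.
        weight = torch.einsum("bk,koihw->boihw", alpha, self.weight)
        weight = weight.reshape(-1, c, *self.weight.shape[-2:])
        out = F.conv2d(x.reshape(1, b * c, h, w), weight,
                       padding=self.pad, groups=b)
        return out.reshape(b, -1, h, w)

print(DynamicConv(16, 32)(torch.randn(2, 16, 24, 24)).shape)
```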
Wind power can effectively alleviate the energy crisis. However, its integration into the grid affects power quality and grid stability, and accurate wind speed prediction is a key factor in the efficient use of wind power. Because of its intermittent and nonstationary nature, wind speed forecasting is difficult and is the topic of much research, especially long-horizon multistep forecasting. In this paper, the multistep wind speed prediction problem is treated as a sequence-to-sequence mapping problem, and a Transformer-based multistep wind speed prediction model is proposed. The model follows an encoder-decoder architecture: the encoder generates representations of historical wind speed sequences of any length, the decoder generates arbitrarily long future wind speed sequences, and the two are associated through an attention mechanism; both the encoder and decoder of the Transformer are built entirely on multi-head attention. For easier modeling, the 1-dimensional original wind speed sequence is transformed into a 16-dimensional sequence by ensemble empirical mode decomposition (EEMD), and the multidimensional wind speed data are modeled directly with the Transformer. We trained the model on a very large dataset (19 years) of wind speeds averaged at 10-minute intervals and evaluated it on one year of wind speed data. Results show that our one-step forecast model achieves an average mean absolute error (MAE) and root mean square error (RMSE) of 0.167 and 0.221, respectively. To the best of our knowledge, our 3-, 6-, 12-, and 24-hour multistep forecast models achieve a new state of the art in wind speed forecasting, with respective MAEs of 0.243, 0.290, 0.362, and 0.453, and RMSEs of 0.326, 0.401, 0.513, and 0.651. We believe performance can be further improved with better model parameter optimization.
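A minimal PyTorch sketch of the described setup, assuming the 16 EEMD components are fed to a standard nn.Transformer encoder-decoder with teacher forcing; the layer sizes and the 3-hour (18-step) horizon are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WindTransformer(nn.Module):
    """Seq2seq wind model: encode 16-dim EEMD history, decode future steps."""
    def __init__(self, dim=16, d_model=64, nhead=4):
        super().__init__()
        self.embed = nn.Linear(dim, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.head = nn.Linear(d_model, dim)

    def forward(self, history, future_in):
        # history: (B, T_in, dim); future_in: (B, T_out, dim), teacher-forced
        t = future_in.size(1)
        # Causal mask so each future step only attends to earlier steps.
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        out = self.transformer(self.embed(history), self.embed(future_in),
                               tgt_mask=mask)
        return self.head(out)

model = WindTransformer()
hist = torch.randn(2, 144, 16)   # 24 h of 10-min-averaged EEMD components
fut = torch.randn(2, 18, 16)     # 3 h ahead = 18 ten-minute steps
print(model(hist, fut).shape)    # torch.Size([2, 18, 16])
```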
ISBN (print): 9781450397810
In this paper, we improve natural scene text detection and recognition technology based on 2D attention and an encoder-decoder framework. First, related work on text detection and recognition in different natural scenes is discussed. Second, we build on the encoder-decoder framework and a two-dimensional attention module, improving them through aggregation and hybridization. Finally, we discuss and analyze the results and identify the possible shortcomings of the model.
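For reference, one decoding step of 2D attention over an encoder feature map, the mechanism this line of work builds on, might look like the following sketch; the scoring network and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TwoDAttention(nn.Module):
    """One decoding step of 2D attention over a conv feature map."""
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(feat_dim + hidden, hidden, 3, padding=1),
            nn.Tanh(),
            nn.Conv2d(hidden, 1, 1))

    def forward(self, feats, state):
        # feats: (B, C, H, W) encoder map; state: (B, hidden) decoder state
        b, c, h, w = feats.shape
        s = state[:, :, None, None].expand(-1, -1, h, w)
        e = self.score(torch.cat([feats, s], dim=1))        # (B, 1, H, W)
        a = torch.softmax(e.view(b, -1), dim=1).view(b, 1, h, w)
        glimpse = (a * feats).sum(dim=(2, 3))               # (B, C) context
        return glimpse, a

g, attn = TwoDAttention()(torch.randn(2, 256, 8, 32), torch.randn(2, 256))
print(g.shape, attn.shape)
```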
ISBN (print): 9781728176055
We propose a semi-supervised singing synthesizer that can learn new voices from audio data alone, without annotations such as phonetic segmentation. Our system is an encoder-decoder model with two encoders, linguistic and acoustic, and one (acoustic) decoder. In the first step, the system is trained in a supervised manner on a labeled multi-singer dataset. Here, we ensure that the embeddings produced by the two encoders are similar, so that the model can later be used with either acoustic or linguistic input features. To learn a new voice in an unsupervised manner, the pretrained acoustic encoder is used to train a decoder for the target singer. Finally, at inference, the pretrained linguistic encoder is used together with the decoder of the new voice to produce acoustic features from linguistic input. We evaluate our system with a listening test and show that the results are comparable to those obtained with an equivalent supervised approach.
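A toy sketch of the shared-embedding constraint described above: both encoders are trained so their outputs stay close (here via an MSE alignment term), so either one can later drive the decoder. All shapes, features, and the specific losses are assumptions, not the paper's training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two encoders mapped into one embedding space, plus one acoustic decoder.
ling_enc = nn.GRU(input_size=40, hidden_size=64, batch_first=True)
acou_enc = nn.GRU(input_size=80, hidden_size=64, batch_first=True)
decoder = nn.GRU(input_size=64, hidden_size=80, batch_first=True)

ling_feats = torch.randn(2, 100, 40)   # e.g. frame-level phoneme features
acou_feats = torch.randn(2, 100, 80)   # e.g. mel-spectrogram frames
target = torch.randn(2, 100, 80)

z_ling, _ = ling_enc(ling_feats)
z_acou, _ = acou_enc(acou_feats)
recon, _ = decoder(z_acou)

recon_loss = F.l1_loss(recon, target)
align_loss = F.mse_loss(z_ling, z_acou)  # keep the two embeddings close
(recon_loss + align_loss).backward()
```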
Routine visual inspection of concrete structures is essential to maintaining safe conditions, so concrete crack segmentation using deep learning has been studied extensively in recent years. However, insufficient performance remains a major challenge in diverse field-inspection scenarios. In this study, a novel SegCrack model for pixel-level crack segmentation is proposed, using a hierarchically structured Transformer encoder to output multiscale features and a top-down pathway with lateral connections to progressively upsample and fuse features from the deepest layer of the encoder. Furthermore, an online hard example mining strategy is adopted to strengthen the detection of hard samples and improve model performance. The effect of dataset size on segmentation performance is then investigated. The results indicate that SegCrack achieves a precision, recall, F1 score, and mean intersection over union of 96.66%, 95.46%, 96.05%, and 92.63%, respectively, on the test set.
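Online hard example mining is commonly implemented by averaging the loss only over the hardest fraction of pixels. A generic sketch, with the keep ratio as an assumed hyperparameter rather than the paper's setting:

```python
import torch
import torch.nn.functional as F

def ohem_ce_loss(logits, labels, keep_ratio=0.25, ignore_index=255):
    """Online hard example mining: average cross-entropy only over the
    hardest keep_ratio fraction of pixels."""
    loss = F.cross_entropy(logits, labels,
                           ignore_index=ignore_index, reduction="none")
    loss = loss.flatten()
    n_keep = max(1, int(loss.numel() * keep_ratio))
    hard, _ = loss.topk(n_keep)          # highest-loss (hardest) pixels
    return hard.mean()

logits = torch.randn(2, 2, 64, 64, requires_grad=True)  # crack vs background
labels = torch.randint(0, 2, (2, 64, 64))
print(ohem_ce_loss(logits, labels).item())
```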
In this study, we aimed to develop and assess a hydrological model using a deep learning algorithm for improved water management. Single-output long short-term memory (LSTM SO) and encoder-decoder long short-term memory (LSTM ED) models were developed, and their performance was compared using different input variables. We used water-level and rainfall data from 2018 to 2020 for the Takayama Reservoir (Nara Prefecture, Japan) to train, test, and assess both models, estimating the root-mean-squared error and Nash-Sutcliffe efficiency to compare model performance. The results showed that the LSTM ED model was more accurate, and analyzing both water levels and water-level changes produced better results than analyzing water levels alone. However, model accuracy was significantly lower when predicting water levels outside the range of the training datasets. Within this range, the developed model could be used for water management to reduce the risk of downstream flooding while ensuring sufficient water storage for irrigation, because of its ability to determine an appropriate amount of water to release from the reservoir before rainfall events.
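A minimal PyTorch sketch of an LSTM encoder-decoder of the kind described: it encodes past rainfall and water level, then decodes a multi-step water-level forecast autoregressively. Input variables, layer sizes, and the forecast horizon are assumptions.

```python
import torch
import torch.nn as nn

class LSTMEncoderDecoder(nn.Module):
    """Encode a past window of [rainfall, water level]; decode a
    multi-step water-level forecast one step at a time."""
    def __init__(self, n_in=2, hidden=32, horizon=6):
        super().__init__()
        self.encoder = nn.LSTM(n_in, hidden, batch_first=True)
        self.decoder = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
        self.horizon = horizon

    def forward(self, past):
        # past: (B, T, 2) = [rainfall, water level]
        _, state = self.encoder(past)      # hand encoder state to decoder
        y = past[:, -1:, 1:2]              # last observed water level
        outs = []
        for _ in range(self.horizon):      # autoregressive decoding
            o, state = self.decoder(y, state)
            y = self.head(o)
            outs.append(y)
        return torch.cat(outs, dim=1)      # (B, horizon, 1)

print(LSTMEncoderDecoder()(torch.randn(4, 48, 2)).shape)
```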
Water body segmentation is an important tool for hydrological monitoring of the Earth. With the rapid development of convolutional neural networks, semantic segmentation techniques have been applied to remote sensing images to extract water bodies. However, several difficulties must be overcome to achieve good water body segmentation, such as complex backgrounds, large scale variation, water connectivity, and rough edges. In this study, a water body segmentation model (DUPnet) with dense connectivity and multi-scale pyramid pooling is proposed to rapidly and accurately extract water bodies from Gaofen satellite and Landsat 8 OLI (Operational Land Imager) images. The proposed method has three parts: (1) a multi-scale spatial pyramid pooling module (MSPP) is introduced to combine shallow and deep features for small water bodies and to compensate for the feature loss caused by the sampling process; (2) dense blocks in DUPnet's backbone extract more spatial features, increasing feature propagation and reuse; (3) a regression loss function is proposed to train the network on the unbalanced datasets caused by small water bodies. The experimental results show that the F1, MIoU, and FWIoU of DUPnet on the 2020 Gaofen dataset are 97.67%, 88.17%, and 93.52%, respectively, and on the Landsat River dataset they are 96.52%, 84.72%, and 91.77%, respectively.
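A sketch of what a multi-scale spatial pyramid pooling block like MSPP might look like: pool at several scales, project, upsample, and fuse with the input. The scales and channel split are assumptions, not DUPnet's published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSPP(nn.Module):
    """Multi-scale spatial pyramid pooling: pooled branches at several
    grid sizes are projected, upsampled, and fused with the input."""
    def __init__(self, channels, scales=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(s),
                          nn.Conv2d(channels, channels // 4, 1))
            for s in scales)
        self.project = nn.Conv2d(channels + len(scales) * (channels // 4),
                                 channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [x] + [
            F.interpolate(b(x), size=(h, w), mode="bilinear",
                          align_corners=False)
            for b in self.branches]
        return self.project(torch.cat(feats, dim=1))

print(MSPP(64)(torch.randn(1, 64, 32, 32)).shape)
```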
Face parsing refers to labeling each facial component in a face image and has been employed in facial simulation, expression recognition, and makeup applications, effectively providing a basis for further analysis, computation, animation, modification, and numerous other uses. Although existing face parsing methods have demonstrated good performance, they fail to extract rich features and recover accurate segmentation maps, particularly for faces with high variation in expression and highly similar appearances. Moreover, these approaches neglect the semantic gaps and dependencies between facial categories and their boundaries. To address these drawbacks, we propose an efficient dilated convolution network with different aspect ratios that exploits its feature extraction capability to attain accurate face parsing. The proposed multiscale dilated encoder-decoder convolution model obtains rich component information and efficiently improves the capture of global information by extracting low- and high-level semantic features. To achieve fine parsing of the face components along their borders and to analyze the connections between face categories and their boundary edges, a semantic edge map is learned using a conditional random field, which distinguishes border and non-border pixels during modeling. We conducted experiments on three well-known publicly available face databases. The results demonstrate the high accuracy and capability of the proposed method compared with previous state-of-the-art methods. Our model achieves a mean accuracy of 90% on the CelebAMask-HQ dataset for the category case and 81.43% for the accessory case, and accuracies of 91.58% and 92.44% on the HELEN and LaPa datasets, respectively, demonstrating its effectiveness.
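Dilated convolutions "with different aspect ratios" can be read as anisotropic dilation rates. The sketch below runs parallel 3x3 convolutions with isotropic and anisotropic dilations and fuses them; it illustrates the idea only and is not the paper's network.

```python
import torch
import torch.nn as nn

class DilatedMixBlock(nn.Module):
    """Parallel dilated 3x3 convs, including anisotropic (different
    aspect ratio) dilation rates, fused by a 1x1 conv."""
    def __init__(self, channels):
        super().__init__()
        rates = [(1, 1), (2, 2), (1, 3), (3, 1)]   # (height, width) rates
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates)
        self.fuse = nn.Conv2d(len(rates) * channels, channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

print(DilatedMixBlock(16)(torch.randn(1, 16, 32, 32)).shape)
```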