Search results - Inner Mongolia University Library

Deep convolutional neural networks for semantic segmentation of cracks

STRUCTURAL CONTROL & HEALTH MONITORING, 2022, Vol. 29, Issue 1, p. e2850

Authors: Wang, Jia-Ji; Liu, Yu-Fei; Nie, Xin; Mo, Y. L. Affiliations: Tsinghua Univ, Dept Civil Engn, Beijing 100084, Peoples R China; Univ Houston, Dept Civil & Environm Engn, Houston, TX 77204, USA

A large crack detection dataset of 2446 manually labeled images is established to cover a wide range of noise and to evaluate the performance of end-to-end deep convolutional networks in detecting cracks. Five state-of-the-art end-to-end deep computer vision architectures for semantic segmentation are trained and evaluated: Fully Convolutional Network (FCN), Global Convolutional Network (GCN), Pyramid Scene Parsing Network (PSPNet), UPerNet, and DeepLabv3+. VGG, ResNet, and DenseNet are adopted as backbones. Based on the comparison of test set metrics, DeepLabv3+ with the ResNet101 backbone achieved the highest IoU of 0.6298, the highest recall of 0.6834, and the highest F1 score of 0.7732. The influence of database choice and image noise on crack detection performance is reported. Based on the comparison of predicted images, UPerNet with the ResNet101 backbone shows the highest performance for images with shadings, while DeepLabv3+ with the ResNet101 backbone shows the best performance for images with blemishes. The research outcome can serve as a reference for the fast and accurate detection of cracks in civil engineering.

Keywords: computer vision; convolutional neural network; crack; deep learning; encoder-decoder; semantic segmentation
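
The abstract above reports IoU, recall, and F1 for crack segmentation. As a rough illustration of how such pixel-level metrics are commonly computed from binary masks, here is a minimal Python sketch. The function name `segmentation_metrics` and the toy masks are assumptions made for this example; the record does not say how the authors aggregate the metrics (per image or over the whole test set).

```python
def segmentation_metrics(pred, truth):
    """Return IoU, precision, recall and F1 for two binary masks.

    `pred` and `truth` are equally sized 2-D lists of 0/1 values,
    where 1 marks a crack pixel (hypothetical toy format).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # crack pixel predicted correctly
            elif p and not t:
                fp += 1          # background predicted as crack
            elif t and not p:
                fn += 1          # crack pixel that was missed

    def ratio(num, den):
        # Guard against empty masks, where a metric is undefined.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy 3x4 masks: three pixels agree, one false positive, one false negative.
truth = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 1, 0, 0]]
pred = [[0, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0]]

metrics = segmentation_metrics(pred, truth)
print(metrics)  # {'iou': 0.6, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

For a single pair of masks, F1 and IoU are tied by F1 = 2 * IoU / (1 + IoU), which is why the toy IoU of 0.6 yields an F1 of 0.75.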

Asymmetric cross-modal activation network for RGB-T salient object detection

KNOWLEDGE-BASED SYSTEMS, 2022, Vol. 258

Authors: Xu, Chang; Li, Qingwu; Zhou, Qingkai; Jiang, Xiongbiao; Yu, Dabing; Zhou, Yaqin. Affiliations: Hohai Univ, Coll Internet Things Engn, Changzhou 213022, Jiangsu, Peoples R China; Hohai Univ, Jiangsu Key Lab Power Transmiss & Distribut Equipm, Changzhou 213022, Jiangsu, Peoples R China

RGB-thermal salient object detection (RGB-T SOD) has unique advantages in terms of handling challenging scenes with cluttered backgrounds, low illumination, and low contrast. However, because they do not consider the significant differences between different imaging mechanisms and inherent characteristics of thermal images, existing RGB-T SOD methods are generally unable to handle diverse feature fusion demands and may yield unsatisfactory performance. To overcome this problem and achieve more effective RGB-T SOD, we propose an asymmetric cross-modal activation network to exploit the interactions of modality-specific features based on an asymmetric feature fusion strategy. Specifically, a two-stream asymmetric feature aggregation encoder module is proposed to fuse multimodality features adaptively and extract complementary information. The self-attention of multimodality features is leveraged to guide cross-modal interactions, which can propagate long-range contextual dependencies and extract effective saliency cues. Furthermore, a multitask decoder is proposed to achieve SOD and thermal image reconstruction in a unified framework. Salient objects can be located and segmented accurately based on reconstructed high-resolution feature representations. Extensive experiments on public RGB-T and RGB-D SOD datasets demonstrate the superiority of the proposed network, and ablation experiments highlight the effectiveness of each component. Our code and saliency maps are available at: ***/xanxuso/ACMANet. (c) 2022 Elsevier B.V. All rights reserved.

Keywords: RGB-T salient object detection; Multimodality; Feature fusion; encoder-decoder; Long-range dependence

Rethinking Table Structure Recognition Using Sequence Labeling Methods

16th IAPR International Conference on Document Analysis and Recognition (ICDAR)

Authors: Li, Yibo; Huang, Yilun; Zhu, Ziyi; Pan, Lemeng; Huang, Yongshuai; Du, Lin; Tang, Zhi; Gao, Liangcai. Affiliations: Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China; Peking Univ, Ctr Data Sci, Beijing, Peoples R China; Huawei AI Applicat Res Ctr, Huawei, Peoples R China

ISBN: (print) 9783030863319

Table structure recognition is an important task in document analysis and attracts the attention of many researchers. However, due to the diversity of table types and the complexity of table structure, the performance of table structure recognition methods is still not good enough in practice. Row and column separators play a significant role in two-stage table structure recognition, and a better row and column separator segmentation result can improve the final recognition results. Therefore, in this paper, we present a novel deep learning model to detect row and column separators. This model contains a convolutional encoder and two parallel row and column decoders. The encoder extracts visual features using convolution blocks; each decoder formulates the feature map as a sequence and uses a sequence labeling model, bidirectional long short-term memory networks (BiLSTM), to detect row and column separators. Experiments have been conducted on PubTabNet, and the model is benchmarked on several available datasets, including PubTabNet, UNLV, ICDAR13, and ICDAR19. The results show that our model achieves state-of-the-art performance compared with other strong models. In addition, our model shows better generalization ability. The code is available on this site (www ***/L597383845/row-col-table-recognition).

Keywords: Table structure recognition; encoder-decoder; Row and column separators segmentation; Sequence labeling model

HRLINKNET: LINKNET WITH HIGH-RESOLUTION REPRESENTATION FOR HIGH-RESOLUTION SATELLITE IMAGERY

IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

Authors: Wu, Muyu; Shu, Zhen; Zhang, Jinming; Hu, Xiangyun. Affiliation: Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Hubei, Peoples R China

ISBN: (print) 9781665403696

Automatic extraction of buildings from high-resolution remote sensing imagery is very useful in many applications such as city management, mapping, urban planning and geographic information updating. However, due to the general texture of buildings and the complexity of the image background, high-precision building segmentation from high-resolution remote sensing images is still a challenging task. Existing state-of-the-art frameworks use repeated pooling and step operations, leading to the loss of detailed information. Thus, high-resolution representations are essential for building extraction. On this basis, our proposed network, named HRLinkNet, maintains high-resolution representations through the whole process based on LinkNet. We tested it on the WHU Building dataset. Experimental results show that the proposed HRLinkNet is superior to LinkNet, UNet, DLinkNet, SegNet, and others.

Keywords: Deep learning; high-resolution remote sensing imagery; building extraction; semantic segmentation; encoder-decoder; fully convolutional networks

Deep learning-based multi-output quantile forecasting of PV generation

14th IEEE Madrid PowerTech Conference (IEEE POWERTECH)

Authors: Dumas, Jonathan; Cointe, Colin; Fettweis, Xavier; Cornelusse, Bertrand. Affiliations: Univ Liege, Dept Comp Sci, Liege, Belgium; Univ Liege, Dept Elect Engn, Liege, Belgium; Univ Liege, Dept Geog, Liege, Belgium

ISBN: (print) 9781665435970

This paper develops probabilistic PV forecasters by taking advantage of recent breakthroughs in deep learning. A tailored forecasting tool, named encoder-decoder, is implemented to compute intraday multi-output PV quantile forecasts that efficiently capture the time correlation. The models are trained using quantile regression, a non-parametric approach that assumes no prior knowledge of the probabilistic forecasting distribution. The case study is composed of PV production monitored on-site at the University of Liege (ULiege), Belgium. The weather forecasts from the regional climate model provided by the Laboratory of Climatology are used as inputs of the deep learning models. The forecast quality is quantitatively assessed by the continuous ranked probability and interval scores. The results indicate that this architecture improves the forecast quality and is computationally efficient enough to be incorporated in an intraday decision-making tool for robust optimization.

Keywords: Quantile forecasting; probabilistic PV forecasting; LSTM; deep learning; encoder-decoder
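
The abstract above states that the models are trained with quantile regression. Quantile regression is usually trained with the pinball (quantile) loss, and the Python sketch below shows that loss averaged over several quantiles and time steps. The function names, the quantile levels, and the toy numbers are assumptions for illustration only; the record does not give the authors' exact training objective or data.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball loss of one prediction for quantile level q in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * diff, (q - 1) * diff)


def mean_multi_quantile_loss(targets, forecasts, quantiles):
    """Average the pinball loss over all time steps and quantiles.

    targets   : list of T observed values
    forecasts : list of T lists, one predicted value per quantile
    quantiles : list of quantile levels, same order as in `forecasts`
    """
    total = 0.0
    count = 0
    for y, preds in zip(targets, forecasts):
        for q, pred in zip(quantiles, preds):
            total += pinball_loss(y, pred, q)
            count += 1
    return total / count


# Toy example: two time steps, three quantile forecasts each (made-up values).
quantiles = [0.1, 0.5, 0.9]
targets = [2.0, 4.0]
forecasts = [[1.0, 2.0, 3.0],
             [3.0, 5.0, 6.0]]

loss = mean_multi_quantile_loss(targets, forecasts, quantiles)
print(round(loss, 4))  # 0.1667
```

Minimizing this loss pushes each output toward the corresponding conditional quantile without assuming any particular distribution for the forecast errors.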

Extractive-Abstractive Summarization of Judgment Documents Using Multiple Attention Networks

4th International Conference on Logic and Argumentation (CLAR)

Authors: Gao, Yan; Liu, Zhengtao; Li, Juan; Guo, Fan; Xiao, Fei. Affiliations: Cent South Univ, Sch Automat, Changsha 410083, Peoples R China; Cent South Univ, Sch Law, Changsha, Peoples R China

ISBN: (print) 9783030893910; 9783030893903

Judgment documents contain rich legal information, but they are also lengthy and have a complex structure. This requires summarizing judgment documents in an effective way. By analyzing the structural features of Chinese judgment documents, we propose an automatic summarization method, which consists of an extraction model and an abstraction model. In the extraction model, all the sentences are encoded by a self-attention network and are classified into key sentences and non-key sentences. In the abstraction model, the initial summary is refined into a final summary by a unidirectional-bidirectional attention network. Such a summary could help improve the efficiency of case handling and make judgment documents more accessible to general readers. The experimental results on the CAIL2020 dataset are satisfactory.

Keywords: Judgment documents; Automatic summarization; Attention network; encoder-decoder

ARRHYTHMIA CLASSIFICATION WITH HEARTBEAT-AWARE TRANSFORMER

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Wang, Bin; Liu, Chang; Hu, Chuanyan; Liu, Xudong; Cao, Jun. Affiliation: Lepu Med Technol, Beijing, Peoples R China

ISBN: (print) 9781728176055

Electrocardiography (ECG) is a conventional method in arrhythmia diagnosis. In this paper, we propose a novel neural network model that treats the typical heartbeat classification task as a 'translation' problem. The model introduces the Transformer structure and adds a heartbeat-aware attention mechanism to enhance the alignment between the encoded sequence and the decoded sequence. After training on an ECG database (collected from 200k patients in over 2000 hospitals for more than 10 years), validation on an independent test dataset shows that this new heartbeat-aware Transformer model can outperform the classic Transformer and other sequence-to-sequence methods. Finally, we show that the visualization of encoder-decoder attention weights provides more interpretable information about how a Transformer makes a diagnosis based on raw ECG signals, which has guiding significance in clinical diagnosis.

Keywords: Transformer; encoder-decoder; ECG; Heartbeat-Aware; Time Series
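
The abstract above refers to visualizing encoder-decoder attention weights. The Python sketch below shows how such weights are obtained in a standard Transformer, using scaled dot-product attention for a single decoder query. It does not reproduce the heartbeat-aware mechanism, which the record does not specify; the function name and the toy vectors are assumptions for this example.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys.

    query : list of d floats (one decoder position)
    keys  : list of encoder vectors, each a list of d floats
    """
    d = len(query)
    # Dot product between the query and each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Numerically stable softmax over the scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: the first and third encoder positions match the query.
query = [1.0, 0.0]
keys = [[1.0, 0.0],
        [0.0, 1.0],
        [1.0, 0.0]]

weights = attention_weights(query, keys)
print([round(w, 3) for w in weights])
```

Plotting one such weight vector per decoder step against the input positions gives the kind of attention map that the abstract describes as an aid to interpretation.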

SHOW AND SPEAK: DIRECTLY SYNTHESIZE SPOKEN DESCRIPTION OF IMAGES

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Wang, Xinsheng; Feng, Siyuan; Zhu, Jihua; Hasegawa-Johnson, Mark; Scharenborg, Odette. Affiliations: Xi An Jiao Tong Univ, Sch Software Engn, Xian, Peoples R China; Delft Univ Technol, Multimedia Comp Grp, Delft, Netherlands; Univ Illinois, Dept Elect & Comp Engn, Urbana, IL, USA

ISBN: (print) 9781728176055

This paper proposes a new model, referred to as the show and speak (SAS) model that, for the first time, is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes. The basic structure of SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech that describes this image. The final speech audio is obtained from the predicted spectrogram via WaveNet. Extensive experiments on the public benchmark database Flickr8k demonstrate that the proposed SAS is able to synthesize natural spoken descriptions for images, indicating that synthesizing spoken descriptions for images while bypassing text and phonemes is feasible.

Keywords: Image-to-speech; image captioning; speech synthesis; sequence-to-sequence; encoder-decoder

A Blended Attention-CTC Network Architecture for Amharic Text-image Recognition

10th International Conference on Pattern Recognition Applications and Methods (ICPRAM)

Authors: Belay, Birhanu Hailu; Habtegebrial, Tewodros; Liwicki, Marcus; Belay, Gebeyehu; Stricker, Didier. Affiliations: Tech Univ Kaiserslautern, Kaiserslautern, Germany; Lulea Univ Technol, Lulea, Sweden; Bahir Dar Inst Technol, Bahir Dar, Ethiopia; DFKI, Augmented Vis Dept, Kaiserslautern, Germany

ISBN: (print) 9789897584862

In this paper, we propose a blended Attention-Connectionist Temporal Classification (CTC) network architecture for text-image recognition of a unique script, Amharic. Amharic is an indigenous Ethiopic script that uses 34 consonant characters, each with 7 vowel variants, and 50 labialized characters which are derived, with a small change, from the 34 consonant characters. The change involves modifying the structure of these characters by adding a straight line, or shortening and/or elongating one of their main legs, including the addition of small diacritics to the right, left, top or bottom of the character. Such a small change affects the orthographic identity of a character and results in shape similarity among characters, which is an interesting but challenging problem for OCR research. Motivated by the recent success of the attention mechanism on neural machine translation tasks, we propose an attention-based CTC approach which is designed by blending the attention mechanism directly within the CTC network. The proposed model consists of an encoder module, an attention module and a transcription module in a unified framework. The efficacy of the proposed model on the Amharic language shows that the attention mechanism allows learning powerful representations by integrating information from different time steps. Our method outperforms state-of-the-art methods and achieves character error rates of 1.04% and 0.93% on the ADOCR test datasets.

Keywords: Amharic Script; Blended Attention-CTC; BLSTM; CNN; encoder-decoder; Network Architecture; OCR; Pattern Recognition
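
The abstract above relies on CTC transcription and reports results as a character error rate. The Python sketch below illustrates both ideas in their generic form: greedy CTC decoding (merge repeated labels, then drop blanks) and the character error rate as edit distance divided by reference length. It does not reproduce the blended attention-CTC model; the function names are assumptions, and Latin letters stand in for Amharic characters purely for readability.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse per-frame labels: merge repeats, then remove blanks."""
    decoded = []
    previous = blank
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def character_error_rate(hypothesis, reference):
    """Edit distance normalized by the reference length."""
    return edit_distance(hypothesis, reference) / len(reference)


# Toy per-frame output; the blank between the two "l" runs keeps both letters.
frames = "--hh-e-ll-l-oo-"
text = ctc_greedy_decode(frames)
print(text)                                   # hello
print(character_error_rate("hallo", "hello"))  # 0.2
```

The blank symbol is what lets CTC emit the same character twice in a row, which matters for scripts where consecutive characters may repeat.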

Historical Report Assist Medical Report Generation

14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC) / 14th Int Conf on Bio-inspired Systems and Signal Processing (BIOSIGNALS) / 14th Int Conf on Biomedical Electronics and Devices (BIODEVICES)

Authors: Ye, Shan; Wang, Mei; Dong, Yijie. Affiliations: Donghua Univ, Sch Comp Sci & Technol, Shanghai, Peoples R China; Shanghai Jiao Tong Univ, Sch Med, Ruijin Hosp, Dept Ultrasound, Shanghai, Peoples R China

ISBN: (print) 9789897584909

Automatically generating diagnostic reports with accurate content, standardized structure and clear semantics is very challenging due to the complexity of medical images and the detailed paragraph descriptions they require. The structure and the semantic content of the historical report are very helpful for generating the current report. This paper proposes a text report generation method assisted by historical reports. In the proposed method, the previous report and the keywords generated from the current images are modeled by two separate encoders. A co-attention mechanism is introduced to jointly learn from the historical reports and the keywords. A decoder based on the co-attention is used to generate a long description of the image. The process of learning from the historical report and the current report in the training set helps to generate an accurate report for a new image. Furthermore, the structure of the historical report helps to generate a more natural text report. We conducted experiments on practical ultrasound data provided by a prestigious hospital in China. The experimental results show that the reports generated by the proposed method are closer to the reports written by radiologists.

Keywords: Automatic Report Generation; Historical Report; encoder-decoder; Co-attention
