Search Results - Inner Mongolia University Library

IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

Authors: Al-Saad, Mina; Aburaed, Nour; Zitouni, M. Sami; Alkhatib, Mohammed Q.; Almansoori, Saeed; Al Ahmad, Hussain

Affiliations: Univ Dubai, Coll Engn & IT, Dubai, U Arab Emirates; Univ Strathclyde, Dept Elect & Elect Engn, Glasgow, Lanark, Scotland; Mohammed Bin Rashid Space Ctr, Dubai, U Arab Emirates

ISBN: (print) 9798350320107

Accurate flood mapping plays a critical role in disaster management, allowing for effective response and mitigation efforts. Researchers therefore seek to boost the accuracy of flood mapping algorithms, especially in terms of generalization capability and minimizing False Positive and False Negative detections. This paper presents a robust algorithm for flood mapping from SAR images via a Deep Convolutional Neural Network (DCNN) that follows an encoder-decoder scheme. By introducing Bidirectional Convolutional LSTM (ConvLSTM) layers into its architecture, the proposed Temporal-Spatial Encoder-Decoder Network (TSEDN) is able to extract temporal information and produce more accurate change maps. Training and testing are carried out using the OMBRIA dataset, which is known to be challenging to train on. The proposed network is evaluated and compared to other state-of-the-art approaches in terms of Overall Accuracy (OA), Precision, Recall, and mean Intersection over Union (mIoU).
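
The four evaluation metrics named in the abstract can be computed from a binary flood mask as in the minimal sketch below. It assumes flat lists of 0 (non-flood) and 1 (flood) labels and takes mIoU as the mean of the flood and background IoU; the function name and this two-class convention are illustrative assumptions, not details taken from the paper.

```python
def binary_segmentation_metrics(pred, truth):
    """Compute OA, precision, recall and mIoU for 0/1 label sequences.

    Assumption: label 1 means flood, label 0 means background.
    """
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    iou_flood = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    return {
        "overall_accuracy": safe_div(tp + tn, len(pairs)),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "miou": (iou_flood + iou_background) / 2,
    }


if __name__ == "__main__":
    print(binary_segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
```

False positives lower precision and false negatives lower recall, which is why the abstract singles out both error types.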

Keywords: SAR; change detection; flood mapping; encoder-decoder; Bidirectional LSTM

A Hybrid Model for Multilingual OCR

17th International Conference on Document Analysis and Recognition (ICDAR)

Authors: Etter, David; Carpenter, Cameron; King, Nolan

Affiliations: Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA; SCALE, Laurel, MD USA

ISBN: (print) 9783031416750; 9783031416767

Large-scale document processing pipelines are required to recognize text in many different languages. The writing systems for these languages cover a diverse set of scripts, such as the standard Latin characters, the logograms of Chinese, and the cursive right-to-left script of Arabic. Multilingual OCR continues to be a challenging task for document processing due to the large vocabulary sizes and diversity of scripts. This work introduces a multilingual model that recognizes over nine thousand unique characters and seamlessly switches between ten different scripts. Our transformer-based encoder-decoder approach combines a CTC objective on the encoder with a cross-entropy objective on the full autoregressive decoder. The hybrid approach allows the fast non-autoregressive encoder to be used in standalone mode or with the full autoregressive decoder. We evaluate our approach on a large multilingual dataset, where we achieve state-of-the-art character error rate results in all thirteen languages. We also extend the encoder with auxiliary heads to identify language, predict font, and detect vertical lines.
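
An encoder trained with a CTC objective can emit text without the autoregressive decoder, which is what makes the standalone mode fast. The sketch below shows greedy CTC decoding, a common way to read out such an encoder. The abstract does not say which decoding procedure the authors use, so the greedy strategy, the function name, and the choice of index 0 as the blank symbol are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Collapse per-frame scores into a label sequence.

    frame_scores: list of per-frame score lists (one score per class).
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    # Pick the highest-scoring class for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    decoded = []
    previous = None
    for index in best:
        # Keep a label only when it starts a new run and is not the blank.
        if index != previous and index != blank:
            decoded.append(index)
        previous = index
    return decoded


if __name__ == "__main__":
    one_hot = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [0.0, 0.0, 1.0]}
    frames = [one_hot[i] for i in [1, 1, 0, 1, 2, 2, 0]]
    print(ctc_greedy_decode(frames))
```

The blank between the two runs of label 1 is what lets the decoder emit the same character twice in a row.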

Keywords: Multilingual OCR; Transformer; Hybrid encoder-decoder; CTC; Auxiliary heads; Synthetic data

DSEAformer: Forecasting by De-stationary Autocorrelation with Edgebound

16th International Conference on Knowledge Science, Engineering and Management (KSEM)

Authors: Ding, Peihao; Tang, Yan; Chen, Yingpei; Li, Xiaobing

Affiliations: Southwest Univ, Sch Comp & Informat Sci, Chongqing, Peoples R China

ISBN: (digital) 9783031402838

ISBN: (print) 9783031402821; 9783031402838

Time series analysis is vital for various real-world scenarios. Enhancing multivariate long-sequence time-series forecasting (MLTF) accuracy is crucial due to the increasing data volume and dimensionality. Current MLTF methods face challenges such as over-stationarization and distribution shift, affecting prediction accuracy. This paper proposes DSEAformer, a unique MLTF method that addresses distribution shift by normalizing and de-normalizing time series data. To avoid over-stationarization, a de-stationary autocorrelation method is suggested. Additionally, a time series optimization regularization based on weighted moving average helps prevent overfitting. Tests on three datasets confirm that DSEAformer outperforms existing MLTF techniques. In conclusion, DSEAformer introduces innovative ideas and methods to enhance time series prediction and offers improved practical applications.
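
The normalize-then-de-normalize idea and the weighted moving average mentioned in the abstract can be illustrated with the short sketch below. It assumes per-window z-score normalization and a plain sliding weighted average. The abstract does not give the exact formulas or say how the moving average enters the regularization term, so the function names and these choices are illustrative assumptions only.

```python
from statistics import fmean, pstdev


def normalize(window, eps=1e-8):
    """Z-score a window and return the statistics needed to undo it."""
    mean = fmean(window)
    std = pstdev(window)
    return [(x - mean) / (std + eps) for x in window], mean, std


def denormalize(values, mean, std, eps=1e-8):
    """Map model outputs back to the original scale of the window."""
    return [v * (std + eps) + mean for v in values]


def weighted_moving_average(series, weights):
    """Slide a weighted average over the series (weights[-1] hits the newest point)."""
    k = len(weights)
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, series[i:i + k])) / total
        for i in range(len(series) - k + 1)
    ]


if __name__ == "__main__":
    scaled, mean, std = normalize([2.0, 4.0, 6.0, 8.0])
    print(denormalize(scaled, mean, std))
    print(weighted_moving_average([1.0, 2.0, 3.0, 4.0], [1, 2, 3]))
```

Storing the window statistics is what allows a forecast made in the normalized space to be returned to the input distribution.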

Keywords: Multivariate long-sequence time-series forecasting; encoder-decoder; Autocorrelation mechanism; Temporal optimization regularization; Sequence decomposition

A novel method for 12-lead ECG reconstruction

57th Asilomar Conference on Signals, Systems and Computers

Authors: EPMoghaddam, Dorsa; Banta, Anton; Post, Allison; Razavi, Mehdi; Aazhang, Behnaam

Affiliations: Rice Univ, Dept Elect & Comp Engn, POB 1892, Houston, TX 77251 USA; Texas Heart Inst, Electrophysiol Clin Res & Innovat, Houston, TX USA; Texas Heart Inst, Dept Cardiol, Houston, TX USA

ISBN: (print) 9798350325744

This paper presents a novel approach to synthesize a standard 12-lead electrocardiogram (ECG) from any three independent ECG leads using a patient-specific encoder-decoder convolutional neural network. The objective is to decrease the number of recording locations required to obtain the same information as a 12-lead ECG, thereby enhancing patients' comfort during the recording process. We evaluate the proposed algorithm on a dataset comprising fifteen patients, as well as a randomly selected cohort of patients from the PTB diagnostic database. To evaluate the precision of the reconstructed ECG signals, we report two metrics: the correlation coefficient and the root mean square error. Our proposed method achieves superior performance compared to most existing synthesis techniques, with average correlation coefficients of 0.976 and 0.97 on the two datasets, respectively. These results demonstrate the potential of our approach to improve the efficiency and comfort of ECG recording for patients, while maintaining high diagnostic accuracy.
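
The two reconstruction metrics can be sketched as follows for a single lead. The abstract only says "correlation coefficient", so the use of the Pearson coefficient here is an assumption, as are the function names and the toy signals.

```python
import math


def rmse(reference, reconstructed):
    """Root mean square error between two equally long signals."""
    squared = [(a - b) ** 2 for a, b in zip(reference, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))


def correlation_coefficient(reference, reconstructed):
    """Pearson correlation between two equally long signals."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_rec = sum(reconstructed) / n
    cov = sum((a - mean_ref) * (b - mean_rec) for a, b in zip(reference, reconstructed))
    var_ref = sum((a - mean_ref) ** 2 for a in reference)
    var_rec = sum((b - mean_rec) ** 2 for b in reconstructed)
    return cov / math.sqrt(var_ref * var_rec)


if __name__ == "__main__":
    measured = [0.0, 1.0, 0.0, -1.0]
    synthesized = [0.0, 2.0, 0.0, -2.0]
    print(correlation_coefficient(measured, synthesized), rmse(measured, synthesized))
```

The toy case shows why both metrics are reported: a signal with the right shape but the wrong amplitude has perfect correlation and a non-zero RMSE.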

Keywords: electrocardiogram (ECG); Signal reconstruction; cardiovascular diseases; convolutional neural network; encoder-decoder

Triple Extraction with Generative Technique for Constructing Weighted Knowledge Graph

22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)

Authors: Parniani, Mohammad Sahand; Reformat, Marek Z.

Affiliations: Univ Alberta, Elect & Comp Engn, Edmonton, AB Canada; Univ Social Sci, Inst Informat Technol, PL-90113 Lodz, Poland

ISBN: (print) 9798350309188

Extracting relational facts from unstructured text is a crucial natural language processing task with many applications, particularly the construction of knowledge graphs. Relational facts are represented as triples in which two entities are connected through a relation. This work introduces a new and effective end-to-end method to generate triples from the input text. In the proposed method, we develop an encoder-decoder-based transformer model and warm-start both the encoder and decoder with pretrained checkpoints that are publicly accessible. These checkpoints can be taken from models such as BERT, GPT-2, and RoBERTa. Experimental results show that our method achieves better results for triple extraction on publicly available datasets (NYT and WebNLG) than other state-of-the-art techniques. Further, the extracted triples are processed and used to build a knowledge graph. Complete control of this process allows for determining the weights of the relations (triples). The weights reflect the frequency of occurrences of the facts represented by the relations and provide the degree of confidence in the facts.
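
A frequency-based weighting of extracted triples could look like the sketch below. It assumes each triple is a (head, relation, tail) tuple and uses relative frequency as the weight. The abstract states only that weights reflect how often a fact occurs, so the normalization, the function name, and the example triples are hypothetical.

```python
from collections import Counter


def build_weighted_graph(triples):
    """Map each distinct (head, relation, tail) triple to a weight in (0, 1].

    Assumption: weight = occurrences of the triple / total extracted triples.
    """
    counts = Counter(triples)
    total = sum(counts.values())
    return {triple: count / total for triple, count in counts.items()}


if __name__ == "__main__":
    extracted = [
        ("Paris", "capital_of", "France"),
        ("Paris", "capital_of", "France"),
        ("Paris", "capital_of", "France"),
        ("Paris", "located_in", "Europe"),
    ]
    for triple, weight in build_weighted_graph(extracted).items():
        print(triple, weight)
```

A fact extracted repeatedly from independent sentences receives a larger weight, which is the sense in which the weight acts as a degree of confidence.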

Keywords: triple extraction; knowledge graph; encoder-decoder; transformer

Trajectory Prediction for SSL Robots Using Seq2seq Neural Networks

RoboCup Symposium

Authors: Steuernagel, Lucas; Maximo, Marcos R. O. A.

Affiliations: Aeronaut Inst Technol, Div Comp Sci, Autonomous Computat Syst Lab LAB SCA, Praca Marechal Eduardo Gomes 50, Vila Acacias, BR-12228900 Sao Jose Dos Campos, SP Brazil

ISBN: (print) 9783031284687; 9783031284694

The RoboCup Small Size League employs cylindrical robots of 15 cm height and 18 cm diameter. Presently, most teams utilize a Kalman predictor to forecast the trajectory of other robots for better motion planning and decision making. The predictor is limited for such a task because it typically cannot generate complex movements that take into account the future actions of a robot. In this context, we introduce an encoder-decoder sequence-to-sequence neural network that outperforms the Kalman predictor in trajectory forecasting. The network consists of a Bi-LSTM encoder, an attention module, and an LSTM decoder. It can predict 15 future time steps given 30 past measurements, or 30 time steps given 60 past observations. The proposed model performs roughly 50% better than a Kalman predictor in terms of average displacement error and runs in less than 2 ms. We believe that our new architecture will improve our team's decision making and provide a better competitive advantage for all teams. We are looking forward to integrating it with our software pipeline and continuing our research by incorporating new training methods and new inputs to the model.
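
Average displacement error, the metric used for the comparison, is the mean Euclidean distance between predicted and true positions over the forecast horizon. The sketch below computes it for a constant-velocity extrapolation. That baseline is a simplified stand-in chosen for brevity, not the Kalman predictor from the paper, and the function names and sample points are assumptions.

```python
import math


def constant_velocity_forecast(past, horizon):
    """Extrapolate the last observed step for `horizon` future steps.

    Simplified stand-in baseline; this is not a Kalman filter.
    """
    (x0, y0), (x1, y1) = past[-2], past[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def average_displacement_error(predicted, truth):
    """Mean Euclidean distance between matching predicted and true points."""
    distances = [math.dist(p, t) for p, t in zip(predicted, truth)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    past = [(0.0, 0.0), (1.0, 0.0)]
    truth = [(2.0, 0.0), (3.0, 1.0), (4.0, 2.0)]  # robot starts turning
    forecast = constant_velocity_forecast(past, horizon=3)
    print(average_displacement_error(forecast, truth))
```

The error grows with every step after the robot turns, which illustrates the limitation the abstract attributes to predictors that cannot anticipate future actions.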

Keywords: Trajectory prediction; Neural networks; encoder-decoder; Small Size League; Sequence-to-sequence

Learning Semantic Information from Machine Translation to Improve Speech-to-Text Translation

Asia-Pacific-Signal-and-Information-Processing-Association Annual Summit and Conference (APSIPA ASC)

Authors: Deng, Pan; Zhang, Jie; Zhou, Xinyuan; Ye, Zhongyi; Zhang, Weitai; Cui, Jianwei; Dai, Lirong

Affiliations: Univ Sci & Technol China USTC, NERC SLIP, Hefei, Peoples R China; iFlytek Res, Hefei, Peoples R China

ISBN: (print) 9798350300673

End-to-end speech translation (ST) directly translates the source speech to the target text, following a typical encoder-decoder framework. However, it has been shown that the conventional ST encoder mainly extracts long but locally attentive acoustic features, which may lead to a lack of global semantic features. In this work, we therefore propose to integrate a semantic decoder into the speech translation model (SD-ST), where the semantic decoder can generate text-like features with more global semantic information, analogously to a machine translation (MT) system. We also investigate different strategies to ensure length consistency between text-like features and text sequences. Experimental results show that the proposed SD-ST model achieves the best BLEU score on the 40-hour subset of the Fisher Spanish-English dataset and a comparable BLEU score on the MuST-C dataset. Furthermore, it is shown that the SD-ST model can even perform zero-shot ST.

Keywords: End-to-end speech translation; semantic information; encoder-decoder; speech recognition

An attention based dual learning approach for video captioning

APPLIED SOFT COMPUTING, 2022, Vol. 117, Issue 0, pp. 108332-108332

Authors: Ji, Wanting; Wang, Ruili; Tian, Yan; Wang, Xun

Affiliations: Liaoning Univ, Sch Informat, Shenyang, Peoples R China; Zhejiang Gongshang Univ, Sch Comp Sci & Informat Engn, Hangzhou, Peoples R China

Video captioning aims to generate sentences/captions to describe video contents. It is one of the key tasks in the field of multimedia processing. However, most current video captioning approaches utilize only the visual information of a video to generate captions. Recently, a new encoder-decoder-reconstructor architecture was developed for video captioning, which can capture the information in both raw videos and the generated captions through dual learning. Based on this architecture, this paper proposes a novel attention-based dual learning approach (ADL) for video captioning. Specifically, ADL is composed of a caption generation module and a video reconstruction module. The caption generation module builds a translatable mapping between raw video frames and the generated video captions, i.e., it uses the visual features extracted from videos by an Inception-V4 network to produce video captions. The video reconstruction module then reproduces raw video frames from the generated video captions, i.e., it uses the hidden states of the decoder in the caption generation module to reproduce/synthesize raw visual features. A multi-head attention mechanism is adopted to help the two modules focus on the most effective information in videos and captions, and a dual learning mechanism is adopted to fine-tune the performance of the two modules to generate final video captions. Therefore, ADL can minimize the semantic gap between raw videos and the generated captions by minimizing the differences between the reproduced and the raw videos, thereby improving the quality of the generated video captions. Experimental results demonstrate that ADL is superior to state-of-the-art video captioning approaches on benchmark datasets. (C) 2021 Published by Elsevier B.V.
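
The attention step that lets a module focus on the most relevant frames can be illustrated with scaled dot-product attention, the building block of multi-head attention. The sketch below shows a single head with one query and omits the learned projections for brevity; the function name and the toy vectors are assumptions, and the paper's actual layer configuration is not given in the abstract.

```python
import math


def scaled_dot_product_attention(query, keys, values):
    """Return (context, weights) for one query over a set of key/value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]

    # Softmax with max subtraction for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


if __name__ == "__main__":
    frame_features = [[1.0, 0.0], [0.0, 1.0]]  # two toy frame features
    context, weights = scaled_dot_product_attention([1.0, 0.0], frame_features, frame_features)
    print(weights, context)
```

A multi-head layer runs several such heads in parallel on different learned projections and concatenates their contexts.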

Keywords: Attention mechanism; Deep neural network; Dual learning; encoder-decoder; Video captioning

A Novel Graph-Based Trajectory Predictor With Pseudo-Oracle

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, Vol. 33, Issue 12, pp. 7064-7078

Authors: Yang, Biao; Yan, Guocheng; Wang, Pin; Chan, Ching-Yao; Song, Xiang; Chen, Yang

Affiliations: Changzhou Univ, Dept Informat Sci & Engn, Changzhou 213000, Jiangsu, Peoples R China; Univ Calif Berkeley, Calif PATH, Richmond, CA 94804 USA; Nanjing Xiaozhuang Univ, Sch Elect Engn, Nanjing 211171, Peoples R China

Pedestrian trajectory prediction in dynamic scenes remains a challenging and critical problem in numerous applications, such as self-driving cars and socially aware robots. Challenges concentrate on capturing pedestrians' motion patterns and social interactions, as well as handling the future uncertainties. Recent studies focus on modeling pedestrians' motion patterns with recurrent neural networks, capturing social interactions with pooling- or graph-based methods, and handling future uncertainties by using the random Gaussian noise as the latent variable. However, they do not integrate specific obstacle avoidance experiences (OAEs) that may improve prediction performance. For example, pedestrians' future trajectories are always influenced by others in front. Here, we propose the Graph-based Trajectory Predictor with Pseudo-Oracle (GTPPO), an encoder-decoder-based method conditioned on pedestrians' future behaviors. Pedestrians' motion patterns are encoded with a long short-term memory unit, which introduces temporal attention to highlight specific time steps. Their interactions are captured by a graph-based attention mechanism, which draws OAE into the data-driven learning process of graph attention. Future uncertainties are handled by generating multimodal outputs with an informative latent variable. Such a variable is generated by a novel pseudo-oracle predictor, which minimizes the knowledge gap between historical and ground-truth trajectories. Finally, the GTPPO is evaluated on ETH, UCY, and Stanford Drone datasets, and the results demonstrate state-of-the-art performance. Besides, the qualitative evaluations show successful cases of handling sudden motion changes in the future. Such findings indicate that GTPPO can peek into the future.

Keywords: Trajectory; Uncertainty; Hidden Markov models; Training; Encoding; Dynamics; Recurrent neural networks; encoder-decoder; graph attention network; latent variable predictor; social attention; trajectory prediction

A SHALLOW U-NET WITH SPLIT-FUSED ATTENTION MECHANISM FOR RETINAL VESSEL SEGMENTATION

30th IEEE International Conference on Image Processing (ICIP)

Authors: Bhati, Amit; Jain, Samir; Gour, Neha; Khanna, Pritee; Ojha, Aparajita; Werghi, Naoufel

Affiliations: PDPM IIITDM, Dept Comp Sci & Engn, Jabalpur, India; Khalifa Univ, Dept Elect Engn & Comp Sci, C2PS, Abu Dhabi, U Arab Emirates

ISBN: (print) 9781728198354

Extraction of retinal vascular parts is an important task in retinal disease diagnosis. Precise segmentation of the retinal vascular pattern is challenging due to its complex structure, its overlap with other anatomical structures, and its crucial thin vascular structures. In recent years, complex and heavy deep learning networks have been proposed to segment retinal blood vessels accurately. However, these methods fail to detect the thin vascular structure among different patterns of thick vessels. To address this limitation, a novel attention-based architecture is proposed to segment the thin vasculature. The proposed model comprises a shallow U-Net based encoder-decoder architecture with a split-fuse attention (SFA) block. The proposed SFA block enables the network to identify the placement of pixels for the tree-shaped vessel patterns at their relative position during the reconstruction phase in the decoder. The attention block aggregates low-level and high-level semantic information, improving the vessel segmentation performance. Experiments on the publicly available fundus datasets DRIVE, HRF, CHASE-DB1, and STARE show that the proposed method performs better than the current state-of-the-art methods. The results demonstrate the adaptability of the proposed model for clinical applications due to its low memory footprint and better performance.

Keywords: Retinal Vessel Segmentation; Shallow U-Net; Split Fused Attention; Fully Convolution Network; encoder-decoder
