ISBN (print): 9798350320107
Accurate flood mapping plays a critical role in disaster management, allowing for effective response and mitigation efforts. Researchers therefore seek to improve the accuracy of flood mapping algorithms, especially in terms of generalization capability and minimizing false positive and false negative detections. This paper presents a robust algorithm for flood mapping from SAR images using a Deep Convolutional Neural Network (DCNN) that follows an encoder-decoder scheme. By introducing Bidirectional Convolutional LSTM (ConvLSTM) layers into its architecture, the proposed Temporal-Spatial Encoder-Decoder Network (TSEDN) is able to extract temporal information and produce more accurate change maps. Training and testing are carried out using the OMBRIA dataset, which is known to be challenging to train on. The proposed network is evaluated and compared to other state-of-the-art approaches in terms of Overall Accuracy (OA), Precision, Recall, and mean Intersection over Union (mIoU).
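The four evaluation metrics named above can be illustrated with a minimal sketch for binary flood masks. It assumes flattened per-pixel labels (1 = flood, 0 = background) and averages IoU over the two classes to obtain mIoU; the function name `flood_map_metrics` is hypothetical, and the paper's exact metric definitions are not given in the abstract.

```python
from typing import Dict, Sequence


def flood_map_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Compute OA, Precision, Recall and mIoU for binary masks (1 = flood)."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")

    # Confusion-matrix counts over all pixels.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 when a class is absent instead of dividing by zero.
        return num / den if den else 0.0

    iou_flood = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    return {
        "overall_accuracy": safe_div(tp + tn, len(pred)),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        # Assumption: mIoU is the unweighted mean over the two classes.
        "miou": (iou_flood + iou_background) / 2,
    }
```

A false positive lowers Precision and a false negative lowers Recall, which is why the abstract singles out both error types.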
ISBN (print): 9783031416750; 9783031416767
Large-scale document processing pipelines are required to recognize text in many different languages. The writing systems for these languages cover a diverse set of scripts, such as the standard Latin characters, the logograms of Chinese, and the cursive right-to-left script of Arabic. Multilingual OCR continues to be a challenging task for document processing due to the large vocabulary sizes and diversity of scripts. This work introduces a multilingual model that recognizes over nine thousand unique characters and seamlessly switches between ten different scripts. Our transformer-based encoder-decoder approach combines a CTC objective on the encoder with a cross-entropy objective on the full autoregressive decoder. The hybrid approach allows the fast non-autoregressive encoder to be used in standalone mode or together with the full autoregressive decoder. We evaluate our approach on a large multilingual dataset, where we achieve state-of-the-art character error rate results in all thirteen languages. We also extend the encoder with auxiliary heads to identify language, predict font, and detect vertical lines.
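Character error rate, the metric reported above, is commonly computed as the Levenshtein edit distance between the recognized text and the reference, divided by the reference length. The sketch below follows that common definition; the function names are hypothetical, and the abstract does not state which normalization the authors use.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # delete ref_char
                curr[j - 1] + 1,                       # insert hyp_char
                prev[j - 1] + (ref_char != hyp_char),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because Python strings are sequences of Unicode code points, the same function applies to Latin, Chinese, and Arabic text without modification.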
ISBN (digital): 9783031402838
ISBN (print): 9783031402821; 9783031402838
Time series analysis is vital for various real-world scenarios. Enhancing multivariate long-sequence time-series forecasting (MLTF) accuracy is crucial due to the increasing data volume and dimensionality. Current MLTF methods face challenges such as over-stationarization and distribution shift, affecting prediction accuracy. This paper proposes DSEAformer, a unique MLTF method that addresses distribution shift by normalizing and de-normalizing time series data. To avoid over-stationarization, a de-stationary autocorrelation method is suggested. Additionally, a time series optimization regularization based on weighted moving average helps prevent overfitting. Tests on three datasets confirm that DSEAformer outperforms existing MLTF techniques. In conclusion, DSEAformer introduces innovative ideas and methods to enhance time series prediction and offers improved practical applications.
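The normalize/de-normalize step and the weighted moving average mentioned above can be sketched as follows. This is a generic illustration, not DSEAformer's implementation: it assumes per-window z-score normalization with stored statistics, and the helper names (`normalize`, `denormalize`, `weighted_moving_average`) are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def normalize(window: Sequence[float], eps: float = 1e-8) -> Tuple[List[float], float, float]:
    """Z-score a window and return the statistics needed to undo it."""
    mean = fmean(window)
    std = pstdev(window) + eps  # eps guards against constant windows
    return [(x - mean) / std for x in window], mean, std


def denormalize(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Map model outputs back to the original scale of the input window."""
    return [v * std + mean for v in values]


def weighted_moving_average(series: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Smooth a series; weights[-1] applies to the most recent point of each window."""
    k = len(weights)
    if k == 0 or k > len(series):
        raise ValueError("weights must be non-empty and no longer than the series")
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, series[i:i + k])) / total
        for i in range(len(series) - k + 1)
    ]
```

Normalizing each input window and restoring its statistics on the forecast is one way to keep the model's inputs on a comparable scale when the data distribution drifts over time.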
ISBN (print): 9798350325744
This paper presents a novel approach to synthesize a standard 12-lead electrocardiogram (ECG) from any three independent ECG leads using a patient-specific encoder-decoder convolutional neural network. The objective is to decrease the number of recording locations required to obtain the same information as a 12-lead ECG, thereby enhancing patient comfort during the recording process. We evaluate the proposed algorithm on a dataset comprising fifteen patients, as well as a randomly selected cohort of patients from the PTB diagnostic database. To evaluate the precision of the reconstructed ECG signals, we report two metrics: the correlation coefficient and the root mean square error. Our proposed method achieves superior performance compared to most existing synthesis techniques, with average correlation coefficients of 0.976 and 0.97 on the two datasets, respectively. These results demonstrate the potential of our approach to improve the efficiency and comfort of ECG recording for patients, while maintaining high diagnostic accuracy.
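The two reconstruction metrics can be computed for a single lead as in the sketch below. It assumes the correlation coefficient is the Pearson coefficient between the recorded and reconstructed samples of that lead; the function names are hypothetical, and how the authors average across leads and patients is not stated in the abstract.

```python
import math
from typing import Sequence


def rmse(reference: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Root mean square error between two equally long signals."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(reference, reconstructed))
    return math.sqrt(squared / len(reference))


def pearson_correlation(reference: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long signals."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_rec = sum(reconstructed) / n
    cov = sum((a - mean_ref) * (b - mean_rec) for a, b in zip(reference, reconstructed))
    var_ref = sum((a - mean_ref) ** 2 for a in reference)
    var_rec = sum((b - mean_rec) ** 2 for b in reconstructed)
    if var_ref == 0 or var_rec == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_ref * var_rec)
```

The two metrics are complementary: a constant offset leaves the correlation at 1 but shows up in the RMSE, so reporting both captures shape and amplitude agreement.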
ISBN (print): 9798350309188
Extracting relational facts from unstructured text is a crucial natural language processing task used in many applications, particularly in constructing knowledge graphs. Relational facts are represented as triples in which two entities are connected through a relation. This work introduces a new and effective end-to-end method to generate triples from the input text. In the proposed method, we develop an encoder-decoder-based transformer model and warm-start both the encoder and decoder with publicly accessible pretrained checkpoints. These checkpoints can be taken from models such as BERT, GPT-2, and RoBERTa. Experimental results show that our method achieves better triple extraction results on publicly available datasets (NYT and WebNLG) than other state-of-the-art techniques. Further, the extracted triples are processed and used to build a knowledge graph. Complete control of this process allows for determining the weights of the relations (triples). The weights reflect the frequency of occurrence of the facts represented by the relations and provide the degree of confidence in those facts.
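The frequency-based weighting of triples described above can be sketched as follows. The sketch assumes triples are (head, relation, tail) string tuples and that a weight is the triple's relative frequency among all extracted triples; both the normalization and the name `triple_weights` are assumptions, since the abstract does not specify the exact formula.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def triple_weights(triples: Iterable[Triple]) -> Dict[Triple, float]:
    """Weight each distinct triple by how often it was extracted."""
    counts = Counter(triples)
    total = sum(counts.values())
    # Assumption: weights are relative frequencies that sum to 1.
    return {triple: count / total for triple, count in counts.items()}


if __name__ == "__main__":
    extracted = [
        ("Paris", "capital_of", "France"),
        ("Paris", "capital_of", "France"),
        ("Paris", "located_in", "Europe"),
    ]
    for triple, weight in triple_weights(extracted).items():
        print(triple, round(weight, 3))
```

A fact extracted from many sentences receives a higher weight than one seen once, which is the sense in which the weight acts as a degree of confidence.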
ISBN (print): 9783031284687; 9783031284694
The RoboCup Small Size League employs cylindrical robots of 15 cm height and 18 cm diameter. Presently, most teams utilize a Kalman predictor to forecast the trajectory of other robots for better motion planning and decision making. The predictor is limited for such a task because it typically cannot generate complex movements that take into account the future actions of a robot. In this context, we introduce an encoder-decoder sequence-to-sequence neural network that outperforms the Kalman predictor in trajectory forecasting. The network consists of a Bi-LSTM encoder, an attention module, and an LSTM decoder. It can predict 15 future time steps given 30 past measurements, or 30 time steps given 60 past observations. The proposed model performs roughly 50% better than a Kalman predictor in terms of average displacement error and runs in less than 2 ms. We believe that our new architecture will improve our team's decision making and provide a better competitive advantage for all teams. We are looking forward to integrating it with our software pipeline and continuing our research by incorporating new training methods and new inputs to the model.
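Average displacement error, the comparison metric above, is conventionally the mean Euclidean distance between predicted and ground-truth positions over the forecast horizon. The sketch below assumes 2D field coordinates given as (x, y) pairs; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position on the field


def average_displacement_error(predicted: Sequence[Point], actual: Sequence[Point]) -> float:
    """Mean Euclidean distance between predicted and actual positions."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    distances = (
        math.hypot(px - ax, py - ay)
        for (px, py), (ax, ay) in zip(predicted, actual)
    )
    return sum(distances) / len(predicted)
```

For a 15-step forecast, both arguments would hold 15 points, one per future time step.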
ISBN (print): 9798350300673
End-to-end speech translation (ST) directly translates the source speech to the target text, following a typical encoder-decoder framework. However, it has been shown that the conventional ST encoder is mainly used to extract long but locally attentive acoustic features, which may lead to a lack of global semantic features. In this work, we therefore propose to integrate a semantic decoder into the speech translation model (SD-ST), where the semantic decoder can generate text-like features with more global semantic information, analogously to a machine translation (MT) system. We also investigate different strategies to ensure length consistency between text-like features and text sequences. Experimental results show that the proposed SD-ST model achieves the best BLEU score on the 40-hour subset of the Fisher Spanish-English dataset and a comparable BLEU score on the MuST-C dataset. Furthermore, it is shown that the SD-ST model can even perform zero-shot ST.
Video captioning aims to generate sentences/captions to describe video contents. It is one of the key tasks in the field of multimedia processing. However, most current video captioning approaches utilize only the visual information of a video to generate captions. Recently, a new encoder-decoder-reconstructor architecture was developed for video captioning, which can capture the information in both raw videos and the generated captions through dual learning. Based on this architecture, this paper proposes a novel attention-based dual learning approach (ADL) for video captioning. Specifically, ADL is composed of a caption generation module and a video reconstruction module. The caption generation module builds a translatable mapping between raw video frames and the generated video captions, i.e., it uses the visual features extracted from videos by an Inception-V4 network to produce video captions. The video reconstruction module then reproduces raw video frames from the generated video captions, i.e., it uses the hidden states of the decoder in the caption generation module to reproduce/synthesize raw visual features. A multi-head attention mechanism is adopted to help the two modules focus on the most effective information in videos and captions, and a dual learning mechanism is adopted to fine-tune the performance of the two modules to generate the final video captions. ADL can therefore narrow the semantic gap between raw videos and the generated captions by minimizing the differences between the reproduced and the raw videos, thereby improving the quality of the generated video captions. Experimental results demonstrate that ADL is superior to state-of-the-art video captioning approaches on benchmark datasets. (C) 2021 Published by Elsevier B.V.
Pedestrian trajectory prediction in dynamic scenes remains a challenging and critical problem in numerous applications, such as self-driving cars and socially aware robots. The challenges center on capturing pedestrians' motion patterns and social interactions, as well as handling future uncertainties. Recent studies focus on modeling pedestrians' motion patterns with recurrent neural networks, capturing social interactions with pooling- or graph-based methods, and handling future uncertainties by using random Gaussian noise as the latent variable. However, they do not integrate specific obstacle avoidance experiences (OAEs) that may improve prediction performance. For example, pedestrians' future trajectories are always influenced by others in front of them. Here, we propose the Graph-based Trajectory Predictor with Pseudo-Oracle (GTPPO), an encoder-decoder-based method conditioned on pedestrians' future behaviors. Pedestrians' motion patterns are encoded with a long short-term memory unit, which introduces temporal attention to highlight specific time steps. Their interactions are captured by a graph-based attention mechanism, which draws OAEs into the data-driven learning process of graph attention. Future uncertainties are handled by generating multimodal outputs with an informative latent variable. Such a variable is generated by a novel pseudo-oracle predictor, which minimizes the knowledge gap between historical and ground-truth trajectories. Finally, GTPPO is evaluated on the ETH, UCY, and Stanford Drone datasets, and the results demonstrate state-of-the-art performance. In addition, the qualitative evaluations show successful cases of handling sudden motion changes in the future. Such findings indicate that GTPPO can peek into the future.
ISBN (print): 9781728198354
Extraction of retinal vascular parts is an important task in retinal disease diagnosis. Precise segmentation of the retinal vascular pattern is challenging due to its complex structure, its overlap with other anatomical structures, and the presence of crucial thin vascular structures. In recent years, complex and heavy deep learning networks have been proposed to segment retinal blood vessels accurately. However, these methods fail to detect the thin vascular structure among different patterns of thick vessels. To address this limitation, a novel attention-based architecture is proposed to segment the thin vasculature. The proposed model comprises a shallow U-Net-based encoder-decoder architecture with a split-fuse attention (SFA) block. The proposed SFA block enables the network to identify the placement of pixels for the tree-shaped vessel patterns at their relative position during the reconstruction phase in the decoder. The attention block aggregates low-level and high-level semantic information, improving the vessel segmentation performance. Experiments performed on the publicly available fundus datasets DRIVE, HRF, CHASE-DB1, and STARE show that the proposed method performs better than the current state-of-the-art methods. The results demonstrate the adaptability of the proposed model for clinical applications due to its low memory footprint and better performance.