Synthetic aperture radar (SAR) image ship detection has important applications in marine surveillance. Naively applying advanced detection methods to SAR ship detection has two limitations. First, most detectors construct the model as an encoder and rely on a feature pyramid network (FPN) head for accurate prediction, which can lead to high computational costs. Second, the background noise inside the ground truth of obliquely oriented ships (annotated as rectangular bounding boxes) makes model training difficult. To meet these challenges, we propose an efficient encoder-decoder network with estimated direction for ship detection in SAR images. First, we present an anchor-free encoder-decoder model that efficiently extracts multilevel features. Second, we formulate ship detection as a multitask learning problem comprising bounding box prediction and ship direction regression; the estimated ship direction provides weak supervision that benefits detection. Furthermore, we develop a center-weighted labeling method for overlapping annotations. Comprehensive experiments on the SAR-Ship-Detection and SSDD datasets show that our method achieves state-of-the-art performance at a high running speed.
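The abstract does not specify how the center-weighted labels are computed. As a minimal sketch, assuming that a pixel covered by several overlapping boxes is assigned to the box whose center it is nearest in a Gaussian sense (the spread factor `alpha` and the helper names are hypothetical, not the authors' method), such labeling might look like this:

```python
import math

def center_weight(x, y, box, alpha=0.5):
    """Gaussian weight of pixel (x, y) relative to the center of box = (x1, y1, x2, y2).

    The spread is tied to the box size through the hypothetical factor `alpha`.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    sx = max(alpha * (x2 - x1) / 2, 1e-6)
    sy = max(alpha * (y2 - y1) / 2, 1e-6)
    return math.exp(-((x - cx) ** 2 / (2 * sx ** 2) + (y - cy) ** 2 / (2 * sy ** 2)))

def assign_overlapped(x, y, boxes):
    """Return (box index, weight) for the box that owns pixel (x, y), or (None, 0.0).

    Only boxes containing the pixel are candidates; among overlapping boxes,
    the one with the highest center weight (closest center) wins.
    """
    best_idx, best_w = None, 0.0
    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = box
        if x1 <= x <= x2 and y1 <= y <= y2:
            w = center_weight(x, y, box)
            if w > best_w:
                best_idx, best_w = i, w
    return best_idx, best_w
```

For two boxes overlapping in the band x = 6..10, a pixel at x = 7 goes to the left box and a pixel at x = 9 to the right one, so each overlapped pixel gets a single, center-weighted label.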
Precipitation nowcasting is an important and fundamental task that underlies various public services, ranging from rainstorm warnings to flight safety. To further improve prediction accuracy for this spatiotemporal sequence forecasting problem, we propose an encoder-decoder deep residual attention prediction network (ED-DRAP). It adaptively rescales multiscale sequence-wise and spatial-wise features and achieves very deep, trainable residual prediction by integrating global residual learning with local deep residual sequence and spatial attention blocks (RSSABs). Experiments on a real-world radar echo map dataset from South China show that, compared with PredRNN++, TrajGRU, and recently proposed U-Net-based methods, ED-DRAP performs better on precipitation nowcasting metrics while occupying little GPU memory.
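As an illustration of adaptive feature rescaling combined with a local residual connection, here is a minimal squeeze-and-excitation-style sketch. It assumes one scalar gate weight per channel in place of the learned layers an actual RSSAB would use; `residual_channel_attention` is a hypothetical name, not the ED-DRAP implementation.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def residual_channel_attention(features, weights, bias=0.0):
    """Rescale each channel by a learned-style gate and add the input back.

    features: list of C channels, each a flat list of values.
    weights: one scalar per channel (hypothetical stand-in for learned layers).
    Returns x + gate_c * x per channel, with gate_c = sigmoid(w_c * mean(x_c) + bias).
    """
    out = []
    for channel, w in zip(features, weights):
        pooled = sum(channel) / len(channel)          # global average pooling ("squeeze")
        gate = sigmoid(w * pooled + bias)             # per-channel rescaling factor in (0, 1)
        out.append([v + gate * v for v in channel])   # local residual: input + rescaled input
    return out
```

Applying the same gating along the time axis or over spatial positions, rather than over channels, gives sequence-wise and spatial-wise variants of the idea.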
ISBN: 9798350321050 (print)
Silicon content is a significant index in blast furnace ironmaking, used to measure the quality of molten iron: the iron fails to meet requirements if the silicon content is too high or too low. In production, the silicon content of molten iron must therefore be kept within a stable range. At the same time, the time lag, nonlinearity, and dynamic characteristics of the blast furnace itself make it difficult to predict the silicon content accurately. This paper proposes a multi-head self-attention-based gated recurrent unit (GRU) encoder-decoder framework that better extracts global dynamic features and local features, improves prediction accuracy, and is validated experimentally.
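Multi-head self-attention can be sketched in plain Python as follows. This assumes identity query, key, and value projections (a real model learns them) and omits the GRU encoder-decoder entirely; the function names are illustrative only.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq (no learned projections)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)  # attention of this time step over all time steps
        out.append([sum(w[j] * seq[j][i] for j in range(len(seq))) for i in range(d)])
    return out

def multi_head_self_attention(seq, num_heads):
    """Split the feature dimension into heads, attend per head, then concatenate."""
    d = len(seq[0])
    assert d % num_heads == 0, "feature size must divide evenly into heads"
    h = d // num_heads
    heads = [self_attention([x[i * h:(i + 1) * h] for x in seq]) for i in range(num_heads)]
    # concatenate the head outputs back to feature size d for each time step
    return [sum((head[t] for head in heads), []) for t in range(len(seq))]
```

Because every time step attends over the whole input window, each output mixes information from the entire sequence, which is the "global" view that complements the step-by-step GRU.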
Speed prediction is a crucial yet complicated task for intelligent transportation systems. The challenge derives from the complex spatiotemporal dependencies of traffic parameters. In the past few years, deep neural networks have achieved state-of-the-art traffic speed prediction performance. However, most models rely on short-term input sequences to predict short- or long-term traffic speed (e.g., predicting speed for the next hour using data from the past hour), and therefore fail to consider the daily and weekly periodic behavior of traffic. Another problem with neural networks is their lack of interpretability, as they often operate as "black boxes". In this paper, an attention-based multi-encoder-decoder (Att-MED) model is proposed to predict traffic speed. The model uses convolutional LSTMs to capture the spatiotemporal relationships of multiple input sequences, namely short-term, daily, and weekly traffic patterns, and an LSTM to generate the output predictions sequentially. Furthermore, an attention mechanism weighs the contribution of each traffic sequence to the output predictions. When trained end-to-end, the proposed architecture achieves higher prediction accuracy than baseline models. Beyond improving performance, the attention mechanism produces weight values that, when visualized, provide insight into the network's decision-making process and thus make its outputs explainable. The attention weights extracted from Att-MED highlight the contribution of the daily and weekly periodic inputs to speed prediction.
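The attention step that weighs the short-term, daily, and weekly sequences can be sketched as dot-product attention over one summary vector per encoder. The scoring function is an assumption (the abstract does not name one), and `attention_fusion` is a hypothetical helper; the returned weights are the kind of values that can be visualized for interpretability.

```python
import math

def attention_fusion(decoder_state, contexts):
    """Weigh encoder summaries (e.g. short-term, daily, weekly) by dot-product attention.

    decoder_state: list of floats; contexts: dict name -> list of floats of the same length.
    Returns (fused context vector, dict name -> attention weight).
    """
    names = list(contexts)
    scores = [sum(a * b for a, b in zip(decoder_state, contexts[n])) for n in names]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # stable softmax over the encoders
    total = sum(exps)
    weights = {n: e / total for n, e in zip(names, exps)}
    dim = len(decoder_state)
    fused = [sum(weights[n] * contexts[n][i] for n in names) for i in range(dim)]
    return fused, weights
```

The weights always sum to one, so a plot of them across prediction steps shows directly how much each periodic input contributed.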
Backscatter communication networks have attracted much attention due to their small size and low power consumption, but their spectrum resources are very limited and they are often affected by link bursts. Channel prediction is a way to use spectrum resources effectively and improve communication quality. Most channel prediction methods fail to consider both spatial and frequency diversity, and existing channel detection methods still have shortcomings in terms of overhead and hardware dependency. We therefore design a sequence-to-sequence channel prediction scheme with three modules. The channel prediction module uses an encoder-decoder-based deep learning model (EDChannel) to predict the sequence of channel indicator measurements. The channel detection module decides whether to perform channel detection via a trigger that reflects prediction quality. The channel selection module selects a channel based on the channel coefficients in the prediction results. We collect data with a commercial reader in a real environment and build the EDChannel model using TensorFlow and Keras. On this basis, we implement the channel prediction module and complete the overall channel selection process. Experimental results show that EDChannel achieves higher prediction accuracy than previous state-of-the-art methods, and the overall throughput of our scheme improves on Zhao's scheme by approximately 2.9% and 14.1% in stable and unstable environments, respectively.
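The trigger and selection logic can be sketched as follows, under stated assumptions: prediction quality is measured by RMSE against the few real measurements available, detection is triggered when that RMSE exceeds a hypothetical `threshold`, and a higher indicator value (e.g. RSSI) means a better channel. None of these names come from the EDChannel paper.

```python
import math

def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured indicator values."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured))

def should_probe(predicted, measured, threshold):
    """Trigger a real (costly) channel detection only when predictions drift too far."""
    return rmse(predicted, measured) > threshold

def select_channel(predicted_quality):
    """Pick the channel with the best mean predicted indicator (higher is assumed better).

    predicted_quality: dict channel id -> predicted indicator sequence.
    """
    return max(predicted_quality,
               key=lambda ch: sum(predicted_quality[ch]) / len(predicted_quality[ch]))
```

Skipping detection while predictions stay accurate is what saves overhead: the reader only probes the channels when the trigger fires.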
ISBN: 9781510661653 (digital)
ISBN: 9781510661646; 9781510661653 (print)
Human action recognition is important for many applications such as surveillance monitoring, safety, and health care. Because 3D body skeletons can accurately characterize body actions and are robust to camera viewpoint, we propose a 3D skeleton-based human action recognition method. Unlike existing skeleton-based methods that use only geometric features for action recognition, we propose a physics-augmented encoder-decoder model that produces physically plausible geometric features. Specifically, given an input skeleton sequence, the encoder performs spatiotemporal graph convolution to produce spatiotemporal features used both to predict human actions and to estimate the generalized positions and forces of body joints. The decoder, implemented as an ODE solver, takes the joint forces and solves the Euler-Lagrange equations to reconstruct the skeleton in the next frame. By training the model to jointly minimize the action classification error and the 3D skeleton reconstruction error, the encoder is driven to produce features that are discriminative as well as consistent with both the body skeletons and the underlying body dynamics. These physics-augmented spatiotemporal features are used for human action classification. We evaluate the proposed method on NTU-RGB+D, a large-scale dataset for skeleton-based action recognition. Compared with existing methods, our method achieves higher accuracy and better generalization ability.
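In the simplest case, the Euler-Lagrange equations for a Lagrangian with kinetic energy (1/2) q_dot^T M q_dot, a constant diagonal mass matrix M, and no potential term reduce to M q_ddot = F. Under that simplifying assumption (real body dynamics also involve a configuration-dependent mass matrix plus Coriolis and gravity terms), one step of a decoder-style ODE solver might look like this; `euler_lagrange_step` is a hypothetical name.

```python
def euler_lagrange_step(q, q_dot, force, mass, dt):
    """One semi-implicit Euler step of M * q_ddot = F with constant diagonal M.

    q, q_dot, force, mass: lists of equal length (one entry per generalized coordinate).
    Returns (q_next, q_dot_next).
    """
    q_ddot = [f / m for f, m in zip(force, mass)]             # solve M q_ddot = F
    q_dot_next = [v + dt * a for v, a in zip(q_dot, q_ddot)]  # update velocity first
    q_next = [x + dt * v for x, v in zip(q, q_dot_next)]      # then position with new velocity
    return q_next, q_dot_next
```

Comparing `q_next` with the observed next-frame joint positions gives a reconstruction error that can be added to the classification loss, which is how the dynamics constrain the encoder's features.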
Inverting seismic data to build 3D geological structures is challenging because of the overwhelming amount of acquired seismic data and the very high computational load of iteratively solving the wave equation numerically, as required by industry-standard tools such as Full Waveform Inversion (FWI). For example, for an area with surface dimensions of 4.5 km × 4.5 km, hundreds of seismic shot-gather cubes are required for 3D model reconstruction, amounting to terabytes of recorded data. This paper presents a deep learning solution for reconstructing realistic 3D models in the presence of field noise recorded in seismic surveys. We implement and analyze a convolutional encoder-decoder architecture that efficiently processes the entire collection of hundreds of seismic shot-gather cubes. The results demonstrate that realistic 3D models can be reconstructed with a structural similarity index measure (SSIM) of 0.9143 (out of 1.0) in the presence of field noise at a 10 dB signal-to-noise ratio.
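Experiments at a fixed signal-to-noise ratio typically scale the noise so that 10·log10(P_signal / P_noise) equals the target; at 10 dB the noise power is one tenth of the signal power. A minimal sketch, assuming white Gaussian noise (actual field noise is generally not white) and hypothetical helper names:

```python
import math
import random

def power(x):
    """Mean power of a sampled signal."""
    return sum(v * v for v in x) / len(x)

def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so that 10*log10(P_signal / P_noise) == snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))  # rescale to the exact target power
    return [s + scale * n for s, n in zip(signal, noise)]

def snr_db(signal, noisy):
    """Measured SNR in decibels, treating noisy - signal as the noise."""
    noise = [y - s for s, y in zip(signal, noisy)]
    return 10 * math.log10(power(signal) / power(noise))
```

Rescaling the drawn noise to the exact target power, rather than only setting its standard deviation, makes the achieved SNR match the requested value for every trace.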
Automatic caption generation from images has become an active research topic in Computer Vision (CV) and Natural Language Processing (NLP). Machine-generated image captions play a vital role for visually impaired people: converted to speech, they help them better understand their surroundings. Although a significant amount of research has been conducted on automatic caption generation in other languages, far too little effort has been devoted to Bangla image caption generation. In this paper, we propose an encoder-decoder-based model that takes an image as input and generates the corresponding Bangla caption as output. The encoder network consists of ResNet-50, a pretrained image feature extractor, while the decoder network consists of bidirectional LSTMs for caption generation. The model has been trained and evaluated on BanglaLekhaImageCaptions, a Bangla image captioning dataset. The proposed model achieved a training accuracy of 91% and BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.81, 0.67, 0.57, and 0.51, respectively. Moreover, a comparative study of different pretrained feature extractors, such as VGG-16 and Xception, is presented. Finally, the proposed model has been deployed on an embedded device to analyze its inference time and power consumption.
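Cumulative BLEU-n is the brevity-penalized geometric mean of clipped 1- to n-gram precisions. A minimal, unsmoothed sentence-level sketch, assuming whitespace-tokenized captions; published scores are usually corpus-level and may come from a toolkit such as NLTK, so this will not reproduce the numbers above exactly:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Cumulative BLEU-max_n for one candidate against one or more references (token lists)."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        max_ref = Counter()
        for ref in references:
            max_ref |= ngrams(ref, n)  # element-wise max of reference n-gram counts
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU = 0
        log_precision += math.log(clipped / total) / max_n
    # brevity penalty uses the reference length closest to the candidate length
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision)
```

For example, the two-token candidate `["the", "cat"]` against the reference `["the", "cat", "sat"]` has perfect unigram precision but gets BLEU-1 of exp(-0.5), about 0.61, because of the brevity penalty.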
Three model configurations are presented for multi-step time series prediction of the heat absorbed by the water and steam in a thermal power plant. The models predict over horizons of 2, 4, and 6 steps into the future, where each step is a 5-minute increment. The evaluated models are a pure machine learning model, a novel hybrid machine learning and physics-based model, and the hybrid model with an incomplete dataset. The hybrid model decomposes the machine learning into individual boiler heat absorption units: economizer, water wall, superheater, and reheater. Each configuration uses a gated recurrent unit (GRU) or a GRU-based encoder–decoder as the deep learning architecture. Mean squared error with respect to target values is used to evaluate the models. The encoder–decoder architecture is over 11% more accurate than the GRU-only models. The hybrid model with the incomplete dataset highlights the importance of the manipulated variables. Compared to the pure machine learning model, the hybrid model is over 10% more accurate on average over 20 iterations of each model. Automatic differentiation is applied to the hybrid model to perform a local sensitivity analysis that identifies which of the 72 manipulated variables most affect the heat absorbed in the boiler. The models and sensitivity analyses are used in a discussion about optimizing the thermal power plant.
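Local sensitivity via automatic differentiation can be illustrated with forward-mode dual numbers: seeding one input at a time yields each partial derivative of the output. The sketch below differentiates a hypothetical toy model (`toy_heat_model`), not the paper's hybrid model, which would normally be differentiated with a deep learning framework's built-in AD.

```python
class Dual:
    """Forward-mode AD value: carries a value and its derivative w.r.t. one seeded input."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _lift(self, other):
        # treat plain numbers as constants (zero derivative)
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # product rule
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

def sensitivities(f, x):
    """Return df/dx_i for every input i by seeding one input at a time."""
    grads = []
    for i in range(len(x)):
        inputs = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(x)]
        grads.append(f(inputs).der)
    return grads

def toy_heat_model(u):
    # Hypothetical stand-in for a trained model with 3 manipulated variables.
    return 2.0 * u[0] + u[1] * u[1] + 0.5 * u[0] * u[2]
```

Ranking the absolute partial derivatives gives the most impactful inputs at that operating point: for the toy model at (1, 2, 3) the gradient is (3.5, 4.0, 0.5), so the second input ranks first.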
This study presents a novel end-to-end trainable network named IDM-Net (Inverse Design Network for Magnetic Fields) that facilitates multi-task supported inverse design of magnetic fields. Employing the encoder-Decode...