ISBN (digital): 9781510661653
ISBN (print): 9781510661646; 9781510661653
Human action recognition is important for many applications such as surveillance monitoring, safety, and health care. As 3D body skeletons can accurately characterize body actions and are robust to camera views, we propose a 3D skeleton-based human action recognition method. Unlike existing skeleton-based methods that use only geometric features for action recognition, we propose a physics-augmented encoder-decoder model that produces physically plausible geometric features for human action recognition. Specifically, given the input skeleton sequence, the encoder performs a spatiotemporal graph convolution to produce spatiotemporal features for both predicting human actions and estimating the generalized positions and forces of body joints. The decoder, implemented as an ODE solver, takes the joint forces and solves the Euler-Lagrange equation to reconstruct the skeleton in the next frame. By training the model to simultaneously minimize the action classification error and the 3D skeleton reconstruction error, the encoder is ensured to produce features that are discriminative and consistent with both the body skeletons and the underlying body dynamics. The physics-augmented spatiotemporal features are used for human action classification. We evaluate the proposed method on NTU-RGB+D, a large-scale dataset for skeleton-based action recognition. Compared with existing methods, our method achieves higher accuracy and better generalization ability.
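For reference, the Euler-Lagrange equation that the decoder is said to solve can be written in its standard textbook form. The abstract does not give the authors' exact formulation, so the symbols below (generalized positions $q$, generalized forces $\tau$, mass matrix $M$, Coriolis term $C$, gravity term $g$, time step $\Delta t$) and the semi-implicit Euler update in the last line are assumptions for illustration only.

```latex
\begin{align}
  L(q,\dot q) &= T(q,\dot q) - V(q), \\
  \frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} &= \tau, \\
  M(q)\,\ddot q + C(q,\dot q)\,\dot q + g(q) &= \tau, \\
  \ddot q_t &= M(q_t)^{-1}\bigl(\tau_t - C(q_t,\dot q_t)\,\dot q_t - g(q_t)\bigr), \\
  \dot q_{t+1} &= \dot q_t + \Delta t\,\ddot q_t, \qquad q_{t+1} = q_t + \Delta t\,\dot q_{t+1}.
\end{align}
```

Under these assumptions, the encoder would supply $q_t$, $\dot q_t$, and $\tau_t$, and the reconstruction error would compare $q_{t+1}$ with the observed next-frame skeleton.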
We address the problem of depth estimation from a single monocular image. Depth estimation from a single image is an ill-posed and inherently ambiguous problem. We propose an encoder-decoder structure with a feature pyramid to predict the depth map from a single RGB image. More specifically, the feature pyramid is used to detect objects of different scales in the image. The encoder aims to extract the most representative information from the original image through a series of convolution operations while reducing the resolution of the input image. We adopt Res2-50 as the encoder to extract important features. The decoder uses a novel upsampling structure to improve the output resolution. We also propose a novel loss function that adds a gradient loss and a surface normal loss to the depth loss, so that the network predicts not only the global depth but also the depth of fuzzy edges and small objects. Additionally, we use Adam as the optimizer to train the network and speed up convergence. Our extensive experimental evaluation demonstrates the efficiency and effectiveness of the method, which is competitive with previous methods on the Make3D dataset and outperforms state-of-the-art methods on the NYU Depth v2 dataset.
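The combined loss described above (depth loss plus gradient loss plus surface normal loss) can be sketched as follows. The abstract does not state the exact form of each term, so this is only one plausible reading: an L1 depth term, an L1 term on forward-difference gradients, and one minus the cosine similarity between normals built as (-dz/dx, -dz/dy, 1). The function names and the weights `w_grad` and `w_normal` are hypothetical.

```python
import math


def grad_xy(depth):
    """Forward-difference gradients on the (H-1) x (W-1) interior grid."""
    h, w = len(depth), len(depth[0])
    gx = [[depth[y][x + 1] - depth[y][x] for x in range(w - 1)] for y in range(h - 1)]
    gy = [[depth[y + 1][x] - depth[y][x] for x in range(w - 1)] for y in range(h - 1)]
    return gx, gy


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def normal_cosine(ax, ay, bx, by):
    """Cosine similarity between normals (-ax, -ay, 1) and (-bx, -by, 1)."""
    dot = ax * bx + ay * by + 1.0
    norm_a = math.sqrt(ax * ax + ay * ay + 1.0)
    norm_b = math.sqrt(bx * bx + by * by + 1.0)
    return dot / (norm_a * norm_b)


def combined_depth_loss(pred, target, w_grad=1.0, w_normal=1.0):
    """Assumed form: L1 depth + w_grad * L1 gradient + w_normal * (1 - cos) normal.

    pred and target are H x W nested lists of depths, with H, W >= 2.
    """
    h, w = len(pred), len(pred[0])
    depth_term = mean(
        abs(pred[y][x] - target[y][x]) for y in range(h) for x in range(w)
    )

    pgx, pgy = grad_xy(pred)
    tgx, tgy = grad_xy(target)
    cells = [(y, x) for y in range(h - 1) for x in range(w - 1)]

    grad_term = mean(
        abs(pgx[y][x] - tgx[y][x]) + abs(pgy[y][x] - tgy[y][x]) for y, x in cells
    )
    normal_term = mean(
        1.0 - normal_cosine(pgx[y][x], pgy[y][x], tgx[y][x], tgy[y][x])
        for y, x in cells
    )
    return depth_term + w_grad * grad_term + w_normal * normal_term
```

In this sketch a prediction that is offset from the target by a constant is penalized only by the depth term, since its gradients and normals match the target; the two extra terms respond to errors in local shape, such as blurred edges.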
Automatic segmentation of skin lesions is an important step in computer-aided diagnosis systems for melanoma detection. Although numerous methods have been proposed in the literature, this task remains challenging due to the similarity between different lesions and the complex visual characteristics that may be present in the images. In this paper, we propose major modifications to the state-of-the-art U-Net structure to further improve its capability in skin lesion segmentation. These modifications are made in both the encoding and the decoding paths. Instead of using only standard convolutional layers as in U-Net, the proposed encoding path consists of 10 standard convolutional layers, inspired by the Visual Geometry Group (VGG16) network, followed by a pyramid pooling module and a dilated convolutional block. This combination enables the network to learn more representative feature maps and preserve more spatial resolution. Furthermore, dilated residual blocks are introduced in the decoding path to further refine the segmentation maps. The experimental results on three datasets, namely the IEEE International Symposium on Biomedical Imaging (ISBI) 2017, ISBI 2016, and PH2, show that the proposed method performs better than the basic U-Net, FCN, SegNet, and U-Net++, and achieves performance on par with state-of-the-art segmentation techniques with minimal pre- and post-processing operations.
Precipitation nowcasting is important and fundamental: it underlies various public services ranging from rainstorm warnings to flight safety. To further improve prediction accuracy for this spatiotemporal sequence forecasting problem, we propose an encoder-decoder deep residual attention prediction (ED-DRAP) network, which adaptively rescales the multiscale sequence-wise and spatial-wise features and achieves very deep trainable residual prediction by integrating global residual learning with local deep residual sequence and spatial attention blocks (RSSABs). Experiments on a real-world radar echo map dataset of South China show that, compared with PredRNN++, TrajGRU, and recently proposed U-Net-based methods, our ED-DRAP network performs better on precipitation nowcasting metrics while occupying little GPU memory.
ISBN (print): 9798350321050
Silicon content is a significant index in the blast furnace ironmaking process. It is used to measure the quality of molten iron *** only meets the requirements if it is too high or too low. In the production process, the silicon content in molten iron needs to be controlled within a stable *** the same time, due to the time-lag, nonlinear, and dynamic characteristics of the blast furnace itself, it is difficult to predict the silicon content accurately. This paper proposes a multi-head self-attention-based gated recurrent unit (GRU) encoder-decoder framework that better extracts global dynamic features and local features and improves prediction accuracy, as verified experimentally.
Accurate solar irradiance prediction is crucial for harnessing solar energy resources. However, the pattern of the irradiance sequence is intricate due to its nonlinear and non-stationary characteristics. In this paper, a deep hybrid model based on an encoder-decoder is proposed to cope with this complex pattern in hourly irradiance forecasting. The hybrid model integrates complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), an encoder-decoder module, and a dynamic error compensation (DEC) architecture. CEEMDAN is applied to reduce the nonlinearity and non-stationarity of the irradiance sequence. The encoder-decoder integrates temporal convolutional networks (TCN), long short-term memory networks (LSTM), and a multi-layer perceptron (MLP) for temporal feature extraction and multi-step prediction. The DEC architecture dynamically updates the model based on adjacent error information to mine the predictable components of that error. In addition, a new loss function is proposed for multi-objective optimization to balance performance across the steps of multi-step forecasting. In hourly irradiance forecasting experiments on three public datasets, the root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R) of the proposed model are in the ranges 30.693-34.433 W/m², 19.398-22.900 W/m², and 0.9872-0.9902, respectively. Compared with the benchmark models (including MLP, LSTM, and TCN), the RMSE and MAE are reduced by 10.76%-22.00% and 5.47%-20.40%, respectively. The experimental results indicate that the proposed model delivers accurate and robust forecasting performance and is a reliable option for hourly irradiance forecasting.
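The three reported metrics and the relative reductions can be computed as in the minimal sketch below. It assumes that R denotes the Pearson correlation coefficient and that a reduction is measured relative to the benchmark value; the abstract states neither explicitly. All function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the inputs (e.g. W/m^2)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the inputs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def relative_reduction(benchmark_error, model_error):
    """Percentage by which model_error is lower than benchmark_error."""
    return (benchmark_error - model_error) / benchmark_error * 100.0


# Toy hourly irradiance values in W/m^2 (made up for illustration).
observed = [0.0, 100.0, 200.0, 300.0]
forecast = [10.0, 90.0, 210.0, 290.0]
print(rmse(observed, forecast), mae(observed, forecast), pearson_r(observed, forecast))
```

RMSE weights large errors more heavily than MAE, which is one reason the two are usually reported together.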
Backscatter communication networks have attracted much attention due to their small size and low power consumption, but their spectrum resources are very limited and are often affected by link bursts. Channel prediction is a way to use spectrum resources effectively and improve communication quality. Most channel prediction methods fail to consider both spatial and frequency diversity. Meanwhile, existing channel detection methods still have deficiencies in terms of overhead and hardware dependency. For these reasons, we design a sequence-to-sequence channel prediction scheme with three modules. The channel prediction module uses an encoder-decoder-based deep learning model (EDChannel) to predict the sequence of channel indicator measurements. The channel detection module decides whether to perform channel detection using a trigger that reflects the prediction quality. The channel selection module selects a channel based on the channel coefficients of the prediction results. We use a commercial reader to collect data in a real environment and build the EDChannel model with the deep learning modules of TensorFlow and Keras. We have implemented the channel prediction module and completed the overall channel selection process. The experimental results show that the EDChannel algorithm has higher prediction accuracy than previous state-of-the-art methods. The overall throughput of our scheme is improved by approximately 2.9% and 14.1% over Zhao's scheme in stable and unstable environments, respectively.
ISBN (print): 9781665453837
Automatic and accurate segmentation of the optic disc (OD) region has practical applications in the medical field. In this study, a novel encoder-decoder network is proposed to segment ODs automatically and accurately. The encoder consists of three parts: (1) a low-level feature extraction module composed of dense connectivity blocks (Dense Block), which outputs rich low-level features; (2) a High-resolution Block (HR Block), which extracts sufficient semantic information while reducing parameters; and (3) an Atrous Spatial Pyramid Pooling (ASPP) module, which obtains high-level features. Therefore, the network is named DHA-Net. The proposed decoder takes advantage of the multi-scale features from the encoder to predict OD regions. Comparisons with existing methods on three datasets show that the proposed method outperforms current leading methods in segmenting both normal and abnormal fundus images. The ablation studies show the influence of each module on segmentation performance and justify the network structure. With fewer network parameters, DHA-Net achieves better prediction performance on intersection over union (IoU), Dice similarity coefficient (DSC), and other evaluation metrics. DHA-Net is lightweight and can use multi-scale features to predict OD regions.
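The two named metrics, IoU and DSC, have standard definitions for binary masks, sketched below. The flattened 0/1 mask representation, the function name, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details from the paper.

```python
def iou_and_dice(pred_mask, true_mask):
    """Return (IoU, Dice) for two equal-length flattened binary masks."""
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    pred_area = sum(1 for p in pred_mask if p)
    true_area = sum(1 for t in true_mask if t)
    union = pred_area + true_area - overlap

    # Assumed convention: two empty masks count as a perfect match.
    iou = overlap / union if union else 1.0
    dice = 2 * overlap / (pred_area + true_area) if (pred_area + true_area) else 1.0
    return iou, dice


# Toy 2x2 masks, flattened row by row (made up for illustration).
predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
iou, dice = iou_and_dice(predicted, reference)
print(iou, dice)
```

The two scores are monotonically related by Dice = 2 * IoU / (1 + IoU), so they rank methods identically on a single image, although averages over a dataset can differ.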
ISBN (print): 9798350315387
Convolutional neural networks are among the state-of-the-art models used to solve computer vision problems. This paper evaluates the efficiency of several encoder-decoder neural networks trained to segment the soccer field in Humanoid KidSize Robot Soccer competitions. To compare the efficiency of several encoders, a total of fourteen neural network models, based on the U-Net and SegNet architectures, were tested and compared in terms of accuracy, cost function value, IoU, and average inference time. On this basis, the U-Net-based networks that used MobileNetv3Small or ResNet18 as the encoder were found to be the best of the considered alternatives for segmenting the soccer field.
ISBN (print): 9783031416750; 9783031416767
Printed mathematical expression recognition aims to transform an image of a printed mathematical formula into a LaTeX sequence. Recently, many deep learning-based methods have been proposed to solve this task. However, the positional relationship between mathematical symbols is often ignored or insufficiently represented, leading to the loss of structural features of mathematical formulas. To overcome this challenge, we propose a position-aware encoder-decoder model for printed mathematical expression recognition. We design a two-dimensional position encoding algorithm based on sin/cos functions to capture the positional relationship between mathematical symbols. Meanwhile, we adopt a more advanced image feature extraction network. In the decoder, we use a Bi-GRU as the translator and add an attention mechanism so that the decoder focuses on the important local information. We conduct experiments on the public dataset IM2LaTeX-100K, and the results show that our proposed approach outperforms the majority of advanced methods.
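One common way to build a two-dimensional sin/cos position encoding is sketched below. The abstract does not describe the authors' exact algorithm, so this follows a widely used construction instead: half of the channels encode the row index and the other half the column index, each with the sinusoidal scheme from the original Transformer. The function names, the base of 10000, and the channel split are assumptions.

```python
import math


def encode_1d(position, dim):
    """Sinusoidal encoding of one integer position into `dim` channels (dim even)."""
    encoding = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        encoding.append(math.sin(position * freq))
        encoding.append(math.cos(position * freq))
    return encoding


def positional_encoding_2d(height, width, d_model):
    """Return a height x width grid of d_model-dimensional position vectors.

    The first d_model // 2 channels depend only on the row, the rest only on
    the column, so vertically and horizontally adjacent symbols differ in
    distinct channel groups.
    """
    if d_model % 4 != 0:
        raise ValueError("d_model must be divisible by 4")
    half = d_model // 2
    return [
        [encode_1d(row, half) + encode_1d(col, half) for col in range(width)]
        for row in range(height)
    ]


# Encoding for a small 2 x 3 feature map with 8 channels per cell.
grid = positional_encoding_2d(2, 3, 8)
print(len(grid), len(grid[0]), len(grid[0][0]))
```

Such a grid is typically added to the feature map produced by the image encoder before attention is applied, which lets the decoder distinguish, for example, a superscript from a subscript at the same horizontal position.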