Future pedestrian trajectory prediction offers great prospects for many practical applications such as unmanned vehicles, building evacuation design and robotic path planning. Most existing methods focus on social interaction among pedestrians but ignore the fact that heterogeneous traffic objects (cars, dogs, bicycles, motorcycles, etc.) have a significant influence on the future trajectory of a subject pedestrian. Also, a pedestrian's intended walking direction may be inferred from his/her facial keypoints. Considering this, this work proposes to predict a pedestrian's future trajectory by jointly using neighboring heterogeneous traffic information and his/her facial keypoints. To this end, an end-to-end facial keypoints-based convolutional encoder-decoder network (FK-CEN) is designed, which takes the heterogeneous traffic and facial keypoints as input. After training, FK-CEN is evaluated on five crowded video sequences collected from the public datasets MOT-16 and MOT-17. Experimental results demonstrate that it outperforms state-of-the-art approaches in terms of prediction error.
Synthetic aperture radar (SAR) image ship detection has important applications in marine surveillance. There are two limitations when naively applying advanced detection methods to SAR ship detection. First, most detectors construct the model as an encoder and rely on the feature pyramid network (FPN) head for accurate prediction, which may lead to high computational costs. Second, the background noise in the ground truth (annotated as rectangular bounding boxes) of angular ships makes model training difficult. To meet these challenges, we propose an efficient encoder-decoder network with estimated direction for ship detection in SAR images. First, we present an anchor-free encoder-decoder model that can efficiently extract multi-level features. Second, we formulate ship detection as a multitask learning problem that includes bounding box prediction and ship direction regression. The estimated ship direction can weakly supervise and benefit ship detection. Furthermore, we develop a center-weighted labeling method for overlapped annotations. Comprehensive experiments on the SAR-Ship-Detection and SSDD datasets show that our method achieves state-of-the-art performance with a high running speed.
ISBN: (print) 9783030875886; 9783030875893
Low-dose computed tomography (CT) is mainstream in clinical applications. However, compared to normal-dose CT, low-dose CT (LDCT) images contain stronger noise and more artifacts, which are obstacles to practical applications. In the last few years, convolution-based end-to-end deep learning methods have been widely used for LDCT image denoising. Recently, the transformer has shown superior performance over convolution, with more feature interactions. Yet its applications in LDCT denoising have not been fully explored. Here, we propose a convolution-free T2T vision transformer-based Encoder-decoder Dilation Network (TED-Net) to enrich the family of LDCT denoising algorithms. The model is free of convolution blocks and consists of a symmetric encoder-decoder block built solely from transformers. Our model (code is available at https://***/wdayang/TED-Net) is evaluated on the AAPM-Mayo clinic LDCT Grand Challenge dataset, and the results show that it outperforms state-of-the-art denoising methods.
ISBN: (print) 9781450383325
This paper presents a deep learning approach for a versatile microclimate prediction framework (DeepMC). Microclimate predictions are of critical importance across various applications, such as agriculture, forestry, energy, search and rescue, etc. To the best of our knowledge, there is no other single framework that can accurately predict various microclimate entities using Internet of Things (IoT) data. We present a generic framework (DeepMC) that predicts various climatic parameters such as soil moisture, humidity, wind speed, radiation and temperature, as required, over a period of 12 to 120 hours with a varying resolution of 1 to 6 hours, respectively. This framework proposes the following new ideas: 1) localization of weather forecasts to IoT sensors by fusing weather station forecasts with the decomposition of IoT data at multiple scales, and 2) a multi-scale encoder and two levels of attention mechanisms that learn a latent representation of the interaction between various resolutions of the IoT sensor data and weather station forecasts. We present multiple real-world agricultural and energy scenarios, and report results with uncertainty estimates from the live deployment of DeepMC, which demonstrate that DeepMC outperforms various baseline methods and reports 90%+ accuracy with tight error bounds.
ISBN: (print) 9780738133669
Pulmonary nodule detection in low-dose computed tomography (CT) images is essential for early screening and treatment of lung cancer. Previous related studies based on deep convolutional neural networks generally rely on 2D or 2.5D components and only focus on the output feature information under a single receptive field. Considering the 3D nature of lung CT images and the performance limitations of state-of-the-art nodule detection methods, we develop a novel 3D multi-branch region proposal network with an encoder-decoder structure. Specifically, each parallel branch is designed with 3D residual blocks and a U-Net-like structure to effectively extract multi-scale fusion features based on the 3D spatial information of CT scans, and the strategies of varying receptive fields and sharing weight parameters are used to improve the sensitivity of the detection network to nodules with scale variation while maintaining the original parameter count. In addition, we propose a multi-scale attentional feature fusion module to better fuse high-resolution and semantically strong features and adaptively learn the inter-dependency information of different feature maps. Finally, we compare a dynamically scaled cross entropy loss and online hard example mining (OHEM) to combat the imbalance of positive and negative samples during training, with the aim of assisting network optimization. Our extensive experiments on publicly available CT scans obtained from the LUNA16 and TianChi competition datasets demonstrate that our method outperforms state-of-the-art pulmonary nodule detection models.
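The two imbalance-handling strategies named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the "dynamically scaled cross entropy loss" has the usual focal-loss form, and the function names and the default values of `gamma` and `alpha` are hypothetical choices that the abstract does not state.

```python
import math

def cross_entropy(p, y):
    """Plain binary cross entropy for one predicted probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Cross entropy scaled by (1 - p_t)^gamma (assumed focal-loss form).

    Well-classified samples (p_t close to 1) are down-weighted, so the many
    easy negatives contribute little to the total loss.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def hardest_negatives(losses, k):
    """OHEM-style selection: indices of the k negatives with the largest loss."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:k]
```

With `gamma=0` the scaling factor disappears and the loss reduces to alpha-weighted cross entropy, which is a convenient way to check the sketch.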
Constrained image splicing detection and localization (CISDL) is a newly formulated image forensics task and plays an important role in verifying the generating process of a forged image. CISDL conducts dense matching between two investigated images and detects whether one image has forged regions pasted from the other. In this work, we introduce a novel attention-aware encoder-decoder deep matching network named AttentionDM for CISDL. An encoder-decoder with atrous convolution is newly designed for dense matching of hierarchical features and fine-grained mask generation. A novel attention-aware correlation computation module is built on normalization operations and recalibration of informative features with channel attention blocks. Finally, VGG and ResNets are each used as feature extractors for comprehensive comparisons in CISDL. Extensive experiments demonstrate the superior performance of AttentionDM over state-of-the-art methods.
Pedestrian behavior modeling is a challenging problem, especially in crowded transportation scenarios. Some recent studies have addressed this problem using deep neural networks, but the accuracy of trajectory prediction is still not high because the internal state of a typical deep neural network with long short-term memory (LSTM) is a one-dimensional vector, which destroys the spatial information around a pedestrian. Therefore, these models cannot fully learn the spatial sensing behavior of pedestrians. To solve this, we recommend using multi-channel tensors to represent the environmental information of pedestrians. Meanwhile, the spatiotemporal interactions among the pedestrians are represented by convolution operations on these tensors. Then, an end-to-end fully convolutional LSTM encoder-decoder is designed, trained and tested. Finally, our approach is compared with existing LSTM-based methods using five crowded video sequences from public datasets. The results show that our method reduces the displacement offset error and provides more realistic trajectory predictions in many cases. (c) 2020 Published by Elsevier B.V.
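The abstract does not define its displacement error. As a rough illustration, the sketch below computes the two displacement metrics commonly reported for trajectory prediction, the average and the final Euclidean distance between predicted and ground-truth positions. The function name and the representation of a trajectory as a list of (x, y) tuples are assumptions made for this example.

```python
import math

def displacement_errors(predicted, ground_truth):
    """Return (average, final) displacement error for one trajectory.

    Both arguments are equal-length lists of (x, y) positions, one per
    predicted time step.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    average_error = sum(distances) / len(distances)
    final_error = distances[-1]  # error at the last predicted time step
    return average_error, final_error
```

For example, `displacement_errors([(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (2, 2)])` gives per-step distances of 0, 1 and 2, so the average error is 1.0 and the final error is 2.0.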
Thermal cameras can capture images even in low-light conditions. However, humans cannot recognize human faces in thermal images. Translating thermal images to the visible domain is one solution to the problem of face recognition in thermal images. Most prior work has proposed solutions based on Generative Adversarial Networks (GANs) for thermal-to-visible image translation. However, a GAN is a heavy network that consumes a large amount of resources for this task. In this paper, we propose an encoder-decoder architecture for thermal-to-visible image translation of human faces. Since our proposed architecture is not based on GANs, it is lightweight. The proposed method works well for both disguised and non-disguised thermal facial images. Standard comparison metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Multiscale Structural Similarity Index (MS-SSIM) are used to evaluate the quality of the generated visible images with respect to the ground truth. Our proposed architecture outperforms the current state-of-the-art image translation architectures, namely pix2pix, Cycle-GAN, modified thermal-to-visible GAN and Dual GAN, by a considerable margin on both the disguised and non-disguised datasets. (c) 2022 Elsevier B.V. All rights reserved.
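Of the three quality metrics listed, PSNR is the simplest to state: it is 10 times the base-10 logarithm of the squared peak pixel value divided by the mean squared error. The sketch below is illustrative only. It assumes images are given as flat lists of pixel intensities with a peak value of 255, and the function name is hypothetical.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel intensities. Identical images have
    zero error, for which PSNR is reported as infinity.
    """
    if not reference or len(reference) != len(generated):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the generated image is closer to the ground truth. SSIM and MS-SSIM compare local structure rather than per-pixel error and are not covered by this sketch.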
Daily peak load forecasting is a challenging problem in the field of electric power load forecasting. Since the nonlinear and dynamic behavior of the influence factors and their sequential dependencies are significant for modeling daily peak load, a prediction model based on long short-term memory (LSTM) enhanced by a dual-attention-based encoder-decoder is presented. Functioning as both the encoder and the decoder, LSTM is used for the nonlinear dynamic temporal modeling. The encoder-decoder makes use of information from both the influence factors and the daily peak load. Moreover, a dual-attention mechanism, which is inserted into the encoder-decoder, is designed to take into account the effects of different influence factors and time nodes on the daily peak load simultaneously. This mechanism helps to analyze the characteristics of daily peak load precisely and to achieve more accurate prediction results. Comprehensive experiments are performed on a real data set from a provincial capital city in eastern China. The case study shows that the proposed methodology provides the most accurate results with an average MAPE of 2.07%, an average RMSE of 133 MW and an average MAE of 326.6 MW.
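For reference, the three error measures reported above can be computed as in the following sketch. It uses the standard textbook definitions. The function name and the toy load values in the usage note are hypothetical and are not taken from the paper.

```python
import math

def forecast_errors(actual, predicted):
    """Return (MAPE in percent, RMSE, MAE) for two equal-length sequences.

    RMSE and MAE are in the unit of the inputs (for example MW). MAPE
    requires every actual value to be non-zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return mape, rmse, mae
```

For example, `forecast_errors([100, 200], [110, 180])` yields a MAPE of 10%, an RMSE of about 15.81 and an MAE of 15. For any set of errors the RMSE is never smaller than the MAE, which is a useful sanity check when reading reported figures.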
Detecting cancer in its early stages may allow patients to receive proper treatment, saving lives and letting them return to their routine lifestyles. Breast cancer is one of the leading causes of mortality among women all around the globe. One way to find cancerous nuclei is to analyze histopathology images. These images, however, are very complex and large. Thus, locating the cancerous nuclei in them is very challenging. Hence, if an expert fails to diagnose their patients via these images, the situation may be exacerbated. Therefore, this study aims to introduce a method that masks as many cancer nuclei on histopathology images as possible, with high visual quality, so that experts can distinguish them easily. A tailored residual fully convolutional encoder-decoder neural network based on end-to-end learning is proposed to address the matter. The proposed method is evaluated quantitatively and qualitatively on an ER+ BCa H&E-stained dataset. The average detection accuracy achieved by the method is 98.61%, which is much better than that of competitors.