ISBN (print): 9781728170978
Object detection in motion pictures is always a challenging task due to the presence of dynamic background. Deep learning architectures, especially the encoder-decoder type, have shown promising performance in segmenting foreground objects against the background in video sequences. Thus, in this work, a VGG-16 based encoder-decoder architecture is investigated and several modifications are proposed to improve the efficiency of the model. The modified models are evaluated on two standard databases, CDNet 2014 and SBI2015, covering various scenes, and achieve a highest precision of 0.99, which is competitive with current state-of-the-art schemes.
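As an illustration of the kind of architecture this abstract describes, the following is a minimal PyTorch sketch of a VGG-16 encoder paired with a plain upsampling decoder that outputs a per-pixel foreground logit; the decoder layout, channel widths, and input size are assumptions for illustration, not the modifications proposed in the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class VGG16EncoderDecoder(nn.Module):
    """Illustrative encoder-decoder: VGG-16 features as encoder, a plain
    upsampling decoder producing a one-channel foreground logit map."""
    def __init__(self):
        super().__init__()
        # Encoder: convolutional part of VGG-16 (downsamples the input by 32x).
        self.encoder = vgg16(weights=None).features
        # Decoder: five upsampling stages mirror the five pooling stages.
        def up(in_c, out_c):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(in_c, out_c, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )
        self.decoder = nn.Sequential(
            up(512, 512), up(512, 256), up(256, 128), up(128, 64), up(64, 32),
            nn.Conv2d(32, 1, kernel_size=1),   # foreground/background logit
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

if __name__ == "__main__":
    model = VGG16EncoderDecoder()
    x = torch.randn(1, 3, 224, 224)    # one RGB video frame (dummy data)
    print(model(x).shape)              # torch.Size([1, 1, 224, 224])
```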
A challenging task in medical dermoscopic image segmentation is detecting the location of the lesion. The detection of a polyp is even more difficult owing to its low contrast with the surrounding tissue. This paper proposes an end-to-end network based on the convolutional neural network, which does not require any prior processing to detect and segment polyps in medical images. The proposed convolutional neural network is based on the encoder-decoder architecture and leverages the benefits of skip connections. Skip connections can play an effective role in encoder-decoder-based networks but have not received enough attention in network design. The advantage of the proposed model is that it has a unique skip connection design that leads to better segmentation accuracy. Finally, the proposed model is tested on the CVC-ClinicDB dataset, and the experimental results illustrate that our model outperforms other state-of-the-art approaches.
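The abstract highlights a skip-connection design without detailing it; below is a generic sketch of how encoder features are usually carried into the decoder by concatenation in encoder-decoder segmentation networks. The two-level depth and channel counts are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

def conv_block(in_c, out_c):
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_c, out_c, 3, padding=1), nn.ReLU(inplace=True),
    )

class SkipConnectionSegNet(nn.Module):
    """Two-level encoder-decoder where encoder features are concatenated
    into the decoder (the usual role of skip connections)."""
    def __init__(self, num_classes=1):
        super().__init__()
        self.enc1 = conv_block(3, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)    # 64 (upsampled) + 64 (skip)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)     # 32 (upsampled) + 32 (skip)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip from enc2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip from enc1
        return self.head(d1)

if __name__ == "__main__":
    print(SkipConnectionSegNet()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1, 224, 224])
```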
Cracks are one of the most common categories of pavement distress that may potentially threaten road and highway safety. Thus, a reliable and efficient pixel-level method of crack detection is necessary for real-time measurement of cracks. However, many existing encoder-decoder architectures for crack detection are time-consuming because the decoder module typically contains many convolutional layers and feature channels, so their performance relies heavily on computing resources, which is a handicap in scenarios with limited resources. In this study, we propose a simple and effective method to boost the algorithmic efficiency of encoder-decoder architectures for crack detection. We develop a switch module, called SWM, that predicts whether an image is positive or negative and skips the decoder module to save computation time when it is negative. This method uses the encoder module as a fixed feature extractor and only needs to place a light-weight classifier head on the end of the encoder module to output the final class probability. We choose the classical UNet and DeepCrack as examples of encoder-decoder architectures to show how SWM is integrated into them to reduce computation complexity. Evaluations on the public CrackTree206 and AIMCrack datasets demonstrate that our method can significantly boost the efficiency of the encoder-decoder architectures in all tasks without affecting performance. SWM can also be easily embedded into other encoder-decoder architectures for further improvement. The source code is available at https://***/hanshenchen/crack-detection.
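A sketch of the switching idea as described: a light-weight classifier head on top of a fixed encoder decides whether the input image is positive, and the heavy decoder is run only in that case. The module structure, threshold, and the dummy encoder/decoder in the usage example are assumptions, not the released SWM implementation.

```python
import torch
import torch.nn as nn

class SwitchedEncoderDecoder(nn.Module):
    """Run the heavy decoder only when a light classifier head on the
    encoder output predicts that the image contains a crack."""
    def __init__(self, encoder, decoder, enc_channels, threshold=0.5):
        super().__init__()
        self.encoder = encoder                 # fixed feature extractor
        self.decoder = decoder                 # full segmentation decoder
        self.switch = nn.Sequential(           # light-weight classifier head
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(enc_channels, 1), nn.Sigmoid(),
        )
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):                      # assumes one image per call
        feats = self.encoder(x)
        p_pos = self.switch(feats)             # image-level crack probability
        if p_pos.item() < self.threshold:      # negative image:
            return None, p_pos                 #   skip the decoder entirely
        return self.decoder(feats), p_pos      # positive image: run segmentation

if __name__ == "__main__":
    enc = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
    dec = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Upsample(scale_factor=2))
    model = SwitchedEncoderDecoder(enc, dec, enc_channels=64)
    mask, prob = model(torch.randn(1, 3, 128, 128))
    print(float(prob), None if mask is None else tuple(mask.shape))
```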
Pancreatic cancer poses significant challenges in early diagnosis, with a high mortality rate of 98%, and is responsible for 4.7% of cancer deaths. Early diagnosis, mainly done by imaging exams, is the main factor that determines prognosis. While cascaded segmentation is common, the use of ensemble strategies is rarely explored in the literature for pancreatic mass segmentation. In cascaded methods, the performance of the pancreatic mass segmentation step is also directly affected by the previous steps. In this paper we study the impact of a localization step at three levels of precision while using an ensemble method to carry out pancreatic mass segmentation. A voting ensemble method is proposed that combines three encoder-decoder based networks, namely U-Net, Feature Pyramid Network (FPN) and LinkNet. The results obtained were competitive with the existing literature, achieving a Dice score of 63.89 ± 2.88% on the smallest resolution and 60.35 ± 4.86% on the largest resolution on the MSD dataset. The results show that an ensemble method can mitigate the impact of previous steps in a cascaded method without significant loss in performance.
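One possible shape of the voting ensemble, assuming the segmentation_models_pytorch package for the three named networks and operating on 2D slices with a shared ResNet-34 backbone; the backbone, single-channel input, and two-of-three voting rule are illustrative assumptions, not the paper's configuration.

```python
import torch
import segmentation_models_pytorch as smp   # assumed third-party package

def build_ensemble(encoder="resnet34"):
    """The three encoder-decoder networks named in the abstract; the shared
    ResNet-34 backbone and single-channel CT input are illustrative choices."""
    kwargs = dict(encoder_name=encoder, encoder_weights=None, in_channels=1, classes=1)
    return [smp.Unet(**kwargs), smp.FPN(**kwargs), smp.Linknet(**kwargs)]

@torch.no_grad()
def vote_segment(models, ct_slice, threshold=0.5):
    """Pixel-wise majority vote over the binary masks predicted by each model."""
    masks = []
    for m in models:
        m.eval()
        prob = torch.sigmoid(m(ct_slice))        # (B, 1, H, W) probabilities
        masks.append((prob > threshold).float())
    votes = torch.stack(masks).sum(dim=0)        # positive votes per pixel
    return (votes >= 2).float()                  # keep pixels where at least 2 of 3 agree

if __name__ == "__main__":
    ensemble = build_ensemble()
    ct = torch.randn(1, 1, 256, 256)             # a single CT slice (dummy data)
    print(vote_segment(ensemble, ct).shape)      # torch.Size([1, 1, 256, 256])
```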
Medical image segmentation has witnessed rapid advancements with the emergence of encoder-decoder based networks. In the encoder-decoder structure, the primary goal of the decoding phase is not only to restore feature map resolution, but also to mitigate the loss of feature information incurred during the encoding phase. However, this approach gives rise to a challenge: multiple up-sampling operations in the decoder segment result in the loss of feature information. To address this challenge, we propose a novel network that removes the decoding structure to reduce feature information loss (CBL-Net). In particular, we introduce a Parallel Pooling Module (PPM) to counteract the feature information loss stemming from convolution and pooling operations during the encoding phase. Additionally, we incorporate a Multiplexed Dilation Convolution (MDC) module to expand the network's receptive field. Finally, although we have removed the decoding stage, we still need to recover the feature map resolution. Therefore, we introduce the Global Feature Recovery (GFR) module, which uses an attention mechanism to recover the image feature map resolution and can effectively reduce the loss of feature information. We conduct extensive experimental evaluations on three publicly available medical image segmentation datasets: DRIVE, CHASEDB and MoNuSeg. The results show that our proposed network outperforms state-of-the-art methods in medical image segmentation. In addition, it achieves higher efficiency than current encoder-decoder networks by eliminating the decoding component.
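The abstract names a Multiplexed Dilation Convolution (MDC) module for enlarging the receptive field without giving its structure; the sketch below shows one common way to do this with parallel dilated 3x3 convolutions fused by a 1x1 convolution. The dilation rates and fusion are assumptions and may differ from the paper's MDC.

```python
import torch
import torch.nn as nn

class MultiDilationConv(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation rates, fused by a
    1x1 convolution - one common way to widen the receptive field while
    keeping the spatial resolution unchanged."""
    def __init__(self, in_c, out_c, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_c, out_c, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(out_c * len(rates), out_c, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)   # stack the dilated branches
        return self.act(self.fuse(y))                         # fuse back to out_c channels

if __name__ == "__main__":
    x = torch.randn(2, 64, 48, 48)
    print(MultiDilationConv(64, 64)(x).shape)   # torch.Size([2, 64, 48, 48])
```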
Flash floods pose significant threats as immediate and highly destructive natural hazards. Extending the forecast horizon of flash flood prediction models has been a key objective to enable timely warnings or other mitigating measures. The integration of precipitation predictions into data-driven flash flood models remains unexplored. In this study, we propose an encoder-decoder LSTM-based model architecture for short-term flash flood prediction, which incorporates short-term rainfall forecasts and evaluates the influence of the associated uncertainty on these predictions. We conducted three sets of experiments to predict flash flood occurrences within a watershed with a 30-minute response time. The first set employed a baseline LSTM model without rainfall forecast integration. The second used the proposed encoder-decoder LSTM model with accurate rainfall forecasts. Lastly, the third set introduced errors into the rainfall forecasts to evaluate the impact of forecast uncertainty on flood prediction. Computational experiments demonstrate that incorporating accurate rainfall nowcasts significantly enhances flash flood predictability, with F1-score improvements ranging from 10 to 60%, depending on the hydrological year. Furthermore, even when errors in rainfall magnitude and timing were introduced, the proposed framework still outperformed models that did not use rainfall forecasts overall, delivering reliable predictions for up to two hours.
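A minimal sketch of an encoder-decoder LSTM of the kind described, where the encoder summarizes past observations and the decoder consumes the rainfall forecast step by step; the feature set, horizon lengths, and hidden size are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FloodSeq2Seq(nn.Module):
    """Encoder LSTM summarizes past observations; decoder LSTM consumes the
    short-term rainfall forecast step by step to predict future flow."""
    def __init__(self, obs_dim=4, forecast_dim=1, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(forecast_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)            # predicted discharge per future step

    def forward(self, past_obs, rain_forecast):
        # past_obs: (B, T_past, obs_dim); rain_forecast: (B, T_future, forecast_dim)
        _, state = self.encoder(past_obs)           # hidden state carries the watershed context
        out, _ = self.decoder(rain_forecast, state) # decode conditioned on forecast rainfall
        return self.head(out).squeeze(-1)           # (B, T_future)

if __name__ == "__main__":
    model = FloodSeq2Seq()
    past = torch.randn(8, 48, 4)     # e.g. 48 past 5-minute steps of gauge/rain features
    rain = torch.randn(8, 24, 1)     # e.g. 24 forecast rainfall steps (two hours)
    print(model(past, rain).shape)   # torch.Size([8, 24])
```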
Due to the real-time acquisition and reasonable cost of consumer cameras, monocular depth maps have been employed in a variety of visual applications. Despite ongoing research in depth estimation, such maps continue to suffer from low accuracy and substantial sensor noise. To improve the prediction of depth maps, this paper proposes a lightweight neural facial depth estimation model based on single image frames. Following a basic encoder-decoder network design, features are extracted by initializing the encoder with a high-performance pre-trained network, and high-quality facial depth maps are reconstructed with a simple decoder. Through a feature fusion module, the model can exploit pixel representations and recover full details of facial features and boundaries. When tested and evaluated across four public facial depth datasets, the proposed network provides more reliable, state-of-the-art results with significantly lower computational complexity and a reduced number of parameters. The training procedure primarily relies on synthetic human facial images, which provide a consistent ground truth depth map, and the use of an appropriate loss function leads to higher performance. Numerous experiments have been performed to validate and demonstrate the usefulness of the proposed approach. Finally, the model performs better than existing comparative facial depth networks in terms of generalization ability and robustness across different test datasets, setting a new baseline for facial depth estimation.
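A rough sketch of the described design, a pretrained-style backbone encoder, a simple decoder, and a fusion of an early encoder feature with the decoded feature, using a ResNet-18 backbone as a stand-in; the backbone choice, fusion point, and channel widths are assumptions rather than the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class FacialDepthNet(nn.Module):
    """Backbone encoder, small upsampling decoder, and a simple fusion of an
    early encoder feature with the decoded feature to recover facial detail."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)               # swap in pretrained weights in practice
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4
        self.reduce = nn.Conv2d(512, 64, 1)
        self.fuse = nn.Conv2d(64 + 64, 64, 3, padding=1)   # decoder + early skip feature
        self.head = nn.Conv2d(64, 1, 3, padding=1)         # one-channel depth map

    def forward(self, x):
        low = self.layer1(self.stem(x))                    # early feature, stride 4
        deep = self.layer4(self.layer3(self.layer2(low)))  # deepest feature, stride 32
        up = F.interpolate(self.reduce(deep), size=low.shape[2:],
                           mode="bilinear", align_corners=False)
        depth = self.head(F.relu(self.fuse(torch.cat([up, low], dim=1))))
        return F.interpolate(depth, size=x.shape[2:], mode="bilinear", align_corners=False)

if __name__ == "__main__":
    print(FacialDepthNet()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1, 224, 224])
```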
Achieving high-precision automatic classification in real-world applications for airborne laser scanning point clouds is a challenging task due to their unstructured nature, uneven density distribution, high redundancy, incompleteness and scene complexity. Graph Convolutional Neural Networks can process scattered point clouds directly without regularization, which avoids the loss of depth information and has recently become a topic of increased interest. Therefore, this study proposes an extension of the Graph-Unet network architecture named DGCN-ED for airborne LiDAR point classification, which uses a Graph Convolutional Neural Network as a representation to describe complex object relationships and an encoder-decoder architecture to capture multi-scale point features and describe objects in a high-level feature space. A two-layer dynamic-update Graph Convolutional Neural Network is designed to expand the effective range of nodes and enhance the representation ability of the learned pointwise features. The effectiveness of the proposed method is evaluated in an experiment on the ISPRS Vaihingen 3D semantic labelling benchmark dataset. Moreover, experiments on the IEEE 2019 Data Fusion Contest dataset were conducted to demonstrate the generalization ability of the proposed method. The results show that our method achieved on average 2.8% higher overall accuracy than existing methods, with an overall accuracy of 98.0% and an average F1 score of 0.797.
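To make the dynamic-graph idea concrete, the following is a simplified EdgeConv-style layer that rebuilds a k-NN graph in the current feature space before aggregation; it is a generic sketch under assumed shapes (one point cloud, per-point features), not the DGCN-ED implementation.

```python
import torch
import torch.nn as nn

class DynamicEdgeConv(nn.Module):
    """EdgeConv-style layer: rebuild a k-NN graph in the current feature
    space, then aggregate edge features with max pooling."""
    def __init__(self, in_c, out_c, k=16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_c, out_c), nn.ReLU(inplace=True))

    def forward(self, feats):                      # feats: (N, in_c) for one point cloud
        dists = torch.cdist(feats, feats)          # pairwise distances in feature space
        idx = dists.topk(self.k + 1, largest=False).indices[:, 1:]   # k neighbours, self excluded
        neigh = feats[idx]                         # (N, k, in_c)
        center = feats.unsqueeze(1).expand_as(neigh)
        edge = torch.cat([center, neigh - center], dim=-1)            # (N, k, 2*in_c)
        return self.mlp(edge).max(dim=1).values                       # (N, out_c)

if __name__ == "__main__":
    pts = torch.randn(1024, 3)                     # x, y, z of one ALS tile (dummy data)
    layer1 = DynamicEdgeConv(3, 64)
    layer2 = DynamicEdgeConv(64, 64)               # graph is re-built on the learned features
    print(layer2(layer1(pts)).shape)               # torch.Size([1024, 64])
```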
Although many learning-based studies have been conducted to detect cracks, there are still many problems in practice, such as slow inference speed due to the large number of hyperparameters required by network architectures and compromised detection accuracy in different environments. To address these issues, this study employed a Hybrid Lightweight Encoder-Decoder Network (HLEDNet) as an ad-hoc crack segmentation and measurement system on real-world images captured from various concrete bridges. The proposed HLEDNet model was trained and tested with 3000 annotated images with extensive data augmentation, achieving 86.92%, 85.71%, 86.31%, and 86.01% in precision, recall, F1 score, and mean intersection over union (mIoU), respectively. A crack measurement module was proposed using combined postprocessing techniques, where the R-squared values of the regression lines for crack length and average crack width are 0.9857 and 0.9925, respectively. Finally, an experimental study was undertaken to convert the crack measuring unit from pixels to millimetres.
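A sketch of the kind of postprocessing a crack measurement module might use: skeletonize the predicted mask to estimate length, read local width from the distance transform along the skeleton, and convert pixels to millimetres with an assumed scale. The functions shown are standard scikit-image/SciPy calls, but the pipeline itself is an assumption, not necessarily the paper's module.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def measure_crack(mask, mm_per_pixel=0.5):
    """Estimate crack length and average width from a binary segmentation mask.
    Skeleton pixel count approximates length; twice the distance-transform value
    at skeleton pixels approximates local width. mm_per_pixel is an assumed scale."""
    mask = mask.astype(bool)
    skeleton = skeletonize(mask)                   # 1-pixel-wide centreline
    dist = distance_transform_edt(mask)            # distance to nearest background pixel
    length_px = int(skeleton.sum())                # crude length: number of skeleton pixels
    widths_px = 2.0 * dist[skeleton]               # local width along the centreline
    avg_width_px = float(widths_px.mean()) if length_px else 0.0
    return length_px * mm_per_pixel, avg_width_px * mm_per_pixel

if __name__ == "__main__":
    demo = np.zeros((64, 64), dtype=np.uint8)
    demo[30:33, 5:60] = 1                          # a synthetic 3-pixel-wide crack
    print(measure_crack(demo))                     # length and mean width in mm under the assumed scale
```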
Pedestrian trajectories are crucial for self-driving cars to plan their paths effectively. Despite being state-of-the-art, the sensors in these self-driving vehicles often perceive the surrounding environment inaccurately due to technical challenges in adverse weather conditions, interference from other vehicles' sensors and electronic devices, and signal reception failure, leading to incomplete trajectory data. Trajectory imputation is therefore no less crucial for real-time decision making in autonomous driving. Previous attempts to address this issue, such as statistical inference and machine learning approaches, have shown promise, yet the landscape of deep learning is rapidly evolving, with new and more robust models emerging. In this research, we propose an encoder-decoder architecture, the Human Trajectory Imputation Model (HTIM), to tackle these challenges. This architecture aims to fill in the missing parts of pedestrian trajectories. The model is evaluated using the Intersection Drone (inD) dataset, which contains trajectory data recorded at suitable altitudes, preserving naturalistic pedestrian behavior, with varied dataset sizes. To assess the effectiveness of our model, we utilize L1, MSE, quantile, and ADE losses. Our experiments demonstrate that HTIM outperforms the majority of the state-of-the-art methods in this field, indicating its superior performance in imputing pedestrian trajectories.
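The abstract lists L1, MSE, quantile, and ADE losses; the following shows one plausible ADE (average displacement error) term evaluated only on the imputed timesteps. Tensor shapes and the masking convention are assumptions.

```python
import torch

def ade_loss(pred, target, missing_mask):
    """Average displacement error over the imputed timesteps only.
    pred/target: (B, T, 2) x-y positions; missing_mask: (B, T) with 1 where the
    original trajectory point was missing (assumed convention)."""
    disp = torch.linalg.norm(pred - target, dim=-1)        # (B, T) Euclidean displacement
    masked = disp * missing_mask                           # keep only imputed positions
    return masked.sum() / missing_mask.sum().clamp(min=1)  # mean over missing points

if __name__ == "__main__":
    pred = torch.randn(4, 20, 2)
    target = torch.randn(4, 20, 2)
    mask = (torch.rand(4, 20) < 0.3).float()               # ~30% of points treated as missing
    print(ade_loss(pred, target, mask))
```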