The robustness of object detection models is a major concern when they are applied to real-world scenarios. The performance of most models tends to degrade when confronted with images affected by corruptions, since they are usually trained and evaluated on clean datasets. While numerous studies have explored the robustness of object detection models on natural images, there is a paucity of research on models applied to aerial images, which feature complex backgrounds, substantial variations in scale, and diverse object orientations. This article addresses the challenge of assessing the robustness of object detection models on aerial images, with a specific emphasis on scenarios where images are affected by clouds. In this study, we introduce two novel benchmarks based on DOTA-v1.0. The first benchmark encompasses 19 prevalent corruptions, while the second focuses on the cloud-corrupted condition, a phenomenon uncommon in natural images yet frequent in aerial photography. We systematically evaluate the robustness of mainstream object detection models and perform the necessary ablation experiments. Through our investigations, we find that rotation-invariant modeling and enhanced backbone architectures can improve the robustness of models. Furthermore, increasing the capacity of Transformer-based backbones can strengthen their robustness. The benchmarks we propose and our comprehensive experimental analyses can facilitate research on robust object detection on aerial images.
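A corruption benchmark of this kind pairs clean images with synthetically corrupted variants at graded severities. As a hedged illustration only (the paper's 19 corruption types and severity scales are not specified here), one common corruption, additive Gaussian noise, can be sketched in pure Python:

```python
import random

def gaussian_noise(image, severity=1, seed=0):
    """Apply additive Gaussian noise, a common benchmark corruption.

    `image` is a 2D list of grayscale values in [0, 255]. The severity
    scale (1-5) and sigma values are illustrative assumptions, loosely
    mirroring how corruption benchmarks grade intensity.
    """
    sigma = [4, 8, 12, 18, 26][severity - 1]
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(px + rng.gauss(0, sigma)))) for px in row]
        for row in image
    ]

# A flat gray test image; every severity keeps pixels in valid range.
clean = [[128] * 8 for _ in range(8)]
corrupted = gaussian_noise(clean, severity=3)
```

Evaluating a detector on such corrupted copies, relative to its clean-image score, is what quantifies the robustness gap the abstract discusses.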
Reducing defects in components additively manufactured using the Laser-Directed Energy Deposition (L-DED) process is important for ensuring structural integrity, surface quality, and functional performance. The first step required to reduce defects in L-DED-manufactured components is the identification and understanding of the types of defects using an object detection approach. This paper aims to use YOLO-based object detection models to classify and detect defects in the horizontal wall, vertical wall, and cuboid structures manufactured using various combinations of L-DED process parameters. The objectives involve training, testing, and validating YOLOv7, YOLOv8, YOLOv9, and YOLOv9-GELAN models on an independent dataset of defects such as flash formation, voids, and rough texture; identifying the best YOLO model capable of detecting multiple defects of both small and large sizes within a single image; and comparing the defects captured by the YOLO model with those from a previously used conventional CNN model, VGG16. The results revealed that YOLOv9-GELAN exhibited better performance indicators than the other YOLO models. The increasing trend for mAP0.5:0.95 marks YOLOv9-GELAN as a good choice for detecting multiple defects in a single image. It also achieved a mAP of 95.7%, a precision of 94%, a recall of 96%, and an F1-score of 90%, indicating accurate defect localisation and classification with minimal false positives and negatives. These high values indicate YOLOv9-GELAN's capability to accurately highlight defects with bounding boxes compared to the previously proposed VGG16 model. In addition, YOLOv9-GELAN's ability to process 62 images per second shows its potential for higher frame throughput than the other YOLO models. This research will advance the development of AI-based in-situ defect monitoring for the L-DED process.
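The mAP0.5:0.95 metric reported above scores detections across a sweep of Intersection-over-Union (IoU) thresholds. As a minimal sketch of the IoU computation at its core (box format and names are illustrative, not taken from the paper):

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2).

    A predicted defect box counts as a true positive only when its IoU
    with a ground-truth box exceeds the evaluation threshold.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

For example, two unit-offset 2x2 boxes overlap in a 1x1 region, giving an IoU of 1/7; a perfect match gives 1.0.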
ISBN: (Print) 9789819608041; 9789819608058
Modern applications, such as autonomous vehicles, require deploying deep learning algorithms on resource-constrained edge devices for real-time image and video processing. However, there is limited understanding of the efficiency and performance of various object detection models on these devices. In this paper, we evaluate the performance of several state-of-the-art object detection models, including YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet), on popular edge devices such as the Raspberry Pi 3, 4, and 5 (with and without TPU accelerators), as well as the Jetson Orin Nano. We collect key performance metrics, including energy consumption, inference time, and Mean Average Precision (mAP). Our findings highlight that lower-mAP models such as SSD MobileNet V1 are more energy-efficient and faster at inference, whereas higher-mAP models like YOLOv8 Medium generally consume more energy and infer more slowly, though with exceptions when accelerators like TPUs are used. Among the edge devices, the Jetson Orin Nano stands out as the fastest and most energy-efficient option for request handling, despite having the highest idle energy consumption.
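The edge-device comparison above reduces each run to a few derived quantities. A minimal sketch of that bookkeeping, assuming the raw measurements are a frame count, wall-clock time, and metered energy (the paper's exact measurement protocol is not given here):

```python
def efficiency_metrics(num_images, total_seconds, total_joules):
    """Derive per-image throughput, latency, and energy cost
    from one benchmarking run on an edge device."""
    return {
        "images_per_second": num_images / total_seconds,
        "latency_ms": 1000.0 * total_seconds / num_images,
        "joules_per_image": total_joules / num_images,
    }

# Hypothetical run: 500 frames in 25 s drawing 150 J total.
m = efficiency_metrics(num_images=500, total_seconds=25.0, total_joules=150.0)
```

Comparing `joules_per_image` rather than average power is what makes a fast accelerator look efficient even when its instantaneous draw is higher.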
The objective of this study was to develop an interpretable system that could detect specific lung features in neonates. A challenging aspect of this work was that normal lungs showed the same visual features as those of Pneumothorax (PTX). M-mode is typically necessary to differentiate between the two cases, but its generation in clinics is time-consuming and requires expertise for interpretation, which remains limited. Therefore, our system automates M-mode generation by extracting Regions of Interest (ROIs) without a human in the loop. Object detection models, namely the Faster Region-Based Convolutional Neural Network (fRCNN) and RetinaNet, were employed to detect seven common Lung Ultrasound (LUS) features. fRCNN predictions were then stored and further used to generate M-modes. Beyond static feature extraction, we used a Hough-transform-based statistical method to detect "lung sliding" in these M-modes. Results showed that fRCNN achieved a greater mean Average Precision (mAP) of 86.57% (Intersection-over-Union (IoU) = 0.2) than RetinaNet, which displayed a mAP of only 61.15%. The calculated accuracy for the generated ROIs was 97.59% for Normal videos and 96.37% for PTX videos. Using this system, we successfully classified 5 PTX and 6 Normal video cases with 100% accuracy. Automating the detection of these seven prominent LUS features addresses the time-consuming manual evaluation of lung ultrasound in a fast-paced environment. Clinical impact: Our research provides a more accurate and efficient method for diagnosing lung diseases in neonates.
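An M-mode trace samples one scan line of the B-mode clip in every frame and stacks the samples over time. A minimal sketch of that reshaping step, assuming the ROI has already been reduced to a single pixel column (the paper's actual ROI-to-M-mode pipeline is not detailed in the abstract):

```python
def extract_m_mode(frames, column):
    """Build an M-mode image from a B-mode clip.

    `frames` is a list of 2D grayscale frames (depth rows x width cols).
    The chosen pixel column is sampled in every frame, so in the output
    rows correspond to depth and columns correspond to time.
    """
    depth = len(frames[0])
    return [[frame[d][column] for frame in frames] for d in range(depth)]

# Synthetic 4-frame clip of 3x3 frames: pixel value encodes 100*t + depth.
frames = [[[100 * t + d for _ in range(3)] for d in range(3)] for t in range(4)]
m_mode = extract_m_mode(frames, column=1)
```

On a real clip, the "seashore" versus "barcode" texture of such a stack is what the Hough-transform step then inspects for lung sliding.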
ISBN: (Print) 9798350384901; 9798350384895
In numerous nations, traffic monitoring systems play an imperative role in overseeing and controlling vehicular and pedestrian traffic. In recent years, various techniques have been presented for automated detection to optimize traffic, and the methods in the literature have their own pros and cons. This paper proposes two different traffic detection models using YOLOv5 and YOLOv8. In addition, it proposes an efficient data pre-processing algorithm to achieve better accuracy in detecting various classes of vehicles in traffic, including pedestrians. An efficient loss optimization strategy is proposed and adopted while training the models to reduce the training loss. This paper discusses the choice between the two deep learning models, YOLOv5 and YOLOv8, for identifying different types of objects of interest on roads in urban areas. The efficiency of the proposed models is evaluated using multiple performance metrics, including accuracy. A comparative analysis with existing models indicates that the proposed models are on par with existing strategies in terms of accuracy while also handling the additional complexity of pedestrian detection.
ISBN: (Print) 9798331505264; 9798331505271
Modern strides in autonomous vehicles and embedded Advanced Driver Assistance Systems (ADAS) have created the need for an efficient and accurate system for road lane and vehicle detection. This study introduces a new proposal that combines YOLOv8 (You Only Look Once) for road lane detection with YOLO for vehicle detection, creating a complete package for road understanding and navigation. For accurate road lane detection, YOLOv8, the fastest and most accurate model to date, is used to track lanes in real time with consistent accuracy, even when the camera operates under insufficient lighting and occlusions. At the same time, YOLO is used for vehicle detection, which improves the recognition and interpretation of the presence and motion of vehicles in real time. The system uses Streamlit, an open-source app framework for Machine Learning and Data Science projects, to provide an intuitive user experience. Finally, we build an interface that presents lane and vehicle detection results in real time, allowing easy monitoring and evaluation of the overall system. The combination of YOLOv8 and YOLO with Streamlit provides a powerful, scalable, and deployable solution for practical computer vision applications. By combining well-established object detection algorithms with a simplified deployment platform, the proposed system tackles some of the hardest and most time-consuming areas of autonomous driving. The goal is to contribute significantly to both the safety and efficiency of autonomous vehicles and pave the way for further advances in ADAS.
ISBN: (Print) 9798350361711; 9798350361704
Food waste presents a significant issue, contributing extensively to greenhouse gas emissions and climate change. When food waste decomposes in landfills, it generates methane, a greenhouse gas significantly more damaging to the atmosphere than carbon dioxide. Additionally, the production, transportation, and disposal of food waste significantly contribute to global greenhouse gas emissions. This study explores an innovative approach to mitigating the environmental impact of food waste by using food scraps to create compost for animal feed, specifically utilizing the Black Soldier Fly Larva (BSFL). Accurate control of the food quantity for the various larval stages is essential, necessitating precise stage classification. This process is complex due to the larvae's similar appearance and small color variations. In this paper, we introduce a mobile application designed to classify and detect the growth stages of BSFL, ranging from stages 1 to 4, which are high in protein and beneficial for animal feed, to stages 5 and 6, which are ideal for preparing pupae that can be used in skincare products. Our approach employs the YOLOv8 model for larval stage classification and detection, achieving an impressive mAP50-95 of 0.812, surpassing the performance of YOLOv7 (mAP50-95 of 0.781) and YOLOv5 (mAP50-95 of 0.789).
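The mAP50-95 figures compared above average the AP measured at each IoU threshold from 0.50 to 0.95 in steps of 0.05 (the COCO-style convention; the paper's exact evaluation code is not shown here). A minimal sketch of that averaging:

```python
def map_50_95(ap_by_threshold):
    """Average AP over the ten IoU thresholds 0.50:0.05:0.95.

    `ap_by_threshold` maps each IoU threshold (rounded to 2 decimals)
    to the AP measured there; all ten thresholds must be present.
    """
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_by_threshold[t] for t in thresholds) / len(thresholds)

# Hypothetical per-threshold APs, declining as the IoU bar tightens.
aps = {round(0.50 + 0.05 * i, 2): 0.9 - 0.04 * i for i in range(10)}
score = map_50_95(aps)
```

Because the strictest thresholds drag the average down, a model with sloppy box boundaries scores far lower on mAP50-95 than on plain mAP50, which is why the metric is a good discriminator for visually similar larval stages.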
ISBN: (Print) 9781728111797
Over the last few decades, Lung Ultrasound (LUS) has been increasingly used to diagnose and monitor different lung diseases in neonates. It is a noninvasive tool that allows a fast bedside examination while minimally handling the neonate. Acquiring a LUS scan is easy, but understanding the artifacts associated with each respiratory disease is challenging. Mixed artifact patterns found in different respiratory diseases may limit LUS readability by the operator. While machine learning (ML), especially deep learning, can assist in automated analysis, simply feeding the ultrasound images to an ML model for diagnosis is not enough to earn the trust of medical professionals. The algorithm should instead output LUS features that are familiar to the operator. Therefore, in this paper we present a unique approach for extracting seven meaningful LUS features that can be easily associated with a specific pathological lung condition: normal pleura, irregular pleura, thick pleura, A-lines, coalescent B-lines, separate B-lines, and consolidations. These artifacts can lead to early prediction of infants developing later respiratory distress symptoms. A single multi-class region-proposal-based object detection model, Faster-RCNN (fRCNN), was trained on lower posterior lung ultrasound videos to detect these LUS features, which are further linked to four common neonatal diseases. Our results show that fRCNN surpasses single-stage models such as RetinaNet and can successfully detect the aforementioned LUS features with a mean average precision of 86.4%. Instead of a fully automatic diagnosis from images without any interpretability, detection of such LUS features leaves the ultimate control of diagnosis to the clinician, which can result in a more trustworthy intelligent system.
In the rapidly evolving construction industry, timely and accurate monitoring of construction activities is paramount. This paper introduces a novel approach to quantifying construction activity using high-resolution ...
Purpose: Object detection models have gained considerable popularity as they aid many applications, like monitoring, video surveillance, etc. Object detection through video tracking faces many challenges, as most videos obtained as real-time streams are affected by environmental factors.
Design/methodology/approach: This research develops a system for crowd tracking and crowd behaviour recognition using a hybrid tracking model. The input to the proposed crowd tracking system is high-density crowd videos containing hundreds of people. The first step is to detect humans through visual recognition algorithms. Here, a priori knowledge of the location point is given as input to the visual recognition algorithm, which identifies humans through the constraints defined within a Minimum Bounding Rectangle (MBR). Then, the spatial tracking model tracks the path of each human object's movement in the video frame, with tracking carried out by extraction of colour histogram and texture features. A temporal tracking model based on a NARX neural network is also applied, which is effectively utilized to detect the locations of moving objects. Once the path of a person is tracked, the behaviour of every human object is identified using the Optimal Support Vector Machine (OSVM), newly developed by combining SVM with an optimization algorithm, namely MBSO. The proposed MBSO algorithm is developed through the integration of the existing techniques BSA and MBO.
Findings: The data for object tracking are taken from the Tracking in High Crowd Density dataset. The proposed OSVM classifier attained improved performance, with a value of 0.95 for accuracy.
Originality/value: This paper presents a hybrid high-density video tracking model and a behaviour recognition model. The proposed hybrid tracking model tracks the path of the object in the video through temporal tracking and spatial tracking. The features train th
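The spatial tracker above matches appearance features, including a colour histogram, between frames. A minimal sketch of such a histogram feature for a grayscale patch (the paper's actual feature extraction, which also includes texture, is not specified in the abstract):

```python
def intensity_histogram(patch, bins=8):
    """Normalized intensity histogram of an image patch, the kind of
    appearance descriptor a spatial tracker compares across frames.

    `patch` is a 2D list of grayscale values in [0, 255]; the output
    sums to 1, so patches of different sizes remain comparable.
    """
    counts = [0] * bins
    total = 0
    for row in patch:
        for px in row:
            counts[min(px * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]

# Four pixels spread across the intensity range fall into four bins.
hist = intensity_histogram([[0, 255], [128, 64]])
```

Two patches can then be compared with any histogram distance (e.g. intersection or chi-squared) to decide whether they show the same person.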