Multi-class object detection in infrared images is important in both military and civilian applications. Deep learning methods can achieve high accuracy but require a large-scale dataset. We propose a generative data augmentation framework, DOCI-GAN, for infrared multi-class object detection with limited data. The contributions of this paper are fourfold. First, DOCI-GAN is designed as a conditional image inpainting framework that yields paired infrared multi-class object images and annotations. Second, a text-to-image converter is formulated to transform text-format object annotations into bounding box mask images, which turns the augmentation into a mask-image-to-raw-image translation. Third, a multiscale morphological erosion-based loss is created to alleviate the intensity inconsistency between the inpainted local backgrounds and the global background. Finally, to generate diverse images, artificial multi-class object annotations are integrated with real ones during augmentation. Experimental results demonstrate that DOCI-GAN augments the dataset with high-quality infrared multi-class object images, consequently improving the accuracy of object detection baselines.
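The text-to-image conversion step can be pictured with a minimal sketch. The annotation format assumed here (one `class_id x_min y_min x_max y_max` line per object, in pixel coordinates) and the function name `annotations_to_mask` are illustrative assumptions, not the format or code used by DOCI-GAN. The mask stores the class id plus one so that 0 remains background.

```python
def annotations_to_mask(text, width, height):
    """Rasterize text-format bounding boxes into a 2D mask (list of rows).

    Assumed (hypothetical) line format: "class_id x_min y_min x_max y_max",
    integer pixel coordinates with exclusive max edges. Background pixels
    stay 0; object pixels hold class_id + 1.
    """
    mask = [[0] * width for _ in range(height)]
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip malformed annotation lines
        class_id, x0, y0, x1, y1 = (int(p) for p in parts)
        # Clip the box to the image bounds before filling it.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = class_id + 1
    return mask


# Two hypothetical objects on a 6 x 4 image.
example_mask = annotations_to_mask("0 1 1 3 3\n2 4 0 6 2", width=6, height=4)
```

In this sketch later boxes overwrite earlier ones where they overlap; a real converter would need an explicit policy for overlapping objects.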
To address the challenge of labor shortages and to reduce the cost of apple harvesting, a targeted shake-and-catch technique is being developed at Washington State University for fresh-market apple harvesting. This technique is showing promising results for some varieties of apples trained to a formal, fruiting-wall tree architecture. However, operators are still required to manually engage the shaker on target branches. To further improve the shake-and-catch apple harvesting system, a multi-class object detection algorithm was developed in this study for automatically detecting apples, branches and trunks in the natural environment using a Faster R-CNN (Region-based Convolutional Neural Network) model. This study applied transfer learning and fine-tuning to the pre-trained networks (AlexNet, VGG16 and VGG19) and activated the features of different layers to detect these objects. The Precision-Recall (PR) curve, F1-score and mean Average Precision (mAP) were used to evaluate the performance of Faster R-CNN in detecting the different object classes. VGG19 achieved the highest mAP of 82.4%, which was 10.8% higher than AlexNet and 0.4% higher than VGG16. The computational time consumed by the entire algorithm was also assessed in this study; Faster R-CNN completed the detection of one image, on average, in 0.45 s. Based on the multi-class object detection results, a polynomial fitting method was used to predict the skeleton equation of branches and trunks. The average Goodness of Fit (R²), Root Mean Squared Error (RMSE) and correlation coefficient (r) between the predicted and reference skeletons were calculated to represent the accuracy of skeleton fitting. VGG16 and VGG19 both achieved higher accuracy than AlexNet for the skeleton fitting of branches and trunks. An algorithm was then developed to estimate shaking locations on the branches using the results of the previous steps. Compared with the human experts' input, a total of 72.7%
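The three skeleton-fitting measures named above can be sketched as follows. This is a minimal illustration using the common textbook definitions (R² as one minus the ratio of residual to total sum of squares, Pearson's r); the abstract does not state the exact formulas used in the study, and the function names and sample values are hypothetical.

```python
import math


def eval_poly(coeffs, x):
    """Evaluate a polynomial (highest-degree coefficient first) with Horner's rule."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


def fit_metrics(reference, predicted):
    """Return (R^2, RMSE, Pearson r) between reference and predicted values."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    cov = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(reference, predicted))
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r_squared = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    pearson_r = cov / math.sqrt(ss_tot * var_pred)
    return r_squared, rmse, pearson_r


# Hypothetical skeleton: y = 0.5 x^2 - x + 2, sampled at a few image columns.
xs = [0.0, 1.0, 2.0, 3.0]
predicted_ys = [eval_poly([0.5, -1.0, 2.0], x) for x in xs]
reference_ys = [2.1, 1.4, 2.0, 3.6]  # made-up reference skeleton points
r_squared, rmse, pearson_r = fit_metrics(reference_ys, predicted_ys)
```

The sketch assumes the reference values are not all identical and the predictions are not constant; otherwise the denominators are zero and the measures are undefined.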
Deep convolutional neural networks (CNNs) have shown great success in single-class fabric image detection. However, real-world fabric defect images generally contain several types of defects in one image. Accurately recognizing and classifying multi-class fabric defect images is still an unsolved problem because of the complexity of intersecting defects and the difficulty of distinguishing small defects. To address these challenges, this study develops a methodology based on the feature pyramid network (FPN) deep learning approach to detect multi-class fabric defects. To evaluate the proposed detection model, we built a unique multi-class fabric defect database (DHU-MO1000), in which the multi-class defect images are generated by industrial monitors in a textile factory. We used the dataset as the benchmark for training and testing the FPN on multi-class defect detection. Furthermore, we conducted extensive experimental validation of various design choices. The experimental results show that the model outperformed existing multi-class object detection methods.
ISBN: 9781479928934 (print)
Real-time multi-class object detection has become popular in applications such as vehicle vision systems, computer vision and image processing. Boosted cascades achieve fast and reliable object detection for one object class, but require multiple cascades running in parallel for multi-class detection. The multi-class capable cascade splits the root cascade into sub-cascades iteratively until each sub-cascade contains one class. That requires a huge number of classifiers in the generated hierarchy of interlinked cascades. In this paper, we propose a boosted multi-class object cascade that splits only one object class from the upper-level cascade when building the sub-cascades. Because only one object class is split off at a time, the number of classifiers in each stage is reduced. In simulation, the boosted multi-class object detection reduces the number of weak classifiers by 46% compared to the multi-class capable cascade on the MIT CBCL database. The proposed method achieves a high detection rate (95.54%) and a low false positive rate (1.94%). We implement the proposed algorithm with a parallel architecture to accelerate detection, using TSMC 90 nm CMOS technology. The implementation results show that the design runs at an operating frequency of 100 MHz and processes 160 × 120 images at 30 fps.
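As a rough reading of the reported hardware figures, the clock budget per pixel can be worked out directly. The sketch below assumes every clock cycle is available for detection and ignores I/O and any idle time between frames; the function name is illustrative and the abstract does not describe the actual pipeline.

```python
def cycles_per_pixel(clock_hz, width, height, fps):
    """Average number of clock cycles available per input pixel."""
    pixels_per_second = width * height * fps
    return clock_hz / pixels_per_second


# Figures from the abstract: 100 MHz clock, 160 x 120 images, 30 fps.
budget = cycles_per_pixel(100e6, 160, 120, 30)
print(f"{budget:.1f} cycles per pixel")
```

Under these assumptions the design has on the order of 170 cycles per pixel, which has to cover every cascade stage evaluated at every scanned window position.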
ISBN: 9781665487689 (print)
One significant aspect of surgical education and training is autonomous surgical skill assessment with feedback. In this paper, an autonomous two-level fuzzy logic assessment system is proposed for tracking and evaluating the tooltip movements of laparoscopic instruments in the FLS peg transfer task. The surgeon's left- and right-hand movements are detected by an artificial intelligence network through instrument tooltip detection and position coordinate calculations. A first-of-its-kind custom laparoscopic box trainer dataset was built from video recordings of experimental peg transfer tasks, which were carried out by 9 doctors and OB/GYN residents of the Homer Stryker M.D. School of Medicine, WMU, in the Intelligent Fuzzy Controllers Laboratory, WMU. A multi-class object detection algorithm based on deep neural networks was developed.
The purpose of the Fundamentals of Laparoscopic Surgery (FLS) training is to develop laparoscopic surgery skills through simulation experiences. Several advanced simulation-based training methods have been created to enable training in a non-patient environment. Laparoscopic box trainers, which are cheap, portable devices, have been deployed for some time to offer training opportunities, competence evaluations, and performance reviews. However, the trainees must be under the supervision of medical experts who can evaluate their abilities, which is an expensive and time-consuming operation. Thus, a high level of surgical skill, determined by assessment, is necessary to prevent any intraoperative issues and malfunctions during a real laparoscopic procedure and during human intervention. To guarantee that the use of laparoscopic surgical training methods results in surgical skill improvement, it is necessary to measure and assess surgeons' skills during tests. We used our intelligent box-trainer system (IBTS) as a platform for skill training. The main aim of this study was to monitor the surgeon's hand movements within a predefined field of interest. To evaluate the surgeons' hand movements in 3D space, an autonomous evaluation system using two cameras and multi-threaded video processing is proposed. This method works by detecting laparoscopic instruments and using a cascaded fuzzy logic assessment system. It is composed of two fuzzy logic systems executing in parallel. The first level assesses the left- and right-hand movements simultaneously. Its outputs are cascaded into the final fuzzy logic assessment at the second level. This algorithm is completely autonomous and removes the need for any human monitoring or intervention. The experimental work included nine physicians (surgeons and residents) from the surgery and obstetrics/gynecology (OB/GYN) residency programs at WMU Homer Stryker MD School of Medicine (WMed) with different levels of laparoscopic skills and experience. They
ISBN: 9789819756148; 9789819756155 (print)
Military aircraft detection is critical in defense operations, where accurate identification and classification of aircraft support effective decision-making. However, existing methodologies face challenges due to disparate data collection, limited data availability, and the complexity of aggregating remote datasets. In response to these challenges, we propose a novel approach, FedMATD, which uses Federated Meta-Learning techniques to address the difficulties in data collection. To address the limited scale of the dataset, we integrate Federated Meta-Learning with a strategy focused on training with small sample sizes. This fusion aims to enhance target detection accuracy by leveraging the advantages of federated learning while mitigating the limitations posed by insufficient data and the complexity of remote data aggregation. Our proposed method is evaluated on one open-source dataset, and the results demonstrate that FedMATD achieves better performance.
ISBN: 9798350360875; 9798350360868 (print)
Intelligent systems for traffic management have been prominent in recent years, and applications related to vehicle detection and tracking, speed estimation, and traffic flow identification have become an interesting research topic. For these tasks, a large amount of data has to be gathered to train deep learning algorithms, but collecting that data can be a time- and resource-consuming task. The use of synthetic data has therefore become a viable option that helps to minimize data acquisition problems, but when misused, it can negatively impact the model's quality. This paper presents a systematic literature review of the use of synthetic images to train object detection models in urban scenarios, aiming to identify the ideal ratio between real and synthetic images that can benefit those models and the best methods for producing synthetic images. The study found no consensus on the number of synthetic images that helps to produce a more accurate model, owing to the small number of papers addressing this relationship. However, it noted that generative adversarial networks (GANs) can create synthetic images that are more similar to real images, which benefits the training of detection models, although it could not identify how images generated by this method affect the relationship between synthetic and real images.
Airport objects are a hotspot in the field of image object detection because of their specific features and their value for applications. In this study, we developed a complex object detection method based on an improved Faster R-CNN to achieve higher precision in detecting seven types of remote sensing image objects in airport areas under complex conditions such as different scales, different visual angles, and different backgrounds. When building the network, we used deeper backbone networks and feature fusion components to extract more robust features. We also modified the selection of positive and negative samples to mitigate sample imbalance. The main improvements in the algorithm concern the anchor size generation rule and the addition of an a priori judgment network. The effectiveness of the improved algorithm was verified in experiments. Compared with the original Faster R-CNN, the improved network brings a 12.7% increase in mAP, with a detection time of 0.307 s. Finally, the model with trained weights was used to test the detection of the seven types of objects in airport areas on different data sets, and comparisons were conducted with other algorithms. The experimental results showed that the method improved the average detection accuracy and performed well in remote sensing airport object detection tasks.
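For context on what an anchor size generation rule controls, the sketch below shows the standard scale and aspect-ratio scheme of the original Faster R-CNN (three scales, three ratios, nine anchors per location). It is only the baseline: the abstract does not specify the improved rule, and the function name and the ratio convention (height divided by width) are assumptions made here for illustration.

```python
import math


def generate_anchors(cx, cy, scales, ratios):
    """Return anchors as (x_min, y_min, x_max, y_max) centered at (cx, cy).

    Each anchor has area scale**2; ratio is height / width (an assumed convention).
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((cx - width / 2, cy - height / 2,
                            cx + width / 2, cy + height / 2))
    return anchors


# Baseline Faster R-CNN settings: scales 128, 256, 512 and ratios 1:2, 1:1, 2:1.
baseline_anchors = generate_anchors(0.0, 0.0, [128, 256, 512], [0.5, 1.0, 2.0])
```

A modified rule would typically change the `scales` and `ratios` lists, or derive them from the size statistics of the target objects, so that anchors better match objects that vary widely in scale.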
ISBN: 9781510629707 (digital); 9781510629707 (print)
In this paper we demonstrate how post-processing gray-scale images with algorithms that have a singularity-enhancing effect can assume the role of auxiliary modalities, as in the case where an intelligent system fuses information from multiple physical modalities. We show that, as in multimodal AI fusion, "virtual" multimodal inputs can improve the performance of object detection. We design, implement and test a novel Convolutional Neural Network architecture, based on the Faster R-CNN network, for multi-class object detection and classification. Our architecture combines deep feature representations of the input images, generated by networks trained independently on physical and virtual imaging modalities. Using an analog of the ROC curve, the Average Recall over Precision curve, we show that the fusion of certain virtual modality inputs, capable of enhancing singularities and neutralizing illumination, improves recognition accuracy.