Wood processing is one of the most widely used processes in agriculture and industry. The low precision and high latency of machine learning approaches to wood defect detection are currently the main factors restricting the production efficiency and product quality of the wood processing industry. An SPP-improved deep learning method, built on the basic framework of the YOLO V3 network, was proposed to detect wood defects with improved accuracy and real-time performance. An extended dataset was first established through image data enhancement and preprocessing of the limited samples in the wood defect dataset. Anchor box scales were then re-clustered according to the defect features of the dataset. A spatial pyramid pooling (SPP) network was applied to improve the feature pyramid (FP) network in YOLO V3. The validity and real-time performance of the proposed algorithm were verified on a randomly selected test set. The results show that the overall detection accuracy on the wood defect test dataset reaches 93.23%, while the detection time per image stays within 13 ms.
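As a concrete illustration of the spatial pyramid pooling idea described above, here is a minimal PyTorch sketch of an SPP block of the kind commonly inserted into the YOLO V3 neck; the pooling kernel sizes (5, 9, 13) and the 512-channel feature map are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SPPBlock(nn.Module):
    """Spatial pyramid pooling: parallel max-pools at several scales, then a 1x1 fuse."""
    def __init__(self, in_channels, pool_sizes=(5, 9, 13)):
        super().__init__()
        # stride-1 pooling with padding k//2 keeps the spatial resolution unchanged
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )
        self.fuse = nn.Conv2d(in_channels * (len(pool_sizes) + 1), in_channels, kernel_size=1)

    def forward(self, x):
        pooled = [x] + [pool(x) for pool in self.pools]
        return self.fuse(torch.cat(pooled, dim=1))

if __name__ == "__main__":
    feat = torch.randn(1, 512, 13, 13)   # assumed YOLO V3 backbone feature map
    print(SPPBlock(512)(feat).shape)     # torch.Size([1, 512, 13, 13])
```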
Accurate forecasting of solar irradiance is a key tool for optimizing the efficiency and service quality of solar energy systems. In this paper, a novel approach is proposed for multi-step solar irradiance forecasting using deep learning models optimized for low-computational-resource environments. Traditional forecasting models often lack accuracy, while modern deep-learning-based models, although accurate, require substantial computational resources, making them impractical for real-time or resource-constrained environments. Our method uniquely combines dimensionality reduction via image processing with an LSTM-based architecture, reducing the input data by a factor of 4250 while preserving essential sky-condition information, and resulting in a lightweight neural network architecture that balances prediction accuracy with computational efficiency. Forecasts are generated simultaneously for multiple time steps: 1 minute, 5 minutes, 10 minutes, and 20 minutes. The models are evaluated against a custom dataset spanning more than 3 years, containing 1-minute samples of both all-sky imagery and meteorological measurements. The approach is shown to achieve better forecasting accuracy, namely a forecast skill of 10% relative to persistence, and a significantly reduced computational overhead compared to benchmark ConvLSTM models. Moreover, using the preprocessed image features reduces the input size by a factor of 6 compared to the raw images. Our findings suggest that the proposed models are well suited for deployment in embedded systems, remote sensors, and other scenarios where computational resources are limited.
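The multi-step forecaster described above can be pictured with a small sketch, assuming per-minute feature vectors that already combine the reduced sky-image features with the meteorological measurements; the feature dimension, history length, and hidden size below are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class MultiStepIrradianceLSTM(nn.Module):
    """LSTM over a short history of feature vectors; one output per forecast horizon."""
    def __init__(self, n_features=32, hidden=64, horizons=(1, 5, 10, 20)):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, len(horizons))  # irradiance at +1, +5, +10, +20 min

    def forward(self, x):              # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # forecasts from the last time step

if __name__ == "__main__":
    history = torch.randn(8, 30, 32)   # 8 samples, 30-minute history, 32 features each
    print(MultiStepIrradianceLSTM()(history).shape)   # torch.Size([8, 4])
```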
When using deep learning applications for human posture estimation (HPE), especially on devices with limited resources, accuracy and efficiency must be balanced. Common deep learning architectures tend to consume a large amount of processing power while yielding low accuracy. This work proposes Efficient YoloPose, a new architecture based on You Only Look Once version 8 (YOLOv8)-Pose, to address these issues. Efficient YoloPose replaces traditional convolution and C2f (a faster implementation of the Cross Stage Partial Bottleneck) with lightweight methods such as Depthwise Convolution, Ghost Convolution, and the C3Ghost module. This approach greatly decreases the inference time, parameter count, and computational complexity. To improve posture estimation further, Efficient YoloPose integrates the Squeeze-and-Excitation (SE) attention mechanism into the network, which focuses the model on the most informative regions of an image during posture estimation. Experimental results show that the proposed model performs better than current models on the COCO and OCHuman datasets. Compared to YOLOv8-Pose, the proposed model lowers the inference time from 1.1 milliseconds (ms) to 0.9 ms, the computational complexity from 9.2 Giga floating-point operations (GFLOPs) to 4.8 GFLOPs, and the parameter count from 3.3 million to 1.3 million. In addition, the model maintains an average precision (AP) score of 78.8 on the COCO dataset. The source code for Efficient YoloPose has been made publicly available at [https://***/malareeqi/Efficient-YoloPose].
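Of the lightweight components listed above, the Squeeze-and-Excitation attention block is the simplest to show in isolation. Below is a standard SE block sketch in PyTorch; the reduction ratio of 16 is a common default and an assumption here, not a value taken from the paper.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excitation: pool to per-channel statistics, then re-weight channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))     # squeeze: (b, c)
        return x * weights.view(b, c, 1, 1)       # excite: channel-wise re-weighting

if __name__ == "__main__":
    print(SqueezeExcite(64)(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```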
Image captioning is a challenging task in computer vision that automatically generates a textual description of an image by integrating visual and linguistic information; the generated captions must accurately describe the image's content while also adhering to the conventions of natural language. We adopt the encoder-decoder framework employed by various CNN-RNN-based models for image captioning in the past few years. Recently, CNN-Transformer-based models have achieved great success and surpassed traditional CNN-RNN-based models in this area, and many researchers have concentrated on Transformers, exploring and uncovering their vast possibilities. Unlike conventional CNN-RNN-based models for image captioning, Transformer-based models offer the benefit of handling longer input sequences more efficiently. However, they are resource-intensive to train and deploy, particularly for large-scale tasks or for tasks that require real-time processing. In this work, we introduce a lightweight and efficient Transformer-based model called the Efficient Transformer Captioner (ETransCap), which consumes fewer computational resources to generate captions. Our model operates in linear complexity and has been trained and tested on the MS-COCO dataset. Comparisons with existing state-of-the-art models show that ETransCap achieves promising results. Our results support the potential of ETransCap as a good approach for image captioning tasks in real-time applications. Code for this project will be available at https://***/albertmundu/etranscap.
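The abstract does not spell out which linear-complexity attention ETransCap uses, so the sketch below shows one generic option: the kernelized (elu + 1) linear attention of Katharopoulos et al., whose cost grows linearly with sequence length. Shapes and the feature-map choice are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention with phi(x) = elu(x) + 1; linear in the sequence length."""
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)                        # key/value summary
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

if __name__ == "__main__":
    q = k = v = torch.randn(2, 196, 64)      # e.g. a 14x14 grid of image-patch features
    print(linear_attention(q, k, v).shape)   # torch.Size([2, 196, 64])
```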
Detecting pavement cracks is critical for road safety and infrastructure maintenance. Traditional methods, relying on manual inspection and basic image processing, are time-consuming and prone to errors. Deep-learning (DL) methods automate crack detection, but many still struggle with variable crack patterns and environmental conditions. This study aims to address these limitations by introducing the Masker Transformer, a novel hybrid deep learning model that integrates the precise localization capabilities of Mask Region-based Convolutional Neural Network (Mask R-CNN) with the global contextual awareness of Vision Transformer (ViT). The research focuses on leveraging the strengths of both architectures to enhance segmentation accuracy and adaptability across different pavement conditions. We evaluated the performance of the Masker Transformer against other state-of-the-art models such as U-Net, Transformer U-Net (TransUNet), U-Net Transformer (UNETr), Swin U-Net Transformer (Swin-UNETr), You Only Look Once version 8 (YoloV8), and Mask R-CNN using two benchmark datasets: Crack500 and DeepCrack. The findings reveal that the Masker Transformer significantly outperforms the existing models, achieving the highest Dice Similarity Coefficient (DSC), precision, recall, and F1-score across both datasets. Notably, the model attained a DSC of 80.04% on Crack500 and 91.37% on DeepCrack, demonstrating superior segmentation accuracy. The high precision and recall rates further substantiate its effectiveness in real-world applications, suggesting that the Masker Transformer can serve as a robust tool for automated pavement crack detection, potentially replacing more traditional methods.
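Since the comparison above is reported mainly in terms of the Dice Similarity Coefficient, a short sketch of how DSC is computed on binary masks may help; the random masks in the example are placeholders, not data from Crack500 or DeepCrack.

```python
import torch

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """DSC = 2*|A ∩ B| / (|A| + |B|) for binary segmentation masks."""
    pred = pred_mask.float().flatten()
    true = true_mask.float().flatten()
    intersection = (pred * true).sum()
    return (2 * intersection + eps) / (pred.sum() + true.sum() + eps)

if __name__ == "__main__":
    a = torch.randint(0, 2, (1, 256, 256))   # placeholder predicted mask
    b = torch.randint(0, 2, (1, 256, 256))   # placeholder ground-truth mask
    print(f"DSC = {dice_coefficient(a, b).item():.4f}")
```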
In this research, a low-cost autonomous chess robot system is developed using computer vision, deep learning, and robot control. The system comprises a chessboard, a camera system, and a 4-DOF SCARA robot, all managed by software running on a computer. Additionally, a deep learning model has been created for chess piece recognition and position detection. Chess moves are calculated using the minimax algorithm within the Stockfish chess engine. Results indicate that the computation time for a chess move is approximately 2 s per position, while the average time for the robot to execute a piece movement ranges from 20 to 90 s, depending on the type of move. The developed chess robot system operates stably and accurately, and is capable of autonomously playing a complete chess game against humans or identifying chess positions for a pre-arranged setup. Moreover, the fabrication cost of the robotic arm and its control system is approximately $100, making it affordable and suitable for training- and entertainment-focused chess robot systems. These results demonstrate that the autonomous chess robot system developed in this study is feasible for real-world chess-playing or chess-training applications.
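A minimal sketch of the move-calculation step, assuming the python-chess library and a locally installed Stockfish binary (the engine path and the 2-second think time are assumptions); in the pipeline above, the vision system would supply the board state and the returned UCI move would be handed to the SCARA controller.

```python
import chess
import chess.engine

ENGINE_PATH = "stockfish"   # assumed path to a locally installed Stockfish binary

def best_move(fen: str, think_time: float = 2.0) -> str:
    """Ask Stockfish for a move given a board state recognized by the vision system."""
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    try:
        result = engine.play(board, chess.engine.Limit(time=think_time))
        return result.move.uci()   # e.g. "e2e4", to be executed by the robot arm
    finally:
        engine.quit()

if __name__ == "__main__":
    print(best_move(chess.STARTING_FEN))
```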
Traditional inspection methods for highways include fixed cameras and automatic detection systems. Although they automate the inspection process to a certain extent, they still face problems such as limited coverage, delayed data processing, and insufficient identification accuracy, and they may face compatibility challenges when integrated with new technologies. To address these issues, this article explores the application of intelligent image identification technology to highway inspections. High-resolution cameras are used for data acquisition, and convolutional neural networks (CNN) are applied to extract image features and generate feature maps. The faster region-based convolutional neural network (Faster R-CNN) then classifies and performs bounding box regression on the features extracted by the CNN in candidate regions, determining the category and precise location of each object. At the same time, a deep-learning autoencoder is used for anomaly detection, and multi-source data is integrated through Kalman filtering. The model parameters and computational graphs are also optimized to improve system efficiency. A 12-km-long highway was selected for testing. The average processing time of the system was within 400 ms, and the processing time for each image frame was about 150 ms. The system can be seamlessly integrated with existing infrastructure, meeting real-time requirements. The results show that intelligent image recognition technology can achieve higher highway inspection efficiency and accuracy while maintaining good compatibility with existing equipment.
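The Kalman-filtering step for multi-source integration can be illustrated with a scalar example: sequential updates that fuse several noisy readings of the same quantity into one estimate. The measurement values, variances, and process noise below are hypothetical, not data from the study.

```python
def kalman_fuse(measurements, variances, x0=0.0, p0=1.0, q=1e-3):
    """Sequential scalar Kalman updates: fuse noisy readings from several sources."""
    x, p = x0, p0
    for z, r in zip(measurements, variances):
        p += q                 # predict: state assumed constant, add process noise
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # update with measurement z of variance r
        p *= (1 - k)
    return x, p

if __name__ == "__main__":
    # hypothetical defect-size readings (mm) from camera, second sensor, and prior map
    estimate, variance = kalman_fuse([4.2, 3.9, 4.5], [0.5, 0.2, 0.8])
    print(f"fused estimate = {estimate:.2f} mm, variance = {variance:.3f}")
```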
This paper proposes a deep learning-based activity recognition approach for a Human-Robot Interaction environment. Observations of the object state are acquired from a vision sensor in a real-time scenario. The activity recognition system examined in this paper covers four activity classes (pour, rotate, drop object, and open bottle). The image processing unit processes the images and predicts the activity being performed using deep learning methods, so that the robot can execute the corresponding actions (sub-actions) according to the predicted activity.
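A minimal sketch of a four-class frame classifier for the activities listed above; the network layout and input resolution are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

ACTIVITIES = ["pour", "rotate", "drop object", "open bottle"]   # classes from the abstract

class ActivityCNN(nn.Module):
    """Small image classifier sketch: a camera frame in, one of four activity labels out."""
    def __init__(self, n_classes=len(ACTIVITIES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):              # x: (batch, 3, H, W) camera frames
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    logits = ActivityCNN()(torch.randn(1, 3, 224, 224))
    print(ACTIVITIES[logits.argmax(dim=1).item()])
```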
Optical Coherence Tomography (OCT) plays a crucial role in diagnosing ocular diseases, yet conventional CNN-based models face limitations such as high computational overhead, noise sensitivity, and data imbalance. This paper introduces HDL-ACO, a novel hybrid deep learning (HDL) framework that integrates convolutional neural networks with Ant Colony Optimization (ACO) to enhance classification accuracy and computational efficiency. The proposed methodology involves pre-processing the OCT dataset using the discrete wavelet transform and ACO-optimized augmentation, followed by multiscale patch embedding to generate image patches of varying sizes. The hybrid deep learning model leverages ACO-based hyperparameter optimization to enhance feature selection and training efficiency. Furthermore, a Transformer-based feature extraction module integrates content-aware embeddings, multi-head self-attention, and feedforward neural networks to improve classification performance. Experimental results demonstrate that HDL-ACO outperforms state-of-the-art models, including ResNet-50, VGG-16, and XGBoost, achieving 95% training accuracy and 93% validation accuracy. The proposed framework offers a scalable, resource-efficient solution for real-time clinical OCT image classification.
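The discrete-wavelet-transform pre-processing step can be sketched with PyWavelets; the choice of the Haar wavelet and a single decomposition level are assumptions here, and keeping only the approximation band is one common way to denoise and downsample a scan.

```python
import numpy as np
import pywt

def dwt_preprocess(oct_scan: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    """One-level 2-D DWT; return the approximation band as a denoised, half-size scan."""
    cA, (cH, cV, cD) = pywt.dwt2(oct_scan, wavelet)
    return cA

if __name__ == "__main__":
    scan = np.random.rand(512, 512)    # stand-in for a grayscale OCT B-scan
    print(dwt_preprocess(scan).shape)  # (256, 256)
```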
Motor vehicles significantly contribute to the escalating levels of air and noise pollution in urban centers worldwide. Numerous studies have established a strong correlation between vehicle exhaust emissions, noise levels, and factors such as traffic flow rate, vehicle composition, fleet speed, and deceleration and acceleration behavior. This research monitors ambient air quality and noise levels in diverse city centers during peak hours, shedding light on the impact of vehicular activities. The study investigates the intricate relationship between vehicular composition and the concentration of particulate matter (PM), and conducts a comprehensive analysis of how traffic composition influences roadside noise pollution, identifying key factors contributing to this environmental concern. Using an efficient deep learning pipeline, the research applies image-based detection and tracking of vehicles to enhance understanding, and various machine learning tools are applied to predict traffic-related air and noise pollution. This research makes a significant contribution to sustainable transportation planning, offering valuable insights into the complex dynamics of vehicular impact on urban environments. The findings not only enhance our understanding of pollution sources but also pave the way for informed decision-making in developing strategies to mitigate the adverse effects of motor vehicle activities.
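As a toy illustration of the prediction step, the sketch below fits a random-forest regressor mapping hypothetical traffic-composition features to a PM2.5 reading; all numbers are made up for illustration and are not data from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# hypothetical per-interval features: [cars, trucks, buses, two-wheelers, mean speed km/h]
X = np.array([
    [120, 15, 4, 60, 32.0],
    [200, 30, 8, 90, 18.5],
    [ 80,  5, 2, 40, 45.0],
])
y = np.array([38.0, 72.5, 24.0])   # illustrative PM2.5 readings (ug/m3)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict([[150, 20, 5, 70, 25.0]]))   # predicted PM2.5 for a new interval
```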