Wood processing is one of the most widely used processes in agriculture and industry. The low precision and high latency of machine learning approaches to wood defect detection are currently the main factors restricting the production efficiency and product quality of the wood processing industry. An SPP-improved deep learning method, built on the basic framework of the YOLO V3 network, was proposed to detect wood defects with improved accuracy and real-time performance. An extended dataset was first established through image data enhancement and preprocessing of the limited samples in the wood defect dataset. Anchor box scales were then re-clustered according to the defect features of the dataset. A spatial pyramid pooling (SPP) network was applied to improve the feature pyramid (FP) network in YOLO V3. The validity and real-time performance of the proposed algorithm were verified on a randomly selected test set. The results show that the overall detection accuracy on the wood defect test dataset reaches 93.23%, while the detection time per image stays within 13 ms.
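As a concrete illustration of the spatial pyramid pooling idea described above, here is a minimal PyTorch sketch of an SPP block of the kind commonly inserted into the YOLO V3 neck; the pooling kernel sizes (5, 9, 13) and the 512-channel feature map are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SPPBlock(nn.Module):
    """Spatial pyramid pooling: parallel max-pools at several scales, then a 1x1 fuse."""
    def __init__(self, in_channels, pool_sizes=(5, 9, 13)):
        super().__init__()
        # stride-1 pooling with padding k//2 keeps the spatial resolution unchanged
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )
        self.fuse = nn.Conv2d(in_channels * (len(pool_sizes) + 1), in_channels, kernel_size=1)

    def forward(self, x):
        pooled = [x] + [pool(x) for pool in self.pools]
        return self.fuse(torch.cat(pooled, dim=1))

if __name__ == "__main__":
    feat = torch.randn(1, 512, 13, 13)   # assumed YOLO V3 backbone feature map
    print(SPPBlock(512)(feat).shape)     # torch.Size([1, 512, 13, 13])
```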
Accurate forecasting of solar irradiance is a key tool for optimizing the efficiency and service quality of solar energy systems. In this paper, a novel approach is proposed for multi-step solar irradiance forecasting using deep learning models optimized for low-computational-resource environments. Traditional forecasting models often lack accuracy, while modern deep-learning-based models, although accurate, require substantial computational resources, making them impractical for real-time or resource-constrained environments. Our method uniquely combines dimensionality reduction via image processing with an LSTM-based architecture, reducing the input data by a factor of 4250 while preserving essential sky-condition information, and resulting in a lightweight neural network architecture that balances prediction accuracy with computational efficiency. Forecasts are generated simultaneously for multiple time steps: 1 minute, 5 minutes, 10 minutes, and 20 minutes. The models are evaluated against a custom dataset spanning more than 3 years, containing 1-minute samples of both all-sky imagery and meteorological measurements. The approach is shown to achieve better forecasting accuracy, namely a forecast skill of 10% relative to persistence, and a significantly reduced computational overhead compared to benchmark ConvLSTM models. Moreover, using the preprocessed image features reduces the input size by a factor of 6 compared to the raw images. Our findings suggest that the proposed models are well suited for deployment in embedded systems, remote sensors, and other scenarios where computational resources are limited.
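The multi-step forecaster described above can be pictured with a small sketch, assuming per-minute feature vectors that already combine the reduced sky-image features with the meteorological measurements; the feature dimension, history length, and hidden size below are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class MultiStepIrradianceLSTM(nn.Module):
    """LSTM over a short history of feature vectors; one output per forecast horizon."""
    def __init__(self, n_features=32, hidden=64, horizons=(1, 5, 10, 20)):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, len(horizons))  # irradiance at +1, +5, +10, +20 min

    def forward(self, x):              # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # forecasts from the last time step

if __name__ == "__main__":
    history = torch.randn(8, 30, 32)   # 8 samples, 30-minute history, 32 features each
    print(MultiStepIrradianceLSTM()(history).shape)   # torch.Size([8, 4])
```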
When using deep learning applications for human posture estimation (HPE), especially on devices with limited resources, accuracy and efficiency must be balanced. Common deep learning architectures tend to consume a large amount of processing power while yielding low accuracy. This work proposes Efficient YoloPose, a new architecture based on You Only Look Once version 8 (YOLOv8)-Pose, to address these issues. Efficient YoloPose replaces traditional convolution and C2f (a faster implementation of the Cross Stage Partial Bottleneck) with lightweight methods such as Depthwise Convolution, Ghost Convolution, and the C3Ghost module. This approach greatly decreases the inference time, parameter count, and computational complexity. To improve posture estimation further, Efficient YoloPose integrates the Squeeze-and-Excitation (SE) attention mechanism into the network, which focuses the model on the most informative regions of an image during posture estimation. Experimental results show that the proposed model performs better than current models on the COCO and OCHuman datasets. Compared to YOLOv8-Pose, the proposed model lowers the inference time from 1.1 milliseconds (ms) to 0.9 ms, the computational complexity from 9.2 Giga floating-point operations (GFLOPs) to 4.8 GFLOPs, and the parameter count from 3.3 million to 1.3 million. In addition, the model maintains an average precision (AP) score of 78.8 on the COCO dataset. The source code for Efficient YoloPose has been made publicly available at [https://***/malareeqi/Efficient-YoloPose].
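Of the lightweight components listed above, the Squeeze-and-Excitation attention block is the simplest to show in isolation. Below is a standard SE block sketch in PyTorch; the reduction ratio of 16 is a common default and an assumption here, not a value taken from the paper.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excitation: pool to per-channel statistics, then re-weight channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))     # squeeze: (b, c)
        return x * weights.view(b, c, 1, 1)       # excite: channel-wise re-weighting

if __name__ == "__main__":
    print(SqueezeExcite(64)(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```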
Image captioning is a challenging task in computer vision that automatically generates a textual description of an image by integrating visual and linguistic information; the generated captions must accurately describe the image's content while also adhering to the conventions of natural language. We adopt the encoder-decoder framework employed by various CNN-RNN-based models for image captioning in the past few years. Recently, CNN-Transformer-based models have achieved great success and surpassed traditional CNN-RNN-based models in this area, and many researchers have concentrated on Transformers, exploring and uncovering their vast possibilities. Unlike conventional CNN-RNN-based models for image captioning, Transformer-based models offer the benefit of handling longer input sequences more efficiently. However, they are resource-intensive to train and deploy, particularly for large-scale tasks or for tasks that require real-time processing. In this work, we introduce a lightweight and efficient Transformer-based model called the Efficient Transformer Captioner (ETransCap), which consumes fewer computational resources to generate captions. Our model operates in linear complexity and has been trained and tested on the MS-COCO dataset. Comparisons with existing state-of-the-art models show that ETransCap achieves promising results. Our results support the potential of ETransCap as a good approach for image captioning tasks in real-time applications. Code for this project will be available at https://***/albertmundu/etranscap.
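The abstract does not spell out which linear-complexity attention ETransCap uses, so the sketch below shows one generic option: the kernelized (elu + 1) linear attention of Katharopoulos et al., whose cost grows linearly with sequence length. Shapes and the feature-map choice are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention with phi(x) = elu(x) + 1; linear in the sequence length."""
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)                        # key/value summary
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

if __name__ == "__main__":
    q = k = v = torch.randn(2, 196, 64)      # e.g. a 14x14 grid of image-patch features
    print(linear_attention(q, k, v).shape)   # torch.Size([2, 196, 64])
```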
Detecting pavement cracks is critical for road safety and infrastructure maintenance. Traditional methods, relying on manual inspection and basic image processing, are time-consuming and prone to errors. Deep-learning (DL) methods automate crack detection, but many still struggle with variable crack patterns and environmental conditions. This study aims to address these limitations by introducing the Masker Transformer, a novel hybrid deep learning model that integrates the precise localization capabilities of Mask Region-based Convolutional Neural Network (Mask R-CNN) with the global contextual awareness of Vision Transformer (ViT). The research focuses on leveraging the strengths of both architectures to enhance segmentation accuracy and adaptability across different pavement conditions. We evaluated the performance of the Masker Transformer against other state-of-the-art models such as U-Net, Transformer U-Net (TransUNet), U-Net Transformer (UNETr), Swin U-Net Transformer (Swin-UNETr), You Only Look Once version 8 (YoloV8), and Mask R-CNN using two benchmark datasets: Crack500 and DeepCrack. The findings reveal that the Masker Transformer significantly outperforms the existing models, achieving the highest Dice Similarity Coefficient (DSC), precision, recall, and F1-score across both datasets. Notably, the model attained a DSC of 80.04% on Crack500 and 91.37% on DeepCrack, demonstrating superior segmentation accuracy. The high precision and recall rates further substantiate its effectiveness in real-world applications, suggesting that the Masker Transformer can serve as a robust tool for automated pavement crack detection, potentially replacing more traditional methods.
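Since the comparison above is reported mainly in terms of the Dice Similarity Coefficient, a short sketch of how DSC is computed on binary masks may help; the random masks in the example are placeholders, not data from Crack500 or DeepCrack.

```python
import torch

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """DSC = 2*|A ∩ B| / (|A| + |B|) for binary segmentation masks."""
    pred = pred_mask.float().flatten()
    true = true_mask.float().flatten()
    intersection = (pred * true).sum()
    return (2 * intersection + eps) / (pred.sum() + true.sum() + eps)

if __name__ == "__main__":
    a = torch.randint(0, 2, (1, 256, 256))   # placeholder predicted mask
    b = torch.randint(0, 2, (1, 256, 256))   # placeholder ground-truth mask
    print(f"DSC = {dice_coefficient(a, b).item():.4f}")
```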
In this research, a low-cost autonomous chess robot system is developed using computer vision, deep learning, and robot control. The system comprises a chessboard, a camera system, and a 4-DOF SCARA robot, all managed by software running on a computer. Additionally, a deep learning model has been created for chess piece recognition and position detection. Chess moves are calculated using the minimax algorithm within the Stockfish chess engine. Results indicate that the computation time for a chess move is approximately 2 s per position, while the average time for the robot to execute a piece movement ranges from 20 to 90 s, depending on the type of move. The developed chess robot system operates stably and accurately, and is capable of autonomously playing a complete chess game against humans or identifying chess positions for a pre-arranged setup. Moreover, the fabrication cost of the robotic arm and its control system is approximately $100, making it affordable and suitable for training- and entertainment-focused chess robot systems. These results demonstrate that the autonomous chess robot system developed in this study is feasible for real-world chess-playing or chess-training applications.
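A minimal sketch of the move-calculation step, assuming the python-chess library and a locally installed Stockfish binary (the engine path and the 2-second think time are assumptions); in the pipeline above, the vision system would supply the board state and the returned UCI move would be handed to the SCARA controller.

```python
import chess
import chess.engine

ENGINE_PATH = "stockfish"   # assumed path to a locally installed Stockfish binary

def best_move(fen: str, think_time: float = 2.0) -> str:
    """Ask Stockfish for a move given a board state recognized by the vision system."""
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    try:
        result = engine.play(board, chess.engine.Limit(time=think_time))
        return result.move.uci()   # e.g. "e2e4", to be executed by the robot arm
    finally:
        engine.quit()

if __name__ == "__main__":
    print(best_move(chess.STARTING_FEN))
```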
Traditional inspection methods for highways include fixed cameras and automatic detection systems. Although they automate the inspection process to a certain extent, they still face problems such as limited coverage, delayed data processing, and insufficient identification accuracy, and they may face compatibility challenges when integrated with new technologies. To address these issues, this article explores the application of intelligent image identification technology to highway inspections. High-resolution cameras are used for data acquisition, and convolutional neural networks (CNN) are applied to extract image features and generate feature maps. The faster region-based convolutional neural network (Faster R-CNN) then classifies and performs bounding box regression on the features extracted by the CNN in candidate regions, determining the category and precise location of each object. At the same time, a deep-learning autoencoder is used for anomaly detection, and multi-source data is integrated through Kalman filtering. The model parameters and computational graphs are also optimized to improve system efficiency. A 12-km-long highway was selected for testing. The average processing time of the system was within 400 ms, and the processing time for each image frame was about 150 ms. The system can be seamlessly integrated with existing infrastructure, meeting real-time requirements. The results show that intelligent image recognition technology can achieve higher highway inspection efficiency and accuracy while maintaining good compatibility with existing equipment.
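The Kalman-filtering step for multi-source integration can be illustrated with a scalar example: sequential updates that fuse several noisy readings of the same quantity into one estimate. The measurement values, variances, and process noise below are hypothetical, not data from the study.

```python
def kalman_fuse(measurements, variances, x0=0.0, p0=1.0, q=1e-3):
    """Sequential scalar Kalman updates: fuse noisy readings from several sources."""
    x, p = x0, p0
    for z, r in zip(measurements, variances):
        p += q                 # predict: state assumed constant, add process noise
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # update with measurement z of variance r
        p *= (1 - k)
    return x, p

if __name__ == "__main__":
    # hypothetical defect-size readings (mm) from camera, second sensor, and prior map
    estimate, variance = kalman_fuse([4.2, 3.9, 4.5], [0.5, 0.2, 0.8])
    print(f"fused estimate = {estimate:.2f} mm, variance = {variance:.3f}")
```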
This paper proposes a deep learning-based activity recognition approach for a Human-Robot Interaction environment. Observations of the object state are acquired from a vision sensor in a real-time scenario. The activity recognition system examined in this paper covers four activity classes (pour, rotate, drop object, and open bottle). The image processing unit processes the images and predicts the activity being performed using deep learning methods, so that the robot can execute the corresponding actions (sub-actions) according to the predicted activity.
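A minimal sketch of a four-class frame classifier for the activities listed above; the network layout and input resolution are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

ACTIVITIES = ["pour", "rotate", "drop object", "open bottle"]   # classes from the abstract

class ActivityCNN(nn.Module):
    """Small image classifier sketch: a camera frame in, one of four activity labels out."""
    def __init__(self, n_classes=len(ACTIVITIES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):              # x: (batch, 3, H, W) camera frames
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    logits = ActivityCNN()(torch.randn(1, 3, 224, 224))
    print(ACTIVITIES[logits.argmax(dim=1).item()])
```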
Optical Coherence Tomography (OCT) plays a crucial role in diagnosing ocular diseases, yet conventional CNN-based models face limitations such as high computational overhead, noise sensitivity, and data imbalance. This paper introduces HDL-ACO, a novel hybrid deep learning (HDL) framework that integrates convolutional neural networks with Ant Colony Optimization (ACO) to enhance classification accuracy and computational efficiency. The proposed methodology involves pre-processing the OCT dataset using the discrete wavelet transform and ACO-optimized augmentation, followed by multiscale patch embedding to generate image patches of varying sizes. The hybrid deep learning model leverages ACO-based hyperparameter optimization to enhance feature selection and training efficiency. Furthermore, a Transformer-based feature extraction module integrates content-aware embeddings, multi-head self-attention, and feedforward neural networks to improve classification performance. Experimental results demonstrate that HDL-ACO outperforms state-of-the-art models, including ResNet-50, VGG-16, and XGBoost, achieving 95% training accuracy and 93% validation accuracy. The proposed framework offers a scalable, resource-efficient solution for real-time clinical OCT image classification.
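The discrete-wavelet-transform pre-processing step can be sketched with PyWavelets; the choice of the Haar wavelet and a single decomposition level are assumptions here, and keeping only the approximation band is one common way to denoise and downsample a scan.

```python
import numpy as np
import pywt

def dwt_preprocess(oct_scan: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    """One-level 2-D DWT; return the approximation band as a denoised, half-size scan."""
    cA, (cH, cV, cD) = pywt.dwt2(oct_scan, wavelet)
    return cA

if __name__ == "__main__":
    scan = np.random.rand(512, 512)    # stand-in for a grayscale OCT B-scan
    print(dwt_preprocess(scan).shape)  # (256, 256)
```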
Motor vehicles significantly contribute to the escalating levels of air and noise pollution in urban centers worldwide. Numerous studies have established a strong correlation between vehicle exhaust emissions, noise levels, and factors such as traffic flow rate, vehicle composition, fleet speed, and deceleration and acceleration behavior. This research monitors ambient air quality and noise levels in diverse city centers during peak hours, shedding light on the impact of vehicular activities. The study investigates the intricate relationship between vehicular composition and the concentration of particulate matter (PM), and conducts a comprehensive analysis of how traffic composition influences roadside noise pollution, identifying key factors contributing to this environmental concern. Using an efficient deep learning pipeline, the research applies image-based detection and tracking of vehicles to enhance understanding, and various machine learning tools are applied to predict traffic-related air and noise pollution. This research makes a significant contribution to sustainable transportation planning, offering valuable insights into the complex dynamics of vehicular impact on urban environments. The findings not only enhance our understanding of pollution sources but also pave the way for informed decision-making in developing strategies to mitigate the adverse effects of motor vehicle activities.
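As a toy illustration of the prediction step, the sketch below fits a random-forest regressor mapping hypothetical traffic-composition features to a PM2.5 reading; all numbers are made up for illustration and are not data from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# hypothetical per-interval features: [cars, trucks, buses, two-wheelers, mean speed km/h]
X = np.array([
    [120, 15, 4, 60, 32.0],
    [200, 30, 8, 90, 18.5],
    [ 80,  5, 2, 40, 45.0],
])
y = np.array([38.0, 72.5, 24.0])   # illustrative PM2.5 readings (ug/m3)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict([[150, 20, 5, 70, 25.0]]))   # predicted PM2.5 for a new interval
```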