Wood processing is widely used in agriculture and industry. The low precision and high latency of machine learning methods for wood defect detection are currently the main factors restricting the production efficiency and product quality of the wood processing industry. To improve accuracy and real-time performance, an SPP-improved deep learning method based on the YOLO V3 framework is proposed for detecting wood defects. First, an extended dataset was established from the limited samples of the wood defect dataset through image data augmentation and preprocessing. The anchor box scales were then re-clustered on the wood defect dataset according to the defect features. A spatial pyramid pooling (SPP) network was applied to improve the feature pyramid (FP) network in YOLO V3. The validity and real-time performance of the proposed algorithm were verified on a randomly selected test set. The results show that the overall detection accuracy on the wood defect test dataset reaches 93.23%, while the detection time for each image is within 13 ms.
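As a rough illustration of anchor box re-clustering, the sketch below runs k-means over box widths and heights with 1 - IoU as the distance, which is the approach commonly used with YOLO-style detectors. The abstract does not say which clustering variant the authors used, so the function names (`iou_wh`, `cluster_anchors`) and the box sizes are hypothetical.

```python
import random

def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), both anchored at the origin."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def cluster_anchors(boxes, k, iterations=100, seed=0):
    """k-means over (w, h) pairs; the nearest centroid is the one with highest IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            groups[best].append(box)
        updated = []
        for i, group in enumerate(groups):
            if not group:
                # Keep the centroid of an empty cluster unchanged.
                updated.append(centroids[i])
                continue
            mean_w = sum(b[0] for b in group) / len(group)
            mean_h = sum(b[1] for b in group) / len(group)
            updated.append((mean_w, mean_h))
        if updated == centroids:
            break
        centroids = updated
    # Sort by area so that small anchors come first.
    return sorted(centroids, key=lambda c: c[0] * c[1])

# Hypothetical defect box sizes in pixels (not taken from the paper's dataset).
boxes = [(12, 10), (14, 11), (13, 9), (60, 20), (64, 22), (58, 18),
         (30, 90), (28, 96), (33, 88)]
anchors = cluster_anchors(boxes, k=3)
```

The resulting `anchors` would replace the detector's default anchor sizes. The result depends on the random initialization, so in practice the clustering is usually repeated with several seeds.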
In recent years, deep learning has emerged as the dominant approach for hyperspectral image (HSI) classification. However, deep neural networks require large annotated datasets to generalize well. This limits the applicability of deep learning to real-world HSI classification problems, as manual labeling of thousands of pixels per scene is costly and time-consuming. In this article, we tackle the problem of few-shot HSI classification by leveraging state-of-the-art self-supervised contrastive learning with an improved view-generation approach. Traditionally, contrastive learning algorithms rely heavily on hand-crafted data augmentations tailored to natural imagery to generate positive pairs. However, these augmentations are not directly applicable to HSIs, limiting the potential of self-supervised learning in the hyperspectral domain. To overcome this limitation, we introduce two positive pair-mining strategies for contrastive learning on HSIs. The proposed strategies mitigate the need for high-quality data augmentations, providing an effective solution for few-shot HSI classification. Through extensive experiments, we show that the proposed approach improves accuracy and label efficiency on four popular HSI classification benchmarks. Furthermore, we conduct a thorough analysis of the impact of data augmentation in contrastive learning, highlighting the advantage of our positive pair-mining approach.
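To make the role of positive pairs concrete, here is a minimal sketch of the InfoNCE loss that many contrastive methods optimize. The abstract does not describe the two mining strategies or name the loss, so this shows only the generic objective: the vectors are hypothetical embeddings and `info_nce` is an illustrative name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]

# Hypothetical 2-D embeddings: a well-chosen positive lies close to the anchor.
anchor = [1.0, 0.0]
good_loss = info_nce(anchor, positive=[1.0, 0.0], negatives=[[0.0, 1.0]])
bad_loss = info_nce(anchor, positive=[0.0, 1.0], negatives=[[1.0, 0.0]])
```

The loss is small when the mined positive is close to the anchor and large when it is not, which is why the quality of the positive pairs matters more than the particular augmentation that produced them.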
This letter describes autonomous landing of an unmanned aircraft system on a moving platform using vision and deep reinforcement learning. Landing on a moving platform offers several benefits, such as greater mission flexibility and reduced flight time. In particular, an end-to-end vision approach (i.e., the input to the reinforcement learning agent is a raw image from the camera) is used with the deep regularized Q algorithm and a custom-designed reward. The custom reward was specifically devised to encourage useful feature extraction from the state space. Additionally, the proposed reinforcement learning algorithm has full 3D velocity control, including the vertical channel. The simulation results show that the proposed approach can outperform existing approaches that use high-level extracted features (such as the relative position and velocity of the landing pad). The simulation results are then successfully transferred to a real-world experiment by utilizing domain randomization.
Due to the limited sample quantity and the complex data collection process of the blurred space-time image (BSTI) dataset, deep learning-based space-time image velocimetry (STIV) produces larger errors when applied to blurry videos. To enhance the measurement accuracy, we propose STIV for blurred scenes based on data augmentation with a BSTI deep convolutional generative adversarial network (BSTI-DCGAN). First, BSTI-DCGAN is developed based on the DCGAN. This network uses a bilinear interpolation-convolution module for upsampling and integrates coordinated attention and multi-concatenation attention to enhance the resemblance between generated and real images. Next, the dataset is further expanded with artificially synthesized space-time images, and all space-time images are then transformed into spectrograms to create a training dataset for the classification network. Finally, the primary spectral direction is detected using the classification network. The experimental results indicate that our approach effectively augments the dataset and improves the accuracy of practical measurements. Under video blur, the relative errors of the average flow velocity and discharge are 3.92% and 2.72%, respectively.
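The interpolation half of a bilinear interpolation-convolution upsampling module can be sketched as follows. This is only an illustration of bilinear upsampling with aligned corners on a plain 2D list; the learned convolution that follows it in the network is not shown, and the function names are hypothetical.

```python
def source_coord(index, out_size, in_size):
    """Map an output index to a fractional input coordinate (corners aligned)."""
    if out_size == 1:
        return 0.0
    return index * (in_size - 1) / (out_size - 1)

def bilinear_upsample(image, scale=2):
    """Upsample a 2D list of floats by an integer factor using bilinear interpolation."""
    h, w = len(image), len(image[0])
    out_h, out_w = h * scale, w * scale
    result = []
    for i in range(out_h):
        y = source_coord(i, out_h, h)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = source_coord(j, out_w, w)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        result.append(row)
    return result

# A hypothetical 2x2 feature map upsampled to 4x4.
upsampled = bilinear_upsample([[0.0, 2.0], [4.0, 6.0]], scale=2)
```

Interpolating first and convolving afterwards is a common alternative to transposed convolution in GAN generators, since it tends to avoid checkerboard artifacts.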
Accurate forecasting of solar irradiance is a key tool for optimizing the efficiency and service quality of solar energy systems. In this paper, a novel approach is proposed for multi-step solar irradiance forecasting using deep learning models optimized for environments with limited computational resources. Traditional forecasting models often lack accuracy, while modern deep learning-based models, although accurate, require substantial computational resources, making them impractical for real-time or resource-constrained environments. Our method combines dimensionality reduction via image processing with an LSTM-based architecture, achieving an input data reduction by a factor of 4250 while preserving essential sky condition information. The result is a lightweight neural network architecture that balances prediction accuracy with computational efficiency. The forecasts are generated simultaneously for multiple time steps: 1 minute, 5 minutes, 10 minutes and 20 minutes. Models are evaluated against a custom dataset spanning more than 3 years and containing 1-minute samples of both all-sky imagery and meteorological measurements. The approach is demonstrated to achieve better forecasting accuracy, namely a forecast skill of 10% compared to persistence, and a significantly reduced computational overhead compared to benchmark ConvLSTM models. Moreover, utilizing the preprocessed image features reduces input size by a factor of 6 compared to the raw images. Our findings suggest that the proposed models are well-suited for deployment in embedded systems, remote sensors, and other scenarios where computational resources are limited.
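Forecast skill relative to persistence can be computed as sketched below, assuming the common RMSE-based definition, skill = 1 - RMSE_model / RMSE_persistence. The abstract does not state which error metric the authors used, and the irradiance values here are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def forecast_skill(model_pred, series, horizon):
    """Skill of a model over persistence for forecasts `horizon` steps ahead.

    model_pred[t] is the forecast issued at time t for time t + horizon.
    Persistence simply repeats the value observed at time t.
    """
    actual = series[horizon:]
    persistence = series[:-horizon]
    model = model_pred[:len(actual)]
    return 1.0 - rmse(model, actual) / rmse(persistence, actual)

# Hypothetical 1-minute irradiance samples in W/m^2.
series = [500.0, 520.0, 480.0, 510.0, 530.0, 490.0]
perfect_skill = forecast_skill(series[1:], series, horizon=1)
persistence_skill = forecast_skill(series, series, horizon=1)
```

Under this definition, a skill of 0 means the model is no better than persistence and a skill of 1 means a perfect forecast, so the reported 10% would correspond to an RMSE 10% lower than that of persistence.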
A regression model represents the relationship between explanatory and response variables. In real life, explanatory variables often affect a response variable with a certain time lag, rather than immediately. For example, the marriage rate affects the birth rate with a time lag of 1 to 2 years. Although deep learning models have been successfully used to model various relationships, most of them do not consider the time lags between explanatory and response variables. Therefore, in this paper, we propose an extension of deep learning models that automatically finds the time lags between explanatory and response variables. The proposed method identifies which past values of the explanatory variables minimize the error of the model, and uses these values to determine the time lag between each explanatory variable and the response variable. After determining the time lags, the proposed method retrains the deep learning model with these time lags applied. Through various experiments applying the proposed method to several deep learning models, we confirm that the proposed method can find a more accurate model whose error is reduced by more than 60% compared to the original model.
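The lag-search idea can be sketched with a much simpler model than the paper's. The example below swaps the deep learning model for a least-squares line and tries each candidate lag, keeping the one with the lowest error. The function names and the data are hypothetical, and the authors' actual search procedure may differ.

```python
def fit_line_mse(x, y):
    """Fit y = slope * x + intercept by least squares and return the mean squared error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum((slope * a + intercept - b) ** 2 for a, b in zip(x, y)) / n

def best_lag(x, y, max_lag):
    """Return the lag (0..max_lag) whose shifted x best explains y, plus all errors."""
    targets = y[max_lag:]  # same target window for every lag, so errors are comparable
    errors = {}
    for lag in range(max_lag + 1):
        shifted = x[max_lag - lag:len(x) - lag]
        errors[lag] = fit_line_mse(shifted, targets)
    return min(errors, key=errors.get), errors

# Hypothetical series in which y depends on x from two steps earlier.
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
y = [0.0, 0.0] + [2.0 * value + 1.0 for value in x[:-2]]
lag, errors = best_lag(x, y, max_lag=4)
```

Once the lag is found, the model would be retrained on the shifted inputs, which corresponds to the second training pass described in the abstract.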
This article deals with classifying dog behavior using motion sensors, leveraging a transformer-based deep neural network (DNN) model. Understanding dog behavior is essential for fostering positive relationships between dogs and humans and ensuring the dogs' well-being. Traditional methods often fall short in capturing temporal dependencies and efficiently processing high-dimensional sensor data. Our proposed architecture, inspired by the transformer's success in natural language processing (NLP), uses the self-attention mechanism to identify relevant features across various time scales, making it well suited to real-time applications. The architecture includes only the encoder part, with a classification head that outputs the probabilities of the dog behaviors. We used an open-access dataset covering seven different dog behaviors, captured by motion sensors mounted on the dog's back. Through experimentation and optimization, our model achieves an accuracy of 98.5%, outperforming time series DNN models. The model's efficiency is further highlighted by its reduced computational complexity, lower latency, and smaller size, making it well-suited for deployment in resource-constrained environments.
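The self-attention mechanism at the core of a transformer encoder can be sketched as scaled dot-product attention over a short sequence of sensor feature vectors. For brevity, this sketch uses the inputs directly as queries, keys and values, without the learned projections or multiple heads of a real encoder. The input values are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(sequence):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(sequence[0])
    outputs, weights = [], []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        attention = softmax(scores)
        weights.append(attention)
        # Each output is the attention-weighted average of all time steps.
        outputs.append([sum(attention[t] * sequence[t][j] for t in range(len(sequence)))
                        for j in range(dim)])
    return outputs, weights

# Hypothetical sequence of three time steps with two features each.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(sequence)
```

Each time step attends to every other time step in one pass, which is how attention captures dependencies across different time scales without recurrence.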
When using deep learning applications for human posture estimation (HPE), especially on devices with limited resources, accuracy and efficiency must be balanced. Common deep learning architectures tend to use a large amount of processing power while yielding low accuracy. To address these issues, this work proposes Efficient YoloPose, a new architecture based on You Only Look Once version 8 (YOLOv8)-Pose. Efficient YoloPose uses advanced lightweight methods such as Depthwise Convolution, Ghost Convolution, and the C3Ghost module to replace traditional convolution and C2f (a faster implementation of the Cross Stage Partial Bottleneck). This approach greatly decreases the inference time, parameter count, and computational complexity. To improve posture estimation even further, Efficient YoloPose integrates the Squeeze Excitation (SE) attention method into the network, which focuses the network on the significant areas of an image during posture estimation. Experimental results show that the proposed model performs better than current models on the COCO and OCHuman datasets. Compared to YOLOv8-Pose, the proposed model lowers the inference time from 1.1 milliseconds (ms) to 0.9 ms, the computational complexity from 9.2 giga floating-point operations (GFLOPs) to 4.8 GFLOPs, and the parameter count from 3.3 million to 1.3 million. In addition, the model maintains an average precision (AP) score of 78.8 on the COCO dataset. The source code for Efficient YoloPose has been made publicly available at [https://***/malareeqi/Efficient-YoloPose].
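A short calculation shows why replacing standard convolution with a depthwise variant reduces the parameter count. The sketch compares one standard convolution with a depthwise convolution followed by a 1x1 pointwise convolution, ignoring biases. The layer sizes are hypothetical and are not taken from Efficient YoloPose.

```python
def conv_params(c_in, c_out, kernel):
    """Weights in a standard convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(c_in, c_out, kernel):
    """Weights in a depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
layer_ratio = standard / separable

# Relative reduction in total parameters reported in the abstract (3.3M -> 1.3M).
reported_reduction = 1 - 1.3 / 3.3
```

For this layer the standard convolution has 73,728 weights against 8,768 for the separable version, roughly 8.4 times fewer. The reduction reported for the whole model, about 61%, is smaller than that per-layer saving.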
ISBN:
(print) 9798350353013; 9798350353006
Adjusting camera exposure in arbitrary lighting conditions is the first step to ensure the functionality of computer vision applications. Poorly adjusted camera exposure often leads to critical failure and performance degradation. Traditional camera exposure control methods require multiple convergence steps and time-consuming processes, making them unsuitable for dynamic lighting conditions. In this paper, we propose a new camera exposure control framework that rapidly controls camera exposure while performing real-time processing by exploiting deep reinforcement learning. The proposed framework makes four contributions: 1) a simplified training ground to simulate the diverse and dynamic lighting changes of the real world, 2) a flickering- and image attribute-aware reward design, along with a lightweight state design for real-time processing, 3) a static-to-dynamic lighting curriculum to gradually improve the agent's exposure-adjusting capability, and 4) domain randomization techniques to alleviate the limitations of the training ground and achieve seamless generalization in the wild. As a result, our proposed method reaches a desired exposure level within five steps with real-time processing (1 ms). Also, the acquired images are well-exposed and show superiority in various computer vision tasks, such as feature extraction and object detection.
ISBN:
(print) 9798400707032
To address the issues of limited samples, time-consuming feature design, and low accuracy in the detection and classification of breast cancer pathological images, a breast cancer image classification algorithm combining deep learning and transfer learning is proposed. The algorithm is based on the Densely Connected Convolutional Networks (DenseNet) structure, constructs a network model by introducing attention mechanisms, and is trained on the augmented dataset using multi-level transfer learning. Experimental results demonstrate that the algorithm achieves an efficiency of over 84.0% on the test set, with significantly improved classification accuracy compared to previous models, making it applicable to medical breast cancer detection tasks.