To address the problem that existing fatigue driving detection methods have high model complexity and are difficult to deploy on embedded devices, this paper designs and implements a deep learning-based fatigue...
ISBN:
(Print) 9781510650817; 9781510650800
Real-time monitoring of insects has important applications in entomology, such as managing agricultural pests and monitoring species populations, which are rapidly declining. However, most monitoring methods are labor-intensive, invasive, and not automated. Lidar-based methods are a promising, non-invasive alternative, and have been used in recent years for various insect detection and classification studies. In a previous study, we used supervised machine learning to detect insects in lidar images that were collected near Hyalite Creek in Bozeman, Montana. Although the classifiers we tested successfully detected insects, the analysis was performed offline on a laptop computer. For the analysis to be useful in real-time settings, the computing system needs to be an embedded system capable of computing results in real time. In this paper, we present work in progress towards implementing our software routines in hardware on a field programmable gate array.
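As a rough illustration of the offline supervised-learning step this abstract describes (the step being migrated to the FPGA), here is a minimal Python sketch; the column-statistics features and the random-forest classifier are illustrative assumptions, not the authors' actual pipeline.

```python
# Hedged sketch: an offline supervised insect detector of the kind described above.
# The feature choice (per-column intensity statistics) and the RandomForest
# classifier are illustrative assumptions, not the authors' actual pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def column_features(lidar_image: np.ndarray) -> np.ndarray:
    """Summarize each range-time column of a lidar image with simple statistics."""
    return np.stack([lidar_image.mean(axis=0),
                     lidar_image.std(axis=0),
                     lidar_image.max(axis=0)], axis=1)

rng = np.random.default_rng(0)
frames = rng.random((200, 64, 128))            # stand-in for recorded lidar frames
labels = rng.integers(0, 2, size=(200, 128))   # per-column insect / no-insect labels

X = np.concatenate([column_features(f) for f in frames])
y = labels.ravel()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("offline accuracy:", clf.score(X_test, y_test))
```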
This work presents two approaches to image processing in brain magnetic resonance imaging (MRI) to enhance slice planning during examinations. The first approach involves capturing images from the operator's console during slice planning for two different brain examinations. From these images, Scale-Invariant Feature Transform (SIFT) descriptors are extracted from the regions of interest. These descriptors are then utilized to train and test a model for image matching. The second approach introduces a novel method based on the YOLO (You Only Look Once) neural network, which is designed to automatically align and orient cutting planes. Both methods aim to automate and assist operators in decision making during MRI slice planning, thereby reducing human dependency and improving examination accuracy. The SIFT-based method demonstrated satisfactory results, meeting the necessary requirements for accurate brain examinations. Meanwhile, the YOLO-based method provides a more advanced and automated solution to detect and align structures in brain MRI images. These two distinct approaches are intended to be compared, highlighting their respective strengths and weaknesses in the context of brain MRI slice planning.
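A minimal sketch of the SIFT extraction and matching step from the first approach, assuming OpenCV; the file names and the ratio-test threshold are hypothetical, and the region-of-interest selection and model training are not shown.

```python
# Hedged sketch of SIFT-based matching between planning views.
# File names and the ratio-test threshold are assumptions for illustration.
import cv2

ref = cv2.imread("reference_planning_view.png", cv2.IMREAD_GRAYSCALE)   # hypothetical path
cur = cv2.imread("current_planning_view.png", cv2.IMREAD_GRAYSCALE)     # hypothetical path

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(ref, None)
kp_cur, des_cur = sift.detectAndCompute(cur, None)

# Brute-force matching with Lowe's ratio test to keep distinctive matches only.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_ref, des_cur, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} reliable SIFT correspondences between planning views")
```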
ISBN:
(Print) 9798350376371; 9798350376364
In agricultural robotics, the integration of multispectral image processing and deep learning (DL) has become the state of the art (SOTA) in crop monitoring, yield estimation, and efficient land management. This work addresses the impact of different DL segmentation models and evaluation protocols on multispectral imagery datasets collected by a UAV over vineyards. In terms of evaluation protocols, we have considered train-test split, standard k-fold cross-validation, and group k-fold cross-validation. While the first two assume that the training and test data are drawn from the same underlying distribution, the group k-fold cross-validation protocol assumes that each fold represents a distinct distribution. Most works adopt either a train-test split or k-fold cross-validation under the assumption that both the training and test sets are drawn from the same distribution. However, this assumption is rarely met in real-world applications. Therefore, the objective of this study is to evaluate and compare different evaluation protocols within the context of a real-world agricultural task, highlighting their limitations and weaknesses. Two SOTA DL-based segmentation models, SegNet and DeepLabV3, are employed to perform semantic segmentation on datasets of three vineyards. The models have been trained and tested considering single-modality representations. In addition to the RGB modality, models trained on NDVI, GNDVI, and early fusion are also evaluated. The performance of the models is evaluated using the IoU metric across different dataset configurations. The results indicate that the early fusion representation achieves the highest performance across the various splitting protocols, compared to the single-input representations. The results also show that the train-test and random k-fold splitting approaches report similar results. However, when group k-fold is employed, the performance drops consistently across both models and modalities. This indicates that the models...
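The three evaluation protocols compared in this abstract map directly onto scikit-learn splitters; the sketch below contrasts them, assuming one group label per vineyard (the segmentation models themselves are omitted).

```python
# Hedged sketch contrasting train-test split, k-fold, and group k-fold.
# The group labels (one per vineyard) are an assumption for illustration.
import numpy as np
from sklearn.model_selection import train_test_split, KFold, GroupKFold

n_images = 12
X = np.arange(n_images)                          # indices standing in for multispectral images
vineyard = np.repeat([0, 1, 2], n_images // 3)   # which vineyard each image came from

# 1) Simple train-test split: assumes train and test share one distribution.
train_idx, test_idx = train_test_split(X, test_size=0.25, random_state=0)

# 2) Standard k-fold: same assumption, averaged over folds.
for tr, te in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    pass  # train/evaluate a segmentation model here

# 3) Group k-fold: every fold holds out a whole vineyard, i.e. a distinct distribution.
for tr, te in GroupKFold(n_splits=3).split(X, groups=vineyard):
    print("held-out vineyard(s):", np.unique(vineyard[te]))
```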
Leaf veins form a common visual pattern in nature that provides potential clues for species identification, health evaluation, and variety selection of plants. However, as a critical step in leaf vein pattern analysis, segmenting veins from leaf images remains unaddressed due to their hierarchical curvilinear structure and busy backgrounds. In this study, we design, for the first time, a deep model tailored to the segmentation of the overall leaf vein structure. The proposed deep model, termed Collaborative Up-sampling Decoder U-Net (CUDU-Net), is an improved U-Net structure consisting of a fine-tuned ResNet extractor and a collaborative up-sampling decoder. The ResNet extractor utilizes residual modules to explore high-dimensional features that are representative and abstract in the hidden layers of the network. The core of CUDU-Net is the collaborative up-sampling decoder, which utilizes the complementarity of bilinear interpolation and deconvolution to enhance the decoding capability of the model. The bilinear interpolation can recover key veins, while the deconvolution actively learns to supplement more fine-grained features of the tertiary veins. In addition, we embed strip pooling in the skip connections to distill vein-related semantics for a performance boost. Two leaf vein segmentation datasets, termed SoyVein500 and CottVein20, are built for model validation and generalization ability testing. Extensive experimental results show that the proposed CUDU-Net outperforms state-of-the-art methods in both segmentation accuracy and generalization ability.
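A minimal sketch of a "collaborative" up-sampling step in the spirit described above, combining a bilinear branch with a learned deconvolution branch in PyTorch; channel sizes and the fusion rule are assumptions, not the published CUDU-Net.

```python
# Hedged sketch: a bilinear branch recovers coarse vein structure while a learned
# deconvolution branch adds fine detail; a 3x3 convolution fuses the two.
import torch
import torch.nn as nn

class CollaborativeUpsample(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.bilinear = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        up = torch.cat([self.bilinear(x), self.deconv(x)], dim=1)
        return self.fuse(up)

feat = torch.randn(1, 256, 32, 32)                  # a hidden feature map from the encoder
print(CollaborativeUpsample(256, 128)(feat).shape)  # -> torch.Size([1, 128, 64, 64])
```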
Image processing is a technique for applying certain operations to a photograph in order to produce an enhanced image or extract useful information from it. It is a type of signal processing where the input is a...
Rising urban air pollution poses serious health dangers. Through robots, cloud computing, and deep reinforcement learning, this research proposes a new air purification method. The suggested autonomous system analyzes...
ISBN:
(Print) 9798350305081
We present the system architecture for real-time processing of data that originates in large-format tiled imaging arrays used in wide-area motion imagery ubiquitous surveillance. High performance and high throughput are achieved through approximate computing and fixed-point variable-precision (6 bits to 18 bits) arithmetic. The architecture implements a variety of processing algorithms in what we consider today as Third Wave AI and Machine Intelligence, ranging from convolutional networks (CNNs) to linear and non-linear morphological processing, probabilistic inference using exact and approximate Bayesian methods, and deep neural network-based classification. The processing pipeline is implemented entirely using event-based neuromorphic and stochastic computational primitives. An emulation of the system architecture demonstrated real-time processing of 160 x 120 raw pixel data running on a reconfigurable computing platform (5 Xilinx Kintex-7 FPGAs). The reconfigurable computing implementation was developed to emulate the computational structures of a 2.5D system chiplet design that was fabricated in 55 nm GF CMOS technology. To optimize the energy efficiency of a mixed-level system, a general energy-aware methodology is applied throughout the design process at all levels, from algorithms and architecture all the way down to technology and devices, while keeping the operational requirements and specifications for the task in focus.
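To make the variable-precision fixed-point idea concrete, here is a small sketch that quantizes data at the 6-to-18-bit word lengths quoted above; the signed Q-format split and rounding scheme are illustrative assumptions.

```python
# Hedged sketch of variable-precision fixed-point quantization (6 to 18 bits).
# The choice of fractional bits and the rounding scheme are assumptions.
import numpy as np

def to_fixed(x: np.ndarray, total_bits: int, frac_bits: int) -> np.ndarray:
    """Quantize real values to signed fixed point with the given word length."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int64)

def from_fixed(q: np.ndarray, frac_bits: int) -> np.ndarray:
    return q.astype(np.float64) / (1 << frac_bits)

pixels = np.random.default_rng(0).normal(size=1000)
for bits in (6, 12, 18):                        # sweep the precision range quoted above
    q = to_fixed(pixels, total_bits=bits, frac_bits=bits - 3)
    err = np.abs(pixels - from_fixed(q, bits - 3)).max()
    print(f"{bits:2d}-bit fixed point, max abs error {err:.5f}")
```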
Previous visual object tracking methods employ image-feature regression models or coordinate autoregression models for bounding box prediction. Image-feature regression methods heavily depend on matching results and do not utilize positional priors, while the autoregressive approach can only be trained using bounding boxes available in the training set, potentially resulting in suboptimal performance when tested on unseen data. Inspired by the diffusion model, denoising learning enhances the model's robustness to unseen data. Therefore, we introduce noise to bounding boxes, generating noisy boxes for training and thus enhancing model robustness on testing data. We propose a new paradigm that formulates the visual object tracking problem as a denoising learning process. However, tracking algorithms are usually required to run in real time, and directly applying the diffusion model to object tracking would severely impair tracking speed. Therefore, we decompose the denoising learning process into every denoising block within a model, rather than running the model multiple times, and thus we summarize the proposed paradigm as an in-model latent denoising learning process. Specifically, we propose a denoising Vision Transformer (ViT), which is composed of multiple denoising blocks. Template and search embeddings are projected into every denoising block as conditions. A denoising block is responsible for removing the noise in a predicted bounding box, and multiple stacked denoising blocks cooperate to accomplish the whole denoising process. Subsequently, we utilize image features and trajectory information to refine the denoised bounding box. Besides, we also utilize trajectory memory and visual memory to improve tracking stability. Experimental results validate the effectiveness of our approach, achieving competitive performance on several challenging datasets. The proposed in-model latent denoising tracker achieves real-time speed, rendering denoising learning...
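A minimal sketch of the box-noising idea described above: ground-truth boxes are perturbed during training and a small conditioned block learns to remove the noise; the noise scale and the MLP block are illustrative assumptions, not the published denoising ViT.

```python
# Hedged sketch: corrupt ground-truth boxes with noise, then train a block to
# denoise them conditioned on image features (stand-ins for ViT embeddings).
import torch
import torch.nn as nn

def add_box_noise(boxes: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    """Perturb (cx, cy, w, h) boxes, normalized to [0, 1], with Gaussian noise."""
    return (boxes + scale * torch.randn_like(boxes)).clamp(0.0, 1.0)

class DenoisingBlock(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4 + feat_dim, 256), nn.ReLU(), nn.Linear(256, 4))

    def forward(self, noisy_box: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Predict a residual correction conditioned on template/search features.
        return noisy_box + self.mlp(torch.cat([noisy_box, cond], dim=-1))

gt = torch.tensor([[0.5, 0.5, 0.2, 0.3]])      # one normalized ground-truth box
noisy = add_box_noise(gt)
cond = torch.randn(1, 256)                     # stand-in for fused ViT features
loss = nn.functional.l1_loss(DenoisingBlock()(noisy, cond), gt)
loss.backward()
```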
ISBN:
(Digital) 9798350377033
ISBN:
(Print) 9798350377040; 9798350377033
Electric power inspection robots obtain a large amount of image information during inspection, and manually checking these images for faults is time-consuming and labor-intensive. There is an urgent need for power-image Chinese title generation technology to address this. However, existing image Chinese title generation methods face the problems of small training datasets, differences across specific applications, and few methods for generating Chinese titles for power images. To this end, this paper proposes a self-supervised learning-based image Chinese title generation algorithm for fault detection in electric robot inspection. Specifically, a contrastive learning-based model is built to automatically capture the semantic relationship between images and text. Then, we propose an end-to-end encoder-decoder model combined with an attention mechanism to generate Chinese titles for inspection images. The effectiveness of the proposed algorithm is experimentally verified on two real datasets.
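A minimal sketch of the contrastive image-text alignment step mentioned above, in the style of a symmetric InfoNCE loss; the embeddings are random stand-ins, and the Chinese-title decoder with attention is not shown.

```python
# Hedged sketch: symmetric contrastive loss aligning image and caption embeddings.
# Encoders are omitted; batches of random embeddings stand in for their outputs.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temp: float = 0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temp                  # similarity of every image-text pair
    targets = torch.arange(len(img))               # matching pairs sit on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

img_emb = torch.randn(8, 512, requires_grad=True)  # batch of inspection-image embeddings
txt_emb = torch.randn(8, 512, requires_grad=True)  # embeddings of their Chinese captions
contrastive_loss(img_emb, txt_emb).backward()
```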