ISBN: (print) 9798350379068; 9798350379051
In this study, we propose a technique to improve the accuracy and reduce the size of convolutional neural networks (CNNs) running on edge devices for real-world robot vision applications. CNNs running on edge devices must have a small architecture, and CNNs for robot vision applications involving on-site object recognition must train efficiently to identify specific visual targets from data obtained under a limited variation of conditions. The visual nervous system (VNS) is a good example of a system that meets these requirements because it learns from few visual experiences. Therefore, we used a Gabor filter, a model of the feature extractor of the VNS, as a preprocessor for CNNs and investigated the accuracy of CNNs trained with small amounts of data. To evaluate how well CNNs trained on image data acquired under a limited variation of conditions generalize to data acquired under other conditions, we created a dataset of images acquired from different camera positions and investigated the accuracy of CNNs trained using images acquired at a certain distance. We compared the results across multiple CNN architectures trained with and without Gabor filter preprocessing. The results showed that preprocessing with Gabor filters improves the generalization performance of CNNs and contributes to reducing their size.
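The abstract does not give the filter parameters or the CNN architectures, so the following is only a minimal sketch of what Gabor preprocessing can look like, under stated assumptions. The kernel size, wavelength, sigma, aspect ratio, and number of orientations are all illustrative choices, and the function names are hypothetical. Each orientation produces one response map that could be stacked as an input channel for a CNN.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Return a real-valued Gabor kernel as a list of rows (size assumed odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the pixel coordinates by the filter orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope multiplied by a cosine carrier.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def correlate_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def gabor_features(image, n_orientations=4, size=7, wavelength=4.0, sigma=2.0):
    """Return one response map per orientation (illustrative parameters)."""
    maps = []
    for n in range(n_orientations):
        theta = n * math.pi / n_orientations
        kernel = gabor_kernel(size, wavelength, theta, sigma)
        maps.append(correlate_valid(image, kernel))
    return maps


def energy(feature_map):
    """Sum of squared responses, used here only to compare orientations."""
    return sum(v * v for row in feature_map for v in row)


# Synthetic test image: vertical stripes with a period of 4 pixels.
stripes = [[math.cos(2 * math.pi * x / 4) for x in range(12)] for _ in range(12)]
responses = gabor_features(stripes)
```

On the striped test image, the filter whose carrier varies along the horizontal axis (the first orientation) responds far more strongly than the one rotated by 90 degrees. This orientation selectivity is the property that makes a Gabor bank a plausible fixed front end.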
ISBN: (print) 9781665464680
In this paper, a robot vision recognition system is developed based on the Robot Operating System (ROS) and the Open Source Computer Vision Library (OpenCV). The system mainly implements face recognition, object detection, motion analysis, and object segmentation for the robot.
Pathogenic bacterial growth detection and monitoring is an important scientific process in the field of quality control in the food, water, and medical industries. Very-large-scale process of such bacteria growth moni...
ISBN: (print) 9798350351491; 9798350351484
In recent years, the number of skin cancer cases has increased due to human exposure to UV radiation. Accurate detection of malignant skin cancer at an early stage is therefore crucial for patient therapy and for increasing survival rates. Melanoma is considered the most frequent and dangerous type of skin cancer. Although a large number of deep learning (DL) and machine learning (ML) based classification methods have been introduced in the literature, suspected cases still arise during the clinical diagnosis of malignant lesions. This paper investigates various DL-based models for accurate diagnosis and detection of malignant and benign skin lesions. Transfer learning (TL) techniques are applied to efficient and accurate models pre-trained on the ImageNet dataset, mainly EfficientNet-B0-V2 and the Vision Transformer ViT-b16. Furthermore, a modified convolutional neural network (CNN) model has been adopted and trained from scratch. A publicly available benchmark dataset has been used to evaluate the performance of the proposed models and to compare their effectiveness with existing state-of-the-art methods. The obtained results are 79.70%, 86.52%, and 86.97% for the CNN, EfficientNet-B0-V2, and ViT-b16 models, respectively. The experiments revealed the effectiveness of our proposed models compared to existing DL and ML models for classifying skin lesions as benign or malignant.
With the promotion and application of LNG loading skid automatic control technology and information management technology, LNG loading stations are gradually developing towards unmanned stations. In the unmanned stati...
ISBN: (print) 9798350349405; 9798350349399
Needle localization in ultrasound images is pivotal for the successful execution of ultrasound-guided core needle biopsies. Automating the needle detection process can decrease the procedure time and lead to a more precise diagnosis. In this article, we introduce an automatic method for detecting the core needle and determining its trajectory in 2D ultrasound images. In our approach, the Vision Transformer architecture, renowned for its self-attention mechanisms, is used for needle detection and segmentation, followed by analysis of the Radon-transformed segmentation mask to identify the needle's trajectory. The experiments were performed over two clinical datasets of more than 600 ultrasound images, rigorously split into various training-test subsets, and were backed up with a variety of statistical analyses. They revealed that our approach offers high-quality needle segmentation and significantly outperforms other techniques in identifying the needle's trajectory, with trajectory localization errors reduced by up to more than 5x relative to the most competitive deep learning algorithm. We believe that our work may pave the way for more accurate and efficient ultrasound-guided procedures, ultimately improving patient outcomes.
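The abstract does not say how the Radon-transformed mask is analysed, so the sketch below only illustrates the general idea under simple assumptions. It projects the foreground pixels of a binary mask along each angle (a nearest-neighbour discretisation of the Radon transform, which for a binary mask coincides with a Hough line accumulator) and takes the strongest peak as the line. The function name, the one-degree angular step, and the synthetic mask are hypothetical.

```python
import math


def strongest_line(mask, n_angles=180):
    """Return (theta_deg, rho, votes) for the strongest line in a binary mask.

    The line is parameterised as x*cos(theta) + y*sin(theta) = rho, where
    theta is the direction of the line normal.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    best = (0.0, 0, 0)
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        c, s = math.cos(theta), math.sin(theta)
        # Project every foreground pixel onto the normal direction and
        # histogram the offsets: one column of the sinogram.
        bins = {}
        for x, y in points:
            rho = round(x * c + y * s)
            bins[rho] = bins.get(rho, 0) + 1
        for rho, votes in bins.items():
            if votes > best[2]:
                best = (180.0 * k / n_angles, rho, votes)
    return best


# Hypothetical 20 x 20 segmentation mask containing a diagonal "needle".
mask = [[1 if x == y else 0 for x in range(20)] for y in range(20)]
theta_deg, rho, votes = strongest_line(mask)

# The needle direction is perpendicular to the line normal.
needle_angle_deg = (theta_deg - 90.0) % 180.0
```

On a mask this small, several neighbouring angles collect the same number of votes, so the recovered angle is only accurate to within a degree or so. A real pipeline would need finer sampling and interpolation around the sinogram peak.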
ISBN: (print) 9798350349405; 9798350349399
Vision-language pre-training models have shown significant potential in various domains, but there have been few attempts to introduce them into continual learning for video action recognition. We propose Video Class-Incremental Learner with CLIP based Transformer (VCIL-CT), which uses a CLIP-based vision transformer to train the action recognition task in a class-incremental learning pipeline. To specifically address catastrophic forgetting in the transformer, we introduce Attention Distillation, which distills the attention features from each transformer decoder. Because incremental learning of classes can produce a strong bias towards new classes, we incorporate a Class Balance Module to prevent bias towards the new task. Furthermore, we adopt an Exemplar Augment strategy to improve exemplar quality in the data replay step. We evaluate our proposed method on the incremental action recognition benchmark presented by TCD, using the UCF101, HMDB51, and UESTC-MMEA-CL datasets, and demonstrate the effectiveness of our algorithm compared to existing state-of-the-art continual learning methods for action recognition.
ISBN: (print) 9798350358513; 9798350358520
This paper presents an application of the Complex Fuzzy Set (CFS) concept to the adaptation of an automated condition monitoring method (CMM). It builds on previous work by Ramot, which introduced the core aspects of the CFS and defined a related technique for measuring the similarity between two signals. The technique is adapted to the important problem of predicting the health of a system or machine. Some analyses based on synthetic signals are performed to theoretically support the main aspects of the method. Other results focus on the use of synthetic signals from a robot model to monitor robot joint degradation, showing the potential of the CMM for different industrial applications that could benefit from a soft online condition monitoring approach.
ISBN: (print) 9798350349405; 9798350349399
With the advance of deep learning in the big data era, image/video coding for machines (VCM), the subject of a call for proposals by the Moving Picture Experts Group (MPEG), has become a pivotal technique for a wide range of intelligent vision tasks. However, existing VCM methods typically focus on compressing features independently at each scale, ignoring the redundancy of features across multiple scales. This paper therefore introduces a simple yet effective architecture called hybrid single input and multiple output (H-SIMO) for VCM, which can significantly reduce the redundancy of features across scales. More specifically, because the pyramid structure is commonly employed for localising multi-scale objects, our H-SIMO method compresses all features from a single-scale feature input while retaining the ability to decompress all the features. Moreover, an entropy model is integrated into the training process to efficiently reduce the statistical redundancy of features. During the testing phase, the hybrid coding method, in conjunction with versatile video coding (VVC), is employed to compress the features from both images and videos. We comprehensively evaluate our H-SIMO method on two standard machine vision tasks, object detection and instance segmentation, and the experimental results verify its superior performance.
ISBN: (print) 9781665464680
This paper introduces the structure and operation mode of an automatic production line, based on an actual laser quenching production line for tools in an enterprise. Robot vision integrates workpiece positioning coordinates with robot coordinates so that the robot can position and grasp workpieces through machine vision. The focus is on OpenCV image processing methods. The paper describes the principles and possible problems from the aspects of system structure, robot coordinate calibration, visual identification and positioning, and software design.
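The abstract does not describe the calibration procedure, so the following is only a sketch of one common way to relate image coordinates to robot coordinates. It assumes a planar workspace, negligible lens distortion, and an affine relation between pixel and robot coordinates, and it solves for that relation exactly from three non-collinear reference points. All names and the numeric reference points are hypothetical.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_affine(pixel_pts, robot_pts):
    """Solve X = a*u + b*v + c and Y = d*u + e*v + f from three point pairs."""
    A = [[float(u), float(v), 1.0] for u, v in pixel_pts]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("reference points must not be collinear")

    def solve(rhs):
        # Cramer's rule: replace one column at a time with the right-hand side.
        coeffs = []
        for col in range(3):
            M = [row[:] for row in A]
            for r in range(3):
                M[r][col] = rhs[r]
            coeffs.append(det3(M) / d)
        return coeffs

    row_x = solve([p[0] for p in robot_pts])
    row_y = solve([p[1] for p in robot_pts])
    return row_x, row_y


def pixel_to_robot(affine, u, v):
    """Map a pixel position (u, v) to robot coordinates (X, Y)."""
    (a, b, c), (d, e, f) = affine
    return a * u + b * v + c, d * u + e * v + f


# Hypothetical calibration data: pixel positions of three markers and the
# robot coordinates (in mm) recorded when the tool touched each marker.
pixel_pts = [(0, 0), (10, 0), (0, 10)]
robot_pts = [(100.0, 50.0), (100.0, 55.0), (95.0, 50.0)]
affine = fit_affine(pixel_pts, robot_pts)

# A workpiece detected at pixel (4, 6) maps to this grasp position.
grasp_x, grasp_y = pixel_to_robot(affine, 4, 6)
```

With more than three reference points, a least-squares fit would average out measurement noise. A camera that views the workspace at an angle would call for a homography rather than an affine map.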