ISBN (print): 9783031611360; 9783031611377
Convolutional neural networks (CNNs) play an important role in an increasing number of image processing tasks, and there is a clear demand to improve their classification performance and efficiency. Current research in this area tends to focus on developing increasingly complex models and algorithms to achieve this end, while research into computer vision techniques and data augmentation tends to be neglected. This paper demonstrates that even a very simple CNN model achieves high performance in surface defect classification on the NEU dataset thanks to image preprocessing and data augmentation. The initial F1-score of 0.9646 without image preprocessing increases to 0.9727 when preprocessing is carried out. The simple CNN then achieves an F1-score of 0.9854 after data augmentation.
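As a point of reference for the F1-scores quoted above, the following is a minimal sketch of how a multi-class F1-score can be computed from predictions. The abstract does not say which averaging scheme was used, so macro averaging is an assumption here, and the function name `macro_f1` and the labels in the usage note are purely illustrative.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores.

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme produced its reported scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # class p was predicted wrongly
            fn[t] += 1  # class t was missed
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `macro_f1(["scratch", "scratch", "patch", "patch"], ["scratch", "patch", "patch", "patch"])` averages a per-class F1 of 2/3 and 0.8, giving roughly 0.733.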
ISBN (digital): 9783031442100
ISBN (print): 9783031442094; 9783031442100
Research on vision-language models has seen rapid development, enabling natural language-based processing for image generation and manipulation. Existing text-driven image manipulation is typically implemented by GAN inversion or by fine-tuning diffusion models. The former is limited by the inversion capability of GANs, which fail to reconstruct pictures with novel poses and perspectives. The latter requires expensive optimization for each input, and fine-tuning is still a complex process. To mitigate these problems, we propose a novel approach, dubbed Diffusion-Adapter, which performs text-driven image manipulation using frozen pre-trained diffusion models. In this work, we design an Adapter architecture that modifies the target attributes without fine-tuning the pre-trained models. Our approach can be applied to diffusion models in any domain and takes only a few examples to train an Adapter that can successfully edit images from unknown data. Compared with previous work, Diffusion-Adapter preserves as much detail as possible from the original image without unintended changes to the input content. Extensive experiments demonstrate the advantages of our approach over competing baselines, and we make a novel attempt at text-driven image manipulation.
The increasing deployment of Advanced Driver Assistance Systems (ADAS) alongside the continual rise in camera sensor resolution has led to high bandwidth, and generally high cost, computation, and intra-vehicle commun...
Autonomous terrain classification is an important problem in planetary navigation, whether the goal is to identify scientific sites of interest or to traverse treacherous areas safely. Past Martian rovers have relied ...
In medical image analysis, unsupervised domain adaptation models require retraining when receiving samples from a new data distribution, and multi-source domain generalization methods might be infeasible when there is...
ISBN (print): 9780791888605
Machine vision in quality control and sorting applications enhances organizational [...] vision systems are coupled with a pre-trained Convolutional Neural Network (CNN) to enhance the capability of the system for classification and identification [...]. The overarching research goal of this study is (a) to understand how a CNN decides on classifying threaded fasteners, and (b) to determine how well the CNN's decision making compares with that of a [...]. In order to answer the first research question, an image-based fastener identification model augmented with a pre-trained CNN was [...]. The CNN used is called Efficient-Net-b0, which can perform a wide range of image classification [...]. The training set provided to the Efficient-Net-b0 model consisted of labeled images of 12 types of threaded [...]. The data set was enlarged by using image augmentation techniques such as varying the brightness, contrast, and orientation of the captured [...]. The results produced by the CNN classifier were then parsed through Gradient-weighted Class Activation Mapping (Grad-CAM). This technique produces visual explanations of the decisions made by [...] is the XAI component of this [...] provides transparency in the reasons for the identification and classification done by the Efficient-Net-b0, thereby providing context for the key feature of the threaded fastener that was used to classify and identify [...]. In order to answer the second research question, a user study was [...]. The participants of this study are novice and experienced mechanical engineers enrolled in a Bachelor's and a Master's program at two universities in the United States. The aim of this study was to answer three research sub-questions, each of which was compared to the results from the Efficient-Net-b0 as explained by [...]. The three questions are: i) Can human subjects distinguish between fasteners within the same category?, ii) What features do human subjects look at when distinguishing betw...
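The augmentation step named in this abstract (varying brightness, contrast, and orientation) can be sketched as follows. This is a minimal illustration on a grayscale image stored as a nested list of 0-255 integers; the function names, the parameter values, and the choice of a 90-degree rotation are assumptions made for the example, not details taken from the study.

```python
def adjust_brightness(img, delta):
    """Add a constant offset to every pixel, clamped to the 0-255 range."""
    return [[min(255, max(0, px + delta)) for px in row] for row in img]


def adjust_contrast(img, factor):
    """Scale pixel values about mid-gray (128), clamped to the 0-255 range."""
    return [
        [min(255, max(0, round(128 + factor * (px - 128)))) for px in row]
        for row in img
    ]


def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img):
    """Return three augmented variants of one image.

    The offsets and factors below are illustrative values only.
    """
    return [
        adjust_brightness(img, 40),
        adjust_contrast(img, 1.5),
        rotate_90_clockwise(img),
    ]
```

Each variant keeps the label of its source image, so a set of labeled fastener images grows without any new photographs being captured.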
ISBN (digital): 9798350365504
ISBN (print): 9798350365511
This paper introduces a high dynamic range pixel for early vision processing. Early vision is the first stage in the subsequent extraction of semantic information for image processing or video analytics. This paper proposes to bring said processing to the focal plane, next to a high dynamic range image sensor working on the principle of a lateral overflow capacitor. This brings the benefit of processing scenes with a wide dynamic range in a power-efficient manner. Circuit simulations for edge detection, the example of early vision processing presented in this paper, show that our proposal meets the accuracy typically found in applications like machine vision. Simulations are in XFAB’s XS018 technology.
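For readers unfamiliar with edge detection as an early vision operation, the sketch below shows a software reference of the kind of computation involved. The abstract does not say which edge operator the focal-plane circuit implements, so the Sobel operator used here is an assumption, as is the function name `sobel_magnitude`.

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| using 3x3 Sobel kernels.

    img is a grayscale image as a nested list. Border pixels are left at 0
    because the 3x3 neighbourhood is incomplete there.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out
```

On an image whose rows are all `[0, 0, 10, 10]`, the interior pixels next to the vertical step respond with a value of 40, while a uniform image yields zeros everywhere.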
The field of machine vision has witnessed a significant surge in the application of deep learning technology, as researchers increasingly leverage its capabilities in their work. While deep learning has been extensively used in object detection and semantic segmentation, research on deep learning-based instance segmentation has gained significant traction only in recent years. Instance segmentation is the computer vision task closest to the real human visual experience, and it provides a deep understanding of image scenes. It encompasses more than pixel-level segmentation of various object categories; it also involves the ability to distinguish and separate individual instances within each category. It can be widely applied in fields such as autonomous driving, assisted medical treatment, and remote sensing imaging. This article systematically summarizes typical instance segmentation models in two groups, two-stage and single-stage, analyzes and compares the advantages and disadvantages of the different algorithms, and conducts performance tests on the COCO dataset. It also provides a brief introduction to the COCO dataset and to instance segmentation evaluation indicators. Finally, the possible future development directions and challenges faced by instance segmentation are discussed.
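The evaluation indicators mentioned above build on the overlap between a predicted instance mask and a ground-truth mask. The following is a minimal sketch of mask intersection-over-union (IoU), the quantity that COCO-style average precision thresholds (from 0.50 to 0.95 in steps of 0.05). The function name `mask_iou` and the nested-list mask representation are assumptions made for the example.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks of equal shape.

    Masks are nested lists of 0/1 values. Returns 0.0 when both masks
    are empty, which is a convention chosen for this sketch.
    """
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0  # pixel in both masks
            union += 1 if (a or b) else 0   # pixel in at least one mask
    return inter / union if union else 0.0
```

Two masks that share 2 pixels out of 6 covered in total have an IoU of 1/3, so that prediction would not count as a match at the 0.50 threshold.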
Several sign languages (SL) are used around the world, and BSL (Baby Sign Language) is the process of communication between parents and a baby using gestures. Communication by gestures is a non-verbal process that uses motion to convey facts, expressions and feelings to people. SL is a communication mode in which information is conveyed via the movement of body parts such as the cheeks, eyebrows and head. Even though many research works on SL are available, research on BSL remains a challenge. Hence, this paper presents an optimization-based deep learning system for automated BSL recognition, which determines the gesture signalled by the child. Initially, the image frames are extracted from the videos and data augmentation is performed. After pre-processing, the features are extracted from the frames using the Enhanced Convolution Neural Network (ECNN). The optimal features are then selected by a new Life Choice Based Optimizer (LCBO). Finally, the classification is carried out by the Deep Long Short-Term Memory (DLSTM) scheme. The implementation is performed on the Python platform, and performance is evaluated using several metrics such as accuracy, precision, kappa, F1-score and recall. The proposed approach (ECNN-DLSTM) is compared with several deep learning and machine learning approaches and obtains an accuracy of 99% and a kappa of 96%.
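Of the metrics listed above, kappa is the least familiar, so a minimal sketch of Cohen's kappa follows. It measures agreement between predicted and true labels after correcting for the agreement expected by chance. The abstract does not specify which kappa variant was used, so the unweighted form is an assumption, as is the function name `cohen_kappa`.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement and p_e the agreement expected by chance
    from the marginal label frequencies. When p_e equals 1 the ratio is
    undefined; returning 1.0 there is a convention chosen for this sketch.
    """
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0
```

With true labels `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]`, the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5. This shows why kappa can sit below accuracy, as in the 96% versus 99% reported above.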