ISBN: (print) 9789819612413; 9789819612420
Image classification is one of the fundamental tasks in computer vision (CV) and has numerous practical applications. Traditionally, machine learning and deep learning methods such as k-Nearest Neighbors (kNN), decision trees, and Convolutional Neural Networks (CNNs) have been widely used for this task. However, with the recent emergence of large language models (LLMs) such as Generative Pre-trained Transformers (GPT), which were originally designed for natural language processing, their cross-domain applications, including CV, are now being explored. In this paper, we investigate the capabilities of GPT-4o, a variant of the GPT model, for image classification on the Fashion-MNIST dataset. Using carefully designed prompts, we evaluate GPT-4o's performance and compare it with more traditional models. Our study offers insights into the cross-domain potential of GPT models, explores how prompt engineering can enhance GPT's performance on image classification tasks, and suggests new avenues for developing more flexible and adaptable multimodal LLM systems. The code can be found at https://***/Tanghaha1424/gpt-fashionmnist.
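The prompt-based setup described in this abstract can be illustrated with a minimal sketch. It is not the authors' code or prompt: the prompt wording and the helper names `build_prompt` and `parse_label` are assumptions made for illustration, and the call to the model itself is deliberately left out. Only the ten class names come from the Fashion-MNIST dataset.

```python
import re

# The ten Fashion-MNIST class names, indexed by their label id.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def build_prompt(classes=FASHION_MNIST_CLASSES):
    """Build a hypothetical classification prompt listing every class with its id."""
    options = "\n".join(f"{i}: {name}" for i, name in enumerate(classes))
    return (
        "You are shown a 28x28 grayscale image of a clothing item.\n"
        "Choose the single best category and answer with its number only.\n"
        + options
    )


def parse_label(reply, classes=FASHION_MNIST_CLASSES):
    """Map a free-text model reply to a class id, or None if it cannot be parsed."""
    # Prefer an explicit numeric answer that falls inside the valid id range.
    match = re.search(r"\b(\d+)\b", reply)
    if match and int(match.group(1)) < len(classes):
        return int(match.group(1))
    # Otherwise look for a class name; longest names first so that
    # "T-shirt/top" is not shadowed by the shorter name "Shirt".
    lowered = reply.lower()
    for name in sorted(classes, key=len, reverse=True):
        if name.lower() in lowered:
            return classes.index(name)
    return None


def accuracy(replies, true_labels):
    """Fraction of replies whose parsed label matches; unparsable replies count as wrong."""
    hits = sum(parse_label(r) == y for r, y in zip(replies, true_labels))
    return hits / len(true_labels)
```

Constraining the reply to a number keeps parsing simple, and the name-matching fallback tolerates replies that ignore that instruction.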
This research introduces "Jaddah," an innovative AI-based system for the automated detection of road infrastructure defects using advanced computer vision and machine learning techniques. The system addresses...
This study investigates the capabilities and flexibility of edge devices for real-time data processing near the source. A configurable Nvidia Jetson Nano system is used to deploy nine pre-trained computer vision model...
Object Detection and Tracking with Recognition (ODTR) is a predominant field that finds significant applications in surveillance and in assisting visually impaired (VI) people. Though ODTR is possible with the implicatio...
ISBN: (print) 9798350337440
The reconfiguration of machine vision systems depends heavily on the collection and availability of large datasets, rendering them inflexible and vulnerable to even minor changes in the data. This paper proposes a refinement of Miller's Cartesian Genetic Programming methodology, aimed at generating filter pipelines for image processing tasks. The approach is based on CGP-IP but specifically adapted for image processing in industrial monitoring applications. The suggested method allows filter pipelines to be retrained using small datasets; this concept of self-adaptivity makes high-precision machine vision more resilient to faulty machine settings or changes in the environment and yields compact programs. A dependency graph is introduced to rule out invalid pipeline solutions. Furthermore, we suggest not only generating pipelines from scratch but also storing and reapplying previous solutions and readjusting filter parameters. Our modifications are designed to increase the likelihood of early convergence and of improvement in the fitness indicators. This form of self-adaptivity allows for a more resource-efficient configuration of image filter pipelines with small datasets.
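One way to picture a dependency graph that rules out invalid pipelines is sketched below. The abstract does not say how the graph is built, so this is an assumption: each filter declares an input and an output image type, and an edge from one filter to another exists when the second can consume what the first produces. The filter names, the type labels, and both helper functions are hypothetical.

```python
# Hypothetical filter catalogue: name -> (input image type, output image type).
FILTERS = {
    "to_gray": ("color", "gray"),
    "gaussian_blur": ("gray", "gray"),
    "sobel": ("gray", "gray"),
    "threshold": ("gray", "binary"),
    "erode": ("binary", "binary"),
    "dilate": ("binary", "binary"),
}


def build_dependency_graph(filters):
    """Return {a: set of filters b} where b accepts the image type that a outputs."""
    return {
        a: {b for b, (b_in, _) in filters.items() if b_in == a_out}
        for a, (_, a_out) in filters.items()
    }


def is_valid_pipeline(pipeline, filters, graph, input_type="color"):
    """Check that the first filter accepts the input and every step follows an edge."""
    if not pipeline or any(name not in filters for name in pipeline):
        return False
    if filters[pipeline[0]][0] != input_type:
        return False
    return all(b in graph[a] for a, b in zip(pipeline, pipeline[1:]))


graph = build_dependency_graph(FILTERS)
print(is_valid_pipeline(["to_gray", "gaussian_blur", "threshold", "erode"], FILTERS, graph))
print(is_valid_pipeline(["to_gray", "erode"], FILTERS, graph))
```

In a genetic-programming loop, a check of this kind could reject a candidate before its fitness is evaluated, so no evaluation time is spent on pipelines that cannot run.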
The integration of vision and language has propelled the advancement of artificial intelligence systems. Visual Question Answering (VQA) stands at the nexus of computer vision and natural language processing, enabling...
Sonar image segmentation is crucial for underwater target tracking, among other applications. Because of the undersea environment, sonar images readily pick up noise, which leads to poor tracking performance. ...
ISBN: (print) 9783031683015; 9783031683022
The transition to Industry 4.0 intensifies the demand for advanced manufacturing techniques and efficient data processing capabilities. A notable challenge in engineering is that many older engineering drawings are available only in paper form, creating significant barriers for modern automated systems. This study tackles these challenges by employing advanced deep-learning techniques alongside traditional image processing to convert legacy engineering drawings into structured, machine-readable formats. Following this digitization process, the multi-modal approach further processes drawings that contain large amounts of heterogeneous data, filtering out non-essential details to isolate and extract critical features. This enables the conversion of complex drawings into formats suitable for computer vision and deep learning applications. The resulting structured datasets are then used to significantly enhance the efficiency of automated processes. For instance, they enable more efficient pick-and-place operations by providing the data necessary for machine learning-driven automation.
Independent adversarial sample detection is an important problem in the field of computer vision and machine learning, especially in the context of the widespread use of deep learning models. This can lead to misclass...
ISBN: (print) 9789819916443; 9789819916450
In recent years, underwater image processing has been a hot topic in machine vision, especially for underwater robots. A key part of underwater image processing is underwater image restoration, which is an essential but challenging task in the field of image processing. In this article, we propose an underwater image restoration framework based on physical priors, called PPIR-Net. PPIR-Net combines prior knowledge with deep learning to greatly improve the structural texture and color information of underwater images. The framework estimates underwater transmission maps and underwater scattering maps through the structure restoration network (SRN). Moreover, the color correction network (CCN) is used to achieve image color correction. Extensive experimental results show that our method exceeds state-of-the-art methods on underwater image evaluation metrics.
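To show how a transmission map and a scattering map can be used once they have been estimated, here is a minimal sketch. It assumes the commonly used simplified image formation model, observed = scene * transmission + scattering. The abstract does not state which physical model PPIR-Net uses, so this is an illustration and not the authors' method. The function names and the lower bound `t_min` are hypothetical, and images are plain nested lists of intensities in [0, 1].

```python
def restore_pixel(observed, transmission, scattering, t_min=0.1):
    """Invert observed = scene * transmission + scattering for one intensity value."""
    # Bound the transmission from below (assumed safeguard) so that very small
    # values do not amplify noise.
    t = max(transmission, t_min)
    restored = (observed - scattering) / t
    # Clip to the valid intensity range [0, 1].
    return min(1.0, max(0.0, restored))


def restore_image(observed, transmission, scattering, t_min=0.1):
    """Apply restore_pixel element-wise to three same-sized 2D lists."""
    return [
        [restore_pixel(i, t, s, t_min) for i, t, s in zip(row_i, row_t, row_s)]
        for row_i, row_t, row_s in zip(observed, transmission, scattering)
    ]


# Usage: synthesize a degraded pixel from a known scene value, then invert it.
scene, t, ambient = 0.8, 0.5, 0.4
scatter = ambient * (1 - t)      # scattering term under the assumed model
degraded = scene * t + scatter   # what the camera would observe
print(restore_image([[degraded]], [[t]], [[scatter]]))
```

Under this assumed model, the learned networks would supply the two maps and the inversion itself would be simple arithmetic.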