To meet the needs of teaching and practical applications in machine vision technology, a virtual reality-based machine vision experimental platform has been designed and developed. Unity3D was utilized as the developm...
In the exploration of robot vision systems based on artificial neural networks, the research mainly focuses on their applications in 3D information recognition and processing. By simulating the processing of the human...
ISBN:
(Print) 9783031581731; 9783031581748
India holds the title of being the top banana producer globally, contributing approximately 25% of total banana production. However, exporting bananas can be a challenge because of their limited shelf life. To propose the best possible shelf-life extension methodology, it is important to classify bananas by variety and ripening stage to ensure sustainable growth and nutritional value. There are still not enough datasets covering different banana varieties and their respective ripening stages. A review of research publications from the last five years has been conducted using electronic databases such as Scopus, Google Scholar, and ResearchGate, as well as publicly accessible dataset repository sites. The dataset captures images of different varieties of banana fruit at their respective stages of ripening. Banana varieties considered include Robusta (Musa AA), Dwarf Cavendish (Musa acuminata), Nanjangud bananas, and Red bananas (Musa acuminata). The dataset contains over 41,900 processed images. In this paper, the authors provide researchers with an opportunity to develop and investigate machine learning and deep learning algorithms used to predict and extend the shelf life of banana fruits.
Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications. Traditionally, task-specific models are developed for individual tasks, and designing such models requires distinct expertise. Building upon the success of large language models (LLMs) in natural language processing (NLP), there is a similar trend in computer vision, which focuses on developing large-scale models through pretraining and in-context learning. This paradigm shift reduces the reliance on task-specific models, yielding a powerful unified model to deal with various tasks. However, these advances have predominantly concentrated on high-level vision tasks, with less attention paid to low-level vision tasks. To address this issue, we propose a universal model for general image processing that covers image restoration, image enhancement, image feature extraction tasks, etc. Our proposed framework, named PromptGIP, unifies these diverse image processing tasks within a universal framework. Inspired by NLP question answering (QA) techniques, we employ a visual prompting question answering paradigm. Specifically, we treat the input-output image pair as a structured question-answer sentence, thereby reprogramming the image processing task as a prompting QA problem. PromptGIP can undertake diverse cross-domain tasks using provided visual prompts, eliminating the need for task-specific finetuning. Capable of handling up to 15 different image processing tasks, PromptGIP represents a versatile and adaptive approach to general image processing. Codes will be available at https://***/lyh-18/PromptGIP. Copyright 2024 by the author(s)
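The visual prompting QA idea described in this abstract can be sketched at the data level: a (question, answer) example image pair and a new query image are patchified and concatenated into one token sequence, from which a model would predict the query's "answer" image. The function names and patch size below are illustrative assumptions, not PromptGIP's actual implementation.

```python
import numpy as np

def patchify(img: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping p x p patches."""
    h, w, c = img.shape
    patches = img.reshape(h // p, p, w // p, p, c).swapaxes(1, 2)
    return patches.reshape(-1, p * p * c)        # (num_patches, patch_dim)

def build_visual_prompt(q_img, a_img, query_img, p=4):
    """Concatenate a prompt pair (question, answer) with a query image into
    one token sequence; a model would predict the query's 'answer' tokens."""
    tokens = [patchify(x, p) for x in (q_img, a_img, query_img)]
    return np.concatenate(tokens, axis=0)

# Toy 8x8 RGB images: a degraded input, its clean target, and a new query.
rng = np.random.default_rng(0)
q, a, query = (rng.random((8, 8, 3)) for _ in range(3))
seq = build_visual_prompt(q, a, query)
# 3 images x (8/4)^2 = 4 patches each -> 12 tokens of dimension 4*4*3 = 48
```

In this framing, switching tasks only changes the (question, answer) pair supplied at inference time, which is what removes the need for task-specific finetuning.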
Particulate matter in the atmosphere obscures visibility, causing a condition known as haze. Other natural phenomena like mist, fog, and dust also obscure vision; this is because of scattering...
ISBN:
(Digital) 9783031581816
ISBN:
(Print) 9783031581809; 9783031581816
Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline that highlights the most critical regions of an image used in a deep learning model's decision-making process, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code is available at https://***/beasthunter758/GradEML.
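The Grad-CAM computation this abstract builds on reduces to a few array operations: the channel weights are the global-average-pooled gradients of the class score with respect to a convolutional layer's feature maps, and the heatmap is the ReLU of the weighted sum of those maps. A minimal NumPy sketch, assuming the activations and gradients have already been extracted from the network:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heatmap from one conv layer's outputs.

    activations: feature maps A_k, shape (C, H, W)
    gradients:   d(class score)/dA_k, shape (C, H, W)
    """
    # Channel importance weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))                       # shape (C,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] for overlaying on the input image.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 2 channels on a 4x4 grid; only channel 0 has positive weight.
acts = np.stack([np.ones((4, 4)), np.zeros((4, 4))])
grads = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -0.5)])
heatmap = grad_cam(acts, grads)
```

In a real pipeline the gradients would come from a backward hook on the chosen layer, and the heatmap would be upsampled to the input resolution before visualization.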
Traditional remote sensing image processing is not able to provide timely information for near real-time applications due to the hysteresis of satellite-ground mutual communication and low processing efficiency. On-bo...
ISBN:
(Print) 9798350318920; 9798350318937
Recent works have shown that objects discovery can largely benefit from the inherent motion information in video data. However, these methods lack proper background processing, resulting in an over-segmentation of the non-object regions into random segments. This is a critical limitation given the unsupervised setting, where object segments and noise are not distinguishable. To address this limitation, we propose BMOD, a Background-aware Motion-guided Objects Discovery method. Concretely, we leverage masks of moving objects extracted from optical flow and design a learning mechanism to extend them to the true foreground composed of both moving and static objects. The background, a complementary concept of the learned foreground class, is then isolated in the object discovery process. This enables joint learning of the objects discovery task and object/non-object separation. The experiments conducted on synthetic and real-world datasets show that integrating our background handling with various cutting-edge methods consistently brings a considerable improvement. Specifically, we improve objects discovery performance by a large margin, while establishing a strong baseline for object/non-object separation.
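The first ingredient described above, masks of moving objects extracted from optical flow, can be illustrated with a deliberately simplified sketch that thresholds per-pixel flow magnitude; BMOD's actual extraction procedure and threshold choice may differ, and this is only a sketch under that simplifying assumption.

```python
import numpy as np

def moving_object_mask(flow: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Binary mask of 'moving' pixels from a dense optical-flow field.

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements.
    tau:  magnitude threshold separating real motion from background noise
          (an illustrative hyperparameter, not a value from the paper).
    """
    magnitude = np.linalg.norm(flow, axis=-1)    # per-pixel flow magnitude
    return magnitude > tau

# Toy flow field: a 2x2 patch moving right inside a static 6x6 frame.
flow = np.zeros((6, 6, 2))
flow[2:4, 2:4] = [3.0, 0.0]
mask = moving_object_mask(flow)
print(mask.sum())   # 4 moving pixels
```

Such masks cover only currently moving objects; the paper's contribution is the learning mechanism that extends them to static foreground objects as well, so the complement can safely be treated as background.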
Depth information is useful in many image processing and computer vision applications, but in photography, depth information is lost in the process of projecting a real-world scene onto a 2D plane. Extracting depth in...
ISBN:
(Print) 9798350370287; 9798350370713
In recent years, Transformer models have revolutionized machine learning. While this has resulted in impressive results in the field of natural language processing, computer vision quickly stumbled upon computation and memory problems due to the high resolution and dimensionality of the input data. This is particularly true for video, where the number of tokens increases cubically relative to the frame and temporal resolutions. A first approach to solving this was Vision Transformers, which introduce a partitioning of the input into embedded grid cells, lowering the effective resolution. More recently, Swin Transformers introduced a hierarchical scheme that brought the concepts of pooling and locality to Transformers in exchange for much lower computational and memory costs. This work proposes a reformulation of the latter that views Swin Transformers as regular Transformers applied over a quadtree representation of the input, intrinsically providing a wider range of design choices for the attentional mechanism. Compared to similar approaches such as Swin and MaxViT, our method works on the full range of scales while using a single attentional mechanism, allowing us to simultaneously take into account both dense short-range and sparse long-range dependencies with low computational overhead and without introducing additional sequential operations, thus making full use of GPU parallelism.
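Two building blocks behind the hierarchical/quadtree view of Swin can be sketched with plain array reshapes: partitioning a feature map into the local windows over which attention is computed, and merging 2x2 cells to move one quadtree level up (analogous to Swin's patch merging). The helper names and sizes are illustrative, not the paper's code.

```python
import numpy as np

def window_partition(x: np.ndarray, win: int) -> np.ndarray:
    """Split an (H, W, C) feature map into non-overlapping win x win windows,
    the local cells over which Swin-style attention is computed."""
    h, w, c = x.shape
    x = x.reshape(h // win, win, w // win, win, c).swapaxes(1, 2)
    return x.reshape(-1, win * win, c)            # (num_windows, tokens, C)

def merge_2x2(x: np.ndarray) -> np.ndarray:
    """One quadtree level up: merge each 2x2 cell of tokens into one token
    by concatenating their channels, halving the spatial resolution."""
    h, w, c = x.shape
    x = x.reshape(h // 2, 2, w // 2, 2, c).swapaxes(1, 2)
    return x.reshape(h // 2, w // 2, 4 * c)

feat = np.zeros((8, 8, 16))
wins = window_partition(feat, 4)   # 4 windows of 16 tokens each
coarse = merge_2x2(feat)           # next quadtree level: (4, 4, 64)
```

Viewing the input as a quadtree makes both operations instances of one tree traversal, which is what opens up the wider design space for the attention pattern that the abstract mentions.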