In this paper, the 3D space imaging model of machine vision is constructed. Starting from the traditional machine vision image processing algorithm flow, the image denoising process and target tracking process are opt...
ISBN (digital): 9783031581816
ISBN (print): 9783031581809; 9783031581816
Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline technique that highlights the image regions most critical to a deep learning model's decision, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks, such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code is available at https://***/ beasthunter758/GradEML.
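The abstract gives no implementation details, so the following is only a minimal sketch of the standard Grad-CAM computation: each feature map is weighted by the global average of its class-score gradient, the weighted maps are summed, and a ReLU keeps the positively contributing regions. The toy feature maps and gradients are hypothetical values chosen for illustration, and the final rescaling to [0, 1] is a common display convention rather than something stated in the source.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Combine K feature maps and their class-score gradients into one heatmap."""
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = global average of the gradient over that feature map.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    # ReLU keeps only the regions that push the class score up.
    cam = [[max(value, 0.0) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    # Rescale to [0, 1] for display; an all-zero map is returned unchanged.
    return [[value / peak for value in row] for row in cam] if peak > 0 else cam


# Hypothetical toy input: two 2x2 feature maps and their gradients.
activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
print(heatmap)  # [[0.5, 0.0], [0.0, 1.0]]
```

In a real model the activations and gradients would come from the last convolutional layer, and the heatmap would be upsampled to the input resolution before being overlaid on the image.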
Infrared-visible image fusion combines complementary information from both modalities, enhancing scene perception in applications such as surveillance and autonomous driving. However, existing deep learning-based meth...
ISBN (print): 9798350318920; 9798350318937
Large-scale models trained on extensive datasets have emerged as the preferred approach due to their high generalizability across various tasks. In-context learning (ICL), a popular strategy in natural language processing, uses such models for different tasks by providing instructive prompts without updating model parameters. This idea is now being explored in computer vision, where an input-output image pair (called an in-context pair) is supplied to the model together with a query image as a prompt to exemplify the desired output. The efficacy of visual ICL often depends on the quality of the prompts. We thus introduce a method called Instruct Me More (InMeMo), which augments in-context pairs with a learnable perturbation (prompt), to explore its potential. Our experiments on mainstream tasks reveal that InMeMo surpasses the current state-of-the-art performance. Specifically, compared to the baseline without a learnable prompt, InMeMo boosts mIoU scores by 7.35 and 15.13 for foreground segmentation and single object detection tasks, respectively. Our findings suggest that InMeMo offers a versatile and efficient way to enhance the performance of visual ICL with lightweight training. Code is available at https://***/Jackieam/InMeMo.
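For readers unfamiliar with the mIoU metric quoted above, the sketch below shows one common way to compute it: the intersection-over-union of each predicted binary mask with its ground truth, averaged over images and reported on a 0 to 100 scale. This is an assumption about the metric's general form; the paper's exact evaluation protocol is not given in the abstract, and the masks here are hypothetical.

```python
from typing import List, Sequence

Mask = List[List[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return intersection / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average per-image IoU, scaled to the 0-100 range."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical predictions and ground-truth masks for two 2x2 images.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
score = mean_iou(preds, targets)
print(score)  # 75.0
```

On this scale, a gain of 7.35 corresponds to 7.35 percentage points of average mask overlap.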
The rapid growth of computer vision-based applications, including smart cities and autonomous driving, has created a pressing demand for efficient 360° image compression and computer vision analytics. In most circums...
ISBN (digital): 9781394173358
ISBN (print): 9781394173327
Machine Learning Applications: a practical resource on the importance of machine learning and deep learning applications in various technologies and real-world situations. Machine Learning Applications discusses methodological advancements of machine learning and deep learning; presents applications in image processing, including face and vehicle detection, image classification, object detection, and image segmentation; delivers real-world applications in healthcare for identifying and diagnosing diseases, such as creating smart health records and medical imaging diagnosis; and provides real-world examples, case studies, use cases, and techniques to enable the reader's active learning. Composed of 13 chapters, this book also introduces real-world applications of machine and deep learning in blockchain technology, cyber security, and climate change. AI and robotic applications in mechanical design are also explained, including robot-assisted surgeries, security, and space exploration. The book describes the importance of each subject area and details why they matter from a societal and human perspective. Edited by two highly qualified academics and contributed to by established thought leaders in their respective fields, Machine Learning Applications includes information on: content-based medical image retrieval (CBMIR), covering face and vehicle detection, multi-resolution and multisource analysis, manifold and image processing, and morphological processing; smart medicine, including machine learning and artificial intelligence in medicine, risk identification, tailored interventions, and association rules; AI and robotics applications for transportation and infrastructure (e.g., autonomous cars and smart cities), along with global warming and climate change; and identifying diseases and diagnosis, drug discovery and manufacturing, medical imaging diagnosis, personalized medicine, and smart health records. With its practical approach to the subject, Ma
When an underwater camera captures aerial targets, the received light undergoes refraction at the water-air interface. In particular, calm water compresses the image, while turbulent water causes nonlinear distortion in the captured images. However, existing methods for correcting water-to-air distortion often leave residual distortion or overall shifts in the output images. To address this issue, we propose a multi-strategy hybrid framework to process image sequences effectively, particularly for high-precision applications. Our framework includes a spatiotemporal crossover block to transform and merge features, effectively addressing the template-free problem. Additionally, we introduce an enhancement network to produce a high-quality template in the first stage and a histogram template method to maintain high chromaticity and reduce template noise in the correction stage. Furthermore, our framework incorporates a new registration scheme to facilitate sequence transfer and processing. Compared to existing algorithms, our approach achieves a high restoration level in terms of morphology and color for publicly available image sequences. (c) 2024 Optica Publishing Group. All rights, including for text and data mining (TDM), Artificial Intelligence (AI) training, and similar technologies, are reserved.
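The compression caused by calm water follows from Snell's law at a flat interface. The sketch below is not taken from the paper; it assumes a perfectly flat surface and typical refractive indices (1.0 for air, 1.333 for water) to show how the whole hemisphere above the surface is squeezed into a much narrower cone as seen from below.

```python
import math

N_AIR = 1.0      # assumed refractive index of air
N_WATER = 1.333  # assumed refractive index of water


def refracted_angle_deg(theta_air_deg: float) -> float:
    """Angle from the surface normal in water for a ray arriving from air.

    Applies Snell's law: N_AIR * sin(theta_air) = N_WATER * sin(theta_water).
    """
    sin_water = N_AIR * math.sin(math.radians(theta_air_deg)) / N_WATER
    return math.degrees(math.asin(sin_water))


# Rays from the zenith (0 degrees) down to the horizon (90 degrees) in air.
for theta_air in (0, 30, 60, 90):
    print(theta_air, round(refracted_angle_deg(theta_air), 1))
```

Under these assumptions a ray grazing the horizon arrives at roughly 48.6 degrees from the normal underwater, so a 180-degree aerial view occupies a cone of about 97 degrees. Turbulent water changes the local surface normal from point to point, which is why the resulting distortion is nonlinear.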
ISBN (print): 9798350343557
In recent years, deep learning has been successfully applied to medical images due to its ability to learn highly complex, multidimensional data. However, deep learning models are known to be vulnerable to adversarial machine learning attacks, which add a small, imperceptible perturbation to a legitimate input image and cause the model to produce incorrect results. In this study, the Fast Gradient Sign Method is applied to Vision Transformer, a basic pre-trained transformer-based model used for binary classification of a publicly available skin lesion dataset, and the robustness of the model is analyzed. Then, adversarial training is used to improve the robustness of the model against adversarial attacks. The experimental results show that the classification accuracy drops from 90.1% to 27.38% even for a small perturbation, and that adversarial training increases the model's robustness, yielding an accuracy of 96.61%.
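The Fast Gradient Sign Method perturbs each input value by a fixed step epsilon in the direction that increases the loss: x_adv = x + epsilon * sign(dL/dx). The sketch below illustrates this on a toy logistic-regression classifier, which stands in for the Vision Transformer used in the study; the weights, input, and epsilon are hypothetical, and the clipping of pixel values to a valid range is omitted for brevity.

```python
import math
from typing import List


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability of the positive class under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value: float) -> int:
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm(weights: List[float], bias: float, x: List[float],
         label: int, epsilon: float) -> List[float]:
    """One FGSM step: move each input by epsilon along the loss-gradient sign."""
    p = predict(weights, bias, x)
    # For binary cross-entropy with a logistic model, dL/dx = (p - label) * w.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input belonging to the positive class.
weights, bias = [2.0, -1.0], 0.0
x, label = [0.3, 0.1], 1
x_adv = fgsm(weights, bias, x, label, epsilon=0.2)

print(predict(weights, bias, x) > 0.5)      # True: clean input classified correctly
print(predict(weights, bias, x_adv) > 0.5)  # False: perturbed input is misclassified
```

Adversarial training, as described in the abstract, would add such perturbed inputs with their correct labels to the training set so that the model learns to classify them correctly.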
ISBN (print): 9798400716553
Modern smartphones usually have automatic camera adjustment features that predetermine how images will be processed. Without intervention from the user (e.g., manual adjustment of exposure settings, addition/removal of certain image filters), the predetermined camera settings dictate the look and feel of the images taken. Since higher-end mobile devices tend to produce clearer images in a more visually appealing style, image enhancement on entry-level devices could be performed by transferring the style from a higher-end device to a lower-end one. This paper proposes a learning-based, style-driven image enhancement method for entry-level devices. Using a deep residual style transfer network, we train a model that learns the relationship between images taken with a high-end device and those taken with an entry-level device, creating a filter that can be used to enhance images captured by an entry-level device. Our quantitative and qualitative analyses show that the proposed method can enhance images to match the quality produced by higher-end mobile device cameras.
Image captioning is a relatively recent area at the intersection of computer vision and natural language processing and is widely used in applications such as multi-modal search, robotics, security, remote sensing, medicine, and visual aid. Image captioning techniques have witnessed a paradigm shift from classical machine-learning-based approaches to contemporary deep-learning-based techniques. In this survey, we present an in-depth investigation of image captioning methodologies using our proposed taxonomy. Furthermore, the study investigates several eras of image captioning advancements, including template-based, retrieval-based, and encoder-decoder-based models. We also explore captioning in languages other than English. Benchmark image captioning datasets and assessment measures are also examined thoroughly. The limited effectiveness of real-time image captioning is a severe barrier that prevents its use in sensitive applications such as visual aid, security, and medicine. Another observation from our research is the scarcity of personalized domain datasets, which limits the adoption of image captioning for more advanced problems. Despite influential contributions from several academics, further efforts are required to construct substantially more robust and reliable image captioning models.