Search Results - Inner Mongolia University Library

2023 5th International Conference on Artificial Intelligence and Computer Applications, ICAICA 2023

Authors: Li, Yuan; Yu, Xin. Affiliation: Modern Finance Industry School, Shandong Institute of Commerce and Technology, Jinan, Shandong, China

ISBN: (print) 9798350323313

In this paper, a 3D space imaging model for machine vision is constructed. Starting from the traditional machine vision image processing pipeline, the image denoising and target tracking stages are optimized. The method uses a camera to collect image and video information of the measured object and transmits it to a controller. The controller corrects the signal obtained by the wireless sensor in the database to reproduce the position and the 3D image of the measured object. A real-time motion trajectory tracking method based on computer vision is presented, which provides autonomous object capture, 3D positioning, and motion trajectory tracking. Simulation experiments show that this method differs considerably from conventional image processing methods: it has a low computational cost, fast running speed, and good real-time performance, and it meets the needs of embedded image processing. © 2023 IEEE.

Keywords: image recognition

Is Grad-CAM Explainable in Medical Images?

8th International Conference on Computer Vision and Image Processing (CVIP)

Authors: Suara, Subhashis; Jha, Aayush; Sinha, Pratik; Sekh, Arif Ahmed. Affiliation: XIM Univ, Bhubaneswar, India

ISBN: (digital) 9783031581816

ISBN: (print) 9783031581809; 9783031581816

Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline that highlights the most critical regions of an image used in a deep learning model's decision-making process, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code is available in (https://***/ beasthunter758/GradEML).

Keywords: Explainable Deep Learning; Gradient-weighted Class Activation Mapping (Grad-CAM); Medical Image Analysis
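
The abstract above describes Grad-CAM as highlighting the image regions that matter most to a model's decision. As a rough illustration only, the sketch below applies the standard Grad-CAM weighting step to made-up feature maps and gradients: each channel is weighted by the spatial mean of its gradient, the weighted maps are summed, and a ReLU keeps the positive evidence. The function name `grad_cam` and all numbers are assumptions for this example and are not taken from the paper or its repository.

```python
# Grad-CAM weighting step on toy data (all values are invented for illustration).

def grad_cam(activations, gradients):
    """Return the heatmap ReLU(sum_k alpha_k * A_k).

    activations, gradients: lists of K feature maps, each an H x W list of floats.
    alpha_k is the spatial mean of the class-score gradient for feature map k.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    # alpha_k: global-average-pool the gradients of each channel.
    alphas = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]
    heatmap = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * fmap[i][j]
    # ReLU keeps only locations with a positive influence on the class score.
    return [[max(0.0, value) for value in row] for row in heatmap]


activations = [
    [[1.0, 0.0], [0.0, 2.0]],  # feature map 0
    [[0.0, 3.0], [1.0, 0.0]],  # feature map 1
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # mean gradient  0.5
    [[-1.0, -1.0], [-1.0, -1.0]],  # mean gradient -1.0
]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

In practice the activations and gradients would come from a convolutional layer of a trained network, and the heatmap would be upsampled to the input resolution before being overlaid on the image.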

Model Pruning for Infrared-Visible Image Fusion

2nd International Conference on Machine Vision, Image Processing and Imaging Technology, MVIPIT 2024

Authors: Chen, Qi; Feng, Rui. Affiliation: State Grid Beijing Electric Power Company, Beijing, China

ISBN: (print) 9798331543037

Infrared-visible image fusion combines complementary information from both modalities, enhancing scene perception in applications such as surveillance and autonomous driving. However, existing deep learning-based methods are often computationally expensive. Our approach involves training an over-parameterized fusion network, applying structured pruning to reduce model complexity, and fine-tuning the pruned model to maintain performance. The pruning process leverages the L2-norm of the Restormer blocks, ensuring that less critical components are removed while preserving essential fusion quality. Experiments on the benchmark datasets demonstrate that our approach achieves high fusion quality with significantly reduced computational costs. Ablation studies further validate the effectiveness of our pruning strategy. ©2024 IEEE.

Keywords: image fusion
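
The abstract above says the pruning step relies on the L2-norm of the Restormer blocks but gives no further detail. The sketch below shows one common way such a criterion can be applied: rank blocks by the L2 norm of their weights and drop the lowest-ranked fraction. The block names, weight values, `prune_ratio` parameter, and the helper `select_blocks_to_keep` are all hypothetical; the paper's actual selection rule may differ.

```python
import math


def l2_norm(weights):
    """Euclidean norm of a flat list of weights."""
    return math.sqrt(sum(w * w for w in weights))


def select_blocks_to_keep(block_weights, prune_ratio):
    """Keep the blocks whose weight norm is not among the smallest.

    block_weights: dict mapping block name -> flat list of weights.
    prune_ratio: fraction of blocks to remove (hypothetical parameter).
    Returns the kept block names in their original order.
    """
    n_prune = int(len(block_weights) * prune_ratio)
    ranked = sorted(block_weights, key=lambda name: l2_norm(block_weights[name]))
    pruned = set(ranked[:n_prune])
    return [name for name in block_weights if name not in pruned]


# Toy weights for four hypothetical blocks.
blocks = {
    "block_0": [0.9, -1.2, 0.4],
    "block_1": [0.01, 0.02, -0.01],
    "block_2": [0.5, 0.5, -0.5],
    "block_3": [0.1, -0.05, 0.02],
}
kept = select_blocks_to_keep(blocks, prune_ratio=0.5)
print(kept)
```

After such a selection, the abstract's pipeline fine-tunes the pruned network so that fusion quality is maintained.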

Instruct Me More! Random Prompting for Visual In-Context Learning

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Zhang, Jiahao; Wang, Bowen; Li, Liangzhi; Nakashima, Yuta; Nagahara, Hajime. Affiliation: Osaka Univ, Suita, Osaka, Japan

ISBN: (print) 9798350318920; 9798350318937

Large-scale models trained on extensive datasets have emerged as the preferred approach due to their high generalizability across various tasks. In-context learning (ICL), a popular strategy in natural language processing, uses such models for different tasks by providing instructive prompts but without updating model parameters. This idea is now being explored in computer vision, where an input-output image pair (called an in-context pair) is supplied to the model with a query image as a prompt to exemplify the desired output. The efficacy of visual ICL often depends on the quality of the prompts. We thus introduce a method coined Instruct Me More (InMeMo), which augments in-context pairs with a learnable perturbation (prompt), to explore its potential. Our experiments on mainstream tasks reveal that InMeMo surpasses the current state-of-the-art performance. Specifically, compared to the baseline without learnable prompt, InMeMo boosts mIoU scores by 7.35 and 15.13 for foreground segmentation and single object detection tasks, respectively. Our findings suggest that InMeMo offers a versatile and efficient way to enhance the performance of visual ICL with lightweight training. Code is available at https://***/Jackieam/InMeMo.

Keywords: Algorithms Algorithms and algorithms formulations image recognition and understanding machine learning architectures

Towards 360° image compression for machines via modulating pixel significance

Multimedia Tools and Applications, 2024, vol. 83, no. 42, pp. 90271-90288

Authors: Zheng, Silin; Shen, Xuelin; Zhang, Qiudan; Chen, Zhuo; Yang, Wenhan; Wang, Xu. Affiliations: College of Computer Science and Software Engineering Shenzhen University Guangdong Shenzhen51800 China Guangdong Shenzhen51800 China Peng Cheng Laboratory Guangdong Shenzhen51800 China

The rapid growth of computer vision-based applications, including smart cities and autonomous driving, has created a pressing demand for efficient 360° image compression and computer vision analytics. In most circumstances, 360° image compression and computer vision face challenges arising from the oversampling inherent in the Equirectangular Projection (ERP). However, these two fields often employ divergent technological approaches. While image compression aims to reduce redundancy, computer vision analytics attempts to compensate for the semantic distortion caused by the projection process, resulting in a potential conflict between the two objectives. This paper explores a potential route, i.e., 360° Image Coding for Machine (360-ICM), which offers an image processing framework that addresses both object deformation and oversampling redundancy within a unified framework. The key innovation lies in inferring a pixel-wise significance map by jointly considering the requirements of redundancy removal and object deformation offsetting. The significance map is subsequently fed to a deformation-aware image compression network, guiding the bit allocation process as an external condition. More specifically, we employ a deformation-aware image compression network that is characterized by the Spatial Feature Transform (SFT) layer, which is capable of performing complex affine transformations of high-level semantic features and is essential in dealing with the deformation. The image compression network and significance inference network are jointly trained under the supervision of a 360° image-specified object detection network, obtaining a compact representation that is both analytics-oriented and deformation-aware. Extensive experimental results have demonstrated the superiority of the proposed method over existing state-of-the-art image codecs in terms of rate-analytics performance. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part
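
The abstract above refers to the oversampling inherent in the Equirectangular Projection. As a small geometric illustration, and not the learned significance map the paper proposes, the sketch below computes the cosine-of-latitude weight of each image row: rows near the poles cover far less of the sphere per pixel than rows near the equator, so they are oversampled by roughly 1 / cos(latitude). The function name `erp_row_weights` and the 4-row image are assumptions for this example.

```python
import math


def erp_row_weights(height):
    """Relative sphere area covered by one pixel in each row of an ERP image.

    Row i (0 = top) has its centre at latitude
    (i + 0.5 - height / 2) * pi / height, and its weight is cos(latitude).
    """
    return [
        math.cos((i + 0.5 - height / 2) * math.pi / height)
        for i in range(height)
    ]


weights = erp_row_weights(4)
for row, weight in enumerate(weights):
    # The oversampling factor is the reciprocal of the weight.
    print(f"row {row}: weight {weight:.4f}, oversampling x{1 / weight:.2f}")
```

A fixed weighting like this depends only on geometry; the approach in the abstract instead infers significance per pixel while also accounting for object deformation.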

Machine Learning Applications: From Computer Vision to Robotics, 1st edition

2023

Authors: Indranath Chatterjee; Sheetal Zalte

ISBN: (digital) 9781394173358

ISBN: (print) 9781394173327

Machine Learning Applications: a practical resource on the importance of machine learning and deep learning applications in various technologies and real-world situations. Machine Learning Applications discusses methodological advancements of machine learning and deep learning; presents applications in image processing, including face and vehicle detection, image classification, object detection, and image segmentation; delivers real-world applications in healthcare for disease identification and diagnosis, such as creating smart health records and medical imaging diagnosis; and provides real-world examples, case studies, use cases, and techniques to enable the reader's active learning. Composed of 13 chapters, this book also introduces real-world applications of machine and deep learning in blockchain technology, cyber security, and climate change. AI and robotic applications in mechanical design are also discussed, including robot-assisted surgeries, security, and space exploration. The book describes the importance of each subject area and details why they are so important to us from a societal and human perspective. Edited by two highly qualified academics and contributed to by established thought leaders in their respective fields, Machine Learning Applications includes information on: content-based medical image retrieval (CBMIR), covering face and vehicle detection, multi-resolution and multisource analysis, manifold and image processing, and morphological processing; smart medicine, including machine learning and artificial intelligence in medicine, risk identification, tailored interventions, and association rules; AI and robotics applications for transportation and infrastructure (e.g., autonomous cars and smart cities), along with global warming and climate change; and identifying diseases and diagnosis, drug discovery and manufacturing, medical imaging diagnosis, personalized medicine, and smart health records. With its practical approach to the subject, Ma

Keywords: Artificial Intelligence

Hybrid framework for correcting water-to-air image sequences

APPLIED OPTICS, 2024, vol. 63, no. 33, pp. 8575-8582

Authors: Cao, Yiqian; Cai, Chengtao; Meng, Haiyang. Affiliations: Harbin Engn Univ, Coll Intelligent Syst Sci & Engn, Harbin 150001, Peoples R China; Harbin Engn Univ, Minist Educ, Key Lab Intelligent Technol & Applicat Marine Equi, Harbin 150001, Peoples R China; Heilongjiang Prov Key Lab Environm Intelligent Per, Harbin 150001, Peoples R China; Shanghai Aerosp Control Technol Inst, Shanghai 201109, Peoples R China

When an underwater camera captures aerial targets, the received light undergoes refraction at the water-air interface. In particular, the calm water compresses the image, while turbulent water causes nonlinear distortion in the captured images. However, existing methods for correcting water-to-air distortion often cause images with distortion or overall shifts. To address the above issue, we propose a multi-strategy hybrid framework to process image sequences effectively, particularly for high-precision applications. Our framework includes a spatiotemporal crossover block to transform and merge features, effectively addressing the template-free problem. Additionally, we introduce an enhancement network to produce a high-quality template in the first stage and a histogram template method to maintain high chromaticity and reduce template noise in the correction stage. Furthermore, our framework incorporates a new registration scheme to facilitate sequence transfer and processing. Compared to existing algorithms, our approach achieves a high restoration level in terms of morphology and color for publicly available image sequences. (c) 2024 Optica Publishing Group. All rights, including for text and data mining (TDM), Artificial Intelligence (AI) training, and similar technologies, are reserved.

Keywords: image enhancement; image metrics; Light propagation; machine vision; Segmentation; Transforms
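
The abstract above notes that light from aerial targets refracts at the water-air interface and that calm water compresses the image. The sketch below illustrates that compression with Snell's law for a flat surface: rays arriving from anywhere in the hemisphere above the water end up inside a cone of roughly 48.6 degrees around the surface normal. The refractive index 1.333 for water is an approximate textbook value, and the function name `angle_in_water` is an assumption for this example; the paper's correction framework is not reproduced here.

```python
import math

N_AIR = 1.0
N_WATER = 1.333  # approximate refractive index of water (assumed value)


def angle_in_water(angle_in_air_deg, n_air=N_AIR, n_water=N_WATER):
    """Refraction angle below a flat surface, in degrees from the normal.

    Snell's law: n_air * sin(theta_air) = n_water * sin(theta_water).
    """
    sine = n_air * math.sin(math.radians(angle_in_air_deg)) / n_water
    return math.degrees(math.asin(sine))


for angle in (0.0, 30.0, 60.0, 90.0):
    print(f"air {angle:5.1f} deg -> water {angle_in_water(angle):5.2f} deg")
```

Turbulent water changes the local surface normal from point to point and from frame to frame, which is why the distortion described in the abstract becomes nonlinear and needs sequence-based correction.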

Implementation of Fast Gradient Sign Adversarial Attack on Vision Transformer Model and Development of Defense Mechanism in Classification of Dermoscopy Images

31st IEEE Conference on Signal Processing and Communications Applications (SIU)

Authors: Kanca, Elif; Ayas, Selen; Kablan, Elif Baykal; Ekinci, Murat. Affiliations: Karadeniz Tech Univ, Yazilim Muhendisligi, Trabzon, Turkiye; Karadeniz Tech Univ, Bilgisayar Muhendisligi, Trabzon, Turkiye

ISBN: (print) 9798350343557

In recent years, deep learning has been successfully applied to medical images due to its ability to learn highly complex and multidimensional data. However, it is known that deep learning models are vulnerable to adversarial machine learning attacks, which add small, imperceptible perturbations to the legitimate input image, causing the models to produce incorrect results. In this study, the Fast Gradient Sign Method is applied to Vision Transformer, a basic pre-trained transformer-based model, which is used for binary classification of a publicly available skin lesion dataset, and the robustness of the model is analyzed. Then, the adversarial training approach is used to improve the robustness of the model against adversarial attacks. The experimental results show that the classification accuracy is reduced from 90.1% to 27.38% even for a small perturbation, and the adversarial training approach increases the model's robustness with an accuracy value of 96.61%.

Keywords: vision transformer model; fast gradient sign method; adversarial machine learning; adversarial training
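
The abstract above applies the Fast Gradient Sign Method to a Vision Transformer. To show the attack rule itself in a few lines, the sketch below uses a two-feature logistic-regression classifier as a stand-in model, which is an assumption made purely for illustration: the input is moved by `epsilon` in the direction of the sign of the loss gradient with respect to the input. The weights, input, and `epsilon` are invented values and have no relation to the paper's experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression stand-in model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(value):
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, label, epsilon):
    """x_adv = x + epsilon * sign(d loss / d x) for binary cross-entropy loss.

    For logistic regression the input gradient is (p - label) * weights.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


weights, bias = [2.0, -1.0], 0.0  # toy model parameters
x, label = [0.3, 0.1], 1          # toy input, correctly classified as 1
x_adv = fgsm(weights, bias, x, label, epsilon=0.3)

print("clean prediction:      ", round(predict(weights, bias, x), 3))
print("adversarial prediction:", round(predict(weights, bias, x_adv), 3))
```

Adversarial training, the defense named in the abstract, would generate such perturbed inputs during training and include them with their correct labels in the loss.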

Style-Driven Image Enhancement for Entry-Level Mobile Devices

7th International Conference on Machine Vision and Applications (ICMVA)

Authors: Christian Matias, Angelo Patrick; Del Gallego, Neil. Affiliations: De La Salle Univ, Manila, Philippines; De La Salle Univ, Graph Animat Multimedia & Entertainment GAME Lab, Manila, Philippines

ISBN: (print) 9798400716553

Modern smartphones usually have automatic camera adjustment features that predetermine how images will be processed. Without an intervention from the user (e.g., manual adjustment of exposure settings, addition/removal of certain image filters), the predetermined camera settings dictate the look and feel of images taken. Since higher-end mobile devices tend to gravitate towards a more visually appealing style and clearer images, image enhancement on entry-level devices could be performed by transferring the style from a higher-end device to a lower-end one. This paper proposes a learning-based, style-driven image enhancement for entry-level devices. Using a deep residual style transfer network, we train a model that learns the relationship between images taken from a high-end device and those taken from an entry-level device to create a filter that could be used to enhance the images captured from an entry-level device. Our quantitative and qualitative analyses show that our proposed method can enhance images to match the qualities produced by higher-end mobile device cameras.

Keywords: image processing; image enhancement; super-resolution; computational photography; mobile devices

A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues

ARTIFICIAL INTELLIGENCE REVIEW, 2023, vol. 56, no. 11, pp. 13619-13661

Authors: Sharma, Himanshu; Padha, Devanand. Affiliation: Cent Univ Jammu, Dept Comp Sci & Informat Technol, Jammu & Kashmir, Jammu 181124, India

Image captioning is a fairly recent area at the convergence of computer vision and natural language processing and is widely used in a range of applications such as multi-modal search, robotics, security, remote sensing, medical, and visual aid. The image captioning techniques have witnessed a paradigm shift from classical machine-learning-based approaches to the most contemporary deep learning-based techniques. We present an in-depth investigation of image captioning methodologies in this survey using our proposed taxonomy. Furthermore, the study investigates several eras of image captioning advancements, including template-based, retrieval-based, and encoder-decoder-based models. We also explore captioning in languages other than English. A thorough investigation of benchmark image captioning datasets and assessment measures is also discussed. The effectiveness of real-time image captioning is a severe barrier that prevents its use in sensitive applications such as visual aid, security, and medicine. Another observation from our research is the scarcity of personalized domain datasets that limits its adoption into more advanced issues. Despite influential contributions from several academics, further efforts are required to construct substantially robust and reliable image captioning models.

Keywords: Attention-based image captioning; Encoder-decoder architecture; image captioning; Multimodal embedding
