Search Results - Inner Mongolia University Library

Visual Question Answering Optimized Framework using Mixed Precision Training

2023 International Conference on Artificial Intelligence and Applications, ICAIA 2023 and Alliance Technology Conference, ATCON-1 2023

Authors: Chowdhury, Souvik; Soni, Badal (National Institute of Technology Silchar, Department of Computer Science and Engineering, Silchar, India)

ISBN: (print) 9781665456272

Thanks to the emergence and continued development of machine learning, particularly deep learning, research on visual question answering (VQA) has advanced dramatically and holds great theoretical significance and practical application value. This field draws on multimodal learning, computer vision, and natural language processing techniques. Apart from a few researchers who have presented optimized bilinear fusion approaches that efficiently integrate text and image features, there have been few efforts to optimize the VQA framework itself. In this research, we offer a novel VQA framework designed to optimize the VQA problem. Automatic mixed precision, which combines 16-bit and 32-bit floating-point arithmetic, allows deep learning architectures to be optimized with less computation and shorter execution time. The proposed framework has been tested against two models on the VQA 2.0 and CLEVR datasets. The experimental findings demonstrated a significant improvement in both overall accuracy and execution time. © 2023 IEEE.

Keywords: Computer vision
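
The abstract credits automatic mixed precision (16-bit plus 32-bit floating point) for the savings. The sketch below is not the authors' framework; it illustrates one generic ingredient of mixed-precision training, loss scaling, in pure Python. It assumes float16 storage can be emulated with the standard `struct` module's `'e'` (IEEE 754 half-precision) format, keeps a full-precision "master" weight, and uses the hypothetical helper names `to_fp16` and `mixed_precision_step`.

```python
import math
import struct
from typing import Tuple


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (emulates fp16 storage)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Values beyond the fp16 range (about 65504) overflow to infinity.
        return math.copysign(math.inf, x)


def mixed_precision_step(master_w: float, grad: float, lr: float,
                         loss_scale: float) -> Tuple[float, float]:
    """One SGD step with a full-precision master weight and an fp16 (scaled) gradient.

    Returns the updated weight and the loss scale to use for the next step.
    """
    # The backward pass would produce the gradient of (loss * loss_scale) in fp16.
    scaled_grad = to_fp16(grad * loss_scale)
    if math.isinf(scaled_grad) or math.isnan(scaled_grad):
        # Overflow: skip this update and halve the scale (dynamic loss scaling).
        return master_w, loss_scale / 2
    # Unscale in full precision before updating the master weight.
    return master_w - lr * (scaled_grad / loss_scale), loss_scale
```

Without scaling, a gradient of 1e-8 underflows to zero in fp16 and the weight never moves; multiplying by a scale such as 1024 keeps it representable. Frameworks such as PyTorch automate this pattern with `torch.autocast` and a gradient scaler.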

Multi-level receptive field feature reuse for multi-focus image fusion

Machine Vision and Applications, 2022, Vol. 33, No. 6, p. 92

Authors: Jiang, Limai; Fan, Hui; Li, Jinjiang (Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China; Univ Chinese Acad Sci, Shenzhen Coll Adv Technol, Shenzhen, Peoples R China; Shandong Technol & Business Univ, Sch Comp Sci & Technol, Yantai, Peoples R China; Coinnovat Ctr Shandong Coll & Univ Future Intelli, Yantai, Peoples R China)

Multi-focus image fusion, which combines two or more images focused on different targets into a single clear image, is a worthwhile problem in digital image processing. Traditional methods usually operate in the frequency domain or the spatial domain, but they cannot guarantee accurate measurement of the activity level of all image details, nor can they perfect the selection of fusion rules. Deep learning methods, with their strong feature representation ability, have therefore become the mainstream approach to multi-focus image fusion. However, most deep learning frameworks to date have not balanced the relationship among the two input features, the shallow features, and the feature fusion. To address these shortcomings of previous work, we propose an end-to-end deep network consisting of an encoder and a decoder. The encoder is a pseudo-Siamese network: its dual encoders extract shared and distinct feature sets, and the shallow features are then reused to form the encoding. The decoder analyzes the encoding and reduces its dimensionality to generate a high-quality fused image. Extensive experiments show that our network structure performs better: compared with various recent deep learning-based and traditional multi-focus image fusion methods, our method is slightly better in both objective metrics and subjective visual comparison.

Keywords: Multi-focus image fusion; Deep learning; Regression model; Feature reuse
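
The abstract contrasts learned fusion with traditional spatial-domain methods that measure an "activity level" and then apply a fusion rule. As a point of reference only (this is not the authors' pseudo-Siamese encoder-decoder), here is a minimal choose-max baseline in pure Python. Local variance as the activity measure, the window radius, and the function names `local_variance` and `fuse_choose_max` are assumptions for illustration.

```python
from typing import List

Image = List[List[float]]


def local_variance(img: Image, r: int = 1) -> Image:
    """Activity-level map: variance of each pixel's (2r+1)x(2r+1) window, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            out[y][x] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return out


def fuse_choose_max(a: Image, b: Image, r: int = 1) -> Image:
    """Fusion rule: take each pixel from whichever source image is locally sharper."""
    va, vb = local_variance(a, r), local_variance(b, r)
    return [[a[y][x] if va[y][x] >= vb[y][x] else b[y][x]
             for x in range(len(a[0]))]
            for y in range(len(a))]
```

A hand-crafted rule like this decides pixel by pixel and can produce blocky seams near focus boundaries, which is the kind of limitation the learned approach aims to overcome.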

End-to-end optimized image compression with the frequency-oriented transform

Machine Vision and Applications, 2024, Vol. 35, No. 2, p. 27

Authors: Zhang, Yuefeng; Lin, Kai (Beijing Inst Comp Technol & Applicat, 51 Yongding Rd, Beijing 100039, Peoples R China; Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China)

Image compression constitutes a significant challenge in the era of information explosion. Recent studies employing deep learning have demonstrated that learning-based image compression methods outperform traditional codecs. However, an inherent limitation of these methods is their lack of interpretability. Following an analysis of how compression degradation varies across frequency bands, we propose an end-to-end optimized image compression model built on a frequency-oriented transform. The model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, consistent with the human-interpretable concept of frequency. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments demonstrate that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify that the proposed compression method preserves semantic fidelity in addition to signal-level precision.

Keywords: Image compression; Image processing; Computer vision; Machine learning
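
The idea of splitting a signal into non-overlapping frequency bands and decoding from any transmitted subset can be illustrated with a fixed transform. The sketch below swaps in a 1-D multi-level Haar decomposition as a stand-in; it is not the paper's learned frequency-oriented transform, and the names `haar_bands` and `reconstruct` are hypothetical.

```python
from typing import List, Sequence


def haar_bands(x: Sequence[float], levels: int) -> List[List[float]]:
    """Split x into `levels` detail bands (finest first) plus one final approximation band."""
    bands: List[List[float]] = []
    approx = list(x)
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        bands.append([(a - b) / 2 for a, b in pairs])  # detail (high-frequency) band
        approx = [(a + b) / 2 for a, b in pairs]        # approximation (low-frequency) band
    bands.append(approx)
    return bands


def reconstruct(bands: List[List[float]], keep: Sequence[bool]) -> List[float]:
    """Invert haar_bands, treating any band with keep[i] == False as not transmitted (zeros)."""
    if len(keep) != len(bands):
        raise ValueError("need one keep flag per band")
    kept = [b if k else [0.0] * len(b) for b, k in zip(bands, keep)]
    approx = kept[-1]
    for detail in reversed(kept[:-1]):  # coarsest detail band first
        approx = [v for a, d in zip(approx, detail) for v in (a + d, a - d)]
    return approx
```

Transmitting every band reproduces the input exactly; omitting the finest detail band still yields a complete, coarser signal, which is the essence of scalable coding.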

Streamlining Crop Segmentation with Multispectral Imaging and Foundation Models: Minimizing Manual Annotation

20th IEEE International Conference on Intelligent Computer Communication and Processing Conference, ICCP 2024

Authors: Aszkowski, Przemyslaw; Kraft, Marek (Institute of Robotics and Machine Intelligence, Poznań University of Technology, Poznań, Poland)

ISBN: (print) 9798331539979

Deep learning advancements have significantly enhanced computer vision applications in precision agriculture. While RGB cameras operating in visible light are affordable, they provide limited information compared to multispectral equipment. This research analyses methods to reduce the need for manual annotation when training a model using only RGB images, without compromising the model's accuracy. We propose a semi-supervised approach where a teacher model, trained on multispectral images, generates artificial ground truth data to train a student model that operates solely on RGB images. This strategy has enabled us to achieve nearly a tenfold reduction in the required training data while maintaining similar performance metrics. Additionally, we explore the potential of segmentation foundation models to simplify the manual annotation process, reducing the need for full segmentation masks to just bounding boxes. Our findings also indicate that using multispectral images as input for the Segment Anything Model is more effective than using RGB images. © 2024 IEEE.

Keywords: Image segmentation
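
The teacher-student idea can be sketched per pixel: a teacher that sees a near-infrared band produces pseudo-labels, and an RGB-only student is fitted to them. In this toy version the teacher is a hypothetical NDVI threshold standing in for the paper's trained multispectral network, the student is a single threshold on the excess-green index (ExG = 2G - R - B), and all names and thresholds are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pixel:
    r: float
    g: float
    b: float
    nir: float  # near-infrared band: only the teacher may use it


def teacher_label(p: Pixel, ndvi_threshold: float = 0.3) -> int:
    """Hypothetical multispectral teacher: NDVI threshold instead of a trained model."""
    ndvi = (p.nir - p.r) / (p.nir + p.r + 1e-9)
    return int(ndvi > ndvi_threshold)


def excess_green(p: Pixel) -> float:
    """RGB-only feature available to the student."""
    return 2 * p.g - p.r - p.b


def fit_student_threshold(pixels: List[Pixel]) -> float:
    """Train the RGB student on teacher pseudo-labels: pick the ExG cut with best agreement."""
    pseudo = [teacher_label(p) for p in pixels]
    feats = [excess_green(p) for p in pixels]
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(feats)):
        acc = sum(int(f > t) == y for f, y in zip(feats, pseudo)) / len(pixels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def student_predict(p: Pixel, threshold: float) -> int:
    """Student inference uses only the RGB channels."""
    return int(excess_green(p) > threshold)
```

No manual labels appear anywhere: the student's supervision comes entirely from the teacher, which is how the approach reduces annotation effort.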

Multichannel Object Detection with Event Camera

International Image Processing, Applications and Systems Conference (IPAS)

Authors: Rafael Iliasov; Alessandro Golkar (Chair of Spacecraft Systems, Technical University of Munich, Munich, Germany)

ISBN: (electronic) 9798331506520

ISBN: (print) 9798331506537

Object detection based on event vision has been a rapidly growing field in computer vision over the last 16 years. In this work, we create multiple channels from a single event camera and propose an event fusion method (EFM) to enhance object detection in event-based vision systems. Each channel uses a different accumulation buffer to collect events from the event camera. We apply YOLOv7 for object detection, followed by a fusion algorithm. Our multichannel approach outperforms single-channel object detection by 0.7% in mean Average Precision (mAP) at an intersection-over-union (IoU) threshold of 0.5.

Keywords: Computer vision; Event detection; Machine vision; AI accelerators; Object detection; Cameras; Feature extraction; Real-time systems
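
Two pieces of the abstract can be made concrete: building several channels from one event stream, and the IoU criterion used for mAP. The sketch assumes that the channels differ only in the length of their time window and simply count events per pixel regardless of polarity; the paper's actual buffers and its EFM fusion step are not reproduced, and `Event`, `accumulate_channels`, and `iou` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


class Event(NamedTuple):
    t: float       # timestamp in seconds
    x: int
    y: int
    polarity: int  # +1 or -1 (ignored by the counting below)


def accumulate_channels(events: Sequence[Event], t_now: float, windows: Sequence[float],
                        width: int, height: int) -> List[List[List[int]]]:
    """Build one event-count frame per accumulation window ending at t_now."""
    channels = []
    for w in windows:
        frame = [[0] * width for _ in range(height)]
        for e in events:
            if t_now - w <= e.t <= t_now:
                frame[e.y][e.x] += 1
        channels.append(frame)
    return channels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

A short window highlights fast-moving edges, while a long window accumulates slower motion. Under the mAP@0.5 criterion in the abstract, a detection counts as correct when its `iou` with a ground-truth box is at least 0.5.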

The Evolution and Application of Artificial Intelligence in Rhinology: A State of the Art Review

Otolaryngology-Head and Neck Surgery, 2023, Vol. 169, No. 1, pp. 21-30

Authors: Amanian, Ameen; Heffernan, Austin; Ishii, Masaru; Creighton, Francis X.; Thamboo, Andrew (Univ British Columbia, Dept Surg, Div Otolaryngol Head & Neck Surg, Vancouver, BC, Canada; Johns Hopkins Univ, Sch Med, Dept Otolaryngol Head & Neck Surg, Baltimore, MD 21205, USA)

Objective: To provide a comprehensive overview of the applications of artificial intelligence (AI) in rhinology, highlight its limitations, and propose strategies for its integration into surgical practice. Data Sources: Medline, Embase, CENTRAL, Ei Compendex, IEEE, and Web of Science. Review Methods: English-language studies from inception until January 2022 that focused on any application of AI in rhinology were included. Study selection was performed independently by 2 authors; discrepancies were resolved by the senior author. Studies were categorized by rhinology theme, and the data collected comprised the type of AI used, sample size, and outcomes, including accuracy and precision, among others. Conclusions: Overall, 5435 articles were identified. Following title and abstract screening, 130 articles underwent full-text review, and 59 articles were selected for analysis. Eleven studies were from the gray literature. Articles were stratified into image processing, segmentation, and diagnostics (n = 27); rhinosinusitis classification (n = 14); treatment and disease outcome prediction (n = 8); optimizing surgical navigation and phase assessment (n = 3); robotic surgery (n = 2); olfactory dysfunction (n = 2); and diagnosis of allergic rhinitis (n = 3). Most AI studies were published from 2016 onward (n = 45). Implications for Practice: This state of the art review aimed to highlight the increasing applications of AI in rhinology. Next steps will entail multidisciplinary collaboration to ensure data integrity, ongoing validation of AI algorithms, and integration into clinical practice. Future research should focus on the interplay of AI with robotics and surgical education.

Keywords: Artificial intelligence; Rhinology; Machine learning; Computer vision; Prediction; Prognosis

Cat-CNN: Human Eye Cataract Detection from Color Fundus Photograph with Deep CNN with Optimized Cascaded Network

2024 IEEE International Conference on Signal Processing, Information, Communication and Systems, SPICSCON 2024

Authors: Islam, Md Sariful; Bappy, Md Tusher Ahmad; Shawon, Jubayer Ahmed; Hasan, Mehedi; Rahman, Wahidur; Akter, Lija; Azad, Mir Mohammad (Dept. of Computer Science and Engineering, Uttara University, Uttara, Dhaka, Bangladesh; Dept. of ICE, Bangladesh Army University of Engineering and Technology, Natore, Bangladesh; Dept. of Computer Science and Engineering, Independent Researcher, Dhaka, Bangladesh)

ISBN: (print) 9798331510213

A cataract is a clouding of the eye's lens that causes vision loss and can progress to blindness if untreated. This paper proposes a new method for automatic cataract detection using color fundus images and deep learning. A dataset of 1,105 color fundus images labeled by expert ophthalmologists was used. Pre-trained CNNs (DenseNet121, EfficientNetB0, MobileNetV2, InceptionV3, Xception, ResNet50, VGG16, and VGG19) were used for feature extraction, and the extracted features were then reduced using PCA. The reduced features were combined with the following machine learning classifiers: SVC, RF, Decision Tree, Gaussian Naive Bayes, XGBoost, K-Nearest Neighbors, and Logistic Regression. Model performance was evaluated with accuracy, precision, recall, F1-score, and computational efficiency. MobileNetV2 with Random Forest achieved perfect scores on all metrics (100% accuracy, precision, recall, and F1-score), with an average processing time of 669 ms ± 28.8 ms, which suggests it can be used in real-time applications. EfficientNetB0 with SVC gave an average accuracy of 87.33%, with precision and recall above 86%, while ResNet50, VGG16, and VGG19 achieved accuracies between 89.64% and 90.50%. The study systematizes the choice of CNN architectures and classifiers, making the system both accurate and computationally efficient. Future work will include dataset augmentation, real-time support in clinical settings, advanced image preprocessing techniques, and generative adversarial networks. In addition, developing an automated annotation tool and improving explainable AI will further support the deployment of robust AI systems for early cataract diagnosis, improving patient outcomes. © 2024 IEEE.

Keywords: Cataract detection; CNN; Color fundus photography; Deep learning; Feature extraction; Image classification; Machine learning; Ophthalmology; PCA
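
The pipeline described above (CNN features, then PCA reduction, then a classical classifier) can be sketched end to end without the CNN part. The code below assumes the deep feature vectors have already been extracted (tiny 4-D toy vectors stand in for them). It computes PCA by power iteration with deflation and uses K-Nearest Neighbors, one of the listed classifiers. This is an illustration, not the authors' code. In practice, scikit-learn's `PCA` and `KNeighborsClassifier` would be the usual choice.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_pca(rows: Sequence[Vector], k: int, iters: int = 200,
            seed: int = 0) -> Tuple[Vector, List[Vector]]:
    """Return the feature mean and the top-k principal directions.

    Uses power iteration on the covariance matrix, deflating after each component.
    """
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components: List[Vector] = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient = this component's eigenvalue; remove it before the next one.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return mu, components


def project(x: Vector, mu: Vector, components: List[Vector]) -> Vector:
    """Reduce a feature vector to its coordinates along the principal directions."""
    return [sum((xi - mi) * ci for xi, mi, ci in zip(x, mu, c)) for c in components]


def knn_predict(train: List[Tuple[Vector, int]], x: Vector, k: int = 3) -> int:
    """Majority label among the k training points closest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Fitting PCA on training features only, then projecting both training and test vectors with the same `mu` and `components`, keeps the reduction consistent and avoids information leaking from the test set.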

Enhancing Insect Image Recognition with Sand Cat Swarm Optimization with Deep Feature Extraction

2nd IEEE International Conference on Advances in Information Technology, ICAIT 2024

Authors: Yogaraja, C.A.; Priyanka, C.; Priyadharshini, S.P.; Babu, J. Jagan; Rajagopal, S.; Poorani, S. (Ramco Institute of Technology, Rajapalayam, India; School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Coimbatore Campus, India; St. Joseph's College of Engineering, OMR, Chennai119, India; Department of ECE, R.M.D Engineering College, Kavaraipettai, India; Department of IT, National Engineering College, Kovilppati, India; Kongu Engineering College, Perundurai, Erode, India)

ISBN: (print) 9798350383867

Insect image recognition (iiR) is a specialized field of machine learning (ML) and computer vision that aims to automatically recognize and detect insect species using visual data obtained from images. Deep learning (DL) techniques, convolutional neural networks (CNNs), and image processing enable accurate and efficient classification of a wide range of insect species. iiR has many applications, ranging from biodiversity monitoring and pest control in farming to entomological study and disease vector identification. Insect detection allows experts, entomologists, and ecologists to gain valuable insights, make informed decisions, and address challenges involving insect detection, contributing to more effective conservation efforts, ecological research, and agricultural management. This work presents an Insect Image Recognition with Sand Cat Swarm Optimization with Deep Feature Extraction (iiR-SCSODFR) model. The iiR-SCSODFR method follows a complete pipeline that starts with Gaussian filtering-based image pre-processing, which improves image quality and reduces noise to provide a clean basis for precise analysis. Deep feature extraction is then carried out with advanced models to capture the intricate visual patterns of numerous insect species, enabling comprehensive insect characterization. For accurate insect detection, Long Short-Term Memory (LSTM) networks are employed, which can model temporal dependencies in image sequences. In addition, the model uses the Sand Cat Swarm Optimization (SCSO) technique for parameter tuning, adapting the model to the particular characteristics of a dataset. The proposed iiR-SCSODFR model represents an important step forward in iiR and provides a robust and precise solution for the automated identification of various insect species. © 2024 IEEE.

Keywords: Convolutional neural networks
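
The pipeline begins with Gaussian filtering for noise reduction. Below is a minimal separable Gaussian blur in pure Python. The kernel radius (3 sigma), the default sigma, and border replication are assumptions, since the abstract does not give the authors' settings. Library routines such as OpenCV's `cv2.GaussianBlur` serve the same purpose in practice.

```python
import math
from typing import List

Image = List[List[float]]


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    w = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]


def gaussian_blur(img: Image, sigma: float = 1.0) -> Image:
    """Separable Gaussian blur: a horizontal pass then a vertical pass, replicating borders."""
    radius = max(1, math.ceil(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    h, w = len(img), len(img[0])

    def clamp(v: int, hi: int) -> int:
        return min(max(v, 0), hi)

    tmp = [[sum(k[i + radius] * img[y][clamp(x + i, w - 1)] for i in range(-radius, radius + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(k[i + radius] * tmp[clamp(y + i, h - 1)][x] for i in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]
```

Because the 2-D Gaussian factors into two 1-D passes, the cost per pixel grows with the kernel width rather than its square.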

An image quality assessment method based on edge extraction and singular value for blurriness

Machine Vision and Applications, 2024, Vol. 35, No. 3, p. 37

Authors: Zhou, Lei; Liu, Chuanlin; Yadav, Amit; Azam, Sami; Karim, Asif (Southwest Petr Univ, Sch Comp Sci & Software Engn, Chengdu, Peoples R China; Sichuan DOOV Intelligent Cloud Valley Co Ltd, Yibin, Peoples R China; Charles Darwin Univ, Fac Sci & Technol, Darwin, NT 0909, Australia)

The automatic assessment of perceived image quality is crucial in the field of image processing. To this end, we propose an image quality assessment (IQA) method for blurriness. Instead of the single feature used in traditional IQA algorithms, this method extracts both gradient and singular value features. Because existing public IQA datasets are too small to support deep learning, machine learning is introduced to fuse features from multiple domains, yielding a new no-reference (NR) IQA method for blurriness called Feature fusion IQA (Ffu-IQA). Ffu-IQA uses a probabilistic model to estimate the probability of blur at each detected edge in the image, and then uses machine learning to aggregate this probability information into an edge quality score. Next, it computes a singular value score from the singular values obtained by singular value decomposition (SVD) of the image matrix. Finally, machine learning pooling is used to obtain the final quality score. Ffu-IQA achieves PLCC scores of 0.9570 and 0.9616 on CSIQ and TID2013, respectively, and SROCC scores of 0.9380 and 0.9531, which are better than most traditional IQA methods for blurriness.

Keywords: Image quality assessment; No-reference; Blur; Gradient; Singular value; Machine learning
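
The results are reported as PLCC (Pearson linear correlation) and SROCC (Spearman rank-order correlation) between predicted and subjective scores. Both metrics are sketched below in pure Python, with ties given average ranks. Many IQA studies first fit a nonlinear (often logistic) mapping before computing PLCC. That step is omitted here, so this is a simplified sketch rather than the exact evaluation protocol behind the reported numbers.

```python
import math
from typing import List, Sequence


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation between predicted scores x and subjective scores y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def srocc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: the Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))
```

SROCC rewards any monotonic agreement with human judgments. PLCC additionally rewards linearity, which is why a perfectly ordered but curved relationship scores 1.0 on SROCC and slightly less on PLCC.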

Research Progress on the Application of Machine Vision in Non-Destructive Food Inspection

Journal of Chinese Institute of Food Science and Technology (中国食品学报), 2024, Vol. 24, No. 12, pp. 13-27

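Authors: Tang Yansong 唐彦嵩; Xu Ruihao 徐锐豪; Wang Sujia 王夙加 (Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong 518055, China)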

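With growing global demand for food, non-destructive food inspection technology has become increasingly important for food quality control and safety assurance. This paper systematically reviews the applications and development trends of machine vision in non-destructive food inspection. Based on an analysis of the current literature, it discusses multiple imaging techniques, including RGB imaging, multispectral imaging, and hyperspectral imaging, as well as detection algorithms such as image processing, machine learning, and deep learning, and their use in non-destructive food inspection. It analyzes the technical challenges of applying machine vision to non-destructive food inspection, such as the scarcity of datasets and the insufficient ability of models to generalize to general scenarios. Based on the current state of research, it outlines future research directions and proposes possible development paths such as multimodal data fusion, embedded inspection systems, and closer integration with deep learning techniques, with the aim of providing reference and direction for innovation in non-destructive food inspection technology.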

Keywords: Non-destructive food inspection; Food safety; Machine vision; Machine learning; Deep learning
