Thanks to the emergence and continued development of machine learning, particularly deep learning, research on visual question answering (VQA) has advanced dramatically, with great theoretical rese...
This study presents an innovative approach to animal classification and recognition utilizing machine learning and deep learning methodologies. Leveraging advanced algorithms, the proposed system achieves remarkable a...
Automatic detection of pineapples in complex agricultural environments poses several challenges. During harvesting, pineapples that are suitable for collection exhibit intricate scaly surface textures and a wide range of colors. Moreover, occlusion by leaves and fluctuating lighting conditions further complicate detection. In this paper, we propose a high-precision lightweight detection network based on an improved You Only Look Once version 7-tiny (Pineapple-YOLO) for the robot vision system, enabling real-time and accurate detection of pineapples. The Convolutional Block Attention Module (CBAM) is embedded into the backbone network to enhance feature extraction capability, and Content-Aware Reassembly of Features (CARAFE) is introduced to perform up-sampling and expand the receptive field. The Scylla Intersection over Union (SIoU) loss function replaces the Complete Intersection over Union (CIoU) loss function, taking vector angles into account and redefining the penalty criteria. Finally, the K-means++ clustering algorithm is used to re-cluster the labels of the pineapple dataset and update the anchor sizes. Experimental results show that Pineapple-YOLO achieves an mAP@0.5 of 89.7%, a 6.15% improvement over the original YOLOv7-tiny, demonstrating its superiority over other mainstream object detection models. Furthermore, in the diverse natural environments where the agricultural robot operates, Pineapple-YOLO sustains a 92% success rate in fruit picking with an average picking time of 12 s, demonstrating the efficiency of the vision module in practical engineering applications.
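The anchor update step lends itself to a short illustration. The minimal sketch below, which is not the authors' code, re-clusters label box sizes with scikit-learn's K-means++ initialization and sorts the resulting anchors by area; the `box_wh` array, the choice of 9 anchors, and the synthetic boxes are assumptions made for illustration.

```python
# Sketch: re-clustering bounding-box sizes with K-means++ to obtain anchors.
# Assumes `box_wh` is an (N, 2) array of label widths/heights; 9 anchors
# matches the YOLOv7-tiny default but is an assumption here.
import numpy as np
from sklearn.cluster import KMeans

def recluster_anchors(box_wh: np.ndarray, n_anchors: int = 9) -> np.ndarray:
    """Return anchor (w, h) pairs sorted by area, smallest first."""
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=0)
    km.fit(box_wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]

# Example with synthetic box sizes standing in for the pineapple labels.
rng = np.random.default_rng(0)
boxes = rng.uniform(20, 300, size=(500, 2))
print(recluster_anchors(boxes).round(1))
```

In practice, the same procedure would be run on the dataset's annotation widths and heights and the resulting anchors substituted into the YOLOv7-tiny configuration.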
Image compression constitutes a significant challenge in the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose an end-to-end optimized image compression model built on a frequency-oriented transform. The proposed model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with human-interpretable concepts. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments demonstrate that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify that the proposed compression method preserves semantic fidelity in addition to signal-level precision.
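To make the band-separation idea concrete, the sketch below performs a fixed Laplacian-pyramid-style split into low, mid, and high frequency bands; the paper's transform is learned end-to-end, so this is only an assumed illustration of how frequency components can be isolated and then recombined without loss.

```python
# Illustrative sketch only: a fixed split of an image into low/mid/high
# frequency bands whose sum reconstructs the input. The learned
# frequency-oriented transform in the paper is not reproduced here.
import numpy as np
from scipy.ndimage import gaussian_filter

def split_frequency_bands(img: np.ndarray, sigmas=(1.0, 4.0)):
    """Return (high, mid, low) bands whose sum reconstructs `img`."""
    blur1 = gaussian_filter(img, sigma=sigmas[0])   # removes fine detail
    blur2 = gaussian_filter(img, sigma=sigmas[1])   # removes mid detail too
    high = img - blur1       # fine textures / edges
    mid = blur1 - blur2      # mid-frequency structure
    low = blur2              # coarse luminance / color layout
    return high, mid, low

img = np.random.rand(64, 64).astype(np.float32)     # stand-in grayscale image
high, mid, low = split_frequency_bands(img)
assert np.allclose(high + mid + low, img, atol=1e-5)
```

Selective transmission of bands, as described in the abstract, then amounts to dropping or coarsely quantizing some of these components before reconstruction.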
ISBN (digital): 9798331506520
ISBN (print): 9798331506537
Object detection based on event vision has been a dynamically growing field in computer vision for the last 16 years. In this work, we create multiple channels from a single event camera and propose an event fusion method (EFM) to enhance object detection in event-based vision systems. Each channel uses a different accumulation buffer to collect events from the event camera. We implement YOLOv7 for object detection, followed by a fusion algorithm. Our multichannel approach outperforms single-channel object detection by 0.7% in mean Average Precision (mAP) for detections overlapping the ground truth at IoU = 0.5.
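A minimal sketch of the multichannel construction is shown below, assuming an (x, y, t, p) event format, a 304x240 sensor, and three accumulation windows; none of these values are taken from the paper.

```python
# Sketch of building multi-channel frames from a single event stream by using
# accumulation buffers of different durations. Event format and window
# lengths are assumptions for illustration.
import numpy as np

def accumulate(events: np.ndarray, t_end: float, window: float, shape=(240, 304)):
    """Sum event polarities that fall inside [t_end - window, t_end]."""
    frame = np.zeros(shape, dtype=np.float32)
    x, y = events[:, 0].astype(int), events[:, 1].astype(int)
    t, p = events[:, 2], events[:, 3]
    mask = (t >= t_end - window) & (t <= t_end)
    np.add.at(frame, (y[mask], x[mask]), np.where(p[mask] > 0, 1.0, -1.0))
    return frame

def multichannel_frame(events, t_end, windows=(0.01, 0.05, 0.1)):
    """Stack one accumulation buffer per window into an (H, W, C) tensor."""
    return np.stack([accumulate(events, t_end, w) for w in windows], axis=-1)

# Synthetic events: columns are (x, y, timestamp_s, polarity).
ev = np.column_stack([np.random.randint(0, 304, 1000),
                      np.random.randint(0, 240, 1000),
                      np.sort(np.random.rand(1000)),
                      np.random.choice([-1, 1], 1000)])
print(multichannel_frame(ev, t_end=1.0).shape)   # (240, 304, 3)
```

Each channel of the stacked tensor could then be fed to the detector, with detections from the channels merged by a fusion step of the kind the abstract describes.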
Deep learning advancements have significantly enhanced computer vision applications in precision agriculture. While RGB cameras operating in visible light are affordable, they provide limited information compared to m...
Insect image recognition (IIR) is a specialized field in machine learning (ML) and computer vision that aims to automatically recognise and detect insect species using visual data obtained from images. Leve...
Cataracts are a clouding of the lens in the eye, leading to loss of vision that can progress to blindness if not treated. This paper proposes a new method for automatic cataract detection using color fundus images and d...
ISBN (print): 9781450395670
With the increased use of closed-circuit television (CCTV) footage for security and surveillance purposes as well as for object or person recognition and efficiency monitoring, high-quality CCTV videos are necessary. In this paper, we propose Corgi Eye, a combined moving-object-removal and super-resolution framework for enhancing CCTV footage by removing the ghosting artifacts caused by performing multiframe super-resolution (MISR) on moving objects. Our method extends the framework of Eagle Eye, an existing MISR framework tailored for mobile devices. Our results demonstrate that the system can completely remove ghosting effects caused by moving objects while performing MISR on CCTV footage. The proposed method achieves competitive performance compared to Eagle Eye, with a 16% increase in PSNR. Additionally, it produces clear images on par with deep learning approaches such as ESPCN and SOF-VSR.
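As a rough illustration of combining moving-object suppression with multi-frame fusion (not the Corgi Eye pipeline itself), the sketch below masks pixels that deviate from a reference frame before averaging the static content; the difference threshold and grayscale input are assumptions.

```python
# Illustrative sketch: suppress moving pixels via frame differencing against a
# reference, then fuse only the static content across frames so that a later
# multi-frame super-resolution step does not ghost. Threshold is an assumption.
import numpy as np

def fuse_static_content(frames: np.ndarray, thresh: float = 0.05) -> np.ndarray:
    """frames: (T, H, W) grayscale stack aligned to a common reference."""
    ref = frames[0]
    # Per-frame mask: True where the pixel barely differs from the reference,
    # i.e. where no moving object is present and fusion will not ghost.
    static = np.abs(frames - ref) < thresh
    weights = static.astype(np.float32)
    weights[0] = 1.0                       # the reference always contributes
    fused = (frames * weights).sum(axis=0) / weights.sum(axis=0)
    return fused

stack = np.random.rand(5, 120, 160).astype(np.float32)  # stand-in CCTV frames
print(fuse_static_content(stack).shape)                  # (120, 160)
```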