Search results - Inner Mongolia University Library

2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2025

Authors: Ciranni, Massimiliano; Gjergji, Ani; Maracani, Andrea; Murino, Vittorio; Pastore, Vito Paolo. Affiliations: University of Genoa, MaLGa, DIBRIS, Italy; Istituto Italiano di Tecnologia, Genoa, Italy; University of Verona, Italy

ISBN: (print) 9798331536626

In the last few years, the abundance of available plankton images has significantly increased due to advancements in acquisition system technology. Consequently, interest in automatic plankton image classification has surged. Machine learning algorithms have recently emerged to assist in the analysis of this vast quantity of data, supporting traditional manual processing. However, annotating such data is costly and demands significant time and resources, thus requiring data-efficient machine learning solutions. The typical framework for tackling this issue has been to adopt supervised ImageNet pre-trained models and fine-tune them on the downstream plankton classification task. Nonetheless, self-supervised pre-training protocols may provide an effective alternative to supervised ImageNet pre-training, while allowing the exploitation of the increasingly large amount of unannotated plankton data. To the best of our knowledge, no work systematically analyzes the impact of self-supervised pre-training protocols on plankton image classification. To fill this gap, in this paper, we present a thorough comparison between in-domain (plankton images) and out-of-domain (ImageNet) supervised and self-supervised pre-training, in terms of the quality of the corresponding embeddings for plankton image classification. We believe that this work may pave the way for further research on self-supervised protocols for the plankton domain, providing a valuable alternative to ImageNet and exploiting the vast amount of unannotated plankton images available. © 2025 IEEE.

Keywords: Self-supervised learning

Advancing Deep Learning on Edge Devices: Fine-Tuning and Deployment of YOLOv7 Model for Efficient Object Detection in AI based Computer Vision Applications

3rd International Conference on Intelligent Data Communication Technologies and Internet of Things, IDCIoT 2025

Authors: Shekhar, Sudhanshu; Sathwik, T.S.; Pritwani, Mayank; Mohana; Ramakanth Kumar, P.; Sreelakshmi, K. Affiliation: RV College of Engineering®, Bengaluru, India

ISBN: (print) 9798331527549

This paper investigates the optimization and deployment of the YOLOv7 deep learning model on the NVIDIA Jetson Nano, an AI-focused edge computing platform, for object detection in various computer vision applications. The work leverages TensorRT and quantization techniques to accelerate the model while maintaining good detection accuracy. It further examines performance metrics such as speed, accuracy, and resource utilization on an image dataset. The model is trained on 80 object classes, and its use is demonstrated on 6 of them. The average detection accuracy obtained is 92.35%, and the average processing time is 117.8 ms. This work supports AI by demonstrating the feasibility of running deep learning models on edge devices and provides insight into the challenges and opportunities of optimizing AI models for energy-efficient, real-time operation on edge devices in various computer vision applications. © 2025 IEEE.

Keywords: Adversarial machine learning
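
The abstract names quantization as one of the acceleration techniques but gives no details. As a generic illustration only (an assumption, not the authors' TensorRT pipeline), symmetric per-tensor INT8 quantization maps float weights to 8-bit integers through a single scale factor; the helper names `quantize_int8` and `dequantize_int8` are hypothetical:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization with a single scale factor."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clip to the symmetric INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, scale)
```

Storing 8-bit codes plus one float scale instead of 32-bit floats cuts weight memory roughly fourfold, at the cost of a rounding error of at most half a scale step per value.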

Event Transformer⁺. A Multi-Purpose Solution for Efficient Event Data Processing

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, Vol. 45, No. 12, pp. 16013-16020

Authors: Sabater, Alberto; Montesano, Luis; Murillo, Ana C. Affiliations: Univ Zaragoza, DIIS, I3A, Zaragoza 50009, Spain; Bitbrain Technol, Zaragoza 50006, Spain

Event cameras record sparse illumination changes with high temporal resolution and high dynamic range. Thanks to their sparse recording and low power consumption, they are increasingly used in applications such as AR/VR and autonomous driving. Current top-performing methods often ignore specific event-data properties, leading to generic but computationally expensive algorithms, while event-aware methods do not perform as well. We propose Event Transformer⁺, which improves our seminal work EvT with a refined patch-based event representation and a more robust backbone to achieve more accurate results, while still benefiting from event-data sparsity to increase its efficiency. Additionally, we show how our system can work with different data modalities and propose specific output heads for event-stream classification (i.e., action recognition) and per-pixel predictions (dense depth estimation). Evaluation results show better performance than the state of the art while requiring minimal computational resources, both on GPU and CPU.

Keywords: Computer vision; image analysis; image classification
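
The abstract credits part of EvT⁺'s efficiency to exploiting event-data sparsity with a patch-based representation. The sketch below illustrates only that general idea, under assumptions not taken from the paper: events arrive as (x, y, t, polarity) tuples, the sensor is divided into square patches of a chosen size, and a patch is kept only if it receives at least `min_events` events. The function name `active_patches` is hypothetical.

```python
from collections import defaultdict


def active_patches(events, patch_size, min_events=1):
    """Count events per square patch and keep only patches with enough events.

    events: iterable of (x, y, t, polarity) tuples (assumed format).
    Returns a dict mapping (patch_row, patch_col) -> event count.
    """
    counts = defaultdict(int)
    for x, y, _t, _polarity in events:
        counts[(y // patch_size, x // patch_size)] += 1
    # Empty or nearly empty patches are dropped, so later stages only
    # process the sparse set of active patches.
    return {key: n for key, n in counts.items() if n >= min_events}


events = [(0, 0, 0.1, 1), (1, 2, 0.2, -1), (9, 9, 0.3, 1), (10, 3, 0.4, 1)]
patches = active_patches(events, patch_size=8)
print(patches)
```

On a 16x16 sensor with 8x8 patches, the four example events activate three of the four patches; raising `min_events` to 2 keeps only the top-left patch.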

Macro-Scale Pattern Recognition and Coordinate Identification in Real-time Spatio-temporal Overlap for Photonics Engineering Applications

22nd IFAC Conference on Technology, Culture and International Stability (TECIS)

Authors: Al-Juboori, Haider. Affiliation: South East Technol Univ, Fac Engn, Dept Elect Engn & Commun, 806 Killeshin Bldg, Kilkenny Rd, Carlow R93 V960, Ireland

The significance of high-speed machine vision in scientific and technological fields is growing, especially in the era of Industry 4.0 technologies. Several pattern-matching algorithms have intriguing applications in ultralow-latency machine vision processing. However, the low frame rate of image sensors, which usually operate at tens of hertz, fundamentally limits the processing rate. This paper conceptualizes and develops a computerized pattern recognition technique that can be applied to investigate light beam profiles and extract the information required for the purpose of this case study. In the current work, automatic detection and inspection of laser spots was designed in the LabVIEW graphical programming environment to analyze and align the laser beam relative to the electron beam spot, especially when the laser and electron beams overlap. This is one of the important steps toward the fundamental aim of the test-FEL: producing short wavelengths with the second, third, and fifth harmonics at 131.5, 88, and 53 nm, respectively. The preliminary version of the program achieved its basic purpose, namely accurate transverse alignment of the ultrashort laser pulses with the electron beam in the FEL test facility at MAX-Lab, and was also used to study the beam's stability and jitter range. Copyright (C) 2024 The Authors.

Keywords: intelligent systems; pattern matching; real-time tracking; computer vision concepts supporting control automation and semi-robotic systems

Make a Long Image Short: Adaptive Token Length for Vision Transformers

5th International Workshop on Learning with Imbalanced Domains - Theory and Applications / European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)

Authors: Zhou, Qiqi; Zhu, Yichen. Affiliations: Shanghai Univ Elect Power, Shanghai, Peoples R China; Midea Grp, Shanghai, Peoples R China

ISBN: (print) 9783031434143; 9783031434150

The vision transformer (ViT) is a model that breaks down each image into a fixed-length sequence of tokens and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in better performance, it also leads to a considerable increase in computational cost. Motivated by the saying "A picture is worth a thousand words," we propose an innovative approach to accelerate the ViT model by shortening long images. Specifically, we introduce a method for adaptively assigning token length for each image at test time to accelerate inference speed. First, we train a Resizable-ViT (ReViT) model capable of processing input with diverse token lengths. Next, we extract token-length labels from ReViT that indicate the minimum number of tokens required to achieve accurate predictions. We then use these labels to train a lightweight Token-Length Assigner (TLA) that allocates the optimal token length for each image during inference. The TLA enables ReViT to process images with the minimum sufficient number of tokens, reducing token numbers in the ViT model and improving inference speed. Our approach is general and compatible with modern vision transformer architectures, significantly reducing computational costs. We verified the effectiveness of our methods on multiple representative ViT models on image classification and action recognition.

Keywords: Vision transformer; token compression
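
The token-length label extraction step in this abstract (the minimum number of tokens that still yields an accurate prediction) can be sketched as follows. This is a hypothetical illustration, not the authors' code: `predict` stands in for a trained ReViT evaluated at a given token length, the patch size of 16 and the input sizes are assumed, and the fallback to the longest length when no length is correct is also an assumption.

```python
from typing import Callable, Sequence


def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image (class token not counted)."""
    return (image_size // patch_size) ** 2


def token_length_label(predict: Callable[[int], int], true_label: int,
                       lengths: Sequence[int]) -> int:
    """Smallest token length whose prediction is correct.

    Falls back to the largest length when no length is correct (an assumption).
    """
    for n in sorted(lengths):
        if predict(n) == true_label:
            return n
    return max(lengths)


lengths = [num_tokens(s, 16) for s in (64, 112, 224)]  # [16, 49, 196]
# Hypothetical predictions of a resizable ViT for one image at each length.
predictions = {16: 3, 49: 7, 196: 7}
label = token_length_label(predictions.__getitem__, true_label=7, lengths=lengths)
print(label)  # 49
```

A lightweight assigner would then be trained to predict such labels from the image alone, so that at inference time the model runs with the fewest sufficient tokens.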

A literature review on remote sensing scene categorization based on convolutional neural networks

INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, Vol. 44, No. 8, pp. 2611-2642

Authors: Kaul, Ajay; Kumari, Monika. Affiliation: Shri Mata Vaishno Devi Univ, Sch Comp Sci & Engn, Katra, J&K, India

Remote sensing scene categorization (RSSC) is a long-standing, vital, and complex problem in computer vision. It seeks to classify a scene into one of a set of predetermined scene classes by analysing the entire image. The rise of large-scale datasets and the resurgence of deep learning-based methods, which learn powerful feature representations directly from large amounts of raw data, have led to considerable progress in representing and classifying RS scenes. Among the varieties of deep neural networks, convolutional neural networks (CNNs) have been the most extensively researched. Benefiting from the rapid growth in the number of labelled samples and major gains in processing power, CNN research has advanced swiftly, producing state-of-the-art results in a number of applications. In this overview, we present a comprehensive evaluation of previously published surveys and recent CNN-based approaches for RSSC. This study covers more than 100 significant works on scene categorization, including problems, benchmark datasets, and qualitative performance evaluation. In view of the results so far, this study concludes with a list of intriguing research opportunities.

Keywords: Convolutional neural network; Computer vision; Scene representation; Remote sensing scene categorization; Deep Learning; Machine learning

Maritime Image Stabilization: A Comprehensive Review of Techniques and Challenges

9th International Conference on Electronic Technology and Information Science (ICETIS)

Authors: Wei, Enping; Tan, Yong Chai; Tai, Vin Cent; Hao, Yanan; Zhang, Xiaodong; Zhang, Tian. Affiliation: SEGI Univ, Fac Engn Built Environm & Informat Technol, Ctr Modelling & Simulat, Kuala Lumpur, Malaysia

ISBN: (print) 9798350388350; 9798350388343

Image stabilization plays a crucial role in providing accurate and reliable visual information for machine vision applications. In maritime applications such as unmanned ship navigation, where six-degree-of-freedom (6-DOF) motion and harsh maritime conditions prevail, the efficacy of image stabilization technology is vital for robust image processing algorithms. This paper offers a comprehensive review of image stabilization techniques tailored for maritime environments developed over the past two decades. We analyzed a total of 39 research articles on the subject, sourced from the Web of Science, SCOPUS, and Engineering Index databases, and discuss potential research directions to address the limitations of current image stabilization methods, with special consideration for the unique requirements of ship-borne cameras. The review provides an up-to-date overview of the techniques, algorithms, and limitations of image stabilization for ship-borne cameras in maritime applications, identifying current knowledge gaps and areas requiring further research. It aims to guide the development of new technologies and methods to improve the performance of image stabilization systems in maritime contexts.

Keywords: Image Stabilization Assessment Application Maritime Environment Ship-borne Camera

Performance Improvement in Welding Operations Through Image Processing

2nd International Conference on Artificial Intelligence and Machine Learning Applications, AIMLA 2024

Authors: Sharma, Mohit Kumar; Menon, Soumya V; Tripathy, Padmaja. Affiliations: Vivekananda Global University, Department of Electrical Engineering, Jaipur, India; School of Sciences, Department of Chemistry and Biochemistry, Bangalore, Karnataka, India; ARKA JAIN University, Department of Mechanical Engineering, Jamshedpur, Jharkhand, India

ISBN: (print) 9798350349221

This study aims to improve the overall performance of welding operations through image processing. It will combine computer vision and machine learning to better detect and track welds, improve the accuracy of the process, and reduce the potential for errors. In particular, the study will use a deep learning method to classify welds into specific classes, allowing welding operations to be monitored and operated more effectively. In addition, a convolutional neural network will be used to identify welds and estimate key parameters from the image data. Finally, a robot arm equipped with a camera and a torch will be used to validate the welding process in a real-world scenario. The results of this study will be used to improve the performance and quality of welding operations through greater visibility into the process. © 2024 IEEE.

Keywords: Welds

Image Feature Extraction and Tracking for Robot Vision Servo Control System Based on the Transformer Model

2024 International Conference on Telecommunications and Power Electronics, TELEPE 2024

Authors: He, Yanqiu; Zhu, Yanyan; Liu, Haisheng. Affiliations: Harbin Institute of Petroleum, Harbin 150028, Heilongjiang, China; Harbin Huade University, Harbin 150025, Heilongjiang, China; The Seventh School of Harbin New District, Harbin 150080, Heilongjiang, China

ISBN: (print) 9798350369212

Robot vision servo control systems play an important role in modern automation systems, and image feature extraction and tracking, as their key components, directly affect their performance and application scope. In this paper, we explore a novel approach based on the Transformer model, aiming to improve image feature extraction and tracking in robot vision servo control systems. First, we briefly introduce the basic principles of the Transformer model and its successful applications in natural language processing. Then, we discuss in detail how to apply the Transformer model to image feature extraction and evaluate its performance with experimental results. Subsequently, we discuss how to implement image tracking with the Transformer model and propose a new framework for a robot visual servo control system. Finally, we summarize the research results and outline possible future research directions. This research provides new ideas and methods for improving the performance and application range of robot visual servo control systems. © 2024 IEEE.

Keywords: Machine vision

All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Zhang, Xu; Guo, Peiyao; Lu, Ming; Ma, Zhan. Affiliation: School of Electronic Science and Engineering, Nanjing University, China

Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading to high parameter and bitrate overhead, or face challenges in multi-objective optimization under a unified representation, failing to achieve both performance and efficiency. To this end, we propose Multi-Path Aggregation (MPA), integrated into existing coding models for joint human-machine vision, which unifies the feature representation with an all-in-one architecture. MPA employs a predictor to allocate latent features among task-specific paths based on feature importance, which varies across tasks, maximizing the utility of shared features while preserving task-specific features for subsequent refinement. Leveraging feature correlations, we develop a two-stage optimization strategy to alleviate multi-task performance degradation. By reusing shared features, as little as 1.89% of the parameters are further augmented and fine-tuned for a specific task, which completely avoids extensive optimization of the entire model. Experimental results show that MPA achieves performance comparable to state-of-the-art methods in both task-specific and multi-objective optimization across human viewing and machine analysis tasks. Moreover, our all-in-one design supports seamless transitions between human- and machine-oriented reconstruction, enabling task-controllable interpretation without altering the unified model. Code is available at https://***/NJUvision/MPA. © 2024 Neural Information Processing Systems Foundation. All rights reserved.