Search Results - Inner Mongolia University Library

SMART-Vision: survey of modern action recognition techniques in vision

Multimedia Tools and Applications, 2024, pp. 1-72

Authors: AlShami, Ali K.; Rabinowitz, Ryan; Lam, Khang; Shleibik, Yousra; Mersha, Melkamu; Boult, Terrance; Kalita, Jugal. Affiliations: Computer Science Department, University of Colorado Colorado Springs, 1420 Austin Bluffs Pkwy, Colorado Springs, CO 80918, United States; Information Technology Department, Can Tho University, Campus II, 3/2 Street, Ninh Kieu District, Can Tho, Viet Nam

Human Action Recognition (HAR) is a challenging domain in computer vision, involving recognizing complex patterns by analyzing the spatiotemporal dynamics of individuals’ movements in videos. These patterns arise in sequential data, such as video frames, which are often essential to accurately distinguish actions that would be ambiguous in a single image. HAR has garnered considerable interest due to its broad applicability, ranging from robotics and surveillance systems to sports motion analysis, healthcare, and the burgeoning field of autonomous vehicles. While several taxonomies have been proposed to categorize HAR approaches in surveys, they often overlook hybrid methodologies and fail to demonstrate how different models incorporate various architectures and modalities. In this comprehensive survey, we present the novel SMART-Vision taxonomy, which illustrates how innovations in deep learning for HAR complement one another, leading to hybrid approaches beyond traditional categories. Our survey provides a clear roadmap from foundational HAR works to current state-of-the-art systems, highlighting emerging research directions and addressing unresolved challenges in discussion sections for architectures within the HAR domain. We provide details of the research datasets that various approaches used to measure and compare HAR approaches. We also explore the rapidly emerging field of Open-HAR systems, which challenges HAR systems by presenting samples from unknown, novel classes during test-time. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: 3D convolutional Computer vision Deep learning Graph convolutional network Human action recognition machine learning Motion models Open-set recognition Open-world learning Transformer Two-streams network vision-based

Reconfigurable parallel photonic matrix-vector multiplication processor based on multi-dimensional multiplexing

Optics Express, 2025, vol. 33, no. 9, pp. 19837-19850

Authors: Bi, Yanfeng; Wu, Xingyu; Fan, Chenrui; Zhang, Lufan; Wang, Chuan. Affiliations: Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China; Beijing Normal Univ, Appl Opt Beijing Area Major Lab, Beijing 100875, Peoples R China

Matrix-vector multiplication (MVM) operations play an important role in applications such as data processing and artificial neural networks. To meet the growing demand for computing power, the photonic MVM processor provides what we believe to be a new computing architecture. In this paper, we propose a reconfigurable parallel MVM (RP-MVM) processor. To further improve the parallel computing dimension, wavelength division multiplexing (WDM) and digital subcarrier multiplexing (DSM) technologies were first incorporated into the photonic MVM. Compared with the traditional WDM-MVM architecture, the parallelism of RP-MVM scheme is increased by N times, where N is the carrier number of DSM signal. Moreover, the input data channel can be dynamically adjusted without changing the hardware scale, which improves the flexibility of computing system. The simulation results show that the RP-MVM scheme can achieve parallel computing operations of eight MVMs, with a computing speed of 128 GOPs. For a random 6-bit resolution data sequence, the root mean square error (RMSE) of calculation results is on the order of 1E-3. In addition, for the image edge extraction task based on Roberts operator, this scheme can realize the parallel processing of four grayscale images. Therefore, the proposed scheme provides an alternative approach for realizing a highly parallel and reconfigurable large-scale photonic MVM architecture.

Keywords: machine vision Mode division multiplexing Neural networks Optical computing Variable optical attenuators Wavelength division multiplexing

A Review on Quantum Machine Learning in Different Computer Vision Fields

2024 IEEE International Performance, Computing, and Communications Conference, IPCCC 2024

Authors: Islam, Md Majedul; He, Jing Selena. Affiliation: Kennesaw State University, Department of Computer Science, Marietta, United States

ISBN: (print) 9798350367942

Quantum Machine Learning (QML) promises the transformative potential in computer vision by utilizing quantum computing to facilitate faster high-dimensional data processing. In this paper, we will go through some of the recent works that employ QML for computer vision problems such as Image Segmentation, Classification, and Generation. Demonstrations aimed at showing where QML methods beat the state of art techniques in particular applications like facial recognition, medical imaging, and satellite imagery. QML aspires to make pathbreaking changes in a field limited by current hardware capabilities. This poster abstract summarizes the important studies, methodologies and findings to inform further research in this developing field. © 2024 IEEE.

Keywords: Generative adversarial networks

Synthetic Data Generation for AI-based Machine Vision Applications

IS&T International Symposium on Electronic Imaging 2024: Intelligent Robotics and Industrial Applications using Computer Vision, IRIACV 2024

Authors: Seiler, Frederik; Eichinger, Verena; Effenberger, Ira. Affiliation: Fraunhofer IPA, Stuttgart, Germany

This paper presents a method for synthesizing 2D and 3D sensor data for various machine vision tasks. Depending on the task, different processing steps can be applied to a 3D model of an object. For object detection, segmentation and pose estimation, random object arrangements are generated automatically. In addition, objects can be virtually deformed in order to create realistic images of non-rigid objects. For automatic visual inspection, synthetic defects are introduced into the objects. Thus sensor-realistic datasets with typical object defects for quality control applications can be created, even in the absence of defective parts. The simulation of realistic images uses physically based rendering techniques. Material properties and different lighting situations are taken into account in the 3D models. The resulting tuples of 2D images and their ground truth annotations can be used to train a machine learning model, which is subsequently applied to real data. In order to minimize the reality gap, a random parameter set is selected for each image, resulting in images with high variety. Considering the use cases damage detection and object detection, it has been shown that a machine learning model trained only on synthetic data can also achieve very good results on real data. © 2024, Society for Imaging Science and Technology.

Keywords: Object detection

In-domain Self-supervised Learning for Plankton Image Classification on a Budget

2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2025

Authors: Ciranni, Massimiliano; Gjergji, Ani; Maracani, Andrea; Murino, Vittorio; Pastore, Vito Paolo. Affiliations: University of Genoa, MaLGa, Dibris, Italy; Istituto Italiano di Tecnologia, Genoa, Italy; University of Verona, Italy

ISBN: (print) 9798331536626

In the last few years, the abundance of available plankton images has significantly increased due to advancements in acquisition system technology. Consequently, a growing interest in automatic plankton image classification has surged. Machine learning algorithms have recently emerged to assist in the analysis of this vast quantity of data, supporting traditional manual processing. However, annotating such data is costly and demands significant time and resources, thus requiring data-efficient machine learning solutions. The typical framework for tackling this issue has been the adoption of supervised ImageNet pre-trained models, and fine-tuning them on the plankton classification downstream task. Nonetheless, self-supervised pre-training protocols may provide an effective alternative to the supervised approaches using ImageNet, while allowing the exploitation of the increasingly large amount of unannotated plankton data. To the best of our knowledge, no work systematically analyzes the impact of self-supervised pre-training protocols for plankton image classification. To fill this gap, in this paper, we present a thorough comparison between in-domain (plankton images) and out-of-domain (ImageNet) supervised and self-supervised pre-training, in terms of the quality of the corresponding embeddings for plankton image classification. We believe that this work may pave the way for further research in self-supervised protocols for the plankton domain, providing a valuable alternative to ImageNet, and exploiting the vast amount of unannotated available plankton images. © 2025 IEEE.

Keywords: Self-supervised learning

Advancing Deep Learning on Edge Devices: Fine-Tuning and Deployment of YOLOv7 Model for Efficient Object Detection in AI based Computer Vision Applications

3rd International Conference on Intelligent Data Communication Technologies and Internet of Things, IDCIoT 2025

Authors: Shekhar, Sudhanshu; Sathwik, T.S.; Pritwani, Mayank; Mohana; Ramakanth Kumar, P.; Sreelakshmi, K. Affiliation: RV College of Engineering®, Bengaluru, India

ISBN: (print) 9798331527549

This paper investigates the optimization and deployment of YOLOv7 deep learning model on NVIDIA Jetson Nano, an AI-focused edge computing platform for object detection in various computer vision applications. The work leverages TensorRT and quantization techniques for model acceleration for good detection accuracy. Further it examines performance metrics such as speed, accuracy, and resource utilization for image dataset. The model is trained using 80 different classes of objects and demonstrates the use of 6 classes. The average detection accuracy obtained 92.35% and the average processing time is 117.8ms. This work supports AI by demonstrating the feasibility of running deep learning models on edge devices and provides insight into the challenges and opportunities of optimizing AI models for energy-efficient, real-time operations on edge devices for various computer vision applications. © 2025 IEEE.

Keywords: Adversarial machine learning

Event Transformer⁺. A Multi-Purpose Solution for Efficient Event Data Processing

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, vol. 45, no. 12, pp. 16013-16020

Authors: Sabater, Alberto; Montesano, Luis; Murillo, Ana C. Affiliations: Univ Zaragoza, DIIS I3A, Zaragoza 50009, Spain; Bitbrain Technol, Zaragoza 50006, Spain

Event cameras record sparse illumination changes with high temporal resolution and high dynamic range. Thanks to their sparse recording and low consumption, they are increasingly used in applications such as AR/VR and autonomous driving. Current top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms, while event-aware methods do not perform as well. We propose Event Transformer(+), that improves our seminal work EvT with a refined patch-based event representation and a more robust backbone to achieve more accurate results, while still benefiting from event-data sparsity to increase its efficiency. Additionally, we show how our system can work with different data modalities and propose specific output heads, for event-stream classification (i.e. action recognition) and per-pixel predictions (dense depth estimation). Evaluation results show better performance to the state-of-the-art while requiring minimal computation resources, both on GPU and CPU.

Keywords: Computer vision image analysis image classification

Macro-Scale Pattern Recognition and Coordinate Identification in Real-time Spatio-temporal Overlap for Photonics Engineering Applications

22nd IFAC Conference on Technology, Culture and International Stability (TECIS)

Author: Al-Juboori, Haider. Affiliation: South East Technol Univ, Fac Engn, Dept Elect Engn & Commun, 806 Killeshin Bldg, Kilkenny Rd, Carlow R93 V960, Ireland

The significance of high-speed machine vision in scientific and technological fields is growing, especially with the era of Industry 4.0 technologies. There are several pattern-matching algorithms that have various intriguing applications in ultralow-latency machine vision processing. However, the low frame rate of image sensors, which usually operate at tens of hertz, fundamentally limits the processing rate. The paper will conceptualize and develop the computerized pattern recognition technique that can be applied to investigate light beam profiles and extract the desired information according to the purpose required in this case study. In the current work, the automatic detection and inspection of laser spots were designed to perform analysis and alignment for laser beam in comparison with the electron spot beam using the LabVIEW graphical programming environment, especially when the laser and electron beams overlap. This is one of the important steps for realizing the fundamental aim of test-FEL to produce short wavelengths with the second, third, and fifth harmonics at 131.5, 88, and 53 nm, respectively. The tentative version of the program achieved the elementary purpose, which fulfilled the accurate transversal alignment of the ultrashort laser pulses with the electron beam in the system of the FEL test facility at MAX-Lab, in addition to studying the beam's stability and jittering range. Copyright (C) 2024 The Authors.

Keywords: intelligent systems pattern matching real-time tracking computer vision concepts supporting control automation and semi-robotic systems

A literature review on remote sensing scene categorization based on convolutional neural networks

International Journal of Remote Sensing, 2023, vol. 44, no. 8, pp. 2611-2642

Authors: Kaul, Ajay; Kumari, Monika. Affiliation: Shri Mata Vaishno Devi Univ, Sch Comp Sci & Engn, Katra, J&K, India

Remote sensing scene categorization (RSSC) is a long-standing, vital, and complex issue in computer vision. It seeks to classify a scene into one of the predetermined scene groups by analysing the entire image. The rise of large-scale datasets and the resurgence of deep learning-based methods, which directly learn potent feature representations from large amounts of raw data, have led to a lot of progress in representing and classifying RS scenes. Convolutional neural networks (CNN) are among the varieties of deep neural networks that have been the subject of the most research. Taking advantage of the swift increase in the amount of labelled samples and the major enhancements in the strength of processing units, CNNs research has advanced swiftly, producing state-of-the-art results on a number of applications. In this overview, we present a comprehensive evaluation of earlier published surveys and recent CNN-based approaches for RSSC. This study covers more than 100 significant works on scene categorization, including problems, benchmark datasets, and qualitative performance evaluation. In view of the results so far, this study concludes with a list of intriguing research opportunities.

Keywords: Convolutional neural network Computer vision Scene representation Remote sensing scene categorization Deep Learning machine learning

Maritime Image Stabilization: A Comprehensive Review of Techniques and Challenges

9th International Conference on Electronic Technology and Information Science (ICETIS)

Authors: Wei, Enping; Tan, Yong Chai; Tai, Vin Cent; Hao, Yanan; Zhang, Xiaodong; Zhang, Tian. Affiliation: SEGI Univ, Fac Engn Built Environm & Informat Technol, Ctr Modelling & Simulat, Kuala Lumpur, Malaysia

ISBN: (print) 9798350388350; 9798350388343

Image stabilization plays a crucial role in providing accurate and reliable visual information for machine vision applications. In maritime applications, such as unmanned ship navigation, where six degrees of freedom (DOF) motion and harsh maritime conditions prevail, the efficacy of image stabilization technology is vital for robust image processing algorithms. This paper offers a comprehensive review of image stabilization techniques tailored for maritime environments, developed over the past two decades. We analyzed a total of 39 research articles on the subject, sourced from Web-of-Science, SCOPUS, and the Engineering Index databases, discussing potential research directions to address the limitations of current image stabilization methods, with special consideration for the unique requirements of ship-borne cameras. It provides an up-to-date overview of the techniques, limitations, and algorithms of ship-borne cameras for maritime applications, identifying current knowledge gaps and areas requiring further research. This review aims to guide the development of new technologies and methods to improve the performance of image stabilization systems in maritime contexts.

Keywords: Image Stabilization Assessment Application Maritime Environment Ship-borne Camera
