Search results - Inner Mongolia University Library

Reconfigurable parallel photonic matrix-vector multiplication processor based on multi-dimensional multiplexing

OPTICS EXPRESS, 2025, Vol. 33, No. 9, pp. 19837-19850

Authors: Bi, Yanfeng; Wu, Xingyu; Fan, Chenrui; Zhang, Lufan; Wang, Chuan (Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China; Beijing Normal Univ, Appl Opt Beijing Area Major Lab, Beijing 100875, Peoples R China)

Matrix-vector multiplication (MvM) operations play an important role in applications such as data processing and artificial neural networks. To meet the growing demand for computing power, the photonic MvM processor provides what we believe to be a new computing architecture. In this paper, we propose a reconfigurable parallel MvM (RP-MvM) processor. To further improve the parallel computing dimension, wavelength division multiplexing (WDM) and digital subcarrier multiplexing (DSM) technologies were first incorporated into the photonic MvM. Compared with the traditional WDM-MvM architecture, the parallelism of the RP-MvM scheme is increased by N times, where N is the carrier number of the DSM signal. Moreover, the input data channel can be dynamically adjusted without changing the hardware scale, which improves the flexibility of the computing system. The simulation results show that the RP-MvM scheme can achieve parallel computing operations of eight MvMs, with a computing speed of 128 GOPs. For a random 6-bit resolution data sequence, the root mean square error (RMSE) of the calculation results is on the order of 1E-3. In addition, for the image edge extraction task based on the Roberts operator, this scheme can realize the parallel processing of four grayscale images. Therefore, the proposed scheme provides an alternative approach for realizing a highly parallel and reconfigurable large-scale photonic MvM architecture.

Keywords: Machine vision; Mode division multiplexing; Neural networks; Optical computing; Variable optical attenuators; Wavelength division multiplexing

A Review on Quantum Machine Learning in Different Computer Vision Fields

2024 IEEE International Performance, Computing, and Communications Conference, IPCCC 2024

Authors: Islam, Md Majedul; He, Jing Selena (Kennesaw State University, Department of Computer Science, Marietta, United States)

ISBN: (print) 9798350367942

Quantum Machine Learning (QML) promises transformative potential in computer vision by utilizing quantum computing to facilitate faster high-dimensional data processing. In this paper, we will go through some of the recent works that employ QML for computer vision problems such as image segmentation, classification, and generation. These demonstrations aim to show where QML methods beat state-of-the-art techniques in particular applications such as facial recognition, medical imaging, and satellite imagery. QML aspires to make pathbreaking changes in a field limited by current hardware capabilities. This poster abstract summarizes the important studies, methodologies and findings to inform further research in this developing field. © 2024 IEEE.

Keywords: Generative adversarial networks

Synthetic Data Generation for AI-based Machine Vision Applications

IS&T International Symposium on Electronic Imaging 2024: Intelligent Robotics and Industrial Applications using Computer Vision, IRIACV 2024

Authors: Seiler, Frederik; Eichinger, Verena; Effenberger, Ira (Fraunhofer IPA, Stuttgart, Germany)

This paper presents a method for synthesizing 2D and 3D sensor data for various machine vision tasks. Depending on the task, different processing steps can be applied to a 3D model of an object. For object detection, segmentation and pose estimation, random object arrangements are generated automatically. In addition, objects can be virtually deformed in order to create realistic images of non-rigid objects. For automatic visual inspection, synthetic defects are introduced into the objects. Thus sensor-realistic datasets with typical object defects for quality control applications can be created, even in the absence of defective parts. The simulation of realistic images uses physically based rendering techniques. Material properties and different lighting situations are taken into account in the 3D models. The resulting tuples of 2D images and their ground truth annotations can be used to train a machine learning model, which is subsequently applied to real data. In order to minimize the reality gap, a random parameter set is selected for each image, resulting in images with high variety. Considering the use cases damage detection and object detection, it has been shown that a machine learning model trained only on synthetic data can also achieve very good results on real data. © 2024, Society for Imaging Science and Technology.

Keywords: Object detection

ENVQA: Improving Visual Question Answering model by enriching the visual feature

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, Vol. 142

Authors: Chowdhury, Souvik; Soni, Badal (Natl Inst Technol Silchar, CSE Dept, NIT Rd, Silchar 788010, Assam, India)

Visual Question Answering (VQA) is pivotal in various industries, including medicine. Current approaches typically rely on identifying patterns between image regions and questions, using attention-learning techniques to highlight essential information and suppress noise. However, existing VQA systems often overlook crucial foreground and background-related features in images, limiting their ability to tackle complex questions effectively. Most VQA models employ either spatial or channel attention mechanisms. Spatial attention localizes the region of interest (ROI) but may overlook global semantic relationships between salient objects. Conversely, channel attention enhances feature representation but disregards spatial dynamics within images. To address these limitations, we propose "ENVQA" (Enriching V in VQA), a novel VQA model that integrates enriched visual features by leveraging both spatial and object-level features, alongside spatial and channel attention networks. Our model aims to enhance understanding by capturing both local and global contexts within images. Experimental evaluations on benchmark datasets such as VQA 2.0, TDIUC, and GQA demonstrate that ENVQA outperforms state-of-the-art (SOTA) models utilizing attention mechanisms.

Keywords: Visual Question Answering; Computer vision; Natural language processing; Visual and language

Biomedical Image Processing and Applications Based on Multi Object Detection Algorithm of Computational Vision

2023 IEEE International Conference on Image Processing and Computer Applications, ICIPCA 2023

Authors: Luo, Lei (Imperial College London, Department of Metabolism, Digestion and Reproduction, London, United Kingdom)

ISBN: (print) 9798350314670

Medical image analysis based on deep learning has important research significance for accurately locating and identifying lesion targets. This article aims to address the issues of improving the detection efficiency and performance of existing object detection methods, and introduces the SE Net module to the Yolo v4 algorithm. The new algorithm improves the positioning accuracy of the data used in this article, achieving the goal of using object detection algorithms for rough target localization before classification. We selected Faster RCNN, Yolo v4, and improved Yolo v4 algorithms for experiments on the BraTS2018 dataset. The experimental results show that, on the whole, the improved Yolo v4 algorithm has better classification effect, and the improved Yolo v4 algorithm provides a good method for assisting doctors to diagnose brain tumor diseases. © 2023 IEEE.

Keywords: Object detection

In-domain Self-supervised Learning for Plankton Image Classification on a Budget

2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2025

Authors: Ciranni, Massimiliano; Gjergji, Ani; Maracani, Andrea; Murino, Vittorio; Pastore, Vito Paolo (University of Genoa, MaLGa, Dibris, Italy; Istituto Italiano di Tecnologia, Genoa, Italy; University of Verona, Italy)

ISBN: (print) 9798331536626

In the last few years, the abundance of available plankton images has significantly increased due to advancements in acquisition system technology. Consequently, a growing interest in automatic plankton image classification has surged. Machine learning algorithms have recently emerged to assist in the analysis of this vast quantity of data, supporting traditional manual processing. However, annotating such data is costly and demands significant time and resources, thus requiring data-efficient machine learning solutions. The typical framework for tackling this issue has been the adoption of supervised ImageNet pre-trained models, and fine-tuning them on the plankton classification downstream task. Nonetheless, self-supervised pre-training protocols may provide an effective alternative to the supervised approaches using ImageNet, while allowing the exploitation of the increasingly large amount of unannotated plankton data. To the best of our knowledge, no work systematically analyzes the impact of self-supervised pre-training protocols for plankton image classification. To fill this gap, in this paper, we present a thorough comparison between in-domain (plankton images) and out-of-domain (ImageNet) supervised and self-supervised pre-training, in terms of the quality of the corresponding embeddings for plankton image classification. We believe that this work may pave the way for further research in self-supervised protocols for the plankton domain, providing a valuable alternative to ImageNet, and exploiting the vast amount of unannotated available plankton images. © 2025 IEEE.

Keywords: Self-supervised learning

Advancing Deep Learning on Edge Devices: Fine-Tuning and Deployment of YOLOv7 Model for Efficient Object Detection in AI based Computer Vision Applications

3rd International Conference on Intelligent Data Communication Technologies and Internet of Things, IDCIoT 2025

Authors: Shekhar, Sudhanshu Sathwik, T.S. Pritwani, Mayank Mohana Ramakanth Kumar, P. Sreelakshmi, K. RV College of Engineering® Bengaluru India

ISBN: (print) 9798331527549

This paper investigates the optimization and deployment of the YOLOv7 deep learning model on NVIDIA Jetson Nano, an AI-focused edge computing platform for object detection in various computer vision applications. The work leverages TensorRT and quantization techniques for model acceleration while maintaining good detection accuracy. Further, it examines performance metrics such as speed, accuracy, and resource utilization for an image dataset. The model is trained using 80 different classes of objects and demonstrates the use of 6 classes. The average detection accuracy obtained is 92.35% and the average processing time is 117.8 ms. This work supports AI by demonstrating the feasibility of running deep learning models on edge devices and provides insight into the challenges and opportunities of optimizing AI models for energy-efficient, real-time operations on edge devices for various computer vision applications. © 2025 IEEE.

Keywords: Adversarial machine learning

Event Transformer⁺. A Multi-Purpose Solution for Efficient Event Data Processing

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, Vol. 45, No. 12, pp. 16013-16020

Authors: Sabater, Alberto; Montesano, Luis; Murillo, Ana C. (Univ Zaragoza, DIIS I3A, Zaragoza 50009, Spain; Bitbrain Technol, Zaragoza 50006, Spain)

Event cameras record sparse illumination changes with high temporal resolution and high dynamic range. Thanks to their sparse recording and low consumption, they are increasingly used in applications such as AR/VR and autonomous driving. Current top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms, while event-aware methods do not perform as well. We propose Event Transformer⁺, which improves our seminal work EvT with a refined patch-based event representation and a more robust backbone to achieve more accurate results, while still benefiting from event-data sparsity to increase its efficiency. Additionally, we show how our system can work with different data modalities and propose specific output heads for event-stream classification (i.e. action recognition) and per-pixel predictions (dense depth estimation). Evaluation results show better performance than the state of the art while requiring minimal computation resources, both on GPU and CPU.

Keywords: Computer vision; Image analysis; Image classification

Make a Long Image Short: Adaptive Token Length for Vision Transformers

5th International Workshop on Learning with Imbalanced Domains - Theory and Applications / European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)

Authors: Zhou, Qiqi; Zhu, Yichen (Shanghai Univ Elect Power, Shanghai, Peoples R China; Midea Grp, Shanghai, Peoples R China)

ISBN: (print) 9783031434143; 9783031434150

The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in better performance, it also leads to a considerable increase in computational cost. Motivated by the saying "A picture is worth a thousand words," we propose an innovative approach to accelerate the ViT model by shortening long images. Specifically, we introduce a method for adaptively assigning token length for each image at test time to accelerate inference speed. First, we train a Resizable-ViT (ReViT) model capable of processing input with diverse token lengths. Next, we extract token-length labels from ReViT that indicate the minimum number of tokens required to achieve accurate predictions. We then use these labels to train a lightweight Token-Length Assigner (TLA) that allocates the optimal token length for each image during inference. The TLA enables ReViT to process images with the minimum sufficient number of tokens, reducing token numbers in the ViT model and improving inference speed. Our approach is general and compatible with modern vision transformer architectures, significantly reducing computational costs. We verified the effectiveness of our methods on multiple representative ViT models on image classification and action recognition.

Keywords: Vision transformer; Token compression

Black widow optimisation with deep learning-based feature fusion model for remote sensing image analysis

INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2025, Vol. 28, No. 1, pp. 56-70

Authors: Rathod, Vaishnavee Vijay; Rana, Dipti P.; Mehta, Rupa G. (Sardar Vallabhbhai Natl Inst Technol, Dept Comp Sci Engn, Surat, Gujarat, India)

Recently, achieving accurate remote sensing images (RSI) classification has been a primary goal in deep learning, given its extensive applications, including urban planning and disaster management. The performance of existing convolutional neural networks (CNN)-based strategies is primarily influenced by their parameter settings, necessitating automated hyperparameter tuning through metaheuristic methods. The proposed BWODLF-RSI technique integrates black widow optimisation with a deep learning feature fusion model for enhanced RSI analysis. The preliminary processing step is to enhance RSI quality using noise reduction through a Gaussian filter (GF), enhancing contrast with the help of contrast limited adaptive histogram equalisation (CLAHE), and data augmentation to prevent overfitting. It is followed by employing Inception v3 and DenseNet201 to extract and fuse potent features. A critical aspect of this strategy is using black widow optimisation to fine-tune the kernel extreme learning machine (KELM) model, attaining a notable RSI classification accuracy of 94.05%. When tested on UCM and AID datasets, the BWODLF-RSI approach demonstrated superior feature selection and RSI analysis performance.

Keywords: remote sensing; image classification; deep learning; pre-processing; feature fusion
