Automotive simulation can potentially compensate for a lack of training data in computer vision applications. However, there has been little to no image quality evaluation of automotive simulation and the impact of op...
Robot vision servo control systems play an important role in modern automation systems, and image feature extraction and tracking, as their key components, have a direct impact on their performance and application scope. ...
Distance calculation is a critical link in research fields such as object trajectory prediction and automatic driving obstacle avoidance. However, research on distance estimation using deep learning methods has yet to attract wide attention. The accuracy of traditional distance estimation algorithms based on optical principles and mathematical modeling is low in practical applications, mainly because of the curvature or slope of the road surface. This paper addresses the challenging distance estimation problem by developing an end-to-end structured model that directly predicts the distance of objects in a given image, replacing the traditional mathematical modeling process with a learning-based method. To facilitate research on this task, we construct extended distance datasets from the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) and NYU (Nathan Silberman, Pushmeet Kohli, Derek Hoiem, Rob Fergus) Depth v2 datasets. Experimental results demonstrate that the structured learning model achieves higher accuracy than the traditional algorithm across different distance ranges and performs better on curves and ramps. Improving the underlying neural network will be the direction of future work on the model.
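As an illustration of the end-to-end formulation described above, the following is a minimal sketch (not the paper's model): a ResNet-18 backbone regresses a per-object distance in meters directly from an image crop, so no optical or geometric modeling is required. The backbone choice, crop size, and loss are assumptions made for the example.

```python
# Minimal illustrative sketch of learning-based distance regression
# (assumed architecture, not the paper's exact model).
import torch
import torch.nn as nn
import torchvision.models as models

class DistanceRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)   # generic feature extractor
        backbone.fc = nn.Identity()                # keep the 512-d features
        self.backbone = backbone
        self.head = nn.Sequential(                 # distance head (meters)
            nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, crops):                      # crops: (B, 3, H, W) object crops
        return self.head(self.backbone(crops)).squeeze(-1)

model = DistanceRegressor()
pred = model(torch.randn(4, 3, 224, 224))          # four object crops
target = torch.tensor([12.3, 7.8, 30.1, 5.5])      # ground-truth distances (m)
loss = nn.functional.smooth_l1_loss(pred, target)  # regression loss
```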
Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading t...
The distinctive properties and facile integration of 2D materials hold the potential to offer promising avenues for on-chip photonic devices, and the expeditious and nondestructive identification and localization of their diverse fundamental building blocks are key prerequisites. Here, we present a methodology grounded in digital image processing and deep learning that achieves the detection and precise localization of four monolayer-thick triangular single crystals of transition metal dichalcogenides with a mean average precision above 90%. The approach demonstrates robust recognition capabilities across varied imaging conditions encompassing both white light and monochromatic light. It stands poised to serve as a potent data-driven tool that enhances characterization efficiency and holds the potential to expedite research initiatives and applications founded on the utilization of 2D materials. (c) 2024 Optica Publishing Group
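As a rough illustration of such a data-driven detector (not the authors' code), the sketch below fine-tunes a generic Faster R-CNN to localize four assumed classes of monolayer flakes in optical micrographs; the class count, image size, and labels are placeholders.

```python
# Illustrative sketch: fine-tune a generic detector for four TMD flake
# classes (assumed setup, not the paper's implementation).
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 1 + 4  # background + four monolayer crystal types (assumed)
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None)
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, NUM_CLASSES)

images = [torch.rand(3, 512, 512)]                        # one optical micrograph
targets = [{"boxes": torch.tensor([[50., 60., 120., 140.]]),
            "labels": torch.tensor([1])}]                 # e.g. one flake of class 1
loss_dict = model(images, targets)                        # training-mode losses
total_loss = sum(loss_dict.values())
```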
Leveraging sensory information to aid the millimeter-wave (mmWave) and sub-terahertz (sub-THz) beam selection process is attracting increasing interest. This sensory data, captured for example by cameras at the base stations, has the potential to significantly reduce the beam sweeping overhead and enable highly mobile applications. The solutions developed so far, however, have mainly considered single-candidate scenarios, i.e., scenarios with a single candidate user in the visual scene, and were evaluated using synthetic datasets. To address these limitations, this paper extensively investigates the sensing-aided beam prediction problem in a real-world multi-object vehicle-to-infrastructure (V2I) scenario and presents a comprehensive machine learning based framework. In particular, this paper proposes to utilize visual and positional data to predict the optimal beam indices as an alternative to conventional beam sweeping approaches. For this, a novel user (transmitter) identification solution has been developed, a key step in realizing sensing-aided multi-candidate and multi-user beam prediction solutions. The proposed solutions are evaluated on the large-scale real-world DeepSense 6G dataset. Experimental results in realistic V2I communication scenarios indicate that the proposed solutions achieve 67-84% top-1 and close to 100% top-5 beam prediction accuracy for single-user scenarios, and 65-80% top-1 and close to 95% top-5 beam prediction accuracy for multi-candidate scenarios. Furthermore, the proposed approach can identify the probable transmitting candidate with more than 93% accuracy across the different scenarios. This highlights a promising approach for significantly reducing the beam training overhead in mmWave/THz communication systems.
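For concreteness, here is a hedged sketch of the vision-plus-position beam prediction idea (not the framework evaluated in the paper): image features and a 2-D position are fused and classified over an assumed 64-beam codebook; sweeping only the top-5 predicted beams is what reduces the training overhead.

```python
# Hedged sketch of sensing-aided beam prediction (assumed codebook size and
# architecture; not the paper's framework).
import torch
import torch.nn as nn
import torchvision.models as models

NUM_BEAMS = 64                                    # assumed beam codebook size

class BeamPredictor(nn.Module):
    def __init__(self, num_beams=NUM_BEAMS):
        super().__init__()
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Identity()                    # 512-d visual embedding
        self.cnn = cnn
        self.pos_mlp = nn.Sequential(nn.Linear(2, 32), nn.ReLU())
        self.classifier = nn.Linear(512 + 32, num_beams)

    def forward(self, image, position):           # position: (B, 2) normalized coords
        feat = torch.cat([self.cnn(image), self.pos_mlp(position)], dim=-1)
        return self.classifier(feat)               # logits over beam indices

model = BeamPredictor()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 2))
top5 = logits.topk(5, dim=-1).indices              # candidate beams to sweep
```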
ISBN:
(Print) 9798350318920; 9798350318937
Pre-trained Vision-Language Models (VLMs), such as CLIP, have shown enhanced performance across a range of tasks that involve the integration of visual and linguistic modalities. When CLIP is used for depth estimation, the patches divided from the input images can be combined with a series of semantic descriptions of depth to obtain similarity results. A coarse depth estimate is then obtained by weighting and summing the depth values, called depth bins, corresponding to the pre-defined semantic descriptions. This zero-shot approach circumvents the computational and time-intensive nature of traditional fully supervised depth estimation methods. However, because it relies on fixed depth bins, it may not generalize well, as images from different scenes can exhibit distinct depth distributions. To address this challenge, we propose a few-shot method that learns to adapt the VLM for monocular depth estimation, balancing training cost and generalization capability. Specifically, it assigns different depth bins to different scenes, which the model can select during inference. Additionally, we incorporate learnable prompts to preprocess the input text, converting easily human-understood text into easily model-understood vectors and further enhancing performance. With only one image per scene for training, our extensive experimental results on the NYU v2 and KITTI datasets demonstrate that our method outperforms the previous state-of-the-art method by up to 10.6% in terms of MARE.
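The depth-bin weighting step described above can be illustrated with a few lines of tensor code; the sketch below is a simplified reconstruction under assumed bin values and embedding dimensions, not the paper's implementation.

```python
# Simplified sketch of CLIP-style depth-bin weighting (assumed values).
import torch

def depth_from_bins(patch_feats, text_feats, depth_bins, temperature=0.01):
    """patch_feats: (P, D) image-patch embeddings (e.g. from a VLM encoder)
       text_feats:  (K, D) embeddings of K depth descriptions
       depth_bins:  (K,)   depth value (meters) assigned to each description"""
    patch_feats = torch.nn.functional.normalize(patch_feats, dim=-1)
    text_feats = torch.nn.functional.normalize(text_feats, dim=-1)
    sim = patch_feats @ text_feats.T / temperature   # (P, K) similarities
    weights = sim.softmax(dim=-1)                    # per-patch bin weights
    return weights @ depth_bins                      # (P,) coarse depth per patch

# toy example: 4 patches, 7 depth phrases ("very close" ... "very far")
bins = torch.tensor([1., 2., 4., 8., 12., 20., 40.])  # assumed bin values
depth = depth_from_bins(torch.randn(4, 512), torch.randn(7, 512), bins)
```

The paper's contribution replaces the fixed bins with scene-specific bins selected at inference and learns the text prompts; the weighting itself corresponds to the step sketched here.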
ISBN:
(Print) 9798400716164
In the past years, machine learning (ML) and deep learning (DL) have led to the advancement of several applications, including computer vision, natural language processing, and audio processing. These complex tasks require large models, which are challenging to deploy on devices with limited resources. Such resource-constrained devices have limited computation power and memory, so neural networks must be optimized through network acceleration and compression techniques. This paper proposes a novel method to compress and accelerate neural networks from a small set of spatial convolution kernels. Firstly, a novel pruning algorithm based on density-based clustering is proposed that identifies and removes redundancy in CNNs while maintaining the accuracy-throughput tradeoff. Secondly, a novel pruning algorithm based on grid-based clustering is proposed to identify and remove redundancy in CNNs. The performance of the three pruning algorithms (density-based, grid-based, and partitional clustering) is evaluated against one another. Experiments with the deep CNN compression technique on the VGG-16 and ResNet models achieve higher image classification accuracy than the original models at a higher compression ratio and speedup.
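To make the clustering-based pruning idea concrete, here is an illustrative sketch (not the paper's algorithm): flattened convolution kernels are clustered with DBSCAN and one representative filter per cluster is kept, discarding near-duplicates; the eps and min_samples values are placeholders.

```python
# Illustrative sketch of density-based (DBSCAN) filter pruning
# (assumed parameters; not the paper's algorithm).
import numpy as np
import torch.nn as nn
from sklearn.cluster import DBSCAN

def select_filters(conv: nn.Conv2d, eps=0.5, min_samples=2):
    w = conv.weight.detach().reshape(conv.out_channels, -1).numpy()
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(w)
    keep = []
    for lbl in np.unique(labels):
        idx = np.flatnonzero(labels == lbl)
        if lbl == -1:
            keep.extend(idx.tolist())          # noise points: keep them all
        else:
            keep.append(int(idx[0]))           # one representative per cluster
    return sorted(keep)

conv = nn.Conv2d(3, 64, kernel_size=3)
kept = select_filters(conv)                    # indices of filters to retain
print(f"keeping {len(kept)} of {conv.out_channels} filters")
```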
ISBN:
(Print) 9798350337440
The reconfiguration of machine vision systems heavily depends on the collection and availability of large datasets, rendering them inflexible and vulnerable to even minor changes in the data. This paper proposes a refinement of Miller's Cartesian Genetic Programming methodology aimed at generating filter pipelines for image processing tasks. The approach is based on CGP-IP, but specifically adapted for image processing in industrial monitoring applications. The suggested method allows filter pipelines to be retrained on small datasets; this concept of self-adaptivity renders high-precision machine vision more resilient to faulty machine settings or changes in the environment and yields compact programs. A dependency graph is introduced to rule out invalid pipeline solutions. Furthermore, we suggest not only generating pipelines from scratch, but also storing and reapplying previous solutions and re-adjusting filter parameters. Our modifications are designed to increase the likelihood of early convergence and improvement in the fitness indicators. This form of self-adaptivity allows for a more resource-efficient configuration of image filter pipelines with small datasets.
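As a toy illustration of an evolvable filter pipeline in the spirit of CGP-IP (the paper's node set, dependency graph, and parameter re-adjustment are not reproduced here), the sketch below mutates an ordered list of OpenCV filter primitives; the primitive set and mutation rate are assumptions.

```python
# Toy sketch of an evolvable image-filter pipeline (illustrative only).
import random
import cv2
import numpy as np

FILTERS = {                                       # assumed primitive set
    "blur":   lambda im: cv2.GaussianBlur(im, (5, 5), 0),
    "median": lambda im: cv2.medianBlur(im, 5),
    "edges":  lambda im: cv2.Canny(im, 50, 150),
    "dilate": lambda im: cv2.dilate(im, np.ones((3, 3), np.uint8)),
}

def run_pipeline(genome, image):
    out = image
    for gene in genome:                           # genome = ordered filter names
        out = FILTERS[gene](out)
    return out

def mutate(genome, rate=0.25):
    return [random.choice(list(FILTERS)) if random.random() < rate else g
            for g in genome]

frame = np.random.randint(0, 255, (64, 64), dtype=np.uint8)   # stand-in image
genome = ["blur", "edges", "dilate"]
result = run_pipeline(mutate(genome), frame)      # evaluate a mutated candidate
```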
Nowadays, video monitoring applications have become significant for observing human activity with the help of computer vision-based approaches for investigating numerous video sequences. The major goal of anomaly identification is to discover abnormalities automatically in a short interval of time. Performing efficient anomaly detection in a video monitoring system is a complex task due to video noise, spilling, and anomalies. Various anomaly detection models based on Artificial Intelligence (AI) have been developed for video surveillance; however, these models often address only specific issues and do not consider evaluation concerns over time. Hence, this paper aims to implement a video anomaly detection model through surveillance cameras for reducing abnormal activities and thereby enhancing the security of the environment. First, the input videos are collected from standard benchmark datasets and passed to the frame extraction phase. The extracted frames are fed to the object detection phase, where the YOLO-v3 technique is used. Parameter optimization of YOLO-v3 is achieved using the Modified Cat and Mouse Optimization (MCMO) algorithm to improve detection performance. The object-detected frames are fed to a ResNet for extracting deep features, which are then used in the classification phase, where the Optimized Bi-directional Long Short-Term Memory (Bi-LSTM)-Radial Basis Function (RBF) (OBi-LSTM-RBF) model provides the classified anomaly outcome. Its variables are optimized using the enhanced CMO algorithm to enhance the efficacy of anomaly classification. Simulation evaluations reveal the effectiveness of the offered approach against diverse baseline algorithms using diverse performance measures. The offered approach shows significant enhancement in accuracy over baseline approaches. Specifically, it outperforms conventional CNNs by 45.47%, DNNs by 90.03%,
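Of the pipeline described above, the classification stage is the easiest to sketch; the following is a minimal, assumed version using a bidirectional LSTM over per-frame ResNet features (the MCMO-optimized YOLO-v3 detector and the RBF output layer are omitted).

```python
# Minimal sketch of the Bi-LSTM anomaly classification stage
# (assumed dimensions; detector and RBF layer omitted).
import torch
import torch.nn as nn

class BiLSTMAnomalyClassifier(nn.Module):
    def __init__(self, feat_dim=2048, hidden=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, feats):                  # feats: (B, T, feat_dim) ResNet features
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])             # classify from the last time step

clf = BiLSTMAnomalyClassifier()
logits = clf(torch.randn(2, 16, 2048))         # 2 clips, 16 frames each
probs = logits.softmax(dim=-1)                 # P(normal), P(anomaly)
```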