Common computer vision (CV) tasks include image classification, object detection, segmentation, and recognition. To handle such tasks, machine learning (ML) models for image processing require a great amount of annota...
ISBN: (Print) 9798331541859; 9798331541842
This article details the research on web accessibility conducted at Capgemini's SogetiLabs. We introduce our project aimed at developing an automatic accessibility audit tool for website images. Our AI solution for web accessibility focuses on distinguishing between informative and decorative images in line with RGAA (Référentiel Général d'Amélioration de l'Accessibilité) recommendations and then generating alternative text for informative images. To achieve this, we have established a comprehensive processing workflow. Additionally, we present initial experiments in image classification using Convolutional Neural Networks (CNNs) and the YOLO (You Only Look Once) model.
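As a rough illustration of the informative-vs-decorative classification step described in this abstract, the following numpy sketch shows the shape of a minimal CNN-style scorer (one convolution, ReLU, global average pooling, sigmoid). All names and weights here are hypothetical; the authors' actual CNN and YOLO models are of course far larger.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def tiny_cnn_score(img, kernel, w, b):
    """One conv layer -> ReLU -> global average pool -> sigmoid.
    Returns a probability-like score that the image is 'informative'
    (hypothetical label convention, illustrative weights)."""
    feat = np.maximum(conv2d(img, kernel), 0.0)      # ReLU activation
    pooled = feat.mean()                             # global average pooling
    return 1.0 / (1.0 + np.exp(-(w * pooled + b)))   # sigmoid head
```

In practice the score would be thresholded to route images either to the alt-text generator (informative) or to an empty `alt=""` attribute (decorative), per RGAA guidance.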
ISBN: (Print) 9798400708473
With the significant development in deep learning within the domains of computer vision and natural language processing, the research involving the multimodal aspects of Visual Question Answering (VQA) has also reached a pivotal turning point in recent years. Throughout prior investigations, scholars have consistently emphasized feature extraction from images and text. Numerous models have been applied in this context, ranging from the initial breakthroughs of Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN), to the momentary prominence of Dynamic Memory Networks, and subsequently, the rise of transformers in recent times. Nonetheless, it is imperative to recognize that beyond the ambit of feature extraction models, the fusion of bi-modal features assumes pivotal significance. This paper builds upon the SOTA (state-of-the-art) model from previous work, serving as the baseline, and meticulously scrutinizes its performance across distinct fusion methodologies. Various notable fusion strategies, such as MUTAN and BLOCK, are considered. Notably, the most adept model achieves an impressive 65.74% accuracy on the VQA v2 dataset, outperforming established benchmarks. This outcome robustly substantiates the premise that fusion techniques exert tangible influence over the ultimate research outcomes.
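The low-rank bilinear fusion idea behind MUTAN can be sketched as follows. This is an illustrative simplification for a single example, not the paper's implementation; all tensor names and shapes are assumptions.

```python
import numpy as np

def mutan_fusion(v, q, Wv, Wq):
    """MUTAN-style low-rank bilinear fusion, single-example sketch.
    v:  (dv,) image feature vector
    q:  (dq,) question feature vector
    Wv: (R, dv, d), Wq: (R, dq, d) -- R rank-1 factors projecting each
    modality into a shared dimension d.
    fused = sum over ranks of elementwise products of the projections."""
    fused = np.zeros(Wv.shape[2])
    for r in range(Wv.shape[0]):
        fused += (v @ Wv[r]) * (q @ Wq[r])
    return fused
```

The rank R trades expressiveness of the bilinear interaction against parameter count, which is the core appeal of MUTAN-style fusion over a full bilinear map.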
The scanning electron microscope (SEM) is vital in wafer processing, providing high-resolution surface images for defect analysis. Despite optimizations, images may have noise and edge jitter from electromagnetic interferenc...
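As a hedged illustration of the kind of noise suppression such a pipeline might start with (not necessarily the paper's method, which is truncated here), a simple median filter removes impulsive noise while largely preserving edges:

```python
import numpy as np

def median_filter(img, k=3):
    """Simple k x k median filter, a common first step for suppressing
    impulsive (salt-and-pepper) noise in SEM images. Illustrative only."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")   # replicate borders
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            # median of the k x k neighborhood centered at (i, j)
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out
```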
ISBN: (Print) 9798350349405; 9798350349399
Image coding for machines (ICM) aims to compress images for machine analysis using recognition models rather than human vision. Hence, in ICM, it is important for the encoder to recognize and compress the information necessary for the machine recognition task. There are two main approaches in learned ICM: optimization of the compression model based on task loss, and Region of Interest (ROI) based bit allocation. These approaches provide the encoder with the recognition capability. However, optimization with task loss becomes difficult when the recognition model is deep, and ROI-based methods often involve extra overhead during evaluation. In this study, we propose a novel training method for learned ICM models that applies auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance. Our method achieves Bjontegaard Delta rate improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, compared to the conventional training method.
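The objective described here combines rate, distortion, and an auxiliary recognition loss on the encoder. A minimal numpy sketch of such a combined loss follows; the weights `lam` and `beta`, the function name, and the single-class cross-entropy formulation are all assumptions, not the paper's exact formulation.

```python
import numpy as np

def rd_aux_loss(x, x_hat, bits, logits, label, lam=0.01, beta=0.5):
    """Sketch of a combined ICM training objective:
    rate + lam * distortion + beta * auxiliary recognition loss.
    x, x_hat: original and reconstructed images (arrays)
    bits:     estimated bitstream length (rate term)
    logits:   recognition-head output computed on encoder features
    label:    ground-truth class index for the auxiliary task"""
    distortion = np.mean((x - x_hat) ** 2)               # MSE distortion
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    aux = -log_probs[label]                              # cross-entropy
    return bits + lam * distortion + beta * aux
```

The point of the auxiliary term is that gradients reach the encoder directly, sidestepping the difficulty of backpropagating a task loss through a very deep recognition model.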
An optical non-contact inspection system was developed for measuring the slots in stator lamination stacks. To avoid passing go/no-go gage blocks through the slots, a machine vision system is instead used to measure the stator core slots and identify the presence of burrs within the slots. Utilizing telecentric optics along with an alignment monitoring system configured to monitor and orient the stator core, the core slots can be oriented relative to the imaging axis for further metrology measurements. Among these measurements, the smallest opening dimensions (slot width and depth) of each slot due to misalignment of laminations and the detection of burrs along the edges of the slots throughout the length of the lamination stack are critical for full stator assembly. Advanced image processing algorithms were developed to obtain the sub-pixel accuracy required to measure the slots. This, used in conjunction with a robust vision calibration technique, increases the feasibility of building a device that can be implemented as a production inspection system. Experiments show the reliability of the computer vision approach and how it can be used in the inspection of slots in lamination stacks.
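Sub-pixel edge localization of the kind slot measurement relies on can be sketched by fitting a parabola through the gradient peak of a 1-D intensity profile taken across the slot edge. This is a generic illustration; the abstract does not disclose the authors' actual algorithms.

```python
import numpy as np

def subpixel_edge(profile):
    """Locate an edge in a 1-D intensity profile with sub-pixel accuracy.
    Takes the discrete gradient, finds its peak, then refines the peak
    position by fitting a parabola through the three samples around it."""
    grad = np.abs(np.diff(profile))
    i = int(np.argmax(grad))
    if i == 0 or i == len(grad) - 1:
        return float(i)              # peak at boundary: no refinement
    y0, y1, y2 = grad[i - 1], grad[i], grad[i + 1]
    denom = y0 - 2 * y1 + y2
    if denom == 0:
        return float(i)              # flat top: parabola degenerate
    offset = 0.5 * (y0 - y2) / denom # vertex of the fitted parabola
    return i + offset
```

Applied to row or column profiles across a calibrated telecentric image, offsets like this are what let slot width and depth be measured to a fraction of a pixel.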
ISBN: (Print) 9798350349405; 9798350349399
User-generated content (UGC) is ubiquitous across the internet as a result of billions of videos and images being uploaded each day. All kinds of UGC media are affected by natural distortions, occurring both during and after capture, which are inherently diverse and commingled. These distortions have different perceptual effects based on the media content. Given recent dramatic increases in the consumption of short-form content, the analysis and control of their perceptual quality has become an important problem. Regardless of the content, many UGC videos have overlaid and embedded texts in them, which are visually salient. Hence text quality has a significant impact on the global perception of video or image quality and needs to be studied. One of the most important factors in perceptual text quality in user-generated media is legibility, which has been studied very little in the context of computer vision. Predicting text legibility can also help in text recognition applications such as image search or document identification. This work aims at modeling text legibility using computer vision techniques and thus studying the relationship between text quality and legibility. We propose a modified dataset variant of COCO-Text [1] and a model for predicting text legibility for both handwritten and machine-generated texts. We also demonstrate how models trained to predict text legibility can help in the prediction of text (perceptual) quality. The dataset and models can be accessed here https://***/research/Quality/***.
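As an illustrative baseline only (not the paper's learned model), a hand-crafted legibility proxy for a text patch might combine local contrast and sharpness; both the formula and the function name are assumptions.

```python
import numpy as np

def legibility_score(patch):
    """Crude hand-crafted proxy for text legibility in a grayscale patch:
    Michelson contrast scaled by a bounded sharpness term (gradient
    energy through tanh). Illustrative heuristic, not a trained model."""
    lo, hi = patch.min(), patch.max()
    contrast = (hi - lo) / (hi + lo + 1e-8)       # Michelson contrast
    gy, gx = np.gradient(patch.astype(float))     # spatial gradients
    sharpness = np.mean(gx ** 2 + gy ** 2)        # gradient energy
    return contrast * np.tanh(sharpness)
```

A learned model such as the one the paper proposes would replace this heuristic with features trained against human legibility judgments.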
The automatic development of meaningful, detailed textual descriptions for supplied images is a difficult task in the fields of computer vision and natural language processing. As a result, an AI-powered image caption generator can be incredibly useful for producing captions. In this study, we present a unique method for creating image captions utilizing an attention mechanism that concentrates on pertinent areas of the image while it creates captions. On benchmark datasets, our model, which uses deep neural networks to extract image features and produce captions, obtains state-of-the-art results, confirming the effectiveness of the attention mechanism in improving the quality of the generated captions. We also offer a thorough evaluation of the performance of our approach and discuss potential future directions for enhancing image caption generation.
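The attention step described above can be sketched as additive (Bahdanau-style) attention over image region features: score each region against the decoder's hidden state, normalize the scores, and return the weighted context vector. This is a generic illustration, not the paper's exact architecture; all parameter names are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

def visual_attention(regions, hidden, Wr, Wh, w):
    """Additive attention over image regions for one decoding step.
    regions: (N, d) region feature vectors
    hidden:  (h,)   decoder hidden state
    Wr: (d, a), Wh: (h, a), w: (a,)  -- learned projections (here random).
    Returns the attention-weighted context vector and the weights."""
    scores = np.tanh(regions @ Wr + hidden @ Wh) @ w  # (N,) alignment scores
    alpha = softmax(scores)                           # attention weights
    context = alpha @ regions                         # weighted sum, (d,)
    return context, alpha
```

At each word-generation step the decoder recomputes `alpha`, which is what lets the caption generator "look at" different image regions for different words.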
To improve the recognition accuracy of the embedded visual module and make it suitable for visual tasks in complex scenarios, an image preprocessing method used on the OpenMV is proposed. Aiming at the two main recog...
In this paper, we adopt a deep-learning-based image style transfer technique, using the VGG19 network for content and style feature extraction, combining an image with an artistic design style to realise the generation of Van Gog...
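Style transfer built on VGG19 typically represents style as Gram matrices of feature maps, and the style loss compares those matrices between the generated and style images. The sketch below shows that representation with random arrays standing in for VGG19 activations; it is illustrative, not the paper's code.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-wise correlations
    of activations, the standard style representation in neural style
    transfer. Normalized by spatial size."""
    C, H, W = features.shape
    F = features.reshape(C, H * W)
    return F @ F.T / (H * W)

def style_loss(gen_feats, style_feats):
    """Mean squared difference between the two Gram matrices."""
    return np.mean((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2)
```

In a full pipeline this loss would be summed over several VGG19 layers and combined with a content loss, then minimized by gradient descent on the generated image's pixels.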