Common computer vision (CV) tasks include image classification, object detection, segmentation, and recognition. To handle such tasks, machine learning (ML) models for image processing require a great amount of annota...
ISBN: (Print) 9798331541859; 9798331541842
This article details the research on web accessibility conducted at Capgemini's SogetiLabs. We introduce our project aimed at developing an automatic accessibility audit tool for website images. Our AI solution for web accessibility focuses on distinguishing between informative and decorative images in line with RGAA (Référentiel Général d'Amélioration de l'Accessibilité) recommendations and then generating alternative text for informative images. To achieve this, we have established a comprehensive processing workflow. Additionally, we present initial experiments in image classification using Convolutional Neural Networks (CNNs) and the YOLO (You Only Look Once) model.
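As a rough illustration of the informative-vs-decorative classification step described in this abstract, the following numpy sketch shows the shape of a minimal CNN-style scorer (one convolution, ReLU, global average pooling, sigmoid). All names and weights here are hypothetical; the authors' actual CNN and YOLO models are of course far larger.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def tiny_cnn_score(img, kernel, w, b):
    """One conv layer -> ReLU -> global average pool -> sigmoid.
    Returns a probability-like score that the image is 'informative'
    (hypothetical label convention, illustrative weights)."""
    feat = np.maximum(conv2d(img, kernel), 0.0)      # ReLU activation
    pooled = feat.mean()                             # global average pooling
    return 1.0 / (1.0 + np.exp(-(w * pooled + b)))   # sigmoid head
```

In practice the score would be thresholded to route images either to the alt-text generator (informative) or to an empty `alt=""` attribute (decorative), per RGAA guidance.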
ISBN: (Print) 9798400708473
With the significant development in deep learning within the domains of computer vision and natural language processing, the research involving the multimodal aspects of Visual Question Answering (VQA) has also reached a pivotal turning point in recent years. Throughout prior investigations, scholars have consistently emphasized feature extraction from images and text. Numerous models have been applied in this context, ranging from the initial breakthroughs of Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN), to the momentary prominence of Dynamic Memory Networks, and subsequently, the rise of transformers in recent times. Nonetheless, it is imperative to recognize that beyond the ambit of feature extraction models, the fusion of bi-modal features assumes pivotal significance. This paper builds upon the SOTA (state-of-the-art) model from previous work, serving as the baseline, and meticulously scrutinizes its performance across distinct fusion methodologies. Various notable fusion strategies, such as MUTAN and BLOCK, are considered. Notably, the most adept model achieves an impressive 65.74% accuracy on the VQA v2 dataset, outperforming established benchmarks. This outcome robustly substantiates the premise that fusion techniques exert tangible influence over the ultimate research outcomes.
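The low-rank bilinear fusion idea behind MUTAN can be sketched as follows. This is an illustrative simplification for a single example, not the paper's implementation; all tensor names and shapes are assumptions.

```python
import numpy as np

def mutan_fusion(v, q, Wv, Wq):
    """MUTAN-style low-rank bilinear fusion, single-example sketch.
    v:  (dv,) image feature vector
    q:  (dq,) question feature vector
    Wv: (R, dv, d), Wq: (R, dq, d) -- R rank-1 factors projecting each
    modality into a shared dimension d.
    fused = sum over ranks of elementwise products of the projections."""
    fused = np.zeros(Wv.shape[2])
    for r in range(Wv.shape[0]):
        fused += (v @ Wv[r]) * (q @ Wq[r])
    return fused
```

The rank R trades expressiveness of the bilinear interaction against parameter count, which is the core appeal of MUTAN-style fusion over a full bilinear map.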
The scanning electron microscope (SEM) is vital in wafer processing, providing high-resolution surface images for defect analysis. Despite optimizations, images may have noise and edge jitter from electromagnetic interferenc...
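As a hedged illustration of the kind of noise suppression such a pipeline might start with (not necessarily the paper's method, which is truncated here), a simple median filter removes impulsive noise while largely preserving edges:

```python
import numpy as np

def median_filter(img, k=3):
    """Simple k x k median filter, a common first step for suppressing
    impulsive (salt-and-pepper) noise in SEM images. Illustrative only."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")   # replicate borders
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            # median of the k x k neighborhood centered at (i, j)
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out
```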
ISBN: (Print) 9798350349405; 9798350349399
Image coding for machines (ICM) aims to compress images for machine analysis using recognition models rather than human vision. Hence, in ICM, it is important for the encoder to recognize and compress the information necessary for the machine recognition task. There are two main approaches in learned ICM: optimization of the compression model based on task loss, and Region of Interest (ROI) based bit allocation. These approaches provide the encoder with the recognition capability. However, optimization with task loss becomes difficult when the recognition model is deep, and ROI-based methods often involve extra overhead during evaluation. In this study, we propose a novel training method for learned ICM models that applies auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance. Our method achieves Bjontegaard Delta rate improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, compared to the conventional training method.
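The objective described here combines rate, distortion, and an auxiliary recognition loss on the encoder. A minimal numpy sketch of such a combined loss follows; the weights `lam` and `beta`, the function name, and the single-class cross-entropy formulation are all assumptions, not the paper's exact formulation.

```python
import numpy as np

def rd_aux_loss(x, x_hat, bits, logits, label, lam=0.01, beta=0.5):
    """Sketch of a combined ICM training objective:
    rate + lam * distortion + beta * auxiliary recognition loss.
    x, x_hat: original and reconstructed images (arrays)
    bits:     estimated bitstream length (rate term)
    logits:   recognition-head output computed on encoder features
    label:    ground-truth class index for the auxiliary task"""
    distortion = np.mean((x - x_hat) ** 2)               # MSE distortion
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    aux = -log_probs[label]                              # cross-entropy
    return bits + lam * distortion + beta * aux
```

The point of the auxiliary term is that gradients reach the encoder directly, sidestepping the difficulty of backpropagating a task loss through a very deep recognition model.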
An optical non-contact inspection system was developed for measuring the slots in stator lamination stacks. To avoid passing go/no-go gage blocks through the slots, a machine vision system is instead used to measure the stator core slots and identify the presence of burrs within the slots. Utilizing telecentric optics along with an alignment monitoring system configured to monitor and orient the stator core, the core slots can be oriented relative to the imaging axis for further metrology measurements. Among these measurements, the smallest opening dimensions (slot width and depth) of each slot due to misalignment of laminations and the detection of burrs along the edges of the slots throughout the length of the lamination stack are critical for full stator assembly. Advanced image processing algorithms were developed to obtain the sub-pixel accuracy required to measure the slots. This, used in conjunction with a robust vision calibration technique, increases the feasibility of building a device that can be implemented as a production inspection system. Experiments show the reliability of the computer vision approach and how it can be used in the inspection of slots in lamination stacks.
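Sub-pixel edge localization of the kind slot measurement relies on can be sketched by fitting a parabola through the gradient peak of a 1-D intensity profile taken across the slot edge. This is a generic illustration; the abstract does not disclose the authors' actual algorithms.

```python
import numpy as np

def subpixel_edge(profile):
    """Locate an edge in a 1-D intensity profile with sub-pixel accuracy.
    Takes the discrete gradient, finds its peak, then refines the peak
    position by fitting a parabola through the three samples around it."""
    grad = np.abs(np.diff(profile))
    i = int(np.argmax(grad))
    if i == 0 or i == len(grad) - 1:
        return float(i)              # peak at boundary: no refinement
    y0, y1, y2 = grad[i - 1], grad[i], grad[i + 1]
    denom = y0 - 2 * y1 + y2
    if denom == 0:
        return float(i)              # flat top: parabola degenerate
    offset = 0.5 * (y0 - y2) / denom # vertex of the fitted parabola
    return i + offset
```

Applied to row or column profiles across a calibrated telecentric image, offsets like this are what let slot width and depth be measured to a fraction of a pixel.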
ISBN: (Print) 9798350349405; 9798350349399
User-generated content (UGC) is ubiquitous across the internet as a result of billions of videos and images being uploaded each day. All kinds of UGC media are affected by natural distortions, occurring both during and after capture, which are inherently diverse and commingled. These distortions have different perceptual effects based on the media content. Given recent dramatic increases in the consumption of short-form content, the analysis and control of their perceptual quality has become an important problem. Regardless of the content, many UGC videos have overlaid and embedded texts in them, which are visually salient. Hence text quality has a significant impact on the global perception of video or image quality and needs to be studied. One of the most important factors in perceptual text quality in user-generated media is legibility, which has been studied very little in the context of computer vision. Predicting text legibility can also help in text recognition applications such as image search or document identification. This work aims at modeling text legibility using computer vision techniques and thus studying the relationship between text quality and legibility. We propose a modified dataset variant of COCO-Text [1] and a model for predicting text legibility for both handwritten and machine-generated texts. We also demonstrate how models trained to predict text legibility can help in the prediction of text (perceptual) quality. The dataset and models can be accessed here https://***/research/Quality/***.
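As an illustrative baseline only (not the paper's learned model), a hand-crafted legibility proxy for a text patch might combine local contrast and sharpness; both the formula and the function name are assumptions.

```python
import numpy as np

def legibility_score(patch):
    """Crude hand-crafted proxy for text legibility in a grayscale patch:
    Michelson contrast scaled by a bounded sharpness term (gradient
    energy through tanh). Illustrative heuristic, not a trained model."""
    lo, hi = patch.min(), patch.max()
    contrast = (hi - lo) / (hi + lo + 1e-8)       # Michelson contrast
    gy, gx = np.gradient(patch.astype(float))     # spatial gradients
    sharpness = np.mean(gx ** 2 + gy ** 2)        # gradient energy
    return contrast * np.tanh(sharpness)
```

A learned model such as the one the paper proposes would replace this heuristic with features trained against human legibility judgments.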
The automatic development of meaningful, detailed textual descriptions for supplied images is a difficult task in the fields of computer vision and natural language processing. As a result, an AI-powered image caption generator can be incredibly useful for producing captions. In this study, we present a unique method for creating image captions utilizing an attention mechanism that concentrates on pertinent areas of the image while it creates captions. On benchmark datasets, our model, which uses deep neural networks to extract image features and produce captions, obtains state-of-the-art results, confirming the effectiveness of the attention mechanism in improving the quality of the generated captions. We also offer a thorough evaluation of the performance of our approach and discuss potential future directions for enhancing image caption generation.
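The attention step described above can be sketched as additive (Bahdanau-style) attention over image region features: score each region against the decoder's hidden state, normalize the scores, and return the weighted context vector. This is a generic illustration, not the paper's exact architecture; all parameter names are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

def visual_attention(regions, hidden, Wr, Wh, w):
    """Additive attention over image regions for one decoding step.
    regions: (N, d) region feature vectors
    hidden:  (h,)   decoder hidden state
    Wr: (d, a), Wh: (h, a), w: (a,)  -- learned projections (here random).
    Returns the attention-weighted context vector and the weights."""
    scores = np.tanh(regions @ Wr + hidden @ Wh) @ w  # (N,) alignment scores
    alpha = softmax(scores)                           # attention weights
    context = alpha @ regions                         # weighted sum, (d,)
    return context, alpha
```

At each word-generation step the decoder recomputes `alpha`, which is what lets the caption generator "look at" different image regions for different words.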
To improve the recognition accuracy of the embedded visual module and make it suitable for visual tasks in complex scenarios, an image preprocessing method used on the OpenMV is proposed. Aiming at the two main recog...
In this paper, we adopt a deep-learning-based image style transfer technique, using the VGG19 network for content and style feature extraction, combining an image with an artistic design style to realise the generation of Van Gog...
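Style transfer built on VGG19 typically represents style as Gram matrices of feature maps, and the style loss compares those matrices between the generated and style images. The sketch below shows that representation with random arrays standing in for VGG19 activations; it is illustrative, not the paper's code.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-wise correlations
    of activations, the standard style representation in neural style
    transfer. Normalized by spatial size."""
    C, H, W = features.shape
    F = features.reshape(C, H * W)
    return F @ F.T / (H * W)

def style_loss(gen_feats, style_feats):
    """Mean squared difference between the two Gram matrices."""
    return np.mean((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2)
```

In a full pipeline this loss would be summed over several VGG19 layers and combined with a content loss, then minimized by gradient descent on the generated image's pixels.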