ISBN: 9784885523434 (print)
This paper focuses on enhancing the captions generated by image captioning systems. We propose an approach for improving caption generation systems by choosing the output most closely related to the image rather than the most likely output produced by the model. Our model revises the beam search output of the language generator from a visual-context perspective. We employ a visual semantic measure at both the word and sentence level to match the proper caption to the related information in the image. This approach can be applied to any captioning system as a post-processing method.
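The post-processing idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not the authors' method: the function names `cosine` and `rerank`, the toy three-dimensional embeddings, and the linear mix weighted by a hypothetical parameter `alpha` are all invented. The sketch only shows how beam candidates might be re-scored by similarity to a visual-context vector instead of by model likelihood alone.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(candidates, visual_vec, alpha=0.5):
    """Sort beam candidates by a mix of model probability and visual similarity.

    candidates: list of (caption, log_prob, caption_vec) tuples.
    alpha: hypothetical weight; 1.0 keeps the pure likelihood ranking.
    """
    def score(candidate):
        _, log_prob, vec = candidate
        return alpha * math.exp(log_prob) + (1 - alpha) * cosine(vec, visual_vec)
    return sorted(candidates, key=score, reverse=True)

# Toy beam: the first caption is more likely, the second matches the image better.
beam = [
    ("a man riding a wave", -0.2, [0.1, 0.9, 0.0]),
    ("a man on a surfboard in the ocean", -0.6, [0.8, 0.5, 0.1]),
]
visual_context = [0.9, 0.4, 0.1]  # assumed embedding of objects detected in the image

best_by_likelihood = rerank(beam, visual_context, alpha=1.0)[0][0]
best_by_mix = rerank(beam, visual_context, alpha=0.5)[0][0]
```

With `alpha=1.0` the most likely caption stays on top; with `alpha=0.5` the visually closer caption overtakes it, which is the behaviour the abstract describes at a high level.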
ISBN: 9798400716553 (print)
The speckle field is one of the most information-rich light fields. It is related to many physical characteristics and can be used to provide high-resolution surface topography information, or be applied to image reconstruction, image enhancement, and other fields. However, most studies of speckle image recovery consider only a monochromatic wavelength and therefore ignore a large amount of real object information. In this paper, the amplitude information of colored speckle recovery in optical imaging is studied, as might be seen if red, green, and blue lasers illuminate a rough surface with different reflectivity at these three wavelengths. We derive the expression for the color speckle distribution and design an imaging system with a pupil stop, normal plane-wave incidence on the diffuser, and a camera to observe the colored speckle image. The simulation experiment is analyzed in two respects: phase shift and average speckle size. The results show that more characteristic information is recovered from the colored speckle image.
ISBN: 9798350386783; 9798350386776 (print)
Currently, the welding of electrical connectors to multi-core wires relies mainly on manual operation. This traditional method not only consumes a great deal of time and manpower; long-term operation may also impose a physical burden and health hazards on the operator. Automated welding between electrical connectors and multi-core wires has therefore become an urgent problem. After summarizing the current state of research at home and abroad, the software and hardware of the system were designed to meet the requirements of identifying and positioning circular electrical connectors for welding. By introducing image processing and machine vision technology and adopting a dual-machine collaboration approach, automatic wire welding of electrical connectors has been achieved, improving welding efficiency and reducing the labor intensity of operators. The work is also conducive to promoting the development of industrial automation.
Convolutional neural networks (CNNs) have been used for a wide variety of deep learning applications, especially in computer vision. For medical image processing, researchers have identified certain challenges associated with CNNs. These challenges encompass the generation of less informative features, limitations in capturing both high- and low-frequency information within feature maps, and the computational cost incurred when enlarging receptive fields by deepening the network. Transformers have emerged as an approach aiming to address and overcome these specific limitations of CNNs in the context of medical image analysis. Preservation of all spatial details of medical images is necessary to ensure accurate patient diagnosis. Hence, this research introduces a pure vision Transformer (ViT) denoising network for medical image processing, specifically for low-dose computed tomography (LDCT) image denoising. The proposed model follows a U-Net framework that contains ViT modules with the integration of a Noise2Neighbor (N2N) interpolation operation. Five different datasets containing LDCT and normal-dose CT (NDCT) image pairs were used to carry out this experiment. To test the efficacy of the proposed model, the experiment compares quantitative and visual results among CNN-based (BM3D, RED-CNN, DRL-E-MP), hybrid CNN-ViT-based (TED-Net), and the proposed pure ViT-based denoising models. The findings show an increase of about 15-20% in SSIM and PSNR when using self-attention Transformers compared with a typical pure CNN. Visual results also showed improvements, especially in rendering fine structural details of CT images.
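For readers unfamiliar with the PSNR metric quoted above, the following is a minimal sketch of its standard definition, PSNR = 10 log10(MAX^2 / MSE). The function name `psnr`, the flat-list image representation, and the default 8-bit peak value of 255 are assumptions made for illustration; the abstract does not say how the authors computed the metric or which dynamic range they used for CT data.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the clean and the denoised/noisy signal.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 2x2 "images" flattened to lists (assumed 8-bit intensities).
ndct_pixels = [100, 120, 140, 160]      # stands in for a normal-dose reference
denoised_pixels = [102, 118, 141, 157]  # stands in for a denoised low-dose image
value_db = psnr(ndct_pixels, denoised_pixels)  # roughly 41.6 dB
```

A higher value means the denoised image is closer to the reference; SSIM, the other metric mentioned, additionally compares local structure and is not covered by this sketch.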
Machine learning is the state of the art for many recurring tasks in several heterogeneous domains. In the last decade, it has also been widely used in Precision Agriculture (PA) and Wild Flora Monitoring (WFM) to address a set of problems with a large impact on the economy and society, heralding a paradigm shift across industry and academia. Many applications in those fields involve image processing and computer vision stages. Remote sensing devices are a very popular choice for image acquisition in this context, and Unmanned Aerial Vehicles (UAVs) in particular offer a good tradeoff between cost and area coverage. For these reasons, the research literature is rich in works that tackle problems in the Precision Agriculture and Wild Flora Monitoring domains with machine learning and computer vision methods applied to UAV imagery. In this work, we review this literature, with a special focus on algorithms, model sizing, dataset characteristics, and innovative technical solutions presented in many domain-specific models, providing the reader with an overview of the research trend in recent years.
ISBN: 9798350349405; 9798350349399 (print)
User-generated content (UGC) is ubiquitous across the internet as a result of billions of videos and images being uploaded each day. All kinds of UGC media are affected by natural distortions, occurring both during and after capture, which are inherently diverse and commingled. These distortions have different perceptual effects based on the media content. Given recent dramatic increases in the consumption of short-form content, the analysis and control of their perceptual quality has become an important problem. Regardless of the content, many UGC videos have overlaid and embedded texts in them, which are visually salient. Hence text quality has a significant impact on the global perception of video or image quality and needs to be studied. One of the most important factors in perceptual text quality in user-generated media is legibility, which has been studied very little in the context of computer vision. Predicting text legibility can also help in text recognition applications such as image search or document identification. This work aims at modeling text legibility using computer vision techniques and thus studying the relationship between text quality and legibility. We propose a modified dataset variant of COCO-Text [1] and a model for predicting text legibility for both handwritten and machine-generated texts. We also demonstrate how models trained to predict text legibility can help in the prediction of text (perceptual) quality. The dataset and models can be accessed here https://***/research/Quality/***.
In recent years, improved GAN models have been widely applied in the field of machine vision. Their applications cover not only traditional image processing but also image conversion, image synthesis, and so on. First, this paper describes the basic principles and existing problems of GAN, then introduces several improved GAN models, including Info-GAN, DC-GAN, f-GAN, Cat-GAN and others. Second, several improved GAN models for different applications in the field of machine vision are described. Finally, future trends and developments of GAN are discussed.
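As background for the "basic principles" mentioned in this abstract, the standard GAN minimax objective from the original GAN formulation can be written as below. The abstract itself does not state it, so the notation is an assumption: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the data distribution, and $p_z$ the noise prior.

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

The improved variants named in the abstract each modify this setup, for example by changing the divergence being minimized or by adding structure to the latent code.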
We present a new data generation method to facilitate automatic machine interpretation of 2D engineering part drawings. While such drawings are a common medium for clients to encode design and manufacturing requirements, a lack of computer support for interpreting them automatically forces part manufacturers to resort to laborious manual interpretation, which in turn severely limits processing capacity. Although recent advances in trainable computer vision methods may enable automatic machine interpretation, it remains challenging to apply such methods to engineering drawings due to a lack of labeled training data. As one step toward addressing this challenge, we propose a constrained data synthesis method that generates an arbitrarily large set of synthetic training drawings using only a handful of labeled examples. Our method is based on the randomization of the dimension sets subject to two major constraints that ensure the validity of the synthetic drawings. The effectiveness of our method is demonstrated in the context of a binary component segmentation task with a proposed list of descriptors. An evaluation of several image segmentation methods trained on our synthetic dataset shows that our approach to data generation can boost the segmentation accuracy and the generalizability of the machine learning models to unseen drawings.
This paper aims to explore an innovative method combining computer vision and machine learning to accurately identify and analyze various movements in badminton. This paper first summarizes the application prospect of...
Convolutional neural networks (CNNs) have significantly contributed to recent advances in machine learning and computer vision. Although initially designed for image classification, the application of CNNs has stretched far beyond the context of images alone. Some exciting applications, e.g., in natural language processing and image segmentation, implement one-dimensional CNNs, often after a pre-processing step that transforms higher-dimensional input into a suitable data format for the networks. However, local correlations within data can diminish or vanish when one converts higher-dimensional data into a one-dimensional string. The Hilbert space-filling curve can minimize this loss of locality. Here, we study this claim rigorously by comparing an analytical model that quantifies locality preservation with the performance of several neural networks trained with and without Hilbert mappings. We find that Hilbert mappings offer a consistent advantage over the traditional flatten transformation in test accuracy and training speed. The results also depend on the chosen kernel size, agreeing with our analytical model. Our findings quantify the importance of locality preservation when transforming data before training a one-dimensional CNN and show that the Hilbert space-filling curve is a preferential transformation to achieve this goal.
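The difference between a Hilbert mapping and a plain flatten can be made concrete with a short sketch. It uses the widely known iterative index-to-coordinate conversion for the Hilbert curve; the function names `hilbert_d2xy` and `hilbert_flatten`, and the restriction to square grids whose side is a power of two, are assumptions made for illustration and are not taken from the paper.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square before rotating it.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x  # rotate by swapping coordinates
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_flatten(grid):
    """Flatten a square grid (side a power of two) along the Hilbert curve."""
    n = len(grid)
    order = n.bit_length() - 1
    coords = [hilbert_d2xy(order, d) for d in range(n * n)]
    return [grid[y][x] for x, y in coords]

# 4x4 grid whose entries are their row-major indices.
grid = [[row * 4 + col for col in range(4)] for row in range(4)]
row_major = [value for row in grid for value in row]  # the usual flatten
hilbert_order = hilbert_flatten(grid)
```

Neighbouring entries of `hilbert_order` always come from cells that touch in the grid, whereas `row_major` jumps across the full width at every row end. This is the locality property the abstract refers to.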