Lying at the intersection of computer vision and natural language processing, vision language models are capable of processing images and text at once. These models are helpful in various tasks: generating text from images and vice versa, image-text retrieval, or visual navigation. Besides building a model trained on a single dataset for a single task, people also study general-purpose models that utilize many datasets for multiple tasks. Their two primary applications are image captioning and visual question answering. For English, large datasets and foundation models are already abundant. However, for Vietnamese, they are still limited. To expand the language range, this work proposes a pretrained general-purpose image-text model named VisualRoBERTa. A dataset of 600k images with captions (MS COCO 2017 translated from English to Vietnamese) is introduced to pretrain VisualRoBERTa. The model's architecture is built using a Convolutional Neural Network and Transformer blocks. Fine-tuning VisualRoBERTa shows promising results on the ViVQA dataset with 34.49% accuracy, 0.4173 BLEU-4, and 0.4390 ROUGE-L (in the visual question answering task), and the best outcomes on the sViIC dataset with 0.6685 BLEU-4 and 0.6320 ROUGE-L (in the image captioning task).
Privacy is a crucial concern in collaborative machine vision where a part of a Deep Neural Network (DNN) model runs on the edge, and the rest is executed on the cloud. In such applications, the machine vision model does not need the exact visual content to perform its task. Taking advantage of this potential, private information could be removed from the data insofar as it does not significantly impair the accuracy of the machine vision system. In this paper, we present an autoencoder-style network integrated within an object detection pipeline, which generates a latent representation of the input image that preserves task-relevant information while removing private information. Our approach employs an adversarial training strategy that not only removes private information from the bottleneck of the autoencoder but also promotes improved compression efficiency for feature channels coded by conventional codecs like VVC-Intra. We assess the proposed system using a realistic evaluation framework for privacy, directly measuring face and license plate recognition accuracy. Experimental results show that our proposed method is able to reduce the bitrate significantly at the same object detection accuracy compared to coding the input images directly, while keeping the face and license plate recognition accuracy on the images recovered from the bottleneck features low, implying strong privacy protection. Our code is available at https://***/bardia-az/ppa-code.
The specular reflection of objects is an important factor affecting image display quality, which poses challenges to tasks such as pattern recognition and machine vision detection. At present, specular removal for a single real image is a crucial pre-processing step to improve the performance of computer vision algorithms. Despite notable approaches tailored for handling synthesized and pre-simplified images with dark backgrounds, real-time separation of specular reflection for a single real image remains a challenging problem. This paper proposes a novel specular removal method to separate the specular reflection for a single real image accurately and efficiently based on the dark channel prior. Initially, a modified-specular-free (MSF) image is developed using the dark channel prior, which can derive a direct estimation of specular reflection. Next, the image chromaticity spaces are established to represent the pixel intensity. Then, the maximum chromaticity value of the modified MSF image is extracted to guide the filtering of the specular reflection, treating the specular pixels as noise in the chromaticity space. Finally, the image without specular reflection can be obtained using the restored maximum chromaticity value based on the dichromatic reflection model. The advantage of this method is that it achieves high-quality specular reflection separation quickly without destroying the geometric features of the real image. Compared with the state-of-the-art methods, experimental results show that the proposed algorithm can achieve the best subjective visual effect and satisfactory quantitative performance. In addition, this approach can be implemented efficiently to meet real-time requirements, making it promising for computer vision measurement and inspection applications.
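As a rough illustration of the first step described above, the sketch below computes a dark channel (the per-pixel minimum over the color channels, optionally followed by a local minimum filter) and subtracts it from the image. Under the dichromatic reflection model with a white illuminant, the specular term adds equally to all channels, so this subtraction removes it along with part of the diffuse component. This is a generic, simplified construction and not the paper's MSF image; the function names, the `radius` parameter, and the nested-list image format are assumptions made for the example.

```python
def dark_channel(image, radius=0):
    """Per-pixel minimum over color channels, then a minimum over a
    (2*radius+1) x (2*radius+1) window. `image` is a list of rows of
    (R, G, B) tuples (an illustrative format, not the paper's)."""
    rows, cols = len(image), len(image[0])
    channel_min = [[min(px) for px in row] for row in image]
    if radius == 0:
        return channel_min
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                channel_min[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = min(window)
    return out


def subtract_dark_channel(image, dark):
    """Subtract the dark channel from every color channel of each pixel,
    giving a simple specular-free-style image."""
    return [
        [tuple(v - dark[r][c] for v in px) for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]


# One diffuse red pixel and one near-white (highlight-like) pixel.
sample = [[(200, 50, 50), (255, 240, 240)]]
dark = dark_channel(sample)
specular_free = subtract_dark_channel(sample, dark)
```

With `radius=0` the highlight-like pixel loses most of its intensity while the hue of both pixels is preserved, which is why such an image can serve as a first estimate of where specular reflection occurs.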
Embedded computer vision systems are increasingly being adopted across various domains, playing a pivotal role in enabling advanced technologies such as autonomous vehicles and industrial automation. Their cost-effectiveness, compact size, and portability make them particularly well-suited for diverse implementations and operations. In real-time scenarios, these systems must process visual data with minimal latency, which is crucial for immediate decision-making. However, these solutions continue to face significant challenges related to computational efficiency, memory usage, and accuracy. This research addresses these challenges by enhancing classification methodologies, specifically in Gray Level Co-occurrence Matrix (GLCM) feature extraction and Support Vector Machine (SVM) classifiers. To maintain a high level of accuracy while preserving performance, a smaller feature set is selected following a comprehensive complexity analysis and is further refined through Correlation-based Feature Selection (CFS). The proposed method achieves an overall classification accuracy of 84.76% with a feature set reduced by 79.2%, resulting in a 72.45% decrease in processing time, a 50% reduction in storage requirements, and up to a 77.8% decrease in memory demand during prediction. These improvements demonstrate the effectiveness of the proposed approach in improving the adaptability and capabilities of embedded vision systems (EVS), optimizing their performance under the constraints of real-time limited-resource environments.
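To make the GLCM step concrete, here is a minimal sketch of how a normalized co-occurrence matrix and one texture feature (contrast) can be computed for a single pixel offset. It is a textbook-style illustration, not the paper's reduced feature set or its embedded implementation; the function names, the offset convention, and the 4-level test image are assumptions chosen for the example.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for offset (dx, dy).
    `image` is a list of rows of integer gray levels in [0, levels)."""
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """GLCM contrast: sum over (i, j) of p[i][j] * (i - j)^2."""
    return sum(
        p[i][j] * (i - j) ** 2
        for i in range(len(p))
        for j in range(len(p))
    )


# Small 4-level image; horizontal neighbor pairs (dx=1, dy=0).
tiny = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(tiny, levels=4)
tiny_contrast = contrast(p)
```

Each additional offset or feature repeats this pass over the image, which is why pruning the feature set, as the abstract describes, directly reduces processing time and memory on an embedded target.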
ISBN: (print) 9781510673151; 9781510673144
Extending the depth of field (DOF) of imaging optics is a longstanding challenge in machine vision, microscopy, photography and cinematography. This paper presents a method to extend the DOF of camera lenses up to 5 times by using foto-foXXus, multi-focus quasi-afocal optics. The foto-foXXus devices are implemented as achromatic aplanatic optical systems installed in front of camera lenses in such a way that the combined optical system simultaneously has several focuses separated along the optical axis. When applied for imaging a scene, such a combined optical system forms along the optical axis several images of each object of the extended DOF. The inevitable decrease in contrast of the common image, resulting from defocusing of some images from the plane of the camera sensor (or film), can be compensated using specific algorithms in the image processing stage, which is nowadays an obligatory part of image capture in machine vision or microscopy. This method is very effective in capturing black-and-white objects, such as QR codes, or in computer vision-based robotic arms for detecting the shape and size of objects. Direct measurements of the modulation transfer function (MTF) and through-focus MTF curves for a system consisting of a foto-foXXus and a state-of-the-art machine vision objective confirm the increase in depth of focus of the combined optical system and, consequently, depth of field in the object space. The paper presents a description of the foto-foXXus devices, measurement data of MTF and through-focus MTF curves obtained using the MTF test bench, as well as examples of imaging real objects demonstrating effective extension of the depth of field.
X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images, and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.
The computer vision-based analysis of railway superstructure has gained significant attention in railway engineering. This approach utilises advanced image processing and machine learning techniques to extract valuable information from visual data captured in the railway track environment. By analysing images from various sources such as cameras, drones, or sensors, computer vision algorithms can accurately detect and classify different components of the ballast superstructure, including the catenary system support, rail surface and profile, fastening system, sleeper, and ballast layer. This enables the automated assessment of the railway track's condition, stability, and maintenance needs. This paper comprehensively reviews the recent advancements, challenges, and potential applications of computer vision techniques in analysing railway superstructure. It discusses various vision-based methodologies and machine-learning approaches utilised in this context. Furthermore, it examines the benefits and limitations of computer vision-based analysis and presents future research directions for improving its applicability in railway track engineering.
In modern industrial production, high-temperature environments are commonplace, posing significant challenges to equipment stability, safety, and production efficiency. Machine vision, as an effective automated inspection technology, has attracted extensive attention in high-temperature settings. However, the unique conditions of high temperatures, such as significant thermal noise and optical interference, demand enhanced performance from machine vision systems. The second law of thermodynamics provides a theoretical foundation for understanding these challenges, emphasizing the increase of entropy in energy transformation and transfer processes, and guides the design and optimization of machine vision systems in high-temperature environments. This paper aims to comprehensively explore the application of machine vision based on the second law of thermodynamics in high-temperature industrial inspection, focusing on two core issues: the impact of thermodynamic parameters on the performance of machine vision systems and the technology for analyzing high-temperature industrial infrared images using multiscale entropy. By thoroughly analyzing how thermodynamic parameters influence the design and implementation of machine vision systems, and by developing infrared image processing algorithms adapted to high temperatures, this study seeks to enhance the efficiency and accuracy of machine vision technology in high-temperature industrial applications, providing theoretical support and technical guidance for the advancement of intelligent manufacturing.
The absence of standardized evaluation methodologies for single-layer dimensional accuracy significantly hinders the broader implementation of direct ink writing (DIW) technology. Addressing the critical need for precision non-contact assessment in DIW fabrication, this study develops a novel machine vision-based framework for dimensional accuracy evaluation. The methodology encompasses three key phases: (1) establishment of an optimized hardware configuration with integrated image processing algorithms; (2) comprehensive investigation of camera calibration protocols, advanced image preprocessing techniques, and high-precision contour extraction methods; and (3) development of an iterative closest point (ICP) algorithm-enhanced evaluation system. The experimental results demonstrate that our machine vision system achieves 0.04 mm × 0.04 mm spatial resolution with the ICP convergence threshold optimized to 0.001 mm. The proposed method shows an 80% improvement in measurement accuracy (0.001 mm) compared to conventional approaches. Process parameter optimization experiments validated the system's effectiveness, showing at least 76.3% enhancement in printed layer dimensional accuracy. This non-contact evaluation solution establishes a robust framework for quantitative quality control in DIW applications, providing critical insights for process optimization and standardization efforts in additive manufacturing.
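The ICP idea mentioned above can be sketched in a few lines. The version below is deliberately reduced to a translation-only alignment of 2D contour points (no rotation, brute-force nearest-neighbor search), so it illustrates the iterate-match-update loop and the role of a convergence threshold rather than reproducing the paper's evaluation system. The function name, the `tol=0.001` default (borrowed from the threshold quoted in the abstract, with its exact use assumed), and the sample contours are hypothetical.

```python
import math


def icp_translation(source, target, tol=0.001, max_iter=50):
    """Translation-only ICP: repeatedly match each shifted source point to
    its nearest target point and move by the mean offset, stopping when the
    update is smaller than `tol` (same length unit as the points)."""
    tx, ty = 0.0, 0.0
    for _ in range(max_iter):
        dx_sum, dy_sum = 0.0, 0.0
        for sx, sy in source:
            px, py = sx + tx, sy + ty
            # Brute-force nearest neighbor in the target contour.
            qx, qy = min(
                target, key=lambda q: (q[0] - px) ** 2 + (q[1] - py) ** 2
            )
            dx_sum += qx - px
            dy_sum += qy - py
        dx, dy = dx_sum / len(source), dy_sum / len(source)
        tx, ty = tx + dx, ty + dy
        if math.hypot(dx, dy) < tol:
            break
    return tx, ty


# Nominal rectangle corners (mm) and a measured contour offset by (0.3, -0.2).
nominal = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
measured = [(0.3, -0.2), (4.3, -0.2), (4.3, 1.8), (0.3, 1.8)]
shift = icp_translation(nominal, measured)
```

Once the contours are aligned, the remaining point-to-point distances can be read as dimensional deviations of the printed layer; a full implementation would also estimate rotation and use a spatial index for the nearest-neighbor search.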
Producers need to strictly control the quality of their products when facing customer needs, ensuring the qualification rate of the products. The level of product design is not only related to the abilities of designe...