Search Results - Inner Mongolia University Library

A Novel Pretrained General-purpose Vision Language Model for the Vietnamese Language

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, Vol. 23, No. 5, pp. 1-16

Authors: Dinh Anh Vu Quang Nhat Minh Pham Giang Son Tran Univ Sci & Technol Hanoi Hanoi Vietnam Aimesoft JSC R&D Hanoi Vietnam Univ Sci & Technol Hanoi ICTLab Vietnam Acad Sci & Technol 18 Hoang Quoc Viet Hanoi Vietnam

Lying in the cross-section of computer vision and natural language processing, vision language models are capable of processing images and text at once. These models are helpful in various tasks: text generation from image and vice versa, image-text retrieval, or visual navigation. Besides building a model trained on a dataset for a task, people also study general-purpose models to utilize many datasets for multitasks. Their two primary applications are image captioning and visual question answering. For English, large datasets and foundation models are already abundant. However, for Vietnamese, they are still limited. To expand the language range, this work proposes a pretrained general-purpose image-text model named VisualRoBERTa. A dataset of 600k images with captions (translated MS COCO 2017 from English to Vietnamese) is introduced to pretrain VisualRoBERTa. The model's architecture is built using Convolutional Neural Network and Transformer blocks. Fine-tuning VisualRoBERTa shows promising results on the ViVQA dataset with 34.49% accuracy, 0.4173 BLEU 4, and 0.4390 RougeL (in visual question answering task), and best outcomes on the sViIC dataset with 0.6685 BLEU 4, 0.6320 RougeL (in image captioning task).

Keywords: Computer vision natural language processing visual linguistic image text pretrain Vietnamese foundation multi-modal machine learning


Privacy-Preserving Autoencoder for Collaborative Object Detection

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, Vol. 33, pp. 4937-4951

Authors: Azizian, Bardia Bajic, Ivan V. Simon Fraser Univ Sch Engn Sci Burnaby BC V5A 1S6 Canada

Privacy is a crucial concern in collaborative machine vision where a part of a Deep Neural Network (DNN) model runs on the edge, and the rest is executed on the cloud. In such applications, the machine vision model does not need the exact visual content to perform its task. Taking advantage of this potential, private information could be removed from the data insofar as it does not significantly impair the accuracy of the machine vision system. In this paper, we present an autoencoder-style network integrated within an object detection pipeline, which generates a latent representation of the input image that preserves task-relevant information while removing private information. Our approach employs an adversarial training strategy that not only removes private information from the bottleneck of the autoencoder but also promotes improved compression efficiency for feature channels coded by conventional codecs like VVC-Intra. We assess the proposed system using a realistic evaluation framework for privacy, directly measuring face and license plate recognition accuracy. Experimental results show that our proposed method is able to reduce the bitrate significantly at the same object detection accuracy compared to coding the input images directly, while keeping the face and license plate recognition accuracy on the images recovered from the bottleneck features low, implying strong privacy protection. Our code is available at https://***/bardia-az/ppa-code.

Keywords: image coding Data privacy Training Privacy Codecs Visualization machine vision Deep neural network coding for machines privacy model inversion attack collaborative intelligence adversarial training feature compression


A fast specular removal method for a single real image

DISPLAYS, 2025, Vol. 87

Authors: Hao, Chuanpeng He, Yan Li, Yufeng Niu, Xiaobo Wang, Yan Chongqing Univ State Key Lab Mech Transmiss Adv Equipment Chongqing 400030 Peoples R China Univ Brighton Sch Comp Engn & Math Brighton BN2 4GJ England

The specular reflection of objects is an important factor affecting image display quality, which poses challenges to tasks such as pattern recognition and machine vision detection. At present, specular removal for a single real image is a crucial pre-processing step to improve the performance of computer vision algorithms. Despite notable approaches tailored for handling synthesized and pre-simplified images with dark backgrounds, real-time separation of specular reflection for a single real image remains a challenging problem. This paper proposes a novel specular removal method to separate the specular reflection for a single real image accurately and efficiently based on the dark channel prior. Initially, a modified-specular-free (MSF) image is developed using the dark channel prior, which can derive a direct estimation of specular reflection. Next, the image chromaticity spaces are established to represent the pixel intensity. Then, the maximum chromaticity value of the modified MSF image is extracted to guide the filtering of the specular reflection, treating the specular pixels as noise in the chromaticity space. Finally, the image without specular reflection can be obtained using the restored maximum chromaticity value based on the dichromatic reflection model. The superiority of this method is to achieve high-quality specular reflection separation quickly without destroying the geometric features of the real image. Compared with the state-of-the-art methods, experimental results show that the proposed algorithm can achieve the best subjective visual effect and satisfactory quantitative performance. In addition, this approach can be implemented efficiently to meet real-time requirements, promising to be applied to computer vision measurement and inspection applications.

Keywords: Specular removal Highlight Dark channel machine vision image restoration
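
The abstract above relies on the dichromatic reflection model, in which (assuming a white illuminant) a specular highlight adds roughly the same amount to all three colour channels. The sketch below is not the authors' modified-specular-free (MSF) method. It is a minimal illustration, under that assumption, of the simpler classic idea of subtracting the per-pixel dark channel (the minimum of R, G, B). The function names and the toy pixel values are hypothetical.

```python
def dark_channel(image):
    """Return the per-pixel minimum over the (R, G, B) channels.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    """
    return [[min(pixel) for pixel in row] for row in image]


def specular_free(image):
    """Subtract the dark channel from every channel of every pixel.

    Under the white-illuminant assumption, a highlight that adds the same
    value to R, G and B is cancelled by this subtraction. The diffuse colour
    is shifted as well, so this is only a rough specular-free estimate.
    """
    dark = dark_channel(image)
    return [
        [tuple(c - d for c in pixel) for pixel, d in zip(row, dark_row)]
        for row, dark_row in zip(image, dark)
    ]


# Toy example (hypothetical values): one diffuse pixel, then the same pixel
# with a uniform highlight of 50 added to every channel.
diffuse = [[(100, 20, 0)]]
highlighted = [[(150, 70, 50)]]

# Both inputs map to the same specular-free pixel.
same_result = specular_free(diffuse) == specular_free(highlighted)
```

Because the subtraction also changes the diffuse colour, methods such as the one summarised above add further steps (chromaticity estimation and filtering) to restore it.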


Enhanced Classification System for Real-Time Embedded Vision Applications

IEEE ACCESS, 2024, Vol. 12, pp. 162311-162326

Authors: Khelifi, Ramzi Nini, Brahim Berkane, Mohamed Univ Oum El Bouaghi Res Lab Comp Sci Complex Syst ReLa CS 2 Oum El Bouaghi 04000 Algeria Univ Oum El Bouaghi Artificial Intelligence & Autonomous Things Lab Oum El Bouaghi 04000 Algeria

Embedded computer vision systems are increasingly being adopted across various domains, playing a pivotal role in enabling advanced technologies such as autonomous vehicles and industrial automation. Their cost-effectiveness, compact size, and portability make them particularly well-suited for diverse implementations and operations. In real-time scenarios, these systems must process visual data with minimal latency, which is crucial for immediate decision-making. However, these solutions continue to face significant challenges related to computational efficiency, memory usage, and accuracy. This research addresses these challenges by enhancing classification methodologies, specifically in Gray Level Co-occurrence Matrix (GLCM) feature extraction and Support Vector Machine (SVM) classifiers. To maintain a high level of accuracy while preserving performance, a smaller feature set is selected following a comprehensive complexity analysis and is further refined through Correlation-based Feature Selection (CFS). The proposed method achieves an overall classification accuracy of 84.76% with a feature set reduced by 79.2%, resulting in a 72.45% decrease in processing time, a 50% reduction in storage requirements, and up to a 77.8% decrease in memory demand during prediction. These improvements demonstrate the effectiveness of the proposed approach in improving the adaptability and capabilities of embedded vision systems (EVS), optimizing their performance under the constraints of real-time limited-resource environments.

Keywords: Accuracy Support vector machines Real-time systems Feature extraction Memory management Computer vision Surveillance Bandwidth Wildlife machine learning image processing Embedded computer vision limited resource systems machine learning pattern classification real-time image processing
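
For readers unfamiliar with the Gray Level Co-occurrence Matrix named in the abstract above, here is a minimal sketch of how a GLCM and two common texture features (contrast and energy) can be computed. It is an illustration only, not the authors' pipeline: the function names, the single horizontal offset, and the 4-level toy image are assumptions, and the SVM and CFS stages are not shown.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised gray level co-occurrence matrix for one pixel offset.

    `image` is a list of rows of integer gray levels in range(levels).
    Entry [i][j] is the fraction of pixel pairs where a pixel of level i
    has a neighbour of level j at offset (dx, dy).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def contrast(p):
    """Sum of p[i][j] * (i - j)^2: large when neighbouring levels differ."""
    return sum(v * (i - j) ** 2
               for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    """Sum of squared entries (angular second moment): large for uniform texture."""
    return sum(v * v for row in p for v in row)


# Hypothetical 4x4 image with four gray levels (0..3).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy, levels=4)       # horizontal neighbour, 12 pixel pairs
features = [contrast(p), energy(p)]
```

Feature values like these would typically be collected into a vector and passed to a classifier such as an SVM. Reducing the number of offsets and features computed is one way the per-image cost can be lowered.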


Multifocus Camera Optics with 5x Extending the Depth of Field

Conference on Optics, Photonics, and Digital Technologies for Imaging Applications VIII

Authors: Laskin, Alexander Laskin, Vadim Ostrun, Aleksei AdlOpt GmbH Rudower Chaussee 29 D-12489 Berlin Germany St Petersburg Natl Res Univ Informat Technol Mech Kronverkskiy Pr 49 St Petersburg 197101 Russia

ISBN: (print) 9781510673151; 9781510673144

Extending the depth of field (DOF) of imaging optics is a longstanding challenge in machine vision, microscopy, photography and cinematography. This paper presents a method to extend DOF of camera lenses up to 5 times by using foto-foXXus - multi-focus quasi afocal optics. The foto-foXXus devices are implemented as achromatic aplanatic optical systems installed in front of camera lenses in such a way that the combined optical system has simultaneously several focuses separated along the optical axis. When applied for imaging a scene, such a combined optical system forms along the optical axis several images of each object of the extended DOF. The inevitable decrease in contrast of the common image, resulting from defocusing of some images from the plane of camera sensor (or film), can be enhanced using specific algorithms in the stage of image processing, which is nowadays an obligatory part of image capture in machine vision or microscopy. This method is very effective in capturing black-and-white objects, such as QR-codes, or in computer vision-based robotic arms for detecting the shape and size of objects. Direct measurements of the modulation transfer function (MTF) and through-focus MTF curves for a system consisting of a foto-foXXus and a state-of-the-art machine vision objective confirm the increase in depth of focus of the combined optical system and, consequently, depth of field in the object space. The paper presents description of the foto-foXXus devices, measurements data of MTF and through-focus MTF-curves using the MTF test bench, as well as examples of imaging real objects demonstrating effective extending depth of field.

Keywords: extended depth of field DOF imaging camera optics machine vision microscopy industrial inspection photography cinematography


Computer Vision on X-Ray Data in Industrial Production and Security Applications: A Comprehensive Survey

IEEE ACCESS, 2023, Vol. 11, pp. 2445-2477

Authors: Rafiei, Mehdi Raitoharju, Jenni Iosifidis, Alexandros Aarhus Univ DIGIT Dept Elect & Comp Engn Aarhus Denmark Univ Jyvaskyla Fac Informat Technol Jyvaskyla 40100 Finland

X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.

Keywords: X-ray imaging Security Computer vision Imaging Industrial engineering Three-dimensional displays Deep learning deep learning X-ray industrial applications security applications


Vision-based monitoring of railway superstructure: A review

CONSTRUCTION AND BUILDING MATERIALS, 2024, Vol. 442

Authors: Aela, Peyman Cai, Jiafu Jing, Guoqing Chi, Hung-Lin Hong Kong Polytech Univ Dept Bldg & Real Estate Hung Hom Hong Kong Peoples R China Beijing Jiaotong Univ Sch Civil Engn Beijing 100044 Peoples R China

The computer vision-based analysis of railway superstructure has gained significant attention in railway engineering. This approach utilises advanced image processing and machine learning techniques to extract valuable information from visual data captured in the railway track environment. By analysing images from various sources such as cameras, drones, or sensors, computer vision algorithms can accurately detect and classify different components of the ballast superstructure, including the catenary system support, rail surface and profile, fastening system, sleeper, and ballast layer. This enables the automated assessment of the railway track's condition, stability, and maintenance needs. This paper comprehensively reviews the recent advancements, challenges, and potential applications of computer vision techniques in analysing railway superstructure. It discusses various vision-based methodologies and machine-learning approaches utilised in this context. Furthermore, it examines the benefits and limitations of computer vision-based analysis and presents future research directions for improving its applicability in railway track engineering.

Keywords: Railway superstructure Track inspection Computer vision machine learning Robotics


Application of Machine Vision Based on the Second Law of Thermodynamics in High-Temperature Industrial Inspection

INTERNATIONAL JOURNAL OF HEAT AND TECHNOLOGY, 2024, Vol. 42, No. 2, pp. 697-706

Authors: Yuan, Na Ma, Zhuang Gong, Cheng Hou, Xihuan Ji, Zhanlin Tangshan Univ Intelligence & Informat Engn Coll Tangshan 063000 Peoples R China North China Univ Sci & Technol Dept Artificial Intelligence Tangshan 063009 Peoples R China

In modern industrial production, high-temperature environments are commonplace, posing significant challenges to equipment stability, safety, and production efficiency. Machine vision, as an effective automated inspection technology, has attracted extensive attention in high-temperature settings. However, the unique conditions of high temperatures, such as significant thermal noise and optical interference, demand enhanced performance from machine vision systems. The second law of thermodynamics provides a theoretical foundation for understanding these challenges, emphasizing the increase of entropy in energy transformation and transfer processes, and guides the design and optimization of machine vision systems in high-temperature environments. This paper aims to comprehensively explore the application of machine vision based on the second law of thermodynamics in high-temperature industrial inspection, focusing on two core issues: the impact of thermodynamic parameters on the performance of machine vision systems and the technology for analyzing high-temperature industrial infrared images using multiscale entropy. By thoroughly analyzing how thermodynamic parameters influence the design and implementation of machine vision systems, and by developing infrared image processing algorithms adapted to high temperatures, this study seeks to enhance the efficiency and accuracy of machine vision technology in high-temperature industrial applications, providing theoretical support and technical guidance for the advancement of intelligent manufacturing.

Keywords: machine vision high-temperature industrial inspection entropy infrared image analysis second law of thermodynamics multi-scale entropy


Dimensional Accuracy Evaluation of Single-Layer Prints in Direct Ink Writing Based on Machine Vision

SENSORS, 2025, Vol. 25, No. 8, p. 2543

Authors: Tu, Yongqiang Zhang, Haoran Chen, Hu Bao, Baohua Fang, Canmi Wu, Hao Chen, Xinkai Hassan, Alaa Boudaoud, Hakim Jimei Univ Coll Marine Equipment & Mech Engn Xiamen 361021 Peoples R China Univ Lorraine Innovat Proc Res Inst F-54000 Nancy France

The absence of standardized evaluation methodologies for single-layer dimensional accuracy significantly hinders the broader implementation of direct ink writing (DIW) technology. Addressing the critical need for precision non-contact assessment in DIW fabrication, this study develops a novel machine vision-based framework for dimensional accuracy evaluation. The methodology encompasses three key phases: (1) establishment of an optimized hardware configuration with integrated image processing algorithms; (2) comprehensive investigation of camera calibration protocols, advanced image preprocessing techniques, and high-precision contour extraction methods; and (3) development of an iterative closest point (ICP) algorithm-enhanced evaluation system. The experimental results demonstrate that our machine vision system achieves 0.04 mm × 0.04 mm spatial resolution with the ICP convergence threshold optimized to 0.001 mm. The proposed method shows an 80% improvement in measurement accuracy (0.001 mm) compared to conventional approaches. Process parameter optimization experiments validated the system's effectiveness, showing at least 76.3% enhancement in printed layer dimensional accuracy. This non-contact evaluation solution establishes a robust framework for quantitative quality control in DIW applications, providing critical insights for process optimization and standardization efforts in additive manufacturing.

Keywords: machine vision dimensional accuracy evaluation single layer direct ink writing


Product Design Defect Detection and Automatic Repair Algorithm Based on CAD and Machine Vision

Computer-Aided Design and Applications, 2024, Vol. 21, No. S15, pp. 276-289

Authors: Zhu, Zonghua Xiong, Limei School of Arts Jingchu University of Technology Hubei Jingmen 448000 China

Producers need to strictly control the quality of their products when facing customer needs, ensuring the qualification rate of the products. The level of product design is not only related to the abilities of designers themselves, but also the role of design tools in product design cannot be underestimated. This article adopts a defect detection algorithm that combines computer-aided design (CAD) and machine vision and learns and identifies different types of product defects by training deep learning models. This algorithm utilizes the geometric information of products in CAD models and image processing methods in machine vision technology to detect and repair surface defects in product design automatically. The results show that it is significantly superior to the wavelet transform method in terms of detection accuracy, stability, and detail capture ability. Input the product design image to be tested into the model, and the model will output information on the type and severity of defects at each location. By combining deep learning and machine vision technology, this algorithm can more accurately detect and locate various types of product defects. © 2024 U-turn Press LLC.

Keywords: Product design
