Images are a vital part of everyday life, and image processing is at the heart of modern technologies, including machine vision, artificial intelligence, robotics, and deep learning. It would not be wrong to say that image processing is one of the many reasons for success in any industrial domain, whether medical, food, textile, or any other automation industry. It is next to impossible to work in these domains without sufficient knowledge of, and skill in, image processing techniques. This thesis demonstrates the significance of image processing in three diverse projects, each described in a separate chapter. The first project focuses on reducing power consumption in OLED-based devices. It has two main goals: the first, as the name suggests, is to minimize the power consumed by an OLED device to display images, and the second is to simultaneously enhance the color contrast in those images. OLED display panels have become increasingly popular in recent years, thanks to their numerous advantages over traditional LCD displays. Power consumption in OLED displays depends on the displayed content, whereas in LCD displays the backlight is responsible for power consumption. This image-dependent (content-dependent) power consumption model of OLED displays has encouraged numerous researchers to explore ways of reducing power consumption in OLED-based devices. One such possibility is explored in this Ph.D. research work. Another industrial application is presented in the second part of the thesis. It is part of the "Food Digital Monitoring" project, funded by Regione Piemonte. The main aim of this project is to distinguish healthy from contaminated hazelnuts using fluorescence and spectral imaging techniques. Two types of contamination are discussed in this work: one, caused by bacterial and fungal infections, called "rot
This study presents a vision-based closed-loop tracking system designed specifically for robotic laser beam welding of curved and closed square butt joints. The proposed system is compared against 11 existing solutions reported in the literature, which employ various sensor principles for the same application. The system employs a non-contact, non-intrusive machine vision approach, seamlessly integrated into the laser beam welding head to mitigate challenges associated with sensor forerun. Key features include off-axis LED illumination, an optical filter, and a movable actuator, facilitating real-time image processing and closed-loop control during the welding process. Experimental validation was conducted on stainless-steel plates with complex closed square butt joints. The system achieved a mean absolute joint-to-beam offset of 0.14 mm across four test cases, with a maximum offset of 0.85 mm, demonstrating its robustness and precision. Comparative analysis underscores the proposed method's advantages, showcasing its potential for industrial applications in laser beam welding of geometrically challenging joints.
Human skin classification is an essential task for several machine vision applications such as human-machine interfaces, people/object tracking, and classification. In this paper, we describe a hybrid CMOS/memristor vision sensor architecture embedding skin detection over a wide dynamic range. In-sensor RGB to rg-chromaticity color space conversion is executed on the fly through a pixel-level automatic exposure time control. Each pixel of the array delivers two pre-filtered analog signals, the r and g values, suitable for being efficiently classified as skin or non-skin through an analog memristive neural network (NN), without the need for any further signal processing. Moreover, we study the NN performance and theorize how it should be implemented in hardware. The skin classifier is organized as an array of column-level memristor-based NNs to exploit the nano-scale device characteristics and non-volatile analog memory capabilities, making the proposed sensor architecture highly flexible, customizable for various use-case scenarios, and low-power. The output is a skin bitmap that is robust against variations of the illuminant color and intensity. © 2024 Optica Publishing Group
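The rg-chromaticity conversion mentioned above can be illustrated in software. The following is a minimal sketch, assuming the standard definition r = R/(R+G+B), g = G/(R+G+B). The function name `rg_chromaticity` and the choice to map black pixels to the neutral point (1/3, 1/3) are assumptions made for illustration; the sketch does not model the sensor's pixel-level exposure control or the memristive classifier.

```python
import math


def rg_chromaticity(red, green, blue):
    """Return the (r, g) chromaticity of an RGB triple.

    Dividing by the channel sum removes overall intensity, so only
    the colour proportions remain.
    """
    total = red + green + blue
    if total == 0:
        # Assumption: a black pixel has no defined chromaticity,
        # so it is mapped to the neutral point.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (red / total, green / total)


# The same surface under dim and bright light (intensity doubled).
dim = rg_chromaticity(120, 80, 60)
bright = rg_chromaticity(240, 160, 120)

# Both exposures land on the same chromaticity point.
assert math.isclose(dim[0], bright[0]) and math.isclose(dim[1], bright[1])
print(dim, bright)
```

Because intensity is divided out, a classifier operating on (r, g) sees the same input for both exposures, which is consistent with the robustness to illuminant intensity that the abstract reports.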
Contemporary industry has witnessed a significant transformation with the integration of artificial intelligence (AI) into various industrial systems, resulting in enhanced automation for heightened productivity and efficiency. However, mastering this level of automation can be challenging for some applications, such as manufacturing inspection, which can be delicate while maintaining a precise cadence at in-line manufacturing scale. In this paper, a systematic machine vision-based approach for on-machine inspection is proposed in order to automate and improve the inspection process for computer numerical control (CNC) machined parts. The approach incorporates a remapping algorithm and image processing operations to accurately extract the desired features. Subsequently, these features undergo dimensional inspection based on their generated point clouds. Tests were carried out on a sample part using a complementary metal-oxide-semiconductor (CMOS) camera mounted on the spindle of a 5-axis CNC machining center. The paper explores numerous aspects of the different stages of the approach and their impact on the resulting evaluations of the inspected features. It also highlights significant findings regarding critical factors for conducting well-structured experiments at various stages. Promising results show the significance of the presented work for industrial automation technology, ultimately improving manufacturing efficiency throughout the production line.
Conventional photography can only provide a two-dimensional image of the scene, whereas emerging imaging modalities such as light field enable the representation of higher dimensional visual information by capturing light rays from different directions. Light fields provide immersive experiences, a sense of presence in the scene, and can enhance different vision tasks. Hence, research into light field processing methods has become increasingly popular. It does, however, come at the cost of higher data volume and computational complexity. With the growing deployment of machine-learning and deep architectures in image processing applications, a paradigm shift toward learning-based approaches has also been observed in the design of light field processing methods. Various learning-based approaches are developed to process the high volume of light field data efficiently for different vision tasks while improving performance. Taking into account the diversity of light field vision tasks and the deployed learning-based frameworks, it is necessary to survey the scattered learning-based works in the domain to gain insight into the current trends and challenges. This paper aims to review the existing learning-based solutions for light field imaging and to summarize the most promising frameworks. Moreover, evaluation methods and available light field datasets are highlighted. Lastly, the review concludes with a brief outlook for future research directions.
The article focuses on the concepts of Cell Image Segmentation (CIS) and the gradual introduction of cell counting. This investigation is motivated by the rapid development of machine learning (ML) methods. ML is evolving from theory to practical applications, with deep neural network models extensively used in academia and business for various applications, including image counting and natural language processing. These advancements can greatly influence medical imaging technologies, data processing, diagnostics, and healthcare in general. The main objectives of the research are to provide an overview of biological cell counting methods in microscopic images and to explore deep learning (DL)-based image segmentation approaches. The study describes current trends, cutting-edge learning technologies, and platforms used for DL approaches. Cell counting is one of the most researched and challenging subjects in computer vision systems. Academics are increasingly interested in this area due to its real-time applications in biology, biochemistry, medical diagnostics, computer vision-based cell tracking systems for large populations, and stem cell manufacturing. Counting cells is beneficial in the biological field. For instance, the ratio of white blood cells to cancer cells in the blood can help determine the origin of a disease. Biologists also need to count cells within cell cultures to monitor the time-dependent growth of cells during bacterial experiments. Numerous methods for cell counting have been developed. After addressing the challenges of cell counting algorithms, the article explores promising future directions in the CIS and cell counting research fields.
Soil erosion, primarily driven by water and wind, poses a significant environmental challenge globally, leading to land degradation and geo-hazards. Despite various empirical methods, image analysis, and machine learning techniques employed to address this issue, effective mitigation tools remain lacking. This study presents an innovative framework integrating image processing (IP) and machine learning (ML) to enhance the understanding, quantification, and prediction of soil erosion processes. Laboratory flume experiments were conducted to capture erosion images, which were pre-processed using techniques such as Contrast Limited Adaptive Histogram Equalization (CLAHE) to improve image quality. Supervised ML models, including Logistic Regression (LR), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), were applied to classify eroded and non-eroded soil areas. The models' performance was rigorously evaluated using metrics such as precision, recall, and F1-score. The results demonstrated that KNN and RF outperformed other models in predicting soil erosion, with KNN exhibiting the least variation (2.39%) compared to the reference erosion profile. This study underscores the potential of an IP and ML ensemble framework for precise soil erosion quantification and prediction, offering practical applications for erosion mitigation. The open-source code and dataset are available at https://***/mlgeotech/***.
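The evaluation metrics named above (precision, recall, and F1-score) can be computed directly from predicted and reference labels. The following is a minimal sketch using only the Python standard library; the function name `binary_metrics`, the label encoding (1 for eroded, 0 for non-eroded), and the sample labels are hypothetical and are not taken from the study's dataset or code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical per-pixel labels: 1 = eroded, 0 = non-eroded.
reference = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]

precision, recall, f1 = binary_metrics(reference, predicted)
print(precision, recall, f1)  # 3 TP, 1 FP, 1 FN -> 0.75 each
```

In practice a library routine would normally be used for this; the explicit version is shown only to make the definitions concrete.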
Lying at the intersection of computer vision and natural language processing, vision language models are capable of processing images and text at once. These models are helpful in various tasks: text generation from images and vice versa, image-text retrieval, and visual navigation. Besides building a model trained on one dataset for one task, researchers also study general-purpose models that use many datasets for multiple tasks. Their two primary applications are image captioning and visual question answering. For English, large datasets and foundation models are already abundant. However, for Vietnamese, they are still limited. To expand the language range, this work proposes a pretrained general-purpose image-text model named VisualRoBERTa. A dataset of 600k images with captions (MS COCO 2017 translated from English to Vietnamese) is introduced to pretrain VisualRoBERTa. The model's architecture is built using Convolutional Neural Network and Transformer blocks. Fine-tuning VisualRoBERTa shows promising results on the VivQA dataset with 34.49% accuracy, 0.4173 BLEU 4, and 0.4390 RougeL (in the visual question answering task), and the best outcomes on the sViIC dataset with 0.6685 BLEU 4 and 0.6320 RougeL (in the image captioning task).
Privacy is a crucial concern in collaborative machine vision where a part of a Deep Neural Network (DNN) model runs on the edge, and the rest is executed on the cloud. In such applications, the machine vision model does not need the exact visual content to perform its task. Taking advantage of this potential, private information could be removed from the data insofar as it does not significantly impair the accuracy of the machine vision system. In this paper, we present an autoencoder-style network integrated within an object detection pipeline, which generates a latent representation of the input image that preserves task-relevant information while removing private information. Our approach employs an adversarial training strategy that not only removes private information from the bottleneck of the autoencoder but also promotes improved compression efficiency for feature channels coded by conventional codecs like VVC-Intra. We assess the proposed system using a realistic evaluation framework for privacy, directly measuring face and license plate recognition accuracy. Experimental results show that our proposed method is able to reduce the bitrate significantly at the same object detection accuracy compared to coding the input images directly, while keeping the face and license plate recognition accuracy on the images recovered from the bottleneck features low, implying strong privacy protection. Our code is available at https://***/bardia-az/ppa-code.
The specular reflection of objects is an important factor affecting image display quality, and it poses challenges to tasks such as pattern recognition and machine vision detection. At present, specular removal for a single real image is a crucial pre-processing step to improve the performance of computer vision algorithms. Despite notable approaches tailored for handling synthesized and pre-simplified images with dark backgrounds, real-time separation of specular reflection for a single real image remains a challenging problem. This paper proposes a novel specular removal method, based on the dark channel prior, to separate the specular reflection for a single real image accurately and efficiently. Initially, a modified-specular-free (MSF) image is developed using the dark channel prior, which can derive a direct estimation of specular reflection. Next, the image chromaticity spaces are established to represent the pixel intensity. Then, the maximum chromaticity value of the MSF image is extracted to guide the filtering of the specular reflection, treating the specular pixels as noise in the chromaticity space. Finally, the image without specular reflection can be obtained using the restored maximum chromaticity value based on the dichromatic reflection model. The advantage of this method is that it achieves high-quality specular reflection separation quickly without destroying the geometric features of the real image. Compared with state-of-the-art methods, experimental results show that the proposed algorithm achieves the best subjective visual effect and satisfactory quantitative performance. In addition, this approach can be implemented efficiently to meet real-time requirements, making it promising for computer vision measurement and inspection applications.
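One common building block in this line of work can be sketched briefly. Under the dichromatic reflection model with a white (achromatic) illuminant, the specular term adds the same amount to every color channel, so subtracting the per-pixel minimum channel cancels it. The sketch below is a generic illustration of that idea, not the MSF construction or the chromaticity-guided filtering proposed in the paper; the function name `subtract_min_channel` and the sample pixel values are hypothetical.

```python
def subtract_min_channel(image):
    """Subtract each pixel's minimum channel from all of its channels.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    """
    result = []
    for row in image:
        new_row = []
        for pixel in row:
            darkest = min(pixel)
            new_row.append(tuple(channel - darkest for channel in pixel))
        result.append(new_row)
    return result


# Hypothetical pixels: one purely diffuse, one with a white highlight
# that adds 40 to every channel of the same diffuse color.
diffuse = (200, 100, 50)
highlight = tuple(channel + 40 for channel in diffuse)

specular_free = subtract_min_channel([[diffuse, highlight]])

# The highlight offset cancels, so both pixels give the same result.
assert specular_free[0][0] == specular_free[0][1] == (150, 50, 0)
print(specular_free)
```

The subtraction also removes part of the diffuse component (the minimum channel becomes zero), which is one reason methods of this kind typically go on to restore chromaticity rather than stop at this step.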