Image categorization is a fundamental task in computer vision, with applications in domains such as object recognition, medical imaging, and autonomous systems. Traditional approaches frequently fail to balance accura...
ISBN: 9798350365504 (digital)
ISBN: 9798350365511 (print)
This paper introduces a high dynamic range pixel for early vision processing. Early vision is the first processing stage, from which semantic information is subsequently extracted for image processing or video analytics. This paper proposes to bring this processing to the focal plane, next to a high dynamic range image sensor that works on the principle of a lateral overflow capacitor. This makes it possible to process scenes with a wide dynamic range in a power-efficient manner. Circuit simulations of edge detection, the example of early vision processing presented in this paper, show that our proposal meets the accuracy typically required in applications such as machine vision. Simulations were carried out in XFAB’s XS018 technology.
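As a rough software illustration of what edge detection computes, the sketch below applies a Sobel operator to a small grayscale image and thresholds the gradient magnitude. It is not the focal-plane circuit described in the abstract: the Sobel operator, the `sobel_edges` name, and the `threshold` parameter are assumptions chosen for illustration only.

```python
def sobel_edges(image, threshold):
    """Return a binary edge map from the Sobel gradient magnitude.

    `image` is a list of rows of numbers. The magnitude is approximated
    as |gx| + |gy|. Border pixels are left at 0 because the 3x3 window
    does not fit there.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
            if abs(gx) + abs(gy) >= threshold:
                edges[r][c] = 1
    return edges


if __name__ == "__main__":
    # Hypothetical test image: a dark region on the left, a bright one on the right.
    step = [[0, 0, 100, 100, 100] for _ in range(4)]
    for row in sobel_edges(step, threshold=200):
        print(row)
```

On a vertical intensity step, the interior pixels on either side of the step are marked as edges, while uniform regions stay at zero.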
This study proposes a robust computer vision-based system for autonomous vehicle identification and tracking, utilizing OpenCV with Python for real-time image processing. To precisely identify cars and bikes, the syst...
Assessing the quality of pansharpened images is critical for obtaining a quantitative score that represents quality and allows the performance of different fusion methods to be compared. Most of the introduced metrics...
This research delves into deep learning and machine vision applications for plant leaf disease detection in agricultural settings, focusing on farm village datasets. Utilizing a blend of authentic farm village data an...
ISBN: 9798891760615 (print)
The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs). These models aim to bridge the gap between text and visual information, enabling a more comprehensive understanding of multimedia data. However, as these models become larger and more complex, they also become more challenging to train and deploy. One approach to addressing this challenge is the use of sparsely-gated mixture-of-experts (MoE) techniques, which divide the model into smaller, specialized submodels that can jointly solve a task. In this paper, we explore the effectiveness of MoE in scaling vision-language models, demonstrating its potential to achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost. Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute and performance when scaling VLMs. We hope our work will inspire further research into the use of MoE for scaling large-scale vision-language models and other multimodal machine learning applications.
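The sparsely-gated routing idea can be sketched in a few lines. The example below is a minimal, generic top-k gating layer, assuming each expert is a single linear map and the gate is a linear scorer. The names `moe_forward`, `gate_weights`, and `top_k` are hypothetical, and the sketch does not reproduce the paper's architecture or training setup.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def moe_forward(token, gate_weights, experts, top_k=2):
    """Route one token through its top_k experts and mix their outputs.

    gate_weights has one row per expert; experts is a list of weight
    matrices with identical shapes. Only the selected experts are
    evaluated, so per-token compute grows with top_k rather than with
    the total number of experts.
    """
    logits = matvec(gate_weights, token)  # one gating score per expert
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in chosen])  # renormalize over chosen experts
    output = [0.0] * len(experts[0])
    for weight, index in zip(weights, chosen):
        expert_out = matvec(experts[index], token)
        output = [acc + weight * value for acc, value in zip(output, expert_out)]
    return output, chosen


if __name__ == "__main__":
    # Hypothetical toy setup: three experts, two-dimensional tokens.
    gate = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    experts = [[[10.0, 0.0]], [[20.0, 0.0]], [[30.0, 0.0]]]
    print(moe_forward([1.0, 0.0], gate, experts, top_k=2))
```

Production MoE layers typically add load-balancing terms and capacity limits on top of this routing step; those are omitted here.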
In a robot application system incorporating a dexterous hand, a vision-based robot grasping system is proposed to address the lack of robustness of the dexterous hand in grasping fixed-attitude objects. First, a 6DOF robo...
In this digital era, social media is one of the key platforms for collecting customer feedback and reflecting customers' views on products, services, brands, events, and other topics of interest. However, sarcastic memes are on the rise on social media. They often convey a meaning contrary to the implied sentiment and challenge traditional machine learning identification techniques. Because memes blend text and visuals, they are difficult to interpret from the caption or the image alone: their humor often relies on subtle contextual cues that require a nuanced understanding. Our study introduces Offensive Images and Sarcastic Memes Detection to address this problem. Our model employs various techniques to identify sarcastic memes and offensive images. For sarcastic meme detection, the model uses Optical Character Recognition (OCR) and bidirectional long short-term memory (Bi-LSTM). For offensive image detection, the model employs an Autoencoder LSTM, deep learning models such as DenseNet and MobileNet, and computer vision techniques such as a Feature Fusion Process (FFP) based on Transfer Learning (TL) with image augmentation. The study showcases the effectiveness of the proposed methods in achieving high accuracy in detecting offensive content across different modalities, such as text, memes, and images. Based on tests conducted on real-world datasets, our model has demonstrated an accuracy rate of 92% on the Hateful Memes Challenge dataset. The proposed methodology has also achieved a Testing Accuracy (TA) of 95.7% for DenseNet with transfer learning on the NPDI dataset and 95.12% on the Pornography dataset. Moreover, implementing Transfer Learning with a Feature Fusion Process (FFP) has resulted in a TA of 99.45% for the NPDI dataset and 98.5% for the Pornography dataset.
In recent semiconductor manufacturing, wafers are sliced very thin for efficiency, and processes that stack various thin films are used. To measure such thin films during the semiconductor manufacturing process, an ellipsometer, a non-destructive optical device, is used. The ellipsometer analyzes a thin film by checking the change in the polarization state of the incident light after it is reflected from the wafer surface. However, thinly sliced wafers are often bent during the manufacturing process, so at industrial sites it has been difficult to measure thin films efficiently while maintaining an accurate optical state. Accordingly, this study analyzes machine vision image data and compares algorithms that use it, together with Z-axis adjustment, to enable efficient and precise measurement on bent wafers. We propose a focus optimization algorithm based on machine vision image processing, evaluate the data and features that support it, and release the data sets and algorithm code that demonstrate this process in a GitHub repository(1). In addition, the efficiency of these algorithms is interpreted through simulation figures, and on that basis an optical system capable of precise measurement through efficient Z-axis movement is proposed. (c) 2023 The Authors. Published by Elsevier B.V.
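Image-based focusing along the Z axis can be illustrated with a simple sharpness search. The sketch below scores each frame with the variance of a 4-neighbour Laplacian, a common focus measure, and picks the Z position with the highest score. The abstract does not state which focus metric or search strategy the authors use, so the metric and the names `laplacian_variance` and `best_focus_z` are assumptions for illustration.

```python
def laplacian_variance(image):
    """Focus score: variance of the 4-neighbour Laplacian over interior pixels.

    Sharper images have stronger local intensity changes and therefore
    a larger variance. `image` is a list of rows of numbers with at
    least 3 rows and 3 columns.
    """
    rows, cols = len(image), len(image[0])
    responses = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            responses.append(
                image[r - 1][c] + image[r + 1][c]
                + image[r][c - 1] + image[r][c + 1]
                - 4 * image[r][c]
            )
    mean = sum(responses) / len(responses)
    return sum((v - mean) ** 2 for v in responses) / len(responses)


def best_focus_z(frames_by_z):
    """Return the Z position whose frame has the highest focus score.

    `frames_by_z` maps a Z position to the image captured there.
    This is an exhaustive search over the sampled positions.
    """
    return max(frames_by_z, key=lambda z: laplacian_variance(frames_by_z[z]))


if __name__ == "__main__":
    # Hypothetical frames: a checkerboard at three contrast levels.
    def board(low, high):
        return [[high if (r + c) % 2 == 0 else low for c in range(4)] for r in range(4)]

    frames = {0.0: board(5, 5), 0.5: board(0, 10), 1.0: board(4, 6)}
    print(best_focus_z(frames))
```

An exhaustive scan is the simplest choice. A coarse-to-fine or hill-climbing search over Z would reduce the number of frames captured, at the cost of possibly stopping at a local maximum.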
The Pix2Pix architecture is widely used for image colourisation. This is the problem of transforming a greyscale image into a realistic colour image. However, the canonical Pix2Pix colourisation model uses batch norma...