Signal capture is at the forefront of perceiving and understanding the environment; thus, imaging plays a pivotal role in mobile vision. Recent unprecedented progress in artificial intelligence (AI) has shown great potential in the development of advanced mobile platforms with new imaging devices. Traditional imaging systems based on the "capturing images first and processing afterward" mechanism cannot meet this explosive demand. On the other hand, computational imaging (CI) systems are designed to capture high-dimensional data in an encoded manner to provide more information for mobile vision systems. Thanks to AI, CI can now be used in real-life systems by integrating deep learning algorithms into the mobile vision platform to achieve a closed loop of intelligent acquisition, processing, and decision-making, thus leading to the next revolution of mobile vision. Starting from the history of mobile vision using digital cameras, this work first introduces the advancement of CI in diverse applications and then conducts a comprehensive review of current research topics combining CI and AI. Although new-generation mobile platforms, represented by smartphones, have deeply integrated CI and AI for better image acquisition and processing, most mobile vision platforms, such as self-driving cars and drones, only loosely connect CI and AI and call for closer integration. Motivated by this fact, at the end of this work, we propose some potential technologies and disciplines that aid the deep integration of CI and AI and shed light on new directions for the future generation of mobile vision platforms.
Facial recognition is a widely used process that aims to detect and verify an individual's identity. This technique is employed in various applications, such as image and video analysis, surveillance, and security...
Background: Fermented foods are products processed through microbial fermentation and are widely appreciated by consumers around the world for their unique flavors. With advancements in industrial technology and increasing consumer demand, modern techniques are being progressively integrated into the production and quality control of fermented foods to enhance production efficiency and product quality. Among these innovations, computer vision technology stands out as particularly impactful. Scope and approach: This paper provides an overview of the applications of computer vision in the field of fermented foods, focusing on its technical algorithms and applications within the food industry. It outlines the specific uses of computer vision technology across different types of fermented foods and discusses the relevant techniques employed. Finally, this review highlights the transformative potential of adaptive learning and multimodal fusion in addressing current limitations of computer vision for fermented food monitoring. Key findings and conclusions: The adoption of computer vision technology has significantly improved both the efficiency and accuracy of quality control processes in fermented food production. Through non-contact real-time monitoring, researchers can quickly identify the dynamic changes in microorganisms and related parameter indicators during fermentation and evaluate their impact on food quality. These technologies have not only boosted the efficiency of fermented food production but have also enhanced control over product flavor and safety assessments. Despite ongoing challenges in technology implementation and data analysis, the continuous advancements in deep learning and image processing technologies are expected to increase the impact of computer vision in the field of fermented foods, driving sustainable industry development.
The exceptional ability of optical metasurfaces to manipulate light has enabled integrated analog computing and image processing in ultracompact, energy-efficient platforms that support high speeds. To date, metasurfaces have demonstrated various analog processing functions, including differentiation, convolution, and classification. However, a fundamental limitation of existing designs is their static functionality, which restricts adaptability to diverse application scenarios. To address this challenge, momentum-space reconfigurable metasurfaces operating in the near-infrared range are experimentally demonstrated, capable of switchable image processing functions, including image differentiation and bright-field imaging. These meta-devices are achieved by integrating nematic liquid crystals with silicon metasurfaces that support resonances of quasi-bound states in the continuum (quasi-BICs). The quasi-BIC modes enable further design freedom over the angular dispersion of metasurfaces. The results showcase an electrically tunable, CMOS-compatible approach to reconfigurable optical computing, offering significant potential for applications such as online training of diffractive neural networks, machine vision, and augmented reality.
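The momentum-space operation behind the two switchable functions can be summarized compactly. The following is a hedged sketch in the standard angular-dispersion notation for analog optical computing; it is a generic textbook formulation, not equations taken from the paper:

```latex
% Output spectrum = angular transfer function times input spectrum
\[
  \tilde{E}_{\mathrm{out}}(k_x, k_y) = H(k_x, k_y)\, \tilde{E}_{\mathrm{in}}(k_x, k_y)
\]
% Isotropic second-order differentiation (edge detection) corresponds to
\[
  H_{\mathrm{diff}}(k_x, k_y) \propto k_x^2 + k_y^2
  \quad\Longleftrightarrow\quad
  E_{\mathrm{out}}(x, y) \propto \nabla^2 E_{\mathrm{in}}(x, y),
\]
% while bright-field imaging corresponds to a flat angular response:
\[
  H_{\mathrm{bf}}(k_x, k_y) \approx \mathrm{const.}
\]
```

Electrically reorienting the liquid crystal detunes the quasi-BIC resonance, which toggles the angular response $H$ between these two regimes.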
The use of machine vision and deep learning for intelligent industrial inspection has become increasingly important in automating production processes. Although machine vision approaches are widely used for industrial inspection, deep learning-based defect segmentation has not been widely studied. While state-of-the-art segmentation methods are often tuned for a specific purpose, extending them to unknown sets or other datasets, such as defect segmentation datasets, requires further analysis. In addition, recent contributions and improvements in image segmentation methods have not been extensively investigated for defect segmentation. To address these problems, we conducted a comparative experimental study of several recent state-of-the-art deep learning-based segmentation methods for steel surface defect segmentation and evaluated them in terms of segmentation performance, processing time, and computational complexity using two public datasets, NEU-Seg and Severstal Steel Defect Detection (SSDD). In addition, we proposed and trained a hybrid transformer-based encoder with a CNN-based decoder head and achieved state-of-the-art results: a Dice score of 95.22% (NEU-Seg) and 95.55% (SSDD).
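The Dice score used as the headline metric here is simple to compute. A minimal PyTorch sketch follows; the binarization threshold and smoothing constant are illustrative assumptions, not values from the paper:

```python
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor,
               threshold: float = 0.5, eps: float = 1e-6) -> torch.Tensor:
    """Dice = 2|A ∩ B| / (|A| + |B|) for a binary segmentation mask."""
    pred_bin = (pred > threshold).float()   # binarize predicted probabilities
    target = target.float()
    intersection = (pred_bin * target).sum()
    return (2.0 * intersection + eps) / (pred_bin.sum() + target.sum() + eps)

# Sanity check: a mask compared against itself scores 1.0
mask = torch.randint(0, 2, (1, 256, 256)).float()
print(dice_score(mask, mask))  # tensor(1.0000)
```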
ISBN (Print): 9783031434143; 9783031434150
The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in better performance, it also leads to a considerable increase in computational cost. Motivated by the saying "A picture is worth a thousand words," we propose an innovative approach to accelerate the ViT model by shortening long images. Specifically, we introduce a method for adaptively assigning token length for each image at test time to accelerate inference speed. First, we train a Resizable-ViT (ReViT) model capable of processing input with diverse token lengths. Next, we extract token-length labels from ReViT that indicate the minimum number of tokens required to achieve accurate predictions. We then use these labels to train a lightweight Token-Length Assigner (TLA) that allocates the optimal token length for each image during inference. The TLA enables ReViT to process images with the minimum sufficient number of tokens, reducing token numbers in the ViT model and improving inference speed. Our approach is general and compatible with modern vision transformer architectures, significantly reducing computational costs. We verified the effectiveness of our methods on multiple representative ViT models on image classification and action recognition.
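The inference pipeline described above can be sketched in code. All names below (the candidate token lengths, the `TokenLengthAssigner` architecture, and the `num_tokens` keyword on ReViT) are illustrative assumptions, since the paper's interfaces are not given here:

```python
import torch
import torch.nn as nn

# Candidate token lengths ReViT was trained to accept (assumed values)
TOKEN_LENGTHS = [49, 98, 196]

class TokenLengthAssigner(nn.Module):
    """Lightweight CNN that predicts the smallest sufficient token length."""
    def __init__(self, num_choices: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_choices),
        )

    def forward(self, x):
        return self.backbone(x)  # logits over candidate token lengths

@torch.no_grad()
def adaptive_inference(image, tla, revit):
    """Pick a token length for this image, then run ReViT at that length."""
    choice = tla(image).argmax(dim=-1).item()
    return revit(image, num_tokens=TOKEN_LENGTHS[choice])  # assumed signature
```

Easy images get routed to short token sequences and hard images to long ones, which is where the average inference speedup comes from.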
ISBN (Print): 9781510666931; 9781510666948
Remote sensing scene classification has been extensively studied for its critical roles in geological survey, oil exploration, traffic management, earthquake prediction, wildfire monitoring, and intelligence monitoring. In the past, machine learning (ML) methods for this task mainly used backbones pretrained in a supervised learning (SL) manner. As masked image modeling (MIM), a self-supervised learning (SSL) technique, has been shown to be a better way of learning visual feature representations, it presents a new opportunity for improving ML performance on the scene classification task. This research explores the potential of MIM-pretrained backbones on four well-known classification datasets: Merced, AID, NWPU-RESISC45, and Optimal-31. Compared to published benchmarks, we show that MIM-pretrained vision transformer (ViT) backbones outperform other alternatives (by up to 18% in top-1 accuracy) and that the MIM technique can learn better feature representations than its supervised learning counterparts (by up to 5% in top-1 accuracy). Moreover, we show that general-purpose MIM-pretrained ViTs can achieve performance competitive with the specially designed yet complicated Transformer for Remote Sensing (TRS) framework. Our experimental results also provide a performance baseline for future studies.
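A common way to reuse an MIM-pretrained backbone for scene classification is linear probing or full fine-tuning. A minimal sketch with `timm` is below; the checkpoint tag is an assumption (any MAE/MIM-pretrained ViT tag would do), and nothing here reproduces the paper's training setup:

```python
import timm

# Load a ViT backbone with MAE (an MIM method) pretrained weights.
# The checkpoint tag is an assumption; substitute any MIM-pretrained ViT.
model = timm.create_model(
    "vit_base_patch16_224.mae", pretrained=True,
    num_classes=45,  # e.g., NWPU-RESISC45 has 45 scene classes
)

# Optional linear probing: freeze everything except the classifier head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("head")
```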
ISBN (Print): 9784885523434
This paper proposes image style transfer with shape preservation for gaze estimation. While several shape preservation constraints have been proposed, we present additional shape preservation constraints using (i) dense pixelwise correspondences between the original image and its transferred counterpart and (ii) task-driven learning that uses the gaze estimation error to directly improve gaze direction estimation. Extensive experiments against other SOTA methods on publicly available datasets, together with ablation studies, validate the effectiveness of our method.
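The two added constraints translate naturally into loss terms. The following is a hedged sketch; the loss weights, the correspondence tensor format, and the gaze estimator interface are all illustrative assumptions, not the paper's formulation:

```python
import torch.nn.functional as F

def shape_preservation_losses(src, transferred, corr, gaze_net, gaze_gt,
                              w_corr=1.0, w_task=1.0):
    """corr: (B, H, W, 2) normalized flow mapping transferred pixels to src."""
    # (i) Dense pixelwise correspondence: warping the transferred image back
    #     onto the original should reproduce it, penalizing shape distortion.
    warped = F.grid_sample(transferred, corr, align_corners=False)
    loss_corr = F.l1_loss(warped, src)

    # (ii) Task-driven term: gaze predicted on the transferred image should
    #      still match the ground-truth gaze direction.
    loss_task = F.l1_loss(gaze_net(transferred), gaze_gt)

    return w_corr * loss_corr + w_task * loss_task
```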
In recent years, the rapid development of computer vision and artificial intelligence has significantly advanced agricultural applications, particularly in the quality detection and grading of navel oranges. This revi...
Computer vision techniques have immense potential for materials design applications. In this work, we introduce an integrated and general-purpose Atomvision library that can be used to generate and curate microscopy image (such as scanning tunneling microscopy and scanning transmission electron microscopy) data sets and apply a variety of machine learning techniques. To demonstrate the applicability of this library, we (1) establish an atomistic image data set of about 10 000 materials with large structural and chemical diversity, (2) develop and compare convolutional and atomistic line graph neural network models to classify the Bravais lattices, (3) demonstrate the application of fully convolutional neural networks using U-Net architecture to pixelwise classify atom versus background, (4) use a generative adversarial network for super resolution, (5) curate an image data set on the basis of natural language processing using an open-access arXiv data set, and (6) integrate the computational framework with experimental microscopy images for Rh, Fe3O4, and SnS systems. The Atomvision library is available at https://***/usnistgov/atomvision.
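Of the tasks listed, pixelwise atom-versus-background classification (task 3) is the most directly codeable. A minimal sketch with a generic U-Net follows; this uses `segmentation_models_pytorch` rather than the Atomvision library's own API, and the encoder choice, input size, and channel counts are assumptions:

```python
import torch
import segmentation_models_pytorch as smp  # generic U-Net, not the Atomvision API

# Two-class pixelwise head: atom vs. background, single-channel STM/STEM images
model = smp.Unet(encoder_name="resnet18", encoder_weights=None,
                 in_channels=1, classes=2)

image = torch.randn(1, 1, 256, 256)   # one grayscale microscopy frame
logits = model(image)                 # (1, 2, 256, 256) per-pixel class scores
atom_mask = logits.argmax(dim=1)      # (1, 256, 256) predicted atom pixels
```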