As a ubiquitous manipulation tool, optical tweezers are widely used in biochemistry and applied physics, enabling the investigation of a wide range of microscopic and nanoscopic particles. In recent years, digital image-processing techniques for improving target particle observation have diversified, leading to the development of numerous automatic tasks. These techniques were developed in response to the need for multi-particle manipulation and feature detection. Here we describe how digital image processing can be used to enhance the capabilities of optical manipulation. In particular, cutting-edge image-processing techniques that build on developments in artificial intelligence are making optical trapping more widely accessible and enabling automatic manipulation of microscopic and nanoscopic particles.
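The feature-detection step this abstract alludes to typically begins by locating trapped particles in each video frame. A minimal sketch, assuming a simple thresholding-plus-connected-components approach (real tweezer-tracking pipelines use more robust methods such as radial symmetry fitting or learned detectors):

```python
import numpy as np

def detect_particles(img, thresh):
    """Label bright 4-connected regions and return their centroids.

    Illustrative particle localization for a single video frame;
    the threshold and connectivity choices are assumptions, not
    taken from any specific tweezer system.
    """
    mask = img > thresh
    labels = np.zeros(img.shape, dtype=int)
    current = 0
    centroids = []
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                current += 1
                labels[i, j] = current
                stack, pix = [(i, j)], []
                while stack:  # flood fill one connected region
                    y, x = stack.pop()
                    pix.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            stack.append((ny, nx))
                centroids.append(np.array(pix, dtype=float).mean(axis=0))
    return centroids

# Synthetic frame with two bright "particles"
frame = np.zeros((32, 32))
frame[5:8, 5:8] = 1.0
frame[20:24, 18:22] = 1.0
cents = detect_particles(frame, 0.5)  # two (row, col) centroids
```

Centroids from repeated frames can then drive automatic repositioning of the traps toward each detected particle.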
Artificial vision systems will be essential in intelligent machine-vision applications such as autonomous vehicles, bionic eyes, and humanoid robot eyes. However, conventional digital electronics in these systems face limitations in system complexity, processing speed, and energy consumption. These challenges have been addressed by biomimetic approaches utilizing optoelectronic synapses inspired by the biological synapses in the eye. Nanomaterials can confine photogenerated charge carriers within nano-sized regions and thus offer significant potential for optoelectronic synapses to perform in-sensor image-processing tasks, such as classifying static multicolor images and detecting dynamic object movements. We introduce recent developments in optoelectronic synapses, focusing on the use of photosensitive nanomaterials. We also explore applications of these synapses in recognizing static and dynamic optical information. Finally, we suggest future directions for research on optoelectronic synapses to implement neuromorphic artificial vision.
Visual content is increasingly being processed by machines for various automated content analysis tasks instead of being consumed by humans. Despite the existence of several compression methods tailored for machine tasks, few consider real-world scenarios with multiple tasks. In this paper, we aim to address this gap by proposing a task-switchable pre-processor that optimizes input images specifically for machine consumption prior to encoding by an off-the-shelf codec designed for human consumption. The proposed task-switchable pre-processor adeptly maintains relevant semantic information based on the specific characteristics of different downstream tasks, while effectively suppressing irrelevant information to reduce bitrate. To enhance the processing of semantic information for diverse tasks, we leverage pre-extracted semantic features to modulate the pixel-to-pixel mapping within the pre-processor. By switching between different modulations, multiple tasks can be seamlessly incorporated into the system. Extensive experiments demonstrate the practicality and simplicity of our approach. It significantly reduces the number of parameters required for handling multiple tasks while still delivering impressive performance. Our method showcases the potential to achieve efficient and effective compression for machine-vision tasks, supporting the evolving demands of real-world applications.
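The "modulate the pixel-to-pixel mapping" idea can be pictured as feature-wise scaling and shifting selected per task. A minimal sketch, assuming a FiLM-style modulation (the paper's exact scheme may differ, and the parameter values below are purely illustrative):

```python
import numpy as np

def film_modulate(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each channel.

    Hypothetical stand-in for semantic-feature-driven modulation
    inside the pre-processor; gamma/beta would in practice be
    predicted from pre-extracted semantic features.
    """
    return gamma[None, None, :] * features + beta[None, None, :]

# One (gamma, beta) pair per downstream task, switched at inference.
task_params = {
    "detection":    (np.array([1.5, 0.5]), np.array([0.0, 0.1])),
    "segmentation": (np.array([0.8, 1.2]), np.array([0.2, 0.0])),
}

x = np.ones((4, 4, 2))            # H x W x C intermediate feature map
g, b = task_params["detection"]
y = film_modulate(x, g, b)        # task-conditioned features
```

Switching tasks then only swaps the small modulation parameters rather than the whole pre-processor, which is consistent with the parameter savings the abstract claims.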
In the machine-vision-based online monitoring of the flotation process, froth images acquired in real time are subject to color distortion and excessive bright spots caused by inconsistent illumination, which hinders the effectiveness of image analysis and further online measurement of operating performance indicators. Current image-processing methods struggle to correct color distortion and remove excess bright spots in froth images simultaneously. Therefore, in this article, an illumination domain signal-guided unsupervised generative adversarial network (IDS-GUGAN) is proposed for illumination consistency processing of flotation froth images. First, considering the varying effects of inconsistent illumination on froth images, the illumination domain signal-guided image generation (IDS-GIG) mechanism, based on the theory of unsupervised disentangled representation learning, is designed to achieve adaptive correction of froth images with varying degrees of distortion. Moreover, a novel lightweight double-closed-loop network architecture is introduced to support unsupervised learning from unpaired froth images and improve computational efficiency, which makes the proposed approach highly suitable for industrial applications. Comprehensive experiments on a real tungsten cleaner flotation process dataset and two public benchmark datasets related to image illumination processing tasks consistently endorse the superiority of IDS-GUGAN.
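The "double-closed-loop" training on unpaired images is in the family of cycle-consistency objectives: an image mapped to the consistent-illumination domain and back should recover itself. A toy sketch of that loss, with trivial stand-in generators (the paper's actual architecture is far richer):

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 cycle loss: x -> G(x) -> F(G(x)) should reconstruct x.

    G would map distorted froth images toward consistent
    illumination and F would map back; both are assumed toy
    functions here, not the paper's networks.
    """
    return np.abs(F(G(x)) - x).mean()

# Toy "generators": a brightness shift and its exact inverse.
G = lambda x: x + 0.3
F = lambda x: x - 0.3

x = np.random.rand(8, 8)           # stand-in froth image patch
loss = cycle_consistency_loss(x, G, F)   # near zero for inverse pair
```

Because the loop closes on the input itself, no paired (distorted, corrected) froth images are needed, matching the unsupervised setting described above.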
At present, and increasingly so in the future, much of the captured visual content will not be seen by humans. Instead, it will be used for automated machine-vision analytics and may require occasional human viewing. Examples of such applications include traffic monitoring, visual surveillance, autonomous navigation, and industrial machine vision. To address such requirements, we develop an end-to-end learned image codec whose latent space is designed to support scalability from simpler to more complicated tasks. The simplest task is assigned to a subset of the latent space (the base layer), while more complicated tasks make use of additional subsets of the latent space, i.e., both the base and enhancement layer(s). For the experiments, we establish a 2-layer and a 3-layer model, each of which offers input reconstruction for human vision plus machine-vision task(s), and compare them with relevant benchmarks. The experiments show that our scalable codecs offer 37%-80% bitrate savings on machine-vision tasks compared to the best alternatives, while being comparable to state-of-the-art image codecs in terms of input reconstruction.
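The layered-latent idea can be illustrated by partitioning one latent vector into nested subsets, where each decoder consumes only the subsets up to its task's layer. A minimal sketch, with illustrative split sizes (the actual dimensions and layer assignments are learned in the paper):

```python
import numpy as np

latent = np.random.rand(64)   # stand-in for the full learned latent

# Base layer: first 16 dims serve the simplest machine-vision task;
# enhancement layers add further subsets for harder tasks and for
# full input reconstruction. All split sizes are assumptions.
base = latent[:16]
enh1 = latent[16:40]
enh2 = latent[40:]

def decode_for_task(level):
    """Return the latent subset a decoder at `level` would consume."""
    if level == 0:                                  # machine task only
        return base
    elif level == 1:                                # harder machine task
        return np.concatenate([base, enh1])
    return np.concatenate([base, enh1, enh2])       # human viewing
```

A machine-only receiver then transmits and decodes just the base layer, which is where the reported bitrate savings on machine-vision tasks come from.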
ISBN (print): 9798350350494; 9798350350500
In recent years, weakly supervised semantic segmentation using image-level labels as supervision has received significant attention in the field of computer vision. Most existing methods have addressed the challenges arising from the lack of spatial information in these labels by focusing on facilitating supervised learning through the generation of pseudolabels from class activation maps (CAMs). Due to the localized pattern detection of Convolutional Neural Networks (CNNs), CAMs often emphasize only the most discriminative parts of an object, making it challenging to accurately distinguish foreground objects from each other and from the background. Recent studies have shown that Vision Transformer (ViT) features, due to their global view, are more effective in capturing the scene layout than CNNs. However, the use of hierarchical ViTs has not been extensively explored in this field. This work explores the use of the Swin Transformer by proposing "SWTformer" to enhance the accuracy of the initial seed CAMs by bringing local and global views together. SWTformer-v1 generates class probabilities and CAMs using only the patch tokens as features. SWTformer-v2 incorporates a multi-scale feature fusion mechanism to extract additional information and utilizes a background-aware mechanism to generate more accurate localization maps with improved cross-object discrimination. In experiments on the Pascal VOC 2012 dataset, SWTformer-v1 achieves 0.98% mAP higher localization accuracy, outperforming state-of-the-art models. It also generates initial localization maps that are on average 0.82% mIoU higher than those of other methods, depending only on the classification network. SWTformer-v2 further improves the accuracy of the generated seed CAMs by 5.32% mIoU, further proving the effectiveness of the local-to-global view provided by the Swin Transformer. Code available at: https://***/RozhanAhmadi/SWTformer
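The seed maps this abstract builds on are standard CAMs: each class's classifier weights form a weighted sum over the backbone's feature maps. A minimal sketch of that computation, with random stand-in shapes and values:

```python
import numpy as np

def class_activation_map(features, weights, cls):
    """CAM for class `cls`: classifier-weighted sum of feature maps.

    Standard CAM formulation; the feature maps and classifier
    weights here are random placeholders, not trained values.
    """
    # features: C x H x W, weights: num_classes x C
    cam = np.tensordot(weights[cls], features, axes=([0], [0]))
    cam = np.maximum(cam, 0)               # ReLU: keep positive evidence
    return cam / (cam.max() + 1e-8)        # normalize to [0, 1]

feats = np.random.rand(8, 14, 14)   # channel-wise feature maps (C, H, W)
w = np.random.rand(20, 8)           # final classifier weights (classes, C)
cam = class_activation_map(feats, w, cls=3)   # 14 x 14 seed map
```

With a CNN backbone, `feats` carries mostly local evidence, which is why the CAM highlights only discriminative parts; swapping in hierarchical transformer features, as SWTformer does, injects the global view that improves the seeds.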
Photoadaptive synaptic devices enable in-sensor processing of complex illumination scenes, while second-order adaptive synaptic plasticity improves learning efficiency by modifying the learning rate in a given environment. The integration of both adaptations in one phototransistor device will provide opportunities for developing highly efficient machine-vision systems. Here, a dually adaptable organic heterojunction transistor is reported as a working unit in such a system, facilitating precise contrast enhancement and improving the convergence rate under harsh lighting conditions. The photoadaptive threshold sliding originates from the bidirectional photoconductivity caused by the light intensity-dependent photogating effect. Metaplasticity is successfully implemented owing to the combination of ambipolar behavior and the charge trapping effect. By utilizing the transistor array in a machine-vision system, details and edges can be highlighted in 0.4% low-contrast images, and a high recognition accuracy of 93.8% is achieved with a convergence rate improved by about 5 times. These results open a strategy to fully implement metaplasticity in optoelectronic devices and suggest their vision-processing applications in complex lighting scenes. Organic heterojunction transistors are designed to integrate light intensity-adaptive threshold sliding and second-order adaptive metaplasticity. The unique dual adaptability enables the highlighting of 0.4% low-contrast images, and efficient recognition can be achieved benefiting from the learning rate changes in the backpropagation process.
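Metaplasticity, the "second-order" plasticity the abstract describes, means the learning rate itself adapts to recent activity. A toy software analogue, in which consistently same-signed recent updates damp the effective learning rate (all constants are illustrative; the device realizes a related effect through charge trapping rather than this formula):

```python
import numpy as np

def metaplastic_lr(base_lr, history, decay=0.5):
    """Toy metaplasticity: adapt the learning rate from update history.

    If recent weight updates all push in the same direction, the
    effective rate is reduced to stabilize learning; mixed-direction
    updates leave it near the base value. Purely illustrative.
    """
    consistency = abs(np.mean(np.sign(history)))   # in [0, 1]
    return base_lr * (1.0 - decay * consistency)

# Updates all in one direction -> the learning rate is damped.
lr_stable = metaplastic_lr(0.1, [1.0, 0.8, 1.2])
# Mixed-direction updates -> the learning rate stays at the base value.
lr_mixed = metaplastic_lr(0.1, [1.0, -1.0, 1.0, -1.0])
```

In a backpropagation loop, such state-dependent learning rates are what the abstract credits for the roughly 5x faster convergence.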
Image captioning generates a textual description from the corresponding input image with the help of computer vision and natural language processing. In recent years, deep learning approaches have shown promise in image captioning. This research introduces a novel image captioning architecture comprising a dual self-attention fused encoder-decoder framework. The VGG16 Hybrid Places 1365 (V16HP1365) encoder captures diverse visual features from images, enhancing the quality of image representations. In this article, the Gated Recurrent Unit (GRU) is used as a decoder for word-level language modeling. Additionally, the dual self-attention network embedded in the architecture captures contextual image information to provide accurate content descriptions and relationship understanding. Experimental evaluations on the COCO dataset showcase superior performance, surpassing existing methods in terms of captioning quality metrics. This approach holds potential for applications such as aiding the visually impaired and advancing content retrieval. Future work aims to extend the model to support multilingual captioning.
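The GRU decoder mentioned above generates the caption one word at a time through the standard gated recurrence. A single decoding step can be sketched as follows (biases omitted; the weights are random stand-ins, not the paper's trained parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Wr, Wh, Uz, Ur, Uh):
    """One GRU step (standard formulation, biases omitted).

    x: embedded previous word (+ attended image context in a real
    captioner); h: previous hidden state. Weights are placeholders.
    """
    z = sigmoid(Wz @ x + Uz @ h)            # update gate
    r = sigmoid(Wr @ x + Ur @ h)            # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d, n = 6, 4                                  # input / hidden sizes (toy)
W = [rng.normal(size=(n, d)) * 0.1 for _ in range(3)]
U = [rng.normal(size=(n, n)) * 0.1 for _ in range(3)]
h = np.zeros(n)
x = rng.normal(size=d)                       # e.g. embedded <start> token
h = gru_step(x, h, W[0], W[1], W[2], U[0], U[1], U[2])
```

At each step the new hidden state would be projected to vocabulary logits to pick the next word, with the dual self-attention features feeding into `x`.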
This paper introduces a high dynamic range pixel for early vision processing. Early vision is the first stage to subsequently extract semantic information for image processing or video analytics. This paper proposes t...
This study presents a novel approach for pH estimation in buffer solutions using images of solutions prepared with Hibiscus sabdariffa L. as a natural pH indicator. The images of the solutions, each displaying distinctive colours indicative of their pH levels, were transformed into standardized 200x200-pixel images through the application of image-processing techniques. Following this, a pH prediction model was constructed using the Adaptive Boosting regressor algorithm. The pH values of the training data were distributed irregularly between 0 and 14. The models were trained with 94 pictures and 1880 experimental values. In addition, a reliable pre-processing stage, built with image-processing techniques, was incorporated into the model, allowing test data to be obtained in any desired environment. The training and test data were cleaned of noise parameters that negatively affect the prediction results. A smartphone application based on the model has been developed and made available to everyone. This innovative methodology bridges the gap between traditional pH measurement techniques and computer vision, offering a more accessible and eco-friendly means of pH assessment. The practical applications of this research extend to various fields, including environmental monitoring, agriculture, and educational settings.
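The core regression idea, mapping the colour of a standardized solution image to a pH value, can be sketched with a simple colour feature and a nearest-neighbour regressor standing in for the paper's AdaBoost model. The training colours below are synthetic, not real hibiscus-indicator measurements:

```python
import numpy as np

def mean_color(image):
    """Mean RGB of a standardized 200x200x3 solution image."""
    return image.reshape(-1, 3).mean(axis=0)

def knn_predict(color, train_colors, train_ph, k=3):
    """Predict pH as the mean pH of the k nearest training colours.

    A deliberately simple stand-in for the paper's AdaBoost
    regressor, illustrating the colour-to-pH regression step.
    """
    d = np.linalg.norm(train_colors - color, axis=1)
    return train_ph[np.argsort(d)[:k]].mean()

# Synthetic training set: colour drifts from red-ish (acidic) to
# green-ish (basic), loosely mimicking an anthocyanin indicator.
ph = np.linspace(0, 14, 15)
colors = np.stack([200 - 10 * ph, 40 + 10 * ph, 60 + 0 * ph], axis=1)

img = np.full((200, 200, 3), [150.0, 90.0, 60.0])   # unknown sample
pred = knn_predict(mean_color(img), colors, ph)     # estimated pH
```

In the actual pipeline, the pre-processing stage would first normalize the photo (crop, resize, de-noise) so the extracted colour is comparable across capture environments.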