Face detection applications using digital photos are critical in the face recognition process. This application is used in biometric recognition systems, search systems, and security systems. Artificial intelligence a...
The proceedings contain 52 papers. The topics discussed include: improvement of remote sensing image target detection algorithm based on YOLO v5; a study of Chan-Vese model with the introduction of edge information; real-time monitoring algorithm of muscle state based on sEMG signal; lane detection network with direction context; anomaly pixel detection via dual-branch uncertainty metrics; high precision license plate recognition algorithm in open scene; implementation and design of metro process quality inspection system based on image processing technology; the research on remote sensing image change detection based on deep learning; research on aircraft wheel hub pose detection method based on machine vision; lunar dome detection method based on few-shot object detection; and image enhancement algorithm of foggy sky with sky based on sky segmentation.
Vision-language models like CLIP, utilizing class proxies derived from class name text features, have shown a notable capability in zero-shot medical image diagnosis, which is vital in scenarios with limited disease da...
ISBN: 9781665492577 (print)
In recent years, there has been a sharp increase in the transmission of images to remote servers specifically for the purpose of computer vision. In many applications, such as surveillance, images are mostly transmitted for automated analysis and are rarely seen by humans. Using traditional compression for this scenario has been shown to be inefficient in terms of bit-rate, likely due to the focus on human-based distortion metrics. Thus, it is important to create specific image coding methods for joint use by humans and machines. One way to create the machine side of such a codec is to perform feature matching of some intermediate layer in a deep neural network performing the machine task. In this work, we explore the effects of the layer choice used in training a learnable codec for humans and machines. We prove, using the data processing inequality, that matching features from deeper layers is preferable in the sense of rate-distortion. Next, we confirm our findings empirically by re-training an existing model for scalable human-machine coding. In our experiments we show the trade-off between the human and machine sides of such a scalable model, and discuss the benefit of using deeper layers for training in that regard.
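As a brief illustration of the inequality the abstract invokes, the textbook data processing inequality can be written as follows. The symbols X, Y_1, and Y_2 are notation assumed here, not taken from the paper, and this is the general statement rather than a reproduction of the authors' proof.

```latex
% Assumed notation: X is the input image, Y_1 = f_1(X) are the features
% at a shallower layer, and Y_2 = f_2(Y_1) are the features at a deeper layer.
X \to Y_1 \to Y_2
\quad\Longrightarrow\quad
I(X; Y_2) \le I(X; Y_1)
```

Read informally, the deeper features cannot carry more information about the input than the shallower features they are computed from, which is consistent with the abstract's claim that matching deeper layers is preferable in the rate-distortion sense.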
In recent years, there has been a notable surge in the use of industrial image processing applications across various sectors, including the automotive, medical, and space industries. These applications rely on specialized camera systems and advanced image processing techniques to accurately measure workpieces with precise tolerances. This research presents a novel fast algorithm for measuring the diameter of a ring, employing a subpixel counting method. The algorithm classifies image pixels into two categories: full pixels and transition pixels. Full pixels reside entirely within the inner region of the workpiece, while transition pixels are gray pixels that lie on the boundary between the workpiece and its background. To ensure accurate determination of the object area, the proposed method incorporates normalization to account for the contribution of transition pixels alongside full pixels. Subsequently, the circle area equation is employed to calculate the diameter. Moreover, a robust threshold selection method is introduced to effectively distinguish pixels with gray intensities. The experimental setup consists of an industrial camera equipped with telecentric lenses and appropriate illumination. The results demonstrate that the proposed algorithm achieves a 3-10% improvement in accuracy compared to existing approaches. In terms of measuring sensitivity, the operational sensitivity of the proposed methodology is quantified as 1/20th of the pixel size, exhibiting an average uncertainty of 1 μm. Furthermore, the proposed method outperforms existing approaches by at least 12.5% and up to 35% in benchmarked computing time.
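The area-based idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes a filled disc rather than a ring, assumes the pure background and pure workpiece gray levels are already known (the paper's threshold selection method is not reproduced), and the names `diameter_from_subpixel_area` and `render_disc` are hypothetical.

```python
import math


def diameter_from_subpixel_area(image, background, foreground):
    """Estimate a disc diameter (in pixels) from a grayscale image.

    image: list of rows of gray values.
    background, foreground: assumed gray levels of a pure background
    pixel and a pure workpiece ("full") pixel.
    """
    span = float(foreground - background)
    area = 0.0
    for row in image:
        for value in row:
            # Normalized coverage: 0 for background, 1 for a full pixel,
            # a fraction in between for a transition pixel.
            coverage = (value - background) / span
            area += min(1.0, max(0.0, coverage))
    # Invert the circle area equation A = pi * (d / 2) ** 2.
    return 2.0 * math.sqrt(area / math.pi)


def render_disc(size, cx, cy, radius, samples=8):
    """Synthetic image whose gray value is the covered fraction of each pixel."""
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = 0
            for sy in range(samples):
                for sx in range(samples):
                    px = x + (sx + 0.5) / samples
                    py = y + (sy + 0.5) / samples
                    if (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2:
                        inside += 1
            row.append(255.0 * inside / samples ** 2)
        image.append(row)
    return image


if __name__ == "__main__":
    synthetic = render_disc(32, 16.2, 15.7, 10.3)
    print(diameter_from_subpixel_area(synthetic, 0.0, 255.0))
```

On the synthetic disc the estimate should land close to the true diameter of 20.6 pixels; converting to physical units would additionally require the pixel size of the camera and lens, which is not given here.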
The task of image style transfer is to automatically redraw (using neural networks) an image with some content (for example, a family photo) in the style set by another image (for example, a van Gogh painting), which ...
This study developed an end-to-end procedure to overcome common issues faced during the analysis of passive infrared thermography (IRT) image sequences from outdoor concrete infrastructures. The processing pipeline in...
ISBN: 9781577358800 (print)
Autoregressive language modeling (ALM) has been successfully used in self-supervised pre-training in natural language processing (NLP). However, this paradigm has not achieved results comparable with other self-supervised approaches in computer vision (e.g., contrastive learning, masked image modeling). In this paper, we try to find the reason why autoregressive modeling does not work well on vision tasks. To tackle this problem, we fully analyze the limitations of visual autoregressive methods and propose a novel stochastic autoregressive image modeling method (named SAIM) with two simple designs. First, we serialize the image into patches. Second, we employ a stochastic permutation strategy to generate an effective and robust image context, which is critical for vision tasks. To realize this, we create a parallel encoder-decoder training process in which the encoder serves a similar role to the standard vision transformer, focusing on learning the whole contextual information, while the decoder predicts the content of the current position, so that the encoder and decoder can reinforce each other. Our method significantly improves the performance of autoregressive image modeling and achieves the best accuracy (83.9%) on the vanilla ViT-Base model among methods using only ImageNet-1K data. Transfer performance in downstream tasks also shows that our model achieves competitive performance. Code is available at https://***/qiy20/SAIM.
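The two designs named in this abstract, serializing the image into patches and sampling a stochastic prediction order, can be illustrated roughly as below. This is a sketch under assumptions and not the released SAIM code: the functions `patchify` and `permutation_mask` are hypothetical, and the visibility rule (a patch may use only patches earlier in the sampled order as context) is one common way to realize permutation-based autoregression.

```python
import random


def patchify(image, patch):
    """Split a 2-D list (H x W) into non-overlapping patch x patch blocks, row-major."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append([row[left:left + patch] for row in image[top:top + patch]])
    return patches


def permutation_mask(num_patches, rng):
    """Sample a random prediction order and the context mask it implies.

    mask[i][j] is True when patch j precedes patch i in the sampled order,
    meaning j may serve as context when predicting i (assumed rule).
    """
    order = list(range(num_patches))
    rng.shuffle(order)
    rank = {patch_index: step for step, patch_index in enumerate(order)}
    mask = [[rank[j] < rank[i] for j in range(num_patches)]
            for i in range(num_patches)]
    return order, mask


if __name__ == "__main__":
    toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
    patches = patchify(toy_image, 2)
    order, mask = permutation_mask(len(patches), random.Random(0))
    print(order)
    for row in mask:
        print(row)
```

Resampling the order for every training example means each patch is predicted from many different context subsets over the course of training, which is the intuition behind using a stochastic rather than a fixed raster order.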
Edge detection can benefit many different industries and domains, including computer vision, machine learning, image analysis, remote sensing, thermal imaging, pattern recognition, and medical imaging. The technique o...
Robust, fast, and low-power hardware platforms are desirable for the implementation of real-time machine vision. Here the authors develop a computing-in-sensor network using ferroelectric photosensors with remanent-polarization-controlled photoresponsivities. Nowadays the development of machine vision is oriented toward real-time applications such as autonomous driving. This demands a hardware solution with low latency, high energy efficiency, and good reliability. Here, we demonstrate a robust and self-powered in-sensor computing paradigm with a ferroelectric photosensor network (FE-PS-NET). The FE-PS-NET, constituted by ferroelectric photosensors (FE-PSs) with tunable photoresponsivities, is capable of simultaneously capturing and processing images. In each FE-PS, self-powered photovoltaic responses, modulated by remanent polarization of an epitaxial ferroelectric Pb(Zr0.2Ti0.8)O3 layer, show not only multiple nonvolatile levels but also sign reversibility, enabling the representation of a signed weight in a single device and hence reducing the hardware overhead for network construction. With multiple FE-PSs wired together, the FE-PS-NET acts on its own as an artificial neural network. In situ multiply-accumulate operation between an input image and a stored photoresponsivity matrix is demonstrated in the FE-PS-NET. Moreover, the FE-PS-NET faultlessly performs real-time image processing functions, including binary classification between 'X' and 'T' patterns with 100% accuracy and edge detection for an arrow sign with an F-Measure of 1 (under 365 nm ultraviolet light). This study highlights the great potential of ferroelectric photovoltaics as the hardware basis of real-time machine vision.
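The multiply-accumulate operation between an input image and a stored signed responsivity matrix can be mimicked in software as below. This is a toy sketch only: the 3x3 'X' and 'T' patterns, the weight matrix (taken here as the difference of the two patterns), and the sign-based decision rule are all assumptions for illustration, not the device configuration reported in the paper.

```python
# Hypothetical 3x3 binary light patterns (1 = illuminated pixel).
X_PATTERN = [[1, 0, 1],
             [0, 1, 0],
             [1, 0, 1]]
T_PATTERN = [[1, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]

# Assumed signed responsivities, one per sensor: positive where only 'X'
# is lit, negative where only 'T' is lit, zero where the patterns agree.
WEIGHTS = [[x - t for x, t in zip(x_row, t_row)]
           for x_row, t_row in zip(X_PATTERN, T_PATTERN)]


def photocurrent(image, responsivity):
    """Multiply-accumulate: sum of light intensity times signed responsivity."""
    return sum(pixel * weight
               for image_row, weight_row in zip(image, responsivity)
               for pixel, weight in zip(image_row, weight_row))


def classify(image, responsivity=WEIGHTS):
    """Binary decision from the sign of the summed output (assumed rule)."""
    return "X" if photocurrent(image, responsivity) > 0 else "T"


if __name__ == "__main__":
    for name, pattern in (("X", X_PATTERN), ("T", T_PATTERN)):
        print(name, photocurrent(pattern, WEIGHTS), classify(pattern))
```

Because each weight can be positive or negative, a single value per pixel is enough to separate the two classes, which mirrors the abstract's point that sign-reversible responsivity lets one device represent a signed weight.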