During machine visual perception, the optical signal from a scene is transferred into the electronic domain by detectors in the form of image data, which are then processed to extract visual information. In noisy environments, however, such as thermal imaging systems, neural performance faces a significant bottleneck due to the inherent degradation of data quality upon noisy detection. Here, we propose a concept of optical signal processing before detection to address this issue. We demonstrate that spatially redistributing optical signals through a properly designed linear transformer can enhance the detection noise resilience of visual perception, as benchmarked with MNIST classification. A quantitative analysis of the relationship between signal concentration and noise robustness supports our idea, along with its practical implementation in an incoherent imaging system. This compute-first detection scheme can advance infrared machine vision technologies for industrial and defense applications.
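The signal-concentration argument can be illustrated numerically: for a fixed per-pixel detector read noise, redistributing the same optical energy into fewer pixels raises the peak-pixel SNR. A minimal numpy sketch (the energy values, pixel count, and additive-noise model are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def peak_pixel_snr(signal, read_noise_sigma=1.0):
    """Peak-pixel SNR for a detector with additive read noise of fixed sigma."""
    return float(np.max(signal) ** 2 / read_noise_sigma ** 2)

total_energy = 100.0   # same total optical energy in both cases (arbitrary units)
n_pixels = 64

# Case 1: energy spread uniformly over all pixels.
spread = np.full(n_pixels, np.sqrt(total_energy / n_pixels))

# Case 2: a linear optical transform concentrates the energy into one pixel.
concentrated = np.zeros(n_pixels)
concentrated[0] = np.sqrt(total_energy)

snr_spread = peak_pixel_snr(spread)              # 100/64, about 1.56
snr_concentrated = peak_pixel_snr(concentrated)  # 100.0
```

Because the read noise is fixed per pixel while the signal power scales quadratically with per-pixel amplitude, concentration improves the noise margin available to any downstream classifier.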
Hand gesture recognition (HGR) is a convenient and natural form of human-computer interaction, suitable for a wide range of applications. Much research has already focused on wearable-device-based HGR. By contrast, this paper gives an overview of device-free HGR, i.e., systems that do not require the user to wear something like a data glove or hold a device. HGR systems are explored with regard to technology, hardware, and algorithms. We demonstrate how timing and power requirements are interconnected with hardware, pre-processing algorithms, classification, and technology, and how these permit more or less granularity, accuracy, and number of gestures. The sensor modalities evaluated are Wi-Fi, vision, radar, mobile networks, and ultrasound. The pre-processing techniques explored are stereo vision, multiple-input multiple-output (MIMO), spectrograms, phased arrays, range-Doppler maps, range-angle maps, Doppler-angle maps, and multilateration. Classification approaches with and without ML are studied. Among those with ML, the assessed algorithms range from simple tree structures to transformers. All applications are evaluated with their level of integration in mind: whether the application is suitable for edge integration, its real-time capability, whether continuous learning is implemented, the robustness achieved, whether ML is applied, and the accuracy level. Our survey aims to provide a thorough understanding of the current state of the art in device-free HGR, on edge devices and in general. Finally, on the basis of present-day challenges and opportunities in this field, we outline the further research we suggest for HGR improvement. Our goal is to promote the development of efficient and accurate gesture recognition systems.
Optimizers play an important role in enhancing the performance of a deep network. A study of different optimizers is necessary to understand their effect on the performance of a deep network for a given target task, such as image classification. Several attempts have been made to investigate the effect of optimizers on the performance of CNNs. However, such experiments have not been carried out on vision transformers (ViT), despite the recent success of ViT in various image-processing tasks. In this paper, we conduct exhaustive experiments with ViT using different optimizers. In our experiments, we found that weight decoupling and weight decay in optimizers play important roles in training ViT. We focused on the concept of weight decoupling and tried different variations of it to investigate to what extent weight decoupling is beneficial for a ViT. We propose two techniques that provide better results than weight-decoupled optimizers: (i) the weight decoupling step in optimizers involves a linear update of the parameter with weight decay as the scaling factor; we propose a quadratic update of the parameter, which uses a linear as well as a squared parameter update with the weight decay as the scaling factor. (ii) We propose using different weight decay values for different parameters depending on the gradient value of the loss function with respect to that parameter: a smaller weight decay is used for parameters with a higher gradient value, and vice versa. Image classification experiments are conducted on the CIFAR-100 and TinyImageNet datasets to compare the proposed methods with state-of-the-art optimizers such as Adam, RAdam, and AdaBelief. The code is available at https://***/Hemanth-Boyapati/Adaptive-weight-decay-optimizers.
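The two proposed modifications can be sketched on top of a standard AdamW-style decoupled update. The abstract describes them only in prose, so the exact formulas below (squared term via |θ|·θ, and the 1/(1+|g|) decay schedule) are one plausible reading, not the authors' code:

```python
import numpy as np

def adamw_variant_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                       eps=1e-8, base_wd=1e-2):
    """One AdamW-style step with (i) a decoupled decay using linear plus
    squared-parameter terms and (ii) per-parameter weight decay that shrinks
    where |grad| is large. Both forms are assumptions made for illustration."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                    # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    wd = base_wd / (1.0 + np.abs(grad))             # (ii) adaptive decay (assumed rule)
    decay = wd * theta + wd * np.abs(theta) * theta  # (i) linear + quadratic terms
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + decay)
    return theta, m, v
```

With a zero gradient the step reduces to pure decay, so weight magnitudes shrink; the quadratic term makes large weights decay proportionally faster than in plain AdamW.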
Facial expression is an essential aspect of human communication, and hence facial emotion recognition (FER) has become the basis for many machine vision applications. Many deep learning based FER models have been developed and have shown good results on emotion recognition. However, FER using deep learning still suffers from illumination conditions, noise around the face such as hair and background, and other ambience conditions. To mitigate such issues and improve the performance of FER, we propose the Enhanced Face Localization augmented Light Convolutional Neural Network (EFL-LCNN). EFL-LCNN incorporates three-phase pre-processing and a Light CNN, a trimmed VGG16 model. The three-phase pre-processing includes face detection, enhanced face-region cropping for ambience noise removal, and image enhancement using CLAHE to address illumination problems. It is followed by the Light CNN, which improves FER performance at reduced complexity. The EFL-LCNN is rigorously tested on four publicly available benchmark datasets: JAFFE, CK, MUG, and KDEF. The empirical results show that the EFL-LCNN boosts recognition accuracies significantly compared with the state of the art.
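The CLAHE step targets illumination variation by stretching local contrast. As a self-contained illustration, here is plain global histogram equalization, the core operation that CLAHE extends with tiling and a clip limit (the paper uses CLAHE itself, commonly available via OpenCV; this numpy stand-in is only a sketch of the idea):

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization of an 8-bit grayscale image: map each
    gray level through the normalized cumulative histogram. Assumes the
    image is not perfectly constant."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)         # gray-level lookup table
    return lut[img]
```

A low-contrast face crop (gray levels packed into a narrow band) comes out spanning a much wider range, which is what makes the downstream CNN less sensitive to lighting.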
The automatic generation of meaningful, detailed textual descriptions for supplied images is a difficult task in the fields of computer vision and natural language processing, so an AI-powered image caption generator can be incredibly useful for producing captions. In this study, we present a method for generating image captions using an attention mechanism that concentrates on pertinent areas of the image while it creates captions. Our model, which uses deep neural networks to extract image features and produce captions, obtains state-of-the-art results on benchmark datasets, confirming the effectiveness of the attention mechanism in raising the quality of the generated captions. We also offer a thorough evaluation of the performance of our approach and discuss potential future directions for enhancing image caption generation.
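The attention mechanism at the heart of such caption generators reduces to a few lines: region features are scored against the decoder state, softmax-normalized, and combined into a context vector. A minimal numpy sketch (the dot-product scoring and the feature shapes are illustrative assumptions; real models use learned score functions):

```python
import numpy as np

def attention_context(features, query):
    """Soft attention over image regions.
    features: (R, D) array of R region features; query: (D,) decoder state.
    Returns the attention-weighted context vector of shape (D,)."""
    scores = features @ query                   # one relevance score per region
    weights = np.exp(scores - scores.max())     # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ features                   # convex combination of regions
```

At each decoding step the decoder supplies a new query, so the context vector shifts toward the image regions most relevant to the next word.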
Segmentation has long been a rooted area of research with diverse dimensions. Image segmentation and its associated techniques have supported computer vision, pattern recognition, and image processing, and hold variegated applications in crucial domains. This study compiles the vast literature on machine learning and deep learning-based segmentation techniques and offers statistical, comprehensive, semi-automated, and application-specific analysis that can contribute to ongoing research. 16,674 studies were filtered out of a pool of 22,088 studies collected by executing a search string on the Scopus database. These studies are analyzed for their meta-data and comprehensive content, and reviewed to identify key research areas using a topic modeling-based method (LDA). The role of segmentation in mathematical expression recognition is also examined. IEEE is a ubiquitous name in terms of renowned publisher, reputed journal (IEEE Access), and most cited affiliation (#10,472). Three of the five topic solutions extracted by the LDA model evidence streaming research areas in image segmentation. Medical image processing, machine vision, and object identification are the accentuated domains in this context. The comprehensive analysis puts forth neural network-based approaches as a trend. An inquiry into segmentation techniques for mathematical expressions identifies neural-based segmentation techniques (CNN, RNN, LSTM) as preeminent and geometrical features as the focused features of the process. To sum up, the purpose of the current study is to summarize the best available research on image segmentation after synthesizing the results of an assorted set of studies.
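The LDA topic-extraction step of such a survey can be reproduced with scikit-learn: build a document-term matrix from the abstracts, then fit Latent Dirichlet Allocation with the desired number of topics. The toy corpus below stands in for the 16,674 filtered abstracts and is purely illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for the survey's abstract collection (illustrative).
docs = [
    "medical image segmentation cnn network",
    "object identification machine vision camera",
    "lstm rnn segmentation mathematical expression",
    "medical imaging cnn tumor segmentation",
    "machine vision object detection camera",
    "rnn lstm handwriting expression recognition",
]

X = CountVectorizer().fit_transform(docs)           # document-term counts
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topic = lda.fit_transform(X)                    # per-document topic weights
```

Each row of `doc_topic` is a probability distribution over topics, and the top-weighted terms per topic (from `lda.components_`) are what the survey reads off as research areas.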
Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.
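The local neighborhood aggregation that graph Transformers seek to go beyond is itself simple to state: each GNN layer averages a node's neighborhood features and applies a learned map. A minimal GCN-flavored sketch (mean aggregation and ReLU are one common choice among many):

```python
import numpy as np

def gnn_layer(A, X, W):
    """One mean-aggregation message-passing layer: add self-loops, average
    each node's neighborhood features, then apply a linear map and ReLU.
    A: (N, N) adjacency, X: (N, F) node features, W: (F, F') weights."""
    A_hat = A + np.eye(A.shape[0])            # self-loops so a node keeps itself
    deg = A_hat.sum(axis=1, keepdims=True)    # neighborhood sizes
    H = (A_hat / deg) @ X @ W                 # mean-aggregate, then project
    return np.maximum(H, 0.0)                 # ReLU
```

Because information propagates only one hop per layer, many layers are needed for long-range interactions; a graph Transformer instead lets every node attend to every other node in a single layer, which is the limitation-versus-bias trade-off the review discusses.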
ISBN: (print) 9798350363272; 9798350363265
As a key factor in the milling process, the wear status of the milling cutter has a significant impact on the machining quality of the workpiece. To detect wear on a milling machine efficiently and precisely, this paper presents the development of a milling-machine wear detection system based on machine vision and digital image processing. The system, which includes link mechanisms and an industrial camera, is designed for auxiliary localization and collection of on-machine images of the milling cutter status. An image pre-processing method based on automatic threshold segmentation and the Canny edge detection operator is proposed to identify the edge of cutter wear. A maximum-connected-domain algorithm is used to screen the wear area of the milling cutter, and the amount of wear is obtained via a calibrated scaling method. Experimental results show that the proposed system is suitable for industrial use due to its rapid detection speed and high recognition accuracy, which are desirable for engineering applications.
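The core measurement chain — automatic thresholding, largest-connected-region selection, and calibrated pixel-to-area conversion — can be sketched with numpy and scipy. This is a stand-in for the system's pipeline, not its actual code; the `mm_per_px` calibration value is an assumed placeholder:

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(img):
    """Automatic threshold selection (Otsu) on an 8-bit grayscale image:
    pick the gray level maximizing between-class variance."""
    p = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p /= p.sum()
    omega = np.cumsum(p)                    # class-0 probability
    mu = np.cumsum(p * np.arange(256))      # class-0 cumulative mean mass
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(sigma_b))

def largest_region_area_mm2(mask, mm_per_px=0.05):
    """Area of the largest connected component in a binary mask, converted
    with a calibrated pixel scale (mm_per_px is an assumed calibration)."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return 0.0
    areas = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    return float(np.max(areas)) * mm_per_px ** 2
```

Screening by the maximum connected domain rejects small bright speckles so that only the contiguous wear zone contributes to the measured area.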
The use of high-altitude remote sensing (RS) data from aerial and satellite platforms presents considerable challenges for agricultural monitoring and crop yield estimation due to noise caused by atmospheric interference, sensor anomalies, and outlier pixel values. This paper introduces a "Quartile Clean image" pre-processing technique that addresses these data issues by analyzing quartile pixel values in local neighborhoods to identify and adjust outliers. Applying this technique to 20,946 Moderate Resolution Imaging Spectroradiometer (MODIS) images from 2002 to 2015 improved the mean peak signal-to-noise ratio (PSNR) to 40.91 dB. Integrating Quartile Clean data with Convolutional Neural Network (CNN) models using exponential-decay learning-rate scheduling achieved RMSE improvements of up to 5.88% for soybeans and 21.85% for corn, while Long Short-Term Memory (LSTM) models demonstrated RMSE reductions of up to 11.52% for soybeans and 29.92% for corn, also using exponential-decay learning rates. To compare the proposed method with the state of the art, we introduce the Vision Transformer (ViT) model for crop yield estimation. The ViT model, applied to the same dataset, achieves remarkable performance without explicit pre-processing, with R2 scores ranging from 0.9752 to 0.9875 for soybean and 0.9540 to 0.9888 for corn yield estimation. The RMSE values range from 7.75086 to 9.76838 for soybean and 26.25265 to 34.20382 for corn, demonstrating the ViT model's robustness. This research contributes by (1) introducing the Quartile Clean image method for enhancing RS data quality and improving crop yield estimation accuracy, and (2) comparing it with the state-of-the-art ViT model. The results demonstrate the effectiveness of the proposed approach and highlight the potential of the ViT model for crop yield estimation, representing a valuable advancement in processing high-altitude imagery for precision agriculture applications.
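A quartile-based local outlier cleaner of the kind described can be sketched with scipy's rank filters: flag pixels outside the neighborhood's interquartile fence and replace them with the local median. The fence rule (Q1 − 1.5·IQR, Q3 + 1.5·IQR) and the median replacement are one standard reading; the paper's exact rule may differ:

```python
import numpy as np
from scipy import ndimage

def quartile_clean(img, size=3, k=1.5):
    """Replace pixels lying outside [Q1 - k*IQR, Q3 + k*IQR] of their local
    neighborhood with the local median (assumed form of 'Quartile Clean')."""
    q1 = ndimage.percentile_filter(img, 25, size=size)
    q3 = ndimage.percentile_filter(img, 75, size=size)
    med = ndimage.median_filter(img, size=size)
    iqr = q3 - q1
    bad = (img < q1 - k * iqr) | (img > q3 + k * iqr)  # local outliers
    out = img.copy()
    out[bad] = med[bad]
    return out

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB, the paper's quality metric."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

On an image with isolated saturated pixels (e.g. sensor anomalies), the cleaned result scores a higher PSNR against the true scene than the noisy input, which is exactly the improvement the paper reports on MODIS data.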
Facial expression recognition (FER) is a challenging task in image processing for complex real-world scenarios captured under different lighting conditions, facial obstructions, and a diverse range of facial orientations. To address this issue, a novel Twinned attention network (Twinned-Att) is proposed in this paper for efficient FER in occluded images. The proposed Twinned-Att network is designed as two separate modules: a holistic module (HM) and a landmark-centric module (LCM). The holistic module comprises a dual coordinate attention block (Dual-CA) and a cross-convolution block (Cross-conv). The Dual-CA block is essential for learning positional, spatial, and contextual information by highlighting the most prominent characteristics in the facial regions. The Cross-conv block learns spatial inter-dependencies and correlations to identify complex relationships between various facial regions. The LCM emphasizes smaller and distinct local regions while maintaining resilience against occlusions. Vigorous experiments have been undertaken to evaluate the efficacy of the proposed Twinned-Att. The Twinned-Att achieves accuracies of 86.92%, 85.64%, 78.40%, 69.82%, 64.71%, 85.52%, and 85.83% on the RAF DB, FER PLUS, FER 2013, FED RO, SFEW 2.0, occluded RAF DB, and occluded FER Plus datasets, respectively. The proposed Twinned-Att network is tested with various backbone networks, including ResNet-18, ResNet-50, and ResNet-152, where it consistently performs well, highlighting its prowess in addressing the challenges of robust FER in images captured in complex real-world environments.
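The coordinate-attention idea behind the Dual-CA block — encode position by pooling along each spatial axis separately, then gate the feature map — can be shown in a few lines of numpy. The learned convolutions of the real block are omitted, so this is only a structural sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coord_attention(x):
    """Direction-aware reweighting in the spirit of coordinate attention.
    x: feature map of shape (C, H, W). Pool along width and along height to
    get per-row and per-column descriptors, gate with sigmoids, rescale."""
    h_desc = x.mean(axis=2, keepdims=True)    # (C, H, 1): one value per row
    w_desc = x.mean(axis=1, keepdims=True)    # (C, 1, W): one value per column
    gate = sigmoid(h_desc) * sigmoid(w_desc)  # (C, H, W) via broadcasting
    return x * gate
```

Because the gate factorizes into a row term and a column term, each output position is modulated by statistics of its entire row and column, which is how positional context enters without global self-attention.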