Facial expression is an integral aspect of human communication, and facial emotion recognition (FER) has therefore become the basis for many machine vision applications. Many deep learning based FER models have been developed and have shown good results on emotion recognition. However, deep learning based FER still suffers from varying illumination conditions and from noise around the face, such as hair, background, and other ambient conditions. To mitigate these issues and improve FER performance, we propose the Enhanced Face Localization augmented Light Convolution Neural Network (EFL-LCNN). EFL-LCNN combines three-phase pre-processing with Light CNN, a trimmed VGG16 model. The three phases are face detection, enhanced face region cropping to remove ambient noise, and image enhancement using CLAHE to address illumination problems. The pre-processed images are then passed to Light CNN, which improves FER performance at reduced complexity. EFL-LCNN is rigorously tested on four publicly available benchmark datasets: JAFFE, CK, MUG and KDEF. The empirical results show that EFL-LCNN improves recognition accuracy significantly compared with the state of the art.
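The illumination step above relies on CLAHE. As a rough illustration of the contrast-limiting idea only, the following sketch applies a single, global clip-limited histogram equalization to a grayscale image stored as a list of rows. It is not the authors' implementation: real CLAHE works per tile and interpolates between tiles, and the function name `clip_limited_equalize` and the default `clip_limit` are assumptions made for this example.

```python
def clip_limited_equalize(image, clip_limit=4.0, levels=256):
    """Global clip-limited histogram equalization (simplified, untiled CLAHE idea).

    `image` is a list of rows of ints in [0, levels - 1].
    `clip_limit` is a hypothetical parameter: the maximum bin height,
    expressed as a multiple of the mean bin height.
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)

    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip every bin at the ceiling and spread the clipped excess uniformly
    # over all bins; this is what limits contrast amplification.
    ceiling = clip_limit * n / levels
    excess = sum(max(h - ceiling, 0.0) for h in hist)
    clipped = [min(h, ceiling) + excess / levels for h in hist]

    # The cumulative distribution of the clipped histogram gives the lookup table.
    lut, running = [], 0.0
    for h in clipped:
        running += h
        lut.append(min(levels - 1, round((levels - 1) * running / n)))

    return [[lut[p] for p in row] for row in image]
```

A very large `clip_limit` disables clipping and reduces the function to plain histogram equalization; smaller values keep the mapping closer to the identity and so amplify noise less.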
Segmentation is a long-established area of research with diverse dimensions. Image segmentation and its associated techniques have supported computer vision, pattern recognition, and image processing, and they have varied applications in crucial domains. This study compiles the vast literature on machine learning and deep learning-based segmentation techniques and offers a statistical, comprehensive, semi-automated, and application-specific analysis that could contribute to ongoing research. 16,674 studies were filtered from a pool of 22,088 studies collected by executing a search string on the Scopus database. These studies are analyzed for their meta-data and comprehensive content, and reviewed to identify key research areas using a topic modeling-based method (LDA). The role of segmentation in mathematical expression recognition is also examined. IEEE features prominently as a renowned publisher, a reputed journal (IEEE Access), and the most cited affiliation (#10,472). Three of the five topic solutions extracted by the LDA model point to active research areas in image segmentation. Medical image processing, machine vision, and object identification are the most prominent domains in this context. The comprehensive analysis identifies neural network-based approaches as a trend. The investigation of segmentation techniques for mathematical expressions identifies neural segmentation techniques (CNN, RNN, LSTM) as the predominant techniques and geometrical features as the features in focus. To sum up, the purpose of the current study is to summarize the best available research on image segmentation after synthesizing the results of an assorted set of studies.
Image segmentation plays a critical role in unlocking the mysteries of the universe, providing astronomers with a clearer perspective on celestial objects within complex astronomical images and data cubes. Manual segmentation, while traditional, is not only time-consuming but also susceptible to biases introduced by human intervention. As a result, automated segmentation methods have become essential for achieving robust and consistent results in astronomical studies. This review begins by summarizing traditional and classical segmentation methods widely used in astronomical tasks. Despite the significant improvements these methods have brought to segmentation outcomes, they fail to meet astronomers' expectations and require additional human correction, which makes the segmentation process even more labor-intensive. The review then focuses on the transformative impact of machine learning, particularly deep learning, on segmentation tasks in astronomy. It introduces state-of-the-art machine learning approaches, highlighting their applications and the remarkable advancements they bring to segmentation accuracy in both astronomical images and data cubes. As the field of machine learning continues to evolve rapidly, it is anticipated that astronomers will increasingly leverage these sophisticated techniques to enhance segmentation tasks in their research projects. In essence, this review serves as a comprehensive guide to the evolution of segmentation methods in astronomy, emphasizing the transition from classical approaches to cutting-edge machine learning methodologies. We encourage astronomers to embrace these advancements, fostering a more streamlined and accurate segmentation process that aligns with the ever-expanding frontiers of astronomical exploration.
Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.
Convolutional neural networks (CNNs) have shown great performance in computer vision tasks, from image classification to pattern recognition. However, CNNs' superior performance comes at the expense of high computational costs, which restricts their use in real-time decision-making applications. Computationally intensive convolutions can be offloaded to optical metasurfaces, enabling sub-picosecond latency and nearly zero energy consumption, but the currently reported approaches require additional bulk optics and can only process polarized light, which limits their practical use in integrated lightweight systems. To solve these challenges, a novel design of a metasurface-based optical convolutional accelerator is experimentally demonstrated, offering an ultra-compact volume of 0.016 mm³, a low cross-talk of -20 dB, polarization insensitivity, and the ability to implement multiple convolution operations and simultaneously extract various features from light-encoded images. The ultra-compact metasurface-based optical accelerator can be compactly integrated with a digital imaging system to constitute an optical-electronic hybrid CNN, which experimentally achieves a consistent accuracy of 96% in classifying arbitrarily polarized MNIST digits. The proposed ultra-compact metasurface-based optical convolutional accelerator paves the way for power-efficient edge-computing platforms for a range of machine vision applications. This work experimentally demonstrates a novel design of a metasurface-based optical convolutional accelerator offering an ultra-compact volume of 0.016 mm³, a low cross-talk of -20 dB, polarization insensitivity, and the ability to implement multiple convolution operations and simultaneously extract various features from light-encoded images.
The metasurface-based optical convolutional accelerator paves the way for power-efficient edge-computing platforms for machine vis
ISBN: (Print) 9798350363272; 9798350363265
As a key factor in the milling process, the wear status of the milling cutter has a significant impact on the machining quality of the workpiece. To detect wear on a milling machine efficiently and precisely, this paper presents the development of a milling machine wear detection system based on machine vision and digital image processing. The system, which comprises link mechanisms and an industrial camera, is designed for auxiliary localization and for collecting on-machine images of the milling cutter status. An image preprocessing method based on automatic threshold segmentation and the Canny edge detection operator is proposed to identify the edge of the cutter wear. A maximum connected domain algorithm is used to screen the wear area of the milling cutter, and the amount of wear is obtained with a calibrated scaling method. Experimental results show that the proposed system is suitable for industrial use because of its rapid detection speed and high recognition accuracy, both of which are desirable for engineering applications.
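The screening and scaling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the binary mask is assumed to come from the thresholding and edge detection stage, regions are taken to be 4-connected, the amount of wear is assumed to be an area, and the names `largest_connected_region`, `wear_area_mm2`, and `mm_per_pixel` are hypothetical.

```python
from collections import deque


def largest_connected_region(mask):
    """Return the (row, col) pixels of the largest 4-connected region of 1s."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited foreground pixel.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best


def wear_area_mm2(region, mm_per_pixel):
    """Convert a pixel count to mm^2 using a calibrated scale (assumed isotropic)."""
    return len(region) * mm_per_pixel ** 2
```

Keeping only the largest region discards small isolated blobs, which is one plausible reading of "screening" the wear area; the scale factor `mm_per_pixel` would come from imaging a calibration target of known size.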
The use of high-altitude remote sensing (RS) data from aerial and satellite platforms presents considerable challenges for agricultural monitoring and crop yield estimation because of noise caused by atmospheric interference, sensor anomalies, and outlier pixel values. This paper introduces a "Quartile Clean Image" pre-processing technique that addresses these data issues by analyzing quartile pixel values in local neighborhoods to identify and adjust outliers. Applying this technique to 20,946 Moderate Resolution Imaging Spectroradiometer (MODIS) images from 2002 to 2015 improved the mean peak signal-to-noise ratio (PSNR) to 40.91 dB. Integrating Quartile Clean data with Convolutional Neural Network (CNN) models using exponential decay learning rate scheduling achieved RMSE improvements of up to 5.88% for soybeans and 21.85% for corn, while Long Short-Term Memory (LSTM) models demonstrated RMSE reductions of up to 11.52% for soybeans and 29.92% for corn using exponential decay learning rates. To compare the proposed method with a state-of-the-art technique, we introduce the Vision Transformer (ViT) model for crop yield estimation. The ViT model, applied to the same dataset, achieves remarkable performance without explicit pre-processing, with R² scores ranging from 0.9752 to 0.9875 for soybean and 0.9540 to 0.9888 for corn yield estimation. The RMSE values range from 7.75086 to 9.76838 for soybean and 26.25265 to 34.20382 for corn, demonstrating the ViT model's robustness. This research contributes by (1) introducing the Quartile Clean Image method for enhancing RS data quality and improving crop yield estimation accuracy, and (2) comparing it with the state-of-the-art ViT model. The results demonstrate the effectiveness of the proposed approach and highlight the potential of the ViT model for crop yield estimation, representing a valuable advancement in processing high-altitude imagery for precision agriculture applications. Novel Quartile Clean Image technique i
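The abstract does not give the exact outlier rule, so the following is only a sketch of what a quartile-based local cleaning step and a PSNR check could look like. It assumes Tukey-style fences (Q1 - k·IQR, Q3 + k·IQR) computed over each pixel's 3×3 neighbours and replaces flagged pixels with the neighbourhood median; the function names and the parameter `k` are hypothetical and are not taken from the paper.

```python
import math
from statistics import median, quantiles


def quartile_clean(image, k=1.5):
    """Replace pixels outside quartile fences of their 3x3 neighbours (assumed rule).

    `image` is a list of rows of numbers with at least 2 rows and 2 columns.
    """
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            # Neighbours are read from the original image, excluding the pixel itself.
            neighbours = [
                image[y][x]
                for y in range(max(r - 1, 0), min(r + 2, rows))
                for x in range(max(c - 1, 0), min(c + 2, cols))
                if (y, x) != (r, c)
            ]
            q1, _, q3 = quantiles(neighbours, n=4)
            spread = k * (q3 - q1)
            if not (q1 - spread <= image[r][c] <= q3 + spread):
                cleaned[r][c] = median(neighbours)
    return cleaned


def psnr(image, reference, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(image, reference)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Note that in perfectly flat neighbourhoods the interquartile range is zero, so this rule flags any deviation at all; a practical version would probably add a minimum tolerance.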
We consider a variational approach to the problem of structure + texture decomposition (also known as cartoon + texture decomposition). As usual for many variational problems in image analysis and processing, the energy we minimize consists of two terms: a data-fitting term and a regularization term. The main feature of our approach consists of choosing parameters in the regularization term adaptively. Namely, the regularization term is given by a weighted p(·)-Dirichlet-based energy $\int a(x)\,|\nabla u|^{p(x)}$, where the weight and exponent functions are determined from an analysis of the spectral content of the image curvature. Our numerical experiments, both qualitative and quantitative, suggest that the proposed approach delivers better results than state-of-the-art methods for extracting the structure from textured and mosaic images, as well as competitive results on image enhancement problems.
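For orientation, a two-term energy of the kind described above can be written out as follows. Only the regularization term is stated in the abstract; the quadratic form of the data-fitting term, the input image $f$, the balance parameter $\lambda$, the image domain $\Omega$, and the texture component $v$ are assumptions introduced here for illustration.

```latex
E(u) \;=\;
\underbrace{\int_\Omega a(x)\,\lvert\nabla u(x)\rvert^{p(x)}\,dx}_{\text{regularization}}
\;+\;
\underbrace{\frac{\lambda}{2}\int_\Omega \bigl(u(x)-f(x)\bigr)^{2}\,dx}_{\text{data fitting (assumed quadratic)}},
\qquad v = f - u .
```

Under these assumptions, the minimizer $u$ is the structure (cartoon) component and $v$ is the texture residual. An exponent $p(x)$ near 1 favours edge-preserving, total-variation-like smoothing, while $p(x)$ near 2 gives stronger, isotropic smoothing.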
The human retina is able to extract key feature information from a large amount of redundant visual information, which is the basis for efficient information processing in the human visual system. However, current retina-inspired photonic synaptic devices lack fast noise filtering capabilities, limiting the speed of image preprocessing in neuromorphic visual systems. Here, a photonic synaptic transistor (PST) based on a graphene/organic heterojunction that exhibits high photosensitivity and optically tunable synaptic characteristics from visible to near-infrared (488-1310 nm) is demonstrated. The PST enables light-intensity-controlled switching between memory-free and long-memory modes, which allows fast image noise filtering in a PST-based vision sensor (processing times as low as 30 ms). In addition, image recognition is demonstrated in an artificial neural network connected by the PST, and it is shown that the efficiency and accuracy of image recognition can be significantly improved by performing image noise filtering at the front end. This work offers the potential to improve the information processing speed of bio-inspired neuromorphic visual systems and contributes to the development of machine vision applications. Here, a photonic synaptic transistor (PST) based on a vertically stacked graphene/PEIE/BHJ configuration with high photosensitivity and optically tunable synaptic characteristics from visible to near-infrared is demonstrated. The PST exhibits novel light-intensity-controlled memory-free and long-memory switching characteristics for fast image noise filtering, which provide the potential for the development of bio-inspired neuromorphic visual systems in the future.
A technique and an algorithm for digital surface image processing are proposed to increase the validity of real-time detection of small defects. The algorithm is implemented in the MATLAB programming environment. The technique is based on segmentation of the high-frequency component of surface texture, because small defects are especially pronounced in this component. The high-frequency component, in particular roughness, is extracted by means of a wavelet transform, which separates the frequency components, and homomorphic filtering, which compensates for low-frequency distortion caused by nonuniform illumination of the test surface. Segmentation of the high-frequency texture component consists of forming a binary image using texture descriptors derived from the gray-level co-occurrence matrix as the segmentation threshold. The proposed technique and algorithm are validated in applications to defect detection on a simulated surface, on a real ground surface of hardened steel, and on surfaces of carbon fiber reinforced plastic composite. The efficiency of extracting the high-frequency component of surface texture is demonstrated. It is found that the texture descriptors "contrast" and "energy" can be applied as segmentation thresholds for defect extraction/determination on the ground (anisotropic) surface, while segmentation of an image of a plastic composite (isotropic) surface is effective only with "energy" as a threshold. The proposed technique can be applied to simultaneous real-time monitoring of surface texture and detection of small defects in machine vision systems during production and operation of tribosystems.
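The two descriptors named above can be computed from a gray-level co-occurrence matrix (GLCM) as in the sketch below. It is written in Python rather than MATLAB and is not the authors' algorithm: the single pixel offset, the quantization into `levels` gray levels, and the function names are assumptions. Energy is taken here as the sum of squared GLCM entries (the angular second moment); some libraries report its square root instead.

```python
def glcm(image, levels, dy=0, dx=1):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dy, dx).

    `image` is a list of rows of ints in [0, levels - 1]; the image must
    contain at least one valid pixel pair for the chosen offset.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def contrast(p):
    """Sum of (i - j)^2 * p[i][j]: large when neighbouring gray levels differ."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    """Sum of squared entries: close to 1 for uniform texture, small for busy texture."""
    return sum(v * v for row in p for v in row)
```

For example, a uniform patch yields contrast 0 and energy 1, while a two-level checkerboard with a horizontal offset yields contrast 1 and energy 0.5, which is why either value can serve as a threshold that separates smooth regions from defect-like texture.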