Moving Object Segmentation (MOS) is a fundamental task in computer vision. Due to undesirable variations in the background scene, MOS becomes very challenging for static and moving camera sequences. Several deep learning methods have been proposed for MOS with impressive performance. However, these methods show performance degradation in the presence of unseen videos, and deep learning models usually require large amounts of data to avoid overfitting. Recently, graph learning has attracted significant attention in many computer vision applications since it provides tools to exploit the geometrical structure of data. In this work, concepts of graph signal processing are introduced for MOS. First, we propose a new algorithm that is composed of segmentation, background initialization, graph construction, unseen sampling, and a semi-supervised learning method inspired by the theory of recovery of graph signals. Second, theoretical developments are introduced, showing one bound for the sample complexity in semi-supervised learning and two bounds for the condition number of the Sobolev norm. Our algorithm has the advantage of requiring less labeled data than deep learning methods while achieving competitive results on both static and moving camera videos. Our algorithm is also adapted for Video Object Segmentation (VOS) tasks and is evaluated on six publicly available datasets, outperforming several state-of-the-art methods in challenging conditions.
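The semi-supervised recovery step the abstract mentions can be illustrated with a minimal sketch: given a few labeled graph nodes, the unlabeled signal values are recovered by a least-squares problem regularized with a Sobolev-type norm x^T (L + eps*I)^beta x. The toy graph, sampled nodes, and parameter values below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Toy 4-node path graph: adjacency matrix and combinatorial Laplacian
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

eps, beta, alpha = 0.1, 1, 0.05   # regularization parameters (illustrative)
S = [0, 3]                        # sampled (labeled) nodes
y = np.array([1.0, -1.0])         # labels: node 0 foreground, node 3 background

M = np.zeros((len(S), 4))
M[np.arange(len(S)), S] = 1.0     # sampling matrix selecting labeled nodes

# Closed-form minimizer of ||M x - y||^2 + alpha * x^T (L + eps*I)^beta x
Sob = np.linalg.matrix_power(L + eps * np.eye(4), beta)
x = np.linalg.solve(M.T @ M + alpha * Sob, M.T @ y)

labels = np.sign(x)               # thresholded per-node segmentation labels
print(labels)
```

The recovered signal interpolates the two seed labels smoothly over the graph, so the two nodes nearest the foreground seed receive positive labels.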
Computer vision and biometrics benefit from the recent advances in pattern recognition and artificial intelligence, which tend to make model-based face recognition more efficient. Also, deep learning combined with data augmentation tends to enrich the training sets used for learning tasks. Nevertheless, face recognition is still challenging, especially because of imaging issues that occur in practice, such as changes in lighting, appearance, head posture, and facial expression. In order to increase the reliability of face recognition, we propose a novel supervised appearance-based face recognition method that creates a low-dimensional orthogonal subspace enforcing face class separability. The proposed approach uses data augmentation to mitigate the problem of training sample scarcity. Unlike most face recognition approaches, the proposed approach is capable of efficiently handling grayscale and color face images, as well as low- and high-resolution face images. Moreover, the proposed supervised method presents better class structure preservation than typical unsupervised approaches, and also provides better data preservation than typical supervised approaches, as it obtains an orthogonal discriminating subspace that is not affected by the singularity problem common in such cases. Furthermore, a soft-margin Support Vector Machine classifier is learnt in the low-dimensional subspace and tends to be robust to noise and outliers commonly found in practical face recognition. To validate the proposed method, an extensive set of face identification experiments was conducted on three challenging public face databases, comparing the proposed method with methods representative of the state of the art. The proposed method tends to present higher recognition rates in all databases. In addition, the experiments suggest that data augmentation also plays an essential role in appearance-based face recognition, and that the CIELAB color space (L*a*b) is generally mor...
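The core idea of a supervised discriminant subspace can be sketched with a classical LDA-style projection: find a direction that maximizes between-class scatter relative to within-class scatter. This is a generic illustration of the concept, not the paper's actual algorithm; the synthetic two-class data and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.standard_normal((20, 3))                          # class 0 samples
X1 = rng.standard_normal((20, 3)) + np.array([4.0, 4.0, 0.0])  # class 1, shifted
X = np.vstack([X0, X1])
mu, mu0, mu1 = X.mean(0), X0.mean(0), X1.mean(0)

# Within-class and between-class scatter matrices
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
Sb = 20 * np.outer(mu0 - mu, mu0 - mu) + 20 * np.outer(mu1 - mu, mu1 - mu)

# Leading eigenvector of Sw^-1 Sb gives the discriminant direction
vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
w = np.real(vecs[:, np.argmax(np.real(vals))])

# Projecting onto w separates the two class means
z0, z1 = X0 @ w, X1 @ w
print(abs(z0.mean() - z1.mean()))
```

A classifier (such as the soft-margin SVM the abstract mentions) would then be trained on such low-dimensional projections rather than on raw pixels.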
Graph Neural Networks (GNNs) are neural models that use message passing between graph nodes to represent the dependencies in graphs. Variants of GNNs, such as graph recurrent networks (GRNs), graph attention networks (GATs), and graph convolutional networks (GCNs), have shown remarkable results on a variety of deep learning tasks in recent years. In this study, we offer a generic design pipeline for GNN models, go over the variations of each part, classify the applications in an organized manner, and suggest four outstanding research issues. Dealing with graph data, which provides extensive connection information among elements, is necessary for many learning tasks. A model that learns from graph inputs is required for modelling physics systems, learning molecular fingerprints, predicting protein interfaces, and identifying illnesses. Reasoning on extracted structures (such as the dependency trees of sentences and the scene graphs of photos) is an important research issue that also requires graph reasoning models in other domains, such as learning from non-structural data like texts and images. GNNs are primarily designed for dealing with graph-structured data, where relationships between entities are modeled as edges in a graph. While GNNs are not traditionally applied to image classification problems, researchers have explored ways to leverage graph-based structures to enhance the performance of Convolutional Neural Networks (CNNs) in certain scenarios. GNNs have been increasingly applied to Natural Language Processing (NLP) tasks, leveraging their ability to model structured data and capture relationships between elements in a graph. GNNs are also applied to traffic-related problems, particularly in modeling and optimizing traffic flow, analyzing transportation networks, and addressing congestion issues. GNNs can be used for traffic flow prediction, dynamic routing and navigation, anomaly detection, public transport network...
The Internet of Things, artificial intelligence, machine learning, and big data are just a few of the cutting-edge technologies that are being integrated into manufacturing processes as part of the "Industry 4.0" revolution. Computer vision is an essential component of Industry 4.0 regarding sustainability, developed as a disruptive technology that extracts and interprets visual information from digital photos or videos using image processing techniques and advanced models. In the context of Industry 4.0, this article offers an overview of computer vision, including its associated prospects, difficulties, and applications. A particular emphasis is placed on sustainability. It explores computer vision applications in robotics and automation, safety and security, process optimization, augmented reality, robotics and inspection, object identification and tracking, predictive maintenance, and quality control and inspection. The study also identifies the critical approaches used to overcome the difficulties in implementing computer vision solutions. Incorporating computer vision into Industry 4.0 holds promise for unleashing unprecedented levels of efficiency, novelty, and competitiveness in the industrial sector. The manufacturing and industrial sectors may embrace Industry 4.0's prospects and adopt sustainable practices by utilizing computer vision and overcoming its inherent limits. This will help to create an eco-conscious and efficient future.
Depth estimation and 3D object detection are critical for autonomous systems to gain context of their surroundings. In recent times, compute capacity has improved tremendously, enabling computer vision and AI on the e...
Machine vision systems play vital roles in industrial applications in order to maintain quality and control processes. Machine vision technology has numerous applications in various industries, like the automotive ind...
ISBN:
(print) 9783031821523; 9783031821530
Identifying and locating objects in images and videos, including elements like traffic signs, vehicles, buildings, and people, constitutes a fundamental and demanding task in computer vision, known as object detection. Due to the high computational complexity of this technique and the large amount of data carried by the video signal, it is nearly impossible for ordinary general-purpose processors (GPPs) or CPUs to run these techniques in real time, especially in embedded systems applications. Therefore, special hardware that can acquire, control, or execute in parallel is required. These specialized hardware systems include Digital Signal Processors (DSPs), Field Programmable Gate Arrays (FPGAs), Vision Processing Units (VPUs), Tensor Processing Units (TPUs), Neural Processing Units (NPUs), and Graphics Processing Units (GPUs). This work presents the benefits of accelerating traditional object detection methods on a high-end embedded system, the Jetson Nano Developer Kit. This single-board computer is equipped with the Tegra K1 System on Chip (SoC), which is composed of a quad-core ARM A15 and a 192-core Kepler embedded GPU. Computing acceleration was achieved using the CUDA OpenCV library for both the Histogram of Oriented Gradients (HOG) and the Haar Cascade Classifier. For VGA resolution, results reveal that the GPU implementation on this embedded system is 1.4x faster than the CPU for the HOG method and 2x faster for the Haar Cascade Classifier method.
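The HOG descriptor being accelerated above reduces, at its core, to a magnitude-weighted histogram of gradient orientations per image cell. The sketch below shows only that underlying computation in plain NumPy for a single 8x8 cell with random data; the work in the text uses OpenCV's CUDA implementation, not this code.

```python
import numpy as np

rng = np.random.default_rng(0)
cell = rng.random((8, 8))                    # one grayscale image cell

gx = np.gradient(cell, axis=1)               # horizontal gradient
gy = np.gradient(cell, axis=0)               # vertical gradient
mag = np.hypot(gx, gy)                       # gradient magnitude
ang = np.degrees(np.arctan2(gy, gx)) % 180   # unsigned orientation, [0, 180)

# 9-bin orientation histogram weighted by gradient magnitude
hist, _ = np.histogram(ang, bins=9, range=(0, 180), weights=mag)
hist /= np.linalg.norm(hist) + 1e-6          # block-style L2 normalization

print(hist.shape)                            # 9-dimensional cell descriptor
```

A full detector concatenates such normalized cell histograms over a sliding window and feeds them to a linear classifier; GPUs parallelize this per-cell work, which is where the reported 1.4x speedup comes from.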
Actions speak more than words. In the context of the above statement, the importance of gestures and using them to control a system has become popular. The hand gesture recognition system for opening applications in W...
The recognition of facial emotions has received growing focus in recent years due to its importance and the significant role it plays in shaping the way humans interact with computers. This can be achieved using deep ...
ISBN:
(print) 9783031667428; 9783031667435
Body movements are an essential part of non-verbal communication, as they help to express and interpret human emotions. The potential of Body Emotion Recognition (BER) is immense, as it can provide insights into user preferences, automate real-time exchanges, and enable machines to respond to human emotions. BER finds applications in customer service, healthcare, entertainment, emotion-aware robots, and other areas. While facial-expression-based techniques are extensively researched, detecting emotions from body movements in the real world presents several challenges, including variations in body posture, occlusions, and background. Recent research has established the efficacy of transformer deep-learning models beyond the language domain for solving video- and image-related problems. A key component of transformers is the self-attention mechanism, which captures relationships among features across different spatial locations, allowing contextual information extraction. In this study, we aim to understand the role of body movements in emotion expression and to explore the use of transformer networks for body emotion recognition. Our method proposes a novel linear projection function for the visual transformer, which enables the transformation of 2D joint coordinates into a conventional matrix representation. Using an original method of contextual information learning, the developed approach enables more accurate recognition of emotions by establishing unique correlations between an individual's body motions over time. Our results demonstrate that the self-attention mechanism achieves high accuracy in predicting emotions from body movements, surpassing the performance of other recent deep-learning methods. In addition, the impact of dataset size and frame rate on classification performance is analyzed.
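The self-attention mechanism described above can be sketched as scaled dot-product attention over a short sequence of joint-coordinate embeddings: each time step attends to every other, producing a context-weighted representation. The sequence length, embedding size, and random weights below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over rows of X (one row per time step)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over positions
    return weights @ V                             # context-weighted values

rng = np.random.default_rng(1)
T, d = 5, 4                      # 5 time steps, 4-dim joint embedding (toy sizes)
X = rng.standard_normal((T, d))  # stand-in for projected 2D joint coordinates
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                 # same sequence length, contextualized features
```

In a transformer for BER, many such heads and layers are stacked, and the attention weights learn which body motions at different times are correlated with an emotion.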