ISBN: (print) 9783031835193; 9783031835209
The integration of artificial intelligence (AI) and unmanned aerial vehicle (UAV) technologies presents a significant advancement in enhancing safety in traffic, workplace, and healthcare environments. This study explores the application of AI-driven computer vision algorithms in UAVs to detect and mitigate risks associated with substance abuse, fatigue, and health impairments. Using image processing techniques such as edge detection and support vector machine (SVM) algorithms, drones are equipped to autonomously monitor and analyze the ocular characteristics and facial expressions of individuals. The research employs a mobile phone camera and Python-based libraries to conduct real-time assessments, providing critical data to medical and industrial professionals. The study demonstrates the potential of drones to enhance safety by checking sobriety and monitoring worker health. The experimental setup includes a detailed workflow for real-time video detection and facial analysis, leveraging pre-trained models and convolutional neural networks. The results confirm the effectiveness of this approach, highlighting significant progress in AI and UAV technology. Future work aims to move these innovations from laboratory conditions to practical, real-world applications, continuously improving the algorithms and expanding their applicability across various safety-critical scenarios.
ISBN: (print) 9781510673878; 9781510673861
Lower resolutions and a lack of distinguishing features in large satellite imagery datasets make identification tasks challenging for traditional image classification models. Vision Transformers (ViT) address these issues by creating deeper spatial relationships between image features. Self-attention mechanisms are applied to better understand not only which features correspond to which classification profile, but also how the features correspond to each other within each category. These models, integral to computer vision machine learning systems, depend on extensive datasets and rigorous training to produce highly accurate yet computationally demanding systems. Deploying such models in the field can present significant challenges on resource-constrained devices. This paper introduces a novel approach to address these constraints by optimizing an efficient Vision Transformer (TinEVit) for real-time satellite image classification that is compatible with ST Microelectronics' AI integration tool, X-Cube-AI.
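To make the self-attention idea in the abstract above concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration only, not the TinEVit architecture: it assumes identity query/key/value projections and a single head, whereas a real Vision Transformer learns those projections and uses several heads. The function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption for illustration: queries, keys and values are the tokens
    themselves (identity projections), single head.
    Returns (outputs, attention_weights).
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        attn = softmax(scores)
        weights.append(attn)
        # Each output is a weighted average of all token vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(attn, tokens)) for i in range(dim)]
        )
    return outputs, weights


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy patch embeddings
    out, attn = self_attention(patches)
    for row in attn:
        print([round(w, 3) for w in row])
```

Each row of the attention matrix sums to 1 and describes how strongly one image patch attends to every other patch, which is the feature-to-feature relationship the abstract refers to.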
ISBN: (digital) 9798350350654
ISBN: (print) 9798350350661; 9798350350654
Hand gesture recognition is an advanced system that identifies hand movements in real-time video for applications such as volume control. The challenge in designing such a system lies in identifying the hand and creating gestures recognizable by a single hand. This technology finds use in various fields, including sign language interpretation. The primary concept involves hand recognition, utilizing the Haar-cascade classifier (HCC) to implement hand motion recognition with OpenCV and Python. The research explores a method for identifying hand gestures based on shape-based feature recognition. The system configuration includes a single camera that captures user gestures and feeds them into the recognition system. The main goal of gesture recognition is to develop a system capable of identifying specific human motions and using them to transmit control data to devices. With real-time gesture recognition, users can control a workstation by making specific gestures in front of the camera. Leveraging the OpenCV module, we create a hand gesture recognition system that allows device control without needing a keyboard or mouse. This approach involves several stages: capturing the hand gesture using a camera, processing the video frame to segment the hand, and recognizing the gesture based on shape features. The HCC is employed for hand recognition due to its efficiency and accuracy in identifying hand regions in real time. The implementation of this system promises a user-friendly and intuitive way of interacting with devices. By eliminating the need for physical input devices, it enhances accessibility and convenience. This research discusses the development of a hand gesture recognition system for volume control, highlighting the techniques used and the potential applications of this technology in improving human-computer interaction. The findings suggest that such systems can significantly enhance the user experience by providing an alternative, non-contact metho
Detecting moving objects from a freely moving camera, such as one mounted on an Unmanned Aerial Vehicle (UAV), is an important and challenging problem. This paper introduces a new MOD-IR method for moving object detection from UAV-captured video sequences. The proposed method consists of four steps: (1) feature extraction and matching, (2) frame registration, (3) moving object detection and (4) moving object detection post-processing. Our method stands out from those in the literature in a number of ways. First, we enhance the method's effectiveness and robustness by handling the constraints of this field through extracting robust features, on the one hand, and automatically defining the optimum threshold, on the other. Second, we propose an efficient method able to deal with real-time applications by extracting keypoint features instead of pixel-to-pixel model estimation, and by simulating the search for the matching features among multiple trees. Finally, we run quick-shift segmentation in parallel with the first three steps in order to enhance and accelerate the moving object detection task. Relying on quantitative and qualitative evaluations of the proposed method on a variety of sequences extracted from several datasets (such as DARPA VIVID-EgTest05, Hopkins 155, UCF Aerial Action, etc.), we assess the performance of our method against state-of-the-art reference methods. Furthermore, the time cost evaluation shows that our MOD-IR method is well suited to real-time applications, owing to its lower computational time compared to the reference methods.
ISBN: (print) 9781728198354
Deep convolutional neural networks have achieved great progress in image denoising tasks. However, their complicated architectures and heavy computational cost hinder their deployment on mobile devices. Some recent efforts in designing lightweight denoising networks focus on reducing either FLOPs (floating-point operations) or the number of parameters. However, these metrics are not directly correlated with on-device latency. In this paper, we identify the real bottlenecks that affect the run-time performance of CNN-based models on mobile devices, namely memory access cost and NPU-incompatible operations, and build the model around them. To further improve denoising performance, the mobile-friendly attention module MFA and the model reparameterization module RepConv are proposed, which offer both low latency and excellent denoising performance. To this end, we propose a mobile-friendly denoising network, namely MFDNet. The experiments show that MFDNet achieves state-of-the-art performance on the real-world denoising benchmarks SIDD and DND under real-time latency on mobile devices. The code and pre-trained models will be released.
With the rapid development of artificial intelligence, human segmentation in video is becoming increasingly important in the field of computer vision. However, existing segmentation models suffer from inaccurate segmentation, slow processing, and large model size, which limit their deployment on resource-constrained devices. To this end, we propose a lightweight model called Efficient Memory Aggregation U-shaped Network (EMAUnet) for human segmentation in video, which is based on a traditional U-shaped network and an attention mechanism. In EMAUnet, memory modules are combined with segmentation modules, enabling end-to-end learning of semantic extraction patterns for human images. Mobile-inverse bottleneck convolution (MBConv), which has relatively few parameters and low computational complexity, is used as the network backbone. Inverted sub-pixel down-sampling (ISP) is proposed to minimize information loss and preserve detail in the segmentation. Coordinate attention (CA) is adopted to precisely locate the portrait area. Moreover, bidirectional memory update (BMU) and memory update trigger (MUT) are proposed to improve memory resource utilization and reduce unnecessary computation. Experimental results show that, compared with the classic model ISNet, EMAUnet increases mIoU, FPS and pixel accuracy by 2.3%, 25.0% and 1.8%, respectively, while reducing the number of parameters and the model size by 33.3% and 45.7%, respectively.
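The abstract above reports results in terms of mIoU and pixel accuracy. As a reminder of what those two metrics measure, here is a minimal sketch in plain Python, assuming segmentation masks flattened into lists of integer class labels. The function names are hypothetical and this is not the authors' evaluation code.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the ground truth."""
    correct = sum(1 for p, t in zip(pred, truth) if p == t)
    return correct / len(truth)


def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes.

    Classes that appear in neither mask are skipped so they do not
    distort the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 2x2 mask, flattened: 1 = person, 0 = background.
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(pixel_accuracy(predicted, reference))  # 0.75
    print(mean_iou(predicted, reference, 2))     # (2/3 + 1/2) / 2
```

In the toy example, one background pixel is mislabeled as person: pixel accuracy stays at 0.75, while the person class IoU drops to 0.5, which shows why mIoU is the stricter of the two measures for small foreground regions.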
In this paper we present a method for fast computation of the matrix transformation used to transform the positions of scene objects between different camera positions, virtual or real. The process finds extensive use in virtual view generation in Free Viewpoint Video (FVV) and virtual reality applications, as well as in depth estimation algorithms. The proposed method relies on a reformulation of the matrix equation used in the process. As a result, the number of necessary arithmetic operations is reduced and some of the calculations can be reused for consecutive pixel position transformations. The presented algorithm produces the same output as an unoptimized algorithm, with an approximately 22% reduction in processing time averaged over the examined representative test sequences.
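The abstract does not give the reformulated equation, so the following is only a sketch of the general idea of reusing calculations across consecutive pixels, not the authors' method. It assumes, purely for illustration, that a pixel (u, v) with depth z maps to the 3D point z * A * [u, v, 1] + b, where the 3x3 matrix A and the vector b are hypothetical combined camera parameters. Under that assumption, moving one pixel to the right only adds the first column of A to the previous ray.

```python
def transform_naive(A, b, depth):
    """Evaluate the full matrix-vector product for every pixel."""
    out = []
    for v, row in enumerate(depth):
        out_row = []
        for u, z in enumerate(row):
            ray = [A[i][0] * u + A[i][1] * v + A[i][2] for i in range(3)]
            out_row.append(tuple(z * ray[i] + b[i] for i in range(3)))
        out.append(out_row)
    return out


def transform_reuse(A, b, depth):
    """Same mapping, reusing the previous pixel's ray along each row."""
    out = []
    for v, row in enumerate(depth):
        # Ray for u = 0; the v-dependent part is computed once per row.
        ray = [A[i][1] * v + A[i][2] for i in range(3)]
        out_row = []
        for z in row:
            out_row.append(tuple(z * ray[i] + b[i] for i in range(3)))
            # Stepping to u + 1 adds the first column of A: 3 additions
            # replace the 6 multiplications of the naive ray computation.
            ray = [ray[i] + A[i][0] for i in range(3)]
        out.append(out_row)
    return out


if __name__ == "__main__":
    A = [[1.0, 0.0, -2.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]]  # hypothetical
    b = [0.5, 0.0, 0.0]                                        # hypothetical
    depth = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(transform_reuse(A, b, depth) == transform_naive(A, b, depth))
```

With floating-point inputs, the running sum in the second function can differ from the naive result by rounding error, so a comparison with a small tolerance is safer than exact equality on real data.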
This research presents a novel computer vision-based attention monitoring system designed for both online and offline contexts. Leveraging advanced image processing and machine learning algorithms, the system analyzes...
In this paper, we propose a segmentation model for colour images using an anisotropic multi-well potential-based nonlinear transient PDE. A channel-wise greyscale classification approach is devised for colour image segmentation. The time evolution of the PDE model is carried out by the implicit-explicit convexity splitting approach. Further, we consider the fractional version of the time-discretised model by replacing the Laplacian with its fractional counterpart. The spatial terms are approximated by the Fourier basis under the pseudo-spectral method. The convergence and the stability of the numerical scheme are elaborated. Both models (fractional and non-fractional) are tested on some synthetic images and a few real-world standard test images. The results on synthetic images are compared with those from the literature using the Dice similarity index, the Jaccard similarity index and the BF score. The method is then successfully applied to several medical images to classify them.
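For readers unfamiliar with two of the comparison metrics named in the abstract above, here is a minimal sketch of the Dice and Jaccard similarity indices for binary masks, written in plain Python. It assumes masks flattened into lists of 0/1 values, and the function names are hypothetical. The BF (boundary F1) score is omitted because it needs boundary extraction and a distance tolerance.

```python
def dice_index(a, b):
    """Dice similarity: 2 * |A and B| / (|A| + |B|) for binary masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Two empty masks are treated as a perfect match.
    return 2 * inter / size if size else 1.0


def jaccard_index(a, b):
    """Jaccard similarity: |A and B| / |A or B| for binary masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


if __name__ == "__main__":
    segmented = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    d = dice_index(segmented, reference)
    j = jaccard_index(segmented, reference)
    print(d, j)
    # The two indices are related by J = D / (2 - D).
    print(abs(j - d / (2 - d)) < 1e-12)
```

Because the two indices are monotonically related, they always rank segmentations in the same order; reporting both mainly helps comparison with prior work that used one or the other.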
Image quality is degraded in bad weather conditions such as haze or fog. This problem can affect image processing applications such as computer vision, security, and some other real-time image processing systems. ...