ISBN: (print) 9798331529543; 9798331529550
This paper introduces an advanced intra prediction method designed for the Enhanced Compression Model (ECM), the reference software for exploring coding technology beyond the Versatile Video Coding (VVC) standard. It employs a learning-based method to adaptively assign the weights of a weighted average across neighboring samples, resulting in more precise prediction samples. The proposed method derives optimized weights for each intra prediction mode, each block size, and each sample position. To achieve a reasonable balance between encoding time and prediction accuracy, the conventional intra prediction mode is shared with the proposed method. Experimental evaluations demonstrate that the proposed method provides a bitrate reduction of up to 0.4%.
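The weighted-average idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the reference-sample layout, the weight format, or how the weights are learned, so the function name `weighted_intra_predict`, the `weight_table` lookup keyed by (mode, height, width), and the random weights below are all assumptions made for illustration only.

```python
import numpy as np

def weighted_intra_predict(ref_samples, weights):
    """Predict a block as a per-position weighted average of reference samples.

    ref_samples: 1-D array of R reconstructed neighbouring samples.
    weights:     array of shape (H, W, R); hypothetical weights for one
                 intra mode and one block size.
    """
    ref = np.asarray(ref_samples, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    # Normalise so the weights at each sample position sum to 1.
    w = w / w.sum(axis=-1, keepdims=True)
    # Contract the last axis of the weights with the reference samples.
    return w @ ref  # shape (H, W)

# Hypothetical table: one weight tensor per (mode, block height, block width).
rng = np.random.default_rng(0)
weight_table = {(18, 4, 4): rng.random((4, 4, 9))}

ref = np.full(9, 128.0)  # flat neighbourhood
pred = weighted_intra_predict(ref, weight_table[(18, 4, 4)])
```

Because the weights are normalised per position, a flat neighbourhood yields a flat prediction, which is a quick sanity check for any weight table of this shape.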
With advancements in computer vision and artificial intelligence, traffic sign recognition systems have become essential in advanced driver assistance and autonomous driving systems. These systems enable the precise detection of key road information. However, recognizing small traffic signs in real-world scenarios remains a significant challenge due to their limited size and features. In this study, we propose YOLO-TSR, an efficient approach for detecting small traffic signs, inspired by the YOLOv8 framework. This method offers three major contributions: (1) we introduce an efficient attention mechanism in the backbone to enhance focus on small targets; (2) we propose a downsampling process using slicing and reassembling operations in the backbone, which preserves information and improves feature extraction for small objects; (3) we refine the upsampling process in the head by applying the content-aware CARAFE operation, which enhances the model's detection performance. Experiments on the challenging TT100K and CCTSDB2021 datasets show that YOLO-TSR achieves a mAP50 of 72.73% and a mAP50-95 of 56.57% on TT100K, and a mAP50 of 87.86% and a mAP50-95 of 57.78% on CCTSDB2021, surpassing the performance of the original YOLOv8n on both datasets. Additionally, the method runs in real time and demonstrates great potential for applications in advanced driver assistance systems and autonomous driving systems.
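One common way to downsample by slicing and reassembling is a space-to-depth rearrangement, sketched below. The abstract does not specify the exact operation used in YOLO-TSR, so the function name `slice_and_reassemble` and the ordering of the four sub-grids are assumptions; the point is only that every input value survives, unlike strided convolution or pooling.

```python
import numpy as np

def slice_and_reassemble(x):
    """Space-to-depth: (C, H, W) -> (4C, H/2, W/2) without discarding values."""
    c, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Take the four interleaved sub-grids and stack them on the channel axis.
    return np.concatenate(
        [x[:, 0::2, 0::2], x[:, 1::2, 0::2], x[:, 0::2, 1::2], x[:, 1::2, 1::2]],
        axis=0,
    )

x = np.arange(2 * 4 * 4).reshape(2, 4, 4)  # toy feature map
y = slice_and_reassemble(x)                # half resolution, 4x channels
```

In a detector, a convolution would typically follow to mix the stacked channels; that step is omitted here.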
ISBN: (print) 9798350367782; 9798350367775
The automation of guitar tablature generation from video inputs holds significant promise for enhancing music education, transcription accuracy, and performance analysis. Existing methods face challenges with consistency and completeness, particularly in detecting fretboards and accurately identifying notes. To address these issues, this paper introduces an advanced approach leveraging deep learning, specifically YOLO models for real-time fretboard detection, and Fourier Transform-based audio analysis for precise note identification. Experimental results demonstrate substantial improvements in detection accuracy and robustness compared to traditional techniques. This paper outlines the development, implementation, and evaluation of these methodologies, aiming to revolutionize guitar instruction by automating the creation of guitar tabs from video recordings.
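As a rough illustration of Fourier Transform-based note identification, the sketch below finds the strongest spectral peak in a short audio frame and maps it to the nearest equal-tempered note. The helper names, the Hann window, and the synthetic test tone are assumptions, not details from the paper; real guitar audio may have harmonics stronger than the fundamental, which this simple peak picker does not handle.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest FFT bin, ignoring DC."""
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    peak = np.argmax(spectrum[1:]) + 1  # skip the DC bin
    return freqs[peak]

def frequency_to_note(freq_hz):
    """Map a frequency to the nearest equal-tempered note name (A4 = 440 Hz)."""
    midi = int(round(69 + 12 * np.log2(freq_hz / 440.0)))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate   # one second of audio
tone = np.sin(2 * np.pi * 110.0 * t)       # synthetic 110 Hz tone (open A string)
note = frequency_to_note(dominant_frequency(tone, sample_rate))
```

Turning a note into a tab position additionally requires knowing which string was played, which is where the video-based fretboard detection described above would come in.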
ISBN: (print) 9798350349405; 9798350349399
Semantic segmentation on LiDAR imaging is increasingly gaining attention, as it can provide useful knowledge for perception systems and has potential for autonomous driving. However, collecting and labeling real LiDAR data is an expensive and time-consuming task. While datasets such as SemanticKITTI [1] have been manually collected and labeled, the introduction of simulation tools such as CARLA [2] has enabled the creation of synthetic datasets on demand. In this work, we present a modified CARLA simulator designed with LiDAR semantic segmentation in mind, with new classes, object labeling that is more consistent with its counterparts in real datasets such as SemanticKITTI, and the possibility to adjust the object class distribution. Using this tool, we have generated SynthmanticLiDAR, a synthetic dataset for semantic segmentation on LiDAR imaging, designed to be similar to SemanticKITTI, and we evaluate its contribution to the training process of different semantic segmentation algorithms by using a naive transfer learning approach. Our results show that incorporating SynthmanticLiDAR into the training process improves the overall performance of the tested algorithms, demonstrating the usefulness of our dataset and, therefore, of our adapted CARLA simulator. The dataset and simulator are available at https://***/vpulab/SynthmanticLiDAR.
This paper focuses on real-time object detection systems, analyzing existing Field-Programmable Gate Array (FPGA) implementations that aim to achieve the best efficiency, performance, and accuracy at the same time. These three metrics are typically crucial for domains such as autonomous driving and robotics. Fortunately, recent advancements in object detection models, particularly those based on Convolutional Neural Networks (CNNs), have significantly improved object detection accuracy and speed. When these models are combined with FPGAs, it is possible to achieve greater power efficiency and to satisfy real-time constraints more easily. FPGAs can deliver low latency and high throughput by leveraging true parallelism, making them suitable platforms for developing real-time object detection systems. This paper reviews existing literature on FPGA-based real-time object detection, discussing commonly used algorithms, acceleration techniques, and optimization strategies. Evaluation metrics and typical datasets for assessing real-time systems are also examined. We compare the performance of these implementations using pixel throughput as a fair metric across different systems while processing video streams or images. Insights into state-of-the-art works, comparative analysis, challenges, and future research directions are provided to guide researchers interested in leveraging FPGA devices for real-time object detection applications.
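The abstract does not define pixel throughput; a plausible reading, assumed here, is pixels processed per second (frame width times frame height times frames per second). The sketch below shows why such a metric can rank systems differently from frame rate alone. The two designs and their numbers are hypothetical and are not results from the paper.

```python
def pixel_throughput(width, height, fps):
    """Pixels processed per second for a given resolution and frame rate."""
    return width * height * fps

# Hypothetical systems, for illustration only.
systems = {
    "design_a": pixel_throughput(416, 416, 60),   # small input, high frame rate
    "design_b": pixel_throughput(1280, 720, 15),  # large input, low frame rate
}
best = max(systems, key=systems.get)
```

Here the lower-frame-rate design processes more pixels per second, which a comparison by frames per second alone would hide.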
The COVID-19 pandemic has highlighted the need for efficient and non-contact health screening methods. Signal-based infrared imaging is an emerging field in biomedical engineering that enables remote monitoring of vital signs. While fever is a common symptom, respiratory abnormalities often appear earlier, necessitating advanced screening systems that monitor both body temperature and respiratory patterns. This research presents an artificial intelligence-based health screening device that identifies human respiratory patterns using a deep learning model. The device combines a Convolutional Neural Network (CNN) to extract features with a Long Short-Term Memory (LSTM) network to model time-series patterns, followed by a Softmax classifier that classifies the respiratory patterns. The model is trained on a specialized dataset of six breathing signal patterns, making it suitable for real-time public health surveillance. The experimental results demonstrate that the proposed CNN-LSTM model achieves 91% accuracy, 90% precision, 93% recall, and an F1-score of 91%. It can be scaled up further for real-time medical applications and paves the way for future advancements in automated health surveillance.
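The reported F1-score can be checked against the reported precision and recall using the standard definition of F1 as their harmonic mean. The snippet below is only a consistency check on the rounded figures in the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.90, 0.93)  # precision and recall as reported in the abstract
```

With 90% precision and 93% recall, the result rounds to 0.91, consistent with the stated F1-score.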
In the current education field, the assessment of teaching management quality relies mostly on subjective judgment and static data, and lacks a real-time, dynamic feedback mechanism. In this study, we propose a deep learning-based human behavior analysis method that aims to assess teaching management quality in real time by analyzing the behaviors of teachers and students in the classroom. First, to detect individual students in the video stream, an augmented detection framework based on YOLO v5s is introduced to process and analyze human actions and interaction patterns in the video data. Next, we design a channel residual decoupled convolutional neural network to recognize the different states of students. Teaching management quality is then assessed from the students' classroom attention scores. Data are collected and the model is trained through experiments in different disciplines and teaching management environments, and the results show that the method can effectively improve the objectivity and accuracy of teaching management quality assessment.
With the increasing complexity of modern football tactics, how to intelligently and accurately analyze tactical changes in real time during matches has become an important research direction. Traditional manual tactical analysis methods are inefficient and susceptible to subjective bias. Therefore, using computer vision and deep learning technologies for tactical image recognition and analysis in football matches has gradually become a research hotspot. Convolutional Neural Networks (CNNs), as a powerful image processing tool, have been widely applied in video analysis and player detection. However, multi-target motion prediction and tracking management in dynamic football match scenes still face significant challenges. Existing research mainly focuses on static image analysis or simple player tracking, but the high-frequency image updates, player interactions, and occlusion issues in football matches complicate multi-target tracking. While some deep learning-based methods for multi-target detection and tracking have made progress, challenges remain, such as handling high-density player targets and improving motion trajectory prediction accuracy. To address these shortcomings, this study proposes two core techniques based on CNNs: first, multi-target motion prediction, which forecasts players' future positions based on historical motion data; second, multi-target tracking management, which uses deep learning to track and manage each player's movement trajectory in real time. Through these two techniques, this research aims to improve the real-time performance and accuracy of tactical analysis in football matches, providing coaches and analysts with more scientific and efficient tactical decision-making support.
Generating false-target echoes for Inverse Synthetic Aperture Radar (ISAR) is significant both for jamming enemy ISAR and for promoting ISAR development. Generally, the false-target echo must be coherent with the radar, generated in real time, and fine-grained. However, conventional methods, such as the digital image synthesizer (DIS), cannot meet these requirements. Moreover, existing methods do not consider the target's radial motion. To meet these demands, we propose an improved method in this study. We equivalently model echo formation as the synthesis of two independent parts: (1) the echo of a remote target with radial motion and (2) the echo of a nearby extended target. In part one, accuracy is improved by utilising the Inner Pulse Motion (IPM) model, and complexity is reduced by deriving it as a frequency offset modulation. In part two, the fine extended-target echo is constructed by convolution filtering, whose resource consumption can be greatly reduced by separating it into an offline stage and a real-time stage. Our method is verified by algorithm simulations and actual experiments. The results indicate that it can build the fine false-target echo in real time and can adapt to the target's radial velocity and to different resolutions and sizes. Compared with the conventional DIS method, our method significantly reduces computational complexity and offers more comprehensive functions.
ISBN: (print) 9798350374315
Object detection is an important research topic in computer vision and one of the basic technologies for understanding image content. A real-time operating system is an operating system that completes the processing of requested tasks within a specified time; timely response and high reliability are its main characteristics. Since video-based target detection algorithms place high demands on computing power and real-time performance, this paper proposes deploying the target detection algorithm on a real-time operating system. Experiments verify that the characteristics of the real-time operating system improve the real-time performance of the target detection algorithm. For the hardware of the industrial computer used in this paper, the principle and construction process of the Xenomai real-time operating system are analyzed, and a scheme for building a Linux+Xenomai real-time operating system on the industrial computer is proposed. For application scenarios with a stable background and a single target, object detection algorithms based on image processing are studied. Building on the background difference method and the three-frame difference method, an improved algorithm based on an adaptive detection window around the target region is proposed. Experimental results show that the improved algorithm has better real-time performance than the basic algorithm. For application scenarios with a complex background and multiple targets, object detection algorithms based on deep learning are studied, and the fully convolutional neural network in the Dlib machine learning library is selected for research and implementation. According to the hardware and system environment of this paper, a method for estimating the computational scale of the fully convolutional neural network is proposed, and a method of deploying the network model trained in the GPU environment in the real-time operating system e
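The three-frame difference method mentioned in this abstract can be sketched as follows: a pixel is marked as moving only if it differs from both the previous and the next frame. The `bounding_window` helper is a hypothetical stand-in for the idea of restricting later processing to a window around the target region; the abstract does not describe the paper's adaptive window, and the threshold and margin values are assumptions.

```python
import numpy as np

def three_frame_difference(prev, curr, nxt, threshold=25):
    """Boolean motion mask from three consecutive grayscale frames."""
    # Widen the dtype so uint8 subtraction cannot wrap around.
    prev, curr, nxt = (f.astype(np.int16) for f in (prev, curr, nxt))
    d1 = np.abs(curr - prev) > threshold
    d2 = np.abs(nxt - curr) > threshold
    return d1 & d2  # moving in both frame pairs

def bounding_window(mask, margin=1):
    """Bounding box (top, bottom, left, right) around the mask, or None."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    h, w = mask.shape
    return (max(int(rows.min()) - margin, 0), min(int(rows.max()) + margin, h - 1),
            max(int(cols.min()) - margin, 0), min(int(cols.max()) + margin, w - 1))

def make_frame(left_col):
    """Synthetic 8x8 frame with a bright 2x2 object at the given column."""
    frame = np.zeros((8, 8), dtype=np.uint8)
    frame[3:5, left_col:left_col + 2] = 200
    return frame

# The object moves two pixels to the right in each frame.
mask = three_frame_difference(make_frame(1), make_frame(3), make_frame(5))
window = bounding_window(mask)
```

On the next frame, a detector could scan only the returned window instead of the full image, which is one way a window around the target region can reduce per-frame work.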