ISBN (print): 9798350377040; 9798350377033
With the continuous progress of image processing and machine vision technology, the demand for efficient, real-time processing has become increasingly prominent, especially for high-noise images. In this study, an adaptive Gaussian filtering algorithm implemented on an FPGA is proposed, which aims to improve the computational efficiency and real-time performance of the image processing system. Compared with a traditional fixed-weight filter, the algorithm dynamically adjusts its filtering parameters according to the noise environment, effectively balancing noise suppression and retention of image detail. We coded the algorithm in the Verilog hardware description language and verified it on the PYNQ-Z2 FPGA platform. The experimental results show that the adaptive algorithm outperforms fixed-weight filtering, especially in noise suppression and detail preservation. At the same time, the FPGA implementation reduces filtering delay and optimizes resource consumption, making it well suited to real-time applications. This study demonstrates the promise of FPGA-based adaptive filtering for medical imaging, remote sensing, and intelligent surveillance, which place stringent demands on high-performance, high-efficiency processing, and it provides a new hardware solution for real-time, high-quality image processing in constrained environments.
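To make the adaptive idea concrete, a minimal software sketch follows. It is not the authors' Verilog/FPGA design; the tile size, the MAD-of-Laplacian noise estimator, and the noise-to-sigma mapping are assumptions introduced purely for illustration.

```python
# Illustrative software model of an adaptive Gaussian filter: the noise
# estimator (MAD of the Laplacian) and the sigma mapping are assumptions,
# not the paper's Verilog/FPGA design.
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def adaptive_gaussian(img: np.ndarray, tile: int = 32,
                      sigma_min: float = 0.5, sigma_max: float = 2.5) -> np.ndarray:
    """img: 2-D grayscale array; each tile is filtered with a sigma chosen
    from its estimated noise level."""
    out = np.empty_like(img, dtype=np.float64)
    h, w = img.shape
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = img[y:y + tile, x:x + tile].astype(np.float64)
            # Robust noise estimate: median absolute deviation of the Laplacian.
            noise = np.median(np.abs(laplace(block))) / 0.6745
            # Map the noise level to a bounded sigma (stronger smoothing for noisier tiles).
            sigma = np.clip(sigma_min + 0.05 * noise, sigma_min, sigma_max)
            out[y:y + tile, x:x + tile] = gaussian_filter(block, sigma)
    return out
```

In a hardware pipeline the per-tile statistics would typically be computed with line buffers, but the abstract does not detail the architecture used on the PYNQ-Z2.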
ISBN (print): 9798350393767; 9798350393774
Transmitting high-data-rate video to the cloud for real-time processing requires minimizing latency, meeting the application's requirements, and optimizing power consumption across the entire system. In this study, we employed a distributed video processing model for an object detection task, assuming that video streams are captured by robots operating on a licensed 28 GHz millimeter-wave network, which ensures the stability of video uploads. By optimizing power consumption, the system efficiently allocated video analysis frames to appropriate devices, resulting in an 18% decrease in overall power usage.
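The abstract does not spell out the allocation algorithm, so the following is only a hypothetical greedy assignment under invented device parameters, meant to illustrate the kind of power-aware frame allocation described.

```python
# Hypothetical greedy allocator: fill the cheapest eligible device (by energy
# per detection) up to its capacity, then the next. Device names and numbers
# are made up for illustration; the paper's optimization is not specified
# in the abstract.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    energy_per_frame: float   # joules per detection
    latency_ms: float         # processing + transmission latency
    capacity_fps: int         # frames it can analyze per second

def allocate(frames: int, devices: list[Device], latency_budget_ms: float) -> dict[str, int]:
    counts = {d.name: 0 for d in devices}
    eligible = sorted((d for d in devices if d.latency_ms <= latency_budget_ms),
                      key=lambda d: d.energy_per_frame)
    remaining = frames
    for d in eligible:
        take = min(remaining, d.capacity_fps)
        counts[d.name] = take
        remaining -= take
        if remaining == 0:
            break
    return counts

devices = [Device("robot", 0.8, 40.0, 10),
           Device("edge", 0.5, 25.0, 20),
           Device("cloud", 1.2, 60.0, 100)]
print(allocate(frames=30, devices=devices, latency_budget_ms=50.0))
# {'robot': 10, 'edge': 20, 'cloud': 0}
```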
ISBN (print): 9798350318920; 9798350318937
Deep learning has become a popular tool across many fields and is increasingly integrated into real-world applications such as self-driving cars and surveillance cameras. One area of active research is recognizing human actions, including identifying unsafe or abnormal behavior. Temporal information is crucial for action recognition, and the global context, as well as the target person, matters when judging human behavior. However, larger networks that can capture all of these features struggle to operate in real time. To address these issues, we propose A*: Atrous Spatial Temporal Action Recognition for Real-time Applications. A* comprises four modules aimed at improving action detection networks. First, we introduce a Low-Level Feature Aggregation module. Second, we propose an Atrous Spatio-Temporal Pyramid Pooling module. Third, we fuse all extracted image and video features in an image-video Feature Fusion module. Finally, we integrate a Proxy Anchor Loss on action features into the loss function. We evaluate A* on three common action detection benchmarks, achieving state-of-the-art performance on JHMDB and UCF101-24 while remaining competitive on AVA. Furthermore, we demonstrate that A* achieves real-time inference speeds of 33 FPS, making it suitable for real-world applications.
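As a rough illustration of the second module, the sketch below shows an atrous spatio-temporal pyramid pooling block in PyTorch. The dilation rates, channel sizes, and activation are assumptions and may differ from the paper's A* design.

```python
# Schematic atrous spatio-temporal pyramid pooling: parallel 3-D convolutions
# with different dilation rates over (T, H, W), concatenated and projected
# with a 1x1x1 convolution. Rates and channel sizes are assumed for the sketch.
import torch
import torch.nn as nn

class AtrousSTPyramidPooling(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.project = nn.Conv3d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        feats = [torch.relu(branch(x)) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

# Example: a clip of 8 frames at 56x56 with 64 channels.
clip = torch.randn(1, 64, 8, 56, 56)
print(AtrousSTPyramidPooling(64, 128)(clip).shape)  # torch.Size([1, 128, 8, 56, 56])
```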
Owing to vibration of the carrier of a vehicle-mounted camera, the captured video shakes, which degrades or even defeats recognition based on visual target detection. To solve this problem, a video stabilization algorithm based on grid motion statistics and an adaptive Kalman filter is proposed. The two key stages of video stabilization are motion estimation and motion smoothing. In the motion estimation stage, we adopt an erroneous-match removal algorithm that integrates grid motion statistics (GMS) to improve the accuracy of motion estimation while reducing matching time, meeting the real-time and precision requirements of vehicle-mounted video stabilization. In the motion smoothing stage, we adaptively update the measurement noise covariance R of the Kalman filter according to the camera shake level, further improving the accuracy of motion smoothing while ensuring filter convergence. Finally, we compensate for the motion based on the relationship between the pre- and post-smoothing motion trajectories, generating a stable video sequence. Experimental results demonstrate that the proposed algorithm is stable and effective for vehicle-mounted video stabilization.
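The sketch below illustrates the core smoothing idea on a single motion parameter: a constant-velocity Kalman filter whose measurement-noise covariance R grows with an estimated shake level. The windowed-variance shake estimate and the scaling constants are assumptions, not the paper's exact rule.

```python
# Minimal 1-D constant-velocity Kalman smoother for a camera-motion parameter
# (e.g., accumulated x-translation). The adaptive part is R: larger when the
# recent trajectory is jittery, giving stronger smoothing. The mapping from
# shake level to R is assumed for illustration.
import numpy as np

def smooth_trajectory(traj: np.ndarray, q: float = 1e-3, r0: float = 0.25,
                      window: int = 15) -> np.ndarray:
    x = np.array([traj[0], 0.0])            # state: [position, velocity]
    P = np.eye(2)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity model
    H = np.array([[1.0, 0.0]])
    Q = q * np.eye(2)
    out = np.empty_like(traj, dtype=float)
    for k, z in enumerate(traj):
        # Adapt R: stronger smoothing when recent motion is jittery.
        shake = np.var(traj[max(0, k - window):k + 1])
        R = np.array([[r0 * (1.0 + shake)]])
        # Predict.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update.
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out[k] = x[0]
    return out
```

Motion compensation then warps each frame by the difference between the original and smoothed trajectories.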
The uniformity of concrete is an important indicator of its maturity and is closely related to the quality and safety of the product. To analyze the behavior of concrete during mixing, and given that no scientific and effective method exists for detecting concrete uniformity during the mixing process, this paper proposes an intelligent identification method for concrete uniformity based on dynamic mixing. The method measures the surface of the concrete fluid through computer graphical modeling, applies mathematical and computational models of the fluid-dynamic interaction, and lets the computer independently judge the characteristics of the concrete fluid state without human intervention, thereby obtaining the state of concrete uniformity. Experimental results show that the average accuracy of the method is 97.14% and the real-time monitoring speed reaches 12 FPS, which is of practical significance for real-time state detection of concrete. Both the identification accuracy and the monitoring speed meet the actual monitoring needs of a concrete mixing station.
ISBN (print): 9781713899921
The video object segmentation (VOS) task involves segmenting an object over time given a single initial mask. Current state-of-the-art approaches keep a memory of previously processed frames and rely on matching to estimate segmentation masks for subsequent frames. Lacking any adaptation mechanism, such methods are prone to test-time distribution shifts. This work focuses on matching-based VOS under distribution shifts such as video corruptions, stylization, and sim-to-real transfer. We explore test-time training strategies that are agnostic to the specific task as well as strategies designed specifically for VOS, including a variant based on mask cycle consistency tailored to matching-based VOS methods. Experimental results on common benchmarks demonstrate that the proposed test-time training yields significant performance improvements. In particular, for the sim-to-real scenario, and despite using only a single test video, our approach recovers a substantial portion of the performance gain achieved through training on real videos. Additionally, we introduce DAVIS-C, an augmented version of the popular DAVIS test set featuring extreme distribution shifts such as image- and video-level corruptions and stylizations. Our results show that test-time training improves performance even in these challenging cases. Project page: https://***/test-time-training-vos/
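A schematic version of the mask cycle-consistency objective is sketched below; `propagate` stands in for the matching-based VOS model's mask propagation, and the binary cross-entropy form of the loss is an assumption for illustration.

```python
# Schematic mask cycle-consistency objective for test-time training of a
# matching-based VOS model: propagate the mask forward one frame, propagate
# it back, and penalize disagreement with the original mask.
import torch
import torch.nn.functional as F
from typing import Callable

def cycle_consistency_loss(mask_t: torch.Tensor,
                           frame_t: torch.Tensor,
                           frame_t1: torch.Tensor,
                           propagate: Callable[[torch.Tensor, torch.Tensor, torch.Tensor], torch.Tensor]
                           ) -> torch.Tensor:
    """mask_t in [0, 1]; propagate(mask, src_frame, dst_frame) -> mask on dst_frame."""
    mask_fwd = propagate(mask_t, frame_t, frame_t1)       # t -> t+1
    mask_cycle = propagate(mask_fwd, frame_t1, frame_t)   # t+1 -> t
    return F.binary_cross_entropy(mask_cycle.clamp(1e-6, 1 - 1e-6), mask_t)
```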
Real-time semantic segmentation (SS) is a core task for vision-based applications such as self-driving. Given limited on-device computing resources and stringent performance requirements, streaming video from camera-equipped mobile devices to edge servers for SS is a promising approach. While there are increasing efforts on task-oriented video compression, most SS-applicable algorithms apply fairly uniform compression, because the sensitive regions are less obvious and less concentrated. Such processing yields low compression performance and significantly limits the number of real-time SS streams an edge server can support. In this paper, we propose STAC, a novel task-oriented, DNN-driven video compressive streaming algorithm tailored for SS, to strike an accuracy-bitrate balance and adapt to time-varying bandwidth. It exploits the DNN's gradients as sensitivity metrics for fine-grained spatially adaptive compression and includes a temporally adaptive scheme that integrates spatial adaptation with predictive coding. Furthermore, we design a bandwidth-aware neural network that serves as a compatible configuration tuner to fit time-varying bandwidth and content. STAC is evaluated in a system with a commodity mobile device and an edge server using real-world network traces. Experiments show that STAC saves up to 63.7-75.2% of bandwidth or improves accuracy by 3.1-9.5% compared with state-of-the-art algorithms, while adapting to time-varying bandwidth.
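The gradient-as-sensitivity idea can be sketched as follows; the 16x16 macroblock size and the use of the model's own prediction as a proxy target are assumptions rather than STAC's exact design.

```python
# Sketch of gradient-based sensitivity: back-propagate the segmentation loss
# to the input frame and pool |gradient| per macroblock to obtain a spatial
# sensitivity map that can steer block-wise quantization.
import torch
import torch.nn.functional as F

def sensitivity_map(model: torch.nn.Module, frame: torch.Tensor,
                    block: int = 16) -> torch.Tensor:
    """frame: (1, 3, H, W); model outputs (1, classes, H, W) logits.
    Returns a (H/block, W/block) sensitivity map."""
    frame = frame.clone().requires_grad_(True)
    logits = model(frame)
    # Use the model's own confident prediction as a proxy target (assumption).
    target = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, target)
    loss.backward()
    grad = frame.grad.abs().mean(dim=1, keepdim=True)       # (1, 1, H, W)
    return F.avg_pool2d(grad, kernel_size=block).squeeze()  # per-macroblock sensitivity
```

Blocks with higher sensitivity would then be compressed more gently (lower quantization), and flat, insensitive regions more aggressively.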
ISBN (print): 9798331529543; 9798331529550
Versatile Video Coding (VVC), standardized in 2022 as ITU-T Recommendation H.266, ISO/IEC 23090-3, and MPEG-I Part 3, is the latest block-based hybrid video coding standard. It defines many tools that increase compression efficiency while maintaining the same quality level; the tradeoff is computational complexity. The intra-coding loop of VVC is computationally very complex because it iteratively tries all possible Quad-Tree Multi-Type Tree (QTMT) partitioning alternatives, starting from the 128x128 Coding Tree Unit (CTU) size and going down to the 4x4 minimum Coding Unit (CU) size. For camera-captured real-world broadcast video, objects in 8K are large relative to the fixed CTU size of VVC compared with smaller resolutions. As a result, for such 8K video, less detailed coding in the VVC QTMT may be sufficient for practical purposes. In this paper, we define a new fast intra-partitioning algorithm for 8K video and compare its performance against the Common Test Conditions (CTC) All-Intra (AI) configuration in terms of compression efficiency (bits), quality (Y-PSNR, SSIM, and MS-SSIM), and computational complexity (runtime). We observe an 81.61% runtime gain with, on average, only a 2.99% increase in bitrate, a 0.1339 dB decrease in Y-PSNR, a 0.0022 decrease in SSIM, and a 0.0007 decrease in MS-SSIM. This is a remarkable reduction in complexity with a limited effect on efficiency and quality.
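The early-termination idea behind such fast partitioning can be illustrated with a toy quad-tree recursion that stops on large, low-texture blocks. The variance test and thresholds are invented for the sketch, and the paper's actual decision rule (and its handling of the full QTMT, not just quad splits) is not given in the abstract.

```python
# Toy illustration of early termination for intra partitioning: large
# homogeneous regions are kept as one coding unit instead of exhaustively
# testing every split down to 4x4.
import numpy as np

def partition(block: np.ndarray, min_size: int = 4,
              flat_threshold: float = 25.0) -> list[tuple[int, int, int]]:
    """Return (y, x, size) leaves for a square luma block (quad splits only)."""
    def recurse(y: int, x: int, size: int, leaves: list) -> None:
        region = block[y:y + size, x:x + size]
        # Early termination: stop splitting flat regions or minimum-size blocks.
        if size <= min_size or region.var() < flat_threshold:
            leaves.append((y, x, size))
            return
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                recurse(y + dy, x + dx, half, leaves)
    leaves: list[tuple[int, int, int]] = []
    recurse(0, 0, block.shape[0], leaves)
    return leaves

# Example: a 128x128 CTU whose flat top half stays coarse while the textured
# lower half keeps splitting.
ctu = np.vstack([np.full((64, 128), 128.0),
                 np.random.randint(0, 255, (64, 128)).astype(float)])
print(len(partition(ctu)))
```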
Image recognition and processing technology is an important application direction of artificial intelligence technology. With the growing demand for various types of intelligent video analysis, the importance of usi...
The wellsite is the fundamental unit in the development of oil and gas fields, functioning as a hub for production activities, with workover operations being a critical means of ensuring production continuity. It also plays a crucial role in environmental protection, preventing oil and gas leakage and pollution. The various pieces of mechanical equipment deployed at the wellsite are essential for tasks such as oil and gas extraction and well repair, and they hold a pivotal position in oil and gas field development. Consequently, an intelligent wellsite must focus first on monitoring mechanical equipment, with video emerging as a vital form of multisource information at the wellsite. While existing research on wellsite video monitoring predominantly addresses system and data transmission issues, it falls short of addressing the challenges of real-time assessment and early warning in intelligent wellsite operations. This study introduces a method for identifying critical targets at the wellsite based on a scale-adaptive network. The model employs a multiscale fusion network to extract image features and semantic features at various scales and fuse them. Wellsite video images are processed in multiple stages, outputting predicted box locations and category information, enabling the localization and recognition of critical objects at the wellsite. Unlike traditional deep convolutional object detection methods, this model incorporates a parameter-free attention mechanism, enhancing accurate feature learning of small targets during extraction and addressing the issue of multiscale imbalance. The experimental results validate the robust performance of the method, surpassing the latest one-stage object detection models and mainstream loss function methods. Comparative experiments demonstrate a 9.22% improvement in mean average precision (mAP) compared with YOLOv8, establishing th
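A common way to realize a parameter-free attention mechanism is a SimAM-style energy re-weighting; the paper's abstract does not name its exact formulation, so the sketch below is an assumed stand-in rather than the authors' module.

```python
# Parameter-free spatial attention in the spirit of SimAM (an assumed
# stand-in; the paper does not specify its mechanism): each activation is
# re-weighted by an energy score computed from per-channel statistics,
# with no learnable parameters.
import torch

def parameter_free_attention(x: torch.Tensor, lam: float = 1e-4) -> torch.Tensor:
    """x: (batch, channels, H, W) feature map."""
    n = x.shape[2] * x.shape[3] - 1
    mu = x.mean(dim=(2, 3), keepdim=True)
    var = ((x - mu) ** 2).sum(dim=(2, 3), keepdim=True) / n
    energy = ((x - mu) ** 2) / (4 * (var + lam)) + 0.5
    return x * torch.sigmoid(energy)
```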