As camera quality improves and deployments move to areas with limited bandwidth, communication bottlenecks can impair the real-time constraints of intelligent transportation system applications such as video-based real-time pedestrian detection. Video compression reduces the bandwidth required to transmit the video but degrades its quality, and as video quality decreases, the accuracy of the vision-based pedestrian detection model decreases correspondingly. Furthermore, environmental conditions such as rain and night-time darkness limit how aggressively compression can be applied, making it harder to maintain high pedestrian detection accuracy. The objective of this study is to develop a real-time error-bounded lossy compression (EBLC) strategy that dynamically changes the video compression level according to the environmental conditions so as to maintain high pedestrian detection accuracy. We conduct a case study to show the efficacy of our dynamic EBLC strategy for real-time vision-based pedestrian detection under adverse environmental conditions. Our strategy dynamically selects the lossy compression error tolerances that maintain high detection accuracy across a representative set of environmental conditions. Analyses reveal that, under adverse environmental conditions, our dynamic EBLC strategy increases pedestrian detection accuracy by up to 14% and reduces communication bandwidth by up to 14x compared to the state of the practice. Moreover, we show that our dynamic EBLC strategy is independent of the pedestrian detection model and the environmental conditions, allowing other detection models and conditions to be easily incorporated.
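A minimal sketch of the condition-dependent error-bound selection this abstract describes, assuming a pre-profiled lookup from environmental condition to lossy-compression tolerance; the condition labels, tolerance values, and function names are illustrative placeholders, not the paper's.

# Hypothetical pre-profiled tolerances that keep detection accuracy above a target.
ERROR_TOLERANCE = {
    "clear_day": 0.10,  # aggressive compression is safe in good conditions
    "rain": 0.04,       # tighter bound preserves detail degraded by rain
    "night": 0.02,      # darkness calls for the most conservative bound
}
DEFAULT_TOLERANCE = 0.02  # fall back to the safest bound for unseen conditions


def select_error_bound(condition: str) -> float:
    """Return the lossy-compression error bound for the current condition."""
    return ERROR_TOLERANCE.get(condition, DEFAULT_TOLERANCE)


if __name__ == "__main__":
    for cond in ("clear_day", "rain", "night", "fog"):
        print(cond, "->", select_error_bound(cond))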
Traditional spaceborne synthetic aperture radar (SAR) imaging algorithms are primarily designed for stationary targets. However, in practical scenarios, ship targets are often affected by complex angular motions induc...
Fire detection and recognition is nowadays one of the major safety concerns for saving human lives. To address this, we propose a new model for early detection and recognition of the fire fo...
The tokenizer, serving as a translator that maps intricate visual data into a compact latent space, lies at the core of visual generative models. Based on the finding that existing tokenizers are tailored to image or vid...
This paper presents a GPU-based parallelisation of an optimised Versatile Video Coding (VVC) decoder adaptive loop filter (ALF) on a resource-constrained heterogeneous platform. The GPU is used comprehensively to maximise the degree of parallelism, allowing the programme to fully exploit the GPU's capabilities. The proposed approach accelerates the ALF computation by an average of two times compared to an already fully optimised version of the software decoder implementation on an embedded platform. Finally, this work presents an analysis of energy consumption, showing that the proposed methodology has a negligible impact on this key parameter.
Video semantic segmentation is a challenging vision task because its spatio-temporal characteristics are difficult to model while meeting real-time and accuracy requirements simultaneously. To tackle this problem, this paper proposes a novel optical-flow-based method. We propose an adaptive-threshold key-frame scheduling strategy that models temporal information by estimating inter-frame similarity. To ensure segmentation accuracy, we construct a convolutional neural network named Quick Network with attention (QNet-attention), a lightweight image semantic segmentation model with a spatial-pyramid-pooling attention module. The proposed network is further combined with optical flow estimation to realize a complete semantic segmentation framework. The performance of the proposed method is compared against existing benchmark methods, and the experimental results indicate that our method achieves a good balance between accuracy and speed.
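As a rough illustration of the adaptive-threshold key-frame scheduling idea, the sketch below declares a key frame when inter-frame similarity falls below a threshold that tracks recent similarity; the similarity measure, adaptation rule, and constants are assumptions rather than the paper's exact formulation, and the optical-flow warping of the reused segmentation is omitted.

import numpy as np


def frame_similarity(prev: np.ndarray, curr: np.ndarray) -> float:
    """Crude similarity in [0, 1] from the mean absolute pixel difference."""
    diff = np.mean(np.abs(prev.astype(np.float32) - curr.astype(np.float32)))
    return float(1.0 - diff / 255.0)


class KeyFrameScheduler:
    """Decide per frame whether to run full segmentation (key frame) or reuse
    the previous result (non-key frame, warped by optical flow in the paper)."""

    def __init__(self, init_threshold: float = 0.9, momentum: float = 0.95):
        self.threshold = init_threshold
        self.momentum = momentum

    def is_key_frame(self, prev: np.ndarray, curr: np.ndarray) -> bool:
        sim = frame_similarity(prev, curr)
        key = sim < self.threshold  # large scene change -> run the full network
        # Let the threshold track recent similarity: static scenes tighten it,
        # fast-moving scenes relax it (one plausible adaptation rule).
        self.threshold = self.momentum * self.threshold + (1 - self.momentum) * sim
        return key


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sched = KeyFrameScheduler()
    prev = rng.integers(0, 256, (64, 64)).astype(np.uint8)
    curr = prev.copy()
    curr[:8] = rng.integers(0, 256, (8, 64))  # perturb part of the frame
    print("key frame?", sched.is_key_frame(prev, curr))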
ISBN (print): 9798350374292; 9798350374285
Recently the amount of video produced by mobile devices has grown, as has the variety of video analytics services able to perform on-device classification, automated tagging, video retrieval, object tracking and similarity analysis. However, high computational complexity limits their use in real-time applications (such as object detection in Closed Circuit Television (CCTV) systems), may degrade the user experience through delays, or may raise privacy concerns when video content is shared with edge computing services. For these reasons, approaches that use the hardware Artificial Intelligence (AI) features of modern mobile platforms, such as the Qualcomm AI Engine, Samsung's AI Quantum Processor, or MediaTek's AI Processing Units, are of special interest. Online services for video search by an (optionally distorted) video fragment do exist, but they are very limited and bound to cloud-side processing. This inspired us to develop a core technology for near-duplicate video retrieval that would allow such a video search service to operate efficiently, be suitable for on-device execution, and enable many other practical applications (e.g. video provenance, authenticity checking, or recommendation services). We propose an adaptation of a modern Visual Transformer (VT) model for processing a partially decompressed video stream and matching a small set of reference frames (keyframes) against a pristine reference video. The adaptation of the novel DINO model [3] in our approach improves the detection of near-duplicates by 20% (from 66.1% to 88.3%) in comparison with the state-of-the-art DnS [13] and S2VS [12] models. The proposed models are robust against popular video content modifications, such as affine transformations and visual effects, even when the video is transcoded with the novel Essential Video Coding (EVC) codec. Our proposed solution also shortens processing time by up to 2.5 times in comparison with the approach that requires full video decoding.
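The keyframe-matching step can be pictured as below: each query keyframe is embedded and compared against embeddings of the reference video by cosine similarity. In the paper the embeddings come from a DINO-style Visual Transformer; here embed_frame is a stand-in random projection so the sketch runs without a model, and the aggregation into a single score is an assumption.

import numpy as np

rng = np.random.default_rng(0)
_PROJ = rng.standard_normal((32 * 32, 128)).astype(np.float32)  # stand-in "model"


def embed_frame(frame: np.ndarray) -> np.ndarray:
    """Placeholder embedding: flatten a 32x32 grayscale frame, project, L2-normalise."""
    v = frame.astype(np.float32).reshape(-1) @ _PROJ
    return v / (np.linalg.norm(v) + 1e-8)


def match_score(query_keyframes, reference_frames) -> float:
    """Mean over query keyframes of their best cosine similarity in the reference."""
    ref = np.stack([embed_frame(f) for f in reference_frames])
    best = [float(np.max(ref @ embed_frame(q))) for q in query_keyframes]
    return float(np.mean(best))


if __name__ == "__main__":
    reference = [rng.integers(0, 256, (32, 32)) for _ in range(20)]
    query = reference[5:8]  # keyframes taken straight from the reference video
    print("near-duplicate score:", round(match_score(query, reference), 3))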
ISBN (print): 9798350372977; 9798350372984
The precise detection of plant centres is important for growth monitoring, enabling the continuous tracking of plant development to discern the influence of diverse factors. It also holds significance for automated systems such as robotic harvesting, helping machines locate and engage with plants. In this paper, we explore the YOLOv4 (You Only Look Once) real-time neural network detector for plant centre detection. Our dataset, comprising over 12,000 images from 151 Arabidopsis thaliana accessions, is used to fine-tune the model. Evaluation on this dataset reveals the model's proficiency in centre detection across accessions, achieving an mAP of 99.79% at a 50% IoU threshold. The model demonstrates real-time processing capability, achieving a frame rate of approximately 50 FPS. This outcome underscores its rapid and efficient analysis of video or image data, showing practical utility in time-sensitive applications.
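For reference, the 50% IoU criterion behind the reported mAP can be illustrated as follows; the box format and example values are illustrative and not tied to the paper's dataset.

def iou(a, b) -> float:
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0


def is_true_positive(pred, gt, threshold: float = 0.5) -> bool:
    """A detection counts as correct when IoU with the ground truth is >= 0.5."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    print(is_true_positive((10, 10, 50, 50), (12, 14, 52, 54)))  # IoU ~0.75 -> True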
ISBN (print): 9798350318920; 9798350318937
Recent deep generative models (DGMs) such as generative adversarial networks (GANs) and diffusion probabilistic models (DPMs) have shown an impressive ability to generate high-fidelity photorealistic images. Although such images look appealing to human eyes, training a model purely on synthetic images for downstream image processing tasks like image classification often results in an undesired performance drop compared to training on real data. Previous works have demonstrated that enhancing a real dataset with synthetic images from DGMs can be beneficial. However, the improvements were limited to certain circumstances and were still not comparable to adding the same number of real images. In this work, we propose a new taxonomy to describe the factors contributing to this commonly observed phenomenon and investigate it on the popular CIFAR-10 dataset. We hypothesize that the Content Gap accounts for a large portion of the performance drop when using synthetic images from DGMs, and we propose strategies to better utilize them in downstream tasks. Extensive experiments on multiple datasets show that our method outperforms baselines on downstream classification tasks both when training on synthetic data only (Synthetic-to-real) and when training on a mix of real and synthetic data (Data Augmentation), particularly in the data-scarce scenario.
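A simple sketch of the Data Augmentation setting mentioned above: mix real images with DGM-generated synthetic ones at a chosen ratio before training. The mixing ratio, data representation, and function names are placeholders, and the paper's content-gap-aware selection of synthetic images is not reproduced here.

import random


def mix_real_and_synthetic(real, synthetic, synthetic_fraction=0.5, seed=0):
    """Return a shuffled list of (sample, source) pairs with the requested
    fraction of synthetic samples relative to the real set size."""
    rng = random.Random(seed)
    n_syn = int(len(real) * synthetic_fraction)
    chosen = rng.sample(synthetic, min(n_syn, len(synthetic)))
    mixed = [(x, "real") for x in real] + [(x, "synthetic") for x in chosen]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real = [f"real_{i}" for i in range(8)]
    synthetic = [f"syn_{i}" for i in range(20)]
    for sample, source in mix_real_and_synthetic(real, synthetic)[:5]:
        print(source, sample)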
ISBN (print): 9798350344868; 9798350344851
Although current deep neural network based no-reference video quality assessment (NR-VQA) methods can effectively simulate the human visual system (HVS), their interpretability is increasingly poor. Current methods extract only the low-level spatial and temporal features of the video and do not consider the impact of high-level semantics, even though high-level semantic information related to human subjective perception and to the video's own quality is perceived by the HVS. In this work, we design a multidimensional feature extractor (MDFE), which takes text descriptions of video quality factors as semantic guidance and uses the Contrastive Language-Image Pre-training (CLIP) model to perform zero-shot multidimensional feature extraction. We then propose a zero-shot feature extraction method based on semantic guidance (ZE-FESG), which treats the MDFE as a feature extractor and acquires the semantically corresponding features of the video by sliding over each frame. Extensive experiments show that the proposed ZE-FESG offers better interpretability and performance than current mainstream 2D-CNN based feature extraction methods for NR-VQA. The code will be released at https://***/xiao-mi-d/ZE-FESG.
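A sketch of CLIP-based zero-shot feature extraction guided by quality-related text prompts, applied frame by frame in the spirit of the MDFE; the prompt wording, checkpoint choice, and per-frame pooling are assumptions, and how the resulting features feed an NR-VQA regressor is not shown.

import torch
from transformers import CLIPModel, CLIPProcessor

PROMPTS = [  # hypothetical quality-factor descriptions used as semantic guidance
    "a sharp, clean video frame",
    "a blurry video frame",
    "a frame with heavy compression artifacts",
    "a noisy, low-light video frame",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def frame_quality_features(frame) -> torch.Tensor:
    """Return per-prompt similarity scores for one frame (a PIL image)."""
    inputs = processor(text=PROMPTS, images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, len(PROMPTS))
    return logits.softmax(dim=-1).squeeze(0)


def video_quality_features(frames) -> torch.Tensor:
    """Slide over frames and stack the per-frame semantic features."""
    return torch.stack([frame_quality_features(f) for f in frames])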