Lane detection is critical in autonomous driving and advanced driver assistance systems (ADAS), furnishing vital information for vehicle navigation and safety. The study introduces lane detection methodology leveragin...
ISBN: 9781665469647 (print)
This paper presents a novel reconstruction algorithm for video Snapshot Compressive Imaging (SCI). Inspired by recent research on Transformers and self-attention mechanisms in computer vision, we propose the first video SCI reconstruction algorithm built upon Transformers to capture long-range spatio-temporal dependencies, enabling the deep learning of feature maps. Our approach is based on Spatiotemporal Convolutional Multi-head Attention (ST-ConvMHA), which exploits the spatial and temporal information of the video scenes instead of using fully-connected attention layers. To evaluate the performance of our approach, we train our algorithm on the DAVIS2017 dataset and test the trained models on six benchmark datasets. The results obtained in terms of PSNR, SSIM and especially reconstruction time demonstrate that our reconstruction approach can be used for real-time applications. We truly believe that our research will motivate future work on video reconstruction approaches.
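For readers unfamiliar with the PSNR metric named in the abstract above, the following is a minimal sketch of how it is commonly computed. It assumes pixel values normalised to [0, 1] and frames flattened to plain lists; the function names `psnr` and `mean_psnr` are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between two flattened frames.

    Assumption: both frames hold the same number of pixels and `peak`
    is the largest possible pixel value (1.0 for normalised images).
    """
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def mean_psnr(ref_frames: Sequence[Sequence[float]],
              rec_frames: Sequence[Sequence[float]],
              peak: float = 1.0) -> float:
    """Average the per-frame PSNR over a video clip."""
    if len(ref_frames) != len(rec_frames) or len(ref_frames) == 0:
        raise ValueError("clips must be non-empty and of equal length")
    values = [psnr(r, x, peak) for r, x in zip(ref_frames, rec_frames)]
    return sum(values) / len(values)
```

A uniform error of 0.1 on every pixel gives a mean squared error of 0.01 and therefore a PSNR of about 20 dB; higher values indicate a closer reconstruction.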
Detecting blind lanes is very helpful for people with visual impairments. A novel real-time blind lane detection method based on a context salient attention mechanism and transfer learning is...
Foreground detection is a significant area of study within the realm of computer vision and plays a crucial role in video-based applications. The Vibe algorithm is an efficient foreground detection method, and this pa...
ISBN: 9781665496209 (digital)
ISBN: 9781665496209 (print)
Despite the recent success of deep learning architectures, person re-identification (ReID) remains a challenging problem in real-world applications. Several unsupervised single-target domain adaptation (STDA) methods have recently been proposed to limit the decline in ReID accuracy caused by the domain shift that typically occurs between source and target video data. Given the multimodal nature of person ReID data (due to variations across camera viewpoints and capture conditions), training a common CNN backbone to address domain shifts across multiple target domains can provide an efficient solution for real-time ReID applications. Although multi-target domain adaptation (MTDA) has not been widely addressed in the ReID literature, a straightforward approach consists of blending different target datasets and performing STDA on the mixture to train a common CNN. However, this approach may lead to poor generalization, especially when blending a growing number of distinct target domains to train a smaller CNN. To alleviate this problem, we introduce a new MTDA method based on knowledge distillation (KD-ReID) that is suitable for real-time person ReID applications. Our method adapts a common lightweight student backbone CNN over the target domains by alternately distilling from multiple specialized teacher CNNs, each one adapted on data from a specific target domain. Extensive experiments conducted on several challenging person ReID datasets indicate that our approach outperforms state-of-the-art methods for MTDA, including blending methods, particularly when training a compact CNN backbone like OSNet. Results suggest that our flexible MTDA approach can be employed to design cost-effective ReID systems for real-time video surveillance applications.
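The idea of alternately distilling one student from several teachers can be sketched as follows. This is an illustration only: the abstract does not state which distillation loss KD-ReID uses, so the sketch assumes the classic temperature-softened KL divergence on output scores, and the round-robin schedule, the function names and the domain labels are all hypothetical.

```python
import math
from itertools import cycle
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: a generic knowledge-distillation loss, not necessarily
    the loss used in the KD-ReID paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Clamp q to avoid division by zero if a probability underflows.
    kl = sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


def alternating_schedule(teacher_names: Sequence[str], num_steps: int) -> List[str]:
    """Round-robin order in which teachers supervise the student."""
    order = cycle(teacher_names)
    return [next(order) for _ in range(num_steps)]


# Hypothetical target-domain labels; each would map to one adapted teacher.
for step, teacher in enumerate(alternating_schedule(["domain_a", "domain_b"], 4)):
    # In a real training loop the student would be updated here using
    # distillation_loss() on a batch from this teacher's target domain.
    print(step, teacher)
```

In this arrangement the student sees every target domain in turn rather than a single blended dataset, which is the contrast the abstract draws with blending-based MTDA.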
In recent applications, a modern object recognition model is available together with the video encoder. In this work, an adaptive bitrate control algorithm is proposed using a You Only Look Once v8 (YOLOv8) model for ...
ISBN: 9798350365474 (print)
Video quality assessment, especially for a massive scale of user-generated content (UGC), is an essential yet challenging computer vision and video analysis problem. Prior methods have been shown to be effective in mirroring subjective human opinion scores; however, they fail to capture the complicated, multi-dimensional factors that impact the overall perceptual quality. In this paper, we introduce COVER, a comprehensive video quality evaluator: a novel framework designed to evaluate video quality holistically from technical, aesthetic, and semantic perspectives. Specifically, COVER leverages three parallel branches: (1) a Swin Transformer backbone applied to spatially sampled crops to predict technical quality; (2) a ConvNet applied to subsampled frames to derive aesthetic quality; (3) a CLIP image encoder applied to resized frames to obtain semantic quality. We further propose a simplified cross-gating block that lets the three branches interact before feeding into the prediction head. The final quality score is obtained as a weighted sum of the sub-scores, yielding a multi-faceted metric. Our experimental results demonstrate that COVER outperforms state-of-the-art models on multiple UGC video quality datasets. Moreover, COVER offers a diagnosable quality report that explains the quality score along multiple pillars, and it can process 1080p videos 3x faster than the real-time requirement. To facilitate future research on efficient and explainable video quality assessment, the code is available at https://***/vztu/COVER.
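The final fusion step described above (a weighted sum of the technical, aesthetic and semantic sub-scores, together with a per-branch report) can be sketched as below. The class and function names are hypothetical, and the weights in the example are placeholders: the abstract does not give the weights COVER actually uses, and this sketch does not reproduce its cross-gating block.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SubScores:
    """Per-branch quality predictions for one video (illustrative container)."""
    technical: float
    aesthetic: float
    semantic: float


def fuse(scores: SubScores,
         weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three sub-scores.

    Assumption: `weights` is ordered (technical, aesthetic, semantic)
    and its values are placeholders, not the paper's settings.
    """
    w_t, w_a, w_s = weights
    return w_t * scores.technical + w_a * scores.aesthetic + w_s * scores.semantic


def quality_report(scores: SubScores,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> Dict[str, float]:
    """Break the overall score down into each branch's contribution."""
    w_t, w_a, w_s = weights
    return {
        "technical": w_t * scores.technical,
        "aesthetic": w_a * scores.aesthetic,
        "semantic": w_s * scores.semantic,
        "overall": fuse(scores, weights),
    }
```

Keeping the per-branch contributions next to the overall score is what makes the result diagnosable: a low overall value can be traced to the branch responsible for it.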
With the development of artificial intelligence technology, urban traffic management has become increasingly convenient, and the task of illegal parking detection has become a major research focus. Currently, most ill...
Unmanned aerial vehicles (UAVs) offer simple operation, sensitive response, flexible flight, long battery life and low cost, and have become a conventional means of power inspection. However, the video sign...
Drone usage is increasing significantly in our daily life, from military to delivery purposes. Although drones are also used to detect objects by using different techniques, they are limited to detecting flying small ...