Current video object segmentation approaches primarily rely on frame-wise appearance information to perform matching. Despite significant progress, reliable matching becomes challenging due to rapid changes of the object's appearance over time. Moreover, previous matching mechanisms suffer from redundant computation and noise interference as the number of accumulated frames increases. In this paper, we introduce a multi-frame spatio-temporal context memory (STCM) network to exploit discriminative spatio-temporal cues in multiple adjacent frames by utilizing a multi-frame context interaction module (MCI) for memory construction. Based on the proposed MCI module, a sparse group memory reader is developed to enable efficient sparse matching during memory reading. Our proposed method is generic and achieves state-of-the-art performance with real-time speed on benchmark datasets such as DAVIS and YouTube-VOS. In addition, our model exhibits robustness to sparse videos with low frame rates.
As a model of cross-media intelligence that combines computer vision and natural language processing, video semantic annotation facilitates automatic location of events in videos and describes video content in natural...
The latest video coding standard, Versatile Video Coding (VVC), uses new coding tools to greatly improve compression efficiency. However, the adaptive QP module in the coding framework ignores the charact...
Balancing accuracy and speed is crucial for semantic segmentation in autonomous driving. While various mechanisms have been explored to enhance segmentation accuracy in lightweight deep learning networks, adding more mechanisms does not always lead to better performance and often significantly increases processing time. This paper investigates a more effective and efficient integration of three key mechanisms (context, attention, and boundary) to improve real-time semantic segmentation of road scene images. Based on an analysis of recent fully convolutional encoder-decoder networks, we propose a novel Scale-adaptive Attention and Boundary Aware (SABA) segmentation network. SABA enhances context through a new pyramid structure with multi-scale residual learning, refines attention via scale-adaptive spatial relationships, and improves boundary delineation using progressive refinement with a dedicated loss function and learnable weights. Evaluations on the Cityscapes benchmark show that SABA outperforms current real-time semantic segmentation networks, achieving a mean intersection over union (mIoU) of up to 76.7% and improving accuracy for 17 out of 19 object classes. Moreover, it achieves this accuracy at an inference speed of up to 83.4 frames per second, significantly exceeding real-time video frame rates. The code is available at https://***/liuchunyan66/SABA.
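For readers unfamiliar with the mIoU metric quoted in this abstract, the following is a minimal sketch of how it is commonly computed from flattened label maps. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; this is not the SABA evaluation code, and benchmark toolkits may handle ignored labels differently.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union over classes present in pred or target.

    pred and target are flattened per-pixel class labels of equal length.
    """
    ious: List[float] = []
    for c in range(num_classes):
        # Pixels labelled c in both maps.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both maps are skipped
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```

With 19 classes, as on Cityscapes, the per-class IoU values collected in `ious` are the quantities behind a statement such as "improving accuracy for 17 out of 19 object classes".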
Guided upsampling is an effective approach for accelerating high-resolution image processing. In this paper, we propose a simple yet effective guided upsampling method. Each pixel in the high-resolution image is represented as a linear interpolation of two low-resolution pixels, whose indices and weights are optimized to minimize the upsampling error. The downsampling can be jointly optimized in order to prevent missing small isolated regions. Our method can be derived from the color line model and local color transformations. Compared to previous methods, our method can better preserve detail effects while suppressing artifacts such as bleeding and blurring. It is efficient, easy to implement, and free of sensitive parameters. We evaluate the proposed method with a wide range of image operators, and show its advantages through quantitative and qualitative analysis. We demonstrate the advantages of our method for both interactive image editing and real-time high-resolution video processing. In particular, for interactive editing, the joint optimization can be precomputed, thus allowing for instant feedback without hardware acceleration.
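The core representation described in this abstract, each high-resolution pixel as a weighted mix of two low-resolution pixels, can be illustrated with a toy grayscale sketch. Everything below is an assumption for illustration: the names `fit_pixel` and `upsample` are hypothetical, the brute-force search over candidate pairs stands in for whatever optimization the authors use, and the joint downsampling optimization is not covered.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Mapping = Tuple[int, int, float]  # (index i, index j, weight w on candidate i)


def fit_pixel(value: float, candidates: Sequence[float]) -> Mapping:
    """Choose two candidate low-resolution pixels and a weight w in [0, 1]
    so that w * candidates[i] + (1 - w) * candidates[j] approximates value."""
    best: Mapping = (0, 0, 1.0)
    best_err = abs(value - candidates[0])
    for i, j in combinations(range(len(candidates)), 2):
        a, b = candidates[i], candidates[j]
        # Closed-form weight for a 1D value, clamped to the valid range.
        w = 1.0 if a == b else min(1.0, max(0.0, (value - b) / (a - b)))
        err = abs(value - (w * a + (1.0 - w) * b))
        if err < best_err:
            best, best_err = (i, j, w), err
    return best


def upsample(low_output: Sequence[float], mapping: Sequence[Mapping]) -> List[float]:
    """Apply indices and weights fitted on the input to the processed low-res output."""
    return [w * low_output[i] + (1.0 - w) * low_output[j] for i, j, w in mapping]
```

In this sketch the mapping is fitted once on the unprocessed high-resolution and low-resolution inputs and then reused for any operator output computed at low resolution, which mirrors the precomputation the abstract mentions for interactive editing.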
360° video streaming is one of the prevalent communication technologies for enhancing user experience and has thus seen widespread adoption in virtual and mixed reality applications. However, delivering content a...
The increasing popularity of video streaming services and the widespread accessibility of high-speed internet underscore the importance of delivering cost-effective and seamless streaming experiences. Shared internet connections may lead to varying speeds, impacting Quality of Experience (QoE). Rate adaptation techniques aim to ensure smooth video transmission, but overly optimistic adaptations can compromise user experience. Objective video quality assessment is crucial for efficient rate adaptation to ensure smooth QoE. This research proposes a novel method incorporating temporal channel shifting into Convolutional Neural Networks (CNN) for video quality assessment while maintaining the computational simplicity of a 2D CNN model. The proposed approach relies on the EfficientNet architecture, initially pre-trained on quality-aware images, and fine-tunes it using datasets of rate-adaptive videos. The model is trained and evaluated on two benchmark datasets, namely "Waterloo sQoE III" and "LIVE Netflix II," which consist of rate-adaptive videos annotated with subjective quality scores. Experimental results encompass the evaluation of Pearson, Spearman, and Kendall correlation coefficients, along with the computation time ratio for the proposed approach. The outcomes reveal competitive scores of 0.795, 0.652, 0.772, and 0.216 for the "LIVE Netflix II" dataset and 0.782, 0.713, 0.721, and 0.230 for the "Waterloo sQoE III" dataset. Our proposed method, compared to 24 approaches for "Waterloo sQoE III" and 25 for "LIVE Netflix II," attains the highest correlation scores while maintaining near-real-time processing efficiency. These results affirm the efficacy of our approach in accurately predicting human judgment (QoE) with computational efficiency.
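The idea of temporal channel shifting can be sketched in a few lines: a fraction of each frame's feature channels is swapped with the neighbouring frames so that an otherwise per-frame 2D CNN sees some temporal context at no extra multiply cost. The sketch below follows the general temporal shift pattern; the function name `temporal_shift`, the `fold_div` parameter, the zero padding at clip boundaries, and the one-value-per-channel layout are assumptions for illustration, since the abstract does not state the shift proportions or where the shift is placed inside EfficientNet.

```python
from typing import List


def temporal_shift(frames: List[List[float]], fold_div: int = 4) -> List[List[float]]:
    """Shift a fraction of channels along the time axis.

    frames[t][c] holds the value of channel c at frame t (spatial dimensions
    are omitted to keep the sketch small). The first 1/fold_div of the
    channels is taken from the next frame, the second 1/fold_div from the
    previous frame, and the remaining channels are left in place.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1  # pull from the future frame
            elif c < 2 * fold:
                src = t - 1  # pull from the past frame
            else:
                src = t      # unshifted channel
            if 0 <= src < num_frames:  # out-of-range sources stay zero
                shifted[t][c] = frames[src][c]
    return shifted
```

Because the operation only moves values, the per-frame convolutions that follow keep the cost of a 2D model, which is consistent with the efficiency claim in the abstract.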
In order to solve the problem of speed matching between image data output and acquisition, it is convenient to provide simple and flexible transmission for high-speed digital cameras and image acquisition cards. This ...
To address the problem of high latency in H.264/H.265 video transmission schemes, research was conducted on the design optimization scheme for H.264/H.265 low latency transmission based on the analysis of video ...
ISBN: (print) 9798331529543; 9798331529550
Counterfeit medicines present a severe public health threat, especially in low-resource countries where consumers lack reliable means to verify the medicines they purchase. Visual inspection of medicine packaging images through keypoint matching techniques offers a promising approach for detecting design inconsistencies that could indicate counterfeit products. However, conventional methods often struggle with high computational costs and reduced accuracy when processing images of varying quality and perspectives. To address these limitations, we propose the Angle and Scale Voting (ASVote) method, which enhances keypoint-based image matching by introducing a 2D voting mechanism that leverages relative angles and scales of the keypoints to eliminate false matches (outliers) while identifying consistent matches (inliers). Experiments on a real-world dataset of medicine packages show that ASVote significantly improves both processing time and accuracy, outperforming conventional methods.
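A 2D vote over relative angle and scale can be sketched as a small histogram: each tentative match votes for the bin given by its orientation difference and log scale ratio, and the matches in the most popular bin are kept as inliers. This is only an illustration of the general idea under stated assumptions. The function name `vote_inliers`, the bin sizes, the hard binning, and the single-peak selection rule are all hypothetical and are not taken from the ASVote paper.

```python
import math
from collections import Counter
from typing import List, Tuple

Keypoint = Tuple[float, float]  # (orientation in degrees, scale)
Match = Tuple[Keypoint, Keypoint]  # (keypoint in image A, keypoint in image B)


def vote_inliers(matches: List[Match],
                 angle_bin_deg: float = 30.0,
                 scale_bin_log2: float = 1.0) -> List[int]:
    """Return indices of matches that fall in the most-voted (angle, scale) bin."""
    bins: List[Tuple[int, int]] = []
    for (angle_a, scale_a), (angle_b, scale_b) in matches:
        # Relative rotation, wrapped into [0, 360).
        d_angle = (angle_b - angle_a) % 360.0
        # Relative scale on a log axis so doubling and halving are symmetric.
        d_scale = math.log2(scale_b / scale_a)
        bins.append((int(d_angle // angle_bin_deg),
                     math.floor(d_scale / scale_bin_log2)))
    if not bins:
        return []
    peak, _ = Counter(bins).most_common(1)[0]
    return [i for i, b in enumerate(bins) if b == peak]
```

Matches that agree on one global rotation and scale change cluster in a single bin, while false matches scatter across the histogram. A practical version would also need to handle votes that land near bin edges, which this hard-binned sketch ignores.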