ISBN: 9781424444625 (print)
Robust foreground detection is a fundamental precursor to many video processing applications. Although various approaches have been proposed, several factors still make detection challenging: 1) dynamic backgrounds with gradual brightness changes, camera movement, and large amounts of noise; 2) sharp illumination changes caused by shadows, lights switching on and off, and so on; 3) the real-time requirements of practical systems. To overcome these problems, this paper proposes a new approach. It builds on the background model of the conventional Gaussian Mixture Model (GMM) and incorporates temporal-spatial consistency validation to find genuine foreground seeds, so that foreground segments can be reliably acquired using a region growing method. Experiments demonstrate that the approach outperforms the conventional GMM approach in detection accuracy, adaptability to sudden illumination changes, and computation time.
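The seed-and-grow step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how seeds are validated or how regions are grown, so the function name `grow_foreground`, the binary `candidate` mask (pixels a background model such as a GMM flagged as foreground), and the 4-connected growth rule are all assumptions made for illustration.

```python
from collections import deque


def grow_foreground(candidate, seeds):
    """Keep only candidate pixels that are 4-connected to a validated seed.

    candidate: 2D list of 0/1 values (hypothetical output of a background model).
    seeds: iterable of (row, col) pixels assumed to have passed validation.
    Returns a 2D list of 0/1 values with the grown foreground regions.
    """
    rows, cols = len(candidate), len(candidate[0])
    grown = [[0] * cols for _ in range(rows)]
    queue = deque()

    # Start from every seed that is also a candidate pixel.
    for r, c in seeds:
        if candidate[r][c] and not grown[r][c]:
            grown[r][c] = 1
            queue.append((r, c))

    # Breadth-first growth over 4-connected candidate neighbours.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and candidate[nr][nc] and not grown[nr][nc]):
                grown[nr][nc] = 1
                queue.append((nr, nc))
    return grown


if __name__ == "__main__":
    mask = [
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
    ]
    for row in grow_foreground(mask, [(0, 0)]):
        print(row)
```

In this sketch, candidate pixels that no seed can reach (the blob in the right-hand column of the example) are discarded, which is one way unvalidated detections caused by noise or illumination changes could be suppressed.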
Error control is an important aspect of video coding and transmission. In this paper, we study error control for MGS-based bitstreams. Our method not only duplicates the key data of the MGS slice to improve the quality of ...
The nonsubsampled contourlet transform (NSCT) provides a flexible multiresolution, anisotropic, and directional expansion for images. Compared with the original contourlet transform, it is shift-invariant and can overcome the pseudo-Gibbs phenomena around singularities. Fuzzy logic is an efficient intelligent method for handling uncertain information. In this paper, a novel image fusion algorithm based on the NSCT and fuzzy logic is proposed. Extensive experiments show that the proposed method improves both subjective and objective results compared with some other fusion approaches.
A novel objective quality measure for image fusion, based on structural similarity and a visual attention mechanism (VAM), is presented. By giving higher weight to the salient areas of the input images, the quality measure can estimate how much visually meaningful information is preserved in the fused image. Correlation analysis between the objective measure and subjective evaluation shows that the measure is more consistent with human subjective evaluation.
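The idea of weighting salient areas more heavily can be sketched as a saliency-weighted average of local similarity scores. The abstract does not give the actual formula, so everything here is an assumption for illustration: the function name `weighted_fusion_quality`, the per-window similarity scores `sim_a` and `sim_b` (similarity between each input image and the fused image), and the per-window saliency values `sal_a` and `sal_b`.

```python
def weighted_fusion_quality(sim_a, sim_b, sal_a, sal_b):
    """Saliency-weighted average of local similarity scores (illustrative only).

    sim_a, sim_b: per-window similarity between input A (or B) and the fused image.
    sal_a, sal_b: per-window non-negative saliency of input A (or B).
    All four sequences are assumed to have the same length.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for qa, qb, sa, sb in zip(sim_a, sim_b, sal_a, sal_b):
        # Windows that are salient in an input count more towards the score.
        weighted_sum += sa * qa + sb * qb
        total_weight += sa + sb
    if total_weight == 0:
        raise ValueError("saliency weights sum to zero")
    return weighted_sum / total_weight


if __name__ == "__main__":
    score = weighted_fusion_quality(
        sim_a=[1.0, 0.5], sim_b=[0.0, 0.5],
        sal_a=[3.0, 1.0], sal_b=[1.0, 1.0],
    )
    print(round(score, 4))
```

With uniform saliency the score reduces to the plain mean of the similarity values, so the weighting only matters where the inputs differ in how salient a region is.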
Medium Grain Scalable (MGS) coding is widely used as the quality-scalable coding method in scalable video coding (SVC). In this paper, we study a bitstream extractor for MGS-based bitstreams. In MGS, the coded data corresponding to a quantization step size can be fragmented into at most 15 sub-layers, called MGS slices. The contribution of each slice to video quality is evaluated by our proposed algorithm, and, based on the importance of the MGS slices, an optimal extraction is proposed to obtain the best video quality. Compared with the conventional SVC method, the proposed method both expands the range of supported bitrates and improves quality.
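Importance-driven extraction can be illustrated with a simple greedy sketch. This is not the extraction algorithm of the paper, whose details the abstract does not give. The function `extract_slices`, the `(slice_id, size_bits, quality_gain)` tuples, and the gain-per-bit ranking are hypothetical, and the sketch ignores any decoding dependencies between slices.

```python
def extract_slices(slices, budget_bits):
    """Greedily keep the slices with the highest quality gain per bit.

    slices: list of (slice_id, size_bits, quality_gain) with size_bits > 0.
            The quality_gain values are assumed to come from some importance
            estimate; how they are computed is outside this sketch.
    budget_bits: total number of bits the extracted sub-stream may use.
    Returns (kept_ids, used_bits).
    """
    ranked = sorted(slices, key=lambda s: s[2] / s[1], reverse=True)
    kept_ids, used_bits = [], 0
    for slice_id, size_bits, _gain in ranked:
        # Skip slices that do not fit; smaller ones later may still fit.
        if used_bits + size_bits <= budget_bits:
            kept_ids.append(slice_id)
            used_bits += size_bits
    return kept_ids, used_bits


if __name__ == "__main__":
    demo = [("a", 100, 5.0), ("b", 50, 4.0), ("c", 80, 1.0)]
    print(extract_slices(demo, budget_bits=160))
```

Because every slice is an independent candidate here, the set of reachable bitrates is finer than with whole-layer truncation, which is one plausible reading of the wider bitrate range the abstract reports.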
Current methods for image-text retrieval commonly propose various fusion modules to achieve robust visual-textual alignment, primarily relying on in-batch learning to guide the matching process. Some follow-up methods seek to enlarge the number of negative samples to boost image-text contrastive learning. However, these methods often face challenges posed by semantic-consistent negatives, i.e., negative samples that share correspondence with the ground truth, which leads to confusion in learning cross-modal semantics. To address this issue, we propose a novel Retrieve with Authentic negative repository Learning (ReAL) method, which constructs a specific Authentic Negative Repository filled with valuable negative sample pairs. By introducing a Unique Negative Filter with a Discriminative Triplet Ranking Loss, ReAL effectively filters out the semantic-consistent negatives through similarity distribution analysis and threshold learning. Moreover, existing fusion paradigms suffer from the intricate use of fine-grained representations from word- and region-level instances to progressively refine the fused embedding. In this paper, we propose a lightweight Cluster Refinement Module to exploit cross-modal semantics in a 1-way-1-out paradigm. Each visual-textual alignment can spontaneously uncover correlations with adjacent alignments through aggregation and re-allocation, without the need for a redundant and cost-inefficient refinement stage. Furthermore, ReAL employs dual momentum encoders with two memory banks, expanding the selection range of the Authentic Negative Repository to include a broader set of negatives. Extensive experiments conducted on Flickr30K, MS-COCO, and the augmented Flickr30K (with more hard negatives) demonstrate the superiority and robustness of ReAL, while also showcasing its significantly reduced inference time compared to other competitive baselines.
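The effect of filtering semantic-consistent negatives before a triplet ranking loss can be shown with a small sketch. It is not ReAL's Unique Negative Filter or its Discriminative Triplet Ranking Loss: the abstract says the threshold is learned from similarity distributions, whereas this sketch uses a fixed, hypothetical `threshold`. The function name `filtered_triplet_loss` and the `margin` value are likewise assumptions.

```python
def filtered_triplet_loss(pos_sim, neg_sims, margin=0.2, threshold=0.9):
    """Hinge-based triplet ranking loss over the hardest remaining negative.

    pos_sim: similarity of the matched image-text pair.
    neg_sims: similarities of candidate negative pairs.
    threshold: negatives at or above this similarity are treated as
               semantic-consistent and dropped (fixed here; an assumption).
    """
    kept = [s for s in neg_sims if s < threshold]
    if not kept:
        return 0.0  # no usable negatives, so no ranking penalty
    hardest = max(kept)
    return max(0.0, margin + hardest - pos_sim)


if __name__ == "__main__":
    negatives = [0.95, 0.7, 0.3]
    # With filtering, the 0.95 negative is dropped before picking the hardest.
    print(filtered_triplet_loss(0.8, negatives))
    # With the filter effectively disabled, 0.95 dominates the loss.
    print(filtered_triplet_loss(0.8, negatives, threshold=2.0))
```

Without the filter, a negative that is nearly as similar as the ground truth dominates the loss and pushes a semantically valid pair apart, which is the confusion the abstract describes.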