In this paper, a new background subtraction framework is proposed to deal with possible scenarios occurring in natural scenes. In this method, a combination of two feature descriptors, namely color information in HSV ...
Robust visual tracking is a challenging task due to factors such as motion blur, fast motion, partial occlusion and illumination variation. Existing tracking algorithms represent a target candidate by templates or a linear co...
ISBN (print): 9781509053162
High Efficiency Video Coding (HEVC) defines 35 prediction modes in its intra-prediction stage to signal the directional information of residual blocks. Traditionally, separable two-dimensional (2-D) transforms (integer DCT and DST) are used in the same manner as in the earlier H.264/AVC standard. However, such 2-D transforms cannot achieve the best energy compaction for a 2-D directional source whose dominant direction is neither horizontal nor vertical. To overcome this drawback, in this paper we build an elliptical model with directionality and design non-separable transforms based on the Karhunen-Loeve transform (KLT). Specifically, we derive a closed-form non-separable transform for each intra-prediction mode and use it to replace the default transform in HEVC. Simulation results show average BD-rate reductions of 1.7% and 2.0%, and up to 7.7% and 8.1%, for the luma and chroma components, respectively. Meanwhile, both encoding and decoding times increase by only about 5%.
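The claim that a KLT derived from a directional model compacts energy better than the separable DCT can be illustrated numerically. The following is a minimal sketch, not the paper's closed-form derivation. It assumes a hypothetical elliptical correlation model r = rho^sqrt((u/a)^2 + (v/b)^2), where (u, v) is the pixel offset rotated to the prediction angle theta. The values of `rho`, `a`, `b` and all function names are illustrative. The KLT basis is taken as the eigenvectors of the block covariance.

```python
import numpy as np

def directional_covariance(n, theta, rho=0.95, a=1.0, b=0.25):
    """Covariance of a row-major n x n block under a hypothetical elliptical
    model: correlation decays slowly along angle theta (scale a) and quickly
    across it (scale b)."""
    coords = np.array([(i, j) for i in range(n) for j in range(n)], dtype=float)
    d = coords[:, None, :] - coords[None, :, :]      # pairwise offsets (dy, dx)
    c, s = np.cos(theta), np.sin(theta)
    u = d[..., 1] * c + d[..., 0] * s                 # offset along the direction
    v = -d[..., 1] * s + d[..., 0] * c                # offset across the direction
    return rho ** np.sqrt((u / a) ** 2 + (v / b) ** 2)

def klt_basis(cov):
    """KLT basis vectors (columns), sorted by decreasing variance."""
    w, vecs = np.linalg.eigh(cov)
    return vecs[:, np.argsort(w)[::-1]]

def dct_matrix(n):
    """Orthonormal 1-D DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def compaction(basis, cov, keep):
    """Fraction of total energy held by the `keep` largest-variance coefficients."""
    var = np.sort(np.diag(basis.T @ cov @ basis))[::-1]
    return var[:keep].sum() / var.sum()

# Usage: a 4x4 block with a 45-degree dominant direction.
n = 4
cov = directional_covariance(n, theta=np.pi / 4)
T = np.kron(dct_matrix(n), dct_matrix(n))    # separable 2-D DCT on a row-major block
print(compaction(klt_basis(cov), cov, 4), compaction(T.T, cov, 4))
```

By construction, the KLT captures at least as much energy in its top coefficients as any orthonormal basis, including the separable DCT. The gap between the two grows as the model's direction moves away from horizontal or vertical.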
ISBN (print): 9781509053162
Segment propagation transfers object priors among images and is an important way of generating priors for image segmentation. Existing propagation methods focus on propagating the object foreground, while propagation of detailed parts is lacking. Part propagation is harder because not only the multiple part regions but also their relationships must be transferred. In this paper, a part propagation method is proposed. Two levels of propagation, object-level and part-level, are applied successively. Object-level propagation transfers global shape information among images; it is formulated as a graph-matching-based edge-fragment matching problem and solved by dynamic programming. Part-level propagation transfers the more detailed part labels; it is formulated as a pixel-level structure matching problem and solved efficiently by traditional dense pixel matching methods. The proposed method is evaluated on 15 challenging classes selected from the PASCAL 2010 dataset, the Bird dataset and the Cat-Dog dataset. The experimental results demonstrate the effectiveness of the proposed method.
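The abstract states only that object-level edge-fragment matching is solved with dynamic programming. The following is a minimal sketch of one plausible form of that step, an order-preserving alignment of two fragment sequences. It assumes each contour is summarized as an ordered list of hypothetical scalar fragment descriptors, such as orientation angles. A fragment is either matched at cost |a_i - b_j| or skipped at a fixed `skip_cost`. The function name and cost model are illustrative, not the paper's graph-matching formulation.

```python
def align_fragments(a, b, skip_cost=1.0):
    """Order-preserving DP alignment of two fragment-descriptor sequences.

    cost[i][j] is the minimal cost of aligning a[:i] with b[:j]. Each fragment
    is either matched (cost |a_i - b_j|) or left unmatched (cost skip_cost).
    Returns (total_cost, list of matched index pairs).
    """
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * skip_cost
    for j in range(1, m + 1):
        cost[0][j] = j * skip_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match a_i with b_j
                cost[i - 1][j] + skip_cost,                      # leave a_i unmatched
                cost[i][j - 1] + skip_cost,                      # leave b_j unmatched
            )
    # Backtrack through the table to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + skip_cost:
            i -= 1
        else:
            j -= 1
    return cost[n][m], pairs[::-1]

print(align_fragments([0.0, 1.0, 2.0], [0.0, 2.0]))  # → (1.0, [(0, 0), (2, 1)])
```

The ordering constraint is what makes dynamic programming applicable: fragments along one contour are matched monotonically to fragments along the other, which keeps the search to O(nm).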
ISBN (print): 9781509053162
This paper proposes a DC coefficient estimation algorithm for intra-predicted residual blocks in the High Efficiency Video Coding (HEVC) standard. Discarding the DC coefficient of the current coding block saves a substantial number of bits, but it also produces strong discontinuities between this block and its neighboring reconstructed blocks. To overcome this problem, we propose an algorithm that estimates the DC coefficient by computing, in closed form, an optimal pixel-domain offset that restores the corresponding block edges. Test results show that, compared with HM-16.6, our algorithm achieves average BD-rate reductions of 1.0% and 1.4% for luma and chroma, respectively, when the sign-bit-hiding (SBH) technique is disabled. When SBH is enabled, as in the common test conditions (CTC), the BD-rate reductions drop slightly to 0.7% and 1.1% for luma and chroma, respectively. Meanwhile, both encoding and decoding times increase only slightly (about 10%, without any special optimization of the implementation of the proposed algorithm).
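A closed-form offset of this kind can be illustrated with a least-squares edge criterion. The following is a minimal sketch that assumes the objective is the squared jump across the upper and left block edges:

J(d) = Σ_j (top_j − (x_{0,j} + d))² + Σ_i (left_i − (x_{i,0} + d))²

Setting dJ/dd = 0 gives d* equal to the mean difference between the neighboring reconstructed pixels and the block's boundary pixels. The paper's exact cost function is not given in the abstract, and the function names here are hypothetical. For an orthonormal 2-D DCT of an N×N block, a constant offset d corresponds to a DC coefficient of N·d. HEVC's integer transform adds its own scaling.

```python
def optimal_dc_offset(block, top, left):
    """Closed-form offset d minimizing the squared jump across the top and left
    edges: sum((top[j] - (block[0][j] + d))**2) + sum((left[i] - (block[i][0] + d))**2).
    Setting the derivative to zero gives the mean of the edge differences."""
    diffs = [t - v for t, v in zip(top, block[0])]        # upper edge
    diffs += [l - row[0] for l, row in zip(left, block)]  # left edge
    return sum(diffs) / len(diffs)

def apply_offset(block, d):
    """Add the same offset to every pixel, which is the effect of a DC coefficient."""
    return [[v + d for v in row] for row in block]

block = [[10, 10], [10, 10]]   # residual-reconstructed block without its DC term
print(optimal_dc_offset(block, top=[12, 14], left=[10, 16]))  # → 3.0
```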
ISBN (print): 9781509053162
Repairing co-segmentation results through consistency evaluation has been shown to improve co-segmentation performance. However, existing co-segmentation refinement methods focus on color features and ignore repair based on mid-level features such as shape. In this paper, we propose a new shape-based co-segmentation refinement method. We first propose an edge-map-based segment completeness evaluation and a shape-based segment consistency evaluation. Then, we use the initial segments and their evaluation scores to refine each result with object proposals. Final refined results are obtained by repeating these evaluation and refinement steps. Traditional methods repair only the poor segments. Our method instead evaluates and refines all segments simultaneously in an iterative process to achieve better results. We evaluate our method on the iCoseg dataset, where the refined results achieve higher IoU values than the original results.
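The results are reported as IoU (intersection over union) between segmentation masks. The following is a minimal sketch of that metric on binary masks, paired with a hypothetical skeleton of the evaluate-and-refine loop. In the loop, `evaluate` and `refine` are placeholders for the paper's completeness/consistency scoring and proposal-based refinement, neither of which is specified in the abstract.

```python
from typing import Callable, List, Sequence

Mask = List[List[int]]  # binary mask: 1 = foreground, 0 = background

def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two same-sized binary masks
    (defined here as 1.0 when both masks are empty)."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0

def iterate_refinement(segments: List[Mask],
                       evaluate: Callable[[List[Mask]], Sequence[float]],
                       refine: Callable[[List[Mask], Sequence[float]], List[Mask]],
                       max_iters: int = 5) -> List[Mask]:
    """Hypothetical loop: every segment is re-scored and refined in each pass
    (not only the worst one); stops early when nothing changes."""
    for _ in range(max_iters):
        scores = evaluate(segments)
        new_segments = refine(segments, scores)
        if new_segments == segments:
            break
        segments = new_segments
    return segments

print(iou([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]))  # → 0.5
```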
In this paper, we propose a task estimation method based on multiple subspaces extracted from multimodal information. The two modalities are image objects in visual scenes and spoken words in dialogue that appear in the same task. The subspaces are obtained using latent semantic analysis (LSA). In the proposed method, a task vector composed of the appearance frequencies of spoken words and image objects is first extracted. The similarities between the input task vector and the reference subspaces of different tasks are then compared. Experiments are conducted on the identification of game tasks. The results show that the proposed method using multimodal information outperforms methods that use only a single modality (image or spoken dialogue). The proposed method also remains accurate when less spoken dialogue is available.
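The subspace comparison can be sketched as follows. This minimal example assumes that each task's reference subspace is spanned by the top-`rank` left singular vectors of its (features × samples) count matrix, which is the LSA step. It also assumes that similarity is the cosine of the angle between the input vector and that subspace, ||Uᵀv|| / ||v||. The four features (two hypothetical spoken words, two hypothetical image-object labels), the task names and the counts are all made up for illustration.

```python
import numpy as np

def task_subspace(samples, rank):
    """LSA-style subspace: top-`rank` left singular vectors of the
    (features x samples) count matrix for one task."""
    X = np.asarray(samples, dtype=float).T            # columns = training task vectors
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]

def subspace_similarity(v, U):
    """Cosine of the angle between v and the span of U's orthonormal columns."""
    v = np.asarray(v, dtype=float)
    return np.linalg.norm(U.T @ v) / np.linalg.norm(v)

def estimate_task(v, subspaces):
    """Name of the task whose reference subspace is most similar to v."""
    return max(subspaces, key=lambda name: subspace_similarity(v, subspaces[name]))

# Hypothetical features: [word "jump", word "shoot", object "ball", object "goal"]
refs = {
    "soccer": task_subspace([[0, 1, 3, 2], [1, 2, 4, 1], [0, 1, 2, 2]], rank=1),
    "platformer": task_subspace([[3, 0, 0, 0], [4, 1, 1, 0], [2, 0, 0, 0]], rank=1),
}
print(estimate_task([0, 1, 2, 1], refs))  # → soccer
```

Concatenating both modalities into one vector lets a task be recognized from image objects alone when little dialogue is present, which is consistent with the robustness the abstract reports.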
ISBN (print): 9780819482341
We focus on the problem of single-image super-resolution in this paper. Given a low-resolution image, we seek to synthesize its underlying high-resolution details using a learning-based method. Inspired by recent progress in compressive sensing, we use a sparse representation prior to regularize this ill-posed problem. In addition, taking natural image statistics into consideration, we enforce the prior only on image patches associated with image primitives rather than on arbitrary patches. Specifically, each patch from the primitive layer of the low-resolution image can be viewed as a low-dimensional projection of a high-resolution primitive patch. Each such patch is assumed to have a sparse representation with respect to an over-complete dictionary. Under mild conditions, compressive sensing theory guarantees that the sparse representation can be correctly recovered from the low-dimensional projection. We also construct a dictionary from image primitive patches that works well on generic input images. Experimental results show that our method outperforms other learning-based methods both subjectively and objectively.
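The compressive-sensing argument can be illustrated as follows: recover a sparse code from a low-dimensional projection, then apply it to the high-resolution dictionary. This is a minimal sketch under stated assumptions, not the paper's method. A random Gaussian over-complete dictionary stands in for the learned primitive dictionary, and a random projection `P` stands in for the unspecified degradation operator. Orthogonal matching pursuit (OMP) is swapped in as the sparse solver. All sizes and names are hypothetical.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily pick k columns (atoms) of A,
    refitting y by least squares on the selected atoms after each pick.
    Returns the sparse coefficient vector."""
    norms = np.linalg.norm(A, axis=0)                 # normalize correlations per atom
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        idx = int(np.argmax(np.abs(A.T @ residual) / norms))
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n_high, n_low, n_atoms, k = 64, 48, 96, 2           # hypothetical sizes (8x8 HR patches)
D_high = rng.standard_normal((n_high, n_atoms))      # stand-in over-complete HR dictionary
D_high /= np.linalg.norm(D_high, axis=0)
P = rng.standard_normal((n_low, n_high)) / np.sqrt(n_low)  # stand-in projection
D_low = P @ D_high                                    # coupled LR dictionary

alpha = np.zeros(n_atoms)
alpha[[5, 37]] = [1.5, -2.0]                          # ground-truth sparse code
y_low = D_low @ alpha                                 # observed LR primitive patch
alpha_hat = omp(D_low, y_low, k)                      # recover the code from LR data only
x_high = D_high @ alpha_hat                           # synthesized HR patch
```

Because `D_low = P @ D_high`, a code that explains the low-resolution patch also reconstructs the high-resolution patch once it is recovered exactly. Exact recovery is the "mild conditions" guarantee that compressive sensing provides for sufficiently sparse codes.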
ISBN (print): 9780819469946
An image can be regarded as a collection of small regions. Most research on image understanding extracts features from these regions and investigates the relationships between the regions and manually annotated image keywords. Some studies also explore the ontology of words. However, little attention has been paid to the relationships among regions within an image. In this paper, we study these relationships closely for visual content understanding, without assuming that the regions are independent. We first analyze the co-occurrence of regions using a statistical relevance probability (SRP) model. When a person perceives an image, attention first focuses on one region and then moves on to other relevant regions. We therefore propose a novel region sequence prediction (RSP) model to describe this process. In RSP, annotation keywords for the image's region sequences, together with their probabilities, are generated by a hidden Markov model. Experimental results for both image annotation and retrieval on the Corel dataset (an open image dataset) show that mining the relationships among image regions achieves comparable or better performance in visual content understanding.
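An HMM can assign keywords to a region sequence as follows. This minimal sketch assumes hidden states are annotation keywords and observations are quantized region types, both hypothetical. Viterbi decoding returns the most probable keyword sequence and its joint probability. The keyword names, region labels and all probabilities are made up. The paper's state definitions and training procedure are not described in the abstract.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state (keyword) sequence for an observed
    region sequence, returned as (joint probability, path)."""
    # best[s] = (probability, path) of the most likely path ending in state s
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Extend the best predecessor path into state s, emitting o.
            prob, path = max(
                (best[prev][0] * trans_p[prev][s] * emit_p[s][o], best[prev][1])
                for prev in states
            )
            new_best[s] = (prob, path + [s])
        best = new_best
    return max(best.values())

# Hypothetical model: keywords as states, region types as observations.
states = ("sky", "sea")
start_p = {"sky": 0.6, "sea": 0.4}
trans_p = {"sky": {"sky": 0.3, "sea": 0.7}, "sea": {"sky": 0.4, "sea": 0.6}}
emit_p = {"sky": {"bright": 0.8, "wavy": 0.2}, "sea": {"bright": 0.3, "wavy": 0.7}}

prob, path = viterbi(["bright", "wavy"], states, start_p, trans_p, emit_p)
# path == ["sky", "sea"], prob ≈ 0.2352
```

The transition probabilities encode which region, and therefore which keyword, tends to follow which. This corresponds to the attention-shift intuition that motivates RSP.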