Automatic image annotation assigns semantic labels to images and thus offers great potential for semantic-aware image retrieval. However, existing annotation algorithms do not scale to this emerging need, either in computational efficiency or in the number of tags they can handle. Facilitated by the recent development of large-scale image category recognition data such as ImageNet, we extrapolate from it a model for scalable image annotation and semantic-aware image retrieval, namely the ObjectBook. Each element of the ObjectBook, called an ObjectWord, is defined as a collection of discriminative image patches annotated with the corresponding object. Treating the ObjectBook as a high-level, semantics-preserving visual vocabulary, we can easily develop efficient image annotation and inverted-file indexing strategies for large-scale image collections. The proposed retrieval strategy is compared with state-of-the-art algorithms. Experimental results show that the ObjectBook is both discriminative and scalable for large-scale semantic-aware image retrieval.
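As an illustration of how an inverted file over such a vocabulary might be organized, here is a minimal sketch; the `ObjectBookIndex` class, the string ObjectWord IDs, and the conjunctive `query` semantics are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

class ObjectBookIndex:
    """Toy inverted file keyed by ObjectWord IDs (hypothetical structure)."""

    def __init__(self):
        # object_word_id -> set of image ids containing that ObjectWord
        self.postings = defaultdict(set)

    def add_image(self, image_id, object_word_ids):
        """Index an image under every ObjectWord detected in it."""
        for ow in object_word_ids:
            self.postings[ow].add(image_id)

    def query(self, object_word_ids):
        """Return images that contain all queried ObjectWords (semantic AND)."""
        if not object_word_ids:
            return set()
        result = self.postings[object_word_ids[0]].copy()
        for ow in object_word_ids[1:]:
            result &= self.postings[ow]
        return result

# Usage with toy data:
index = ObjectBookIndex()
index.add_image("img_001", ["dog", "grass"])
index.add_image("img_002", ["dog", "car"])
print(index.query(["dog"]))          # {'img_001', 'img_002'}
print(index.query(["dog", "car"]))   # {'img_002'}
```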
This paper introduces a self-similarity matrix (SSM) based video copy detection scheme and a visual character-string (VCS) descriptor for SSM matching. The SSM, which exploits the spatial and temporal information in a video clip, is computed from exhaustive pairwise distances between frames. The SSM-based method treats the video clip as a whole and encodes its temporal self-similarity as a matrix. Moreover, the proposed VCS descriptor properly handles SSM alignment failures and size variation. Experimental evaluations on the CIVR07 copy detection corpus validate the effectiveness of the proposed solution.
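A minimal sketch of how a self-similarity matrix can be computed from per-frame feature vectors follows; the choice of frame features and the use of Euclidean distance are assumptions for illustration, not necessarily the paper's exact configuration.

```python
import numpy as np

def self_similarity_matrix(frame_features):
    """Compute an SSM from per-frame feature vectors.

    frame_features: (n_frames, feature_dim) array; the feature choice
    (e.g. a color histogram per frame) is an assumption.
    Returns an (n_frames, n_frames) matrix of pairwise Euclidean distances.
    """
    f = np.asarray(frame_features, dtype=float)
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (f ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * f @ f.T
    return np.sqrt(np.clip(d2, 0.0, None))

# Toy example: 5 frames with 8-dimensional features.
rng = np.random.default_rng(0)
ssm = self_similarity_matrix(rng.random((5, 8)))
print(ssm.shape)  # (5, 5), zero on the diagonal
```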
In this paper, we present a hybrid text segmentation approach for embedded text in images, aiming to combine the advantages of difference-based and similarity-based methods. First, a new stroke edge filter is applied to obtain a stroke edge map. Then, a two-threshold method based on an improved Niblack thresholding technique is used to identify stroke edges. Pixels between edge pairs above the high threshold are collected to estimate a representative stroke color, so that additional stroke pixels can be extracted by computing color similarity. Finally, heuristic rules are devised to integrate stroke edge and stroke region information and obtain better segmentation results. The experimental results show that our approach can effectively segment text from background.
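The two-threshold idea can be illustrated with a short sketch, assuming Niblack-style local thresholds of the form mean + k*std over a sliding window; the window size and k values here are hypothetical, not the paper's reported parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_two_threshold(edge_strength, window=15, k_low=-0.2, k_high=0.2):
    """Classify edge pixels with two Niblack-style local thresholds.

    edge_strength: 2-D array of stroke-edge filter responses.
    Returns (strong_edges, weak_edges) boolean masks; parameter values
    are illustrative assumptions only.
    """
    e = np.asarray(edge_strength, dtype=float)
    mean = uniform_filter(e, size=window)
    mean_sq = uniform_filter(e ** 2, size=window)
    std = np.sqrt(np.clip(mean_sq - mean ** 2, 0.0, None))
    t_low = mean + k_low * std
    t_high = mean + k_high * std
    strong = e > t_high
    weak = (e > t_low) & ~strong
    return strong, weak
```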
Recently, visual saliency has drawn great research interest in the fields of computer vision and multimedia, and various approaches for computing visual saliency have been proposed. To evaluate these approaches, several datasets have been presented for visual saliency in images. However, there are few datasets that capture spatiotemporal visual saliency in video. Intuitively, visual saliency in video is strongly affected by temporal context and can vary significantly even across visually similar frames. In this paper, we present an extensive dataset of 7.5 hours of video for capturing spatiotemporal visual saliency. The salient regions in frames sequentially sampled from these videos are manually labeled by 23 subjects and then averaged to generate ground-truth saliency maps. We also present three metrics for evaluating competing approaches. Several typical algorithms were evaluated on the dataset, and the experimental results show that it is well suited to evaluating visual saliency. We also report several interesting findings to be addressed in future research. The dataset is freely available online together with the source code for evaluation.
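A minimal sketch of how per-subject labels can be averaged into a ground-truth saliency map follows, assuming one binary mask per subject; any post-processing the dataset may actually apply (e.g. smoothing) is not reproduced here.

```python
import numpy as np

def ground_truth_saliency(subject_masks):
    """Average binary salient-region labels from multiple subjects.

    subject_masks: list of (H, W) binary arrays, one per subject.
    Returns an (H, W) map with values in [0, 1].
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in subject_masks], axis=0)
    return stack.mean(axis=0)

# Toy example: 23 subjects labeling a 4x4 frame.
rng = np.random.default_rng(1)
masks = [rng.integers(0, 2, size=(4, 4)) for _ in range(23)]
gt = ground_truth_saliency(masks)
print(gt.min(), gt.max())  # values lie in [0, 1]
```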
Potential faults greatly reduce the dependability of business processes, so fault diagnosis is becoming an important issue in supporting self-healing service flow execution. The existing fault handling mechanism provided by BPEL can only identify faults that have been pre-defined in standards or by users. However, unexpected faults are also a major cause of failures in service flow execution, so an effective diagnosis approach is needed. In this paper, we propose a logic-based approach for diagnosing unexpected faults in Web service flows. The approach uses dynamic description logic (DDL) to model business processes and diagnoses faults through DDL reasoning. We present a DDL-based diagnosing algorithm that takes the process description and runtime information as inputs and returns information about possible faults. Moreover, to improve the efficiency of online diagnosis, an incremental DDL-based diagnosing algorithm is presented. Experimental results on a demo system show the effectiveness of this approach.
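To illustrate only the input/output contract of such a diagnosing algorithm (process description plus runtime information in, fault candidates out), here is a deliberately simplified sketch; it replaces DDL reasoning with plain predicate sets and is not the paper's method.

```python
def diagnose(process_steps, observed_facts):
    """Return steps whose declared effects contradict runtime observations.

    process_steps: list of dicts with 'name' and 'effects' (a set of
    predicates expected to hold after the step) -- an illustrative
    encoding, not DDL.
    observed_facts: set of predicates actually observed at runtime.
    """
    candidates = []
    for step in process_steps:
        missing = step["effects"] - observed_facts
        if missing:
            candidates.append((step["name"], missing))
    return candidates

# Toy example: the payment step's expected effect is not observed.
steps = [
    {"name": "ReserveFlight", "effects": {"flight_reserved"}},
    {"name": "ChargeCard", "effects": {"payment_done"}},
]
print(diagnose(steps, {"flight_reserved"}))
# [('ChargeCard', {'payment_done'})]
```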
In this paper, by considering the multiple spatial-temporal characteristics of the visual perception system, we propose a novel home video attention analysis method. Firstly, each frame of the video is segmented into regio...
In this paper, we propose a visual-aural attention modeling approach to video content analysis, which can be used to automatically detect highlights in a popular TV program type, the talk show. First, visual and aural affective features are extracted to represent and model human attention to highlights; for efficiency, the adopted affective features are kept as few as possible. Then, a fusion strategy called ordinal-decision combines the visual and aural attention models to form an attention curve for the video, which reflects how human attention changes while watching. Finally, highlight segments are located at the peaks of the attention curve. Moreover, sentence boundary detection is used to refine the highlight boundaries so that the segments remain complete and fluent. The framework is extensible and flexible, allowing more affective features to be integrated with a variety of fusion schemes. Experimental results demonstrate that the proposed visual-aural attention analysis approach is effective for talk show highlight detection.
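To make the final step concrete, here is a minimal sketch of locating highlight candidates at well-spaced peaks of an attention curve; the `top_k` and `min_gap` parameters are illustrative, and the paper's ordinal-decision fusion and sentence-boundary refinement are not reproduced.

```python
import numpy as np

def highlight_peaks(attention_curve, top_k=5, min_gap=25):
    """Pick up to top_k well-separated maxima of a fused attention curve.

    attention_curve: 1-D array with one attention value per frame (or shot).
    min_gap: minimum index spacing between reported peaks (assumed value).
    """
    curve = np.asarray(attention_curve, dtype=float)
    order = np.argsort(curve)[::-1]  # indices sorted by attention, descending
    peaks = []
    for idx in order:
        if all(abs(int(idx) - p) >= min_gap for p in peaks):
            peaks.append(int(idx))
        if len(peaks) == top_k:
            break
    return sorted(peaks)

# Toy example: a noisy oscillating attention curve.
rng = np.random.default_rng(2)
curve = np.sin(np.linspace(0, 6 * np.pi, 300)) + 0.1 * rng.random(300)
print(highlight_peaks(curve))
```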