A model-based approach to video retrieval requires ground-truth data for training the models. This motivates the development of video annotation tools that allow users to annotate each shot in a video sequence as well as to identify and label scenes, events, and objects by applying labels at the shot level. The annotation tool considered here also allows the user to associate object labels with an individual region in a key-frame image. However, the abundance of video data and the diversity of labels make annotation a difficult and overly expensive task. To combat this problem, we formulate annotation in the framework of supervised training with partially labeled data, viewing it as an exercise in active learning. In this scenario, one first trains a classifier with a small set of labeled data, and subsequently updates the classifier by selecting the most informative, or most uncertain, subset of the available data set. Consequently, propagation of labels to as-yet-unlabeled data is achieved automatically as well. The purpose of this paper is twofold. The first is to describe a video annotation tool that has been developed for annotating generic video sequences in the context of a recent video-TREC benchmarking exercise. The tool is semi-automatic in that it propagates labels to "similar" shots, requiring the user only to confirm or reject the propagated labels. The second is to show how an active learning strategy can be implemented in this context to further improve the performance of the annotation tool. While many variants of active learning are conceivable, we specifically report results of experiments with support vector machine classifiers with polynomial kernels.
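The selection step described above can be sketched in a few lines. This is a minimal illustration of uncertainty sampling, not the paper's system: it uses a nearest-centroid classifier as a lightweight stand-in for the polynomial-kernel SVM, and all function names are made up for this sketch.

```python
import numpy as np

def train_centroids(X, y):
    # Fit a nearest-centroid classifier; a simple stand-in for the
    # polynomial-kernel SVM used in the paper.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def uncertainty_margin(X, centroids):
    # Margin between the two closest class centroids per sample;
    # a small margin marks the sample as "most uncertain".
    d = np.stack([np.linalg.norm(X - m, axis=1) for m in centroids.values()])
    d.sort(axis=0)
    return d[1] - d[0]

def active_learning_round(X, y_known, labeled, batch=1):
    # One round: train on the labeled pool, then move the most
    # uncertain unlabeled samples into the labeled pool.
    centroids = train_centroids(X[labeled], y_known[labeled])
    unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
    margins = uncertainty_margin(X[unlabeled], centroids)
    picked = unlabeled[np.argsort(margins)[:batch]]
    return np.concatenate([labeled, picked])
```

With a real SVM, the same loop would rank unlabeled samples by their distance to the decision boundary instead of the centroid margin.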
How to quickly and effectively convey video information to the user is a major task for a video search engine's user interface. In this paper, we propose using a Moving Edge Overlaid Frame (MEOF) image to summarize both the local object motion and the global camera motion of a video clip in a single image. MEOF supplements the motion information that is generally dropped by key-frame representations, and it enables faster perception for the user than viewing the actual video. The key technology of our MEOF generation algorithm is global motion estimation (GME). To extract a precise global motion model from general video, our GME module operates in two stages: match-based initial GME and gradient-based GME refinement. The GME module also maintains a sprite image that is aligned with each new input frame after the global motion compensation transform. The difference between the aligned sprite and the new frame is used to extract masks that help pick out the moving objects' edges. The sprite is updated with each input frame, and the moving edges are extracted at a constant interval. After all frames are processed, the extracted moving edges are overlaid onto the sprite according to their global-motion displacement with respect to the sprite and their temporal distance from the last frame, thus creating our MEOF image. Experiments show that the MEOF representation of a video clip helps the user acquire its motion content much faster, and it is also compact enough to serve the needs of online applications.
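The match-based initial GME stage can be illustrated with a translation-only model. This is a deliberately reduced sketch, not the paper's full parametric model with gradient refinement: it exhaustively searches for the dominant (dx, dy) shift minimizing the mean absolute frame difference.

```python
import numpy as np

def estimate_global_translation(prev, cur, search=3):
    # Match-based initial GME, translation-only: find the shift
    # (dx, dy) such that cur[y, x] best matches prev[y + dy, x + dx],
    # by exhaustive search over a small window.
    best, best_err = (0, 0), np.inf
    h, w = prev.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            a = prev[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            b = cur[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            err = np.abs(a.astype(float) - b.astype(float)).mean()
            if err < best_err:
                best, best_err = (dx, dy), err
    return best
```

A full implementation would fit an affine or perspective model and refine it with gradient descent, as the two-stage GME in the abstract describes.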
Video copy detection is a complementary approach to watermarking. As opposed to watermarking, which relies on inserting a distinct pattern into the video stream, video copy detection techniques match content-based signatures to detect copies of a video. Typical existing content-based copy detection schemes have relied on image matching. This paper proposes two new sequence-matching techniques for copy detection and compares their performance with one of the existing techniques. Motion-, intensity-, and color-based signatures are compared in the context of copy detection. Results are reported on detecting copies of movie clips.
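The sequence-matching idea can be sketched with the simplest possible signature. This is an assumption-laden illustration, not the paper's method: one mean-intensity value per frame stands in for the motion/intensity/colour signatures, and the query signature is slid over the target to find the best-matching offset.

```python
import numpy as np

def intensity_signature(frames):
    # One coarse feature per frame: the mean intensity. A stand-in for
    # the richer motion/intensity/colour signatures in the paper.
    return np.array([f.mean() for f in frames])

def find_copy(query_sig, target_sig):
    # Slide the query signature over the target sequence and return
    # the offset with the smallest mean absolute distance.
    q, t = len(query_sig), len(target_sig)
    dists = [np.abs(target_sig[i:i + q] - query_sig).mean()
             for i in range(t - q + 1)]
    return int(np.argmin(dists)), float(min(dists))
```

A small best-match distance signals a likely copy; thresholding that distance turns the matcher into a detector.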
Compact representations of video data can enable efficient video browsing. Such representations provide the user with information about the content of the particular sequence being examined while preserving its essential message. We propose a method to automatically generate summaries for long videos. Our video summarization approach involves two main tasks: first, segmenting the video into small, coherent segments and, second, ranking the resulting segments. Our proposed algorithm scores segments based on a word-frequency analysis of the speech transcripts. A summary is then generated by selecting the segments with the highest score-to-duration ratios and concatenating them. We have designed and performed a user study to evaluate the quality of the generated summaries. Based on a statistical analysis of the user-study results, we compare our proposed algorithm with a random segment-selection scheme. Finally, we discuss various issues that arise in evaluating summaries with user studies.
This research addresses the problem of automatically extracting semantic video scenes from movies using multimodal information. A three-stage scene detection scheme is proposed. In the first stage, we use purely visual information to extract a coarse-level scene structure based on generated shot sinks. In the second stage, the audio cue is integrated to further refine the scene detection results by considering various kinds of audio scenarios. Finally, in the third stage, we allow users to interact directly with the system to fine-tune the detection results to their own satisfaction. The generated scene structure provides a compact yet meaningful abstraction of the video data, which clearly facilitates content access. Preliminary experiments on integrating multiple media cues for movie scene extraction have yielded encouraging results.
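The first, purely visual stage can be illustrated with a toy grouping rule. This sketch is not the paper's shot-sink construction: it simply merges consecutive shots into one scene whenever their (assumed normalized) colour-histogram features are similar enough, using histogram intersection as the similarity.

```python
import numpy as np

def coarse_scenes(shot_feats, thresh=0.5):
    # Group consecutive shots into scenes when adjacent shot features
    # are similar; a stand-in for the coarse visual stage. shot_feats
    # is a list of normalized colour histograms, one per shot.
    scenes, cur = [], [0]
    for i in range(1, len(shot_feats)):
        a = np.asarray(shot_feats[i - 1], float)
        b = np.asarray(shot_feats[i], float)
        sim = np.minimum(a, b).sum()  # histogram intersection
        if sim >= thresh:
            cur.append(i)
        else:
            scenes.append(cur)
            cur = [i]
    scenes.append(cur)
    return scenes
```

The later audio and interactive stages would then merge or split these coarse groups.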
A method for searching a face image database (FID) is proposed to support police officers searching criminal records in a central registration database system (CRDS). The proposed method assumes that each FID consists of a fixable object and object correlations. It employs a database search in which all images are retrieved by a similarity-based measure. Consequently, the proposed method is much faster than sequential searching, especially when an additional set of attributes, such as scars, is defined. Moreover, it requires less storage space.
This paper describes an ultrasound image compression algorithm, related to mosaic image compression, in which each compressed object is represented as a set of indexes into a database of mosaic elements. The proposed approach is an alternative method for compressing biomedical ultrasound images. A memory-efficient implementation based on the tree tessellation algorithm v.10 (TTA10) indexing/retrieval solution manages the mosaic elements. The principal advantage of the approach is that, for this very specific kind of ultrasound image (cardiology), it first classifies the particular type of image (compression stage) and then uses a specially constructed schema to decompose an object into parts. Each part of the image is matched to the most similar mosaic image in the database using TTA10, and only the index is stored in the compressed structure of the image. Since the storage size of ultrasound images is similar to that of video, while the specific biomedical content is highly correlated, mosaic image compression is an effective storage solution.
ISBN:
(print) 9628576623
We introduce a simple image coding method, the block truncation coding (BTC) technique, as a novel approach to the construction of colour image databases. It is shown that BTC can not only be used to compress images, thus achieving storage efficiency, but that the BTC codes can also be used directly to construct image features for effective image retrieval. From the BTC code we have developed an image feature, termed the BTC colour co-occurrence matrix (BCCM), as an effective measure of image content. Experimental results are presented to show that BCCM is comparable to state-of-the-art techniques, such as the color correlogram, in image retrieval.
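The BTC codes that the BCCM feature is built from can be sketched for a single greyscale block. This shows the absolute-moment BTC variant (bitmap plus two group means), not the paper's colour pipeline or the BCCM construction itself.

```python
import numpy as np

def btc_encode_block(block):
    # Absolute-moment BTC: threshold the block at its mean to get a
    # binary bitmap, and keep the mean of each group as the two
    # reconstruction levels (hi, lo).
    m = block.mean()
    bitmap = block >= m
    hi = block[bitmap].mean() if bitmap.any() else m
    lo = block[~bitmap].mean() if (~bitmap).any() else m
    return bitmap, hi, lo

def btc_decode_block(bitmap, hi, lo):
    # Reconstruct the block from its BTC code.
    return np.where(bitmap, hi, lo)
```

Applying this per channel to small blocks yields the per-block codes from which a co-occurrence matrix over quantized (hi, lo) colours could be accumulated.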
A nine-direction lower-triangular (9DLT) matrix describes the relative spatial relationships among the objects in a symbolic image. In this paper, the 9DLT matrix is transformed into a linear string, called the 9DLT string. Based on the 9DLT string, two image-matching similarity measures, simpler but more precise, are provided to solve the subimage and similar-image retrieval problems. Moreover, a common component binary tree (CCBT) structure is refined to store a set of 9DLT strings. The revised CCBT structure not only eliminates the redundant information among the 9DLT strings, but also reduces the processing time for computing the image-matching distances between query frames and video frames. Experiments indicate that both storage space and processing time are greatly reduced by the revised CCBT structure. A fast dynamic programming approach is also proposed to handle the problem of matching a query frame sequence against a video frame sequence. (C) 2001 Academic Press.
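The 9DLT-string idea can be sketched as follows. The direction numbering here is illustrative, not the paper's exact code table: each pair of objects gets one of nine codes for their relative position, and the lower-triangular codes are flattened into a linear string.

```python
def sign(v):
    # -1, 0, or 1
    return (v > 0) - (v < 0)

def direction_code(p, q):
    # One of nine codes for q's position relative to p. The numbering
    # (1 = east, counter-clockwise) is an assumption for this sketch.
    dx, dy = q[0] - p[0], q[1] - p[1]
    if dx == 0 and dy == 0:
        return 0
    codes = {(1, 0): 1, (1, 1): 2, (0, 1): 3, (-1, 1): 4,
             (-1, 0): 5, (-1, -1): 6, (0, -1): 7, (1, -1): 8}
    return codes[(sign(dx), sign(dy))]

def ninedlt_string(objects):
    # objects: dict label -> (x, y). Emit the lower-triangular codes
    # in label order as a linear string (here, a list of ints).
    labels = sorted(objects)
    return [direction_code(objects[labels[j]], objects[labels[i]])
            for i in range(len(labels)) for j in range(i)]
```

Two such strings can then be compared element-wise, which is what makes the string form convenient for the distance computations the abstract describes.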