While current approaches for video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for...
详细信息
While current approaches for video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data based on audio content analysis. The accompanying audio signal of audiovisual data is first segmented and classified into basic types, i.e. speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based on morphological and statistical analysis of several short-term features of the audio signals. Then, environmental sounds are classified into finer classes such as applause, explosion, bird's sound, etc. This fine-level classification and indexing step is based on time-frequency analysis of audio signals and the use of hidden Markov model (HMM) as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach has an accuracy rate higher than 90% for the coarse-level classification, and higher than 85% for the fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.
Multimedia Information Systems are experiencing a tremendous growth as a direct consequence of the popularity and pervasive use of world wide web. As a consequence, it is becoming increasingly important to provide eff...
详细信息
Multimedia Information Systems are experiencing a tremendous growth as a direct consequence of the popularity and pervasive use of world wide web. As a consequence, it is becoming increasingly important to provide efficient and flexible solutions for accessing and retrieving multimedia data. images and video are emerging as significant data types in multimedia systems. And yet, most commercial systems are still text and key-word based and do not fully exploit the image content of these systems. We believe that there is an opportunity to build a novel interactive multimedia system for some specific applications in electronic commerce. In this paper we present an overview of our approach, the rationale behind it and the problems that are inherent in building such a system. We address some of the technical issues in representing and analysing image primitive features. These are the building blocks of any such systems. They can be generalized into a much broader range of applications as well.
This paper presents algorithms to deal with problems associated with indexing high-dimensional feature vectors that characterize video data. Indexing high dimensional vectors is well known to be computationally expens...
详细信息
This paper presents algorithms to deal with problems associated with indexing high-dimensional feature vectors that characterize video data. Indexing high dimensional vectors is well known to be computationally expensive. Our solution is to optimally split the high dimensional vector into a few low dimensional feature vectors and querying the system for each feature vector. This involves solving an important sub-problem: developing a model of retrieval that enables us to query the system efficiently. Once we formulate the retrieval problem in terms of a retrieval model, we present an optimality criterion to maximize the number of results using this model. The criterion is based on a novel idea of using the underlying probability distribution of the feature vectors. A branch-and-prune strategy optimized per each query, is developed. This uses the set of features derived from the optimality criterion. Our results show that the algorithm performs well, giving a speedup of a factor of 25 with respect to a linear search while retaining the same level of Recall.
For the last few years, shot boundary detection has been recognized as an important research issue on videoretrieval. Also as a preliminary step for the task, it is essential to extract salient features from videos. ...
详细信息
For the last few years, shot boundary detection has been recognized as an important research issue on videoretrieval. Also as a preliminary step for the task, it is essential to extract salient features from videos. Recently, it has become common to perform the two tasks in compressed domain to alleviate their computational costs. In this paper, we propose novel shot boundary detection technique, which uses two feature images, or DC and edge images, extracted directly from MPEG compressed video. While a DC image can be easily obtained, edge image extraction usually requires considerable computational burden. For fast edge image extraction, we suggest to utilize only a few AC coefficients of each DCT block in motion compensated P-frames and B-frames as well as I-frames. This drastically reduces the computational burden compared to edge extraction in the spatial domain. In order to further reduce the computational burden, another edge image extraction technique is also suggested on the basis of AC prediction using DC images. By using the edge energy diagram obtained from edge images and histograms from DC images, shot boundaries such as abrupt transitions, fades, and dissolves are detected automatically. Simulation results show that the proposed techniques are fast and effective.
With the currently existing shot change detection algorithms, abrupt changes are detected fairly well. It is thus more challenging to detect gradual changes including fades, dissolves, and wipes as these are often mis...
详细信息
With the currently existing shot change detection algorithms, abrupt changes are detected fairly well. It is thus more challenging to detect gradual changes including fades, dissolves, and wipes as these are often missed or falsely detected. In this paper, we focus on the detection of wipes. The proposed algorithm begins by processing the visual rhythm, a portion of the DC image sequence. It is a single image, a sub-sampled version of a full video in which the sampling is performed in a pre-determined and in a systematic fashion. The visual rhythm contains distinctive patterns or visual features for many different types of video effects. The different video effects manifest themselves differently on the visual rhythm. In particular, wipes appear as curves that run from the top to the bottom of the visual rhythm. Thus, using the visual rhythm, it becomes possible to automatically detect wipes simply by determining various lines and curves on the visual rhythm.
This article describes the use of gesture recognition techniques in computer vision as a natural interface for video content navigation, and the design of a navigation and browsing system that caters to these natural ...
详细信息
This article describes the use of gesture recognition techniques in computer vision as a natural interface for video content navigation, and the design of a navigation and browsing system that caters to these natural means of computer-human interaction. For consumer applications, video content navigation presents two challenges: (1) how to parse and summarize multiple video streams in an intuitive and efficient manner, and (2) what type of interface will enhance the ease of use for video browsing and navigation in a living room setting or an interactive environment. In this paper, we address the issues and propose the techniques that combine video content navigation with gestures, seamlessly and intuitively, in an integrated system. The current framework can incorporate speech recognition technology. We present a new type of browser for browsing and navigating video content, as well as a gesture recognition interface for this browser.
In this paper we propose a method for tracking a video object in an ordered sequence of two-dimensional images, where the outcome is the trajectory of the video object throughout the time sequence of images. This meth...
详细信息
In this paper we propose a method for tracking a video object in an ordered sequence of two-dimensional images, where the outcome is the trajectory of the video object throughout the time sequence of images. This method is designed to run in real-time in a synchronous video collaboration environment, and used for producing dynamic object annotations for enhanced video content understanding. A dynamic object is an object whose location or size in the video frame constantly changes due to the camera motion, the motion of its own, or both. We suggest a novel method for finding the trajectory of the object in the intermediate frames given the locations and shapes of the object in two end frames. In addition to the shape and location information of the object, its texture information in the end frames is used to predict the location and search space of it in the intermediate frames.
In this paper, we propose a new image feature extraction method for MPEG compressed video. To minimize the MPEG decoding process, we use only DC values for Y, Cr, and Cb components for each macroblock. Then, we can ob...
详细信息
In this paper, we propose a new image feature extraction method for MPEG compressed video. To minimize the MPEG decoding process, we use only DC values for Y, Cr, and Cb components for each macroblock. Then, we can obtain a feature vector using the decoded DC values of Y, Cr, and Cb components for all macroblocks in an I frame. The feature vector consists of histograms for various colors, luminance, and edge types. In obtaining histograms for colors and luminance features, we consider the ratio of contributing pure colors and luminance to the chroma DC values for each macroblock. Then, we update all contributing colors and/or luminance histograms accordingly. Otherwise, if the macro block is classified as an edge block, then we update the corresponding edge type histogram. To demonstrate the performance of the proposed feature extraction method, we apply it to a scene change detection problem.
image search has been actively studied in recent years. On the other hands, image browsing has received little attention. image browsing refers to the process of presenting some forms of overview or summary of the ima...
详细信息
image search has been actively studied in recent years. On the other hands, image browsing has received little attention. image browsing refers to the process of presenting some forms of overview or summary of the image relationships, thus facilitating a user to navigate across the data set and find images of interests. In this paper, we present a new data structure built on the multi-linearization of image attributes for efficient organization of the data set and fast visual browsing of the images. We describe new techniques for multi-linearization based on multiple space-filling curves and hierarchical clustering techniques. In addition to providing fast navigation, our proposed data structure allows computationally efficient insertion and deletion of images from the data set. We then present a novel image navigator and browser built on dual-linearization data structure and intuitive presentation of image relevance and relationships, demonstrate the image navigation process, and report results on 1000 and 22,000 imagedatabases. We also discuss how our data structure can be extended to support fast image search.
Illumination invariance is of paramount importance to annotate video sequences stored in large videodatabases consistently. Yet, popular texture analysis methods such as multichannel filtering techniques do not yield ...
详细信息
Illumination invariance is of paramount importance to annotate video sequences stored in large videodatabases consistently. Yet, popular texture analysis methods such as multichannel filtering techniques do not yield illumination-invariant texture representations. In this paper, we assess the effectiveness of three illumination normalisation schemes for texture representations derived from Gabor filter outputs. The schemes aim at overcoming intensity scaling effects due to changes in ilumination conditions. A theoretical analysis and experimental results enable us to select one scheme as the most promising one. In this scheme, a normalising factor is derived at each pixel by combining the energy responses of different filters at that pixel. The scheme overcomes illumination variations well, while still preserving discriminatory textural information. Further statistical analysis may shed light on other interesting properties or limitations of the scheme.
暂无评论