While current approaches for video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data based on audio content analysis. The accompanying audio signal of audiovisual data is first segmented and classified into basic types, namely speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based on morphological and statistical analysis of several short-term features of the audio signals. Then, environmental sounds are classified into finer classes such as applause, explosions, bird sounds, etc. This fine-level classification and indexing step is based on time-frequency analysis of audio signals and the use of a hidden Markov model (HMM) as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach has an accuracy rate higher than 90% for the coarse-level classification, and higher than 85% for the fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.
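As a rough illustration of what "short-term features" can look like, the sketch below computes two commonly used ones, short-term energy and zero-crossing rate, over fixed-size frames. The abstract does not name the features it uses, so the choice of these two, the frame sizes, and the `silence_energy` threshold are all assumptions made for illustration, not the authors' method.

```python
import math

def frames(signal, size, hop):
    """Split a list of samples into frames of `size` samples, advancing by `hop`."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def label_frame(frame, silence_energy=1e-4):
    """Hypothetical coarse rule: very low energy is treated as silence."""
    if short_term_energy(frame) < silence_energy:
        return "silence"
    return "non-silence"
```

A fuller classifier would combine several such features over time to separate speech, music, and environmental sound; this sketch only separates silence from everything else.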
Recent research on image databases has been aimed at the development of content-based retrieval techniques for the management of visual information. Compared with such visual information as color, texture, and spatial constraints, shape is so important a feature associated with those image objects of interest that shape alone may be sufficient to identify and classify an object completely and accurately. This paper presents a novel method based on feature point histogram indexing for object shape representation in image databases. In this scheme, the feature point histogram is obtained by discretizing the angles produced by the Delaunay triangulation of a set of unique feature points which characterize the object shape, and then counting the number of times each discrete angle occurs in the resulting triangulation. The proposed shape representation technique is translation, scale, and rotation independent. Our experiments showed that the Euclidean distance performs very well as the similarity measure function in combination with the feature point histogram computed by counting the two largest angles of each individual Delaunay triangle. Through a further experiment, we also found evidence that an image object representation using a feature point histogram provides an effective cue for image object discrimination.
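The histogram step described above can be sketched as follows. This is a minimal illustration under assumptions: the triangulation is taken as given (a list of index triples, such as one produced by a Delaunay routine), and the bin count of 18 and the function names are hypothetical. It counts the two largest interior angles of each triangle and compares histograms with the Euclidean distance.

```python
import math

def triangle_angles(a, b, c):
    """Interior angles (radians) at vertices a, b, c, via the law of cosines."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)

    def angle(opp, s1, s2):
        cos_v = (s1 * s1 + s2 * s2 - opp * opp) / (2 * s1 * s2)
        return math.acos(max(-1.0, min(1.0, cos_v)))  # clamp rounding error

    return [angle(la, lb, lc), angle(lb, la, lc), angle(lc, la, lb)]

def angle_histogram(points, triangles, bins=18):
    """Count the two largest angles of each triangle in equal-width bins over [0, pi]."""
    hist = [0] * bins
    width = math.pi / bins
    for i, j, k in triangles:
        angles = sorted(triangle_angles(points[i], points[j], points[k]))
        for ang in angles[1:]:  # drop the smallest angle, keep the two largest
            hist[min(int(ang / width), bins - 1)] += 1
    return hist

def euclidean(h1, h2):
    """Euclidean distance between two histograms of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(h1, h2)))
```

Because interior angles do not change when a shape is translated, uniformly scaled, or rotated, two histograms built this way from the same triangulation of a transformed shape should coincide, apart from angles that fall very close to a bin boundary.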
Queries referring to content embedded within images are an essential component of content-based search, browse, or summarize operations in image databases. Localization of such queries under changes in appearance, occlusions, and background clutter is a difficult problem, for which current spatial access structures in databases are not suitable. In this paper we present a new method of indexing image databases called location hashing that uses a special data structure called the location hash tree (LHT) for organizing feature information from images of a database. Location hashing is based on the principle of geometric hashing and simultaneously determines the relevant images in the database and the regions within them that are most likely to contain a 2D pattern query, without incurring a detailed search of either. The location hash tree, being a red-black tree, allows for efficient search for candidate locations using pose-invariant feature information derived from the query.
ISBN: 0819424331 (print)
There is a growing need for the ability to query image databases based on image content rather than strict keyword search. Most current image database systems that perform query by content require a distance computation for each image in the database. Distance computations can be time-consuming, limiting the usability of such systems. There is thus a need for indexing systems and algorithms that can eliminate candidate images without performing distance calculations. As user needs may change from session to session, there is also a need for run-time creation of distance measures. In this paper, we introduce FIDS, or "Flexible Image Database System." FIDS allows the user to query the database based on user-defined polynomial combinations of predefined distance measures. Using an indexing scheme and algorithms based on the triangle inequality, FIDS can return matches to the query image without directly comparing the query image to much of the database. FIDS is currently being tested on a database of eighteen hundred images.
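The general idea of pruning with the triangle inequality can be sketched as below. This is not FIDS's actual index or algorithm, which the abstract does not detail; it is a minimal illustration assuming a set of reference ("key") items whose distances to every database item are stored in advance. For any metric `dist`, `|dist(q, k) - dist(x, k)|` is a lower bound on `dist(q, x)`, so an item whose lower bound already exceeds the search radius can be discarded without computing its distance to the query. All names here are hypothetical.

```python
def build_index(db, keys, dist):
    """Precompute the distance from every database item to every key item."""
    return {name: [dist(vec, k) for k in keys] for name, vec in db.items()}

def query(q, db, keys, index, dist, radius):
    """Return (matches within radius, number of direct distance computations)."""
    q_to_keys = [dist(q, k) for k in keys]
    matches, computed = [], 0
    for name, vec in db.items():
        # Triangle inequality: dist(q, x) >= |dist(q, k) - dist(x, k)| for each key k.
        lower = max(abs(qk - xk) for qk, xk in zip(q_to_keys, index[name]))
        if lower > radius:
            continue  # cannot be a match; skip the direct comparison
        computed += 1
        if dist(q, vec) <= radius:
            matches.append(name)
    return matches, computed
```

The pruning is only valid when `dist` satisfies the triangle inequality, and how much work it saves depends on how well the keys are spread relative to the data.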
ISBN: 0819427527 (print)
A high-level representation of a video clip comprising information about its physical and semantic structure is necessary for providing appropriate processing, indexing, and retrieval capabilities for video databases. We describe a novel technique which reduces a sequence of MPEG-encoded video frames to a trail of points in a low-dimensional space. In our earlier work,(1) we presented techniques applicable in 3-D, but in this paper, we describe techniques that can be extended to higher dimensions where improved performance is expected. In the low-dimensional space, we can cluster frames, analyze transitions between clusters, and compute properties of the resulting trail. Portions of the trail can be classified as either stationary or transitional, leading to high-level descriptions of the video. By tracking the interaction of clusters over time, we lay the groundwork for the complete analysis and representation of the video's physical and semantic structure.
ISBN: 081941767X (print)
Current feature-based image databases can typically perform efficient and effective searches on scalar feature information. However, many important features, such as graphs, histograms, and probability density functions, have more complex structure. Mechanisms to manipulate complex feature data are not currently well understood and must be further developed. The work we discuss in this paper explores techniques for the exploitation of spectral distribution information in a feature-based image database. A six-band image was segmented into regions and spectral information for each region was maintained. A similarity measure for the spectral information is proposed and experiments are conducted to test its effectiveness. The objective of our current work is to determine whether these techniques are effective and efficient at managing this type of image feature data.
ISBN: 0819424331 (print)
Although content-based retrieval of images is increasingly common, the use of media content as a basis for navigation has received relatively little attention. In this paper we describe our recent development of facilities in the MAVIS/Microcosm architecture for generic link authoring and following from non-text media and, in particular, the use of shape and texture for content-based navigation from images. Applications from a product catalogue and an archaeological collection are presented, together with an outline of an image viewer providing rapid delineation of object shapes in images when authoring or following links.
ISBN: 0819431273 (print)
A novel similarity measure based on the Choquet integral was introduced for retrieving images from an image database that "mostly" fit the query image. We showed that under certain conditions the measure is a norm, a fact that can be used to reduce the search time using the triangle inequality. To test the new measure, a content-based image retrieval system was built. The system was benchmarked against the visual retrieval cartridge, Virage, built into the Oracle 8 database system. The results suggested that the new measure is useful for image retrieval.
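For readers unfamiliar with the Choquet integral, the sketch below computes its standard discrete form for non-negative values with respect to a set function (fuzzy measure). How the paper builds its similarity measure on top of this is not stated in the abstract, so the interpretation of `values` as per-feature scores and the names used here are assumptions for illustration only.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative `values` (dict: name -> number).

    `measure` maps a frozenset of names to a weight, with the full set mapping to 1
    and larger sets never weighing less than their subsets (a fuzzy measure).
    """
    items = sorted(values, key=values.get)  # names ordered by ascending value
    remaining = set(items)                  # names whose value is >= current one
    total, prev = 0.0, 0.0
    for name in items:
        total += (values[name] - prev) * measure(frozenset(remaining))
        prev = values[name]
        remaining.remove(name)
    return total
```

With an additive measure the result is an ordinary weighted sum; a measure that is 1 only on the full set gives the minimum, and one that is 1 on every non-empty set gives the maximum. Measures between these extremes are what allow an aggregate to reflect that "most" criteria are satisfied.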
In this paper we address the problem of choosing appropriate features to describe the content of still pictures or video sequences including audio. As the computational analysis of these features is often time-consuming, it is useful to identify a minimal set allowing for an automatic classification of some class or genre. Further, it can be shown that deleting the coherence of the features characterizing some class is not suitable to guarantee an optimal classification result. The central question of the paper is thus which features should be selected and how they should be weighted to optimize a classification problem.
ISBN: 0819420441 (print)
An experimental video server for middle-scale video-on-demand services that uses a "redundant double-layered disk array" can read out 100 MPEG-1 1.5-Mbps video streams simultaneously with a response time of under one second through an FDDI LAN. An exclusive data method that switches between normal data and fast data and a skip-search method are used to provide fast visual search. The gateway connecting the video server LAN to a 6.312-Mbps constant bit-rate line allows broadcast services to be integrated with on-demand services. The protocol implemented in this gateway controls the visual search rate, corrects errors in downloaded data, and accelerates playback mode changes.
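The figures quoted above imply some simple capacity arithmetic, sketched here with rates expressed in kbps to keep the calculation in integers. The calculation ignores protocol and framing overhead, which the abstract does not quantify, so the results are upper bounds rather than measured values; the function names are hypothetical.

```python
STREAM_KBPS = 1500   # one MPEG-1 stream at 1.5 Mbps
STREAMS = 100        # simultaneous streams read out by the server
LINE_KBPS = 6312     # constant bit-rate line at 6.312 Mbps

def aggregate_kbps(streams, per_stream_kbps):
    """Total payload rate when all streams play at once."""
    return streams * per_stream_kbps

def streams_per_line(line_kbps, per_stream_kbps):
    """Whole streams that fit on a line, ignoring any overhead."""
    return line_kbps // per_stream_kbps
```

Under these assumptions, 100 streams amount to 150,000 kbps (150 Mbps) of aggregate payload, and a single 6.312-Mbps line has room for at most four 1.5-Mbps streams.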