This dissertation presents a solution to problems arising from the demand for fast information access and sharing in real-time multimedia transmission over the Internet. Our solution exploits software agents that are placed throughout the network environment. These hierarchical video analysis agents process multimedia streams in real time, and automatically decompose and understand the multimedia content so as to facilitate information access and sharing. Multimedia content contains both perceptual content, such as color, motion, or acoustic features, and conceptual content, which is specified by concepts or semantics that can be expressed through text descriptions. Both types of content are embedded simultaneously in multimedia streams and are usually complementary to each other. This dissertation adaptively analyzes both kinds of video content by combining mixed media cues from audio, video, and text. First, a high-performance module for on-line video segmentation based on scene-change detection is developed. It serves as the first step of any video stream construction and analysis. To meet the high computational demand of fast on-line video analysis, our proposed video scene-change detection algorithms are very efficient while maintaining high accuracy and recall rates. Second, the perceptual features of audio and video data are analyzed in a bottom-up manner and integrated so as to discriminate effectively among the different events in any video stream. An efficient decision-tree learning algorithm is used to induce a set of if-then rules that link perceptual features with the conceptual semantic content of the video. These rules not only serve as a video classifier, but also guide on-line real-time video/audio feature extraction and data redistribution. A novel knowledge-based system, in which knowledge is stored as learned rules, is proposed to serve as a video semantic inference/classification engine. Third, we propose a hierarchical video categorizatio
Volume 4519 of the conference proceedings contains 30 papers. Topics discussed include video segmentation, video content analysis and retrieval, semantics and knowledge representations, image analysis and retrieval, audio analysis and retrieval, video compression and delivery, video access and browsing, video servers, video rate control and adaptation, and video applications.
The volume of multimedia data generated nowadays is exploding. To efficiently access and retrieve desired information, tools that enable automated content-based analysis are becoming indispensable. Multimedia content is defined at both perceptual and conceptual levels. The former refers to content characterized purely by intrinsic perceptual properties such as color, motion, or acoustic features. The latter refers to content that is specified by concepts or semantics such as sunset, anchors, or news headline stories. At both levels, the content is embedded in multiple forms that are usually complementary to each other. The main objective of this thesis is to adaptively analyze multimedia content by integrating cues from multiple modalities, including audio, video, and text, mainly in the scope of news broadcast. At the perceptual level, news broadcast data is segmented and classified into different video events such as news reporting and commercials. Audio and visual features are developed and integrated, with the aim of discriminating different events effectively. Various classification mechanisms, including linear fuzzy threshold, maximum likelihood using Gaussian Mixture Model and Hidden Markov Model, Neural Network, and Support Vector Machine, are benchmarked. At the conceptual level, algorithms and demonstration systems for three applications are developed. In the News Broadcast Browsing System, recovery and presentation of the embedded hierarchical structure of news broadcast are addressed. Important semantic objects such as hosting characters and headline news stories are adaptively extracted using audio/visual models that are bootstrapped from on-line data. The problem of efficient audio-based search and retrieval of segmented multimedia objects is discussed in the Query-by-example in Audio System. A distance metric framework is proposed to determine the difference of mixture type Probability Density Functions, and is applied in measuring
ISBN: (print) 0819442437
A fundamental task in video analysis is to organize and index multimedia data in a meaningful manner so as to facilitate user access for tasks such as browsing and retrieval. This paper addresses the problem of automatic index generation of movie databases based on audiovisual information. In particular, given a movie we first extract key movie events including two-speaker dialog scenes, multiple-speaker dialog scenes and hybrid scenes by using the proposed window-based sweep algorithm and the K-means clustering algorithm. Following event detection, the identity of each individual speaker in a dialog scene is recognized based on a statistical maximum likelihood approach. The identification relies on the likelihood ratio calculation between the incoming speech data and Gaussian mixture models of the speakers and the background. It is evident that the event and the speaker identity information will serve as a crucial part of the movie index table. Preliminary experimental results show that, by integrating multiple media information, we can obtain robust and meaningful event detection and speaker identification results.
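The likelihood-ratio test described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes diagonal-covariance Gaussian mixture models, per-frame feature vectors that have already been extracted, and a decision threshold of zero. The function names `gmm_log_likelihood` and `identify_speaker`, and the model layout as a `(weights, means, variances)` tuple, are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) features; weights: (K,); means, variances: (K, D).
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                # squared distance term
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)    # normalisation term
    )                                                        # (T, K)
    log_weighted = log_comp + np.log(weights)

    # Log-sum-exp over mixture components for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(per_frame.mean())

def identify_speaker(frames, speaker_models, background_model, threshold=0.0):
    """Return (best speaker or None, log-likelihood-ratio score per speaker).

    Each model is a (weights, means, variances) tuple (assumed layout).
    """
    background = gmm_log_likelihood(frames, *background_model)
    scores = {
        name: gmm_log_likelihood(frames, *model) - background
        for name, model in speaker_models.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores
```

A speaker is accepted only when its model explains the speech better than the background model does; otherwise the segment is left unlabeled. The threshold value is an illustrative choice and would need tuning on real data.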
ISBN: (print) 0819442437
Facilitating efficient video manipulation and access in a web-based environment is becoming a popular trend for video applications. In this paper, we present a web-oriented video management and application processing system, based on our previous work on multimedia databases and content-based retrieval. In particular, we extend the VideoMAP architecture with specific web-oriented mechanisms, which include: (1) Concurrency control facilities for the editing of video data among different types of users, such as Video Administrator, Video Producer, Video Editor, and Video Query Client; different users are assigned various priority levels for different operations on the database. (2) A versatile video retrieval mechanism that employs a hybrid approach by integrating a query-based (database) mechanism with content-based retrieval (CBR) functions; its specific language (CAROL/ST with CBR) supports spatiotemporal semantics of video objects, and also offers an improved mechanism to describe the visual content of videos through content-based analysis. (3) A query profiling database that records the "histories" of various clients' query activities; such profiles can be used to provide the default query template when a similar query is encountered by the same kind of users. An experimental prototype system is being developed based on the existing VideoMAP prototype system, using Java and VC++ on the PC platform.
ISBN: (print) 0819439886
In semantic content-based image/video browsing and navigation systems, efficient mechanisms to represent and manage a large collection of digital images/videos are needed. Traditional keyword-based indexing describes the content of multimedia data through annotations such as text or keywords extracted manually by the user from a controlled vocabulary. This textual indexing technique lacks the flexibility to satisfy the various kinds of queries requested by database users and also requires a huge amount of work to update the information. Current content-based retrieval systems often extract a set of features such as color, texture, shape, motion, speed, and position from the raw multimedia data automatically and store them as content descriptors. This content-based metadata differs from text-based metadata in that it supports a wider variety of queries and can be extracted automatically, thus providing a promising approach for efficient database access and management. When the raw data volume grows very large, explicitly extracting the content information and storing it as metadata along with the images will improve querying performance, since metadata requires much less storage than the raw image data and is thus easier to manipulate. In this paper we maintain that storing metadata together with images will enable effective information management and efficient remote query. We also show, using a texture classification example, that this side information can be compressed while guaranteeing that the desired query accuracy is satisfied. We argue that the compact representation of the image contents not only significantly reduces the storage and transmission rate requirements, but also facilitates certain types of queries. Algorithms are developed for optimized compression of this texture feature metadata, given that the goal is to maximize the classification performance for a given rate budget.
ISBN: (print) 0780370414
The new MPEG-4 Audio standard provides two toolsets for synthetic audio generation, audio processing, and multimedia content description, called Structured Audio (SA) and BInary Format for Scenes (BIFS). Starting from a systematic analysis of SA and from the implementation of an efficient SA decoder, this paper describes the design of a virtual DSP architecture able to exploit the data-level parallelism contained in many typical audio processing algorithms. The proposed virtual DSP architecture shows good performance on general-purpose platforms and can be easily adapted and optimized for parallel superscalar devices. The porting and results on a VLIW DSP device confirm the effectiveness and flexibility of the approach, which is particularly suitable for standalone embedded solutions.
ISBN: (print) 0780370414
This paper analyzes the asymptotic performance of Maximum Likelihood (ML) channel estimation algorithms in wideband code division multiple access (WCDMA) scenarios. We concentrate on systems with periodic spreading sequences (period larger than or equal to the symbol span) with high spreading factors, where the transmitted signal contains a code division multiplexed pilot for channel estimation purposes. Assuming randomized training and code sequences, we derive and compare the asymptotic covariances of the training-only (TO), semi-blind conditional ML (CML) and semi-blind Gaussian ML (GML) channel estimators.
ISBN: (print) 0780370414
The problem of controlling access to multimedia multicasts requires the distribution and maintenance of keying information. The conventional approach to distributing keys is to use a channel independent of the multimedia content. We propose a second approach that involves the use of a data-dependent channel, and can be achieved for multimedia by using data embedding techniques. Using data embedding to convey rekeying messages can provide an additional layer of security when compared with the traditional approach. We then introduce multicast key distribution, and employ a recent tree-based key distribution scheme to exhibit the factors involved in transmitting keys using data embedding.
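To give a feel for how much rekeying traffic a tree-based scheme generates, the sketch below counts the rekeying messages needed when one member leaves. The abstract does not name the scheme it uses, so this assumes a generic complete binary key tree in which each member holds the keys on the path from its leaf to the root. The heap-style node numbering and the function names `keys_to_refresh` and `rekey_messages` are hypothetical.

```python
def keys_to_refresh(leaf_index, num_leaves):
    """Node ids of the keys a departing member held (excluding its own leaf key).

    Assumed layout: heap numbering with the root at 1 and the leaves at
    num_leaves .. 2*num_leaves - 1. Ids are returned bottom-up.
    """
    if num_leaves < 2 or num_leaves & (num_leaves - 1):
        raise ValueError("num_leaves must be a power of two, at least 2")
    if not 0 <= leaf_index < num_leaves:
        raise ValueError("leaf_index out of range")

    node = num_leaves + leaf_index
    path = []
    while node > 1:
        node //= 2          # move to the parent
        path.append(node)
    return path

def rekey_messages(leaf_index, num_leaves):
    """List of (refreshed_key_node, encrypting_key_node) rekeying messages.

    Each refreshed key is sent once per child, encrypted under that child's
    key, skipping the departed member's leaf.
    """
    departed = num_leaves + leaf_index
    messages = []
    for node in keys_to_refresh(leaf_index, num_leaves):
        for child in (2 * node, 2 * node + 1):
            if child != departed:
                messages.append((node, child))
    return messages
```

Under these assumptions a departure from a group of n members refreshes log2(n) keys and produces 2·log2(n) − 1 messages, which is the payload that would have to be carried by whichever channel conveys the rekeying messages.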
ISBN: (print) 0780370414
A promising class of nonlinear multiuser detectors is introduced for CDMA systems. These "iterated-decision" multiuser detectors use optimized multipass algorithms to successively cancel multiple-access interference (MAI) from received data and generate symbol decisions whose reliability increases monotonically with each iteration. They significantly outperform decorrelating detectors and linear minimum mean-square error (MMSE) multiuser detectors, but have the same order of computational complexity. When the ratio of the number of users to the spreading factor is below a certain threshold, iterated-decision multiuser detectors asymptotically achieve the performance of the "optimum" multiuser detector, i.e., maximum-likelihood (ML) decoding.