ISBN: 9780819469922 (print)
In this paper we present a novel approach to personal photo album management that allows the end user to access the collection efficiently without any need for tedious manual annotation or indexing of the photos. The proposed work exploits methods and technology from computer vision and pattern recognition for face detection, face representation and image annotation to automatically create descriptions of images that are useful for content-based search and retrieval. Even though most of the techniques used are not reliable enough to address the general problem of content-based image retrieval, we show that in a limited domain such as personal photo albums it is possible to obtain results that improve the browsing capabilities of current photo album management systems. In particular, starting from the observation that most personal photos depict a usually small number of people in a relatively small number of different contexts (indoor, outdoor, beach, mountain, city, etc.), we propose the use of automatic techniques to index images based on who is present in the scene and on the context in which the picture was taken. Experiments on a personal photo collection of about a thousand images showed that relatively simple content-based techniques lead to surprisingly good results in terms of ease of user access to the data.
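The abstract indexes photos by who appears in them and by the context of the shot. The following is a hypothetical illustration of how such automatically produced labels could be queried. It assumes that face recognition and context classification have already produced a list of person labels and one context label per photo. The `PhotoIndex` class and its methods are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PhotoIndex:
    """Hypothetical inverted index over automatically generated photo labels."""
    by_person: defaultdict = field(default_factory=lambda: defaultdict(set))
    by_context: defaultdict = field(default_factory=lambda: defaultdict(set))

    def add(self, photo_id, people, context):
        # Labels are assumed to come from face recognition and context classification.
        for person in people:
            self.by_person[person].add(photo_id)
        self.by_context[context].add(photo_id)

    def query(self, people=(), context=None):
        """Return photos showing every requested person (and the context, if given)."""
        # .get() does not insert new keys into the defaultdicts.
        postings = [self.by_person.get(p, set()) for p in people]
        if context is not None:
            postings.append(self.by_context.get(context, set()))
        if not postings:
            return []
        return sorted(set.intersection(*postings))
```

For example, `index.query(people=["alice"], context="beach")` would return only beach photos in which "alice" was recognized. An inverted index keeps such conjunctive "who and where" queries cheap even when no manual annotation exists.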
ISBN: 9780819469922 (print)
In this paper, we present a content-adaptive, audio-texture-based method to segment video into audio scenes. An audio scene is modeled as a semantically consistent chunk of audio data. Our algorithm is based on "semantic audio texture analysis." First, we train Gaussian mixture models (GMMs) for basic audio classes such as speech and music. We then define the semantic audio texture based on those classes. We study and present two types of scene changes: those corresponding to an overall audio texture change, and those corresponding to a special "transition marker" used by the content creator, such as a short stretch of music in a sitcom or silence in dramatic content. Unlike prior work that relies on genre-specific heuristics, such as some methods proposed for detecting commercials, we adaptively determine whether such special transition markers are being used and, if so, which of the base classes serve as markers, without any prior knowledge of the content. Our experimental results show that the proposed audio scene segmentation works well across a wide variety of broadcast content genres.
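The following is a simplified sketch of the first type of scene change (an overall audio texture change). It is an assumption, not the paper's algorithm:

* Per-frame class labels are taken as already produced by the trained GMMs.
* A "texture" is approximated as the normalized histogram of class labels in a window.
* A boundary is placed where the L1 distance between the windows before and after a frame peaks above a threshold.

The class names, window size and threshold are hypothetical. The adaptive transition-marker detection is not shown.

```python
from collections import Counter

# Hypothetical base audio classes; the GMM classifiers are assumed to emit these labels.
CLASSES = ("speech", "music", "silence", "noise")


def texture(labels):
    """Approximate audio texture: normalized class histogram over a window of frame labels."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in CLASSES]


def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def scene_boundaries(labels, win=4, threshold=1.0):
    """Frame indices where the texture before and after the frame changes the most."""
    dist = {}
    for t in range(win, len(labels) - win + 1):
        dist[t] = l1(texture(labels[t - win:t]), texture(labels[t:t + win]))
    boundaries = []
    for t, d in dist.items():
        # Keep only local peaks above the threshold so one change yields one boundary.
        if d > threshold and d > dist.get(t - 1, 0.0) and d >= dist.get(t + 1, 0.0):
            boundaries.append(t)
    return boundaries
```

With eight speech frames followed by eight music frames, the distance peaks at the frame where the label run switches. The smaller distances on either side are suppressed as non-peaks.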
ISBN: 9780819469922 (print)
We present a set of experiments with a video OCR system (VOCR) tailored for video information retrieval and establish its importance in multimedia search in general and for some specific queries in particular. The system, inspired by existing work on text detection and recognition in images, was developed using techniques that involve detailed analysis of video frames to produce candidate text regions. The text regions are then binarized and sent to a commercial OCR engine, producing ASCII text that is finally used to create search indexes. The system is evaluated on the TRECVID data. We compare the system's performance from an information retrieval perspective with another VOCR developed using multi-frame integration, and we empirically demonstrate that deep analysis of individual video frames results in better video retrieval. We also evaluate the effect of various textual sources on multimedia retrieval by combining the VOCR output with automatic speech recognition (ASR) transcripts. For general search queries, the VOCR system coupled with ASR outperforms the other system by a large margin. For search queries that involve named entities, especially people's names, the VOCR system even outperforms speech transcripts, demonstrating that selecting the right source for a particular query type is essential.
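The abstract's point about combining VOCR and ASR text, and about choosing sources per query type, can be pictured with a toy weighted index. This is a hypothetical sketch and not the paper's retrieval model. It uses whitespace tokenization, raw term frequencies and per-source weights chosen by the caller (for instance, a higher VOCR weight for person-name queries). All names and weights are assumptions.

```python
from collections import defaultdict


def build_index(shots):
    """shots: {shot_id: {source_name: text}} -> {term: {shot_id: {source_name: tf}}}."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for shot_id, sources in shots.items():
        for source, text in sources.items():
            for term in text.lower().split():  # naive whitespace tokenization
                index[term][shot_id][source] += 1
    return index


def search(index, query, weights):
    """Rank shots by a weighted sum of per-source term frequencies."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for shot_id, per_source in index.get(term, {}).items():
            for source, tf in per_source.items():
                scores[shot_id] += weights.get(source, 0.0) * tf
    # Highest score first; shot id breaks ties deterministically.
    return sorted(scores, key=lambda s: (-scores[s], s))
```

Changing only `weights` switches the system between trusting on-screen text and trusting spoken words. This mirrors, in a very reduced form, the per-query-type source selection the abstract argues for.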
ISBN: 9780819469922 (print)
In this paper, we present a logo and trademark retrieval system for unconstrained color image databases that extends the Color Edge Co-occurrence Histogram (CECH) object detection scheme. We introduce more accurate edge information into the CECH by incorporating color edge detection based on vector order statistics. This produces a more accurate representation of edges in color images than the simple color pixel difference classification of edges used in the CECH. Our proposed method therefore relies on edge gradient information, and we call it the Color Edge Gradient Co-occurrence Histogram (CEGCH). We use it as the main mechanism of our unconstrained color logo and trademark retrieval scheme. Results show that the proposed system retrieves logos and trademarks with good accuracy and outperforms the CECH object detection scheme in both precision and recall.
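The following is a pure-Python sketch of the idea of an edge-based color co-occurrence histogram. The abstract does not give the paper's exact operator, window, color quantization or pixel offsets, so these are assumptions:

* Edges are found with the vector range (VR) detector, one well-known vector-order-statistics operator. Each RGB vector in a 3×3 window is ranked by its aggregate distance to the others, and the edge strength is the distance between the highest-ranked vector and the vector median.
* Quantized color pairs of neighboring edge pixels are then counted.

The threshold, quantization levels and offsets are hypothetical, and histogram intersection is used only as an example similarity.

```python
import math
from collections import Counter


def vector_range(window):
    """VR edge strength: distance from the vector median to the highest-ranked vector."""
    agg = [sum(math.dist(v, w) for w in window) for v in window]  # R-ordering key
    order = sorted(range(len(window)), key=lambda i: agg[i])
    return math.dist(window[order[-1]], window[order[0]])


def edge_map(img, threshold):
    """Boolean edge map for interior pixels of an image given as rows of RGB tuples."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            edges[y][x] = vector_range(win) > threshold
    return edges


def quantize(rgb, levels=4):
    step = 256 // levels
    return tuple(c // step for c in rgb)


def edge_cooccurrence_histogram(img, threshold=60.0, offsets=((1, 0), (0, 1)), levels=4):
    """Count quantized color pairs of edge pixels that are also edges at each offset."""
    edges = edge_map(img, threshold)
    h, w = len(img), len(img[0])
    hist = Counter()
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            for dx, dy in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and edges[yy][xx]:
                    hist[(quantize(img[y][x], levels), quantize(img[yy][xx], levels))] += 1
    return hist


def histogram_intersection(h1, h2):
    """Simple similarity between two co-occurrence histograms (Counter returns 0 if missing)."""
    return sum(min(h1[k], h2[k]) for k in h1)
```

On a 6×6 image split into a red half and a blue half, only the two columns straddling the color boundary are marked as edges. The histogram then records red-blue pairs across the boundary and same-color pairs along it.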
Embedded multimedia applications exhibit both regular and irregular memory access patterns. Irregular patterns in particular are not amenable to static analysis for extracting access patterns, which prevents efficient use of a Scratch Pad Memory (SPM) hierarchy for performance and energy improvements. To resolve this, we present a compiler strategy to optimize data layout in regular/irregular multimedia applications running on embedded multiprocessor environments. The goal is to maximize the number of SPM accesses across the entire system, which reduces the system's energy consumption. This is achieved by optimizing the placement of application-wide reused data so that it resides in the SPMs of the processing elements. Specifically, our scheme relies on profiling to generate a memory access footprint. The memory access footprint is used to identify, at fine granularity, data elements that can profitably be placed in the SPMs to maximize performance and energy gains. We present a heuristic approach that efficiently exploits the SPMs using the memory access footprint. Our experimental results show that our approach reduces energy consumption by 30% and improves performance by 18% over cache-based memory subsystems for various multimedia applications.
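The abstract does not spell out its heuristic, so this is only a hypothetical greedy sketch of footprint-driven SPM placement. The profiled footprint is assumed to give each data element a size and per-processing-element access counts. Elements are taken in order of accesses per byte, and each is placed in the SPM of its heaviest user, or of another accessing PE, if it fits. The `Element` type and `place_in_spms` function are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A data element from a (hypothetical) profiled memory access footprint."""
    name: str
    size: int        # bytes
    accesses: dict   # {pe_id: access count observed during profiling}


def place_in_spms(elements, spm_capacity):
    """Greedy placement: densest elements first, each into the SPM of its heaviest user.

    spm_capacity: {pe_id: free SPM bytes}. Returns {element name: pe_id};
    elements that fit nowhere are left in off-chip memory (not listed).
    """
    free = dict(spm_capacity)
    placement = {}
    by_density = sorted(elements, key=lambda e: sum(e.accesses.values()) / e.size, reverse=True)
    for e in by_density:
        # Try accessing PEs from most to least frequent user.
        for pe in sorted(e.accesses, key=lambda p: -e.accesses[p]):
            if free.get(pe, 0) >= e.size:
                free[pe] -= e.size
                placement[e.name] = pe
                break
    return placement
```

Ranking by accesses per byte is the usual greedy approximation for this knapsack-like problem. A real compiler pass would also weigh the cost of remote SPM accesses between processing elements, which this sketch ignores.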
ISBN: 9781424426843 (print)
Applications that deliver multimedia content to mobile devices and display it there have become increasingly common in recent years. When faced with a large amount of content, users who are unfamiliar with it can draw on each other's recommendations, through recommender systems, to find content of interest. As a case in point, we present the design of a recommender system that tourists can use to request a travel itinerary and then browse multimedia content for each recommended tourist spot. Our recommender system combines genetic algorithms and fuzzy logic. Recommendations are chosen to match a user profile based on the user's personal preferences. Our system design has wide applicability in multimedia systems where the user requires assistance in content selection.
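The abstract names genetic algorithms and fuzzy logic but does not describe how they are combined. Here is one hypothetical way to put the two together:

* **Fuzzy matching.** User preferences and spot categories are fuzzy membership degrees in [0, 1]. A spot matches the profile to the degree given by the max-min composition, a standard fuzzy OR of ANDs.
* **Itinerary search.** A small genetic algorithm searches for the subset of spots with the highest total match within a time budget. It uses bit-string chromosomes, tournament selection, one-point crossover, bit-flip mutation and elitism.

All field names, parameters and the fitness function are assumptions.

```python
import random


def fuzzy_match(user, spot):
    """Max-min composition of user preference degrees and spot category memberships."""
    return max((min(user.get(c, 0.0), mu) for c, mu in spot["categories"].items()), default=0.0)


def fitness(bits, spots, user, budget):
    """Total fuzzy match of selected spots; infeasible itineraries score zero."""
    hours = sum(s["hours"] for b, s in zip(bits, spots) if b)
    if hours > budget:
        return 0.0
    return sum(fuzzy_match(user, s) for b, s in zip(bits, spots) if b)


def recommend(spots, user, budget, pop=30, gens=60, p_mut=0.05, seed=0):
    """Return the names of the spots in the best itinerary found by a simple GA."""
    rng = random.Random(seed)
    n = len(spots)
    score = lambda bits: fitness(bits, spots, user, budget)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=score, reverse=True)
        nxt = population[:2]  # elitism: carry the two best over unchanged
        while len(nxt) < pop:
            p1 = max(rng.sample(population, 3), key=score)  # tournament selection
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            nxt.append(child)
        population = nxt
    best = max(population, key=score)
    return [s["name"] for b, s in zip(best, spots) if b]
```

For a profile that strongly prefers history, with a five-hour budget, the GA should settle on the history-related spots that fit the budget. The fuzzy layer lets a partially matching spot (for example, an art museum with some historical content) still contribute to the score.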
ISBN: 9781424419678 (print)
This paper addresses the interoperability problem of multimedia terminals in media content delivery over heterogeneous networks and devices. We design and implement a terminal that includes a content browser providing presentation, navigation and interaction with the MPEG-21 Digital Item Declaration (MPEG-21 DID) in order to support universal media access (UMA). We optimized the architecture using a distributed client-server approach with Web Service support. This terminal makes MPEG-21-compliant content accessible on different terminal devices via common Web browsers. This design strategy illustrates a next-generation multimedia terminal that supports interoperability in multimedia content adaptation over a heterogeneous delivery chain.
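A content browser for MPEG-21 DID has to walk the Digital Item Declaration Language (DIDL) document and pick resources the current device can render. The following is a minimal sketch of that step using Python's standard `xml.etree.ElementTree`. It assumes the DIDL namespace `urn:mpeg:mpeg21:2002:02-DIDL-NS` and the `Item` / `Component` / `Resource` structure with `mimeType` and `ref` attributes. Matching on MIME type alone is a simplification chosen for illustration, not the paper's adaptation logic.

```python
import xml.etree.ElementTree as ET

DIDL_NS = {"didl": "urn:mpeg:mpeg21:2002:02-DIDL-NS"}


def playable_resources(didl_xml, supported_mime_types):
    """Return (item id, mimeType, ref) for each Resource the terminal can render."""
    root = ET.fromstring(didl_xml)
    found = []
    # iter() visits nested Items too; findall() only looks at each Item's own Components.
    for item in root.iter(f"{{{DIDL_NS['didl']}}}Item"):
        for res in item.findall("didl:Component/didl:Resource", DIDL_NS):
            mime = res.get("mimeType")
            if mime in supported_mime_types:
                found.append((item.get("id"), mime, res.get("ref")))
    return found
```

A phone that only decodes 3GPP video would call this with `{"video/3gpp"}` and receive just the matching alternative. A desktop browser could pass a wider set and pick among several resources for the same item.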
ISBN: 9780819469922 (print)
We present an unsupervised method to enrich textual applications with relevant images and colors. The images are collected by querying large image repositories, and the colors are then computed using image processing. We present a prototype system in which the method is applied to song lyrics. Combined with a lyrics synchronization algorithm, the system produces a rich multimedia experience. To identify terms within the text that may be associated with images and colors, we select noun phrases using a part-of-speech tagger. Large image repositories are queried with these terms, and representative colors for each term are extracted from the collected images. To this end, we use either a histogram-based or a mean-shift-based algorithm. The representative color extraction exploits the non-uniform distribution of the colors found in the large repositories. The images ranked best by the search engine are displayed on a screen, while the extracted representative colors are rendered on controllable lighting devices in the living room. We evaluate our method by comparing the computed colors to standard color representations of a set of English color terms. A second evaluation focuses on the color distance between a queried term in English and its translation in a foreign language. Based on results from three sets of terms, we propose a measure, based on KL divergence, of how suitable a term is for color extraction. Finally, we compare the performance of the algorithm using either the automatically indexed repository of Google Images or the manually annotated ***. Based on the results of these experiments, we conclude that the presented method can compute the relevant color for a term using a large image repository and image processing.
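The following is a hypothetical sketch of the histogram-based variant and the KL-divergence suitability measure. It rests on three assumptions:

* Pixels from a term's images and from the repository as a whole are quantized into coarse RGB bins.
* The representative color is taken as the bin most over-represented for the term relative to the repository background, which is one way of using the non-uniform color distribution the abstract mentions.
* A term whose color histogram diverges strongly from the background (high KL divergence) is treated as more suitable for color extraction.

The bin count, the choice of difference rather than ratio, and the smoothing constant are assumptions, not values from the paper.

```python
import math
from collections import Counter


def quantize(rgb, levels=4):
    step = 256 // levels
    return tuple(c // step for c in rgb)


def color_histogram(pixels, levels=4):
    """Normalized histogram over coarse RGB bins."""
    counts = Counter(quantize(p, levels) for p in pixels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def representative_color(term_pixels, background_pixels, levels=4):
    """Centre of the bin most over-represented in the term's images vs. the background."""
    p = color_histogram(term_pixels, levels)
    q = color_histogram(background_pixels, levels)
    best = max(p, key=lambda k: p[k] - q.get(k, 0.0))
    step = 256 // levels
    return tuple(c * step + step // 2 for c in best)


def kl_divergence(p, q, eps=1e-6):
    """D_KL(p || q) over the bins of p; eps guards against bins missing from q."""
    return sum(pk * math.log(pk / (q.get(k, 0.0) + eps)) for k, pk in p.items() if pk > 0)
```

Subtracting the background frequency keeps colors that are common in every image, such as grays and skin tones, from winning. A term with a near-zero divergence from the background is unlikely to have a meaningful color.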
ISBN: 9780769532998 (print)
This paper presents a standards-based architecture for a complex and generic distributed multimedia scenario that combines content search and retrieval, DRM, and context-based content adaptation. The most innovative part of the proposed work lies in integrating a flexible language for multimedia search, based on the MPEG Query Format (MPQF) standard, with video analysis algorithms for the automatic extraction of low-level features.
ISBN: 9781605583037 (print)
The exceptionally large size of multimedia content has motivated the creation of many different compression algorithms and encapsulation formats to make its transport and storage feasible. Developers of multimedia applications must repeatedly deal with the many forms in which content is available, turning the simple task of media access into an unnecessary challenge. The open source project FOBS shields developers from these difficulties by offering an intuitive and powerful object-oriented multimedia access API. FOBS was conceived to be inherently platform-independent and easily adaptable to multiple programming languages, making it possible to add multimedia support to almost any application.