ISBN (print): 0819456586
Scalable video coding is a technique that allows a compressed video stream to be decoded in several different ways. This lets users adaptively recover a specific version of a video according to their own requirements. Video sequences can be scaled temporally, spatially, and in quality. In this work we introduce a novel fully scalable video codec. It is based on motion-compensated temporal filtering (MCTF) of the video sequences and uses some of the basic elements of JPEG 2000. This paper describes several specific proposals for video-on-demand and videoconferencing applications over unreliable packet-switched data networks.
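To make the temporal side of MCTF concrete, here is a minimal sketch of one level of temporal Haar lifting applied to frame pairs. It assumes zero motion (a real MCTF codec would first align the frames along estimated motion vectors), represents frames as flat lists of floats, and uses hypothetical function names that do not come from the paper.

```python
from typing import List, Tuple

Frame = List[float]  # one frame, flattened to a list of pixel values


def haar_temporal_level(frames: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """One level of temporal Haar lifting, assuming zero motion.

    For each frame pair (A, B):
      predict: H = B - A        (temporal high-pass / detail frame)
      update:  L = A + H / 2    (temporal low-pass, equal to (A + B) / 2)
    """
    if len(frames) % 2:
        raise ValueError("expected an even number of frames")
    lows: List[Frame] = []
    highs: List[Frame] = []
    for a, b in zip(frames[0::2], frames[1::2]):
        h = [pb - pa for pa, pb in zip(a, b)]
        low = [pa + ph / 2 for pa, ph in zip(a, h)]
        lows.append(low)
        highs.append(h)
    return lows, highs


def inverse_haar_temporal_level(lows: List[Frame], highs: List[Frame]) -> List[Frame]:
    """Undo the lifting steps in reverse order: A = L - H / 2, then B = A + H."""
    frames: List[Frame] = []
    for low, h in zip(lows, highs):
        a = [pl - ph / 2 for pl, ph in zip(low, h)]
        b = [pa + ph for pa, ph in zip(a, h)]
        frames.extend([a, b])
    return frames
```

Decoding only the low-pass frames yields a half-frame-rate sequence, and applying the function again to `lows` produces further temporal levels; this is one way temporal scalability arises in MCTF-based schemes. Spatial and quality scalability (for example from wavelet coding of the resulting frames) is not shown.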
The article focuses on audio and video analysis for interactive multimedia services. It describes a system that automates home video editing: the system automatically extracts a set of highlight segments from a set of raw home videos and aligns them with user-supplied incidental music based on the content of both the video and the music. Finally, it introduces a method for interactive image retrieval using query feedback. The method learns the user's query, as well as the correspondence between high-level user concepts and their low-level machine representation, by performing retrievals according to multiple queries that the user supplies during a retrieval session.
Traditional visual communication systems convey only two-dimensional (2-D), fixed field-of-view (FOV) video information. The viewer is presented with a series of flat, non-stereoscopic images that fail to provide a realistic sense of depth. Furthermore, traditional video is restricted to a small part of the scene chosen at the director's discretion, and the user cannot "look around" the environment. The objective of this work is to address both issues by developing new techniques for creating stereo panoramic video sequences. A stereo panoramic video sequence should provide the viewer with stereo vision in any direction (a complete 360-degree FOV) at video rates. In this paper, we propose a new technique for creating stereo panoramic video using a multicamera approach, which yields a high-resolution output. We present a setup that extends a previously known approach for generating still stereo panoramas, and we demonstrate that it can create high-resolution stereo panoramic video sequences. We further explore the limitations of a practical implementation of the setup, namely the limited number of cameras and the nonzero physical size of real cameras, and we identify and study the relevant trade-offs.
ISBN (print): 0819456586
A multi-scale DCT-domain image registration technique for two MPEG video inputs is proposed in this work. Several edge detectors are first applied to the luminance DC coefficients to generate so-called difference maps for each input image. A threshold is then selected for each difference map to filter out regions of low activity. Next, we estimate the displacement parameters by comparing the difference maps of the two input images produced by the same edge detector. Finally, the overall displacement vector is calculated by averaging the parameters from all detectors. To improve the quality of the output mosaic, 1-D alignment is applied locally to pixels around the boundaries defined by the displacement estimated in the previous step. The proposed method is shown to reduce the computational complexity dramatically compared with pixel-based image registration techniques while still producing a satisfactory composite. Moreover, we discuss how the overlapping region affects the quality of alignment.
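The coarse registration steps above can be sketched on DC images (one luminance DC value per 8x8 block). This is a minimal illustration under stated assumptions: the two detectors (absolute horizontal and vertical differences), the mean-based threshold, the Dice-overlap matching score and the exhaustive integer search are all hypothetical choices, since the abstract does not specify them, and the final 1-D boundary refinement is omitted.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def horizontal_diff(img: Grid) -> Grid:
    """|I(y, x+1) - I(y, x)|; the last column has no right neighbour and gets 0."""
    return [[abs(row[x + 1] - row[x]) if x + 1 < len(row) else 0.0
             for x in range(len(row))] for row in img]


def vertical_diff(img: Grid) -> Grid:
    """|I(y+1, x) - I(y, x)|; the last row has no lower neighbour and gets 0."""
    h = len(img)
    return [[abs(img[y + 1][x] - img[y][x]) if y + 1 < h else 0.0
             for x in range(len(img[0]))] for y in range(h)]


def threshold_map(diff: Grid) -> Mask:
    """Hypothetical rule: keep cells whose activity exceeds the map's mean."""
    values = [v for row in diff for v in row]
    t = sum(values) / len(values)
    return [[v > t for v in row] for row in diff]


def best_shift(a: Mask, b: Mask, max_shift: int) -> Tuple[int, int]:
    """Search (dx, dy) so that b[y][x] lines up with a[y + dy][x + dx].

    Score = Dice overlap of active cells inside the overlapping area.
    """
    h, w = len(a), len(a[0])
    best, best_score = (0, 0), -1.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            inter = na = nb = 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    pa, pb = a[y + dy][x + dx], b[y][x]
                    inter += int(pa and pb)
                    na += int(pa)
                    nb += int(pb)
            score = 2 * inter / (na + nb) if na + nb else 0.0
            if score > best_score:
                best, best_score = (dx, dy), score
    return best


def register_dc_images(dc_a: Grid, dc_b: Grid, max_shift: int = 3,
                       block: int = 8) -> Tuple[float, float]:
    """Average the per-detector shifts and scale from DC samples to pixels."""
    detectors: List[Callable[[Grid], Grid]] = [horizontal_diff, vertical_diff]
    shifts = [best_shift(threshold_map(d(dc_a)), threshold_map(d(dc_b)), max_shift)
              for d in detectors]
    dx = sum(s[0] for s in shifts) / len(shifts)
    dy = sum(s[1] for s in shifts) / len(shifts)
    # each DC coefficient summarizes one 8x8 block, so one DC step = 8 pixels
    return dx * block, dy * block
```

Working on DC images is what makes this cheap: the search runs over 1/64 of the pixels, and the averaged shift only needs the local 1-D refinement described in the abstract to reach pixel accuracy.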
ISBN (print): 0819456586
Modern displays and the introduction of video on personal computers have drawn considerable attention to de-interlacing technology. In this paper, a subjective assessment of various de-interlacing techniques is presented and compared with the popular PSNR metric. We used paired comparison and tested five very different algorithms: inter-field linear, intra-field linear, edge-dependent, motion-compensated, and content-adaptive intra-field methods. Our study reveals that the subjective scores of the de-interlacing techniques are highly correlated with the objective PSNR criterion.
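As an illustration of the simplest algorithm class listed above and of the objective metric, the sketch below implements intra-field line averaging and PSNR. It is a minimal example, not any of the specific algorithms tested in the study; function names and the edge handling are assumptions.

```python
import math
from typing import List

Frame = List[List[float]]


def line_average_deinterlace(frame: Frame, field_parity: int) -> Frame:
    """Intra-field linear de-interlacing (line averaging).

    Rows with y % 2 == field_parity belong to the received field and are kept;
    every other row is rebuilt as the mean of the field rows above and below
    (at the top or bottom edge the single available neighbour is repeated).
    """
    h = len(frame)
    out = [row[:] for row in frame]
    for y in range(h):
        if y % 2 == field_parity:
            continue
        above = frame[y - 1] if y - 1 >= 0 else frame[y + 1]
        below = frame[y + 1] if y + 1 < h else frame[y - 1]
        out[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return out


def psnr(ref: Frame, test: Frame, peak: float = 255.0) -> float:
    """PSNR = 10 * log10(peak^2 / MSE), in dB; infinite for identical frames."""
    errors = [(a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)]
    mse = sum(errors) / len(errors)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

PSNR requires a progressive reference, so the sketch starts from a full frame, ignores the rows of the missing field, and compares the reconstruction with the original.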
ISBN (print): 0819456586
Automatic video surveillance systems are used in many fields of application, such as intelligent vehicles, intelligent highways, and security tasks. Automatic and correct interpretation of video sequences relies on the detection, tracking, and classification of various objects under highly diverse conditions. This requires highly sophisticated algorithms as well as high computational performance to meet real-time constraints. To this end, Infineon Technologies is developing a fully programmable, scalable multiprocessor architecture optimized for video processing, which provides processing performance comparable to current PCs at much lower cost, power consumption, and physical size. The architecture supports data parallelism for low-level image processing, task parallelism at the medium and high levels, and control-oriented processing. A cycle-accurate virtual prototype of the architecture is available. A library of optimized image processing functions supports convenient application development and the reuse of existing application software. Besides the implementation of standard low-level operators, new efficient approaches for motion estimation, object detection, and tracking are developed and tested in intelligent-vehicle and intelligent-highway applications. Integrating these application-specific tasks into the image processing library results in a powerful embedded vision platform for video surveillance systems that supports convenient application development.
ISBN (print): 0819456586
In the human visual system, the spatial resolution of a scene under view decreases uniformly with increasing distance from the point of gaze, also called the foveation point. This phenomenon is referred to as foveation and has been exploited in foveated imaging to allocate bits in image and video coding according to the spatially varying perceived resolution. Several digital image processing techniques have been proposed in the past to realize foveated images and video. In most cases, a single foveation point is assumed in a scene [1-3]. Recently there has been significant interest in dynamic as well as multi-point foveation; however, the complexity of identifying the foveation points in the proposed approaches is significantly high [4-5]. In this paper, an adaptive multi-point foveation technique for video data based on the concept of regions of interest (ROIs) is proposed and its performance is investigated. The points of interest are assumed to be the centroids of moving objects and are determined dynamically by the proposed foveation algorithm. A fast algorithm for implementing region-based multi-foveation processing is also proposed. The proposed adaptive multi-foveation technique integrates fully with existing video codec standards in both the spatial and DCT domains.
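The idea of taking moving-object centroids as foveation points can be sketched as follows. This is a minimal illustration, not the authors' algorithm: moving pixels are found by a thresholded frame difference, grouped into 4-connected regions, and each region's centroid becomes a foveation point. The resolution falloff `1 / (1 + d / half_dist)` is a hypothetical stand-in for a perceptual model; an encoder could, for example, quantize more coarsely where the weight is low.

```python
import math
from collections import deque
from typing import List, Tuple

Grid = List[List[float]]
Point = Tuple[float, float]  # (row, column)


def foveation_points(prev: Grid, cur: Grid, thr: float = 10.0) -> List[Point]:
    """Centroids of 4-connected regions of moving pixels (frame difference > thr)."""
    h, w = len(cur), len(cur[0])
    moving = [[abs(cur[y][x] - prev[y][x]) > thr for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    points: List[Point] = []
    for y0 in range(h):
        for x0 in range(w):
            if not moving[y0][x0] or seen[y0][x0]:
                continue
            # breadth-first search collects one connected moving region
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and moving[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cy = sum(p[0] for p in region) / len(region)
            cx = sum(p[1] for p in region) / len(region)
            points.append((cy, cx))
    return points


def resolution_weights(h: int, w: int, points: List[Point],
                       half_dist: float = 4.0) -> Grid:
    """Hypothetical falloff: 1 at a foveation point, 0.5 at half_dist pixels away.

    Each pixel uses its nearest foveation point; with no points the frame is
    left unfoveated (weight 1 everywhere).
    """
    if not points:
        return [[1.0] * w for _ in range(h)]
    weights = []
    for y in range(h):
        row = []
        for x in range(w):
            d = min(math.hypot(y - py, x - px) for py, px in points)
            row.append(1.0 / (1.0 + d / half_dist))
        weights.append(row)
    return weights
```

Taking the nearest point per pixel is what makes the scheme multi-point: each moving object gets its own full-resolution neighbourhood.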
ISBN (print): 0780391950
Recent developments have given rise to H.264/AVC, which offers better video quality for a given bandwidth than MPEG-2. H.264/AVC is expected to take over the digital video market, replacing MPEG-2 in most digital video applications. Given the widespread use of MPEG-2 in the marketplace today, the complete migration to the new video coding algorithm will take several years. In this paper, we study the capabilities of intra-frame prediction, a part of a highly efficient H.264 encoder. We introduce the two different methods defined in the H.264 video coding standard for intra-frame prediction and then evaluate their performance in terms of computational cost and rate-distortion results. Finally, we outline a simple method to reduce the computational cost when time is the critical factor.
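For luma, H.264 defines Intra_4x4 prediction (nine modes) and Intra_16x16 prediction (four modes), which are likely the two methods referred to above. The sketch below shows three of the Intra_4x4 modes (vertical, horizontal, DC) and a SAD-based mode decision. It is a simplified illustration: it assumes both neighbours are available (the standard defines fallbacks otherwise), omits the six directional modes, and uses distortion only rather than a rate-distortion cost. It is not the paper's cost-reduction method.

```python
from typing import Dict, List, Sequence, Tuple

Block = List[List[int]]


def intra4x4_predictions(top: Sequence[int], left: Sequence[int]) -> Dict[str, Block]:
    """Three of the nine H.264 Intra_4x4 luma modes, with top and left neighbours available.

    top  = the 4 reconstructed pixels above the block (A..D in the standard)
    left = the 4 reconstructed pixels to its left (I..L)
    """
    vertical = [list(top[:4]) for _ in range(4)]       # mode 0: extend the row above downwards
    horizontal = [[left[y]] * 4 for y in range(4)]     # mode 1: extend the left column rightwards
    dc = (sum(top[:4]) + sum(left[:4]) + 4) >> 3       # mode 2: rounded mean of the 8 neighbours
    return {"vertical": vertical, "horizontal": horizontal,
            "dc": [[dc] * 4 for _ in range(4)]}


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def choose_mode(block: Block, top: Sequence[int], left: Sequence[int]) -> Tuple[str, int]:
    """Pick the candidate mode with the lowest SAD (a cheap distortion-only criterion)."""
    costs = {name: sad(block, pred)
             for name, pred in intra4x4_predictions(top, left).items()}
    best = min(costs, key=costs.get)
    return best, costs[best]
```

The encoder's intra cost grows with the number of candidate modes it evaluates per block, which is why restricting or ordering the candidate set is a common generic way to trade a little coding efficiency for speed.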
ISBN (print): 0819456586
This paper proposes a projective image registration algorithm oriented to consumer devices. It exploits a "multi-resolution feature-based method" to estimate the projective parameters through a 2D Daubechies discrete wavelet transform (DWT). The algorithm has been fully tested on real image sequences acquired by CMOS sensors and compared with other registration techniques. The results highlight the accuracy of the estimated registration parameters.
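Each DWT level halves the resolution of the approximation band, so projective parameters estimated at a coarse level must be carried to the next finer level. The sketch below shows only that bookkeeping: applying a 3x3 projective matrix to a point and propagating it across a factor-of-2 level change via S H S^-1 with S = diag(2, 2, 1). It assumes coordinates scale exactly by the factor (ignoring half-pixel offsets) and does not show the paper's feature extraction or parameter estimation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def apply_homography(H: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) with a 3x3 projective matrix using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def refine_level(H: Matrix, factor: float = 2.0) -> Matrix:
    """Carry H from a level downsampled by `factor` to the next finer level.

    Computes S @ H @ inv(S) with S = diag(factor, factor, 1): translations grow
    by `factor`, perspective terms shrink by it, the linear part is unchanged.
    """
    f = factor
    return [
        [H[0][0], H[0][1], H[0][2] * f],
        [H[1][0], H[1][1], H[1][2] * f],
        [H[2][0] / f, H[2][1] / f, H[2][2]],
    ]
```

In a coarse-to-fine scheme, the propagated matrix serves as the initial estimate at the finer level, where it is then refined.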
ISBN (print): 0819456586
Speaker location detection in video conferencing is generally audio-based. However, the physical room environment, which is beyond the control of the speaker detection system, can severely alter room acoustics. Room acoustics introduce interference and can degrade the performance of audio-based speaker detection systems. In this paper, we propose a video-based speaker detection method that can be used independently or together with audio-based detection systems. The speaker location information is intended for creating 3-dimensional audio reproduction that makes videoconferences more realistic. In the proposed method, we detect moving lips in video sequences: we first detect lips using color information and then determine whether the lips are moving. Experiments with real videos give promising results.
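The two-stage idea (color-based lip detection, then a motion test) can be sketched as below. The color feature here is a pseudo-hue R / (R + G), a feature used in the lip-segmentation literature; whether the paper uses it is not stated, and both thresholds (`hue_thr`, `change_thr`) are hypothetical. Motion is declared when the lip mask changes by more than a fraction of its area between consecutive frames.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]
Mask = List[List[bool]]


def lip_mask(img: Image, hue_thr: float = 0.6) -> Mask:
    """Mark pixels whose pseudo-hue R / (R + G) exceeds a hypothetical threshold.

    Lips tend to be redder than the surrounding skin, so their pseudo-hue is higher.
    """
    return [[(r / (r + g) if r + g else 0.0) > hue_thr for (r, g, _b) in row]
            for row in img]


def lips_moving(prev: Image, cur: Image, hue_thr: float = 0.6,
                change_thr: float = 0.2) -> bool:
    """Declare motion when the lip region changes by more than change_thr of its union."""
    a, b = lip_mask(prev, hue_thr), lip_mask(cur, hue_thr)
    pairs = [(p, q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    union = sum(p or q for p, q in pairs)
    changed = sum(p != q for p, q in pairs)
    return union > 0 and changed / union > change_thr
```

Normalizing by the union of the two masks keeps the test independent of how large the face appears in the frame; in practice the thresholds would need tuning for lighting and skin tone.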