This paper deals with video coding of static scenes viewed by a moving camera. We propose an automatic way to encode such video sequences using several 3D models. Contrary to prior art in model-based coding, where the 3D models have to be known, the 3D models are automatically computed from the original video sequence. We show that several independent 3D models provide the same functionalities as one single 3D model, and avoid some drawbacks of the previous approaches. To achieve this goal we propose a novel sliding adjustment algorithm, which ensures consistency of successive 3D models. The paper presents a method to automatically extract the set of 3D models and associated camera positions. The obtained representation can be used for reconstructing the original sequence, or virtual ones. It also enables 3D functionalities such as synthetic object insertion, lighting modification, or stereoscopic visualization. Results on real video sequences are presented.
In this paper, we present two recursive methods for the real-time estimation of long-term three-dimensional (3-D) motion parameters from monocular image sequences, suitable for synthetic/natural hybrid coding face animation and model-based coding applications. Based on feature point extraction in every frame, the 3-D motion parameters of a human face are estimated with a predictive approach. The first method uses a recursive linear least squares approach and the second employs a nonlinear extended Kalman filter, which does not rely on a linearized model of the face motion. Both methods perform a prediction and correction loop at every time step. Compared to other methods described in the literature, the recursive and predictive structure of the proposed estimation process solves the problem of error accumulation in long-term motion estimation. This makes the estimation stable and consistent over long periods. Experimental results are presented for synthetic data and real image sequences, which demonstrate the performance of the estimation methods and compare the two approaches.
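The prediction and correction loop mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it tracks a single scalar (for instance one head rotation angle) with a linear Kalman filter under a random-walk model. The function name `kalman_track`, the random-walk assumption, and the noise variances are all hypothetical choices made for illustration.

```python
def kalman_track(measurements, process_var=1e-4, meas_var=1e-2,
                 init_estimate=0.0, init_var=1.0):
    """Scalar Kalman filter: one predict/correct cycle per measurement.

    Hypothetical toy model (an assumption, not the paper's): the tracked
    parameter follows a random walk, and every frame yields one noisy
    direct measurement of it.
    """
    estimate, var = init_estimate, init_var
    history = []
    for z in measurements:
        # Predict: the random-walk model keeps the estimate; uncertainty grows.
        pred_estimate = estimate
        pred_var = var + process_var
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = pred_var / (pred_var + meas_var)
        estimate = pred_estimate + gain * (z - pred_estimate)
        var = (1.0 - gain) * pred_var
        history.append(estimate)
    return history
```

In this toy setting each step corrects the previous prediction with a fresh measurement, so the estimate is pulled back toward the data at every frame instead of drifting.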
In this paper, input from an active camera is used for MPEG-4 model-based coding. First, the background is compensated for camera motion (tilt or pan). Second, the talking face is segmented from the compensated background using a fusion of frame differences. A morphological filter is then applied to make the system less sensitive to noise. Third, the Hough transform and a deformable template, coupled with color information, are exploited to detect the facial features, e.g., eyes and mouth. Fourth, a wireframe model is adapted to the extracted face. The feasibility of the proposed system is demonstrated using a real active video sequence. (C) 1999 Elsevier Science B.V. All rights reserved.
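The segmentation and noise-suppression steps above can be sketched as follows. The abstract does not specify the morphological filter or the difference threshold, so this example assumes a single thresholded frame difference followed by a 3x3 binary opening (erosion, then dilation). All function names are hypothetical.

```python
def frame_difference_mask(prev, curr, threshold):
    """Mark pixels whose absolute intensity change exceeds the threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def _window(mask, y, x):
    """In-bounds values of the 3x3 neighbourhood centred on (y, x)."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(y - 1, y + 2)
            for i in range(x - 1, x + 2)
            if 0 <= j < h and 0 <= i < w]

def erode(mask):
    """A pixel survives only if its whole in-bounds neighbourhood is set."""
    return [[min(_window(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]

def dilate(mask):
    """A pixel is set if any pixel in its neighbourhood is set."""
    return [[max(_window(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]

def opening(mask):
    """Assumed noise filter: erosion followed by dilation."""
    return dilate(erode(mask))
```

Opening removes isolated changed pixels while restoring the outline of compact regions, which is one plausible way to make a difference mask less sensitive to noise.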
Model-based video coding has been adopted as a core experiment in the ISO MPEG-4 standard. The clip-and-paste technique for putting video objects in line is an important tool to reduce the transmission rate. To help the clip-and-paste method fit the 2-D model, we propose several smoothing algorithms for improving the quality of the reconstructed images. The proposed smoothing algorithms can adjust for zoom, tilt, and rotation deformations of object images. A luminance smoothing algorithm is also applied to compensate for light source variations. Simulation results show that the smoothing methods help clip-and-paste images achieve a satisfactory perceived visual quality.
Few networks offer sufficient bandwidth for the transmission of high resolution two- and three-dimensional medical image sets without incurring significant latency. Traditional compression methods achieve bit-rate reduction based on pixel statistics and ignore visual cues that are important in identifying visually informative regions. This paper describes an approach to managing image transmission in which spatial regions are selected and prioritized for transmission so that visually informative data is received in a timely manner. This context-based image transmission (CBIT) scheme is a lossless form of progressive image transmission (PIT) in which gross structure, represented by an approximate iconic image, is transmitted first. Each part of this iconic image is progressively updated, using a simple set of rules that take into account viewing requirements. CBIT is realized using knowledge about image composition to segment, label, prioritize, and fit geometric models to regions of an image. Tests, using neurological images, show that, with CBIT, a valuable transmitted image is received with a latency that is about one-tenth that of traditional PIT schemes. Frequently, the necessary regions of the image are transmitted in about half the time taken to transmit the full image.
In this paper, we propose the use of modified Hough transforms to efficiently extract object feature parameters, which are usually contaminated by heavy noise, corrugation, and discontinuity. The modified HT (MHT) is developed by introducing spatial and parameter weighting functions to improve the detection performance of the traditional Hough transform (HT), which generally fails to robustly detect natural object parameters. Using designed test patterns and real images, simulations show that the proposed weighting functions are helpful in detecting noise-corrupted object features. Due to its robustness, the MHT can be easily configured with a coarse-to-fine adaptive search mechanism to reduce the large amount of computation needed for feature parameter extraction.
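The idea of weighting votes in a Hough accumulator can be sketched for straight lines. The abstract does not define its spatial and parameter weighting functions, so this example simply assumes one scalar confidence per edge point (for instance an edge strength); `weighted_hough_line` and its parameters are hypothetical.

```python
import math

def weighted_hough_line(points, weights, n_theta=180, rho_step=1.0):
    """Return (theta_degrees, rho, votes) of the strongest line.

    Lines use the normal form rho = x*cos(theta) + y*sin(theta).
    `weights` are assumed per-point confidences; a classical Hough
    transform corresponds to every weight being 1.
    """
    accumulator = {}
    for (x, y), w in zip(points, weights):
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            cell = (t, round(rho / rho_step))
            # Each point adds its weight, not a unit vote, to the cell.
            accumulator[cell] = accumulator.get(cell, 0.0) + w
    (t, r), votes = max(accumulator.items(), key=lambda item: item[1])
    return 180.0 * t / n_theta, r * rho_step, votes
```

Giving unreliable points a low weight reduces their influence on the accumulator peak without discarding them outright.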
Automatic wire-frame fitting and automatic wire-frame tracking are the two most important and most difficult issues associated with semantic-based moving image coding. A novel approach to high-speed tracking of important facial features is presented as a part of a complete fitting-tracking system we have developed. The method allows real-time processing of head-and-shoulders sequences using software tools only. The algorithm is based on eigenvalue decomposition of the sub-images extracted from subsequent frames of the video sequence. Since each facial feature (the left eye, the right eye, the nose and the lips) is tracked separately, the algorithm can be easily adapted for a parallel machine. The algorithm was tested on numerous widely used head-and-shoulders video sequences containing the speaker's head pan, rotation and zoom, with remarkably good results. The experiments we have carried out show that it is possible to maintain tracking even when the facial features are partially occluded. (C) 2000 Elsevier Science B.V. All rights reserved.
We show that traditional waveform coding and 3-D model-based coding are not competing alternatives, but should be combined to support and complement each other. Both approaches are combined such that the generality of waveform coding and the efficiency of 3-D model-based coding are available where needed. The combination is achieved by providing the block-based video coder with a second reference frame for prediction, which is synthesized by the model-based coder. The model-based coder uses a parameterized 3-D head model, specifying shape and color of a person. We therefore restrict our investigations to typical videotelephony scenarios that show head-and-shoulder scenes. Motion and deformation of the 3-D head model constitute facial expressions, which are represented by facial animation parameters (FAPs) based on the MPEG-4 standard. An intensity gradient based approach that exploits the 3-D model information is used to estimate the FAPs, as well as illumination parameters that describe changes of the brightness in the scene. Model failures and objects that are not known at the decoder are handled by standard block-based motion-compensated prediction, which is not restricted to a special scene content, but results in lower coding efficiency. A Lagrangian approach is employed to determine the most efficient prediction for each block from either the synthesized model frame or the previous decoded frame. Experiments on five video sequences show that bit-rate savings of about 35% are achieved at equal average peak signal-to-noise ratio (PSNR) when comparing the model-aided codec to TMN-10, the state-of-the-art test model of the H.263 standard. This corresponds to a gain of 2-3 dB in PSNR when encoding at the same average bit rate.
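The per-block Lagrangian choice between the two reference frames can be sketched as minimising a cost J = D + lambda * R for each candidate. The sketch below is an illustration under stated assumptions, not the codec described in the abstract: distortion is taken as the sum of squared errors, the bit counts are supplied by the caller, and the names `sse` and `choose_reference` are hypothetical.

```python
def sse(block, prediction):
    """Sum of squared errors between a block and its prediction."""
    return sum((b - p) ** 2 for b, p in zip(block, prediction))

def choose_reference(block, model_pred, prev_pred, rate_model, rate_prev, lam):
    """Pick the reference with the lower Lagrangian cost D + lam * R.

    `model_pred` stands for the block predicted from the synthesized model
    frame, `prev_pred` for the block predicted from the previous decoded
    frame; the rates are assumed bit counts for signalling each choice.
    """
    cost_model = sse(block, model_pred) + lam * rate_model
    cost_prev = sse(block, prev_pred) + lam * rate_prev
    if cost_model <= cost_prev:
        return "model", cost_model
    return "previous", cost_prev
```

With a small multiplier the lower-distortion prediction wins; as the multiplier grows, the cheaper-to-signal prediction becomes preferable even if it is less accurate.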
Recently, a new type of video coding method called model-based image coding has attracted much attention as a potential candidate for low bit-rate visual communication services. This technique reconstructs the facial image from a previously known three-dimensional (3-D) human face model and its received model motion parameters. The parameters of the head motion are divided into two main parts: global motion parameters, which describe the rigid movement of the head, such as rotation and translation, and local motion parameters, which deal with the nonrigid movements of facial expressions, such as the opening and closing of the mouth and eyes. In this paper, we propose a new approach that can estimate the global head motion more robustly and accurately. Compared with existing techniques that match only a few key points, here we extract 3-D contour feature points and use chamfer distance matching to estimate the global head motion. This greatly improves the contour tracking performance. We also develop another technique called the facial normalization transform. It maps the facial region of the current input frame back to the normalized pose of the initial frame. Using this transform, we can analyze facial expressions at the same orientation and in a fixed region, which greatly simplifies the analysis. We then encode using the clip-and-paste method along with an adaptive codebook technique. The coder and decoder systems are briefly described. Since the work focuses mainly on the analysis and synthesis of the facial portion of the images, background analysis and bitstream coding techniques are not discussed in this paper.
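Chamfer distance matching, as named in this abstract, scores a candidate pose by the average distance from each model contour point to its nearest image edge point. The sketch below is a simplified 2-D illustration that searches integer translations only, whereas the abstract works with 3-D contour points and full rigid head motion. It also uses a brute-force nearest-point search in place of a precomputed distance transform, and the function names are hypothetical.

```python
import math

def chamfer_score(template_pts, edge_pts, dx, dy):
    """Mean distance from each shifted template point to its nearest edge point."""
    total = 0.0
    for x, y in template_pts:
        total += min(math.hypot(x + dx - ex, y + dy - ey) for ex, ey in edge_pts)
    return total / len(template_pts)

def best_translation(template_pts, edge_pts, search=5):
    """Exhaustively search integer shifts in [-search, search] on both axes.

    Returns (score, dx, dy) for the shift with the lowest chamfer score.
    """
    best = None
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            score = chamfer_score(template_pts, edge_pts, dx, dy)
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best
```

Because the score uses every contour point, a few missing or spurious edge points shift it only slightly, which is consistent with the robustness argument made against matching a handful of key points.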
This paper describes a procedure for model-based analysis and coding of both left and right channels of a stereoscopic image sequence. The proposed scheme starts with a hierarchical dynamic programming technique for matching across the epipolar line for efficient disparity/depth estimation. Foreground/background segmentation is initially based on depth estimation and is improved using motion and luminance information. The model is initialised by the adaptation of a wireframe model to the consistent depth information. Robust classification techniques are then used to obtain an articulated description of the foreground of the scene (head, neck, shoulders). The object articulation procedure is based on a novel scheme for the segmentation of the rigid 3D motion fields of the triangle patches of the 3D model object. Spatial neighbourhood constraints are used to improve the reliability of the original triangle motion estimation. The motion estimation and motion field segmentation procedures are repeated iteratively until a satisfactory object articulation emerges. The rigid 3D motion is then re-computed for each sub-object and finally, a novel technique is used to estimate flexible motion of the nodes of the wireframe from the rigid 3D motion vectors computed for the wireframe triangles containing each specific node. The performance of the resulting analysis and compression method is evaluated experimentally. (C) 1999 Elsevier Science B.V. All rights reserved.
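Dynamic programming along a scanline, the basis of the disparity estimation step above, can be sketched in a single-level form. The abstract's scheme is hierarchical and its cost terms are not given, so this example assumes an absolute-difference matching cost plus a linear penalty on disparity changes between neighbouring pixels. The name `scanline_disparity` and the parameter `smooth` are hypothetical.

```python
def scanline_disparity(left, right, max_disp, smooth=1.0):
    """Per-pixel disparities for one rectified scanline pair.

    Minimises sum |left[i] - right[i - d_i]| + smooth * |d_i - d_(i-1)|
    with a Viterbi-style dynamic programme over d_i in [0, max_disp].
    """
    inf = float("inf")
    disps = range(max_disp + 1)

    def match(i, d):
        # Matches that fall outside the right scanline are forbidden.
        return abs(left[i] - right[i - d]) if i - d >= 0 else inf

    cost = [match(0, d) for d in disps]
    back = []
    for i in range(1, len(left)):
        new_cost, pointers = [], []
        for d in disps:
            # Best predecessor disparity for the current disparity d.
            p = min(disps, key=lambda q: cost[q] + smooth * abs(d - q))
            new_cost.append(match(i, d) + cost[p] + smooth * abs(d - p))
            pointers.append(p)
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheapest final state.
    d = min(disps, key=lambda q: cost[q])
    path = [d]
    for pointers in reversed(back):
        d = pointers[d]
        path.append(d)
    path.reverse()
    return path
```

The smoothness penalty discourages isolated disparity jumps, so the recovered profile changes only where the matching cost clearly supports it.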