This paper investigates several image coders based on multiple overlapping transforms. For better image representation, decorrelation, and energy compaction, the coders are built on new and powerful classes of multiple overlapping localized trigonometric bases (LCB-x, x>2). The compression results outperform the wavelet coders SPIHT [A. Said et al., June 1996] and JPEG2000 [S. Trautmann] on various test images in both objective and subjective coding performance. In recent years, much of the research in image coding has focused on the discrete wavelet transform (DWT), which avoids blocking artifacts. Several new classes of multiple overlapping localized cosine bases [K. Bittner, 2002, A. Mali et al., 2003] can also completely eliminate blocking artifacts. Additionally, unlike the wavelet transform, trigonometric bases preserve textures and oscillating patterns well in a large number of images [F.G. Meyer, June 2002]. This provides significant improvements in reconstructed image quality over the discrete cosine transform and the discrete wavelet transform. First, we recall the main results on multiple overlapping trigonometric bases. To solve the problem of infinite dual trigonometric bases, various sequences of new window functions for finite signals are introduced. Then, with the aid of the generalized unfolding operator, we propose fast algorithms for signal analysis and synthesis. Finally, we apply the transforms to image compression.
We present a new robust algorithm for reconstructing images into a linear subspace using MAP estimation. The algorithm takes into account the a priori distribution of the subspace variables, and the noise is robustly modeled to allow for occlusions. The subspace distribution is estimated using nonparametric density estimation techniques. An efficient optimization scheme based on the mean shift procedure [D. Comaniciu et al. (2002)] and on half-quadratic theory [D. Geman et al. (1992), P. Charbonnier et al. (1997)] is developed, making optimization of the MAP function feasible for high-dimensional images. Preliminary results on real images demonstrate the contribution of a priori distribution modeling of subspace variables relative to standard reconstruction methods over linear subspaces.
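To illustrate the robust-noise part of this idea only, the following is a minimal sketch that fits subspace coefficients by iteratively reweighted least squares, a common way to realize half-quadratic optimization. It is not the authors' algorithm: the prior on the subspace variables and the mean shift step are omitted, and the function name `robust_subspace_coeffs`, the Cauchy-style weight, and the parameters `sigma` and `n_iter` are assumptions chosen for illustration.

```python
import numpy as np

def robust_subspace_coeffs(x, basis, sigma=0.1, n_iter=20):
    """Fit x ~ basis @ coeffs while down-weighting outlier pixels.

    x      : flattened image, shape (d,)
    basis  : subspace basis, shape (d, k)
    sigma  : hypothetical residual scale; pixels with much larger residuals
             are treated as occluded
    """
    # Start from the ordinary least-squares projection.
    coeffs = np.linalg.lstsq(basis, x, rcond=None)[0]
    for _ in range(n_iter):
        residual = x - basis @ coeffs
        # Cauchy-style weights: close to 1 for small residuals, near 0 for outliers.
        weights = 1.0 / (1.0 + (residual / sigma) ** 2)
        weighted_basis = basis * weights[:, None]
        # Solve the weighted normal equations (B^T W B) c = B^T W x.
        coeffs = np.linalg.solve(basis.T @ weighted_basis, weighted_basis.T @ x)
    return coeffs
```

On an image with an occluded region, the weights for the occluded pixels shrink toward zero, so the fit is driven mainly by the unoccluded pixels.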
ISBN: (print) 0780381858
Because traditional shot segmentation may not produce video segments that correspond one-to-one to semantic views, this paper presents an integrated segmentation and classification approach for labeling soccer video with semantic units. In our system, each P frame is divided into 6 × 4 blocks, with color and motion features extracted at both the block and frame levels. First, a threshold is used to divide the video stream into relatively static parts and active parts. Then every active part is segmented into sub-parts according to 4 view types, and the motion features are used to classify segments with support vector machines. Finally, static parts are merged with classified active sub-parts to form labeled segments. Four 10-minute test clips from the World Cup 2002 are used to evaluate our system, resulting in a promising classification rate of 79.8%.
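The first two steps (block division and thresholding into static and active parts) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names `block_means` and `split_static_active` are hypothetical, the 6 × 4 grid is read here as 6 columns by 4 rows, the block feature is a plain mean, and the per-frame motion values and threshold are placeholders.

```python
import numpy as np

def block_means(frame, rows=4, cols=6):
    """Split a 2-D frame into rows x cols blocks and return each block's mean."""
    h, w = frame.shape
    # Crop so the frame divides evenly into blocks (a simplifying assumption).
    h_crop, w_crop = h - h % rows, w - w % cols
    cropped = frame[:h_crop, :w_crop]
    blocks = cropped.reshape(rows, h_crop // rows, cols, w_crop // cols)
    return blocks.mean(axis=(1, 3))

def split_static_active(motion, threshold):
    """Group consecutive frames into (start, end_exclusive, is_active) runs.

    A frame counts as active when its motion value exceeds the threshold.
    """
    active = np.asarray(motion) > threshold
    segments = []
    start = 0
    for i in range(1, len(active) + 1):
        # Close the current run at the end of the stream or on a label change.
        if i == len(active) or active[i] != active[start]:
            segments.append((start, i, bool(active[start])))
            start = i
    return segments
```

For example, `split_static_active([0.1, 0.2, 0.9, 0.8, 0.1], 0.5)` yields one static run, one active run, and a final static run; the active runs would then be passed to the view-type segmentation and the classifier.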
We study the spatialization of the sound field in a room, in particular the evolution of room impulse responses as a function of their spatial positions. The presented technique allows us to completely characterize the sound field at any arbitrary location if the sound field is known at a finite number of positions. Existing techniques usually rely on room models to recreate the sound field at a given point in space. Our technique simply starts from measurements of impulse responses at a finite number of positions and uses this information to recreate the total sound field. An analytical solution of the problem is given for different types of spaces. Further, we determine the number of microphones and the spacing between them needed to reconstruct the sound field perfectly up to a certain temporal frequency. The optimal sampling pattern for the microphone positions is given. Applications are also discussed.
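As a rough illustration of how a spacing requirement depends on the highest temporal frequency, the sketch below applies the familiar half-wavelength spatial sampling rule to microphones along a line. This rule is an assumption used for illustration and may differ from the exact result and optimal sampling pattern derived in the paper; the function names and the speed of sound `c = 343.0` m/s are likewise assumptions.

```python
import math

def max_mic_spacing(f_max_hz, c=343.0):
    """Largest spacing (m) under the half-wavelength rule: d <= c / (2 * f_max)."""
    return c / (2.0 * f_max_hz)

def mics_needed(length_m, f_max_hz, c=343.0):
    """Number of equally spaced microphones covering a line of the given length."""
    spacing = max_mic_spacing(f_max_hz, c)
    # ceil gives the number of intervals; one more microphone closes the line.
    return math.ceil(length_m / spacing) + 1
```

Under these assumptions, raising the target frequency shrinks the allowed spacing in proportion, so the microphone count along a fixed line grows roughly linearly with frequency.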
A new sampling scheme for extending the dynamic range of a high-sensitivity complementary metal oxide semiconductor (CMOS) image sensor is proposed. The photodiode-type CMOS pixel imager, which includes a transfer gate transistor for transporting photoelectrons from a sensing region to a readout region, is highly sensitive but suffers from a poor dynamic range. The new sampling scheme and a ratiometric processing circuit can improve the dynamic range of the pixel imager by about 24 dB by making use of the photo response of the readout node. The ratio of sensing region area to readout region area can be optimized for specific applications such as low-light imaging and high-speed imaging. Experimental results indicate that this pixel can maintain high sensitivity at low illumination while obtaining an extended dynamic range, about 10 dB higher than that of the conventional 3-transistor (3-T) photodiode-type CMOS pixel imager.
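To put the decibel figures in perspective, the short sketch below converts between decibels and a linear signal ratio, assuming the 20·log10 convention commonly used for image-sensor dynamic range. The abstract does not state which convention it uses, so this is an assumption, and the function names are illustrative.

```python
import math

def db_to_ratio(db):
    """Convert a dynamic-range figure in dB to a linear signal ratio (20*log10 convention)."""
    return 10.0 ** (db / 20.0)

def ratio_to_db(ratio):
    """Convert a linear signal ratio to dB (20*log10 convention)."""
    return 20.0 * math.log10(ratio)
```

Under this convention, a 24 dB improvement corresponds to roughly a 16-fold wider ratio between the largest and smallest resolvable signals, and 10 dB to roughly a 3-fold wider ratio.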
The proceedings contain 69 papers from the conference on Visual Communications and Image Processing 2002. The topics discussed include: image segmentation; video sequences; virtual blue screens; visual communication; edge-based predictions; wavelet filtering; video processing; video coders; context-based denoising; sharpness enhancement; feature extraction; video coded bitstreams; optical flows; motion estimation; wavelet transforms; and computer simulation.
The proceedings contained 54 topics from the Proceedings of SPIE: Visual Communications and Image Processing 2002. The topics discussed included: efficient temporal error concealment based on motion estimation of enlarged block size; enhanced service differentiation for layered video multicast in differentiated service networks; robust video transmission using adaptive bit allocation and face classification using curvature-based multiscale morphology.
ISBN: (print) 0819444111
This paper outlines a generalized image reconstruction approach to improve the resolution of an Electro-Optic (EO) imaging sensor using multiple frames of an image sequence. The method assumes only that the constituent video has some ambient motion between the sensor and a stationary background, and that the optical image is physically captured by a staring focal plane array.
ISBN: (print) 0819444111
The application of human perceptual models in image and video coding is motivated by the fact that non-perceptual distortion metrics (such as mean square error) do not correlate well with perceived quality at lower bit-rates, despite their acceptable signal-to-noise ratio. In this paper, we propose a novel approach for indexing the visual content of images based on the human perceptual thresholds employed for encoding. In other words, the thresholds that are employed in perceptual coding also serve as an index. These thresholds depend on the overall luminance, frequency/orientation, and the variety of patterns in an image, and can serve as indexing features. These features therefore have the potential to retrieve perceptually similar images in response to a query image. Detailed simulations have been carried out using the proposed indexing concept in the DCT compressed domain. Here, the indices have been computed using the DCTune coding technique, which has been shown to provide superior visual quality in encoding images. Simulation results demonstrate that superior retrieval performance can be achieved for specific classes of images, while comparable performance is obtained for other image classes.