Network cameras, made possible by recent advances in the integration of sensing, compression, and communication hardware, are a new video source that can be easily deployed and remotely managed. Unobtrusively located along highways, at airports, or in office buildings, such cameras can form a visual sensor network, or camera web, an extremely rich source of visual information. Although camera webs are in their infancy today, their deployment will likely accelerate, and one can expect visual sensing devices to eventually become as ubiquitous as light bulbs. While the capture hardware has evolved tremendously, the hardware and algorithms needed for effective analysis and efficient communication of multi-camera data clearly lag behind. In this article, I give an overview of one particular aspect of visual data analysis, namely space-time video segmentation, which is often a prerequisite for motion estimation, video compression, event detection, scene understanding, etc. I introduce the concept of an object tunnel, a 3-D surface in space-time through which a video object travels, and the associated concept of an occlusion volume. I present examples of object tunnels and occlusion volumes on surveillance data that, upon further processing, may lead to automatic event detection or scene understanding. Finally, I describe challenges in extending video analysis algorithms to visual sensor networks and outline some possible approaches.
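The object-tunnel idea can be made concrete with a small NumPy sketch. It assumes each object has already been segmented into one binary mask per frame, so stacking the masks along time gives a space-time volume whose boundary voxels approximate the tunnel surface. The function names are hypothetical, and treating the occlusion volume as the voxel-wise overlap of two objects' tunnels is an assumed simplification, not the article's definition.

```python
import numpy as np

def object_tunnel(masks):
    """Stack per-frame binary object masks (each H x W) into a T x H x W boolean volume."""
    return np.stack([np.asarray(m, dtype=bool) for m in masks], axis=0)

def tunnel_surface(tunnel):
    """Voxels of the tunnel with at least one 6-connected neighbour outside it."""
    padded = np.pad(tunnel, 1, constant_values=False)
    interior = tunnel.copy()
    for axis in range(3):
        for shift in (1, -1):
            # np.roll brings each voxel's neighbour along `axis` into place;
            # the False padding stands in for "outside the video volume".
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return tunnel & ~interior

def occlusion_volume(tunnel_a, tunnel_b):
    """Assumed approximation: voxels claimed by both objects' tunnels."""
    return tunnel_a & tunnel_b
```

For a 3 x 3 object that stays still for three frames, the tunnel is a 3 x 3 x 3 cube and its surface is every voxel except the centre one.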
In this work, we develop a new data representation framework for image compression, called constrained relaxation. Our basic observation is that an image is not a random 2-D array of pixels; its pixels have to satisfy a set...
In this paper, we propose a task estimation method based on multiple subspaces extracted from multimodal information: image objects in visual scenes and spoken words in dialogue appearing in the same task. The multiple subspaces are obtained using latent semantic analysis (LSA). In the proposed method, a task vector composed of spoken words and image-object appearance frequencies is first extracted, and then the similarities between the input task vector and the reference subspaces of different tasks are compared. Experiments are conducted on the identification of game tasks. The experimental results show that the proposed method using multimodal information outperforms methods that use only a single modality, either image or spoken dialogue. The proposed method remains accurate even when less spoken dialogue is available.
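A minimal sketch of subspace-based task estimation, under several assumptions: each task vector concatenates spoken-word counts and image-object appearance counts, each task's reference subspace is spanned by the top-k left singular vectors of its training vectors (the SVD step at the core of LSA), and similarity is the cosine between the input vector and its projection onto a subspace. All names are hypothetical; the paper's exact weighting and similarity measure may differ.

```python
import numpy as np

def task_subspace(train_vectors, k):
    """Hypothetical LSA-style subspace: top-k left singular vectors of the
    feature-by-sample matrix built from one task's training vectors."""
    X = np.asarray(train_vectors, dtype=float).T  # features x samples
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]  # orthonormal basis, features x k

def subspace_similarity(v, basis):
    """Cosine of the angle between v and its projection onto span(basis)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        return 0.0
    return float(np.linalg.norm(basis.T @ v) / norm)

def estimate_task(v, subspaces):
    """Return the task label whose reference subspace is most similar to v."""
    return max(subspaces, key=lambda label: subspace_similarity(v, subspaces[label]))
```

For example, with word/object features laid out as a 4-D vector, training task "A" on vectors that use only the first two features and task "B" on vectors that use only the last two makes `estimate_task([1, 1, 0, 0], ...)` return `"A"`.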
ISBN (print): 9781424412358
In this paper, the authors propose a new copyright protection procedure using a bio-inspired, wavelet-based data hiding approach. The proposed method takes advantage of Human Visual System (HVS) characteristics to provide better watermarked image quality. It also exploits the Visual Secret Sharing (VSS) technique to guarantee the security of the procedure. Performance improvement with respect to existing algorithms is obtained by Particle Swarm Optimization (PSO).
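To illustrate what visual secret sharing provides, here is a sketch of the classic (2, 2) scheme with two-subpixel expansion, applied to a binary image. It is a generic textbook construction, not necessarily the variant used in the paper, and the function names are hypothetical. Each share alone is a uniformly random pattern; only stacking both shares (a pixelwise OR of black subpixels) reveals the secret.

```python
import random

def vss_2_of_2(secret, rng=None):
    """Split a binary image (0 = white, 1 = black) into two shares.
    Each secret pixel becomes two subpixels in each share."""
    rng = rng or random.Random()
    patterns = [(1, 0), (0, 1)]
    share1, share2 = [], []
    for row in secret:
        r1, r2 = [], []
        for bit in row:
            p = rng.choice(patterns)  # random pattern hides the bit in each share
            # White: identical patterns; black: complementary patterns.
            q = p if bit == 0 else (1 - p[0], 1 - p[1])
            r1.extend(p)
            r2.extend(q)
        share1.append(r1)
        share2.append(r2)
    return share1, share2

def stack(share1, share2):
    """Overlaying transparencies: a subpixel is black if it is black in either share."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(share1, share2)]

def decode(stacked):
    """A secret pixel is black iff both of its stacked subpixels are black."""
    return [[int(row[i] and row[i + 1]) for i in range(0, len(row), 2)] for row in stacked]
```

After stacking, white pixels show one black subpixel out of two (grey) and black pixels show two, which is the contrast loss the classic scheme accepts in exchange for needing no computation to decode.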
ISBN (print): 9780819466211
Three-dimensional (t+2D) wavelet coding schemes have been demonstrated to be efficient techniques for video compression applications. However, the separable wavelet transform used to remove spatial redundancy provides only a limited representation of 2-D texture because of the spatial isotropy of the wavelet basis functions. Anisotropic transforms, such as fully separable wavelet transforms (FSWT), can therefore offer a solution for spatial decorrelation. FSWT inherits the separability, computational simplicity, and filter-bank characteristics of the standard 2-D wavelet transform, but it improves the representation of directional textures, such as those found in the temporal detail frames of t+2D decompositions. Extending both the classical wavelet and wavelet-packet transforms to fully separable decompositions preserves the low complexity and the best-basis selection algorithms of the original transforms. We apply these transforms in t+2D video coding schemes and compare them with classical decompositions.
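The difference between the standard (dyadic) 2-D transform and the fully separable one lies only in the order of the 1-D steps, which the following sketch makes explicit. It uses the orthonormal Haar filter purely for brevity (an assumption; the paper's filters are not given here) and omits the wavelet-packet variant and best-basis selection. The standard transform recurses on the LL band only, whereas the FSWT runs a complete multilevel 1-D transform on every row and then on every column, producing elongated, anisotropic subbands.

```python
import numpy as np

def haar_1d(x, levels):
    """Multilevel orthonormal Haar DWT along the last axis (length divisible by 2**levels).
    Output layout along that axis: [approx_L, detail_L, ..., detail_1]."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[-1]
    for _ in range(levels):
        a = (x[..., 0:n:2] + x[..., 1:n:2]) / np.sqrt(2)
        d = (x[..., 0:n:2] - x[..., 1:n:2]) / np.sqrt(2)
        x[..., :n // 2] = a
        x[..., n // 2:n] = d
        n //= 2  # only the approximation is split again
    return x

def fully_separable_dwt(img, levels):
    """FSWT: complete multilevel 1-D DWT on every row, then on every column."""
    rows_done = haar_1d(img, levels)
    return haar_1d(rows_done.T, levels).T

def standard_2d_dwt(img, levels):
    """Dyadic 2-D DWT: one row step and one column step per level, recursing on LL only."""
    out = np.asarray(img, dtype=float).copy()
    h, w = out.shape
    for _ in range(levels):
        band = haar_1d(out[:h, :w], 1)
        out[:h, :w] = haar_1d(band.T, 1).T
        h //= 2
        w //= 2
    return out
```

With one level the two transforms coincide; from two levels on they differ, while both remain orthonormal and therefore preserve signal energy.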
ISBN (print): 9780819466211
In packet-switched networks such as the Internet, packets may be lost during transmission due to, e.g., network congestion, which degrades the quality of the transmitted signal. Because video communication is a bandwidth-consuming application, the original data are first compressed, and this compression step increases the impact of information loss even further. In wavelet-based image and video coding, the low-frequency data are the most important: loss of low-frequency coefficients results in annoying black holes in the received images and video. This effect can be countered by post-processing error concealment, in which a lost coefficient is estimated from its neighboring coefficients. In this paper, we present a locally adaptive interpolation method for lost low-frequency coefficients. For each lost low-frequency coefficient, we estimate the optimal interpolation direction (horizontal or vertical) using novel error measures, which preserves edges in the reconstructed image much better. Compared with older techniques of similar complexity, our scheme reconstructs images of the same or better quality, as reflected in both the visual and the numerical results: a gain of up to 4.4 dB over bilinear concealment. The proposed scheme is fast and simple, which makes it suitable for real-time applications.
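The core idea of choosing between horizontal and vertical interpolation can be sketched as follows. The paper's novel error measures are not reproduced here; as a hypothetical stand-in, the sketch compares the absolute jump across the gap in each direction and interpolates along the direction with the smaller jump, i.e. along rather than across an edge. It assumes isolated lost coefficients away from the subband border.

```python
import numpy as np

def conceal_lost(coeffs, lost):
    """Estimate lost low-frequency coefficients by directional interpolation.

    coeffs : 2-D array (e.g. the LL subband); values at lost positions are ignored.
    lost   : boolean mask of the same shape.
    Assumes each lost coefficient is isolated and not on the border, so its
    four neighbours were received.
    """
    out = np.array(coeffs, dtype=float)
    for i, j in zip(*np.nonzero(lost)):
        left, right = out[i, j - 1], out[i, j + 1]
        up, down = out[i - 1, j], out[i + 1, j]
        # Hypothetical error measures: the jump across the gap in each direction.
        # A small jump suggests that direction runs along an edge, not across it.
        err_h, err_v = abs(left - right), abs(up - down)
        out[i, j] = (left + right) / 2 if err_h <= err_v else (up + down) / 2
    return out
```

On a vertical step edge with values 10 and 200, a lost coefficient just right of the edge is restored to 200 by vertical interpolation, whereas a plain four-neighbour average would give 152.5 and blur the edge.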
ISBN (print): 9780819466211
A new support tool using object tracking and motion-based segmentation is developed for machine learning and pattern recognition. In the learning step, an object of interest is tracked while learning is performed from segmented frames. In the recognition step, the target is tracked until favorable conditions allow identification. This tool is used in the context of the Aqu@theque project, which includes an automatic fish recognition system. Tracking is a difficult task, especially with real-world images. Particle filtering methods that incorporate motion-based segmentation measurements in the importance sampling step improve performance.
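One common way to bring a segmentation measurement into the importance sampling step is a mixture proposal: some particles are drawn around the centroid of the motion-segmented blob, the rest from the motion model, and the weights are corrected by likelihood x prior / proposal. The sketch below shows this for 2-D positions under assumed Gaussian models; the `likelihood` callable, the parameter values, and the function names are hypothetical, and the paper's actual proposal may differ.

```python
import numpy as np

def gauss_pdf(x, mean, sigma):
    """Isotropic 2-D Gaussian density, evaluated for each row of x."""
    d2 = np.sum((x - mean) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / sigma**2) / (2 * np.pi * sigma**2)

def pf_step(particles, likelihood, seg_centroid, rng,
            sigma_dyn=5.0, sigma_seg=2.0, alpha=0.5):
    """One sequential-importance-resampling step.

    particles    : (N, 2) array of previous positions.
    likelihood   : hypothetical callable mapping (N, 2) positions to p(z | x).
    seg_centroid : centroid of the motion-segmentation blob, or None.
    """
    n = len(particles)
    # Default proposal: random-walk dynamics p(x_t | x_{t-1}).
    new = particles + rng.normal(0.0, sigma_dyn, particles.shape)
    if seg_centroid is None:
        w = likelihood(new)  # bootstrap filter: proposal equals prior
    else:
        c = np.asarray(seg_centroid, dtype=float)
        # With probability alpha, draw the particle around the segmentation centroid.
        use_seg = rng.random(n) < alpha
        new[use_seg] = c + rng.normal(0.0, sigma_seg, (int(use_seg.sum()), 2))
        prior = gauss_pdf(new, particles, sigma_dyn)
        q = alpha * gauss_pdf(new, c, sigma_seg) + (1 - alpha) * prior
        # q >= (1 - alpha) * prior, so q == 0 implies prior == 0 and weight 0.
        w = likelihood(new) * prior / np.maximum(q, 1e-300)
    total = w.sum()
    w = w / total if total > 0 else np.full(n, 1.0 / n)
    estimate = w @ new                   # weighted mean position
    idx = rng.choice(n, size=n, p=w)     # multinomial resampling
    return new[idx], estimate
```

The segmentation-driven particles let the filter recover quickly when the target moves further than the motion model expects, while the weight correction keeps the estimate consistent with the assumed dynamics.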
ISBN (print): 9789612480363
This paper presents a new method for efficient compression of compound still images containing pictures and text/graphics. The method is based on the Inverse Difference Pyramid (IDP) image decomposition and lossless coding of the resulting data. It permits the recognition of text and graphics in compound images, the setting of corresponding regions of interest (ROI), and their coding with the most efficient tools. The method ensures easy access to and transfer of visual information via the Internet, aimed at distance-learning applications.
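The "inverse" pyramid builds from coarse to fine: level 0 approximates the whole image, and each later level approximates the difference left over by all previous levels, on progressively smaller blocks. The sketch below is a deliberately simplified version of that idea, assuming a square image whose side is a power of two: it keeps only each block's mean (the DC term), whereas a practical pyramid would typically keep several low-frequency transform coefficients per block. Function names are hypothetical, and ROI detection and the lossless coding stage are not shown.

```python
import numpy as np

def idp_decompose(img, levels):
    """Simplified inverse difference pyramid on a square 2**k x 2**k image.
    Level p splits the current residual into 2**p x 2**p blocks and keeps each
    block's mean. Returns the per-level block means and the final residual."""
    residual = np.asarray(img, dtype=float).copy()
    n = residual.shape[0]
    coeffs = []
    for p in range(levels):
        b = n // 2**p  # block side at level p
        means = residual.reshape(2**p, b, 2**p, b).mean(axis=(1, 3))
        coeffs.append(means)
        # Subtract this level's approximation; the next level codes what is left.
        residual -= np.kron(means, np.ones((b, b)))
    return coeffs, residual

def idp_reconstruct(coeffs, n):
    """Sum the upsampled approximations of all transmitted levels."""
    out = np.zeros((n, n))
    for means in coeffs:
        b = n // means.shape[0]
        out += np.kron(means, np.ones((b, b)))
    return out
```

Decoding only the first few levels yields a progressively refined preview, which suits transfer over a network; carried down to 1 x 1 blocks, the residual vanishes and reconstruction is exact.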
ISBN (print): 9780819466211
We consider the problem of recovering a high-resolution (HR) frame from a sequence of low-resolution (LR) frames. Designing a super-resolution (SR) algorithm for arbitrary video sequences is challenging: video frames in general cannot be related through a global parametric transformation because individual pixels move arbitrarily between frame pairs. Hence, a local motion model must be used for frame alignment, and accurate alignment is the key to the success of reconstruction-based SR algorithms. Motivated by this challenge, in this paper we propose to employ a region-matching technique for image registration. The proposed algorithm consists of an alignment step, which produces a blurred version of the HR frame, and a restoration step, which estimates the HR frame. The experimental results of the proposed algorithm are compared with those obtained using affine, block-matching, and optical-flow motion models, and they show that region matching is very promising for producing higher-quality SR images.
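The role of the alignment step can be illustrated with a basic shift-and-add fusion: once each LR frame's offset on the HR grid is known, its samples are placed at those HR positions and overlapping samples are averaged. The sketch assumes global, integer offsets on the HR grid that are already known, a strong simplification of the local, region-matched motion the paper estimates; the restoration (deblurring) step is not shown, and the function name is hypothetical.

```python
import numpy as np

def shift_and_add(frames, shifts, factor):
    """Fuse LR frames on an HR grid.

    frames : list of (h, w) LR arrays.
    shifts : list of integer (dy, dx) offsets on the HR grid, each in [0, factor);
             assumed known here, whereas the paper estimates motion locally.
    Returns the HR estimate; HR pixels no frame covers are NaN.
    """
    h, w = frames[0].shape
    acc = np.zeros((h * factor, w * factor))
    cnt = np.zeros_like(acc)
    for f, (dy, dx) in zip(frames, shifts):
        acc[dy::factor, dx::factor] += f   # place samples at their HR positions
        cnt[dy::factor, dx::factor] += 1
    with np.errstate(invalid="ignore"):    # 0/0 at uncovered pixels becomes NaN
        return np.where(cnt > 0, acc / cnt, np.nan)
```

With a factor of 2, four frames offset by (0, 0), (0, 1), (1, 0), and (1, 1) cover every HR pixel; with fewer frames, or with misestimated offsets, holes and misplaced samples remain, which is why alignment accuracy dominates the final quality.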
ISBN (print): 9781424411894
This paper describes an attempt to correlate visual lip-movement information acquired via a camera with speech audio acquired via a microphone from a human speaker, in order to prevent audio created by external noise from being misrecognized as speech from that speaker. Images of the speaker's face are acquired via a PC camera and then separated into images that indicate lip movement and images that do not. The lip-movement image data are saved in shared memory and shared with the speech recognition process, where they are analyzed by the speech activity detection process, a pre-processing step of speech recognition. We combined a speech recognition processor and an image recognizer, and the interworking function operated successfully at a rate of 99.3%.
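The gating logic described above can be sketched as a frame-level AND: audio activity is accepted as the speaker's speech only in frames where the lips are moving. As a hypothetical stand-in for the paper's image classifier, lip movement is flagged when the mean absolute difference between consecutive lip-region frames exceeds a threshold; the sketch also assumes audio and video frames are already aligned one-to-one, and it does not model the shared-memory exchange between processes.

```python
import numpy as np

def lip_motion(lip_frames, thresh):
    """Hypothetical lip-movement detector: flag frames whose mean absolute
    difference from the previous lip-region frame exceeds thresh."""
    frames = np.asarray(lip_frames, dtype=float)
    diff = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    return np.concatenate([[False], diff > thresh])  # first frame has no predecessor

def gate_speech(audio_active, lip_moving):
    """Accept audio activity as the speaker's speech only when the lips also move."""
    return np.logical_and(audio_active, lip_moving)
```

Frames where the microphone picks up sound but the lips are still, such as external noise, are rejected before they reach the recognizer.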