In this paper, we propose a new blind image watermarking method in the discrete cosine transform (DCT) domain, which is widely used in compression applications and consequently in digital distribution networks. Four water...
ISBN: 0780374029 (print)
Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of these systems degrades considerably in the real world in the presence of noise. On the other hand, most human listeners, both hearing-impaired and normal-hearing, make use of visual information to improve speech perception in acoustically hostile environments. Motivated by humans' ability to lipread, the visual component is considered to yield information that is not always present in the acoustic signal and to enable improved accuracy over acoustic-only systems, especially in noisy environments. In this paper, we investigate the usefulness of visual information in speech recognition. We first present a method for automatically locating and extracting visual speech features from a talking person in color video sequences. We then develop a recognition engine to train and recognize sequences of visual parameters for the purpose of speech recognition. We particularly explore the impact of various combinations of visual features on the recognition accuracy. We conclude that the inner lip contour features, together with information about the visibility of the tongue and teeth, significantly improve performance over outer-contour-only features in both speaker-dependent and speaker-independent recognition tasks.
ISBN: 0780372786 (print)
Recent advances in both Discrete Wavelet Transforms (DWT) [1,2,3] and Continuous Wavelet Transforms (CWT) representing the human visual system neural network [4,5] have resulted in improved video compression, restoration, and filtering techniques. Although these software techniques can achieve high-quality video, their computational complexity requires specially designed hardware, called WaveNet [6], to run real-time live video over radio. The brassboard, integrated with computers, can potentially support many applications, including remote sensors, security systems, and commercial and home video teleconferencing. This paper describes a low-cost board that supports video compression, restoration, and filtering in real time. The WaveNet board has been optimized for wavelet-based image and video compression and enhancement techniques [7,8].
ISBN: 0780375106 (print)
This paper describes a recently developed remote monitoring system based on a combination of embedded computing, digital signal processing, wireless communications, GPS, and trunking system technologies. The system includes a mini camera and a portable mobile trunking radio installed inside each monitored vehicle, and a central station located in a police office. The trunking system used is Motorola's 800 MHz TETRA system.
ISBN: 0780374029 (print)
In this paper we present initial work towards a video-realistic visual speech synthesiser based on statistical models of shape and appearance. A synthesised image sequence corresponding to an utterance is formed by concatenation of synthesis units (in this case phonemes) from a pre-recorded corpus of training data. A smoothing spline is applied to the concatenated parameters to ensure smooth transitions between frames, and the resultant parameters are applied to the model. Early results look promising.
Many techniques, both conventional and morphological, have been proposed in the literature for the segmentation of images. Morphological image segmentation methods, particularly those using a watershed algorithm, have...
ISBN: 5742202601 (print)
This paper aims to devise an architecture that combines the asynchronous concurrency of the data flow architecture with the spatial parallelism of SIMD machines for a class of image processing applications using reconfigurable processing elements (RPEs). Overall processing speed is enhanced by (a) concurrent functioning of the RPEs and (b) replacing software execution of signal processing functions with a hardware approach using FPGAs as RPEs. Thus, a hybrid architecture is presented that functions as a data flow machine at the functional level and exploits spatial parallelism by incorporating a modified SIMD concept.
An adaptive regularity scalable wavelet image coding algorithm is proposed in this paper. The bitstream is generated in the order of regularity such that visually more important components of a decoded image can be obtained first, followed by the higher regularity components as more bits are received and decoded. The scalability is achieved by selecting different extents of wavelet coefficients at various regularity levels. These regularity levels are determined adaptively. Regularity of the image is estimated from the interscale ratios of the separable wavelet transform magnitude sums. Compared to the wavelet image coder generating resolution scalable bitstreams, significant improvement in visual quality and PSNR can be achieved.
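The idea of estimating regularity from interscale ratios of wavelet magnitude sums can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes a one-dimensional orthonormal Haar transform as a stand-in for the separable two-dimensional transform, a signal whose length is a power of two, and hypothetical function names (`haar_detail_sums`, `interscale_ratios`).

```python
import math

def haar_detail_sums(signal):
    """Sum of |detail coefficients| at each Haar level, finest level first.

    Assumption: the signal length is a power of two (a trailing odd
    sample would simply be dropped at that level).
    """
    approx = list(signal)
    sums = []
    while len(approx) >= 2:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        details = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        sums.append(sum(abs(d) for d in details))
    return sums

def interscale_ratios(sums):
    """Ratio of each coarser-level sum to the next finer one.

    Returns None where the finer-level sum is zero (ratio undefined).
    """
    return [coarse / fine if fine > 0 else None
            for fine, coarse in zip(sums, sums[1:])]

ramp = [float(n) for n in range(8)]   # smooth signal
alternating = [1.0, -1.0] * 4         # highly irregular signal

print(interscale_ratios(haar_detail_sums(ramp)))
print(interscale_ratios(haar_detail_sums(alternating)))
```

In this sketch a smooth ramp concentrates detail energy at coarse scales (ratios above 1), while the alternating signal has all of its detail at the finest scale (first ratio 0), which is the kind of contrast an interscale-ratio estimate relies on.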
ISBN: 0780375084 (print)
Facial feature detection plays an important role in applications such as human-computer interaction, video surveillance, face detection, and face recognition. We propose a facial feature detection algorithm for all types of face images under a variety of image conditions. There are two main steps: extraction of the facial features from the original face image, and coverage of the features by rectangular blocks. In the first step, a neural visual model (NVM) is used to recognize all possible facial feature positions. Its input parameters are obtained from the face characteristics and the positions of the facial features, and do not include any intensity information. To improve the results, some incorrect decisions about facial feature positions are corrected by an image processing technique called dilation. Our algorithm has been tested successfully on various types of face images: color, gray, and binary images; faces wearing sunglasses or a scarf; images with lighting effects, noise, or blurring; and color and sketch images from animated cartoons.
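For readers unfamiliar with dilation, the following is a minimal sketch of binary morphological dilation with a square structuring element. It only illustrates the general operation; how the abstract's algorithm applies dilation to candidate feature positions is not specified, and the function name `dilate` and the `radius` parameter are assumptions made here.

```python
def dilate(mask, radius=1):
    """Binary dilation of a 2D 0/1 mask with a (2*radius+1)-square element.

    Every pixel within `radius` rows and columns of a set pixel becomes set.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    # Ignore neighbours that fall outside the image.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out

# A single detected pixel grows into a 3x3 block.
mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1
for row in dilate(mask):
    print(row)
```

Dilation of this kind fills small gaps and enlarges isolated detections, which is why it is commonly used to clean up a map of candidate positions.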
ISBN: 0819444863 (print)
Measures of image quality are presented here that have been developed to assess both the immediate quality of an image and the potential at intermediate points in an imaging chain for enhanced image quality. The original intent of the metric(s) was to provide an optimand for interpolator design, and the metrics have subsequently been used for a number of differential image quality analyses and imaging system component designs. The metrics presented are of the same general form as the National Imagery Interpretability Rating Scale (NIIRS), representing quality as the base-2 logarithm of linear resolution, so that one unit of differential quality represents a doubling or halving of the resolution of imagery. Analysis of a simple imaging chain is presented in terms of the metrics, with conclusions regarding interpolator design, consistency of the latent and apparent image quality metrics, and the relationship between interpolator and convolution kernel design in a system where both are present. Among the principal results are an optimized division of labor between interpolators and Modulation Transfer Function Correction (MTFC) filters, consistency of the analytical latent and apparent image quality metrics with each other and with visually optimized aim curves, and an introduction to sharpening interpolator design methodology.
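The base-2 logarithmic form described above can be made concrete with a small sketch. It assumes that resolution is expressed as resolving power (for example, resolvable cycles per unit length, so larger is better); the function name `quality_difference` is hypothetical and the abstract's actual metric definitions are not reproduced here.

```python
import math

def quality_difference(resolution, reference_resolution):
    """Differential quality in base-2 log units.

    Assumption: both arguments are resolving powers in the same units,
    so a positive result means finer resolution than the reference.
    """
    if resolution <= 0 or reference_resolution <= 0:
        raise ValueError("resolutions must be positive")
    return math.log2(resolution / reference_resolution)

print(quality_difference(2.0, 1.0))   # doubling the resolution: 1.0
print(quality_difference(0.5, 1.0))   # halving the resolution: -1.0
```

Under this convention, one unit of differential quality corresponds exactly to a factor of two in resolution, matching the description in the abstract.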