ISBN (print): 0780362934
We are developing a system which learns words from co-occurring spoken and visual input. The goal is to automatically segment continuous speech at word boundaries without a lexicon, and to form visual categories which correspond to spoken words. Mutual information is used to integrate acoustic and visual distance metrics in order to extract an audio-visual lexicon from raw input. We report results of experiments with a corpus of infant-directed speech and images.
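One way to picture how mutual information can tie the two modalities together: binarize each pair of observations into "the speech segments are acoustically close" and "the co-occurring images are visually close", then measure how much knowing one event tells you about the other. This is a minimal sketch under that assumption, not the authors' system. The thresholds and the helper names `co_occurrence_pairs` and `mutual_information` are hypothetical.

```python
import math
from collections import Counter

def co_occurrence_pairs(acoustic_d, visual_d, t_audio, t_visual):
    """Binarize paired distances: 1 if closer than the (hypothetical) threshold."""
    return [(int(a < t_audio), int(v < t_visual))
            for a, v in zip(acoustic_d, visual_d)]

def mutual_information(pairs):
    """Mutual information (in bits) between two discrete variables,
    estimated from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

Perfectly dependent cues give 1 bit, e.g. `mutual_information([(0, 0), (0, 0), (1, 1), (1, 1)])`, while independent cues give 0. A high value for a candidate (speech segment, visual category) pairing is the kind of signal that could be used to accept it into the lexicon.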
ISBN (print): 9781628415001
In this paper, an innovative method of HEVC video pre-processing is proposed. The method applies simple linear iterative clustering (SLIC), which adapts k-means clustering to group pixels into perceptually meaningful atomic regions called superpixels. A suitable Gaussian filter parameter is determined for each superpixel by averaging, over the superpixel, the weighted average of the luminance differences around each of its pixels. Experimental results show that the bit rate can be reduced by up to 29% without loss of visual quality.
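The per-superpixel parameter selection can be sketched as follows. It assumes the superpixel label map has already been produced (for example by SLIC). The neighbour weights and the mapping from average luminance difference to Gaussian sigma are not given in the abstract, so the weights below are hypothetical and the mapping `to_sigma` is left to the caller.

```python
import math
from collections import defaultdict

# Hypothetical weights: 4-connected neighbours count fully, diagonal
# neighbours are down-weighted by 1/sqrt(2) (an assumption, not the paper's).
NEIGHBOURS = [(-1, 0, 1.0), (1, 0, 1.0), (0, -1, 1.0), (0, 1, 1.0),
              (-1, -1, 1 / math.sqrt(2)), (-1, 1, 1 / math.sqrt(2)),
              (1, -1, 1 / math.sqrt(2)), (1, 1, 1 / math.sqrt(2))]

def pixel_activity(lum, r, c):
    """Weighted average of |luminance difference| to in-bounds neighbours."""
    h, w = len(lum), len(lum[0])
    num = den = 0.0
    for dr, dc, wt in NEIGHBOURS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < h and 0 <= cc < w:
            num += wt * abs(lum[r][c] - lum[rr][cc])
            den += wt
    return num / den if den else 0.0

def superpixel_sigmas(lum, labels, to_sigma):
    """Average pixel activity per superpixel, mapped to a Gaussian sigma.

    `labels` is a precomputed superpixel label map; `to_sigma` is a
    caller-supplied (hypothetical) mapping from activity to sigma.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            total[lab] += pixel_activity(lum, r, c)
            count[lab] += 1
    return {lab: to_sigma(total[lab] / count[lab]) for lab in total}
```

A flat superpixel has zero activity. A superpixel that touches a strong edge gets a larger value, and the chosen `to_sigma` then decides how strongly it is smoothed before encoding.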
ISBN (print): 9781628415001
In this paper, we present a novel method for inverse filtering a two-dimensional (2-D) signal using phase-based processing techniques. A 2-D sequence can be represented by a sufficient number of samples of the phase of its Fourier transform together with its region of support, and this property is exploited to perform deconvolution. We examine the effects of additive noise and of incomplete knowledge of the point spread function on the performance of this deconvolution method, and compare it with other 2-D deconvolution methods. The problem of finding the region of support is also briefly addressed. Finally, an application example is presented.
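Phase alone is useful for deconvolution because of how convolution behaves in the frequency domain. For circular convolution y = x ⊛ h the DFTs multiply, Y = X·H, so the phase of the unknown signal is ∠X = ∠Y − ∠H (mod 2π) wherever X and H are non-zero. The sketch below shows only this step, using a naive 2-D DFT and circular convolution for simplicity (both assumptions). Reconstructing x from the recovered phase and its region of support, the core of the paper's method, is not shown.

```python
import cmath

def dft2(x):
    """Naive 2-D DFT of a small M x N list of lists (O(M^2 N^2))."""
    m, n = len(x), len(x[0])
    return [[sum(x[a][b] * cmath.exp(-2j * cmath.pi * (u * a / m + v * b / n))
                 for a in range(m) for b in range(n))
             for v in range(n)]
            for u in range(m)]

def circular_conv2(x, h):
    """2-D circular convolution of two equally sized arrays."""
    m, n = len(x), len(x[0])
    return [[sum(x[a][b] * h[(u - a) % m][(v - b) % n]
                 for a in range(m) for b in range(n))
             for v in range(n)]
            for u in range(m)]

def recovered_phase(y, h):
    """Estimate arg X as arg Y - arg H, wrapped back into (-pi, pi]."""
    Y, H = dft2(y), dft2(h)
    return [[cmath.phase(cmath.exp(1j * (cmath.phase(yk) - cmath.phase(hk))))
             for yk, hk in zip(row_y, row_h)]
            for row_y, row_h in zip(Y, H)]
```

Where |H| is small, noise in Y corrupts the estimated phase. This is one reason additive noise and an inaccurately known point spread function degrade the method.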
ISBN (print): 0780370414
Currently available Personal Video Recorders find and store whole TV programs. Our system, Video Scouting, not only finds and stores programs; it also automatically segments and indexes story segments from the programs according to viewers' profiles. The extracted descriptions serve viewers' content requests for program segment selection, e.g. "play the three-minute interview with Hillary Clinton". To achieve this, the system combines information from the audio, visual, and transcript domains in a probabilistic framework based on Bayesian networks. In this paper, we describe the overall architecture and a system implementation, and discuss some experimental results.
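The following is a much simpler stand-in for the Bayesian-network fusion described above. It combines binary audio, visual and transcript cues with naive Bayes, which is a Bayesian network in which the cues are conditionally independent given the segment type. The segment classes, cue names and probabilities are all hypothetical.

```python
import math

# Hypothetical priors and cue likelihoods P(cue present | class).
PRIORS = {"interview": 0.3, "other": 0.7}
LIKELIHOODS = {
    "interview": {"speech": 0.9, "face": 0.8, "name_in_transcript": 0.6},
    "other":     {"speech": 0.5, "face": 0.3, "name_in_transcript": 0.1},
}

def posterior(cues):
    """P(class | observed binary cues), assuming conditional independence."""
    log_scores = {}
    for cls, prior in PRIORS.items():
        s = math.log(prior)
        for cue, present in cues.items():
            p = LIKELIHOODS[cls][cue]
            s += math.log(p if present else 1.0 - p)
        log_scores[cls] = s
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    z = sum(math.exp(v - top) for v in log_scores.values())
    return {cls: math.exp(v - top) / z for cls, v in log_scores.items()}
```

A full Bayesian network would also model dependencies between cues, such as a face and speech tending to co-occur. The naive structure ignores those dependencies but shows how evidence from the three domains is combined into a single segment probability.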
ISBN (print): 9781628415001
This work addresses the high computational complexity of image registration. A hierarchical multiresolution strategy is used to speed up SIFT processing by starting at a low-resolution octave, where an initial affine transformation model is estimated. In each subsequent, finer octave, the affine model obtained from the coarser octave is applied to the current octave and combined with the geometric distribution of matched keypoints to remove further incorrect matches and update the affine model. The strategy ends with the best affine transformation model at the bottom octave (the full-size image). Experimental results show that the proposed method achieves comparable accuracy with less computation than the original SIFT.
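Two pieces of the coarse-to-fine scheme are easy to show in isolation: fitting an affine model to matched keypoints by least squares, and carrying that model down one octave. If each octave halves the resolution, a model x' = A x + t estimated at the coarser octave becomes x' = A x + 2t at the next finer one, because the linear part is unchanged. This is a sketch, not the authors' code. The residual threshold `max_residual` used to reject mismatches is a hypothetical parameter.

```python
import math

def _solve3(m, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = b[r]
        sol.append(det(mc) / d)
    return sol

def fit_affine(src, dst):
    """Least-squares affine model (A, t) mapping src points onto dst points."""
    m = [[0.0] * 3 for _ in range(3)]
    bx, by = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        p = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += p[i] * p[j]   # normal-equation matrix
            bx[i] += p[i] * u
            by[i] += p[i] * v
    a11, a12, tx = _solve3(m, bx)
    a21, a22, ty = _solve3(m, by)
    return ((a11, a12), (a21, a22)), (tx, ty)

def apply_affine(model, pt):
    (a, t), (x, y) = model, pt
    return (a[0][0] * x + a[0][1] * y + t[0], a[1][0] * x + a[1][1] * y + t[1])

def to_finer_octave(model):
    """Propagate a model to the octave with twice the resolution."""
    a, (tx, ty) = model
    return a, (2 * tx, 2 * ty)

def refine(model, src, dst, max_residual):
    """Drop matches the propagated model does not explain, then refit."""
    keep = [(s, d) for s, d in zip(src, dst)
            if math.dist(apply_affine(model, s), d) <= max_residual]
    return fit_affine([s for s, _ in keep], [d for _, d in keep]), keep
```

The paper also uses the geometric distribution of the matched keypoints when rejecting mismatches. The plain residual test here is a simplification of that step.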
ISBN (print): 0780377508
Semantic indexing of sports videos is a subject of great interest to researchers working on multimedia content characterization. Sports programs appeal to large audiences, and their efficient distribution over various networks should contribute to widespread usage of multimedia services. In this paper, we propose a semantic indexing algorithm for soccer programs that uses both audio and visual information for content characterization. The video signal is processed first by extracting low-level visual descriptors from the MPEG compressed bit-stream. The temporal evolution of these descriptors during a semantic event is assumed to be governed by a controlled Markov chain. This makes it possible to determine, based on the maximum likelihood criterion, a list of the video segments where a semantic event of interest is likely to be found. The audio information is then used to refine the results of the video classification procedure by ranking the candidate video segments so that the segments associated with the event of interest appear in the very first positions of the ordered list. The proposed method is applied to goal detection. Experimental results show the effectiveness of the proposed cross-modal approach.
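The two-stage idea (a visual likelihood test followed by audio re-ranking) can be sketched with ordinary first-order Markov chains over quantized visual descriptors. For brevity this swaps the paper's controlled Markov chain for an uncontrolled one. The transition matrices, initial-state probabilities, and the use of mean audio loudness as the ranking score are hypothetical.

```python
import math

def log_likelihood(seq, init, trans):
    """Log-probability of a discrete state sequence under a Markov chain."""
    ll = math.log(init[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(trans[prev][cur])
    return ll

def detect_and_rank(segments, event_model, background_model):
    """Keep segments whose visual sequence is more likely under the event
    model (maximum likelihood), then rank them by mean audio loudness."""
    candidates = [s for s in segments
                  if log_likelihood(s["visual"], *event_model)
                  > log_likelihood(s["visual"], *background_model)]
    return sorted(candidates,
                  key=lambda s: sum(s["audio"]) / len(s["audio"]),
                  reverse=True)
```

Each model is an `(init, trans)` pair of initial-state probabilities and a transition matrix, and each segment is a dict with `"visual"` (state indices) and `"audio"` (loudness samples) entries. These data layouts are illustrative choices only.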
ISBN (print): 0819424897
In stereo vision, depth information is one of the important parameters for understanding the real world. One method of extracting such depth information is based on the geometry of stereo vision using two cameras displaced from each other by a baseline distance. In this paper, we present an improved triangulation method based on stereo vision angles. We set up a stereo vision system that detects moving objects using a difference image and obtains the distance to the object using the improved triangulation method. The system has been implemented on a TMS320C30 DSP board. Experiments show that the proposed vision system achieves an accuracy of 0.2 mm over a range of 400 mm.
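For reference, here is the basic angle-based triangulation that such methods build on. Let B be the baseline between the two cameras and α, β the angles between the baseline and each camera's line of sight to the object. The object's horizontal offsets from the two cameras are Z/tan α and Z/tan β, and these sum to B, so the perpendicular distance is Z = B / (cot α + cot β) = B tan α tan β / (tan α + tan β). The abstract does not specify the paper's "improved" variant, so the sketch shows only this textbook relation plus a simple difference-image motion detector with a hypothetical threshold.

```python
import math

def triangulate_depth(baseline, alpha, beta):
    """Perpendicular distance from the baseline to the object.

    alpha, beta: angles (radians) between the baseline and the line of
    sight, measured at the left and right camera respectively.
    """
    return baseline / (1.0 / math.tan(alpha) + 1.0 / math.tan(beta))

def moving_pixels(frame_a, frame_b, threshold):
    """Pixels whose absolute intensity change exceeds a (hypothetical)
    threshold: a minimal difference-image motion detector."""
    return [(r, c)
            for r, (row_a, row_b) in enumerate(zip(frame_a, frame_b))
            for c, (pa, pb) in enumerate(zip(row_a, row_b))
            if abs(pa - pb) > threshold]
```

In the symmetric case α = β = 45°, the object sits midway between the cameras at a depth of half the baseline.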
ISBN (print): 0780362934
This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based on multi-stream hidden Markov models (MSHMMs), with audio and visual features forming the two independent data streams. Multi-modal MSHMMs have recently been applied successfully to speech recognition. Temporal lip information has been used for speaker identification before; however, that work was restricted to output fusion via single-stream HMMs. We present an extension to this previous work and show that an MSHMM is a valid structure for multi-modal speaker identification.
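In a multi-stream HMM, each state's output likelihood is typically combined across streams with exponent weights: log b_j(o_t) = λ_A log b_jA(o_t^A) + λ_V log b_jV(o_t^V). The sketch below runs the forward algorithm with that combined score over discrete audio and visual symbols, then identifies the speaker whose model scores highest. Discrete emissions, the stream weights and all model parameters are hypothetical simplifications; practical systems usually use Gaussian-mixture emissions. All probabilities are assumed non-zero.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mshmm_log_likelihood(model, audio, visual, w_audio=0.7, w_visual=0.3):
    """Forward algorithm for a two-stream HMM with weighted stream scores."""
    init, trans, emit_a, emit_v = model
    n = len(init)

    def log_b(j, t):
        # Stream-weighted log output likelihood of state j at time t.
        return (w_audio * math.log(emit_a[j][audio[t]])
                + w_visual * math.log(emit_v[j][visual[t]]))

    alpha = [math.log(init[j]) + log_b(j, 0) for j in range(n)]
    for t in range(1, len(audio)):
        alpha = [logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(n)])
                 + log_b(j, t) for j in range(n)]
    return logsumexp(alpha)

def identify_speaker(models, audio, visual, **weights):
    """Return the speaker whose MSHMM gives the highest score."""
    return max(models, key=lambda s: mshmm_log_likelihood(
        models[s], audio, visual, **weights))
```

Shifting weight toward the visual stream, for example `w_audio=0.3, w_visual=0.7`, is the usual lever when the audio is noisy. This is the robustness argument the abstract makes.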
ISBN (print): 0819444863
In this paper, we present a real-time implementation, on a TMS320C6701 DSP, of image filtering that removes impulsive noise, as well as mixed impulsive and multiplicative noise, while preserving detail. The filtering scheme consists of two filters connected in cascade. In the first stage, the MM-KNN (Median M-type K-Nearest Neighbor) filter provides detail preservation and impulsive noise rejection. The second stage uses an M filter to suppress multiplicative noise. Different types of influence functions are used in the M-estimator to provide better noise suppression. Extensive simulation results demonstrate that the proposed filter consistently outperforms other filters in balancing the tradeoff between noise suppression and detail preservation.
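Here is a rough sketch of the two-stage cascade with simplified stand-ins for each stage. Stage 1 replaces each pixel by the median of the K window values closest to the window median; this is a KNN-median in the spirit of the MM-KNN filter, not the filter itself. Stage 2 computes an M-estimate of location in each window using Huber's influence function via iteratively reweighted averaging. The window radius, K, the Huber constant and the unnormalized MAD scale are hypothetical choices.

```python
import statistics

def window(img, r, c, radius):
    """Values in the (2*radius+1)^2 window around (r, c), clipped at borders."""
    h, w = len(img), len(img[0])
    return [img[rr][cc]
            for rr in range(max(0, r - radius), min(h, r + radius + 1))
            for cc in range(max(0, c - radius), min(w, c + radius + 1))]

def knn_median(img, k=5, radius=1):
    """Stage 1 (stand-in for MM-KNN): median of the k window values closest
    to the window median, intended to reject impulses."""
    out = []
    for r, row in enumerate(img):
        new_row = []
        for c in range(len(row)):
            vals = window(img, r, c, radius)
            ref = statistics.median(vals)
            nearest = sorted(vals, key=lambda v: abs(v - ref))[:k]
            new_row.append(statistics.median(nearest))
        out.append(new_row)
    return out

def huber_location(values, c=1.345, iters=10):
    """Stage 2: Huber M-estimate of location by iterative reweighting,
    with the (unnormalized) median absolute deviation as scale."""
    mu = statistics.median(values)
    scale = statistics.median(abs(v - mu) for v in values) or 1.0
    for _ in range(iters):
        weights = [1.0 if abs(v - mu) <= c * scale else c * scale / abs(v - mu)
                   for v in values]
        mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mu

def cascade_filter(img, k=5, radius=1):
    """Run stage 1, then stage 2 on the stage-1 output."""
    stage1 = knn_median(img, k, radius)
    return [[huber_location(window(stage1, r, c, radius))
             for c in range(len(row))]
            for r, row in enumerate(stage1)]
```

Swapping `huber_location` for an estimator built on a different influence function is the kind of change the abstract refers to when it compares influence functions in the M-estimator.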
ISBN (print): 0819440833
In this paper, we present an implementation of the robust RM-estimators with different influence functions, such as the cut median (skipped median) function and the Hampel function. We found that using these functions in the RM algorithms provides better robustness than the simplest cut median function. Applying these functions in filtering procedures preserves fine details, removes impulsive noise, and suppresses multiplicative noise. Implementing the cut median and Hampel functions in the RM-KNN filter on a TMS320C6701 DSP has shown that it is a good tool for preserving fine details and suppressing noise. The deterministic and statistical properties of the designed filters have been investigated and demonstrate their effectiveness. The optimal parameter values of these filters for different noise mixtures are presented in this paper. Finally, the DSP implementation has shown that processing time is lower with the simplest cut median than with the cut median and Hampel functions, but noise suppression is better when the cut median or Hampel functions are applied.
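For concreteness, Hampel's three-part redescending influence function has the standard form ψ(x) = x for |x| ≤ a; a·sign(x) for a < |x| ≤ b; a·sign(x)(c − |x|)/(c − b) for b < |x| ≤ c; and 0 beyond c. A skipped (cut) median instead discards every sample farther than a single cutoff r from the current estimate. The sketch below plugs either into a location estimate over a filter window. It covers only this M-estimation part, not the full RM-KNN filter, and the constants a, b, c and r (in intensity units, with unscaled residuals) are hypothetical.

```python
import statistics

def hampel_psi(x, a=1.0, b=2.0, c=4.0):
    """Hampel's three-part redescending influence function."""
    ax, s = abs(x), (1.0 if x >= 0 else -1.0)
    if ax <= a:
        return x
    if ax <= b:
        return a * s
    if ax <= c:
        return a * s * (c - ax) / (c - b)
    return 0.0

def hampel_location(values, iters=20, **consts):
    """M-estimate of location with Hampel's psi via iteratively reweighted
    averaging (weights psi(r)/r), started from the median."""
    mu = statistics.median(values)
    for _ in range(iters):
        w = [1.0 if v == mu else hampel_psi(v - mu, **consts) / (v - mu)
             for v in values]
        if sum(w) == 0:
            break
        mu = sum(wi * v for wi, v in zip(w, values)) / sum(w)
    return mu

def cut_median(values, r=2.0, iters=5):
    """Skipped (cut) median: repeatedly take the median of the samples
    lying within r of the current estimate."""
    mu = statistics.median(values)
    for _ in range(iters):
        kept = [v for v in values if abs(v - mu) <= r]
        if not kept:
            break
        mu = statistics.median(kept)
    return mu
```

The extra comparisons and divisions in the Hampel weights, relative to a plain median, hint at why the abstract reports longer DSP processing times for the more elaborate influence functions.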