Voice activity detection systems attempt to discriminate between voice and other ambient sounds. Most systems use a single-microphone approach and rely on training prior to deployment. The performance of these systems depends heavily on reverberation and noise levels. In this paper we present an unsupervised voice activity detection system that uses pairs of microphones to discern between a coherent acoustic source and spatially diffuse noise of low coherence. Coherence is measured using an information-theoretic metric that incorporates a means of filtering out the effects of reverberation and noise more effectively. The performance of the system is investigated through extensive experiments. Under the conditions imposed by the experimental environments, the proposed system is shown to remain more robust than its counterparts in all cases.
Voice activity detection systems attempt to replicate one of the basic operations of the human auditory system, namely, discrimination between voice and other ambient sounds. Most systems that address this problem use a single-microphone approach and rely on extensive training prior to use. The performance of these systems depends heavily on reverberation and noise levels. A voice activity detection system is presented, which uses pairs of microphones to discern between a coherent acoustic source and spatially diffuse noise of low coherence. Coherence is measured using an information-theoretic metric that incorporates a way of filtering out the effects of reverberation and noise more effectively. Using extensive computer simulations, the effects of physical parameters, such as the relative positions of source and receivers, as well as the effects of different design parameters, are investigated. The optimal parameters are then used to examine the performance of the system in real-world experiments. Under the conditions imposed by these parameters, the proposed system is shown to remain more robust than the reference systems in a variety of conditions.
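The idea of separating a coherent source from diffuse noise with a microphone pair can be illustrated with a minimal sketch. The abstract does not specify the information-theoretic metric, so the code below swaps in an assumption: the mutual information of two jointly Gaussian signals, I = -0.5 ln(1 - rho^2), maximised over a small range of inter-microphone lags. The function names, the threshold, the lag range and the white-noise stand-in for speech are all hypothetical, and no reverberation filtering is attempted.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def gaussian_mutual_information(x, y, max_lag=8):
    """Largest mutual information (nats) over lags, assuming joint Gaussianity."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:len(x) + lag], y[-lag:]
        rho_sq = min(correlation(a, b) ** 2, 1.0 - 1e-12)  # keep the log finite
        best = max(best, -0.5 * math.log(1.0 - rho_sq))
    return best


def is_voice_frame(x, y, threshold=0.5, max_lag=8):
    """Hypothetical decision rule: high inter-microphone information = coherent source."""
    return gaussian_mutual_information(x, y, max_lag) > threshold


# Synthetic frames: white noise stands in for the source (assumption, not real speech).
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(512)]
delayed = [0.0] * 3 + source[:-3]  # the second microphone hears the source 3 samples later
coherent_mic1 = [s + 0.1 * rng.gauss(0.0, 1.0) for s in source]
coherent_mic2 = [s + 0.1 * rng.gauss(0.0, 1.0) for s in delayed]
diffuse_mic1 = [rng.gauss(0.0, 1.0) for _ in range(512)]  # independent noise per microphone
diffuse_mic2 = [rng.gauss(0.0, 1.0) for _ in range(512)]
```

Calling `is_voice_frame(coherent_mic1, coherent_mic2)` flags the coherent pair, while the independent-noise pair falls below the threshold. Searching over lags matters because a coherent source generally reaches the two microphones at slightly different times.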
A novel stochastic gradient algorithm for finite impulse response (FIR) adaptive filters, termed the least sum of exponentials (LSE), is introduced. In order to generalise the class of weighted mixed-norm algorithms while avoiding the problems associated with the large number of free parameters of such algorithms, LSE is derived by minimising a sum of error exponentials. A rigorous mathematical analysis is provided, resulting in closed-form expressions for the optimal weights and the upper bound of the learning rate. The analysis is supported by simulations in a system identification setting.
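A stochastic gradient update of this kind can be sketched in a system identification setting. The abstract does not state the cost function, so the sketch assumes J(e) = cosh(a e) = (exp(a e) + exp(-a e)) / 2, a sum of two error exponentials whose Taylor series contains every even power of the error. This may differ from the cost used in the paper. The scale `a`, the step size `mu`, the clamp on the exponent and the function name are hypothetical choices for illustration.

```python
import math
import random


def lse_style_identify(x, d, num_taps, mu=0.01, a=1.0):
    """Adapt FIR weights by stochastic gradient descent on the assumed cost cosh(a * e)."""
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(x)):
        u = [x[n - k] for k in range(num_taps)]          # current input regressor
        y = sum(wk * uk for wk, uk in zip(w, u))         # filter output
        e = d[n] - y                                     # instantaneous error
        z = max(-20.0, min(20.0, a * e))                 # numerical guard (assumption)
        g = a * math.sinh(z)                             # derivative of cosh(a * e) w.r.t. e
        w = [wk + mu * g * uk for wk, uk in zip(w, u)]   # move against the cost gradient
    return w


# Hypothetical unknown system and white Gaussian excitation.
rng = random.Random(1)
true_taps = [0.5, -0.3, 0.2]
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d = [
    sum(h * x[n - k] for k, h in enumerate(true_taps) if n - k >= 0)
    + 0.01 * rng.gauss(0.0, 1.0)                         # small measurement noise
    for n in range(len(x))
]
estimated = lse_style_identify(x, d, num_taps=3)
```

For small errors sinh(a e) is approximately a e, so the update behaves like LMS with step size mu a^2; for large errors the exponential growth of sinh gives larger corrections, which is also why the step size must be bounded.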
In this paper, we address face tracking of multiple people in complex 3D scenes, using multiple calibrated and synchronized far-field recordings. We localize faces in every camera view and associate them across the different views. To cope with the complexity of 2D face localization introduced by the multitude of people and unconstrained face poses, a combination of stochastic and deterministic trackers, detectors and a Gaussian mixture model for face validation is used. Faces of the same person seen from the different cameras are then associated by first finding all possible associations and then choosing the best option by means of a 3D stochastic tracker. The performance of the proposed system is evaluated and is found to improve on that of existing systems.
This paper proposes a robust background estimator for fixed cameras, to be used for foreground segmentation in tracking systems. The estimator is based on a variation of Stauffer's dynamic background algorithm, where the background learning rate is spatiotemporally adapted. The adaptation is based on the position, size and velocity of the various foreground objects already detected. The evidence for the initialization and tracking of the foreground objects is obtained by combining a pixel map showing the temporal persistence of each image pixel and the edge binary image. The spatiotemporal adaptation of the learning rate overcomes the problem of immobile or slowly moving objects fading into the background, which is encountered in all variations of Stauffer's algorithm to date, while the combination with edge information allows objects already present in the scene at startup time and new objects to be treated by the same image processing module.
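The effect of a spatially adapted learning rate can be shown with a much simplified sketch. Stauffer's algorithm models each pixel with a mixture of Gaussians; the code below swaps in a single running mean per pixel to keep the example short. The rule that maps target speed to a learning rate, the box format `(x0, y0, x1, y1, speed)` and all parameter values are hypothetical and are not taken from the paper.

```python
def adapt_learning_rates(width, height, targets,
                         base_rate=0.05, min_rate=0.0005, fast_speed=5.0):
    """Build a per-pixel learning-rate map, lowered inside tracked target boxes.

    targets: list of (x0, y0, x1, y1, speed) with half-open pixel ranges.
    Slow or immobile targets get a rate close to min_rate (hypothetical rule).
    """
    rates = [[base_rate] * width for _ in range(height)]
    for x0, y0, x1, y1, speed in targets:
        fraction = min(abs(speed) / fast_speed, 1.0)
        rate = min_rate + (base_rate - min_rate) * fraction
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                rates[y][x] = min(rates[y][x], rate)
    return rates


def update_background(background, frame, rates):
    """Running-mean background update with a per-pixel learning rate (in place)."""
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            background[y][x] += rates[y][x] * (pixel - background[y][x])


# A 4x4 scene: the background brightens to 50, an immobile object of value 100 sits in the middle.
background = [[0.0] * 4 for _ in range(4)]
frame = [[50.0] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = 100.0
rates = adapt_learning_rates(4, 4, [(1, 1, 3, 3, 0.0)])
for _ in range(200):
    update_background(background, frame, rates)
```

After 200 frames the pixels outside the box have converged to the new background level, while the pixels under the immobile object have barely moved towards its value, so the object still differs strongly from the background estimate and is not absorbed.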
ISBN: 9781604234497 (print)
We present a system that estimates the direction of arrival of two competing acoustic sources using two closely spaced receivers that form a differential microphone array. The main advantage of the proposed array topology is that null steering can essentially be performed by adapting a set of two scalars. The direction of arrival estimation relies on the successful estimation of the relative delays between the microphone signals using the decorrelation constraint. Processing is performed in real time by operating on blocks of recorded data. We examine the performance of the system for different block sizes and investigate its robustness in environments with strong multipath reflections, where algorithms often fail to distinguish between the true direction of arrival and that of a dominant reflection. The overall performance of the system is compared with that of the simple omni-directional array topology. The results indicate that the examined framework can track the two directions of arrival adequately.
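The link between a relative delay and a direction of arrival can be sketched for the simplest case. The code below is not the decorrelation-based method of the abstract: it handles a single source, estimates the delay from the peak of a plain cross-correlation, and converts it to an angle with the far-field relation sin(theta) = c * tau / d, with theta measured from broadside. The 0.1 m spacing, the 48 kHz sample rate and the function names are hypothetical, and the spacing is wider than a differential array would use so that whole-sample delays are resolvable.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air at room temperature


def estimate_delay_samples(x, y, max_lag):
    """Return the lag k (in samples) that maximises sum over n of x[n] * y[n + k]."""
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(delay_samples, sample_rate, spacing):
    """Far-field direction of arrival in degrees from broadside."""
    sine = SPEED_OF_SOUND * delay_samples / (sample_rate * spacing)
    sine = max(-1.0, min(1.0, sine))  # guard against delays beyond the physical limit
    return math.degrees(math.asin(sine))


# One block of synthetic data: the second microphone lags the first by 7 samples.
rng = random.Random(2)
sample_rate = 48000.0
spacing = 0.1
mic1 = [rng.gauss(0.0, 1.0) for _ in range(1024)]
mic2 = [0.0] * 7 + mic1[:-7]
delay = estimate_delay_samples(mic1, mic2, max_lag=14)
angle = delay_to_angle(delay, sample_rate, spacing)
```

The largest physically possible delay is `spacing * sample_rate / SPEED_OF_SOUND`, about 14 samples here, which is why the lag search is limited to that range. A strong reflection would appear as a competing peak in the cross-correlation, which is the failure mode the abstract refers to.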
This paper describes the video-based face recognition evaluation performed under the CHIL project and the systems that participated in it, along with the first-year results obtained. The evaluation methodology comprises a specially built database of videos and an evaluation protocol. Two complete automatic face detection and recognition systems from two academic institutions participated in the evaluation. For comparison purposes, a baseline system is also developed using well-known methods for face detection and recognition.