We propose a method that integrates dense motion classification with particle filter tracking to monitor whether a viewer is engaged with the screened content. We first apply color-based particle filtering, which lets us track the user's head through the video sequence. Optical flow is then estimated with SIFT flow on the tracked regions. Finally, features extracted from the viewer's head rotation and location are fed into a random forest classifier that reports the involvement level of the tracked person. We show that the probabilistic motion estimation model, supported by tracking, significantly reduces computational complexity while providing performance comparable to state-of-the-art methods. The proposed scheme allows online monitoring of the viewer and can therefore be integrated into interactive multimedia systems.
ISBN: 9781479903573 (print)
In this paper, we propose an adaptive time-frequency resolution based single-channel sound source separation method using Non-negative Tensor Factorization (NTF). The model aims to alleviate the drawbacks of working with a fixed-length Short Time Fourier Transform (STFT) by minimizing the smearing of signal energy in both time and frequency. A joint optimization scheme based on KL-divergence is applied, where each layer of the tensor represents the mixture at a different resolution. To incorporate sparseness into the factorization, resynthesis is performed through an adaptive weighted fusion procedure that combines the separated sources in a manner that maximizes energy concentration. Test results reported over a large sound database indicate that the introduced NTF-based fusion method improves sound quality in terms of both conventional and perceptual distortion measures.
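As a rough illustration of KL-divergence based non-negative factorization, the sketch below applies the standard multiplicative updates to a single non-negative matrix. It is a simplification under stated assumptions: it factorizes one matrix (one resolution layer) instead of the multi-resolution tensor described in the abstract, it omits the adaptive weighted fusion step, and the names `kl_nmf`, `rank`, and `iters` are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def ratio(V, R, eps):
    """Element-wise V / R, guarded against division by zero."""
    return [[v / (r + eps) for v, r in zip(v_row, r_row)] for v_row, r_row in zip(V, R)]


def kl_divergence(V, R, eps=1e-12):
    """Generalized KL divergence D(V || R) between non-negative matrices."""
    total = 0.0
    for v_row, r_row in zip(V, R):
        for v, r in zip(v_row, r_row):
            if v > 0:
                total += v * math.log(v / (r + eps))
            total += r - v
    return total


def kl_nmf(V, rank, iters=100, seed=0, eps=1e-12):
    """Factorize V (n x m) as W (n x rank) times H (rank x m) with KL updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = []
    for _ in range(iters):
        # Update H: H <- H * (W^T (V / WH)) / (column sums of W)
        Q = ratio(V, matmul(W, H), eps)
        WtQ = matmul(transpose(W), Q)
        w_sums = [sum(W[i][k] for i in range(n)) for k in range(rank)]
        H = [[H[k][j] * WtQ[k][j] / (w_sums[k] + eps) for j in range(m)]
             for k in range(rank)]
        # Update W: W <- W * ((V / WH) H^T) / (row sums of H)
        Q = ratio(V, matmul(W, H), eps)
        QHt = matmul(Q, transpose(H))
        h_sums = [sum(H[k]) for k in range(rank)]
        W = [[W[i][k] * QHt[i][k] / (h_sums[k] + eps) for k in range(rank)]
             for i in range(n)]
        history.append(kl_divergence(V, matmul(W, H)))
    return W, H, history
```

In a separation setting, `V` would typically be a magnitude spectrogram, the columns of `W` spectral templates, and the rows of `H` their activations over time; that mapping is a common convention and is not a detail taken from the abstract.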
ISBN: 9781479903573 (print)
We propose a two-class classification scheme with a small number of features for sleepiness detection. Unlike conventional methods that rely on the linguistic content of speech, we work with prosodic features extracted by psychoacoustic masking in the spectral and temporal domains. Our features also model the variations between non-sleepy and sleepy modes in a quasi-continuum space with the help of code words learned by a bag-of-features scheme. These improve the unweighted recall rates for unseen speakers and minimize language dependence. Recall rates reported on the Karolinska Sleepiness Scale (KSS) for Support Vector Machine and Learning Vector Quantization classifiers show that the developed system monitors sleepiness efficiently, with lower complexity than the reported benchmark results for the Sleepy Language Corpus.
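The bag-of-features idea can be sketched as follows: each feature frame is assigned to its nearest code word, and an utterance is summarized by the normalized histogram of those assignments. This is a minimal sketch that assumes the codebook has already been learned (for example by clustering, which is not shown); the function names and the toy two-dimensional frames are hypothetical and are not the prosodic features of the paper.

```python
def nearest_codeword(frame, codebook):
    """Index of the code word with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
    )


def bag_of_features(frames, codebook):
    """Normalized histogram of code word assignments for one utterance."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    total = len(frames)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy example: two code words and four frames, two frames near each code word.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, 0.0], [0.9, 1.2], [1.0, 0.8], [0.2, 0.1]]
histogram = bag_of_features(frames, codebook)
```

The resulting fixed-length histogram is what would be passed to a classifier such as an SVM, regardless of how many frames the utterance contains.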
In this paper we describe the system designed by the ITU MSPR group for content-based video fingerprinting as applied to the TRECVID 2010 Content Based Copy Detection (CBCD) benchmark. This year the focus of the system was on integrating audio and video fingerprinting to improve robustness to attacks. The proposed system consists of three main modules: audio/video fingerprint extraction, audio/video search and retrieval, and audiovisual decision fusion. We propose a video feature extraction scheme based on Nonnegative Matrix Factorization (NMF), which is an efficient dimension reduction technique in video processing. The video fingerprint generation module takes the factorization matrices generated by NMF as its input and converts them to binary hashes by differential coding [1, 2]. For audio data we apply an audio fingerprinting method similar to the one proposed in [3]. The extracted audio and video hashes are indexed into a database. The search module first applies a hash matching procedure to locate potential matching points in both audio and video. This is followed by decision fusion, which eliminates false alarms and finalizes the matching and retrieval.
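One way to picture binary hashing by differential coding, followed by hash matching, is sketched below. The abstract does not specify the coding rule, so this assumes the simplest variant: one bit per adjacent pair of coefficients, set when the value increases. The coefficient values, the database layout, and the function names are all hypothetical.

```python
def differential_hash(values):
    """One bit per adjacent pair: 1 if the sequence increases, otherwise 0."""
    return [1 if b > a else 0 for a, b in zip(values, values[1:])]


def hamming_distance(h1, h2):
    """Number of bit positions in which two equal-length hashes differ."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(h1, h2))


def best_match(query_hash, database):
    """Return (key, distance) for the stored hash closest to the query."""
    return min(
        ((key, hamming_distance(query_hash, stored)) for key, stored in database.items()),
        key=lambda item: item[1],
    )


# Toy coefficients standing in for one row of an NMF factor matrix.
original = [0.2, 0.5, 0.4, 0.9, 0.1]
# A monotone distortion (square root here) keeps the ordering of neighbours,
# so the differential hash is unchanged.
distorted = [v ** 0.5 for v in original]

database = {
    "clip_a": differential_hash(original),
    "clip_b": differential_hash([0.9, 0.7, 0.8, 0.3, 0.6]),
}
match = best_match(differential_hash(distorted), database)
```

Because only the sign of the difference between neighbours is kept, any distortion that preserves their ordering leaves the hash intact, which is one plausible reason such codes tolerate global intensity changes.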
The ITU MSPR group participates in the Content Based Copy Detection (CBCD) task of the TREC Video Retrieval Evaluation (TRECVID). The system proposed by ITU MSPR consists of two main modules: extraction of video fingerprints and search/retrieval. We propose a feature extraction scheme based on Nonnegative Matrix Factorization (NMF) [1], which is an efficient dimension reduction technique in video processing [2]. The video fingerprint generation module takes the factorization matrices generated by NMF as its input and converts them to binary hashes by differential coding. The extracted hashes are indexed into a database. The search module first applies a hash matching procedure to locate potential matching points. This is followed by temporal merging, which eliminates false alarms while combining subsegments. Initial results are promising for pattern insertion, re-encoding, blurring, gamma change, and noise addition. Future work will include improving the current results and pursuing robustness to geometric transformations such as shift, crop, flip, and picture-in-picture.
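The temporal merging step can be illustrated with a small sketch: matched time points that lie close together are combined into one segment, and segments supported by too few matches are discarded as false alarms. The abstract gives no parameters, so `max_gap` and `min_matches` and their default values are assumptions chosen only for illustration.

```python
def merge_matches(match_times, max_gap=2.0, min_matches=3):
    """Group match timestamps (seconds) into segments and drop weak ones.

    Consecutive matches at most max_gap apart join the same segment.
    Segments with fewer than min_matches matches are treated as false alarms.
    Returns a list of (start, end) tuples.
    """
    segments = []
    current = []
    for t in sorted(match_times):
        if current and t - current[-1] > max_gap:
            segments.append(current)
            current = []
        current.append(t)
    if current:
        segments.append(current)
    return [(seg[0], seg[-1]) for seg in segments if len(seg) >= min_matches]


# The isolated match at 20.0 s is dropped; the two clusters survive.
times = [1.0, 2.0, 3.5, 20.0, 40.0, 41.0, 42.5, 43.0]
segments = merge_matches(times)
```

A larger `max_gap` bridges more missed matches inside a true copy at the cost of merging unrelated hits, so in practice it would be tuned against the expected hash miss rate.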
We study the problem of representing images within a multimedia Database Management System (DBMS), in order to support fast retrieval operations without compromising storage efficiency. To achieve this goal, we propose new image coding techniques which combine a wavelet representation, embedded coding of the wavelet coefficients, and segmentation of image-domain regions in the wavelet domain. A bitstream is generated in which each image region is encoded independently of other regions, without having to explicitly store information describing the regions. Simulation results show that our proposed algorithms achieve coding performance which compares favorably, both perceptually and objectively, to that achieved using state-of-the-art image/video coding techniques while additionally providing region-based support.