We propose a tracking adaptation to recover from early tracking errors in sign language recognition by optimizing the obtained tracking paths w.r.t. the hypothesized word sequences of an automatic sign language recogn...
RWTH's system for the 2008 IWSLT evaluation consists of a combination of different phrase-based and hierarchical statistical machine translation systems. We participated in the translation tasks for the Chinese-to...
We present a method to fit videos in 16:9 format on 4:3 screens, and vice versa, fully automatically. It can be applied to arbitrary aspect ratios and can be used to make videos suitable for mobile viewing devices with small and possibly uncommonly sized displays. The cropping sequence is optimised over time to create smooth transitions, which leads to an excellent viewing experience. Current televisions use simple and often disturbing methods which either show the centre region of the image, distort the image, or pad it with black borders. The technique presented here can fully automatically find the "right" viewing area for each image in a video sequence. It works in real time with only a very small time shift. We employ different low-level features and a log-linear model to learn how to find the right area. The method is able to decide automatically whether padding with black borders is necessary or whether all relevant image areas fit on screen when the image is cropped. Evaluation is done on ten videos from five different types of content, and the baseline methods are clearly outperformed.
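The idea of optimising the cropping sequence over time can be illustrated with a small dynamic-programming sketch. This is only an illustration under stated assumptions, not the authors' method: the function name `smooth_crop_path`, the per-frame score table, and the linear `jump_penalty` are all hypothetical, and the scores stand in for whatever the learned model would assign to each candidate crop position.

```python
def smooth_crop_path(scores, jump_penalty):
    """Pick one crop offset per frame (hypothetical helper, for illustration).

    scores[t][x] is an assumed score for placing the crop window at offset x
    in frame t (higher is better). The returned path maximises the sum of
    scores minus jump_penalty * |offset change| between consecutive frames.
    """
    n_offsets = len(scores[0])
    best = list(scores[0])  # best total score of a path ending at each offset
    back = []               # back-pointers, one list per frame after the first
    for frame in scores[1:]:
        new_best, pointers = [], []
        for x in range(n_offsets):
            # Best predecessor offset for landing on x in this frame.
            prev = max(range(n_offsets),
                       key=lambda p: best[p] - jump_penalty * abs(x - p))
            new_best.append(best[prev] - jump_penalty * abs(x - prev) + frame[x])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final offset.
    x = max(range(n_offsets), key=lambda i: best[i])
    path = [x]
    for pointers in reversed(back):
        x = pointers[x]
        path.append(x)
    path.reverse()
    return path


# Toy scores: the per-frame optimum jumps from offset 0 to 2 and back.
toy_scores = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]
jumpy_path = smooth_crop_path(toy_scores, jump_penalty=0)
smooth_path = smooth_crop_path(toy_scores, jump_penalty=1)
```

With no penalty the path follows the per-frame maximum and jumps across the image; with a penalty the window stays put, which is the kind of trade-off a temporally smoothed cropping sequence has to make.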
We propose several tracking adaptation approaches to recover from early tracking errors in sign language recognition by optimizing the obtained tracking paths with respect to the hypothesized word sequences of an automatic sign language recognition system. Hand or head tracking is usually only optimized according to a tracking criterion. As a consequence, methods which depend on accurate detection and tracking of body parts lead to recognition errors in gesture and sign language processing. We analyze an integrated tracking and recognition approach addressing these problems and propose approximation approaches over multiple hand hypotheses to ease the time complexity of the integrated approach. Most state-of-the-art systems consider tracking as a preprocessing feature extraction part. Experiments on a publicly available benchmark database show that the proposed methods strongly improve the recognition accuracy of the system.
ISBN: 9781424421749 (print)
We propose to explicitly model white-spaces for Arabic handwriting recognition within different writing variants. Position-dependent character shapes in Arabic handwriting allow for large white-spaces between characters even within words. Here, a separate character model for white-spaces in combination with a lexicon using different writing variants and character model length adaptation is proposed. Current handwriting recognition systems either model the white-spaces implicitly within the character models, leading to possibly degraded models, or try to explicitly segment the Arabic words into pieces, which is prone to segmentation errors. Several white-space modeling approaches are analyzed on the well-known IFN/ENIT database and outperform the best reported error rates.
This paper describes a lexical trigger model for statistical machine translation. We present various methods using triplets incorporating long-distance dependencies that can go beyond the local context of phrases or n...
Search is a central component of any statistical machine translation system. We describe the search for phrase-based SMT in detail and show its importance for achieving good translation quality. We introduce an explic...
ISBN: 9781424421749 (print)
We present a method to classify images into different categories of pornographic content to create a system for filtering pornographic images from network traffic. Although different systems for this application were presented in the past, most of these systems are based on simple skin colour features and have rather poor performance. Recent advances in the image recognition field, in particular for the classification of objects, have shown that bag-of-visual-words approaches are a good method for many image classification problems. The system we present here is based on this approach, uses a task-specific visual vocabulary, and is trained and evaluated on an image database of 8500 images from different categories. It is shown that it clearly outperforms earlier systems on this dataset, and further evaluation on two novel web-traffic collections shows the good performance of the proposed system.
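For readers unfamiliar with the bag-of-visual-words representation mentioned above, the following minimal sketch shows the core step: each local descriptor is assigned to its nearest visual word, and the image is represented by the normalised histogram of word counts. The function name `bovw_histogram`, the two-dimensional toy descriptors, and the two-word vocabulary are assumptions for illustration; the abstract does not specify the descriptors, vocabulary size, or classifier used.

```python
def bovw_histogram(descriptors, vocabulary):
    """Return a normalised bag-of-visual-words histogram (illustrative helper).

    descriptors: local feature vectors extracted from one image (assumed given).
    vocabulary:  visual words, i.e. cluster centres in the same feature space.
    """
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        # Assign the descriptor to the visual word with the smallest
        # squared Euclidean distance.
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2
                              for a, b in zip(descriptor, vocabulary[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [count / total for count in counts]


# Toy example: two visual words and four descriptors from one image.
toy_vocabulary = [(0.0, 0.0), (1.0, 1.0)]
toy_descriptors = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8), (0.2, 0.1)]
toy_histogram = bovw_histogram(toy_descriptors, toy_vocabulary)
```

The resulting fixed-length histogram is what would then be passed to a classifier, regardless of how many local descriptors the image produced.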
This paper focuses on confidence scores for use in acoustic model adaptation. Frame-based confidence estimates are used in linear transform (CMLLR and MLLR) and MAP adaptation. We show that adaptation approaches with a limited number of free parameters, such as linear transform-based approaches, are robust in the face of frame labeling errors, whereas adaptation approaches with a large number of free parameters, such as MAP, are sensitive to the quality of the supervision and hence benefit most from the use of confidences. Different approaches for using confidence information in adaptation are investigated. This analysis shows that a thresholding approach is effective in that it improves the frame labeling accuracy with little detrimental effect on frame recall. Experimental results show an absolute WER reduction of 2.1% over a CMLLR-adapted system on a video transcription task.
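The trade-off between frame labeling accuracy and frame recall under confidence thresholding can be made concrete with a small sketch. The function name `threshold_frames`, the toy confidences, and the threshold value are assumptions for illustration only; they are not taken from the paper.

```python
def threshold_frames(confidences, is_correct, threshold):
    """Keep frames whose confidence is at least `threshold` (illustrative).

    Returns (accuracy, recall):
      accuracy = fraction of kept frames whose label is correct,
      recall   = fraction of correctly labeled frames that are kept.
    """
    kept = [ok for conf, ok in zip(confidences, is_correct) if conf >= threshold]
    kept_correct = sum(kept)
    total_correct = sum(is_correct)
    accuracy = kept_correct / len(kept) if kept else 0.0
    recall = kept_correct / total_correct if total_correct else 0.0
    return accuracy, recall


# Toy data: six frames with a confidence and a flag for label correctness.
toy_confidences = [0.95, 0.90, 0.40, 0.85, 0.30, 0.70]
toy_is_correct = [True, True, False, True, False, True]

no_threshold = threshold_frames(toy_confidences, toy_is_correct, 0.0)
with_threshold = threshold_frames(toy_confidences, toy_is_correct, 0.5)
```

In this toy case the threshold discards only the two mislabeled frames, so accuracy rises while recall is unchanged; on real data some correct frames would typically be discarded as well.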