ISBN (digital): 9781728165530
ISBN (print): 9781728165547
The task of fine-grained visual classification (FGVC) deals with classification problems that display a small inter-class variance such as distinguishing between different bird species or car models. State-of-the-art approaches typically tackle this problem by integrating an elaborate attention mechanism or (part-) localization method into a standard convolutional neural network (CNN). This work likewise aims to enhance the performance of a backbone CNN such as ResNet by including three efficient and lightweight components specifically designed for FGVC. This is achieved by using global k-max pooling, a discriminative embedding layer trained by optimizing class means, and an efficient localization module that estimates bounding boxes using only class labels for training. The resulting model achieves state-of-the-art recognition accuracies on multiple FGVC benchmark datasets.
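As an illustration of the pooling component, the following is a minimal PyTorch sketch of global k-max pooling, assuming the common variant that averages the k strongest activations per channel; the value of k and the feature-map shape are illustrative and not taken from the paper.

import torch

def global_k_max_pool(features: torch.Tensor, k: int = 4) -> torch.Tensor:
    # features: (batch, channels, height, width) activations from a backbone CNN
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)   # flatten the spatial dimensions
    topk, _ = flat.topk(k, dim=-1)      # k strongest responses per channel
    return topk.mean(dim=-1)            # (batch, channels) image descriptor

# Example: pool a ResNet-like feature map into a per-image descriptor.
feats = torch.randn(2, 2048, 14, 14)
print(global_k_max_pool(feats, k=4).shape)  # torch.Size([2, 2048])

With k = 1 this reduces to global max pooling and with k = h * w to global average pooling, which is why it can serve as a lightweight drop-in replacement for the standard pooling layer.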
The task of fine-grained visual categorization is related to both general object recognition and specialized tasks such as face recognition. Hence, we propose to combine two methods popular for general object recognition and face recognition to build a new model-free system for fine-grained visual categorization. Specifically, we use Local Naive-Bayes Nearest Neighbor as a pre-selection method and 2D-Warping as a refinement step. For the latter, we explore different ways to use the alignments computed by a 2D-Warping algorithm for classification. We demonstrate the performance of our approach on the CUB200-2011 database and show that our approach outperforms the recognition accuracy of current state-of-the-art methods.
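To make the pre-selection step concrete, below is a simplified brute-force sketch of Local Naive-Bayes Nearest Neighbor scoring in Python; the update rule follows the published LNBNN idea, while the data layout, the value of k, and the use of exact (rather than approximate) nearest-neighbour search are assumptions made only for illustration.

import numpy as np

def lnbnn_ranking(query_descriptors, class_descriptors, k=10):
    # query_descriptors: (n, d) local descriptors of the test image
    # class_descriptors: dict class_id -> (m_c, d) training descriptors
    classes = list(class_descriptors.keys())
    totals = {c: 0.0 for c in classes}
    for q in query_descriptors:
        dists, labels = [], []
        for c in classes:
            d = np.linalg.norm(class_descriptors[c] - q, axis=1)
            dists.append(d)
            labels.append(np.full(len(d), c))
        dists, labels = np.concatenate(dists), np.concatenate(labels)
        order = np.argsort(dists)[: k + 1]
        background = dists[order[-1]]              # distance to the (k+1)-th neighbour
        near_d, near_c = dists[order[:-1]], labels[order[:-1]]
        for c in np.unique(near_c):
            totals[c] += near_d[near_c == c].min() - background  # negative = evidence for c
    # pre-selection: the classes with the lowest totals are passed on to 2D-Warping
    return sorted(classes, key=lambda c: totals[c])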
ISBN (print): 9781622765928
In statistical machine translation, word lattices are used to represent the ambiguities in the preprocessing of the source sentence, such as word segmentation for Chinese or morphological analysis for German. Several approaches have been proposed to define the probability of different paths through the lattice with external tools like word segmenters, or by applying indicator features. We introduce a novel lattice design, which explicitly distinguishes between different preprocessing alternatives for the source sentence. It allows us to make use of specific features for each preprocessing type and to lexicalize the choice of lattice path directly in the phrase translation model. We argue that forced alignment training can be used to learn lattice path and phrase translation model simultaneously. On the news-commentary portion of the German→English WMT 2011 task we can show moderate improvements of up to 0.6% Bleu over a state-of-the-art baseline system.
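The sketch below illustrates, under assumed data structures, how such a lattice can make the preprocessing alternative explicit on every edge so that the path choice is scored with type-specific features in a log-linear model; the edge layout, feature names, and weights are invented for illustration and do not reproduce the paper's actual design.

from collections import namedtuple

Edge = namedtuple("Edge", "start end token preproc features")

# Two alternative analyses of a German compound: kept as one surface token,
# or split by the morphological analyser.
edges = [
    Edge(0, 2, "Aktienkurse", preproc="surface",  features={"surface_path": 1.0}),
    Edge(0, 1, "Aktien",      preproc="compound", features={"compound_path": 1.0}),
    Edge(1, 2, "Kurse",       preproc="compound", features={"compound_path": 1.0}),
]

def path_score(path, weights):
    # log-linear combination of the indicator features collected along a lattice path
    return sum(weights.get(f, 0.0) * v for e in path for f, v in e.features.items())

weights = {"surface_path": -0.2, "compound_path": 0.4}  # would be tuned, e.g. by MERT
print(path_score([edges[0]], weights))            # score of the surface path
print(path_score([edges[1], edges[2]], weights))  # score of the compound-split path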
ISBN (digital): 9781509066315
ISBN (print): 9781509066322
In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) which enables an integration of attention sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden variables that establishes a mapping from output labels to input frames. Inspired by the direct HMM model, we assume a decomposition of the label sequence posterior into emission and transition probabilities using zero-order assumption and incorporate both Transformer and LSTM attention models into it. The method keeps the explicit alignment as part of the stochastic model and combines the ease of the end-to-end training of the attention model as well as an efficient and simple beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising results in WER with more focused alignments in comparison to the attention models.
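As a toy illustration of the zero-order factorisation, the sketch below marginalises, for each output label, over the input frame it is aligned to, combining an emission term p(label | frame, history) with a transition/alignment term p(frame | history); the random arrays merely stand in for the outputs of a Transformer or LSTM attention model, and all shapes are illustrative.

import numpy as np

def sequence_log_prob(emission, alignment, labels):
    # emission:  (N, T, V) p(label | aligned frame, label history) per output position
    # alignment: (N, T)    p(aligned frame | label history) per output position
    # labels:    (N,)      reference label indices
    log_p = 0.0
    for n, a_n in enumerate(labels):
        # zero-order assumption: the alignment of position n does not depend on
        # the alignments chosen for previous positions, so each position is
        # marginalised independently given the label history
        log_p += np.log(np.sum(alignment[n] * emission[n, :, a_n]))
    return log_p

N, T, V = 3, 5, 10
emission = np.random.dirichlet(np.ones(V), size=(N, T))    # (N, T, V)
alignment = np.random.dirichlet(np.ones(T), size=N)        # (N, T)
print(sequence_log_prob(emission, alignment, np.array([1, 4, 7])))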
We propose a state-of-the-art system for recognizing real-world handwritten images exhibiting a huge degree of noise and a high out-of-vocabulary rate. We describe methods for successful image denoising, line removal, deskewing, deslanting, and text line segmentation. We demonstrate how to use an HMM-based recognition system to obtain competitive results, and how to further improve it using LSTM neural networks in the tandem approach. The final system outperforms other approaches on a new dataset for English and French handwriting. The presented framework scales well across other standard datasets.
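The tandem step can be pictured with the following hedged sketch: per-frame posteriors from the neural network (a random stand-in here for the LSTM outputs) are log-transformed, decorrelated with PCA, and appended to the original sliding-window features before the HMM is trained on the combined vectors; all dimensions are illustrative.

import numpy as np

def tandem_features(raw_feats, nn_posteriors, keep_dims=20):
    # raw_feats:     (T, D) sliding-window features of a text line image
    # nn_posteriors: (T, C) per-frame character posteriors from the LSTM
    log_post = np.log(nn_posteriors + 1e-10)       # log domain suits the GMMs better
    centered = log_post - log_post.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)   # PCA decorrelation
    reduced = centered @ vt[:keep_dims].T
    return np.hstack([raw_feats, reduced])         # (T, D + keep_dims)

T, D, C = 200, 30, 80
raw = np.random.randn(T, D)
posteriors = np.random.dirichlet(np.ones(C), size=T)
print(tandem_features(raw, posteriors).shape)      # (200, 50)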
In this work we analyze the contribution of preprocessing steps for Latin handwriting recognition. A preprocessing pipeline based on geometric heuristics and image statistics is used. This pipeline is applied to French and English handwriting recognition in an HMM-based framework. Results show that preprocessing improves recognition performance for the two tasks. The Maximum Likelihood (ML)-trained HMM system reaches a competitive WER of 16.7% and outperforms many sophisticated systems for the French handwriting recognition task. The results for English handwriting are comparable to other ML-trained HMM recognizers. Using MLP preprocessing, a WER of 35.3% is achieved.
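As an example of the kind of geometric heuristic such a pipeline uses, the sketch below corrects slant by trying a range of shear angles and keeping the one that makes vertical strokes most upright (maximal variance of the column ink histogram); the criterion and the angle range are common heuristics assumed here, not necessarily the exact ones used in the paper.

import numpy as np
from scipy.ndimage import affine_transform

def deslant(line_img, angles_deg=range(-45, 46, 5)):
    # line_img: 2-D grey-value array with background 0 and ink > 0
    best_angle, best_score = 0.0, -np.inf
    for a in angles_deg:
        shear = np.tan(np.radians(a))
        # shear parallel to the baseline: source column = column + shear * row
        sheared = affine_transform(line_img, [[1.0, 0.0], [shear, 1.0]], order=0)
        score = np.var(sheared.sum(axis=0))   # peaky column histogram = upright strokes
        if score > best_score:
            best_angle, best_score = a, score
    shear = np.tan(np.radians(best_angle))
    return affine_transform(line_img, [[1.0, 0.0], [shear, 1.0]], order=0)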
In this paper, we describe a source-side reordering method based on syntactic chunks for phrase-based statistical machine translation. First, we shallow parse the source language sentences. Then, reordering rules are ...
ISBN (print): 9782951740891
This paper describes the evaluation methodology followed to measure the impact of using a machine learning algorithm to automatically segment intralingual subtitles. The segmentation quality, productivity, and self-reported post-editing effort achieved with this approach are shown to improve on those obtained with the character-counting technique that is currently the main method employed for automatic subtitle segmentation. The corpus used to train and test the proposed automated segmentation method is also described and shared with the community, in order to foster further research in this area.
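For comparison, the character-counting baseline mentioned above can be sketched as a greedy packer that fills each subtitle line up to a maximum number of characters; the 42-character limit is a common subtitling convention used here only as an illustrative default, not a value reported in the paper.

def segment_by_characters(words, max_chars=42):
    # greedily pack words into subtitle lines of at most max_chars characters
    lines, current = [], ""
    for w in words:
        candidate = (current + " " + w).strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            lines.append(current)
            current = w
    if current:
        lines.append(current)
    return lines

print(segment_by_characters(
    "this is a transcript that has to be split into readable subtitle lines".split()))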
Handwritten text is generally captured through two main modalities: off-line and on-line. Each modality has advantages and disadvantages, but it seems clear that smart approaches to handwritten text recognition (HTR) ...