Photo retouching aims to enhance the aesthetic visual quality of images that suffer from photographic defects such as over- or underexposure, poor contrast, and inharmonious saturation. Practically, photo retouching can be...
ISBN: (print) 9781728132945
Recent studies have shown that context features can significantly improve the performance of deep semantic segmentation networks. Current context-based segmentation methods differ from each other in how they construct context features, and they perform differently in practice. This paper first introduces three desirable properties of context features in the segmentation task. Specifically, we find that Global-guided Local Affinity (GLA) can play a vital role in constructing effective context features, while this property has been largely ignored in previous works. Based on this analysis, this paper proposes the Adaptive Pyramid Context Network (APCNet) for semantic segmentation. APCNet adaptively constructs multi-scale contextual representations with multiple well-designed Adaptive Context Modules (ACMs). Specifically, each ACM leverages a global image representation as guidance to estimate the local affinity coefficients for each sub-region, and then calculates a context vector with these affinities. We empirically evaluate APCNet on three semantic segmentation and scene parsing datasets: PASCAL VOC 2012, Pascal-Context, and ADE20K. Experimental results show that APCNet achieves state-of-the-art performance on all three benchmarks, and sets a new record of 84.2% on the PASCAL VOC 2012 test set without MS COCO pre-training or any post-processing.
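The ACM step described above (global guidance, then local affinities, then a context vector) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `adaptive_context`, the additive fusion of local and global features, and the linear projection `proj` that produces one affinity per sub-region are all assumptions made for illustration.

```python
def adaptive_context(local_feat, global_feat, region_feats, proj):
    """Toy version of one Adaptive Context Module step for a single pixel.

    local_feat:   feature vector of the pixel (list of floats)
    global_feat:  global image representation, same length as local_feat
    region_feats: one feature vector per sub-region
    proj:         hypothetical learned weights, one row per sub-region
    """
    if len(local_feat) != len(global_feat):
        raise ValueError("local and global features must have equal length")
    if len(proj) != len(region_feats):
        raise ValueError("proj needs one row per sub-region")

    # Assumption: the global representation guides the local feature by addition.
    guided = [l + g for l, g in zip(local_feat, global_feat)]

    # One affinity coefficient per sub-region, from a linear projection.
    affinities = [sum(w * v for w, v in zip(row, guided)) for row in proj]

    # Context vector: affinity-weighted sum of the sub-region features.
    dim = len(region_feats[0])
    context = [
        sum(a * region[d] for a, region in zip(affinities, region_feats))
        for d in range(dim)
    ]
    return affinities, context


affinities, context = adaptive_context(
    local_feat=[1.0, 0.0],
    global_feat=[0.0, 1.0],
    region_feats=[[1.0, 1.0], [0.5, 0.0]],
    proj=[[1.0, 0.0], [0.0, 2.0]],
)
```

In a pyramid setting, the same step would be repeated with different numbers of sub-regions to obtain multi-scale context; how the scales are combined is not stated in the abstract and is left out here.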
Capturing images with a cellphone and sharing them on social media has become part and parcel of people's day-to-day activities. When an image is forwarded several times on social media it may b...
Although there are advanced technologies for character recognition, automatic descriptive answer evaluation is an open challenge for the document image analysis community due to large diversified handwritten text and ...
It is a challenging task to learn rich and multi-scale spatiotemporal semantics from high-dimensional videos, due to large local redundancy and complex global dependency between video frames. The recent advances in th...
Scene text detection for equipment nameplates in the wild is important for equipment inspection robots, since it enables a robot to take specific actions for different equipment. Although text detection i...
ISBN: (print) 9781728132945
In image restoration tasks such as denoising and super-resolution, continual modulation of restoration levels is of great importance for real-world applications, yet most existing deep-learning-based image restoration methods cannot provide it. Having learned from discrete and fixed restoration levels, deep models cannot be easily generalized to data of continuous and unseen levels. This topic is rarely touched in the literature, due to the difficulty of modulating well-trained models with certain hyper-parameters. We make a step forward by proposing a unified CNN framework that has few additional parameters compared with a single-level model yet can handle arbitrary restoration levels between a start and an end level. The additional module, namely the AdaFM layer, performs channel-wise feature modification and can adapt a model to another restoration level with high accuracy. By simply tweaking an interpolation coefficient, the intermediate model, AdaFM-Net, can generate smooth and continuous restoration effects without artifacts. Extensive experiments on three image restoration tasks demonstrate the effectiveness of both model training and modulation testing. In addition, we carefully investigate the properties of AdaFM layers, providing detailed guidance on the usage of the proposed method.
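The modulation idea above, channel-wise feature modification controlled by a single interpolation coefficient, can be sketched as follows. This is a simplified illustration rather than the published layer: it assumes one learned scale and one learned bias per channel (`scales`, `biases` are hypothetical names), and it assumes that a coefficient of 0 leaves the features untouched (start level) while 1 applies the full learned modification (end level).

```python
def adafm_modulate(features, scales, biases, lam):
    """Apply an interpolated per-channel affine modification.

    features: list of channels, each a flat list of floats
    scales:   hypothetical learned scale, one per channel
    biases:   hypothetical learned bias, one per channel
    lam:      interpolation coefficient in [0, 1]
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if not (len(features) == len(scales) == len(biases)):
        raise ValueError("one scale and one bias per channel are required")

    modulated = []
    for channel, scale, bias in zip(features, scales, biases):
        # Blend between the identity (scale 1, bias 0) and the learned values.
        s = 1.0 + lam * (scale - 1.0)
        b = lam * bias
        modulated.append([s * value + b for value in channel])
    return modulated


features = [[1.0, 2.0], [3.0, 4.0]]
halfway = adafm_modulate(features, scales=[2.0, 0.5], biases=[1.0, -1.0], lam=0.5)
```

Because the blend is linear in `lam`, sweeping the coefficient from 0 to 1 changes the output features continuously, which is the property the abstract relies on for smooth restoration effects.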
Context information plays an indispensable role in the success of semantic segmentation. Recently, non-local self-attention based methods have proved effective for context information collection. Since the desire...
ISBN: (print) 9781728132945
Deep Neural Networks (DNNs) have achieved remarkable success in large-scale visual recognition. However, they often suffer from overfitting under noisy labels. To alleviate this problem, we propose a conceptually simple but effective MetaCleaner, which learns to hallucinate a clean representation of an object category from a small noisy subset of the same category. Specifically, MetaCleaner consists of two flexible submodules. The first submodule, Noisy Weighting, estimates the confidence scores of all the images in the noisy subset by analyzing their deep features jointly. The second submodule, Clean Hallucinating, generates a clean representation from the noisy subset by summarizing the noisy images with their confidence scores. Via MetaCleaner, DNNs can strengthen their robustness to noisy labels and enhance their generalization capacity with richer data diversity. Moreover, MetaCleaner can be easily integrated into the standard training procedure of DNNs, which increases its value for real-life applications. We conduct extensive experiments on two popular benchmarks in noisy-labeled recognition, i.e., Food-101N and Clothing1M. On both datasets, MetaCleaner significantly outperforms the baselines and achieves state-of-the-art performance.
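The two submodules described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' model: here the confidence scores come from a softmax over each feature's dot product with the subset mean, whereas the paper learns the scoring; the function names `noisy_weighting` and `clean_hallucinating` simply mirror the submodule names.

```python
import math


def noisy_weighting(features):
    """Assign a confidence score to each feature vector in a noisy subset.

    Assumption: a sample is trusted more when it agrees with the subset
    mean; the scores are a softmax over dot products with that mean.
    """
    count, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / count for d in range(dim)]
    logits = [sum(a * b for a, b in zip(f, mean)) for f in features]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clean_hallucinating(features, weights):
    """Summarize the subset as a confidence-weighted sum of its features."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


# Two consistent samples and one outlier, standing in for a mislabeled image.
subset = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]]
weights = noisy_weighting(subset)
clean = clean_hallucinating(subset, weights)
```

In this example the outlier receives the smallest weight, so the summarized representation sits closer to the two consistent samples than a plain average would.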
The goal of text spotting is to perform text detection and recognition simultaneously. Although the diversity of luminosity and orientation in scene texts has been widely studied, the font diversity and shape variance of the same character are ignored in recent works, since most characters in natural images are rendered in standard fonts. To solve this problem, we present a Chinese artistic dataset, termed ARText, which contains 33,000 artistic images with rich shape deformation and font diversity. Based on this dataset, we develop a deformation-robust text spotting method (DR TextSpotter) to address the recognition of characters with complex deformation in different fonts. Specifically, we propose a geometric prior module that highlights the important features based on an unsupervised landmark detection sub-network. A graph convolution network is further constructed to fuse the character features and landmark features and then perform semantic reasoning to enhance the discrimination between different characters. Experiments are conducted on the ARText and IC19-ReCTS datasets. Our results demonstrate the effectiveness of the proposed method. The datasets and models will become publicly available after publication.