In this paper, we study deep transfer learning as a way of overcoming object recognition challenges encountered in the field of digital pathology. Through several experiments, we investigate various uses of pre-traine...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
In this paper, we study deep transfer learning as a way of overcoming object recognition challenges encountered in the field of digital pathology. Through several experiments, we investigate various uses of pre-trained neural network architectures and different combination schemes with random forests for feature selection. Our experiments on eight classification datasets show that densely connected and residual networks consistently yield best performances across strategies. It also appears that network fine-tuning and using inner layers features are the best performing strategies, with the former yielding slightly superior results.
In this paper we address the problem of unconstrained Word Spotting in scene images. We train a Fully Convolutional Network to produce heatmaps of all the character classes. Then, we employ the Text Proposals approach...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
In this paper we address the problem of unconstrained Word Spotting in scene images. We train a Fully Convolutional Network to produce heatmaps of all the character classes. Then, we employ the Text Proposals approach and, via a rectangle classifier, detect the most likely rectangle for each query word based on the character attribute maps. We evaluate the proposed method on ICDAR2015 and show that it is capable of identifying and recognizing query words in natural scene images.
Building footprints (BFP) provide useful visual context for users of digital maps when navigating in space. This paper proposes a method for extracting and symbolizing building footprints from satellite imagery using ...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Building footprints (BFP) provide useful visual context for users of digital maps when navigating in space. This paper proposes a method for extracting and symbolizing building footprints from satellite imagery using a convolutional neural network (CNN). The CNN architecture outputs rotated rectangles, providing a symbolized approximation that works well for small buildings. Experiments are conducted on the four cities in the DeepGlobe Challenge dataset (Las Vegas, Paris, Shanghai, Khartoum). Our method performs best on suburbs consisting of individual houses. These experiments show that either large buildings or buildings without clear delineation produce weaker results in terms of precision and recall.
We introduce the first benchmark for a new problem - recognizing human action adverbs (HAA): "Adverbs Describing Human Actions" (ADHA). We demonstrate some key features of ADHA: a semantically complete set o...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
We introduce the first benchmark for a new problem - recognizing human action adverbs (HAA): "Adverbs Describing Human Actions" (ADHA). We demonstrate some key features of ADHA: a semantically complete set of adverbs describing human actions, a set of common, describable human actions, and an exhaustive labelling of simultaneously emerging actions in each video. We commit an in-depth analysis on the implementation of current effective models in action recognition and image captioning on adverb recognition, and the results reveal that such methods are unsatisfactory. Furthermore, we propose a novel three-stream hybrid model to tackle the HAA problem, which achieves better performances and receives relatively promising results.
We present a semantic segmentation algorithm for RGB remote sensing images. Our method is based on the Dilated Stacked U-Nets architecture. This state-of-the-art method has been shown to have good performance in other...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
We present a semantic segmentation algorithm for RGB remote sensing images. Our method is based on the Dilated Stacked U-Nets architecture. This state-of-the-art method has been shown to have good performance in other applications. We perform additional post-processing by blending image tiles and degridding the result. Our method gives competitive results on the DeepGlobe dataset.
Semantic Segmentation of satellite images is one of the most challenging problems in computervision as it requires a model capable of capturing both local and global information at each pixel. Current state of the ar...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Semantic Segmentation of satellite images is one of the most challenging problems in computervision as it requires a model capable of capturing both local and global information at each pixel. Current state of the art methods are based on Fully Convolutional Neural Networks (FCNN) with mostly two main components: an encoder which is a pretrained classification model that gradually reduces the input spatial size and a decoder that transforms the encoder's feature map into a predicted mask with the original size. We change this conventional architecture to a model that makes use of full resolution information. NU-Net is a deep FCNN that is able to capture wide field of view global information around each pixel while maintaining localized full resolution information throughout the model. We evaluate our model on the Land Cover Classification and Road Extraction tracks in the DeepGlobe competition.
Automatic target recognition involves detecting and recognizing potential targets automatically, which is widely used in civilian and military applications today. Quadratic correlation filters were introduced as two-c...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Automatic target recognition involves detecting and recognizing potential targets automatically, which is widely used in civilian and military applications today. Quadratic correlation filters were introduced as two-class recognition classifiers for quickly detecting targets in cluttered scene environments. In this paper, we introduce two methods that integrate the discrimination capability of quadratic correlation filters with the multi-class recognition ability of multilayer neural networks. For mid-wave infrared imagery, the proposed methods are demonstrated to be multi-class target recognition classifiers with very high accuracy.
Classification of human emotions remains an important and challenging task for many computervision algorithms, especially in the era of humanoid robots which coexist with humans in their everyday life. Currently prop...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Classification of human emotions remains an important and challenging task for many computervision algorithms, especially in the era of humanoid robots which coexist with humans in their everyday life. Currently proposed methods for emotion recognition solve this task using multilayered convolutional networks that do not explicitly infer any facial features in the classification phase. In this work, we postulate a fundamentally different approach to solve emotion recognition task that relies on incorporating facial landmarks as a part of the classification loss function. To that end, we extend a recently proposed Deep Alignment Network (DAN), that achieves state-of-the-art results in the recent facial landmark recognition challenge, with a term related to facial features. Thanks to this simple modification, our model called EmotionalDAN is able to outperform state-of-the-art emotion classification methods on two challenging benchmark dataset by up to 5%.
Millions of people are disconnected from basic services due to lack of adequate addressing. We propose an automatic generative algorithm to create street addresses from satellite imagery. Our addressing scheme is cohe...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Millions of people are disconnected from basic services due to lack of adequate addressing. We propose an automatic generative algorithm to create street addresses from satellite imagery. Our addressing scheme is coherent with the street topology, linear and hierarchical to follow human perception, and universal to be used as a unified geocoding system. Our algorithm starts with extracting road segments using deep learning and partitions the road network into regions. Then regions, streets, and address cells are named using proximity computations. We also extend our addressing scheme to cover inaccessible areas, to be flexible for changes, and to lead as a pioneer for a unified geodatabase.
Cross domain image retrieval is a challenging task that implies matching images from one domain to their pairs from another domain. In this paper we focus on fashion image retrieval, which involves matching an image o...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Cross domain image retrieval is a challenging task that implies matching images from one domain to their pairs from another domain. In this paper we focus on fashion image retrieval, which involves matching an image of a fashion item taken by users, to the images of the same item taken in controlled condition, usually by professional photographer. When facing this problem, we have different products in train and test time, and we use triplet loss to train the network. We stress the importance of proper training of simple architecture, as well as adapting general models to the specific task.
暂无评论