Zero-shot learning (ZSL) is the process of recognizing unseen samples from their related classes. Generally, ZSL is realized with the help of pre-defined semantic information by projecting the high-dimensional visual features of data samples and the class-related semantic vectors into a common embedding space. Although classification can then be decided simply through a nearest-neighbor strategy, it usually suffers from the problems of domain shift and hubness. To address these challenges, most studies have introduced regularization with existing norms, such as lasso or ridge, to constrain the learned embedding. However, the sparse estimation of lasso may cause underfitting of the training data, while ridge may introduce bias in the embedding space. To resolve these problems, this paper proposes a novel hybrid regularization approach that leverages elastic net and linear discriminant analysis, and formulates a unified objective function that can be solved efficiently via a synchronous optimization strategy. The proposed method is evaluated on several benchmark image datasets for the task of generalized ZSL. The results demonstrate the superiority of the proposed method over simple regularized methods as well as several previous models.
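As a rough illustration of why elastic net sits between lasso and ridge, the sketch below computes only the penalty term on a projection matrix. The function name `elastic_net_penalty` and the weights `l1_weight` and `l2_weight` are hypothetical, and the linear discriminant analysis term and the paper's actual objective are not reproduced here.

```python
def elastic_net_penalty(W, l1_weight, l2_weight):
    """Elastic net penalty on a projection matrix W (a list of rows).

    Assumed form: l1_weight * ||W||_1 + l2_weight * ||W||_F^2.
    l2_weight = 0 gives the lasso penalty; l1_weight = 0 gives ridge.
    """
    l1 = sum(abs(w) for row in W for w in row)   # promotes sparsity
    l2 = sum(w * w for row in W for w in row)    # shrinks all weights
    return l1_weight * l1 + l2_weight * l2


# Toy 2x2 projection matrix (illustrative values only).
W = [[1.0, -2.0], [0.0, 3.0]]
lasso_only = elastic_net_penalty(W, 0.5, 0.0)
ridge_only = elastic_net_penalty(W, 0.0, 0.1)
combined = elastic_net_penalty(W, 0.5, 0.1)
```

With both weights non-zero, the combined penalty is simply the sum of the lasso and ridge terms, which is the trade-off the abstract appeals to.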
Convolutional neural networks (CNNs) show powerful performance on image recognition tasks. However, when a CNN is deployed on mobile devices with limited computing and memory resources, it requires a more compact design to maintain relatively high performance. In this paper, we propose Relative Squeezing Net (RSNet), which provides technical insight into CNN structure for designing a compact model. In an endeavor to improve CondenseNet, we introduce the Relative-Squeezing bottleneck, whose output is a weighted percentage of the input channels. This bottleneck design can transmit diverse and highly useful features at all stages. We also employ multiple compression layers to constrain the output channels of the feature maps, which eliminates superfluous feature maps and transmits powerful representations to the next layers. We evaluate our model on two benchmark datasets, CIFAR and ImageNet. Experimental results show that RSNet achieves state-of-the-art results with fewer parameters and FLOPs and is more efficient than compact architectures such as CondenseNet, MobileNet and ShuffleNet.
We studied a coding scheme where light field (LF) images (dense multi-view images) are regarded as a sequence of temporal video frames and encoded with video codecs such as High Efficiency Video Coding (HEVC). An important issue with this scheme is how to determine the frame order of the LF images. We propose a method to find the optimum frame order through a formulation of the traveling salesman problem (TSP). Under the assumption that video codecs are more effective with temporally smooth videos, our method, named LF-TSP, defines frame-to-frame distances for each image pair in an LF and attempts to find the shortest route that visits all frames. Experiments showed that our method achieved overall better rate-distortion performance than several previous methods.
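The frame-ordering idea can be sketched with a simple greedy heuristic. This is only an illustration under stated assumptions: the distance (mean absolute pixel difference) and the nearest-neighbour route construction are stand-ins chosen for brevity, not the distance or TSP solver used by LF-TSP, and the function names are hypothetical.

```python
def frame_distance(a, b):
    """Assumed distance: mean absolute difference of two flattened frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def greedy_frame_order(frames, start=0):
    """Nearest-neighbour heuristic for an open route visiting every frame once."""
    remaining = set(range(len(frames))) - {start}
    order = [start]
    while remaining:
        last = frames[order[-1]]
        # Pick the closest unvisited frame; ties are broken by index.
        nxt = min(remaining, key=lambda i: (frame_distance(last, frames[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def route_length(frames, order):
    """Total frame-to-frame distance along a given order."""
    return sum(frame_distance(frames[a], frames[b])
               for a, b in zip(order, order[1:]))


# Four toy "frames" of two pixels each (illustrative values only).
frames = [[0, 0], [10, 10], [1, 1], [9, 9]]
order = greedy_frame_order(frames)
naive_length = route_length(frames, [0, 1, 2, 3])
greedy_length = route_length(frames, order)
```

On this toy input the greedy route groups similar frames together, so its total distance is shorter than the original index order. A nearest-neighbour route is generally not optimal; an exact or stronger TSP solver would be needed to approach the optimum the abstract refers to.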
Advertisement logo compositing aims to embed specified logos at a suitable position in target images with similar geometric distortion and the same appearance characteristics. Unsupervised learning has gained considerable attention recently. To address geometric distortion and appearance realism together, we propose a novel learnable module, the adversarial geometric consistency pursuit model (AGCP), which explicitly allows seamless image compositing. On the one hand, we design an adversarial structure that generates composite images while taking geometric correction and appearance harmonization into account. The proposed adversarial learning approach obtains better harmonization in the region of interest. On the other hand, we design a novel geometric consistency pursuit loss, which encourages the network to learn the warp parameters of target images while preserving the features of the source object. Our comparative evaluation demonstrates the effectiveness of the proposed method.
ISBN: (print) 9781538615010
Remote sensing pan sharpening aims to enhance the spatial resolution of a multispectral image by injecting the spatial details of a panchromatic image into it. In this study, a novel sparse representation based pan sharpening method is proposed to overcome the disadvantages of traditional methods, such as color distortion and blurring. A data set acquired from each of the IKONOS and Quickbird satellites is used to evaluate the performance and robustness of the proposed algorithm. The proposed method is compared with four traditional methods using several quality measurement indices computed against a reference image. The experimental results demonstrate that the proposed algorithm is competitive with or superior to the conventional methods in terms of visual and quantitative analysis, as it preserves spectral information and provides high-quality spatial details in the final product image.
Convolutional Neural Networks (CNNs) have been successfully applied in various image analysis tasks and have gradually become one of the most powerful machine learning approaches. To improve model generalization and performance in image classification, a new trend is to learn more discriminative features via CNNs. The main contribution of this paper is to increase the angles between categories in order to extract discriminative features and enlarge the inter-class variance. To this end, we propose a loss function named focal inter-class angular loss (FICAL), which introduces the confusion rate-weighted cosine distance as the similarity measurement between categories. This measurement is dynamically evaluated during each iteration to adapt the model. Experimental results demonstrate that the proposed FICAL achieves the best performance among the compared loss functions on two image classification datasets.
An ensemble of Convolutional Neural Networks (CNNs) is known to be more accurate and robust than its component CNN models. With the development of fast training methods, current research has managed to build an effective ensemble of several CNN models that requires no additional training cost. However, when the ensemble size is increased further, it is hard to observe a corresponding performance enhancement. According to the generalization capability analysis of CNNs, this phenomenon can be explained by the over-saturation of model capacity and the close correlation among the component CNNs, especially when the CNNs are trained on the same dataset. To address this problem, we propose to train the CNNs on re-sampled bootstrap datasets. Extensive experiments demonstrate that bootstrap re-sampling is effective for a large ensemble size (up to 80). Moreover, thanks to the bootstrap re-sampling technique, we can also obtain an unbiased estimate of the standard deviation of the ensemble output.
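The re-sampling step can be sketched as follows. This is a toy illustration, not the authors' training pipeline: each "member" is replaced by the mean of its bootstrap sample rather than a trained CNN, the spread across members is used as a simple stand-in for the uncertainty estimate, and all names (`bootstrap_indices`, `n_members`, and so on) are hypothetical.

```python
import random
import statistics


def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def ensemble_mean_and_std(member_outputs):
    """Ensemble prediction and sample standard deviation across members."""
    mean = statistics.fmean(member_outputs)
    std = statistics.stdev(member_outputs)  # uses the n - 1 denominator
    return mean, std


rng = random.Random(0)              # fixed seed for reproducibility
data = list(range(100))             # toy dataset (illustrative values only)
n_members = 8

# One bootstrap dataset per ensemble member.
datasets = [bootstrap_indices(len(data), rng) for _ in range(n_members)]

# Stand-in for "train a CNN and predict": the mean of the re-sampled data.
member_outputs = [statistics.fmean(data[i] for i in idx) for idx in datasets]
ensemble_mean, ensemble_std = ensemble_mean_and_std(member_outputs)
```

Because each member sees a different re-sampled dataset, the members are less correlated than models trained on identical data, which is the effect the abstract relies on for large ensembles.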
Deep convolutional neural networks have achieved considerable success in the field of computer vision. However, it is difficult to deploy state-of-the-art models on resource-constrained platforms due to their high storage, memory bandwidth, and computational costs. In this paper, we propose a structured pruning method which employs a three-step process to reduce the resource consumption of neural networks. First, we train an initial network on the training set and evaluate it on the validation set. Next, we introduce an iterative pruning and fine-tuning algorithm to identify and prune redundant structures, which results in a pruned network with a compact architecture. Finally, we train the pruned network from scratch on both the training set and validation set to obtain the final accuracy on the test set. In the experiments, our pruning method significantly reduces the model size (by 87.2% on CIFAR-10), saves inference time (53.3% on CIFAR-10), and achieves better performance as compared to recent state-of-the-art methods.
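The iterate-prune-fine-tune loop in the second step might look roughly like the sketch below. The abstract does not state how redundant structures are identified, so the L1-norm ranking of filters used here is an assumption, and `prune_step`, `iterative_prune` and the no-op `fine_tune` callback are hypothetical names for illustration.

```python
def filter_l1_norm(filt):
    """Assumed importance score: L1 norm of a filter's weights."""
    return sum(abs(w) for w in filt)


def prune_step(filters, prune_ratio):
    """Drop the lowest-scoring fraction of filters, always keeping at least one."""
    n_drop = min(int(len(filters) * prune_ratio), len(filters) - 1)
    ranked = sorted(range(len(filters)), key=lambda i: filter_l1_norm(filters[i]))
    dropped = set(ranked[:n_drop])
    return [f for i, f in enumerate(filters) if i not in dropped]


def iterative_prune(filters, prune_ratio, rounds, fine_tune=lambda fs: fs):
    """Alternate pruning and fine-tuning; fine_tune is a placeholder callback."""
    for _ in range(rounds):
        filters = fine_tune(prune_step(filters, prune_ratio))
    return filters


# Four toy filters with two weights each (illustrative values only).
filters = [[0.1, -0.1], [2.0, 1.0], [0.0, 0.05], [-1.5, 0.5]]
after_one_round = iterative_prune(filters, prune_ratio=0.5, rounds=1)
after_two_rounds = iterative_prune(filters, prune_ratio=0.5, rounds=2)
```

In a real setting `fine_tune` would retrain the remaining weights for a few epochs, and the final compact architecture would then be trained from scratch as the third step describes.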
Recent human action recognition methods mainly build on two-stream or 3D convolutional deep networks, with which human spatio-temporal features can be exploited effectively. However, because they ignore interactions, most of these methods cannot achieve sufficiently good performance. In this paper, we propose a novel action recognition framework with Graph Convolutional Network (GCN) based Interaction Reasoning: objects and discriminative scene patches are detected using an object detector and class activation mapping (CAM), respectively, and a GCN is then introduced to model the interactions among the detected objects and scene patches. Evaluation on two widely used video action benchmarks shows that the proposed work achieves comparable performance without using optical flow: accuracy of up to 43.6% on EPIC Kitchen and 47.0% on the VLOG benchmark.
ISBN: (print) 9781538615010
In this paper, we propose a method that can automatically translate regular human faces into those of Yesilcam artists. Two important contributions are presented. First, since a classifier-based approach is used, the proposed method automatically selects the most similar artist and performs the face translation. Second, to obtain latent information about faces, a pre-trained model is utilized in the encoder. Experiments show that the method can synthesize perceptually promising results that resemble Yesilcam artists while preserving the facial features of the person.