Zero-shot learning aims to identify unseen (novel) objects using only labeled samples from seen (base) classes. Existing methods usually learn visual-semantic interactions or generate the absent visual features of unseen classes to compensate for the data-imbalance problem. However, these methods ignore the representation quality of visual-semantic pairs, resulting in unsatisfactory alignment and prediction bias. To tackle these issues, we propose a Hierarchical Contrastive Representation learning paradigm, termed HCR, which fully exploits the model's representation capability and discriminative information. Specifically, we first propose a contrastive embedding that preserves not only high-quality representations but also sufficiently discriminative information from class-level and instance-level supervision. Then, we introduce a regressor guided by valuable prior knowledge to conduct a more desirable visual-semantic alignment for unseen classes. A pluggable calibrator is also aggregated to further alleviate prediction bias in the contrastive embedding. Extensive experiments show that the proposed HCR significantly outperforms the state of the art on popular benchmarks under both the ZSL and the more challenging GZSL settings.
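The abstract does not give the loss formulation, but the idea of combining class-level and instance-level supervision in one contrastive embedding can be illustrated with a minimal sketch. The function names, the temperature `tau`, and the blending weight `alpha` below are illustrative assumptions, not the paper's actual design:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(anchor, positives, negatives, tau=0.1):
    """InfoNCE-style loss: pull positives toward the anchor, push negatives away."""
    pos = [math.exp(cosine(anchor, p) / tau) for p in positives]
    neg = [math.exp(cosine(anchor, n) / tau) for n in negatives]
    denom = sum(pos) + sum(neg)
    # Average over positives, as in supervised contrastive learning.
    return -sum(math.log(p / denom) for p in pos) / len(pos)

def hierarchical_loss(anchor, inst_pos, cls_pos, negs, alpha=0.5, tau=0.1):
    """Blend instance-level and class-level supervision with weight alpha."""
    return (alpha * contrastive_loss(anchor, inst_pos, negs, tau)
            + (1 - alpha) * contrastive_loss(anchor, cls_pos, negs, tau))
```

Here the instance-level term would contrast an anchor against views of the same image, while the class-level term treats all same-class samples as positives; a real implementation would operate on batched GPU tensors.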
Autonomous driving relies on trustworthy visual recognition of surrounding objects. Few-shot image classification is used in autonomous driving to help recognize objects that are rarely seen. Successful embedding and metric-learning approaches to this task normally learn a feature-comparison framework between an unseen image and the labeled images. However, these approaches often suffer from ambiguous feature embedding because they tend to ignore important local visual and semantic information when extracting intra-class common features from the images. In this paper, we introduce a Semantic-Aligned Attention (SAA) mechanism that refines feature embedding and can be applied to most existing embedding and metric-learning approaches. The mechanism highlights pivotal local visual information with an attention mechanism and aligns the attentive map with semantic information to refine the extracted features. When the proposed mechanism is incorporated into the prototypical network, evaluation results show competitive improvements in both few-shot and zero-shot classification tasks on various benchmark datasets.
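As a rough illustration of aligning attention with semantic information, the sketch below weights each local feature by its agreement with a class semantic vector before pooling. The function names and the plain dot-product scoring are assumptions for illustration, not the paper's exact SAA formulation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def semantic_aligned_pool(local_feats, semantic_vec):
    """Weight each local feature by its alignment (dot product) with the
    class semantic vector, then pool into a single refined embedding."""
    scores = [sum(f_i * s_i for f_i, s_i in zip(f, semantic_vec))
              for f in local_feats]
    weights = softmax(scores)
    dim = len(local_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, local_feats))
            for d in range(dim)]
```

In a prototypical-network setting, the refined embeddings would then feed into the usual distance-to-prototype classification step.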
The key challenge of zero-shot learning (ZSL) is to sufficiently disentangle each latent attribute from the class-level semantic annotations of images, thereby achieving a desirable semantic transfer to unseen classes with the disentangled attributes. However, most existing studies tackle the ZSL task with a strict class-level alignment strategy that may yield insufficient disentanglement: (1) this strategy simply aligns the holistic visual feature with its associated class-level semantic vector for each image; (2) the class-level semantic vectors have limited diversity and complex compositions of attributes. To address these issues, we propose an incorporating attribute-level aligned comparative network, i.e., IAAC-net, which extends the alignment strategy of ZSL to the attribute level. IAAC-net aims to establish diversified attribute-level and refined class-level alignments to facilitate attribute disentanglement and simultaneously improve zero-shot generalization. By further proposing a confusion-aware loss, the model is forced to rectify the disentanglement of indistinguishable attributes, leading to more accurate attribute disentanglement. The proposed IAAC-net yields significant improvements over strong baselines, setting new state-of-the-art performance on three popular and challenging benchmarks, i.e., CUB, SUN, and AWA2.
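The abstract does not spell out the attribute-level alignment or the confusion-aware loss, but the two ideas can be sketched minimally: score each disentangled attribute feature against its own semantic entry rather than one holistic class vector, and up-weight attributes whose true-class score barely beats the most confusable class. Both functions and the decay rate `gamma` are hypothetical illustrations, not the paper's formulation:

```python
import math

def attribute_alignment(attr_feats, attr_semantics):
    """Score each attribute feature against its own semantic entry
    (attribute-level alignment), returning one score per attribute."""
    return [sum(a * b for a, b in zip(f, s))
            for f, s in zip(attr_feats, attr_semantics)]

def confusion_aware_weights(scores_true, scores_confused, gamma=1.0):
    """Up-weight attributes with a small margin between the true class and
    its most confusable class: smaller margin -> larger weight."""
    return [math.exp(-gamma * (t - c))
            for t, c in zip(scores_true, scores_confused)]
```

The weights could then rescale a per-attribute alignment loss so that indistinguishable attributes dominate the gradient.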
Image categorisation, an active yet challenging research topic in computer vision, is the task of classifying images according to their semantic content. Recently, fine-grained object categorisation has attracted wide attention and remains difficult owing to feature inconsistency caused by smaller inter-class and larger intra-class variation, as well as widely varying poses. Most existing frameworks have focused on exploiting a more discriminative image representation or developing a more robust classification framework to mitigate these problems. Attention has recently been paid to discovering the dependency across fine-grained class labels using Convolutional Neural Networks. Encouraged by the success of semantic label embedding in discovering the correlation among fine-grained class labels, this paper exploits the misalignment between the visual feature space and the semantic label-embedding space and incorporates it as privileged information into a cost-sensitive learning framework. By capturing both the variation of the image feature representation and the label correlation in the semantic label-embedding space, such visual-semantic misalignment can be employed to reflect the importance of instances, which is more informative than conventional cost sensitivities. Experimental results demonstrate the effectiveness of the proposed framework on public fine-grained benchmarks, achieving performance superior to the state of the art.
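One plausible way to turn visual-semantic misalignment into instance importance, as the abstract describes, is to measure each instance's distance between its projected visual feature and its class's label embedding and convert that into a cost weight. The functions and the scaling factor `beta` below are illustrative assumptions, not the paper's actual scheme:

```python
import math

def misalignment(visual_proj, label_embed):
    """Euclidean distance between a projected visual feature and the
    semantic label embedding of its class."""
    return math.sqrt(sum((v - e) ** 2 for v, e in zip(visual_proj, label_embed)))

def instance_costs(visual_projs, label_embeds, beta=1.0):
    """Turn per-instance misalignment into an importance weight: badly
    aligned instances receive larger costs in the learning objective."""
    dists = [misalignment(v, e) for v, e in zip(visual_projs, label_embeds)]
    return [1.0 + beta * d for d in dists]
```

These costs would then rescale the per-instance terms of a classification loss, so hard (misaligned) examples contribute more than well-aligned ones.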