Category-level 6D object pose estimation has gained popularity, yet it remains challenging because instances within the same category vary widely. This paper proposes a novel category-level 6D object pose estimation framework with a structure encoder and reasoning attention. A structure autoencoder is introduced to mine the structure features shared by color images of the same category, using a distinct learning strategy that recovers the image of another instance whose pose is most similar to that of the input. On this basis, a reasoning attention decoder and fully connected layers are stacked to form a rotation prediction network, in which the structure features and 3D shape features are integrated and projected into a semantic space. The semantic space comprises observed patterns and learnable patterns, which are learned more effectively by adding a shortcut connection branch, with gradient decoupling, in parallel with the reasoning attention decoder. Further reasoning over these patterns endows the decoder with powerful feature representation. Without requiring 3D object models, the proposed method implicitly models category attributes in the semantic space, and reasoning over this space improves 6D object pose estimation performance. The effectiveness of the proposed method is verified on public datasets and in real-world experiments.
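The pairing step behind the structure autoencoder (recovering the image of another instance whose pose is most similar to the input) can be illustrated with a small sketch. The abstract does not say how pose similarity is measured, so the geodesic angle between rotation matrices used here is an assumption, and the names `rotation_z`, `geodesic_angle`, `pick_target` and the toy instance labels are hypothetical.

```python
import math

def rotation_z(deg):
    """Rotation matrix about the z-axis, used here only to build toy poses."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def geodesic_angle(Ra, Rb):
    """Angle in radians of the relative rotation Ra^T Rb (assumed similarity measure)."""
    # trace(Ra^T Rb) equals the sum of element-wise products of Ra and Rb.
    trace = sum(Ra[i][j] * Rb[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)

def pick_target(query, candidates):
    """Return the candidate from a different instance whose pose is closest to the query."""
    others = [c for c in candidates if c["instance"] != query["instance"]]
    return min(others, key=lambda c: geodesic_angle(query["R"], c["R"]))

# Hypothetical toy data: three mugs observed at different rotations about z.
query = {"instance": "mug_a", "R": rotation_z(30)}
candidates = [
    {"instance": "mug_a", "R": rotation_z(30)},  # same instance, so it is excluded
    {"instance": "mug_b", "R": rotation_z(90)},
    {"instance": "mug_c", "R": rotation_z(40)},
]
target = pick_target(query, candidates)
print(target["instance"])
```

In this toy setup the image of the selected instance would serve as the reconstruction target for the query image; how the actual method selects and weights such pairs is not stated in the abstract.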
Fine-grained classification is a challenging problem characterized by small inter-class variance and large intra-class variance. It becomes more difficult when only a few labeled training samples are available. Inspired by the way humans distinguish two similar objects by comparing their key parts, we develop a novel few-shot fine-grained classification method that learns to model inter-class boundaries in a human-like manner, i.e., by extracting key-part structure information from objects and performing part-by-part comparison. To this end, we first extract the key parts of objects with a purpose-designed key-part detector; these parts are then encoded by our structure encoder for the final comparison. To cope with the scarcity of labeled samples, we train the proposed network under the metric-based few-shot learning paradigm. Experiments on benchmark datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art counterparts. In addition, extensive investigations verify the contributions of the key components of our method.
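The idea of part-by-part comparison under a metric-based few-shot setup can be sketched as follows. This is not the paper's network: the key-part detector and structure encoder are replaced by hand-written feature vectors, and the per-part mean prototype with squared Euclidean distance is an assumed, prototypical-network-style stand-in. All function names and class labels are hypothetical.

```python
def part_prototypes(support):
    """Average each part's feature vector over the support images of one class."""
    n = len(support)
    num_parts = len(support[0])
    return [
        [sum(img[p][d] for img in support) / n for d in range(len(support[0][p]))]
        for p in range(num_parts)
    ]

def part_distance(query_parts, proto_parts):
    """Sum of squared Euclidean distances, accumulated part by part."""
    return sum(
        sum((q - c) ** 2 for q, c in zip(qp, pp))
        for qp, pp in zip(query_parts, proto_parts)
    )

def classify(query_parts, support_by_class):
    """Assign the query to the class whose part prototypes are nearest."""
    protos = {label: part_prototypes(imgs) for label, imgs in support_by_class.items()}
    return min(protos, key=lambda label: part_distance(query_parts, protos[label]))

# Toy 2-way 2-shot episode: two parts per image, 2-D features per part.
# The two classes share the first part and differ only in the second part.
support_by_class = {
    "sparrow": [[[1.0, 0.0], [0.0, 1.0]], [[0.8, 0.2], [0.2, 0.8]]],
    "finch": [[[1.0, 0.0], [1.0, 0.0]], [[0.8, 0.2], [0.8, 0.2]]],
}
query = [[0.9, 0.1], [0.9, 0.0]]
print(classify(query, support_by_class))
```

Because the classes agree on the first part, the decision in this example rests entirely on the second part, which mirrors the motivation of separating similar categories by their distinguishing key parts.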