Aggregating information from multiple views is essential for accurately identifying similar objects. Nevertheless, existing datasets have limitations that hinder the development of practical multi-view object classification methods for real-world scenarios: they contain synthetic and coarse-grained objects, and they lack a validation split for standard hyperparameter tuning. This paper proposes a new dataset, MVP-N (multi-view, retail products, label noise), which contains 16k real captured views and 9k multi-view sets collected from 44 retail products. In MVP-N, each view is annotated with a human-perceived information quantity (HPIQ) for analyzing how views are utilized in information aggregation. Moreover, the fine-grained categorization of objects introduces inter-class view similarity and intra-class view variance, enabling research on learning from noisy labels in multi-view images. Finally, a new soft label scheme, HS-HPIQ, is proposed that accounts for the hidden stratification phenomenon in multi-view images and achieves superior performance. To assess the effectiveness of MVP-N and the proposed HS-HPIQ, this study reviews 50 recent multi-view-based methods regarding their practicality in real-world scenarios, and benchmarks six feature aggregation methods and twelve soft label methods on MVP-N with in-depth analysis. The dataset and code are publicly available at https://***/SMNUResearch/MVP-N.
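The abstract contrasts soft label schemes with standard one-hot training. As a minimal sketch of the general idea (not the paper's HS-HPIQ formulation, whose exact weighting is not given here), training against a soft target distribution simply replaces the one-hot vector inside the cross-entropy:

```python
import numpy as np

def soft_label_cross_entropy(logits, soft_targets):
    """Cross-entropy against a soft target distribution.

    Generic form only: how HS-HPIQ actually distributes target mass
    across classes is an assumption left out of this sketch.
    """
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(soft_targets * log_probs).sum(axis=1).mean()

# A one-hot target recovers the standard cross-entropy; a soft target
# shares some probability mass with visually similar classes.
logits = np.array([[2.0, 0.5, 0.1]])
hard = np.array([[1.0, 0.0, 0.0]])
soft = np.array([[0.8, 0.15, 0.05]])
```

With a confident correct prediction, the soft target yields a larger loss than the one-hot target, since mass assigned to other classes is penalized by their lower log-probabilities.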
Existing multi-view object classification algorithms usually rely on sufficient labeled multi-view objects, which substantially restricts their scalability to novel classes with few annotated training samples in real-world applications. Aiming to go beyond these limitations, we explore a novel yet challenging task, few-shot multi-view object classification (FS-MVOC), which expects the network to build its classification ability efficiently from limited labeled multi-view objects. To this end, we design a dual augmentation network (DANet) to provide excellent performance on the under-explored FS-MVOC task. On the one hand, we employ an attention-guided multi-view representation augmentation (AMRA) strategy to help the model focus on salient features and suppress unnecessary ones across the multiple views of multi-view objects, resulting in more discriminative multi-view representations. On the other hand, during the meta-training stage, we adopt the category prototype augmentation (CPA) strategy to improve the class-representativeness of each prototype and increase the inter-prototype difference by injecting Gaussian noise in the deep feature space. Extensive experiments on the benchmark datasets (Meta-ModelNet and Meta-ShapeNet) indicate the effectiveness and robustness of DANet.
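The CPA strategy described above is concrete enough to sketch: in prototype-based few-shot learning, each class prototype is the mean of its support features, and CPA perturbs these prototypes with Gaussian noise in feature space. The noise scale and isotropic noise model below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def class_prototypes(features, labels, num_classes):
    """Standard prototype computation: mean support feature per class."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def augment_prototypes(prototypes, sigma=0.1):
    """CPA-style perturbation: inject Gaussian noise in feature space.

    sigma=0.1 and isotropic noise are assumptions for illustration.
    """
    return prototypes + rng.normal(0.0, sigma, size=prototypes.shape)

# Toy episode: 6 support features, 2 classes, 4-d embeddings.
features = rng.normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])
protos = class_prototypes(features, labels, num_classes=2)
noisy = augment_prototypes(protos, sigma=0.1)
```

Training against the perturbed prototypes exposes the classifier to slightly shifted class centers, which is one way to read the claimed gain in class-representativeness and inter-prototype difference.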
ISBN:
(print) 9798350365474
3D object classification has emerged as a practical technology with applications in various domains, such as medical image analysis, automated driving, intelligent robots, and crowd surveillance. Among the different approaches, multi-view representations for 3D object classification have shown the most promising results, achieving state-of-the-art performance. However, there are certain limitations in current view-based 3D object classification methods. One observation is that using all captured views for classification can confuse the classifier and lead to misleading results for certain classes. Additionally, some views may contain more discriminative information for object classification than others. These observations motivate the development of smarter and more efficient selective multi-view classification models. In this work, we propose a Selective Multi-View Deep Model that extracts multi-view images from 3D data representations and selects the most influential view by assigning importance scores using the cosine similarity method based on visual features detected by a pre-trained CNN. The proposed method is evaluated on the ModelNet40 dataset for the task of 3D classification. The results demonstrate that the proposed model achieves an overall accuracy of 88.13% using only a single view when employing a shading technique for rendering the views, pre-trained ResNet152 as the backbone CNN for feature extraction, and a Fully Connected Network (FCN) as the classifier.
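The view-selection step described above can be sketched with one plausible reading of the scoring rule: rank each view by its mean cosine similarity to the other views' CNN features and keep the highest-scoring one. The exact scoring rule in the paper may differ, and the 2-d feature vectors below stand in for real CNN embeddings:

```python
import numpy as np

def select_view(view_features):
    """Pick the most influential view via cosine-similarity scores.

    view_features: (num_views, feature_dim) array of per-view CNN
    features (hypothetical stand-in for a pre-trained backbone's output).
    Returns the index of the best view and the per-view scores.
    """
    # L2-normalise each view's feature vector.
    norms = np.linalg.norm(view_features, axis=1, keepdims=True)
    unit = view_features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                 # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)          # ignore self-similarity
    scores = sim.sum(axis=1) / (len(unit) - 1)
    return int(np.argmax(scores)), scores

# Toy features: views 0 and 1 agree, view 2 is an outlier, so the
# middle view (closest to both) scores highest.
views = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
best, scores = select_view(views)
```

Selecting by agreement with the other views is one way to discard the confusing views the abstract mentions while keeping a single representative view for the classifier.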