ISBN (print): 9789897584022
We propose a multi-view framework for joint object detection and labelling based on pairs of images. The proposed framework extends the single-view Mask R-CNN approach to multiple views without the need for additional training. Dedicated components are embedded into the framework to match objects across views by enforcing epipolar constraints, appearance feature similarity and class coherence. The multi-view extension enables the proposed framework to detect objects that would otherwise be mis-detected by a classical Mask R-CNN approach, and achieves coherent object labelling across views. By avoiding the need for additional training, the approach effectively overcomes the current shortage of multi-view datasets. The proposed framework achieves high-quality results on a range of complex scenes, outputting a class, bounding box, mask and an additional label enforcing coherence across views. In the evaluation, we show qualitative and quantitative results on several challenging outdoor multi-view datasets and perform a comprehensive comparison to verify the advantages of the proposed method.
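As a rough illustration of the matching components this abstract names, the following is a minimal sketch (in Python, assuming a known fundamental matrix F between the two views, per-detection box centers, appearance feature vectors and predicted classes; the function and field names are illustrative, not from the paper) of scoring a candidate cross-view match with an epipolar-distance gate, cosine feature similarity and a class-coherence check.

import numpy as np

def epipolar_distance(x1, x2, F):
    """Symmetric point-to-epipolar-line distance for homogeneous points x1 (view 1), x2 (view 2)."""
    l2 = F @ x1          # epipolar line of x1 in view 2
    l1 = F.T @ x2        # epipolar line of x2 in view 1
    d2 = abs(x2 @ l2) / np.hypot(l2[0], l2[1])
    d1 = abs(x1 @ l1) / np.hypot(l1[0], l1[1])
    return 0.5 * (d1 + d2)

def match_score(det1, det2, F, max_px=20.0):
    """Combine epipolar, appearance and class cues into one score (hypothetical weighting)."""
    if det1["cls"] != det2["cls"]:                       # class coherence
        return 0.0
    x1 = np.append(det1["center"], 1.0)
    x2 = np.append(det2["center"], 1.0)
    d = epipolar_distance(x1, x2, F)
    if d > max_px:                                       # epipolar constraint
        return 0.0
    f1, f2 = det1["feat"], det2["feat"]
    cos = float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-8))
    return cos * (1.0 - d / max_px)                      # appearance similarity, gated by geometry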
Multi-view object detection is an open and challenging problem due to the inherent intra-class variability across discrete viewpoints. This paper performs multi-view object detection by first learning discriminative and correlated patches and then making inference based on them. In the training stage, discriminative patches are discovered for each view by a Hough decision tree, corresponding to leaf nodes with high distinctiveness and stable spatial distributions in the tree. Discriminative patches across different views are then linked to establish correlations between any two neighboring views. During multi-view detection, intra-view direct votes and inter-view transfer votes are integrated through a probabilistic approach to obtain voted Hough images, one per view, and Mean-Shift estimation is finally employed to detect object instances and infer the image viewpoint. Experiments on two benchmark multi-view 3D object category datasets and the PASCAL VOC'06 Car dataset illustrate the effectiveness of the proposed framework. (C) 2017 Elsevier Ltd. All rights reserved.
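A toy sketch of the final inference step described above: finding object centers in a voted Hough image with Mean-Shift. Vote generation and the tree-based patch discovery are stubbed out, scikit-learn's MeanShift is used for mode seeking, and replicating points by weight is only one simple way to approximate weighted votes; all names are illustrative.

import numpy as np
from sklearn.cluster import MeanShift

def detect_from_votes(vote_points, vote_weights, bandwidth=15.0):
    """vote_points: (N, 2) array of voted object-center locations (pixels).
    vote_weights: (N,) probabilistic vote strengths.
    Returns estimated object centers as Mean-Shift cluster modes."""
    # Replicate points proportionally to their weights so Mean-Shift sees a weighted density.
    reps = np.maximum(1, np.round(vote_weights * 10).astype(int))
    pts = np.repeat(vote_points, reps, axis=0)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    ms.fit(pts)
    return ms.cluster_centers_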
Automatic inspection of X-ray scans at security checkpoints can improve public security. X-ray images differ from photographic images: they are transparent, contain much less texture, may be highly cluttered, and objects may undergo in-plane and out-of-plane rotations. On the other hand, scale and illumination changes are less of an issue. More importantly, X-ray imaging provides extra information that is usually not available in regular images: dual-energy imaging, which provides material information about the objects; and multi-view imaging, which provides multiple images of objects from different viewing angles. Such peculiarities of X-ray images should be leveraged for high-performance object recognition systems to be deployed on X-ray scanners. To this end, we first present an extensive evaluation of standard local features for object detection on a large X-ray image dataset in a structured learning framework. Then, we propose two dense sampling methods as keypoint detectors for textureless objects and extend the SPIN color descriptor to utilize the material information. Finally, we propose a multi-view branch-and-bound search algorithm for multi-view object detection. Through extensive experiments on three object categories, we show that object detection performance on X-ray images improves substantially with the help of the extended features and multiple views.
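The abstract mentions dense sampling as a keypoint detector for textureless X-ray imagery without giving details; the snippet below is a generic stand-in (a regular-grid sampler, an assumption rather than the paper's exact method) showing what dense keypoint placement typically looks like.

import numpy as np

def dense_grid_keypoints(image_shape, step=8, margin=16):
    """Place keypoints on a regular grid, skipping a border margin.
    A generic stand-in for the dense sampling detectors described above."""
    h, w = image_shape[:2]
    ys = np.arange(margin, h - margin, step)
    xs = np.arange(margin, w - margin, step)
    return np.array([(x, y) for y in ys for x in xs], dtype=np.float32)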
Although many state-of-the-art methods for object detection in a single image have achieved great success in the last few years, they still suffer from false positives in crowded scenes of real-world applications such as automatic checkout. To address the limitations of single-view object detection in complex scenes, we propose MVDet, an end-to-end learnable approach that can detect and re-identify multi-class objects in multiple images captured by multiple cameras (multi-view). Our approach is based on the premise that incorrect detection results in a specific view can be eliminated using precise cues from other views, given the availability of multi-view images. Unlike most existing multi-view detection algorithms, which assume that objects belong to a single class on the ground plane, our approach can classify multi-class objects without such assumptions and is thus more practical. To classify multi-class objects, we propose an integrated architecture for region proposal, re-identification, and classification. Additionally, we utilize the epipolar geometry constraint to devise a novel re-identification algorithm that does not rely on a ground-plane assumption. Our model demonstrates competitive performance compared to several baselines on the challenging MessyTable dataset.
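To make the idea of epipolar-constrained re-identification concrete, here is a minimal sketch (not the MVDet algorithm itself) that builds a pairwise cost from epipolar distance and appearance distance and solves a one-to-one assignment between the detections of two views with SciPy's Hungarian solver; the weights and array names are assumptions.

import numpy as np
from scipy.optimize import linear_sum_assignment

def cross_view_reid(centers_a, centers_b, feats_a, feats_b, F, w_geo=1.0, w_app=1.0):
    """Match detections in view A to detections in view B.
    centers_*: (N, 2) pixel centers, feats_*: (N, D) appearance features,
    F: fundamental matrix mapping view-A points to view-B epipolar lines."""
    na, nb = len(centers_a), len(centers_b)
    cost = np.zeros((na, nb))
    for i in range(na):
        xa = np.append(centers_a[i], 1.0)
        for j in range(nb):
            xb = np.append(centers_b[j], 1.0)
            l = F @ xa                                   # epipolar line of the A-point in view B
            geo = abs(xb @ l) / np.hypot(l[0], l[1])     # distance of the B-point to that line
            app = np.linalg.norm(feats_a[i] - feats_b[j])
            cost[i, j] = w_geo * geo + w_app * app
    rows, cols = linear_sum_assignment(cost)             # minimum-cost one-to-one assignment
    return list(zip(rows, cols))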
ISBN (print): 9798350365474
Broiler localization is crucial for welfare monitoring, particularly in identifying issues such as wet litter. We focus on multi-camera detection systems, since multiple viewpoints not only ensure comprehensive pen coverage but also reduce occlusions caused by lighting, feeder and drinking equipment. Previous multi-view detection studies localize subjects either by aggregating ground-plane projections of single-view predictions or by developing end-to-end multi-view detectors capable of directly generating predictions. However, single-view detections may suffer from reduced accuracy due to occlusions, and obtaining ground-plane labels for training end-to-end multi-view detectors is challenging. In this paper, we combine the strengths of both approaches by using readily available aggregated single-view detections as labels for training a multi-view detector. Our approach alleviates the need for hard-to-acquire ground-plane labels. Through experiments on a real-world broiler dataset, we demonstrate the effectiveness of our approach.
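The aggregation step this abstract relies on can be illustrated as follows: project each camera's detection foot points to the ground plane with a per-camera homography and cluster the projections into pseudo-labels. This is a hedged sketch under the assumption of known image-to-ground homographies and DBSCAN as the aggregation rule; the paper's actual aggregation may differ.

import numpy as np
from sklearn.cluster import DBSCAN

def aggregate_pseudo_labels(per_view_dets, homographies, eps=0.15):
    """per_view_dets: list (one entry per camera) of (N_i, 2) arrays of image foot points.
    homographies: list of 3x3 image-to-ground homographies.
    Returns ground-plane cluster centers usable as pseudo-labels."""
    ground_pts = []
    for dets, H in zip(per_view_dets, homographies):
        if len(dets) == 0:
            continue
        homog = np.hstack([dets, np.ones((len(dets), 1))])   # homogeneous image points
        proj = (H @ homog.T).T
        ground_pts.append(proj[:, :2] / proj[:, 2:3])          # dehomogenize to ground coordinates
    pts = np.vstack(ground_pts)
    labels = DBSCAN(eps=eps, min_samples=2).fit_predict(pts)   # group projections from different views
    centers = [pts[labels == k].mean(axis=0) for k in set(labels) if k != -1]
    return np.array(centers)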
ISBN (print): 9798350334982
3D deep learning has made tremendous progress recently and is being widely used in various fields, such as medical imaging, robotics, and autonomous driving, to identify and segment various structures. In this work, we leverage recent developments in 3D semi-supervised learning to develop state-of-the-art models for 3D object detection and segmentation of various buried structures such as memory and logic dies. We briefly describe our approach to fabricating the samples, generating 3D scans, and annotating them. Thereafter, we explain our approach to locating these buried structures by demonstrating how semi-supervised learning is adopted to leverage vast amounts of available unlabeled data to improve both detection and segmentation performance. We also develop a metrology package that performs post-processing and outputs various important metrics for each package, such as void-to-solder ratio, pad misalignment, solder extrusion, and bond line thickness (BLT). Overall, we observe an improvement of up to 16% in object detection and 6% in 3D segmentation. Our final metrology results show a mean error of less than 1.24 µm for BLT and 0.753 µm for pad misalignment when compared to the ground-truth labeled data.
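One of the metrology metrics listed above, the void-to-solder ratio, reduces to voxel counting once a 3D segmentation is available. The snippet below is a simple sketch under assumed label IDs; it is not the paper's metrology package.

import numpy as np

def void_to_solder_ratio(seg_volume, solder_label=1, void_label=2):
    """seg_volume: 3D integer array of per-voxel class labels (label IDs are illustrative).
    Returns the fraction of the solder joint occupied by voids."""
    void = np.count_nonzero(seg_volume == void_label)
    solder = np.count_nonzero(seg_volume == solder_label)
    total = void + solder
    return void / total if total > 0 else 0.0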
ISBN (print): 9783031189159; 9783031189166
Prohibited item detection in X-ray security inspection images using computer vision technology is a challenging task in real-world scenarios due to various factors, including occlusion and unfavorable imaging viewing angles. Intelligent analysis of multi-view X-ray security inspection images is a relatively direct and targeted solution. However, there is currently no published multi-view X-ray security inspection image dataset. In this paper, we construct a dual-view X-ray security inspection dataset, named Dualray, based on a real acquisition method. The Dualray dataset consists of 4371 pairs of images with 6 categories of prohibited items, and each pair of instances is imaged from horizontal and vertical viewing angles. We have annotated each sample with the category of the prohibited item and its location, represented by a bounding box. In addition, a dual-view prohibited item feature fusion and detection framework for X-ray images is proposed, in which the two input channels are divided into primary and secondary channels, and the features of the secondary channel are used to enhance the features of the primary channel through a feature fusion module. Spatial attention and channel attention are employed to achieve efficient feature screening. We conduct experiments to verify the effectiveness of the proposed dual-view prohibited item detection framework for X-ray images. The Dualray dataset and dual-view object detection code are available at https://***/zhg-SZPT/Dualray.
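A PyTorch-style sketch of the kind of fusion this abstract describes: secondary-view features are screened by channel and spatial attention and then injected into the primary-view features. The module layout, channel counts and names are assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class DualViewFusion(nn.Module):
    """Enhance primary-view features with attention-filtered secondary-view features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_att = nn.Sequential(             # squeeze-and-excitation style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_att = nn.Sequential(             # single-channel spatial attention map
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, primary, secondary):
        s = secondary * self.channel_att(secondary)   # screen channels of the secondary view
        s = s * self.spatial_att(s)                   # screen spatial locations
        return primary + s                            # inject the filtered cues into the primary view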
We present BAdaCost, a multi-class cost-sensitive classification algorithm. It combines a set of cost-sensitive multi-class weak learners to obtain a strong classification rule within the Boosting framework. To derive the algorithm we introduce CMEL, a Cost-sensitive Multi-class Exponential Loss that generalizes the losses optimized in various classification algorithms such as AdaBoost, SAMME, Cost-sensitive AdaBoost and PIBoost, hence unifying them under a common theoretical framework. In the experiments performed we show that BAdaCost achieves significant gains in performance when compared to previous multi-class cost-sensitive approaches. The advantages of the proposed algorithm in asymmetric multi-class classification are also evaluated in practical multi-view face and car detection problems. (C) 2018 Elsevier Ltd. All rights reserved.
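The abstract does not state the CMEL formula, so no attempt is made to reproduce it here. As a generic illustration of what cost-sensitive multi-class prediction means in practice, the sketch below picks the label with minimum expected cost under a user-supplied cost matrix; it is not the BAdaCost algorithm.

import numpy as np

def cost_sensitive_predict(class_probs, cost_matrix):
    """class_probs: (N, K) posterior estimates, e.g. from a boosted ensemble.
    cost_matrix: (K, K) with C[i, j] = cost of predicting j when the true class is i.
    Picks the class with minimum expected cost instead of maximum probability."""
    expected_cost = class_probs @ cost_matrix     # (N, K) expected cost per candidate label
    return expected_cost.argmin(axis=1)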