ISBN (print): 9781450386517
Multiview detection incorporates multiple camera views to deal with occlusions, and its central problem is multiview aggregation. Given feature map projections from multiple views onto a common ground plane, the state-of-the-art method addresses this problem via convolution, which applies the same calculation regardless of object locations. However, such translation-invariant behavior might not be the best choice, as object features undergo various projection distortions according to their positions and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. Unlike convolutions, the shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions. We also propose an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency. On two multiview detection benchmarks, we report new state-of-the-art accuracy with the proposed system. Code is available at https://***/hou-yz/MVDeTr.
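Below is a minimal sketch of the aggregation idea, assuming PyTorch and illustrative names (AttnAggregator, cam_embed): standard multi-head attention fuses the per-camera ground-plane features at every cell, so the fusion weights can differ per location and per camera, unlike a shared convolution. It is not the paper's shadow transformer (which builds on deformable attention); it only illustrates the location- and camera-dependent aggregation the abstract describes.

```python
# Hedged sketch: position- and camera-aware aggregation of projected
# ground-plane features with plain multi-head attention. Not the paper's
# shadow transformer; names and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class AttnAggregator(nn.Module):
    def __init__(self, num_cams: int, channels: int, num_heads: int = 4):
        super().__init__()
        self.cam_embed = nn.Parameter(torch.zeros(num_cams, channels))  # per-camera embedding
        self.query = nn.Parameter(torch.zeros(1, channels))             # learned aggregation query
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        nn.init.normal_(self.cam_embed, std=0.02)
        nn.init.normal_(self.query, std=0.02)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N_cams, C, H, W) ground-plane projections from each view
        n, c, h, w = feats.shape
        # one token per (camera, ground-plane cell); cells form the batch dim,
        # so attention weights are free to vary across locations
        tokens = feats.permute(2, 3, 0, 1).reshape(h * w, n, c)
        tokens = tokens + self.cam_embed                      # camera-aware bias
        query = self.query.expand(h * w, 1, c)
        fused, _ = self.attn(query, tokens, tokens)           # (H*W, 1, C)
        return fused.reshape(h, w, c).permute(2, 0, 1)        # (C, H, W)


if __name__ == "__main__":
    agg = AttnAggregator(num_cams=7, channels=128)
    ground_feats = torch.randn(7, 128, 60, 90)   # toy ground-plane grid
    print(agg(ground_feats).shape)               # torch.Size([128, 60, 90])
```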
ISBN (print): 9798350379822; 9798350379815
The multiview detection task utilizes multiple camera views to reduce the severity of occlusion, and its critical part is multiview aggregation. The aggregated ground plane feature can be acquired from the convolutional feature map projections of multiple views, but such aggregation uses the same weight to fuse information from all cameras. However, directly using information from all cameras is suboptimal, as object features undergo various occlusions according to their positions and the corresponding camera perspectives. In this paper, we propose a novel meta-learning-based multiview detector, dubbed MetaMVDet, that adopts a newly introduced camera-aware attention to aggregate multiview information. Our camera-aware attention aims to select reliable information from different camera views to reduce the ambiguity caused by occlusions. We leverage both 2D and 3D information simultaneously while maintaining 2D-3D multiview consistency to guide the learning of the multiview detection network. The proposed solution achieves state-of-the-art accuracy on two major multiview detection benchmarks.
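As a rough illustration of per-camera weighting, the following sketch (with assumed names CameraGate and score) scores each camera's projected feature at every ground-plane cell with a 1x1 convolution and fuses the views with a softmax over cameras, so heavily occluded views can be down-weighted locally; the paper's camera-aware attention and meta-learning scheme are more involved than this.

```python
# Hedged sketch: per-location camera weighting as a stand-in for
# camera-aware attention. Instead of fusing all views with one shared
# weight, a scoring network produces a softmax weight for every
# (camera, ground-plane cell) pair. Names/shapes are assumptions.
import torch
import torch.nn as nn


class CameraGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # scores each camera's feature at each cell with a 1x1 conv
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N_cams, C, H, W) projected ground-plane features
        logits = self.score(feats)                 # (N_cams, 1, H, W)
        weights = torch.softmax(logits, dim=0)     # normalise over cameras per cell
        return (weights * feats).sum(dim=0)        # (C, H, W) fused map


if __name__ == "__main__":
    gate = CameraGate(channels=128)
    fused = gate(torch.randn(6, 128, 60, 90))
    print(fused.shape)  # torch.Size([128, 60, 90])
```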
Automated whole breast ultrasound (ABUS) has become a popular screening tool in recent years. To reduce physicians' review time and misdetections in ABUS images, a computer-aided detection (CADe) system for ABUS images based on a multiview method is proposed in this study. A total of 58 pathology-proven lesions from 41 patients were used to evaluate the performance of the system. In the proposed CADe system, the fuzzy c-means clustering method was applied to detect tumor candidates from these ABUS images. Subsequently, the tumor likelihoods of these candidates could be estimated by a logistic linear regression model based on the intensity, morphology, location, and size features in the transverse, longitudinal, and coronal views. Finally, the multiview tumor likelihoods of the tumor candidates could be obtained from the estimated tumor likelihoods of the three views, and the tumor candidates with high multiview tumor likelihoods were regarded as the detected tumors in the proposed system. The sensitivities of the multiview tumor detection for selecting 5, 10, 20, and 30 tumor candidates with the largest multiview tumor likelihoods were 79.31%, 86.21%, 96.55%, and 98.28%, respectively.
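A hedged sketch of the likelihood-combination step, assuming scikit-learn and synthetic features: one logistic regression per view produces a tumor likelihood for each candidate, and the three per-view likelihoods are combined before ranking candidates. Averaging the per-view probabilities is an assumption for illustration, not necessarily the paper's exact combination rule, and the real feature extraction is omitted.

```python
# Hedged sketch: multiview tumor likelihood from per-view logistic
# regression models, using synthetic features in place of the paper's
# intensity / morphology / location / size descriptors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_candidates, n_features = 200, 4
views = ["transverse", "longitudinal", "coronal"]

# toy per-view feature matrices and shared labels (1 = true lesion)
X = {v: rng.normal(size=(n_candidates, n_features)) for v in views}
y = rng.integers(0, 2, size=n_candidates)

# one logistic regression model per view
models = {v: LogisticRegression().fit(X[v], y) for v in views}

# per-view tumor likelihoods, combined here by averaging across views
per_view = np.stack([models[v].predict_proba(X[v])[:, 1] for v in views])
multiview_likelihood = per_view.mean(axis=0)

# keep the top-k candidates with the largest multiview likelihoods
top_k = np.argsort(multiview_likelihood)[::-1][:10]
print("top candidate indices:", top_k)
```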
ISBN (print): 9798350353013; 9798350353006
We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration, which promotes the multi-view subject registration problem to a new calibration-free stage. This greatly alleviates the limitation in many practical applications. However, this is a very challenging problem, since its only input is several RGB images from different first-person views (FPVs), without the BEV image or the calibration of the FPVs, while the output is a unified plane aggregated from all views with the positions and orientations of both the subjects and cameras in a BEV. For this purpose, we propose an end-to-end framework that solves camera and subject registration together by taking advantage of their mutual dependence. Its main idea is as follows: i) creating a subject view-transform module (VTM) to project each pedestrian from an FPV to a virtual BEV, ii) deriving a multi-view geometry-based spatial alignment module (SAM) to estimate the relative camera pose in a unified BEV, iii) selecting and refining the subject and camera registration results within the unified BEV. We collect a new large-scale synthetic dataset with rich annotations for training and evaluation, and we also collect a real dataset for cross-domain evaluation. The experimental results show the remarkable effectiveness of our method. The code and proposed datasets are available at BEVSee.
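To make the spatial-alignment idea concrete, here is a minimal NumPy sketch that registers two cameras' virtual-BEV pedestrian layouts with a closed-form rigid 2D transform (the align_2d helper is hypothetical). The actual VTM and SAM are learned modules that do not require known point correspondences; this is only an illustration of recovering a relative camera pose from subject positions in a shared BEV.

```python
# Hedged sketch: least-squares rigid alignment of two per-view BEV
# pedestrian layouts, as a stand-in for the spatial alignment step.
# Assumes known point correspondences, unlike the paper's learned SAM.
import numpy as np


def align_2d(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation + translation mapping src onto dst (both (N, 2))."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    u, _, vt = np.linalg.svd(src_c.T @ dst_c)
    r = vt.T @ u.T
    if np.linalg.det(r) < 0:          # keep a proper rotation (no reflection)
        vt[-1] *= -1
        r = vt.T @ u.T
    t = dst.mean(0) - r @ src.mean(0)
    return r, t


if __name__ == "__main__":
    # toy pedestrian positions in camera A's virtual BEV
    bev_a = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 3.0], [1.0, 5.0]])
    theta = np.deg2rad(30.0)
    r_true = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
    bev_b = bev_a @ r_true.T + np.array([3.0, -2.0])  # same people, camera B's BEV
    r, t = align_2d(bev_a, bev_b)
    print(np.allclose(bev_a @ r.T + t, bev_b))        # True: relative pose recovered
```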