Iterative inference approaches have shown promising success in multi-view depth estimation. However, these methods place excessive emphasis on universal inter-view correspondences while neglecting the correspondence ambiguity in low-texture regions and depth-discontinuous areas. They are therefore prone to producing inaccurate or even erroneous depth estimates, a problem further exacerbated by the errors that accumulate across the iterative pipeline, and so they provide unreliable information in many real-world scenarios. In this paper, we revisit this issue from the perspective of intra-view Contextual Hints and introduce a novel enhanced iterative approach, named EnIter. Concretely, at the beginning of each iteration, we present a Depth Intercept (DI) modulator that provides more accurate depth by aggregating neighbor uncertainty, correlation volume of reference and normal. This plug-and-play modulator is effective at intercepting erroneous depth estimates with implicit guidance from the universal correlation contextual hints, especially in challenging regions. Furthermore, at the end of each iteration, we refine the depth map with another plug-and-play modulator, termed Depth Refine (DR). It mines the latent structural knowledge of the reference Contextual Hints and establishes a one-way dependency from reference features to depth using local attention, yielding depth maps with fine detail. Extensive experiments demonstrate that our method not only achieves state-of-the-art performance over existing models but also exhibits remarkable universality in popular iterative pipelines, e.g., CasMVS, UCSNet, TransMVS, UniMVS.
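To make the idea of a one-way dependency from reference features to depth more concrete, the following is a minimal sketch of local attention in which the weights are computed from reference features only and are then applied to neighboring depth values. It is not the paper's DR modulator: the function name `local_attention_refine`, the dot-product similarity, the window `radius`, and the `temperature` parameter are all assumptions made for illustration.

```python
import math


def local_attention_refine(depth, feat, radius=1, temperature=1.0):
    """Refine a depth map with attention weights derived from reference features.

    depth: H x W nested list of floats (current depth estimate).
    feat:  H x W nested list of feature vectors from the reference view.
    Hypothetical illustration only; not the DR modulator from the paper.
    """
    h, w = len(depth), len(depth[0])
    refined = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            scores, neighbors = [], []
            # Visit the local window around (y, x), clipped at the image border.
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    # Similarity uses features alone, so the dependency is
                    # one-way: features influence depth, never the reverse.
                    dot = sum(a * b for a, b in zip(feat[y][x], feat[ny][nx]))
                    scores.append(dot / temperature)
                    neighbors.append(depth[ny][nx])
            # Numerically stable softmax over the window.
            peak = max(scores)
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            refined[y][x] = sum(wt * d for wt, d in zip(weights, neighbors)) / total
    return refined
```

In this sketch, pixels whose reference features are similar share depth information, while pixels with dissimilar features contribute almost nothing, which is one plausible way such a scheme could keep depth edges sharp.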