Medical Image Foundation Models have proven to be powerful tools for mask prediction across various datasets. However, accurately assessing the uncertainty of their predictions remains a significant challenge. To addr...
Vision-based sign language recognition is an extensively researched problem aimed at advancing communication between deaf and hearing individuals. Numerous Sign Language recognition (SLR) datasets have been introduc...
Medical image segmentation tasks are often intricate and require medical domain expertise. Recent advancements in deep learning have expedited these demanding tasks, transitioning from specialized models tailored to e...
This paper tackles the domain of multimodal prompting for visual recognition, specifically when dealing with missing modalities through multimodal Transformers. It presents two main contributions: (i) we introduce a n...
In this paper, we present an enhanced medical image segmentation approach leveraging the nnUNet framework, specifically tailored to integrate bounding box prompts for improved segmentation accuracy in resource-constra...
ISBN (digital): 9798331510831
ISBN (print): 9798331510848
Despite advancements in visual question answering, challenges persist with documents like financial reports, which are often laid out as complicated tables that involve complex numerical computations. An alternative approach, the pipeline-driven methodology, includes table recognition (TR) and table question-answering (TQA). Recent advancements in TR support this approach with better accuracy and interpretability. However, real-world tables are usually hierarchical. They pose additional challenges due to merged cells and indents, necessitating a specific approach for hierarchical relationship extraction. In this paper, we propose TRH2TQA (Table recognition with Hierarchical Relationships to Table Question-Answering) for business table images. It consists of three modules that operate on table images with question-answer pairs. First, the TR module extracts structure and textual content from table images into HTML format. Second, post-structure extraction is applied to identify header and hierarchical relationships using the predicted column span and bounding box. Finally, this information is combined with natural language questions in the TQA module to generate the answer through the decoder. In extensive experiments, TRH2TQA shows superior question-answering performance on the VQAonBD 2023 dataset.
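The post-structure extraction step can be illustrated with a minimal sketch. This is not the TRH2TQA implementation: the `Cell` type, the function names, and the `indent_tolerance` parameter are hypothetical, and the sketch assumes every header row covers the same number of columns once spans are expanded. It uses column spans to build a header path per column and the left edge of each bounding box to infer parent rows from indentation.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical cell record: text, predicted column span, bounding-box left edge."""
    text: str
    colspan: int = 1
    x_left: float = 0.0  # left edge of the text bounding box, in pixels


def column_header_paths(header_rows):
    """Expand spanning header cells so each column gets its full header path."""
    expanded = []
    for row in header_rows:
        cols = []
        for cell in row:
            cols.extend([cell.text] * cell.colspan)  # repeat a merged cell over its span
        expanded.append(cols)
    paths = []
    for c in range(len(expanded[-1])):
        path = []
        for cols in expanded:
            label = cols[c]
            # skip empty cells and labels repeated from the row above
            if label and (not path or path[-1] != label):
                path.append(label)
        paths.append(tuple(path))
    return paths


def row_parents(first_column, indent_tolerance=2.0):
    """Give each row label a parent, treating a larger indent as a deeper level."""
    parents = []
    stack = []  # (x_left, text) of the currently open ancestors
    for cell in first_column:
        # close every ancestor that is not clearly less indented than this row
        while stack and stack[-1][0] >= cell.x_left - indent_tolerance:
            stack.pop()
        parents.append(stack[-1][1] if stack else None)
        stack.append((cell.x_left, cell.text))
    return parents
```

For a two-row header in which "2022" and "2021" each span two quarter columns, `column_header_paths` yields paths such as `("2022", "Q1")`, which could then be serialized alongside the question for a TQA model.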
ISBN (print): 9798350301298
We show how shadows can be efficiently generated in differentiable rendering of triangle meshes. Our central observation is that pre-filtered shadow mapping, a technique for approximating shadows based on rendering from the perspective of a light, can be combined with existing differentiable rasterizers to yield differentiable visibility information. We demonstrate on several inverse graphics problems that differentiable shadow maps are orders of magnitude faster than differentiable light transport simulation with similar accuracy, while differentiable rasterization without shadows often fails to converge.
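To see why pre-filtering makes visibility smooth, consider the following minimal sketch. It assumes variance shadow mapping as the pre-filtered technique, which the abstract does not specify, and it works on a one-dimensional depth map in plain Python rather than inside a rasterizer. The function names and the `radius` and `min_variance` parameters are hypothetical.

```python
def box_filter(values, radius):
    """Average each entry with its neighbours inside a clamped window."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius):i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def vsm_visibility(depth_map, texel, receiver_depth, radius=1, min_variance=1e-4):
    """Soft visibility of a receiver from filtered depth moments.

    depth_map holds, per texel, the depth of the closest occluder as seen
    from the light. The first two depth moments are filtered, and the
    one-sided Chebyshev inequality bounds the fraction of the filter
    footprint that does not occlude the receiver.
    """
    mean = box_filter(depth_map, radius)[texel]
    second_moment = box_filter([d * d for d in depth_map], radius)[texel]
    variance = max(second_moment - mean * mean, min_variance)  # guard against zero
    if receiver_depth <= mean:
        return 1.0  # receiver is in front of the average occluder: fully lit
    diff = receiver_depth - mean
    return variance / (variance + diff * diff)
```

Away from the lit branch, the result is a smooth function of the filtered moments and of the receiver depth, which is the property that lets gradients flow through the shadow term; a hard depth comparison would give a step function instead.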
ISBN (print): 9798350301298
In this paper, we examine gradients of logits of image classification CNNs with respect to input pixel values. We observe that these fluctuate considerably with training randomness, such as the random initialization of the networks. We extend our study to gradients of intermediate layers, obtained via GradCAM, as well as popular network saliency estimators such as DeepLIFT, SHAP, LIME, Integrated Gradients, and SmoothGrad. While empirical noise levels vary, qualitatively different attributions to image features are still possible with all of these, which has implications for interpreting such attributions, in particular when seeking data-driven explanations of the phenomenon generating the data. Finally, we demonstrate that the observed artefacts can be removed by marginalization over the initialization distribution by simple stochastic integration.
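Marginalization by stochastic integration can be read as a Monte Carlo average of attribution maps over independently initialized training runs. The sketch below illustrates only that averaging step: `marginalized_attribution` and `toy_attribution` are hypothetical names, and the toy function replaces actual training and gradient computation with a fixed signal plus seed-dependent noise.

```python
import random


def marginalized_attribution(attribution_for_seed, seeds):
    """Monte Carlo estimate of the attribution map averaged over initializations."""
    maps = [attribution_for_seed(seed) for seed in seeds]
    return [sum(values) / len(maps) for values in zip(*maps)]


def toy_attribution(seed):
    """Hypothetical stand-in for 'train with this seed, then take input gradients'."""
    rng = random.Random(seed)
    signal = [0.0, 1.0, 0.0, -1.0]  # the seed-independent part of the attribution
    return [s + rng.gauss(0.0, 0.5) for s in signal]


# A single run is noisy; the average over many seeds approaches the signal.
averaged = marginalized_attribution(toy_attribution, range(2000))
```

In a real setting each call would train a network from a fresh initialization, so the number of seeds trades compute for lower variance in the averaged map.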
ISBN (print): 9798350301298
In this paper, we identify pattern imbalance from several aspects, and further develop a new training scheme to avert pattern preference as well as spurious correlation. In contrast to prior methods, which are mostly concerned with category or domain granularity and ignore the potential finer structure that exists in datasets, we give a new definition of seed category as an appropriate optimization unit to distinguish different patterns in the same category or domain. Extensive experiments on domain generalization datasets of diverse scales demonstrate the effectiveness of the proposed method.