Existing explainable and explicit visual reasoning methods only perform reasoning based on visual evidence but do not take into account knowledge beyond what is in the visual scene. To addresses the knowledge gap betw...
详细信息
ISBN:
(纸本)9781665445092
Existing explainable and explicit visual reasoning methods only perform reasoning based on visual evidence but do not take into account knowledge beyond what is in the visual scene. To addresses the knowledge gap between visual reasoning methods and the semantic complexity of real-world images, we present the first explicit visual reasoning method that incorporates external knowledge and models high-order relational attention for improved generalizability and explainability. Specifically, we propose a knowledge incorporation network that explicitly creates and includes new graph nodes for entities and predicates from external knowledge bases to enrich the semantics of the scene graph used in explicit reasoning. We then create a novel Graph-Relate module to perform high-order relational attention on the enriched scene graph. By explicitly introducing structured external knowledge and high-order relational attention, our method demonstrates significant generalizability and explainability over the state-of-the-art visual reasoning approaches on the GQA and VQAv2 datasets.
A growing number of commercial satellite companies provide easily accessible satellite imagery. Overhead imagery is used by numerous industries including agriculture, forestry, natural disaster analysis, and meteorolo...
详细信息
ISBN:
(纸本)9781665448994
A growing number of commercial satellite companies provide easily accessible satellite imagery. Overhead imagery is used by numerous industries including agriculture, forestry, natural disaster analysis, and meteorology. Satellite images, just as any other images, can be tampered with image manipulation tools. Manipulation detection methods created for images captured by "consumer cameras" tend to fail when used on satellite images due to the differences in image sensors, image acquisition, and processing. In this paper we propose an unsupervised technique that uses a vision Transformer to detect spliced areas within satellite images. We introduce a new dataset which includes manipulated satellite images that contain spliced objects. We show that our proposed approach performs better than existing unsupervised splicing detection techniques.
Although recent years have witnessed the great advances in stereo image super-resolution (SR), the beneficial information provided by binocular systems has not been fully used. Since stereo images are highly symmetric...
详细信息
ISBN:
(纸本)9781665448994
Although recent years have witnessed the great advances in stereo image super-resolution (SR), the beneficial information provided by binocular systems has not been fully used. Since stereo images are highly symmetric under epipolar constraint, in this paper, we improve the performance of stereo image SR by exploiting symmetry cues in stereo image pairs. Specifically, we propose a symmetric bi-directional parallax attention module (biPAM) and an inline occlusion handling scheme to effectively interact cross-view information. Then, we design a Siamese network equipped with a biPAM to super-resolve both sides of views in a highly symmetric manner. Finally, we design several illuminance-robust losses to enhance stereo consistency. Experiments on four public datasets demonstrate the superior performance of our method.
Over the earlier time, a category of machine learning, called deep learning, has attained significant achievements in several computervision tasks such as image classification, object detection, semantic segmentation...
详细信息
In this paper, we present a decomposition model for stereo matching to solve the problem of excessive growth in computational cost (time and memory cost) as the resolution increases. In order to reduce the huge cost o...
详细信息
ISBN:
(纸本)9781665445092
In this paper, we present a decomposition model for stereo matching to solve the problem of excessive growth in computational cost (time and memory cost) as the resolution increases. In order to reduce the huge cost of stereo matching at the original resolution, our model only runs dense matching at a very low resolution and uses sparse matching at different higher resolutions to recover the disparity of lost details scale-by-scale. After the decomposition of stereo matching, our model iteratively fuses the sparse and dense disparity maps from adjacent scales with an occlusion-aware mask. A refinement network is also applied to improving the fusion result. Compared with high-performance methods like PSMNet and GANet, our method achieves 10 - 100 x speed increase while obtaining comparable disparity estimation results.
Person activity recognition is now a research area in computervision. The various applications of this area include healthcare, sports and suspicious activity detection. This study focused on physiotherapy exercise a...
详细信息
Face recognition under occlusion presents a persistent challenge in computervision, primarily due to difficulties in capturing and effectively integrating visible and obscured facial features. This paper introduces a...
详细信息
Previous point cloud semantic segmentation networks use the same process to aggregate features from neighbors of the same category and different categories. However, the joint area between two objects usually only occ...
详细信息
ISBN:
(纸本)9781665445092
Previous point cloud semantic segmentation networks use the same process to aggregate features from neighbors of the same category and different categories. However, the joint area between two objects usually only occupies a small percentage in the whole scene. Thus the networks are well-trained for aggregating features from the same category point while not fully trained on aggregating points of different categories. To address this issue, this paper proposes to utilize different aggregation strategies between the same category and different categories. Specifically, it presents a customized module, termed as Category Guided Aggregation (CGA), where it first identifies whether the neighbors belong to the same category with the center point or not, and then handles the two types of neighbors with two carefully-designed modules. Our CGA presents a general network module and could be leveraged in any existing semantic segmentation network. Experiments on three different backbones demonstrate the effectiveness of our method.
Faces must be present in images for intelligent vision-based human computer interaction to work. Since many years ago, face recognition research has been ongoing and significant. This procedure entails facial tracking...
详细信息
The motorcycle industry has come a long way in automating the assembly processes, yet the automation of electrical wiring still presents a considerable challenge. Due to their complex and non-linear nature, wires prov...
详细信息
暂无评论