In the field of robotics, due to the complexity of real environments, place recognition using the 3D LiDAR is always a challenging problem. The spatialrelations of internal structures underlying the LiDAR data from d...
详细信息
In the field of robotics, due to the complexity of real environments, place recognition using the 3D LiDAR is always a challenging problem. The spatialrelations of internal structures underlying the LiDAR data from different places are distinguishable, which can be used to describe the environment. In this paper, we utilize the spatialrelations of internal structures and propose a two-level framework for 3D LiDAR place recognition based on the spatial relation graph (SRG). At first, the proposed framework segments the point cloud into multiple clusters, then the features of the clusters and the spatialrelation descriptors (SRDs) between the clusters are extracted, and the point cloud is represented by the SRG, which uses the clusters as the nodes and their spatialrelations as the edges. After that, we propose a two-level matching model in which two different models are fused for accurately and efficiently matching the SRGs, including the upper-level searching model (U-LSM) and lower-level matching model (L-LMM). In the U-LSM, an incremental bag-of-words model is used to search for candidate SRGs through the distribution of the SRDs in the SRG. In the L-LMM, we utilize the improved spectral method to calculate similarities between the current SRG and the candidates. The experimental results demonstrate that our framework achieves good precision, recall and viewpoint robustness on both public benchmarks and self-built campus dataset. (c) 2021 Elsevier Ltd. All rights reserved.
Video captioning is a significant challenging task in computer vision and natural language processing, aiming to automatically describe video content by natural language sentences. Comprehensive understanding of video...
详细信息
Video captioning is a significant challenging task in computer vision and natural language processing, aiming to automatically describe video content by natural language sentences. Comprehensive understanding of video is the key for accurate video captioning, which needs to not only capture the global content and salient objects in video, but also understand the spatio-temporal relations of objects, including their temporal trajectories and spatialrelationships. Thus, it is important for video captioning to capture the objects' relationships both within and across frames. Therefore, in this paper, we propose an object-aware spatio-temporal graph (OSTG) approach for video captioning. It constructs spatio-temporal graphs to depict objects with their relations, where the temporal graphs represent objects' inter-frame dynamics, and the spatialgraphs represent objects' intra-frame interactive relationships. The main novelties and advantages are: (1) Bidirectional temporal alignment: Bidirectional temporal graph is constructed along and reversely along the temporal order to perform bidirectional temporal alignment for objects across different frames, which provides complementary clues to capture the inter-frame temporal trajectories for each salient object. (2) graph based spatialrelation learning: spatial relation graph is constructed among objects in each frame by considering their relative spatial locations and semantic correlations, which is exploited to learn relation features that encode intra-frame relationships for salient objects. (3) Object-aware feature aggregation: Trainable VLAD (vector of locally aggregated descriptors) models are deployed to perform object-aware feature aggregation on objects' local features, which learn discriminative aggregated representations for better video captioning. A hierarchical attention mechanism is also developed to distinguish contributions of different object instances. Experiments on two widely-used datasets, MSR-VTT and
A spatial relation graph (SRG) and its partial matching method are proposed for online composite graphics representation and recognition. The SRG-based approach emphasizes three characteristics of online graphics reco...
详细信息
A constrained partial permutation strategy is proposed for matching spatial relation graph (SRG), which is used in our sketch input and recognition system Smart Sketchpad for representing the spatialrelationship amon...
详细信息
A constrained partial permutation strategy is proposed for matching spatial relation graph (SRG), which is used in our sketch input and recognition system Smart Sketchpad for representing the spatialrelationship among the components of a graphic object. Using two kinds of matching constraints dynamically generated in the matching process, the proposed approach can prune most improper mappings between SRGs during the matching process. According to our theoretical analysis in this paper, the time complexity of our approach is O(n 2) in the best case, and O(n!) in the worst case, which occurs infrequently. The spatial complexity is always O(n) for all cases. Implemented in Smart Sketchpad, our proposed strategy is of good performance.
暂无评论