Local feature detection and description play an important role in many computer vision tasks; they are designed to detect and describe keypoints in any scene and for any downstream task. Data-driven local feature learning methods rely on pixel-level correspondence for training. However, most existing approaches ignore the semantic information that humans rely on to describe image pixels. In addition, generic keypoint detection and description cannot be enhanced simply by using conventional semantic segmentation models, because these models recognize only a limited number of coarse-grained object classes. In this paper, we propose SAMFeat, which introduces SAM (Segment Anything Model), a foundation model trained on 11 million images, as a teacher to guide local feature learning. SAMFeat learns the additional semantic information provided by SAM and thus achieves higher performance even with limited training samples. To do so, first, we construct an auxiliary task of Attention-weighted Semantic Relation Distillation (ASRD), which adaptively distills feature relations carrying the category-agnostic semantic information learned by the SAM encoder into a local feature learning network, to improve local feature description through semantic discrimination. Second, we develop a technique called Weakly Supervised Contrastive Learning Based on Semantic Grouping (WSC), which uses semantic groupings derived from SAM as weakly supervised signals to optimize the metric space of local descriptors. Third, we design an Edge Attention Guidance (EAG) module to further improve the accuracy of local feature detection and description by prompting the network to pay more attention to the edge regions identified by SAM. SAMFeat's performance on various tasks, such as image matching on HPatches and long-term visual localization on Aachen Day-Night, showcases its superiority over previous local features. The released code is available at https://***/vignywang/SAMFeat
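The idea behind WSC, using semantic groupings as a weak signal to shape the descriptor metric space, can be illustrated with a minimal sketch. The margin-based pairwise loss below is a generic stand-in and not the loss used in SAMFeat; the function names, the `margin` value, and the assumption that each descriptor already carries a group id derived from a SAM mask are all hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a descriptor to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def group_contrastive_loss(descriptors, group_ids, margin=0.5):
    """Hypothetical pairwise loss driven by semantic group ids.

    Pairs in the same group are penalized by their squared distance
    (pulled together); pairs in different groups are penalized only
    when they are closer than `margin` (pushed apart).
    """
    unit = [l2_normalize(d) for d in descriptors]
    total, count = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(unit[i], unit[j])))
            if group_ids[i] == group_ids[j]:
                total += dist ** 2
            else:
                total += max(0.0, margin - dist) ** 2
            count += 1
    return total / count if count else 0.0

# Toy descriptors: the first two are identical, the third is orthogonal.
descs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
consistent = group_contrastive_loss(descs, [0, 0, 1])    # grouping matches geometry
inconsistent = group_contrastive_loss(descs, [0, 1, 0])  # grouping contradicts it
```

In this toy case the loss is zero when the grouping agrees with the descriptor geometry and positive when it does not, which is the behavior a grouping-based weak supervision signal is meant to encourage.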
This paper introduces hybrid automatic repeat request with incremental redundancy (HARQ-IR) to boost the reliability of short packet communications. The finite blocklength information theory and correlated decoding ev...
Perceiving physical contact is critical for intelligent robots to interact safely in dynamic, unstructured environments. As physical contact can occur at any location, a well-performing tactile sensing system should be deployable over a large area of the robot's surface. Some researchers have implemented large-area tactile sensors using sensing arrays, but deploying many sensing elements is challenging. Electrical resistance tomography (ERT) has recently been introduced into tactile sensing to overcome some of the limitations of conventional tactile sensing arrays, and good results have been achieved for some robotic applications. However, a particular challenge is its low spatial resolution. Although various attempts have been made to improve the performance of ERT-based tactile sensors, the intrinsic resolution issue remains unsolved. In this paper, we propose a novel adaptive optimal drive strategy for efficient ERT-based large-area tactile sensing in robotic applications, which adaptively selects the current injection and voltage measurement pattern best suited to the tactile stimulus. In particular, regions of tactile contact are first detected and localized by a base scanning pattern that requires only a few measurements. Based on the detected region, the adaptive strategy then selects the current injection and voltage measurement pattern that maximizes the current density in that region, improving sensing performance. The proposed strategy is evaluated comprehensively in simulation and experiments. The results show that it effectively improves both spatial and temporal resolution.
ISBN: 9781728171685 (digital)
ISBN: 9781728171692 (print)
Video prediction is a pixel-wise dense prediction task that infers future frames from past frames. Missing appearance details and motion blur remain two major problems for current models, leading to image distortion and temporal inconsistency. We point out the necessity of multi-frequency analysis for dealing with these two problems. Inspired by the frequency band decomposition characteristic of the Human Visual System (HVS), we propose a video prediction network based on multi-level wavelet analysis that handles spatial and temporal information in a unified manner. Specifically, a multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and preserve fine details. In addition, a multi-level temporal discrete wavelet transform, which operates along the time axis, decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model achieves significant improvements in fidelity and temporal consistency over state-of-the-art methods. Source code and videos are available at https://***/Bei-Jin/STMFANet.
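The two decompositions described above can be illustrated with a minimal one-level sketch. It assumes the Haar wavelet for brevity; the abstract does not say which wavelet the network uses, and the function names and toy frames here are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor

def haar_1d(signal):
    """One-level Haar transform of an even-length sequence.
    Returns (low, high): scaled pairwise sums and differences."""
    low = [(signal[i] + signal[i + 1]) * S for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * S for i in range(0, len(signal), 2)]
    return low, high

def _transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def _filter_rows(matrix):
    lows, highs = [], []
    for row in matrix:
        lo, hi = haar_1d(row)
        lows.append(lo)
        highs.append(hi)
    return lows, highs

def haar_2d(frame):
    """One-level spatial DWT of a frame (list of rows, even height and width).
    Returns (ll, lh, hl, hh), where the first letter is the filter applied
    along each row and the second the filter applied along each column."""
    row_low, row_high = _filter_rows(frame)
    # Filtering the transposed matrix row by row filters along columns.
    ll, lh = (_transpose(b) for b in _filter_rows(_transpose(row_low)))
    hl, hh = (_transpose(b) for b in _filter_rows(_transpose(row_high)))
    return ll, lh, hl, hh

def haar_temporal(frames):
    """One-level temporal DWT: Haar along the time axis, pixel by pixel.
    `frames` is a list with an even number of equally sized frames."""
    low, high = [], []
    for t in range(0, len(frames), 2):
        a, b = frames[t], frames[t + 1]
        low.append([[(x + y) * S for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])
        high.append([[(x - y) * S for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])
    return low, high

flat = [[1.0] * 4 for _ in range(4)]           # a frame with no spatial detail
ll, lh, hl, hh = haar_2d(flat)                 # detail bands are all zero
t_low, t_high = haar_temporal([flat, flat])    # static scene: zero high band
```

A multi-level decomposition would reapply `haar_2d` to the `ll` band, or `haar_temporal` to the low temporal band, so that each level isolates a coarser frequency range.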
Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current predictive models, which lead to ...
This paper considers the problem of decentralized optimization on compact submanifolds, where a finite sum of smooth (possibly non-convex) local functions is minimized by n agents forming an undirected and connected g...
A railroad technical operation station is an extensive and complex operation system, and a variety of random disturbances further increase the complexity of its command and dispatching. To raise the level of intelligence and integration of railroad technical operation stations and to strengthen their ability to manage the dispatching of complex operation plans, this paper designs a parallel dispatching system (PDS) architecture based on the ACP approach and describes its key technologies, including the artificial dispatching system (ADS), computational experiments, and parallel execution. Built on the ADS, computational experiments can optimize the various types of technical operations and the overall operational process, enabling parallel execution between the actual dispatching system and the ADS. PDS can provide more efficient and intelligent command and dispatching solutions for railroad technical operation stations and promote the development of the railroad transportation industry toward higher quality and efficiency.
Authors:
Zhang, Jian; Tu, Bing; Liu, Bo; Li, Jun; Plaza, Antonio
Nanjing University of Information Science and Technology, Institute of Optics and Electronics, State Key Laboratory Cultivation Base of Atmospheric Optoelectronic Detection and Information Fusion, Jiangsu International Joint Laboratory on Meteorological Photonics and Optoelectronic Detection, Jiangsu Engineering Research Center for Intelligent Optoelectronic Sensing Technology of Atmosphere, Nanjing 210044, China
Faculty of Computer Science, Wuhan 430074, China
University of Extremadura, Hyperspectral Computing Laboratory, Department of Technology of Computers and Communications, Escuela Politécnica, Cáceres 10003, Spain
Graph convolutional networks (GCNs) have garnered significant attention in hyperspectral image (HSI) classification due to their ability to model non-Euclidean structured data. Compared with convolutional neural network ...
Graph Structure Learning (GSL) has recently garnered considerable attention due to its ability to optimize both the parameters of Graph Neural Networks (GNNs) and the computation graph structure simultaneously. Despit...
On public roads, autonomous vehicles (AVs) face the challenge of frequent interactions with human-driven vehicles (HDVs), which exhibit uncertain driving behavior due to varying social characteristics among humans. To effectively assess the risks in the vicinity of AVs in socially interactive traffic scenarios and achieve safe autonomous driving, this article proposes a social-suitable and safety-sensitive trajectory planning (S$^{4}$TP) framework. Specifically, S$^{4}$TP integrates the Social-Aware Trajectory Prediction (SATP) and Social-Aware Driving Risk Field (SADRF) modules. SATP utilizes Transformers to effectively encode the driving scene and incorporates an AV's planned trajectory during the prediction decoding process. SADRF assesses the expected surrounding risk degrees during AV-HDV interactions, each with different social characteristics, visualized as two-dimensional heat maps centered on the AV. SADRF models the driving intentions of the surrounding HDVs and predicts trajectories based on the representation of vehicular interactions. S$^{4}$TP employs an optimization-based approach for motion planning, using the predicted HDV trajectories as input. With the integration of SADRF, S$^{4}$TP performs real-time online optimization of the AV's planned trajectory within low-risk regions, thus improving the safety and interpretability of the planned trajectory. We have conducted comprehensive tests of the proposed method using the SMARTS simulator. Experimental results in complex social scenarios, such as unprotected left-turn intersections, merging, cruising, and overtaking, validate the superiority of the proposed S$^{4}$TP in terms of safety and rationality. S$^{4}$TP achieves a pass rate of 100% across all scenarios, surpassing the current state-of-the-art methods Fanta (98.25%) and Predictive-Decision (94.75%).
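To make the idea of a risk field rendered as an AV-centered heat map more concrete, the sketch below builds a toy field and uses it to rank candidate trajectories. It is only an illustration under strong assumptions: the Gaussian kernel, the per-vehicle `weights` standing in for social characteristics, the grid parameters, and all function names are hypothetical and are not taken from SADRF or S$^{4}$TP.

```python
import math

def risk_at(point, hdv_positions, weights, sigma=2.0):
    """Toy risk at `point`: one Gaussian bump per predicted HDV position,
    scaled by a hypothetical per-vehicle weight (e.g. higher for a more
    aggressive driver). Coordinates are relative to the AV at (0, 0)."""
    x, y = point
    return sum(
        w * math.exp(-((x - hx) ** 2 + (y - hy) ** 2) / (2.0 * sigma ** 2))
        for (hx, hy), w in zip(hdv_positions, weights)
    )

def risk_heatmap(hdv_positions, weights, half_extent=10.0, step=1.0, sigma=2.0):
    """Sample the risk on a square grid centered on the AV.
    Returns a list of rows; row index follows y, column index follows x."""
    n = int(2 * half_extent / step) + 1
    coords = [-half_extent + i * step for i in range(n)]
    return [[risk_at((x, y), hdv_positions, weights, sigma) for x in coords]
            for y in coords]

def pick_lowest_risk(candidates, hdv_positions, weights, sigma=2.0):
    """Choose the candidate trajectory with the smallest accumulated risk."""
    def cost(trajectory):
        return sum(risk_at(p, hdv_positions, weights, sigma) for p in trajectory)
    return min(candidates, key=cost)

# One HDV predicted 5 m ahead of the AV.
hdvs, weights = [(5.0, 0.0)], [1.0]
heatmap = risk_heatmap(hdvs, weights)
straight = [(1.0, 0.0), (3.0, 0.0), (5.0, 0.0)]
swerve = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
chosen = pick_lowest_risk([straight, swerve], hdvs, weights)
```

Here the straight path runs through the peak of the field, so the swerving candidate is selected. An optimization-based planner would minimize such a cost continuously rather than choosing among a few fixed candidates.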