Domain adaptive semantic segmentation enables robust pixel-wise understanding in real-world driving scenes. Source-free domain adaptation, as a more practical technique, addresses the concerns of data privacy and stor...
ISBN (print): 9781665481106
Maneuvering-target tracking by an Unmanned Aerial Vehicle (UAV) in cluttered environments is challenging because the target's motion intention is unknown and the environment is complex. As the complexity of the environment increases, stable and secure target tracking is increasingly difficult to guarantee. To address this issue, this paper proposes a stable quadrotor tracking solution. The proposed solution contains two parts: target motion prediction and tracking path searching. The target motion prediction method predicts the future target motion from the obtained target observations while accounting for observation noise and prediction errors. The tracking path searching method uses a sampling-based search and exploits the homotopy of paths to ensure that the tracking path and the target position are in the same space. Finally, simulations, real-world experiments and statistical analysis verify the correctness and effectiveness of the proposed approach.
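The abstract does not say which predictor is used, so the following is only a minimal sketch of the general idea of predicting future target motion from noisy observations. It swaps in a plain least-squares constant-velocity fit as a stand-in; the function names `fit_constant_velocity` and `predict` are hypothetical and not taken from the paper.

```python
def fit_constant_velocity(times, positions):
    """Least-squares fit of p(t) = p0 + v * t along one axis.

    Fitting over several samples averages out observation noise.
    Requires at least two distinct timestamps.
    """
    n = len(times)
    t_mean = sum(times) / n
    p_mean = sum(positions) / n
    var_t = sum((t - t_mean) ** 2 for t in times)
    cov_tp = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, positions))
    v = cov_tp / var_t
    p0 = p_mean - v * t_mean
    return p0, v


def predict(times, observations, horizon):
    """Extrapolate a 2-D target track `horizon` seconds past the last sample."""
    t_future = times[-1] + horizon
    result = []
    for axis in range(2):
        p0, v = fit_constant_velocity(times, [o[axis] for o in observations])
        result.append(p0 + v * t_future)
    return tuple(result)


if __name__ == "__main__":
    ts = [0.0, 1.0, 2.0, 3.0]
    obs = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
    print(predict(ts, obs, horizon=2.0))
```

A real tracker would also need an uncertainty estimate around the prediction, which this sketch leaves out.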
Multimodal named entity recognition (MNER) and relation extraction (MRE) are key in social media analysis but face challenges like inefficient visual processing and non-optimal modality interaction. (1) Heavy visual embedding: the process of visual embedding is expensive in both time and computation due to the prerequisite extraction of explicit visual cues from the original image before input into the multimodal ***, these approaches cannot achieve efficient online reasoning; (2) suboptimal interaction handling: the prevalent method of managing interaction between different modalities typically relies on the alternation of self-attention and cross-attention mechanisms or excessive dependence on the gating *** explicit modeling method may fail to capture some nuanced relations between image and text, ultimately undermining the model's capability to extract optimal *** address these challenges, we introduce Implicit Modality Mining (IMM), a novel end-to-end framework for fine-grained image-text correlation without heavy visual *** uses an Implicit Semantic Alignment module with a Transformer for cross-modal clues and an Insert-Activation module to effectively utilize these *** approach achieves state-of-the-art performance on three datasets.
ISBN (digital): 9798350340266
ISBN (print): 9798350340273
In server board assembly tasks, vision-based robot assembly schemes perform poorly because of the small installation gap and visual occlusion. Adding force sensors and force controllers can solve these problems and increase flexibility in the assembly process. In this paper, we propose a force control strategy applied to a server board assembly task. The method is divided into two main parts. In the first part, the zero offset of the force sensor and the load gravity are calibrated and compensated, so that the external force on the load is accurately obtained. In the second part, an admittance controller is designed to achieve compliant behavior between the robot and the environment. Finally, board insertion is verified experimentally on the experimental platform. Experimental results verify the effectiveness and practicability of the proposed method.
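The two parts described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a 3-axis force reading, a known gravity direction in the sensor frame, and a single-axis admittance law of the textbook form M*a + D*v + K*x = f_ext. The names `compensate` and `Admittance1D`, and all gains, are hypothetical.

```python
from dataclasses import dataclass


def compensate(raw_force, zero_offset, load_weight, gravity_dir_in_sensor):
    """Remove sensor bias and load gravity from a raw 3-axis force reading.

    gravity_dir_in_sensor is the unit gravity direction expressed in the
    sensor frame (assumed input; in practice it follows from the robot pose).
    load_weight is the load's weight in newtons.
    """
    return [
        f - b - load_weight * g
        for f, b, g in zip(raw_force, zero_offset, gravity_dir_in_sensor)
    ]


@dataclass
class Admittance1D:
    """Single-axis admittance law: M*a + D*v + K*x = f_ext."""
    mass: float
    damping: float
    stiffness: float
    x: float = 0.0  # position offset from the reference trajectory [m]
    v: float = 0.0  # velocity of that offset [m/s]

    def step(self, f_ext, dt):
        # Solve the virtual dynamics for acceleration, then integrate with
        # semi-implicit Euler (velocity first, then position).
        a = (f_ext - self.damping * self.v - self.stiffness * self.x) / self.mass
        self.v += a * dt
        self.x += self.v * dt
        return self.x


if __name__ == "__main__":
    ctrl = Admittance1D(mass=1.0, damping=20.0, stiffness=100.0)
    for _ in range(5000):
        offset = ctrl.step(f_ext=2.0, dt=0.001)
    print(offset)  # settles near f_ext / stiffness
```

Under a constant contact force the offset settles toward f_ext / K, which is the compliant yielding an admittance controller is meant to produce. A lower stiffness gives a softer response.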
Temporal information plays a pivotal role in Bird’s-Eye-View (BEV) driving scene understanding, which can alleviate the visual information sparsity. However, the indiscriminate temporal fusion method will cause the b...
Quantum machine learning is considered one of the current research fields with great potential. In recent years, Havlíček et al. [Nature 567, 209-212 (2019)] have proposed a quantum machine learning algorithm wit...
Online High-Definition (HD) maps have emerged as the preferred option for autonomous driving, overshadowing the counterpart offline HD maps due to flexible update capability and lower maintenance costs. However, contemporary online HD map models embed parameters of visual sensors into training, resulting in a significant decrease in generalization performance when applied to visual sensors with different parameters. Inspired by the inherent potential of Inverse Perspective Mapping (IPM), where camera parameters are decoupled from the training process, we have designed a universal map generation framework, GenMapping. The framework is established with a triadic synergy architecture, including principal and dual auxiliary branches. When faced with a coarse road image with local distortion translated via IPM, the principal branch learns robust global features under the state space models. The two auxiliary branches are a dense perspective branch and a sparse prior branch. The former exploits the correlation information between static and moving objects, whereas the latter introduces the prior knowledge of OpenStreetMap (OSM). The triple-enhanced merging module is crafted to synergistically integrate the unique spatial features from all three branches. To further improve generalization capabilities, a Cross-View Map Learning (CVML) scheme is leveraged to realize joint learning within the common space. Additionally, a Bidirectional Data Augmentation (BiDA) module is introduced to mitigate reliance on datasets concurrently. A thorough array of experimental results shows that the proposed model surpasses current state-of-the-art methods in both semantic mapping and vectorized mapping, while also maintaining a rapid inference speed. Moreover, in cross-dataset experiments, the generalization of semantic mapping is improved by 17.3% in mIoU, while vectorized mapping is improved by 12.1% in mAP. The source code will be publicly available at https://***/lynn-yu/GenMappin
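To make the role of Inverse Perspective Mapping concrete, here is a minimal sketch of the underlying geometry, not GenMapping's code. It assumes a pinhole camera, a perfectly flat ground plane, and a camera pitched downward by a known angle. The function name `pixel_to_ground` and its parameters are hypothetical.

```python
import math


def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, pitch_rad):
    """Project pixel (u, v) onto a flat ground plane.

    Camera frame: x right, y down, z forward. The camera sits cam_height
    metres above the ground and is pitched down by pitch_rad.
    Returns (lateral, forward) in metres, or None if the pixel's ray does
    not hit the ground (at or above the horizon).
    """
    # Back-project the pixel to a ray in the camera frame.
    dx = (u - cx) / fx
    dy = (v - cy) / fy
    dz = 1.0
    # Rotate the ray into a level frame (x right, y down, z forward).
    c, s = math.cos(pitch_rad), math.sin(pitch_rad)
    ray_x = dx
    ray_y = dy * c + dz * s
    ray_z = -dy * s + dz * c
    if ray_y <= 0.0:
        return None
    # Scale the ray so that it reaches the ground plane y = cam_height.
    t = cam_height / ray_y
    return (t * ray_x, t * ray_z)


if __name__ == "__main__":
    print(pixel_to_ground(50, 100, fx=100, fy=100, cx=50, cy=50,
                          cam_height=1.5, pitch_rad=0.0))
```

The camera intrinsics and pose enter only through this fixed geometric step. This is the sense in which IPM can keep sensor parameters out of the learned weights. The flat-ground assumption is also the source of the local distortion that the abstract mentions.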
Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile and wearable applications remains a challenge due to the corrupted imaging quality induced by optical aberrations. However, previous works o...
Simultaneous Localization And Mapping (SLAM) has become a crucial aspect in the fields of autonomous driving and robotics. One crucial component of visual SLAM is the Field-of-View (FoV) of the camera, as a larger FoV...
Robot teleoperation is attracting growing attention from researchers in many domains. Plenty of factors contribute to the good performance of a smart teleoperation system, and one crucial factor is that it provides an enviro...