Visual localization and object detection both play important roles in various *** many indoor application scenarios where some detected objects have fixed positions, the two techniques work closely ***, few researchers consider these two tasks simultaneously, because of a lack of datasets and the little attention paid to such *** this paper, we explore multi-task network design and joint refinement of detection and *** address the dataset problem, we construct a medium indoor scene of an aviation exhibition hall through a semi-automatic *** dataset provides localization and detection information, and is publicly available at https://***/drive/folders/1U28zk0N4_I0db zkqyIAK1A15k9oUKOjI?usp=sharing for benchmarking localization and object detection *** this dataset, we have designed a multi-task network, JLDNet, based on YOLO v3, that outputs a target point cloud and object bounding *** dynamic environments, the detection branch also promotes the perception of *** includes image feature learning, point feature learning, feature fusion, detection construction, and point cloud ***, object-level bundle adjustment is used to further improve localization and detection *** test JLDNet and compare it to other methods, we have conducted experiments on 7 static scenes, our constructed dataset, and the dynamic TUM RGB-D and Bonn *** results show state-of-the-art accuracy for both tasks, and the benefit of jointly working on both tasks is demonstrated.
The field of three-dimensional reconstruction plays a pivotal role across diverse domains such as computer graphics, virtual reality, robotics, archaeology, and medical imaging. This paper presents a novel deep learni...
Multi-object tracking in autonomous driving is a non-linear *** better address the tracking problem, this paper leveraged an unscented Kalman filter to predict the object's *** the association stage, the Mahalanobis distance was employed as an affinity metric, and a Non-minimum Suppression method was designed for *** the detections fed into the tracker and continuous 'predicting-matching' steps, the states of each object at different time steps were described as their own continuous *** conducted extensive experiments to evaluate tracking accuracy on three challenging datasets (KITTI, nuScenes and Waymo). The experimental results demonstrated that our method effectively achieved multi-object tracking with satisfactory accuracy and real-time efficiency.
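As a rough illustration of how a Mahalanobis distance can serve as an affinity metric between predicted object states and new detections, here is a minimal sketch in plain Python. It is not the paper's method: the 2D measurement model, the function names `mahalanobis_2d` and `greedy_associate`, the gate value of 3.0, and the greedy matching step (used here in place of the paper's Non-minimum Suppression method, whose details the abstract does not give) are all assumptions made for the example.

```python
import math


def mahalanobis_2d(z, z_pred, S):
    """Mahalanobis distance between a 2D measurement z and a predicted
    measurement z_pred, given a 2x2 innovation covariance S (hypothetical helper)."""
    dx = z[0] - z_pred[0]
    dy = z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Quadratic form r^T S^-1 r, using the closed-form inverse of a 2x2 matrix.
    q = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(q)


def greedy_associate(detections, predictions, covariances, gate=3.0):
    """Match each predicted track to at most one detection, closest pairs first.
    Greedy matching is an illustrative stand-in, not the paper's matching rule."""
    candidates = []
    for i, (pred, S) in enumerate(zip(predictions, covariances)):
        for j, det in enumerate(detections):
            dist = mahalanobis_2d(det, pred, S)
            if dist <= gate:  # discard pairs outside the gate
                candidates.append((dist, i, j))
    candidates.sort()
    matches, used_tracks, used_dets = {}, set(), set()
    for dist, i, j in candidates:
        if i in used_tracks or j in used_dets:
            continue
        matches[i] = j
        used_tracks.add(i)
        used_dets.add(j)
    return matches


identity = ((1.0, 0.0), (0.0, 1.0))
predictions = [(0.0, 0.0), (10.0, 10.0)]
detections = [(10.5, 10.0), (0.2, 0.1), (50.0, 50.0)]
matches = greedy_associate(detections, predictions, [identity, identity])
```

In this toy setup, track 0 is matched to detection 1 and track 1 to detection 0, while the far-away detection at (50, 50) falls outside the gate and stays unmatched. Scaling the residual by the covariance is what lets an uncertain track accept detections that a plain Euclidean threshold would reject.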
The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. Particularly, we present a simple but effective proposal-free framework, namely video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query ***, we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.
Traditional puppet manipulation systems often require human operators to be physically present in tight and cramped locations. This leads to challenges in the positioning and effective operation of puppets, particular...
The process of classifying RGB images is a basic and noxious activity in computer vision with an array of applications in face recognition, traffic analysis, and security protocols. An important part of this mission e...
Precise polyp segmentation is vital for the early diagnosis and prevention of colorectal cancer (CRC) in clinical ***, due to scale variation and blurry polyp boundaries, it is still a challenging task to achieve satisfactory segmentation performance with different scales and *** this study, we present a novel edge-aware feature aggregation network (EFA-Net) for polyp segmentation, which can fully make use of cross-level and multi-scale features to enhance the performance of polyp ***, we first present an edge-aware guidance module (EGM) to combine the low-level features with the high-level features to learn an edge-enhanced feature, which is incorporated into each decoder unit using a layer-by-layer ***, a scale-aware convolution module (SCM) is proposed to learn scale-aware features by using dilated convolutions with different ratios, in order to effectively deal with scale ***, a cross-level fusion module (CFM) is proposed to effectively integrate the cross-level features, which can exploit the local and global contextual ***, the outputs of CFMs are adaptively weighted by using the learned edge-aware feature, which are then used to produce multiple side-out segmentation *** results on five widely adopted colonoscopy datasets show that our EFA-Net outperforms state-of-the-art polyp segmentation methods in terms of generalization and *** implementation code and segmentation maps will be publicly available at https://***/taozh2017/EFANet.
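To make the idea of dilated convolutions with different ratios concrete, the following is a small 1D sketch in plain Python. It is only an illustration of the general technique, not the SCM from the paper, which operates on 2D feature maps with learned weights; the function names `dilated_conv1d` and `multi_scale_features`, the fixed kernel, and the dilation rates (1, 2, 3) are assumptions chosen for the example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (cross-correlation form, no padding).
    Kernel taps are spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    out = []
    for start in range(len(signal) - span):
        acc = 0
        for k, weight in enumerate(kernel):
            acc += weight * signal[start + k * dilation]
        out.append(acc)
    return out


def multi_scale_features(signal, kernel, dilations=(1, 2, 3)):
    """Apply the same kernel at several dilation rates, one output per rate."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # simple difference kernel, fixed rather than learned
features = multi_scale_features(signal, kernel)
```

With the same three-tap kernel, a dilation of 1 compares samples two positions apart, while a dilation of 3 compares samples six positions apart, so each rate responds to structure at a different scale without adding parameters.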
For permanent magnet synchronous machines (PMSMs), accurate inductance is critical for control design and condition *** to magnetic saturation, existing methods require a nonlinear saturation model and measurements from multiple load/current conditions, and the estimation relies on the accuracy of the saturation model and other machine parameters in the *** harmonic produced by harmonic currents is inductance-dependent, and thus this paper explores the use of the magnitude and phase angle of the speed harmonic for accurate inductance *** estimation models are built based on either the magnitude or phase angle, and the inductances can be obtained from the d-axis voltage and the magnitude or phase angle, in which the filter influence in harmonic extraction is considered to ensure the estimation *** inductances can be estimated from the measurements under one load condition, which is free of saturation ***, the inductance estimation is robust to the change of other machine *** proposed approach can effectively improve estimation accuracy, especially under the condition with low current *** and comparisons are conducted on a test PMSM to validate the proposed approach.
We propose the lattice design that allows multiple topologically protected edge modes. The scattering between these modes, which is linear, energy preserving, and robust against local disorders, is discussed in terms ...