Multi-object tracking in autonomous driving is a non-linear *** better address the tracking problem, this paper leveraged an unscented Kalman filter to predict the object's *** the association stage, the Mahalanobis distance was employed as an affinity metric, and a Non-minimum Suppression method was designed for *** the detections fed into the tracker and continuous 'predicting-matching' steps, the states of each object at different time steps were described as their own continuous *** conducted extensive experiments to evaluate tracking accuracy on three challenging datasets (KITTI, nuScenes and Waymo). The experimental results demonstrated that our method effectively achieved multi-object tracking with satisfactory accuracy and real-time efficiency.
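To make the association step concrete, the following is a minimal sketch of using the Mahalanobis distance as an affinity metric between predicted object positions and new detections. It is not the paper's implementation: the 2-D state, the shared covariance, the gate threshold, and the greedy matching in the hypothetical `associate` function are all assumptions made for illustration, and the unscented Kalman filter prediction itself is not shown.

```python
import math


def mahalanobis_2d(detection, predicted, cov):
    """Mahalanobis distance between a 2-D detection and a predicted position.

    cov is a 2x2 covariance [[a, b], [c, d]] and must be invertible.
    """
    dx = detection[0] - predicted[0]
    dy = detection[1] - predicted[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance must be invertible")
    # Inverse of a 2x2 matrix: (1 / det) * [[d, -b], [-c, a]]
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    squared = dx * (ia * dx + ib * dy) + dy * (ic * dx + id_ * dy)
    return math.sqrt(squared)


def associate(detections, predictions, cov, gate):
    """Greedy matching (an assumption, not the paper's method).

    Pairs are visited in order of increasing distance; a pair is accepted
    if both sides are still unmatched and the distance is within the gate.
    Returns a list of (detection_index, prediction_index) tuples.
    """
    pairs = sorted(
        (mahalanobis_2d(det, pred, cov), i, j)
        for i, det in enumerate(detections)
        for j, pred in enumerate(predictions)
    )
    used_det, used_pred, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist > gate:
            break  # all remaining pairs are even farther apart
        if i not in used_det and j not in used_pred:
            matches.append((i, j))
            used_det.add(i)
            used_pred.add(j)
    return matches


# Illustrative covariance: uncertainty is larger along x than along y.
cov = [[4.0, 0.0], [0.0, 1.0]]
along_x = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), cov)
along_y = mahalanobis_2d((0.0, 2.0), (0.0, 0.0), cov)
matches = associate(
    [(2.0, 0.0), (10.0, 10.0)], [(0.0, 0.0), (10.0, 9.0)], cov, gate=3.0
)
```

The two offsets have the same Euclidean length, but the offset along the more uncertain x axis yields the smaller Mahalanobis distance, which is the reason for preferring this metric over a plain Euclidean one when the predicted state carries a covariance.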
In multi-label learning (MLL), it is extremely challenging to accurately annotate every appearing object due to expensive costs and limited knowledge. When facing such a challenge, a more practical and cheaper alternative is single positive multi-label learning (SPMLL), where only one positive label needs to be provided per sample. Existing SPMLL methods usually treat unknown labels as negatives, which inevitably introduces false negatives as noisy labels. More seriously, binary cross entropy (BCE) loss is often used for training, which is notoriously not robust to noisy labels. To mitigate this issue, we customize an objective function for SPMLL by pushing only one pair of labels apart each time to suppress the domination of negative labels, which is the main culprit of fitting noisy labels in SPMLL. To further combat such noisy labels, we explore the high-rankness of the label matrix, which can also push apart different labels. By directly extending from SPMLL to MLL with full labels, a unified loss applicable to both settings is derived. As a byproduct, the proposed loss can alleviate the imbalance inherent in MLL. Experiments on real datasets demonstrate that the proposed loss not only performs more robustly to noisy labels for SPMLL but also works well for full labels. Besides, we empirically discover that high-rankness can mitigate the dramatic performance drop in SPMLL. Most surprisingly, even without any regularization or fine-tuned label correction, adopting our loss alone defeats state-of-the-art SPMLL methods on CUB, a dataset that severely lacks labels.
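The problem with the "assume unknown labels are negative" baseline can be seen in a small sketch. The code below is not the loss proposed in the abstract; it only illustrates the standard BCE baseline under that assumption, with a hypothetical function name and made-up probabilities.

```python
import math


def bce_assume_negative(probs, positive_index):
    """Mean BCE over labels for one sample.

    The single observed positive gets target 1; every unknown label is
    treated as a negative (target 0), which is the baseline assumption.
    """
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for k, p in enumerate(probs):
        target = 1.0 if k == positive_index else 0.0
        loss -= target * math.log(p + eps) + (1.0 - target) * math.log(1.0 - p + eps)
    return loss / len(probs)


# Suppose the true labels are {0, 2} but only label 0 was annotated.
good_model = [0.9, 0.1, 0.9]  # correctly confident about the unannotated label 2
bad_model = [0.9, 0.1, 0.1]   # misses label 2 entirely

loss_good = bce_assume_negative(good_model, positive_index=0)
loss_bad = bce_assume_negative(bad_model, positive_index=0)
```

Under this baseline the model that correctly detects the unannotated positive receives the larger loss, so training pushes it toward the false negative. This is the noisy-label effect the abstract describes.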
Distributed computing frameworks are the fundamental component of distributed computing *** provide an essential way to support the efficient processing of big data on clusters or *** size of big data increases at a pace that is faster than the increase in the big data processing capacity of ***, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks, which often require running complex analytical algorithms on extremely big data sets in *** performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limits, and limited analytical algorithms because many serial algorithms cannot be implemented in the MapReduce programming *** distributed computing frameworks need to be developed to conquer these *** this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data *** addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
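For readers unfamiliar with the programming model under discussion, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the task. It is a toy illustration, not any framework's API: the function names are hypothetical, and a real framework would distribute each phase across machines and move the intermediate pairs over disk and network.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (word, 1) pair per word in a text record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["big data", "Big clusters"])
```

Every computation has to be expressed as independent per-record maps followed by per-key reductions, which is one way to read the abstract's remark that many serial algorithms do not fit this model.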
Large language models (LLMs) have demonstrated remarkable effectiveness across various natural language processing (NLP) tasks, as evidenced by recent studies [1, 2]. However, these models often produce responses that conflict with reality due to the unreliable distribution of facts within their training data, which is particularly critical for applications requiring high credibility and accuracy [3].
We introduce Bongard-OpenWorld, a new benchmark for evaluating real-world few-shot reasoning for machine vision. It originates from the classical Bongard Problems (BPs): Given two sets of images (positive and negative...
The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. Particularly, we present a simple but effective proposal-free framework, namely video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query ***, we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.
Facial expression recognition (FER) is a critical area of research in face analysis. While 2D data has been extensively used, 3D data offers inherent advantages, such as increased resilience to illumination and pose v...
Despite the recent success of deep learning models in research settings, their application in sensitive domains remains limited because of their opaque decision-making processes. In response to this challenge, people have prop...
Multi-focus image fusion (MFIF) explores the positioning and reorganization of the focused parts from the input images. Focused and defocused parts have similar representations in color, contour and other appearance i...
Migration has been a universal phenomenon, which brings opportunities as well as challenges for global development. As the number of migrants (e.g., refugees) increases rapidly, a key challenge faced by each country i...