Search Results - Inner Mongolia University Library

UKF-MOT: An unscented Kalman filter-based 3D multi-object tracker

CAAI Transactions on Intelligence Technology, 2024, Vol. 9, No. 4, pp. 1031-1041

Authors: Meng Liu, Jianwei Niu, Yu Liu. Affiliations: Collective Intelligence & Collaboration Laboratory, China North Artificial Intelligence and Innovation Research Institute, Beijing, China; China North Vehicle Research Institute, Beijing, China; State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China; State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China

Multi-object tracking in autonomous driving is a non-linear *** better address the tracking problem, this paper leveraged an unscented Kalman filter to predict the object's *** the association stage, the Mahalanobis distance was employed as an affinity metric, and a Non-minimum Suppression method was designed for *** the detections fed into the tracker and continuous 'predicting-matching' steps, the states of each object at different time steps were described as their own continuous *** conducted extensive experiments to evaluate tracking accuracy on three challenging datasets (KITTI, nuScenes and Waymo). The experimental results demonstrated that our method effectively achieved multi-object tracking with satisfactory accuracy and real-time efficiency.

Keywords: autonomous vehicle transportation
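
Illustrative note on the record above: the Mahalanobis distance named in the abstract as an affinity metric scales the gap between a detection and a predicted measurement by a covariance matrix. The following is a minimal Python sketch of that distance for the 2D case only. The function name, the 2x2 covariance layout, and the sample values are assumptions made for illustration; they are not taken from the paper and do not reproduce its tracker.

```python
import math


def mahalanobis_2d(z, z_pred, S):
    """Mahalanobis distance between a 2D detection z and a predicted
    measurement z_pred, given a 2x2 covariance S (hypothetical helper)."""
    dx = z[0] - z_pred[0]
    dy = z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix: (1 / det) * [[d, -b], [-c, a]]
    inv = ((d / det, -b / det), (-c / det, a / det))
    # Quadratic form (z - z_pred)^T S^{-1} (z - z_pred)
    q = (dx * (inv[0][0] * dx + inv[0][1] * dy)
         + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(q)


# With an identity covariance the result equals the Euclidean distance.
identity = ((1.0, 0.0), (0.0, 1.0))
distance = mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity)
```

A larger variance along one axis shrinks the distance along that axis, which is why such a metric can tolerate bigger prediction errors where the filter is less certain.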

Pushing one pair of labels apart each time in multi-label learning: from single positive to full labels

Science China (Information Sciences), 2025, No. 6, pp. 268-285

Authors: Xiang LI, Xinrui WANG, Songcan CHEN. Affiliations: MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology/College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics

In multi-label learning (MLL), it is extremely challenging to accurately annotate every appearing object due to expensive costs and limited knowledge. When facing such a challenge, a more practical and cheaper alternative should be single positive multi-label learning (SPMLL), where only one positive label needs to be provided per sample. Existing SPMLL methods usually assume unknown labels as negatives, which inevitably introduces false negatives as noisy labels. More seriously, binary cross entropy (BCE) loss is often used for training, which is notoriously not robust to noisy labels. To mitigate this issue, we customize an objective function for SPMLL by pushing only one pair of labels apart each time to suppress the domination of negative labels, which is the main culprit of fitting noisy labels in SPMLL. To further combat such noisy labels, we explore the high-rankness of the label matrix, which can also push apart different labels. By directly extending from SPMLL to MLL with full labels, a unified loss applicable to both settings is derived. As a byproduct, the proposed loss can alleviate the imbalance inherent in MLL. Experiments on real datasets demonstrate that the proposed loss not only performs more robustly to noisy labels for SPMLL but also works well for full labels. Besides, we empirically discover that high-rankness can mitigate the dramatic performance drop in SPMLL. Most surprisingly, even without any regularization or fine-tuned label correction, only adopting our loss defeats state-of-the-art SPMLL methods on CUB, a dataset that severely lacks labels.

Survey of Distributed Computing Frameworks for Supporting Big Data Analysis

Big Data Mining and Analytics, 2023, Vol. 6, No. 2, pp. 154-169

Authors: Xudong Sun, Yulin He, Dingming Wu, Joshua Zhexue Huang. Affiliations: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, China

Distributed computing frameworks are the fundamental component of distributed computing *** provide an essential way to support the efficient processing of big data on clusters or *** size of big data increases at a pace that is faster than the increase in the big data processing capacity of ***, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks which often require running complex analytical algorithms on extremely big data sets in *** performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limit, and limited analytical algorithms because many serial algorithms cannot be implemented in the MapReduce programming *** distributed computing frameworks need to be developed to conquer these *** this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data *** addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.

Keywords: distributed computing frameworks big data analysis approximate computing MapReduce computing model
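
Illustrative note on the record above: the MapReduce programming model that the abstract refers to can be shown with a single-process word count. This is a toy Python sketch of the map, shuffle, and reduce steps only. It is not a distributed framework and not the system surveyed or proposed by the authors, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values collected for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "Big frameworks"])
```

In a real cluster the map and reduce calls would run on different machines and the shuffle would move data over the network, which is where the I/O and communication costs mentioned in the abstract arise.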

Unveiling factuality and injecting knowledge for LLMs via reinforcement learning and data proportion

Science China (Information Sciences), 2024, Vol. 67, No. 10, pp. 389-390

Authors: Wenjun KE, Ziyu SHANG, Zhizhao LUO, Peng WANG, Yikai GUO, Qi LIU, Yuxuan CHEN. Affiliations: School of Computer Science and Engineering, Southeast University; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University); Beijing Institute of Technology, Zhuhai; Beijing Institute of Computer Technology and Application

Large language models (LLMs) have demonstrated remarkable effectiveness across various natural language processing (NLP) tasks, as evidenced by recent studies [1, 2]. However, these models often produce responses that conflict with reality due to the unreliable distribution of facts within their training data, which is particularly critical for applications requiring high credibility and accuracy [3].

Bongard-OpenWorld: Few-Shot Reasoning for Free-Form Visual Concepts in the Real World

12th International Conference on Learning Representations, ICLR 2024

Authors: Wu, Rujie; Ma, Xiaojian; Zhang, Zhenliang; Wang, Wei; Li, Qing; Zhu, Song-Chun; Wang, Yizhou. Affiliations: School of Computer Science, Peking University, China; National Key Laboratory of General Artificial Intelligence, BIGAI, China; School of Intelligence Science and Technology, Peking University, China; Institute for Artificial Intelligence, Peking University, China

We introduce Bongard-OpenWorld, a new benchmark for evaluating real-world few-shot reasoning for machine vision. It originates from the classical Bongard Problems (BPs): Given two sets of images (positive and negative), the model needs to identify the set that query images belong to by inducing the visual concepts, which are exclusively depicted by images from the positive set. Our benchmark inherits the few-shot concept induction of the original BPs while adding two novel layers of challenge: 1) open-world free-form concepts, as the visual concepts in Bongard-OpenWorld are unique compositions of terms from an open vocabulary, ranging from object categories to abstract visual attributes and commonsense factual knowledge; 2) real-world images, as opposed to the synthetic diagrams used by many counterparts. In our exploration, Bongard-OpenWorld already imposes a significant challenge to current few-shot reasoning algorithms. We further investigate to what extent the recently introduced Large Language Models (LLMs) and Vision-Language Models (VLMs) can solve our task, by directly probing VLMs, and combining VLMs and LLMs in an interactive reasoning scheme. We even conceived a neuro-symbolic reasoning approach that reconciles LLMs & VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems. However, none of these approaches manage to close the human-machine gap, as the best learner achieves 64% accuracy while human participants easily reach 91%. We hope Bongard-OpenWorld can help us better understand the limitations of current visual intelligence and facilitate future research on visual agents with stronger few-shot visual reasoning capabilities. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

Keywords: Computational linguistics

ViGT: proposal-free video grounding with a learnable token in the transformer

Science China (Information Sciences), 2023, Vol. 66, No. 10, pp. 196-212

Authors: Kun LI, Dan GUO, Meng WANG. Affiliations: School of Computer Science and Information Engineering, Hefei University of Technology; Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education; Intelligent Interconnected Systems Laboratory of Anhui Province; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. Particularly, we present a simple but effective proposal-free framework, namely video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query ***, we employed a sharing feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.

Keywords: video grounding temporal sentence grounding boundary regression token learning proposal-free

Progressive Self-supervised Representation Learning for 3D Facial Expression Recognition

18th IEEE International Joint Conference on Biometrics, IJCB 2024

Authors: Li, Hebeizi; Yang, Hongyu; Huang, Di. Affiliations: Beihang University, School of Computer Science and Engineering, Beijing, China; Beihang University, Institute of Artificial Intelligence, Beijing, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China

ISBN: (print) 9798350364132

Facial expression recognition (FER) is a critical area of research in face analysis. While 2D data has been extensively used, 3D data offers inherent advantages, such as increased resilience to illumination and pose variations. However, the limited size of current 3D FER datasets significantly constrains the performance of 3D FER methods. To overcome this challenge, we propose a novel self-supervised pre-training scheme by leveraging large-scale external 3D data, followed by fine-tuning on 3D FER datasets. Our approach starts with self-supervised learning on a large-scale 3D point cloud object dataset, specifically ShapeNet. We then move on to the FaceScape dataset, which is primarily used for morphable face prediction. To enhance robustness, we integrate synthetic data before fine-tuning on specific FER datasets. This multi-stage process allows the model to progressively learn 3D facial expression representations from coarse to fine. For this purpose, we utilize Point-MAE, a leading self-supervised model for representation learning. To enhance its ability on the FER task, we further incorporate facial priors in the masking and point sampling steps, leveraging the distinctive characteristics of facial data. Our method achieves state-of-the-art performance on both the BU-3DFE and Bosphorus datasets, matching or surpassing results achieved by other 2D+3D FER techniques. © 2024 IEEE.

Keywords: Self-supervised learning

Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes

11th International Conference on Learning Representations, ICLR 2023

Authors: Kenny, Eoin M.; Tucker, Mycal; Shah, Julie A. Affiliations: Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, United States

Despite the recent success of deep learning models in research settings, their application in sensitive domains remains limited because of their opaque decision-making processes. Taking on this challenge, people have proposed various eXplainable AI (XAI) techniques designed to calibrate trust and understandability of black-box models, with the vast majority of work focused on supervised learning. Here, we focus on making an "interpretable-by-design" deep reinforcement learning agent which is forced to use human-friendly prototypes in its decisions, thus making its reasoning process clear. Our proposed method, dubbed Prototype-Wrapper Network (PW-Net), wraps around any neural agent backbone, and results indicate that it does not worsen performance relative to black-box models. Most importantly, we found in a user study that PW-Nets supported better trust calibration and task performance relative to standard interpretability approaches and black-boxes. © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.

Keywords: Reinforcement learning

SMFuse: Two-Stage Structural Map Aware Network for Multi-focus Image Fusion

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Shen, Tianyu; Li, Hui; Cheng, Chunyang; Shen, Zhongwei; Song, Xiaoning. Affiliations: International Joint Laboratory on Artificial Intelligence of Jiangsu Province, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China

ISBN: (print) 9783031783111

Multi-focus image fusion (MFIF) explores the positioning and reorganization of the focused parts from the input images. Focused and defocused parts have similar representations in color, contour and other appearance information, and this redundant information degrades the fusion quality. Currently, most MFIF methods have not identified an effective way to remove redundant information before the fusion stage. Thus, in this paper, we introduce a structural map extraction strategy for multi-focus image fusion. Compared to the source image, the structural map reduces redundant information, and the clearer parts of the image retain more abundant structural features. Consequently, the differences between the focused and defocused parts become more pronounced based on the extracted structural map. Specifically, the proposed fusion method adopts a two-stage training strategy. Firstly, the structural map is extracted by the proposed structural map extraction network (SMENet) from the source images. Secondly, the structural map is applied to train the decision map generation network (DMGNet) to obtain the decision map, which is utilized to generate the final fusion image. Qualitative and quantitative experiments on three public datasets demonstrate the superiority of the proposed method compared with advanced image fusion algorithms. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Image fusion

Migrant Resettlement by Evolutionary Multiobjective Optimization

IEEE Transactions on Artificial Intelligence, 2025, Vol. 6, No. 1, pp. 51-65

Authors: Liu, Dan-Xuan; Gu, Yu-Ran; Qian, Chao; Mu, Xin; Tang, Ke. Affiliations: Nanjing University, National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing 210023, China; Peng Cheng Laboratory, Shenzhen 518000, China; Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen 518055, China

Migration has been a universal phenomenon, which brings opportunities as well as challenges for global development. As the number of migrants (e.g., refugees) increases rapidly, a key challenge faced by each country is the problem of migrant resettlement. This problem has attracted scientific research attention, from the perspective of maximizing the employment rate. Previous works mainly formulated migrant resettlement as an approximately submodular optimization problem subject to multiple matroid constraints and employed the greedy algorithm, whose performance, however, may be limited due to its greedy nature. In this article, we propose a new framework called migrant resettlement by evolutionary multiobjective optimization (MR-EMO), which reformulates migrant resettlement as a biobjective optimization problem that maximizes the expected number of employed migrants and minimizes the number of dispatched migrants simultaneously, and employs a multiobjective evolutionary algorithm (MOEA) to solve the biobjective problem. We implement MR-EMO using three MOEAs: the popular nondominated sorting genetic algorithm II (NSGA-II), MOEA based on decomposition (MOEA/D) as well as the theoretically grounded global simple evolutionary multiobjective optimizer (GSEMO). To further improve the performance of MR-EMO, we propose a specific MOEA, called GSEMO using matrix-swap mutation and repair mechanism (GSEMO-SR), which has a better ability to search for feasible solutions. We prove that MR-EMO using either GSEMO or GSEMO-SR can achieve better theoretical guarantees than the previous greedy algorithm. Experimental results under the interview and coordination migration models clearly show the superiority of MR-EMO (with either NSGA-II, MOEA/D, GSEMO or GSEMO-SR) over previous algorithms, and that using GSEMO-SR leads to the best performance of MR-EMO. © 2024 IEEE.

Keywords: Multiobjective optimization
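
Illustrative note on the record above: the abstract names the global simple evolutionary multiobjective optimizer (GSEMO) as one of the algorithms used. The following Python sketch shows GSEMO in its commonly described form (bit-wise mutation with probability 1/n and an archive of non-dominated solutions) on a toy biobjective problem. The toy objective, the item values, and all function names are assumptions for illustration; this is not the paper's migrant resettlement formulation, and it does not include the matrix-swap mutation or repair mechanism of GSEMO-SR.

```python
import random


def weakly_dominates(f_a, f_b):
    """True if f_a is at least as good as f_b in every (maximized) objective."""
    return all(a >= b for a, b in zip(f_a, f_b))


def dominates(f_a, f_b):
    """True if f_a weakly dominates f_b and differs from it."""
    return weakly_dominates(f_a, f_b) and f_a != f_b


def gsemo(evaluate, n, iterations, seed=0):
    """Keep an archive of mutually non-dominated bit strings of length n."""
    rng = random.Random(seed)
    start = tuple([0] * n)
    population = {start: evaluate(start)}
    for _ in range(iterations):
        parent = rng.choice(list(population))
        # Flip each bit independently with probability 1/n.
        child = tuple(bit ^ 1 if rng.random() < 1.0 / n else bit
                      for bit in parent)
        f_child = evaluate(child)
        # Discard the child if an archived solution is strictly better.
        if any(dominates(f, f_child) for f in population.values()):
            continue
        # Otherwise drop everything the child weakly dominates and add it.
        population = {x: f for x, f in population.items()
                      if not weakly_dominates(f_child, f)}
        population[child] = f_child
    return population


# Toy problem (hypothetical): maximize total value, minimize items selected.
VALUES = [5, 3, 2, 1]


def toy_objectives(x):
    total = sum(v * b for v, b in zip(VALUES, x))
    return (total, -sum(x))


front = gsemo(toy_objectives, n=len(VALUES), iterations=200, seed=1)
```

The returned dictionary maps each archived bit string to its objective pair, so a caller can pick a trade-off between the two objectives after the run rather than fixing it in advance.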
