Search results - Inner Mongolia University Library

KAN v.s. MLP for Offline Reinforcement Learning

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Haihong Guo Fengxin Li Jiao Li Hongyan Liu School of Information Renmin University of China China Institute of Medical Information / Medical Library Chinese Academy of Medical Sciences / Peking Union Medical College China Key Laboratory of Data Engineering and Knowledge Engineering Ministry of Education China School of Economics and Management Tsinghua University China

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

The Kolmogorov-Arnold Network (KAN) is an emerging neural network architecture in machine learning. The research community has shown great interest in whether KAN can be a promising alternative to the commonly used Multi-Layer Perceptron (MLP). Experiments in various fields have demonstrated that KAN-based machine learning can achieve performance comparable to, if not better than, MLP-based methods, with far fewer parameters and greater explainability. In this paper, we explore the incorporation of KAN into the actor and critic networks for offline reinforcement learning (RL). We evaluated the performance, parameter scales, and training efficiency of various KAN-based and MLP-based conservative Q-learning (CQL) models on the classical D4RL benchmark for offline RL. Our study demonstrates that KAN can achieve performance close to that of the commonly used MLP with significantly fewer parameters. This allows us to choose the base networks according to the offline RL task requirements.

Keywords: Training; Performance evaluation; Hands; Q-learning; Neural networks; Memory management; Signal processing; Benchmark testing; Splines (mathematics); Speech processing

Embedding VLAD in Transformer for Video Question Answering

Jisuanji Xuebao/Chinese Journal of Computers, 2023, Vol. 46, No. 4, pp. 671-689

Authors: Guo, Dan Yao, Shen-Tao Wang, Hui Wang, Meng School of Computer and Information Engineering Hefei University of Technology Hefei 230601 China Institute of Artificial Intelligence Hefei Comprehensive National Science Center Hefei 230094 China Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology) Ministry of Education Hefei 230601 China Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology) Hefei 230601 China

Video question answering (VideoQA) is a typical cross-modal understanding task. Its challenge lies in how to learn appropriate multimodal representations and cross-modal correlations for answer inference. Most existing video question answering methods focus on the latter, e.g., relationship learning between each video frame or clip and word. In this work, we devote ourselves to advanced feature embedding of both video and query. We develop a clustering-based VLAD technique for VideoQA. The novelty of our work is the joint exploitation of temporal aggregation and correlation in multimodality. We propose an end-to-end trainable Transformed VLAD embedding network, named TVLAD-Net. TVLAD-Net constructs a differentiable aggregation network module (i.e., convolutional Residual-less VLAD Block) to generate compact VLAD descriptors (transforming N frames, clips or words to compact K descriptors while K © 2023 Science Press. All rights reserved.

Keywords: Semantics

Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL

arXiv, 2025

Authors: Liu, Hanbing Li, Haoyang Zhang, Xiaokang Chen, Ruotong Xu, Haiyong Tian, Tian Qi, Qi Zhang, Jing Gaoling School of Artificial Intelligence Renmin University of China Beijing China School of Information Renmin University of China Beijing China Key Laboratory of Data Engineering and Knowledge Engineering Beijing China Engineering Research Center of Database and Business Intelligence Beijing China China Mobile Information Technology Center China

Direct Preference Optimization (DPO) has proven effective in complex reasoning tasks like math word problems and code generation. However, when applied to Text-to-SQL datasets, it often fails to improve performance and can even degrade it. Our investigation reveals the root cause: unlike math and code tasks, which naturally integrate Chain-of-Thought (CoT) reasoning with DPO, Text-to-SQL datasets typically include only final answers (gold SQL queries) without detailed CoT solutions. By augmenting Text-to-SQL datasets with synthetic CoT solutions, we achieve, for the first time, consistent and significant performance improvements using DPO. Our analysis shows that CoT reasoning is crucial for unlocking DPO's potential, as it mitigates reward hacking, strengthens discriminative capabilities, and improves scalability. These findings offer valuable insights for building more robust Text-to-SQL models. To support further research, we publicly release the code and CoT-enhanced datasets. © 2025, CC BY.

Keywords: Chains

Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation

arXiv, 2025

Authors: Chen, Haipeng Wu, Sifan Wang, Zhigang Yin, Yifang Jiao, Yingying Lyu, Yingda Liu, Zhenguang College of Computer Science and Technology Jilin University China Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education Jilin University China College of Computer Science and Technology Zhejiang Gongshang University China A*STAR Singapore Public Computer Education and Research Center Jilin University China The State Key Laboratory of Blockchain and Data Security Zhejiang University China Institute of Blockchain and Data Security China

Video-based human pose estimation has long been a fundamental yet challenging problem in computer vision. Previous studies focus on spatio-temporal modeling through the enhancement of architecture design and optimization strategies. However, they overlook the causal relationships among the joints, leading to models that may be overly tailored and thus estimate poorly in challenging scenes. Therefore, adequate causal reasoning capability, coupled with good interpretability of the model, is both indispensable and a prerequisite for achieving reliable results. In this paper, we pioneer a causal perspective on pose estimation and introduce a causal-inspired multitask learning framework consisting of two stages. In the first stage, we try to endow the model with causal spatio-temporal modeling ability by introducing two self-supervised auxiliary tasks. Specifically, these auxiliary tasks enable the network to infer challenging keypoints based on observed keypoint information, thereby imbuing the model with causal reasoning capabilities and making it robust to challenging scenes. In the second stage, we argue that not all feature tokens contribute equally to pose estimation. Prioritizing causal (keypoint-relevant) tokens is crucial to achieving reliable results, and could improve the interpretability of the model. To this end, we propose a Token Causal Importance Selection module to identify the causal tokens and non-causal tokens (e.g., background and objects). Additionally, non-causal tokens could provide potentially beneficial cues but may be redundant. We further introduce a non-causal token clustering module to merge similar non-causal tokens. Extensive experiments show that our method outperforms state-of-the-art methods on three large-scale benchmark datasets. Copyright © 2025, The Authors. All rights reserved.

Keywords: Self-supervised learning

RGB-D Visual Odometry Based on Semantic Feature Points in Dynamic Environments

1st CAAI International Conference on Artificial Intelligence, CICAI 2021

Authors: Wang, Hao Wang, Yincan Fang, Baofu Key Laboratory of Knowledge Engineering with Big Data Ministry of Education Hefei University of Technology Anhui 230009 China School of Computer Science and Information Engineering Hefei University of Technology Anhui Hefei 230009 China

ISBN (print): 9783030930486

Traditional visual Simultaneous Localization and Mapping (SLAM) algorithms work well in static scenes, but mismatches occur in dynamic scenes, causing large errors in the positioning and mapping of the SLAM system. Therefore, this paper proposes a visual odometry algorithm based on semantic feature points, which can improve the positioning accuracy in dynamic scenes. The algorithm uses semantic information to detect dynamic objects, and then detects and eliminates dynamic feature points. This paper conducts an extensive evaluation of the system and compares it with ORB-SLAM3 and other dynamic scene SLAM systems. The experimental results show that this method greatly improves the positioning accuracy of the camera and the robustness of the system in a highly complex dynamic environment, which verifies the advancement and effectiveness of the algorithm in this paper. © 2021, Springer Nature Switzerland AG.

Keywords: Semantics

An Efficient Customized Blockchain System for Inter-Organizational Processes

IEEE International Conference on Web Services (ICWS)

Authors: Puwei Wang Zhouxing Sun Rui Li Jinchuan Chen Ping Gong Xiaoyong Du School of Information Renmin University of China Key Laboratory of Data Engineering and Knowledge Engineering (Ministry of Education) Renmin University of China School of Mathematics Renmin University of China College of Computer and Cyber Security Fujian Normal University

Blockchain technologies pave a promising way for implementing inter-organizational processes. Most current research works translate the execution logic in the process models into smart contracts, which can run independently on the blockchain without an outside process engine. However, these works usually suffer from execution and storage costs, since the translation needs to be done when the processes are deployed. In this paper, we customize a process engine for executing inter-organizational business processes via a blockchain-style procedure, i.e., checking the validity of transactions, adding the valid transactions into the blockchain through the consensus mechanism, and then updating the process states according to the committed transactions. We then build a blockchain system by embedding the customized process engine into the blockchain nodes. Moreover, in order to realize the interactions between the inter-organizational processes running on the blockchain and the services outside the blockchain, we propose a blockchain-based approach for service registration, binding and invocation, and design a lease-based concurrency control protocol to logically isolate transactions from each other when invoking the services simultaneously. Finally, we implement a prototype system based on the permissioned blockchain platform Hyperledger Fabric and the process engine Activiti. The experimental results show that the proposed blockchain system can execute the inter-organizational processes correctly and efficiently.

Scene-Adaptive Person Search via Bilateral Modulations

arXiv, 2024

Authors: Jiang, Yimin Wang, Huibing Peng, Jinjia Fu, Xianping Wang, Yang School of Information Science and Technology Dalian Maritime University Dalian China School of Cyber Security and Computer Hebei University Baoding China Key Laboratory of Knowledge Engineering with Big Data Ministry of Education Hefei University of Technology Hefei China

Person search aims to localize a specific target person from a gallery set of images with various scenes. As the scene of a moving pedestrian changes, the captured person image inevitably brings in a lot of background noise and foreground noise on the person feature, which are completely unrelated to the person identity, leading to severe performance degeneration. To address this issue, we present a Scene-Adaptive Person Search (SEAS) model by introducing bilateral modulations to simultaneously eliminate scene noise and maintain a consistent person representation to adapt to various scenes. In SEAS, a Background Modulation Network (BMN) is designed to encode the feature extracted from the detected bounding box into a multi-granularity embedding, which reduces the input of background noise from multiple levels with norm-aware. Additionally, to mitigate the effect of foreground noise on the person feature, SEAS introduces a Foreground Modulation Network (FMN) to compute the clutter reduction offset for the person embedding based on the feature map of the scene image. By bilateral modulations on both background and foreground in an end-to-end manner, SEAS obtains consistent feature representations without scene noise. SEAS can achieve state-of-the-art (SOTA) performance on two benchmark datasets, CUHK-SYSU with 97.1% mAP and PRW with 60.5% mAP. The code is available at https://***/whbdmu/SEAS. © 2024, CC0.

Keywords: Embeddings

UAV-assisted Joint Mobile Edge Computing and Data Collection via Matching-enabled Deep Reinforcement Learning

arXiv, 2025

Authors: Wang, Boxiong Kang, Hui Li, Jiahui Sun, Geng Sun, Zemin Wang, Jiacheng Niyato, Dusit College of Computer Science and Technology Jilin University Changchun 130012 China Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education Jilin University Changchun 130012 China College of Computing and Data Science Nanyang Technological University 639798 Singapore College of Computing and Data Science Nanyang Technological University Singapore

Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) and data collection (DC) have been popular research issues. Different from existing works that consider MEC and DC scenarios separately, this paper investigates a multi-UAV-assisted joint MEC-DC system. Specifically, we formulate a joint optimization problem to minimize the MEC latency and maximize the collected data volume. This problem can be classified as a non-convex mixed integer programming problem that exhibits long-term optimization and dynamics. Thus, we propose a deep reinforcement learning-based approach that jointly optimizes the UAV movement, user transmit power, and user association in real time to solve the problem efficiently. Specifically, we reformulate the optimization problem into an action space-reduced Markov decision process (MDP) and optimize the user association by using a two-phase matching-based association (TMA) strategy. Subsequently, we propose a soft actor-critic (SAC)-based approach that integrates the proposed TMA strategy (SAC-TMA) to solve the formulated joint optimization problem collaboratively. Simulation results demonstrate that the proposed SAC-TMA is able to coordinate the two subsystems and can effectively reduce the system latency and improve the data collection volume compared with other benchmark algorithms. Copyright © 2025, The Authors. All rights reserved.

Keywords: Unmanned aerial vehicles (UAV)

A Zero-shot Learning Method with a Multi-Modal Knowledge Graph

International Conference on Tools for Artificial Intelligence (ICTAI)

Authors: Yuhong Zhang Haitao Shu Chenyang Bu Xuegang Hu Key Laboratory of Knowledge Engineering with Big Data Ministry of Education Hefei China School of Computer Science and Information Engineering Hefei University of Technology Hefei China

Zero-shot learning aims to recognize unseen classes using some seen-class samples as the training set. It is challenging because the feature representations of unseen-class samples are unavailable. Existing methods transfer the mapping from seen classes to unseen classes with the correlation as a bridge, in which the semantic representations are used to discriminate the classes. However, the unavailability of visual representations for unseen classes and the insufficient discrimination of semantic representations make zero-shot learning challenging. Therefore, the visual representations are learned as complements to semantic representations to construct a multi-modal knowledge graph (KG), and a zero-shot learning method based on the multi-modal KG is proposed in this paper. Specifically, a semantic KG is introduced to capture the correlation of classes, and with the correlation, the visual feature representations of all classes are learned. Then, the discriminative visual representations and the semantic representations are used together to construct a multi-modal KG. With the multi-modal KG, the classifier for seen classes is transferred to unseen classes. Extensive experimental results show the effectiveness of our method.

Keywords: Learning systems; Representation learning; Training; Bridges; Visualization; Correlation; Semantics