Search Results - Inner Mongolia University Library

Woodpecker: hallucination correction for multimodal large language models

Science China Information Sciences, 2024, Vol. 67, No. 12, pp. 52-64

Authors: Shukang YIN, Chaoyou FU, Sirui ZHAO, Tong XU, Hao WANG, Dianbo SUI, Yunhang SHEN, Ke LI, Xing SUN, Enhong CHEN. Affiliations: School of Artificial Intelligence and Data Science, University of Science and Technology of China; State Key Laboratory for Novel Software Technology, Nanjing University; School of Intelligence Science and Technology, Nanjing University; Institute of Automation, Chinese Academy of Sciences; YouTu

Hallucination is a big shadow hanging over rapidly evolving multimodal large language models (MLLMs); it refers to generated text that is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to instruction tuning, which requires retraining the models with specific data. In this paper, we take a different path and introduce a training-free method named Woodpecker. Just as woodpeckers heal trees, it picks out and corrects hallucinations in the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Because it is applied as a post-hoc remedy, Woodpecker can easily serve different MLLMs, and it is interpretable through access to the intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the great potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://***/BradyFU/Woodpecker.

Keywords: multimodal learning; multimodal large language models; hallucination correction; large language models; vision and language
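
POPE probes a model with binary questions of the form "Is there a <object> in the image?", and the improvement quoted above is in accuracy on those answers. The sketch below shows one plausible way such yes/no answers are turned into POPE-style scores (accuracy, precision, recall, F1 and the share of "yes" answers). It assumes answers have already been normalized to the strings "yes"/"no"; `PopeScores` and `pope_metrics` are hypothetical names, and this is not the Woodpecker or POPE evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PopeScores:
    accuracy: float
    precision: float
    recall: float
    f1: float
    yes_ratio: float


def pope_metrics(predictions: Sequence[str], labels: Sequence[str]) -> PopeScores:
    """Score yes/no answers to object-probing questions, treating "yes" as
    the positive class. Answers are assumed to be normalized to "yes"/"no"."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need equally long, non-empty predictions and labels")
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and y == "yes" for p, y in pairs)
    tn = sum(p == "no" and y == "no" for p, y in pairs)
    fp = sum(p == "yes" and y == "no" for p, y in pairs)
    fn = sum(p == "no" and y == "yes" for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return PopeScores(
        accuracy=(tp + tn) / len(pairs),
        precision=precision,
        recall=recall,
        f1=f1,
        # A "yes" share well above the true share of positive labels
        # suggests the model is asserting objects that are not there.
        yes_ratio=sum(p == "yes" for p in predictions) / len(predictions),
    )
```

For example, a model that answers "yes" to an absent object gains a false positive, which lowers precision and raises the "yes" ratio while recall is unaffected.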

Asymptotical event-based input-output constrained boundary control of flexible manipulator agents under a signed digraph

Science China Information Sciences, 2025, Vol. 68, No. 4, pp. 319-335

Authors: Xiangqian YAO, Xiangmin XU, Wei HE, Yu LIU. Affiliations: School of Future Technology, South China University of Technology; School of Automation and Institute of Artificial Intelligence, Beijing Information Science and Technology University; School of Intelligence Science and Technology, University of Science and Technology Beijing; School of Automation Science and Engineering, South China University of Technology

This article tackles the boundary event-based bipartite consensus tracking control problem for a network of flexible manipulator agents over a signed digraph. Each follower agent is a flexible manipulator subject to unknown disturbances, modeling uncertainties, input saturation and backlash, and asymmetric output constraints. To reduce continuous updating of the control inputs, a new dynamic event-triggering mechanism is used. Under these multiple constraints, achieving pointwise-in-space asymptotic convergence of the manipulator's vibration state is a control challenge. To solve this issue, we propose a new asymptotic convergence lemma. In the control design, radial basis function neural networks are employed to estimate nonlinear uncertain terms, and a barrier Lyapunov function is used to enforce the output constraints. Based on the Lyapunov direct method, a novel distributed boundary event-based control algorithm is designed to guarantee that the closed-loop network achieves asymptotic bipartite consensus tracking and vibration suppression. Moreover, Zeno behavior is excluded for each agent. Finally, numerical results are presented to demonstrate the validity and superiority of the designed control algorithm.

Keywords: flexible-link manipulator; input-output constraint; event-based asymptotical bipartite tracking; vibration control
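
The dynamic event-triggering idea mentioned above can be illustrated on a far simpler system. The sketch below is not the authors' boundary controller for flexible manipulators; it assumes a scalar plant x' = x + u with u = -k·x_hat (x_hat being the last transmitted state) and a Girard-style dynamic trigger, in which an internal variable eta accumulates slack while the static condition e² ≤ σx² holds, so the controller can wait longer before the next transmission. All parameter values are illustrative assumptions; σ = 0.1 is chosen because, for V = x²/2 and k = 2, the static rule alone would already keep V decreasing for this toy plant.

```python
def simulate_dynamic_etm(x0=1.0, k=2.0, sigma=0.1, lam=1.0, theta=1.0,
                         dt=1e-3, t_end=5.0):
    """Forward-Euler simulation of x' = x + u with u = -k * x_hat.

    Hypothetical dynamic trigger with internal variable eta:
        eta' = -lam * eta + (sigma * x**2 - e**2),   e = x_hat - x,
    and a transmission whenever eta + theta * (sigma * x**2 - e**2) < 0.
    Returns the final state, the number of transmissions, and the step count.
    """
    x, x_hat, eta = x0, x0, 0.0
    events = 0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        e = x_hat - x
        margin = sigma * x**2 - e**2
        if eta + theta * margin < 0:
            x_hat = x              # event: send the current state to the controller
            events += 1
            margin = sigma * x**2  # the error e resets to zero after sending
        eta += dt * (-lam * eta + margin)
        x += dt * (x - k * x_hat)  # the plant runs with the held input
    return x, events, steps
```

With the defaults, the state decays towards zero while only a small fraction of the integration steps trigger a transmission, which is the communication saving such mechanisms aim for.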

Graph-geometric message passing via a graph convolution transformer for FKP regression

Science China Information Sciences, 2024, Vol. 67, No. 12, pp. 176-190

Authors: Huizhi ZHU, Wenxia XU, Jian HUANG, Baocheng YU. Affiliations: Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology; School of Artificial Intelligence and Automation, Huazhong University of Science and Technology

In this paper, the forward kinematics problem (FKP) of the Gough-Stewart platform (GSP) with six degrees of freedom (6 DoFs) is solved via deep learning. By systematically analyzing the challenges encountered when applying deep learning regression to large-scale data, we propose a graph convolution transformer model. We leverage graph-geometric messages as input and singular value decomposition (SVD) orthogonalization for learning on the SO(3) manifold. This study is the first to describe a robot with a sophisticated closed-loop mechanism by a graph structure and to propose a dedicated deep learning model for solving the FKP of the GSP. Qualitative and quantitative experiments on our dataset demonstrate that our model is feasible and superior to other methods. With our method, 81.9% of the translation errors are below 1 mm and 96.7% of the rotation errors are below 1°.

Keywords: deep learning; graph-structured learning; graph convolution transformer; forward kinematics problem; Gough-Stewart platform
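
SVD orthogonalization, as mentioned in the abstract, maps an unconstrained 3×3 network output to the nearest rotation matrix. The sketch below shows the standard special-orthogonal projection R = U diag(1, 1, det(UVᵀ)) Vᵀ, together with a geodesic rotation error and a below-threshold ratio of the kind behind the "81.9% below 1 mm / 96.7% below 1°" figures. It is a generic illustration, not the authors' model code, and the helper names are hypothetical.

```python
import numpy as np


def svd_orthogonalize(m: np.ndarray) -> np.ndarray:
    """Project an arbitrary 3x3 matrix onto SO(3), i.e. the closest rotation
    in the Frobenius norm."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def rotation_error_deg(r_pred: np.ndarray, r_true: np.ndarray) -> float:
    """Geodesic angle between two rotation matrices, in degrees."""
    cos_angle = (np.trace(r_pred.T @ r_true) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def fraction_within(errors, threshold: float) -> float:
    """Share of samples whose error is strictly below the threshold."""
    return float(np.mean(np.asarray(errors, dtype=float) < threshold))
```

Projecting the raw 9-dimensional output this way keeps every prediction a valid rotation, so rotation errors can be measured directly as angles.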

Frequency control for islanded AC microgrid based on deep reinforcement learning

Cyber-Physical Systems, 2024, Vol. 10, No. 1, pp. 43-59

Authors: Liu, Xianggang; Liu, Zhi-Wei; Chi, Ming; Wei, Guixi. Affiliation: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China

The incorporation of intermittent and stochastic renewable energy into a microgrid causes frequent fluctuations, posing new challenges for frequency control. This paper addresses the frequency control problem in the islanded AC microgrid (IACMG) with a model-free deep reinforcement learning (DRL) method that comprises offline learning and online control. The twin-delayed deep deterministic policy gradient (TD3) algorithm is adopted to improve the agent's performance in minimising the frequency deviation. An advantage of the proposed method is that it adapts itself to the uncertain IACMG model, including the renewable energy sources. Finally, the effectiveness and robustness of the proposed controller are demonstrated in four simulation scenarios. © 2022 Informa UK Limited, trading as Taylor & Francis Group.

Keywords: Reinforcement learning
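
TD3, named in the abstract, forms its critic target from the smaller of two target critics and adds clipped noise to the target action (target policy smoothing). The sketch below computes that target for a batch. The actor and critics are passed in as plain callables, an assumption made to keep it framework-free; treating the reward as, for instance, the negative squared frequency deviation would be one possible choice, not necessarily the paper's reward design. The default hyperparameters are commonly used TD3 values, not values taken from the paper.

```python
from typing import Callable

import numpy as np

Array = np.ndarray


def td3_target(reward: Array, next_state: Array, done: Array,
               actor_target: Callable[[Array], Array],
               critic1_target: Callable[[Array, Array], Array],
               critic2_target: Callable[[Array, Array], Array],
               rng: np.random.Generator,
               gamma: float = 0.99, noise_std: float = 0.2,
               noise_clip: float = 0.5,
               action_low: float = -1.0, action_high: float = 1.0) -> Array:
    """Clipped double-Q learning target with target policy smoothing."""
    action = actor_target(next_state)
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = np.clip(rng.normal(0.0, noise_std, size=action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(action + noise, action_low, action_high)
    # Clipped double Q: take the smaller estimate to curb overestimation.
    q_min = np.minimum(critic1_target(next_state, next_action),
                       critic2_target(next_state, next_action))
    # No bootstrapping from terminal transitions.
    return reward + gamma * (1.0 - done) * q_min
```

The third TD3 ingredient, delayed policy updates, lives in the training loop (the actor and target networks are updated less often than the critics) and is not shown here.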

Dynamic Optimal Formation Control for Mobile Vehicles With Minimizing Communication Path Loss Under Topology Switching

IEEE Transactions on Intelligent Vehicles, 2024, pp. 1-11

Authors: Zhang, Wen-Tao; Chi, Ming; Liu, Zhi-Wei; Xu, Jing-Zhe. Affiliation: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China

This paper investigates the challenge of controlling the formation pattern of a fleet of nonholonomic mobile vehicles (NMVs) during mission execution to safeguard a critical vehicle, while simultaneously minimizing the communication path loss among vehicles. Communication path loss refers to the attenuation of signal strength as the signal propagates between the transmitting and receiving vehicles. To address this issue, we model the minimization of communication path loss during the formation process as a non-convex problem. In this context, a novel dynamic optimal formation control algorithm with real-time topology adaptation (DOFC) is proposed. The algorithm consists of an iterative optimizer, position offset estimation, and a predefined-time controller, seamlessly integrating optimization and control methods. Compared with traditional optimal formation methods, the proposed DOFC can dynamically adapt to communication topology switches and adjust the optimal formation during motion. Finally, simulations and experiments are conducted to demonstrate the effectiveness of the proposed methods. IEEE

Keywords: Cost functions
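
The abstract does not state which path loss model is used. As an assumption, the sketch below uses the Friis free-space model, PL(d) = 20·log10(4πdf/c) dB, and sums it over the active links of a communication topology, so that a topology switch simply changes the edge list. The positions, edges, 2.4 GHz carrier frequency, and function names are illustrative; a log-distance model with a different path loss exponent could replace `free_space_path_loss_db`.

```python
import math
from typing import Iterable, Sequence, Tuple

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_path_loss_db(distance_m: float, freq_hz: float) -> float:
    """Friis free-space path loss 20*log10(4*pi*d*f/c), in dB."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def formation_path_loss_db(positions: Sequence[Tuple[float, float]],
                           edges: Iterable[Tuple[int, int]],
                           freq_hz: float = 2.4e9) -> float:
    """Total path loss over the active communication links (i, j)
    of the current topology, for vehicles at 2D positions."""
    total = 0.0
    for i, j in edges:
        (xi, yi), (xj, yj) = positions[i], positions[j]
        total += free_space_path_loss_db(math.hypot(xi - xj, yi - yj), freq_hz)
    return total
```

Under this model, doubling a link's distance adds about 6 dB, so an optimizer that minimizes the total is pushed towards formations that keep communicating vehicles close.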

Unsupervised Multi-Expert Learning Model for Underwater Image Enhancement

IEEE/CAA Journal of Automatica Sinica, 2024, Vol. 11, No. 3, pp. 708-722

Authors: Hongmin Liu, Qi Zhang, Yufan Hu, Hui Zeng, Bin Fan. Affiliations: School of Intelligence Science and Technology and Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing 100083, China; School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; IEEE

Underwater image enhancement aims to restore a clean appearance and thus improve the quality of underwater degraded *** methods feed the whole image directly into the model for ***, they ignore that the R, G, and B channels of underwater degraded images exhibit varying degrees of degradation due to the selective absorption of the ***. To address this issue, we propose an unsupervised multi-expert learning model that considers the enhancement of each color ***, an unsupervised architecture based on a generative adversarial network is employed to alleviate the need for paired underwater ***. Based on this, we design a generator, comprising a multi-expert encoder, a feature fusion module, and a feature fusion-guided decoder, to generate clear underwater ***, a multi-expert discriminator is proposed to verify the authenticity of the R, G, and B channels, ***. In addition, a content perceptual loss and an edge loss are introduced into the loss function to further improve the content and details of the enhanced *** experiments on public datasets demonstrate that our method achieves more pleasing results in vision *** metrics (PSNR, SSIM, UIQM, and UCIQE) evaluated on our enhanced images have clearly improved.

Keywords: Multi-expert learning; underwater image enhancement; unsupervised learning
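
The abstract's point that the R, G, and B channels degrade differently can be made measurable with a per-channel full-reference metric. The sketch below computes PSNR = 10·log10(MAX²/MSE) separately for each channel of H×W×3 images (8-bit by default). The function names are hypothetical, and UIQM and UCIQE, which are no-reference metrics, are not sketched.

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def per_channel_psnr(reference: np.ndarray, test: np.ndarray,
                     max_value: float = 255.0) -> dict:
    """PSNR of the R, G and B channels of HxWx3 images, computed separately."""
    return {name: psnr(reference[..., c], test[..., c], max_value)
            for c, name in enumerate("RGB")}
```

A channel with much lower PSNR than the others (in water, typically red, which is absorbed fastest) is the kind of imbalance that motivates channel-specific experts.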

A survey on multimodal large language models

National Science Review, 2024, Vol. 11, No. 12, pp. 277-296

Authors: Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen. Affiliations: School of Artificial Intelligence and Data Science, University of Science and Technology of China; State Key Laboratory for Novel Software Technology, Nanjing University; School of Intelligence Science and Technology, Nanjing University; Tencent YouTu Lab

Recently, the multimodal large language model (MLLM), represented by GPT-4V, has become a rising research hotspot; it uses powerful large language models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and optical character recognition (OCR)-free math reasoning, are rare in traditional multimodal methods, suggesting a potential path to artificial general intelligence. To this end, both academia and industry have endeavored to develop MLLMs that can compete with or even outperform GPT-4V, pushing the limits of research at a surprising speed. In this paper, we aim to trace and summarize the recent progress of MLLMs. First, we present the basic formulation of the MLLM and delineate its related concepts, including architecture, training strategy and data, and evaluation. Then, we introduce research topics on how MLLMs can be extended to support finer granularity and more modalities, languages, and scenarios. We continue with multimodal hallucination and extended techniques, including multimodal in-context learning, multimodal chain of thought, and LLM-aided visual reasoning. To conclude the paper, we discuss existing challenges and point out promising research directions.

Keywords: multimodal large language model; vision language model; large language model
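
A common way to describe an MLLM's architecture is as three parts: a pretrained modality encoder, a pretrained LLM, and a connector that maps encoder outputs into the LLM's embedding space. The sketch below shows the simplest connector, a single linear projection, turning vision-encoder patch features into visual tokens that are prepended to the text token embeddings. The dimensions, random weights, and function name are hypothetical; other connectors (for example MLPs or query-based modules) would replace the projection step.

```python
import numpy as np


def build_multimodal_input(patch_features: np.ndarray,
                           text_embeddings: np.ndarray,
                           w_proj: np.ndarray,
                           b_proj: np.ndarray) -> np.ndarray:
    """Project (num_patches, d_vision) encoder features to (num_patches, d_llm)
    with a linear connector and prepend them to (num_text, d_llm) text
    embeddings, giving the sequence the LLM would consume."""
    if w_proj.shape != (patch_features.shape[1], text_embeddings.shape[1]):
        raise ValueError("w_proj must have shape (d_vision, d_llm)")
    visual_tokens = patch_features @ w_proj + b_proj
    return np.concatenate([visual_tokens, text_embeddings], axis=0)
```

For example, 16 patch features of width 32 and 5 text embeddings of width 64 yield a 21 × 64 input sequence.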

Residential Energy Scheduling With Solar Energy Based on Dyna Adaptive Dynamic Programming

IEEE/CAA Journal of Automatica Sinica, 2025, Vol. 12, No. 2, pp. 403-413

Authors: Kang Xiong, Qinglai Wei, Hongyang Li. Affiliations: Institute of Systems Engineering, Macau University of Science and Technology; IEEE; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

Learning-based methods have become mainstream for solving residential energy scheduling problems. To improve the learning efficiency of existing methods and increase the utilization of renewable energy, we propose the Dyna action-dependent heuristic dynamic programming (Dyna-ADHDP) method, which incorporates the learning-and-planning idea of the Dyna framework into action-dependent heuristic dynamic programming. The method defines a continuous action space for precise control of an energy storage system and allows online optimization of algorithm performance during real-time operation of the residential energy model. Meanwhile, a target network is introduced during training to make the training smoother and more efficient. We conducted experimental comparisons with the benchmark method using simulated and real data to verify its applicability and performance. The results confirm the method's excellent performance and generalization capability, as well as its effectiveness in increasing renewable energy utilization and extending equipment life.

Keywords: Adaptive dynamic programming (ADP); dynamic residential scenarios; optimal residential energy management; smart grid
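
The Dyna idea of combining learning and planning is easiest to see in its tabular form, Dyna-Q: each real transition updates the value function directly and is also stored in a learned model, which then supplies extra simulated transitions for planning updates. The sketch below uses a hypothetical five-state chain environment. Dyna-ADHDP itself works with continuous actions and neural actor/critic networks, so this only illustrates the learning-plus-planning loop, not the paper's method.

```python
import random
from collections import defaultdict


def dyna_q(n_states=5, episodes=30, planning_steps=10, alpha=0.5,
           gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a toy chain: action 1 moves right, action 0 moves left;
    reaching the last state gives reward 1 and ends the episode."""
    rng = random.Random(seed)
    q = defaultdict(float)   # (state, action) -> value estimate
    model = {}               # (state, action) -> (reward, next_state, done)
    goal = n_states - 1

    def step(s, a):
        s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        return (1.0 if s2 == goal else 0.0), s2, s2 == goal

    def greedy(s):
        # Break ties randomly so the agent does not lock onto one action.
        best = max(q[(s, b)] for b in (0, 1))
        return rng.choice([b for b in (0, 1) if q[(s, b)] == best])

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in (0, 1))
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice((0, 1)) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)         # direct learning from real experience
            model[(s, a)] = (r, s2, done)     # model learning
            for _ in range(planning_steps):   # planning with simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q
```

The planning loop lets every real interaction be reused many times, which is the efficiency argument behind bringing Dyna into ADHDP.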

Event-Triggered Robust Parallel Optimal Consensus Control for Multiagent Systems

IEEE/CAA Journal of Automatica Sinica, 2025, Vol. 12, No. 1, pp. 40-53

Authors: Qinglai Wei, Shanshan Jiao, Qi Dong, Fei-Yue Wang; IEEE. Affiliations: Institute of Systems Engineering, Macau University of Science and Technology; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; China Academy of Electronics and Information Technology

This paper highlights the use of parallel control and adaptive dynamic programming (ADP) for event-triggered robust parallel optimal consensus control (ETRPOC) of uncertain nonlinear continuous-time multiagent systems (MASs). First, the parallel control system, which consists of a virtual control variable and a specific auxiliary variable obtained from the coupled Hamiltonian, allows general systems to be transformed into affine systems. Notably, introducing the parallel control technique provides an unprecedented perspective on eliminating the negative effects of disturbances. Then, an event-triggered mechanism is adopted to save communication resources while ensuring the system's stability. The solution of the coupled Hamilton-Jacobi (HJ) equation is approximated by a critic neural network (NN) whose weights are updated when events occur. Furthermore, theoretical analysis shows that the weight estimation error is uniformly ultimately bounded (UUB). Finally, numerical simulations demonstrate the effectiveness of the developed ETRPOC method.

Keywords: Adaptive dynamic programming (ADP); critic neural network (NN); event-triggered control; optimal consensus control; robust control

Curve-Suppression-Based Event-Triggered Mechanisms for Quasi-Synchronization of Fuzzy Delayed Neural Networks on Time Scales

IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025, Vol. 55, No. 5, pp. 3174-3187

Authors: Wan, Peng; Zhou, Yufeng; Zeng, Zhigang; Lai, Jingang. Affiliations: School of Artificial Intelligence and Automation, Engineering Research Center of Metallurgical Automation and Measurement Technology, Wuhan University of Science and Technology, Wuhan 430081, China; School of Artificial Intelligence and Automation, Key Laboratory of Image Processing and Intelligent Control of Education Ministry of China, Huazhong University of Science and Technology, Wuhan 430074, China

The vast majority of published event-triggered mechanisms (ETMs) are constructed from measurement errors, which naturally introduces a problem: updates are triggered whenever the measurement errors exceed the thresholds, even when the currently obtained sampled states could already make the system converge well. With this problem in mind, we redesign ETMs for quasi-synchronization of T-S fuzzy neural networks (FNNs) with time delays on time scales. First, a novel ETM is designed for continuous-time FNNs with time-varying delays to achieve quasi-synchronization, under which the synchronization errors are suppressed so that they converge globally exponentially to a ball. Second, we extend the ETM from continuous-time FNNs to discrete-time FNNs; owing to the discrete-time states, the Lyapunov function of the synchronization errors may rise above the exponential decay curve, but it can be suppressed to evolve under another exponential decay curve. Third, for FNNs on time scales with constant and time-varying delays, we estimate the forward jump operator of the Lyapunov functions and design ETMs that guarantee the Lyapunov functions evolve under the exponential decay curves, so that quasi-synchronization can be achieved. Last but not least, we prove that Zeno behavior does not occur, and four numerical examples are provided to verify the validity and superiority of the proposed ETMs in reducing information transmission. © 2013 IEEE.

Keywords: Continuous time systems
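
As a rough illustration of the contrast drawn above, the sketch below triggers on the Lyapunov function rather than on the measurement error: a transmission happens only when the predicted next value of V = x² under the held input would rise above the decay curve c(k) = v0·rho^k + eps. The scalar discrete-time plant, gain, curve parameters, and the one-step prediction test are all hypothetical design choices for this sketch, not the paper's ETM for fuzzy delayed neural networks on time scales. Because (a - b·gain)² ≤ rho and x0² ≤ c(0), a transmission always brings the next V back under the curve, so V stays below c(k), a discrete analogue of converging to a ball of size eps.

```python
def simulate_curve_triggered(x0=1.0, a=1.2, b=1.0, gain=1.1, v0=1.0,
                             rho=0.8, eps=1e-4, n_steps=50):
    """Scalar plant x[k+1] = a*x[k] + b*u[k] with u[k] = -gain * x_hat, where
    x_hat is the last transmitted state. A transmission happens only if keeping
    the old x_hat would push V = x**2 above c(k+1) = v0 * rho**(k+1) + eps.
    Returns the number of transmissions and the histories of V and c."""
    def curve(step):
        return v0 * rho**step + eps

    x, x_hat = x0, x0          # the initial state is assumed to be transmitted
    events = 0
    v_history, curve_history = [x * x], [curve(0)]
    for step in range(n_steps):
        predicted = a * x - b * gain * x_hat   # next state with the held input
        if predicted**2 > curve(step + 1):
            x_hat = x                          # event: transmit the current state
            events += 1
        x = a * x - b * gain * x_hat
        v_history.append(x * x)
        curve_history.append(curve(step + 1))
    return events, v_history, curve_history
```

Here the trigger stays silent whenever the held input still keeps V under the curve, even if the measurement error has grown, which is the situation the abstract describes as wasted updates.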
