Hallucination is a major concern for rapidly evolving multimodal large language models (MLLMs): the generated text is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to instruction tuning, which requires retraining the models with specific data. In this paper, we take a different path and introduce a training-free method named Woodpecker. Just as woodpeckers heal trees, it picks out and corrects hallucinations in the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Because it is applied as a post-hoc remedy, Woodpecker can easily serve different MLLMs, and it is interpretable because the intermediate outputs of all five stages are accessible. We evaluate Woodpecker both quantitatively and qualitatively and show the great potential of this new paradigm. On the POPE benchmark, our method improves accuracy by 30.66%/24.33% over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://***/BradyFU/Woodpecker.
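The five stages can be illustrated with a deliberately tiny, text-only sketch. Everything concrete in it is a hypothetical stand-in: the set VOCAB replaces a real key-concept extractor, the dictionary DETECTED replaces the vision models that would answer the formulated questions, and the correction step simply drops sentences that mention absent objects instead of rewriting them with an LLM. It only shows how a post-hoc pipeline can expose its intermediate outputs; it is not the Woodpecker implementation.

```python
import re

# Hypothetical stand-ins for the real models used by each stage.
VOCAB = {"dog", "cat", "frisbee", "ball"}   # concepts the toy extractor recognizes
DETECTED = {"dog": 1, "frisbee": 1}         # toy "visual knowledge": object counts


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def extract_concepts(text):
    """Stage 1: key concept extraction (toy version: vocabulary lookup)."""
    return sorted(set(words(text)) & VOCAB)


def formulate_questions(concepts):
    """Stage 2: question formulation, one existence question per concept."""
    return {c: f"Is there a {c} in the image?" for c in concepts}


def validate(questions):
    """Stage 3: visual knowledge validation (toy version: look up object counts)."""
    return {c: DETECTED.get(c, 0) for c in questions}


def generate_claims(answers):
    """Stage 4: visual claim generation from the validated answers."""
    return [f"There is a {c}." if n else f"There is no {c}." for c, n in answers.items()]


def correct(text, answers):
    """Stage 5: hallucination correction (toy version: drop sentences naming absent objects)."""
    absent = {c for c, n in answers.items() if n == 0}
    sentences = re.split(r"(?<=\.)\s+", text.strip())
    return " ".join(s for s in sentences if not absent & set(words(s)))


def woodpecker_like(text):
    answers = validate(formulate_questions(extract_concepts(text)))
    claims = generate_claims(answers)  # intermediate output kept for interpretability
    return correct(text, answers), claims


fixed, claims = woodpecker_like("A dog jumps for a frisbee. A cat watches nearby.")
print(fixed)
print(claims)
```

Here the sentence about the cat is removed, and the claims list records the validated evidence ("There is no cat.") that justified the correction.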
This article tackles the boundary event-based bipartite consensus tracking control problem for a multi-agent network of flexible manipulators over a signed digraph. Each follower agent is a flexible manipulator subject to unknown disturbances, modeling uncertainties, input saturation and backlash, and asymmetric output constraints. To reduce continuous updating of the control inputs, a new dynamic event-triggering mechanism is used. Under these multiple constraints, achieving pointwise-in-space asymptotic convergence of the manipulator's vibration state is a control challenge. To address this issue, we propose a new asymptotic convergence lemma. In the control design, radial basis function neural networks are employed to estimate the nonlinear uncertain terms, and a barrier Lyapunov function is used to enforce the output constraints. Based on the Lyapunov direct method, a novel distributed boundary event-based control algorithm is designed to guarantee that the closed-loop network achieves asymptotic bipartite consensus tracking and vibration suppression. Moreover, Zeno behavior is excluded for each agent. Finally, numerical results are presented to demonstrate the validity and superiority of the designed control algorithm.
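The idea behind a dynamic event-triggering mechanism can be sketched on a far simpler plant than a flexible manipulator. The sketch below assumes a scalar system x' = a x + b u with sampled-state feedback u = -k x_hat and a Girard-style dynamic rule, in which an internal variable eta, driven by eta' = -lam*eta + phi, relaxes the static rule phi <= 0. The plant, the gains (a = b = 1, k = 2) and the parameters sigma, lam, theta are illustrative assumptions, not the article's design.

```python
def simulate(dynamic, T=20.0, dt=1e-3, sigma=0.5, lam=1.0, theta=1.0, eta0=1.0):
    """Simulate x' = a*x + b*u with event-sampled feedback u = -k*x_hat.

    For this plant (a = b = 1, k = 2) and V = x**2, V' <= -alpha*x**2 + gamma*e**2
    with alpha = 1, gamma = 4, where e = x_hat - x is the measurement error.
    Static rule:  transmit when phi <= 0, phi = sigma*alpha*x**2 - gamma*e**2.
    Dynamic rule: transmit when eta + theta*phi <= 0, with eta' = -lam*eta + phi.
    Returns the final state and the number of transmissions (events).
    """
    a, b, k = 1.0, 1.0, 2.0
    alpha, gamma = 1.0, 4.0
    x, x_hat, eta = 1.0, 1.0, eta0
    events = 0
    for _ in range(int(round(T / dt))):
        e = x_hat - x
        phi = sigma * alpha * x * x - gamma * e * e
        fire = (eta + theta * phi <= 0.0) if dynamic else (phi <= 0.0)
        if fire:
            x_hat = x                      # transmit current state; error resets to 0
            phi = sigma * alpha * x * x
            events += 1
        x += dt * (a * x - b * k * x_hat)  # forward-Euler step of the plant
        eta += dt * (-lam * eta + phi)     # internal variable (ignored by the static rule)
    return x, events


for mode in (False, True):
    x_end, n = simulate(dynamic=mode)
    print("dynamic" if mode else "static", n, abs(x_end))
```

Both rules drive the state toward zero, but the dynamic rule fires only after phi has become sufficiently negative rather than merely non-positive, so it transmits less often; this is the kind of communication saving the event-triggered design targets.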
In this paper, the forward kinematics problem (FKP) of the Gough-Stewart platform (GSP) with six degrees of freedom (6 DoFs) is solved via deep learning. We propose a graph convolution transformer model based on a systematic analysis of the challenges encountered when applying deep learning regression to large-scale data. We leverage graph-geometric information as input and use singular value decomposition (SVD) orthogonalization for learning on the SO(3) manifold. This study is the first in which a robot with a sophisticated closed-loop mechanism is described by a graph structure and a dedicated deep learning model is proposed to solve the FKP of the GSP. Qualitative and quantitative experiments on our dataset demonstrate that our model is feasible and outperforms other methods. With our method, 81.9% of the translation errors are below 1 mm and 96.7% of the rotation errors are below 1°.
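The SVD orthogonalization step can be sketched as follows, assuming (the abstract does not say this) that the network emits nine numbers reshaped into a 3x3 matrix. The projection R = U diag(1, 1, det(U V^T)) V^T returns the rotation closest to that matrix in the Frobenius norm. The helper rotation_error_deg is a hypothetical geodesic error metric of the kind that could underlie a "below 1°" statistic.

```python
import numpy as np


def svd_orthogonalize(m):
    """Project a 3x3 matrix onto SO(3) (closest rotation in the Frobenius norm)."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))        # -1 if U V^T is a reflection
    return u @ np.diag([1.0, 1.0, d]) @ vt    # flip the last singular direction if needed


def rotation_error_deg(r_pred, r_true):
    """Geodesic angle between two rotations, in degrees (hypothetical metric choice)."""
    cos = (np.trace(r_pred.T @ r_true) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


raw = np.random.default_rng(0).normal(size=9)   # stand-in for a 9-D network output
r = svd_orthogonalize(raw.reshape(3, 3))
print(np.allclose(r.T @ r, np.eye(3)), round(float(np.linalg.det(r)), 6))
```

Because the projection is differentiable almost everywhere, it can sit at the end of a regression network so that every prediction is a valid rotation.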
The incorporation of intermittent and stochastic renewable energy into a microgrid causes frequent fluctuations, which pose new challenges for frequency control. This paper deals with the frequency control problem...
This paper investigates the challenge of controlling the formation pattern of a fleet of nonholonomic mobile vehicles (NMVs) during mission execution to safeguard the critical vehicle while simultaneously minimizing the communication path loss among vehicles. Communication path loss refers to the attenuation of signal strength as it propagates between transmitting and receiving vehicles. To address this issue, we model the minimization of communication path loss during the formation process as a non-convex optimization problem. On this basis, a novel dynamic optimal formation control algorithm with real-time topology adaptation (DOFC) is proposed. The algorithm consists of an iterative optimizer, a position offset estimator and a predefined-time controller, seamlessly integrating optimization and control methods. Compared with traditional optimal formation control, the proposed DOFC adapts dynamically to communication topology switches and adjusts the optimal formation during motion. Finally, simulations and experiments demonstrate the effectiveness of the proposed methods.
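The abstract does not state which path-loss model is used. As an assumption, the sketch below uses the standard free-space model, FSPL(dB) = 20 log10(4 pi d f / c), and sums it over the communication links of a planar formation; the 2.4 GHz carrier, the link topology and the positions are hypothetical. It shows why spreading a formation raises such an objective: doubling every link distance adds about 6.02 dB per link.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def fspl_db(d, f_hz):
    """Free-space path loss in dB (far-field model, d in metres, f_hz in hertz)."""
    return 20.0 * math.log10(4.0 * math.pi * d * f_hz / C)


def formation_path_loss(positions, links, f_hz=2.4e9):
    """Sum of free-space path loss over communication links (pairs of vehicle indices)."""
    total = 0.0
    for i, j in links:
        (xi, yi), (xj, yj) = positions[i], positions[j]
        total += fspl_db(math.hypot(xi - xj, yi - yj), f_hz)
    return total


links = [(0, 1), (0, 2)]  # hypothetical topology: vehicle 0 relays to vehicles 1 and 2
compact = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
spread = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)]
print(formation_path_loss(compact, links), formation_path_loss(spread, links))
```

A formation optimizer would trade this term off against mission constraints such as keeping the critical vehicle shielded, which is what makes the overall problem non-convex.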
Underwater image enhancement aims to restore a clean appearance and thus improve the quality of degraded underwater images. Existing methods feed the whole image directly into the model for enhancement. However, they ignore that the R, G and B channels of degraded underwater images exhibit different degrees of degradation due to the selective absorption of light. To address this issue, we propose an unsupervised multi-expert learning model that considers the enhancement of each color channel. An unsupervised architecture based on a generative adversarial network is employed to alleviate the need for paired underwater images. Based on this, we design a generator, including a multi-expert encoder, a feature fusion module and a feature fusion-guided decoder, to generate clear underwater images. A multi-expert discriminator is proposed to verify the authenticity of the R, G and B channels. In addition, a content perceptual loss and an edge loss are introduced into the loss function to further improve the content and details of the enhanced images. Experiments on public datasets demonstrate that our method achieves more visually pleasing results. The metrics (PSNR, SSIM, UIQM and UCIQE) evaluated on our enhanced images show clear improvements.
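PSNR, one of the reported metrics, is 10 log10(MAX^2 / MSE). Computing it per channel makes the channel-wise argument concrete. The pixel values below are synthetic, chosen only so that red is attenuated most, as is typical underwater; they are not taken from the article's datasets.

```python
import math


def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def per_channel_psnr(ref_rgb, test_rgb):
    """PSNR for each of the R, G and B channels of lists of (R, G, B) pixels."""
    return {name: psnr([p[c] for p in ref_rgb], [p[c] for p in test_rgb])
            for c, name in enumerate("RGB")}


# Synthetic pixels: red loses the most, blue the least (illustrative values only).
reference = [(200, 150, 100), (180, 160, 120)]
degraded = [(120, 140, 100), (100, 150, 118)]
print(per_channel_psnr(reference, degraded))
```

A single whole-image score would average these very different channel errors together, which is the motivation for treating each channel with its own expert.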
Recently, the multimodal large language model (MLLM), represented by GPT-4V, has become a rising research hotspot; it uses a powerful large language model (LLM) as its brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional multimodal methods, suggesting a potential path to artificial general intelligence. To this end, both academia and industry have endeavored to develop MLLMs that can compete with or even outperform GPT-4V, pushing the limits of research at a surprising speed. In this paper, we aim to trace and summarize the recent progress of MLLMs. First, we present the basic formulation of the MLLM and delineate its related concepts, including architecture, training strategy and data, as well as evaluation. Then, we introduce research topics on how MLLMs can be extended to support more granularity, modalities, languages and scenarios. We continue with multimodal hallucination and extended techniques, including multimodal in-context learning, multimodal chain of thought and LLM-aided visual reasoning. To conclude the paper, we discuss existing challenges and point out promising research directions.
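A common MLLM layout pairs a pretrained modality encoder, a learnable connector, and an LLM. The numpy sketch below shows only the data flow through one connector type, a two-layer MLP projector that maps visual patch features into the LLM's token-embedding space before they are prepended to the text embeddings. The sizes and random weights are hypothetical, and query-based connectors are another common option.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PATCH, D_VIS, D_LLM = 16, 64, 128  # hypothetical sizes


def mlp_connector(vis_feats, w1, b1, w2, b2):
    """Two-layer MLP projecting visual features into the LLM embedding space."""
    hidden = np.maximum(vis_feats @ w1 + b1, 0.0)  # ReLU here; GELU is also common
    return hidden @ w2 + b2


def build_llm_input(vis_feats, text_embeds, params):
    """Prepend the projected visual tokens to the text token embeddings."""
    return np.concatenate([mlp_connector(vis_feats, *params), text_embeds], axis=0)


params = (rng.normal(scale=0.02, size=(D_VIS, D_LLM)), np.zeros(D_LLM),
          rng.normal(scale=0.02, size=(D_LLM, D_LLM)), np.zeros(D_LLM))
vis_feats = rng.normal(size=(N_PATCH, D_VIS))   # stand-in for frozen vision-encoder output
text_embeds = rng.normal(size=(5, D_LLM))       # stand-in for 5 embedded text tokens
seq = build_llm_input(vis_feats, text_embeds, params)
print(seq.shape)
```

The LLM then processes the combined sequence of visual and text tokens as ordinary input embeddings.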
Learning-based methods have become mainstream for solving residential energy scheduling problems. To improve the learning efficiency of existing methods and increase the utilization of renewable energy, we propose the Dyna action-dependent heuristic dynamic programming (Dyna-ADHDP) method, which incorporates the learning and planning ideas of the Dyna framework into action-dependent heuristic dynamic programming. This method defines a continuous action space for precise control of an energy storage system and allows online optimization of algorithm performance during real-time operation of the residential energy model. Meanwhile, a target network is introduced during training to make the training smoother and more efficient. We conducted experimental comparisons with a benchmark method using simulated and real data to verify its applicability and performance. The results confirm the method's excellent performance and generalization capability, as well as its effectiveness in increasing renewable energy utilization and extending equipment life.
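The learning-plus-planning idea of the Dyna framework can be illustrated with tabular Dyna-Q on a toy six-state chain. This is a swapped-in technique: Dyna-Q uses a discrete action space and a Q-table, whereas Dyna-ADHDP uses a continuous action space with action-dependent critic networks. The chain environment and all hyperparameters are hypothetical.

```python
import random


def dyna_q(n_states=6, episodes=30, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Tabular Dyna-Q on a chain: action 0 moves left, action 1 moves right toward the goal."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}  # learned deterministic model: (s, a) -> (reward, next_state, done)

    def step(s, a):
        s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        return (1.0 if s2 == goal else 0.0), s2, s2 == goal

    def greedy(s):
        best = max(q[s])
        return rng.choice([a for a in (0, 1) if q[s][a] == best])  # random tie-break

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(2) if rng.random() < eps else greedy(s)
            r, s2, done = step(s, a)          # learning: one step of real experience
            update(s, a, r, s2, done)
            model[(s, a)] = (r, s2, done)     # model learning
            for _ in range(planning_steps):   # planning: replay simulated transitions
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q


q = dyna_q()
print([round(max(row), 3) for row in q])
```

Each real transition is reused many times through the learned model, which is the source of the sample-efficiency gain that the Dyna framework aims for.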
This paper highlights the use of parallel control and adaptive dynamic programming (ADP) for event-triggered robust parallel optimal consensus control (ETRPOC) of uncertain nonlinear continuous-time multiagent systems (MASs). First, the parallel control system, which consists of a virtual control variable and a specific auxiliary variable obtained from the coupled Hamiltonian, allows general systems to be transformed into affine systems. Notably, introducing the parallel control technique provides an unprecedented perspective on eliminating the negative effects of disturbances. Then, an event-triggered mechanism is adopted to save communication resources while ensuring the system's stability. The solution of the coupled Hamilton-Jacobi (HJ) equation is approximated using a critic neural network (NN), whose weights are updated in response to events. Furthermore, theoretical analysis shows that the weight estimation error is uniformly ultimately bounded (UUB). Finally, numerical simulations demonstrate the effectiveness of the developed ETRPOC method.
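The role of a critic that approximates the HJ solution can be seen in a stripped-down case: one agent, a scalar linear plant x' = a x + b u, quadratic cost q x^2 + r u^2, and a one-term critic V_hat(x) = w x^2. The sketch alternates least-squares policy evaluation of the HJ residual with policy improvement (Kleinman-style policy iteration) and omits the events, disturbances and multi-agent coupling of the article; all parameter values are assumptions. Because the exact value weight solves the scalar Riccati equation 2 a p - (b^2 / r) p^2 + q = 0, the learned weight can be checked directly; with a = b = q = r = 1 it converges to 1 + sqrt(2).

```python
import math


def critic_policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0, iters=20):
    """Critic V_hat(x) = w * x**2 for x' = a*x + b*u with cost integral of q*x^2 + r*u^2."""
    xs = [0.5 * i for i in range(1, 9)]  # sample states for the least-squares fit
    k = k0                               # initial gain; must be stabilizing (b*k > a)
    for _ in range(iters):
        # Policy evaluation: pick w minimizing the HJ residual for u = -k*x,
        #   w * 2*(a - b*k)*x^2 + (q + r*k^2)*x^2, summed in the least-squares sense.
        feats = [2.0 * (a - b * k) * x * x for x in xs]
        costs = [(q + r * k * k) * x * x for x in xs]
        w = -sum(f * c for f, c in zip(feats, costs)) / sum(f * f for f in feats)
        # Policy improvement: u = -(1 / (2r)) * b * dV/dx = -(b * w / r) * x
        k = b * w / r
    return w, k


w, k = critic_policy_iteration()
print(w, 1.0 + math.sqrt(2.0))
```

With a richer basis (a neural network instead of the single feature x^2) the same evaluate-then-improve loop applies, but the fit is no longer exact, which is why the article analyzes the weight estimation error as uniformly ultimately bounded.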
The vast majority of published event-triggered mechanisms (ETMs) are constructed based on measurement errors, which naturally introduces a problem: they are updated when the measurement errors exceed the threshold...