Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning, adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3. Despite the volume of new releases, key design decisions around image preprocessing, architecture, and optimization are under-explored, making it challenging to understand what factors account for model performance, a challenge further complicated by the lack of objective, consistent evaluations. To address these gaps, we first compile a suite of standardized evaluations spanning visual question answering, object localization, and challenge sets that probe properties such as hallucination; these evaluations provide fine-grained insight into VLM capabilities. Second, we rigorously investigate VLMs along key design axes, including pretrained visual representations and training from base vs. instruct-tuned language models, amongst others. We couple our analysis with three resource contributions: (1) a unified framework for evaluating VLMs, (2) optimized, flexible training code, and (3) checkpoints for all models, including a family of VLMs at the 7-13B scale that strictly outperform InstructBLIP and LLaVa v1.5, the state-of-the-art in open VLMs. Copyright 2024 by the author(s).
As population aging intensifies, the incidence of cardiovascular and cerebrovascular diseases has continued to rise. Vascular interventional surgery, known for its minimal trauma, rapid recovery, and clear effectivene...
With the rapid changes in science and technology, robotics has gradually moved from science fiction to reality. However, progress in this field has not been easy, and robot control methods have always been a...
Exploration in unknown environments plays an important role in the field of mobile robots, with multi-vehicle collaboration showcasing advantages such as parallel processing, fault tolerance, flexibility, and informat...
Cobots have an ethos of sharing a common workspace with humans, which is crucial in a production environment, as humans generally excel in tasks that lean towards dexterity, cognitive reasoning, and decision making. These mutu...
In response to the issues of the traditional Sparrow Search Algorithm (SSA) in path planning for mine cleaning robots, such as being prone to local optima and having slow convergence speed, this paper proposes an improved...
To address the issues of low efficiency and poor uniformity in manually applying a waterproof bonding layer on steel deck surfaces, we developed an automated spraying robot. We established a mathematical model based o...
Informatively planning optimal paths for autonomous mobile robots in indoor environments is essential in real-life cases. In view of the shortcomings of the traditional path planning strategies based on the cameras moun...
This paper introduces a communication and planning framework to facilitate the efficient exchange of state update information between an autonomous robotic system and a human operator under scenarios where continuous robotic monitori...
In response to the safety and reliability concerns in dynamic path planning for dual-robot systems, this paper proposes a spatio-temporal A* path planning algorithm based on minimum manipulability ***, the algorithm e...