Current automatic segment extraction techniques for identifying target characters in videos have several limitations, including low accuracy, slow processing speeds, and poor adaptability to diverse scenes. This paper...
Foundation models (FMs) [1] have revolutionized software development and become core components of large software systems. This paradigm shift, however, demands a fundamental re-imagining of software engineering theories and methodologies [2]. Instead of replacing existing software modules implemented by symbolic logic, incorporating FMs' capabilities to build software systems requires entirely new modules that leverage the unique capabilities of ***, while FMs excel at handling uncertainty, recognizing patterns, and processing unstructured data, we need new engineering theories that support the paradigm shift from explicitly programming and maintaining user-defined symbolic logic to creating rich, expressive requirements that FMs can accurately perceive and implement.
Underwater target detection is an important method for detecting marine organisms. However, due to occlusion of underwater targets, turbid water, poor lighting conditions, small targets, and complex...
Understanding and quantifying the capabilities of foundation models, particularly in text-to-image (T2I) generation, is crucial for verifying their alignment with human expectations and practical requirements. However, evaluating T2I foundation models presents significant challenges due to the complex, multi-dimensional psychological factors that influence human preferences for generated images. In this work, we propose MindScore, a multi-view framework for assessing the generation capacity of T2I models through the lens of human preference. Specifically, MindScore decomposes the evaluation into four complementary modules that align with human cognitive processing of images: matching, faithfulness, quality, and realness. The matching module quantifies the semantic alignment between generated images and prompt text, while the faithfulness module measures how accurately the images reflect specific prompt details. Furthermore, we incorporate quality and realness modules to capture deeper psychological preferences, recognizing that unpleasant or distorted images often trigger adverse human responses. Extensive experiments on three T2I datasets with human preference annotations validate the superiority of our proposed MindScore over various state-of-the-art baselines. Our case studies further reveal that MindScore offers valuable insights into T2I generation from a human-centric perspective.
In the field of object detection for remote sensing images, especially in applications such as environmental monitoring and urban planning, significant progress has been made. This paper addresses the common challenge...
Dear Editor, This letter presents a new transfer learning framework for deep multi-agent reinforcement learning (DMARL) to reduce the convergence difficulty and training time when applying DMARL to a new scenario [1], [2].
Images captured under severe weather conditions, such as haze and fog, suffer from image quality degradation caused by atmospheric particle diffusion. This degradation manifests as color fading, reduced contrast, and ...
Apricot detection is a prerequisite for counting and harvesting tasks. Existing algorithms face challenges in adapting to the impacts of complex environmental factors such as lighting variations, shadows, dense foliag...
This paper introduces an advanced road damage detection algorithm that effectively addresses the shortcomings of existing models, including limited detection performance and large parameter sizes, by utilizing the YOL...
Diabetic Retinopathy is a common microvascular complication of diabetes, and early and accurate diagnosis is crucial for minimizing its impact on vision. To address the complexity and diversity of lesions in diabetic ...