In this work, we further investigate the utility of using a large language model as a research assistant to identify research grant funding opportunities that are best suited for a user-defined natural language set of...
The limitations of acquisition equipment often result in scene image data of limited size, posing a challenge for comprehensive analysis of social image datasets. Advances in generative models have introduced image outpainting techniques that expand the size of acquired social scene images, thereby enhancing the value of social image data. Stable diffusion (SD), which benefits from the guidance of caption prompts, shows excellent performance in image outpainting. However, its heavy reliance on manual prompts leads to a significant drawback: a decrease in the quality of generated images without prompts. To overcome this challenge, we propose a novel self-prompt diffusion model for image outpainting that extrapolates images based on the semantics of the source image, thereby removing the dependence on manual prompts. Specifically, we design a prompt autoencoder that uses an autoregressive transformer to map prompt embeddings into their semantic space, facilitating the construction of a semantic decoder. The semantic decoder and prompt embeddings are then cooptimized within the proposed prompt embedding network, allowing the mapping of image features to the stable diffusion prompt embeddings. Furthermore, by exploiting the inherent generative capabilities of diffusion models, we introduce a seam line regeneration mechanism to address the common problem of seam lines when splicing input and generated images. Comparative experiments on the Places2 and COCO datasets show that our method outperforms current state-of-the-art approaches on visual quality metrics and is adaptable to the stable diffusion model without additional fine-tuning.
Medical vision-language pretraining (VLP) that leverages naturally paired medical image-report data is crucial for medical image analysis. However, existing methods struggle to accurately characterize associations between images and diseases, leading to inaccurate or incomplete diagnostic results. In this work, we propose MedFILIP, a fine-grained VLP model that introduces medical image-specific knowledge through contrastive learning. Specifically: 1) An information extractor based on a large language model is proposed to decouple comprehensive disease details from reports; it excels at extracting disease details through flexible prompt engineering, thereby effectively reducing text complexity while retaining rich information at a tiny cost. 2) A knowledge injector is proposed to construct relationships between categories and visual attributes, which helps the model make judgments based on image features and fosters knowledge extrapolation to unfamiliar disease categories. 3) A semantic similarity matrix based on fine-grained annotations is proposed, providing smoother, more informative labels and thus allowing fine-grained image-text alignment. 4) We validate MedFILIP on numerous datasets, e.g., RSNA-Pneumonia, NIH ChestX-ray14, VinBigdata, and COVID-19. For single-label, multi-label, and fine-grained classification, our model achieves state-of-the-art performance, with classification accuracy improving by up to 6.69%.
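To illustrate how a semantic similarity matrix can serve as a smoother training target than one-hot labels, the following is a minimal sketch and not the MedFILIP implementation. It assumes the image-text logits are already computed and that each row of the similarity matrix has a positive sum; the function name `soft_label_contrastive_loss` and the toy values are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(row):
    # Turn a row of non-negative similarities into a target distribution.
    # Assumption: the row sum is positive.
    total = sum(row)
    return [x / total for x in row]

def soft_label_contrastive_loss(logits, semantic_sim):
    """Mean cross-entropy between softmax(logits) and row-normalized targets."""
    loss = 0.0
    for logit_row, sim_row in zip(logits, semantic_sim):
        probs = softmax(logit_row)
        targets = normalize(sim_row)
        loss -= sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0)
    return loss / len(logits)

# Toy setup (hypothetical values): 2 images scored against 3 text descriptions.
logits = [[4.0, 1.0, 0.0], [0.0, 1.0, 4.0]]
hard = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # one-hot pairing
soft = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]   # partially related text gets credit

hard_loss = soft_label_contrastive_loss(logits, hard)
soft_loss = soft_label_contrastive_loss(logits, soft)
```

With one-hot targets the loss only rewards the exact pair. With soft targets the model is also penalized for assigning near-zero probability to a partially related description, which is one plausible reading of "smoother" labels.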
As the amount of data that engineers and analysts need to evaluate increases, the use of Artificial Intelligence (AI) techniques becomes more critical. Big data analytics tools can be used to find patterns and relati...
Financial sentiment analysis is the task of evaluating and quantifying the emotions and opinions expressed in financial news, reports, or social media to help investors and institutions make informed decisions. Financ...
Most existing few-shot image classification methods employ global pooling to aggregate class-relevant local features in a data-driven manner. Because class-relevant regions are difficult to locate accurately in complex scenarios, and because local features are semantically diverse, class-irrelevant information can reduce the robustness of the representations obtained by global pooling. Meanwhile, the scarcity of labeled images exacerbates the difficulty that data-hungry deep models have in identifying class-relevant regions. These issues severely limit the few-shot learning ability of deep models. In this work, we propose to remove class-irrelevant information by making local features class relevant, thus bypassing the challenge of identifying which local features are class irrelevant. The resulting class-irrelevant feature removal (CIFR) method consists of three phases. First, we employ the masked image modeling strategy to build an understanding of images' internal structures that generalizes well. Second, we design a semantic-complementary feature propagation module to make local features class relevant. Third, we introduce a weighted dense-connected similarity measure, based on which a loss function is formulated to fine-tune the entire pipeline, with the aim of further enhancing the semantic consistency of the class-relevant local features. Visualization results show that CIFR removes class-irrelevant information by making local features related to classes. Comparison results on four benchmark datasets indicate that CIFR yields very promising performance.
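The contrast between global pooling and a dense similarity over local features can be sketched as follows. This is an illustrative toy, not the CIFR measure: the abstract does not specify how the weights are computed, so the softmax weighting over pairwise cosine similarities, the `temperature` parameter, and all function names here are assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_pool(features):
    # Global average pooling over a list of local feature vectors.
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def pooled_similarity(query, support):
    # Baseline: compare the two globally pooled representations.
    return cosine(mean_pool(query), mean_pool(support))

def weighted_dense_similarity(query, support, temperature=0.1):
    # Compare every query local feature with every support local feature,
    # then average with softmax weights so strong matches dominate.
    # Assumption: this weighting is a stand-in, not the paper's scheme.
    sims = [cosine(q, s) for q in query for s in support]
    m = max(sims)
    weights = [math.exp((x - m) / temperature) for x in sims]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, sims)) / total

# Toy local features (hypothetical): one shared object patch, one
# mismatched background patch in each image.
query = [[1.0, 0.0], [0.0, 1.0]]
support = [[1.0, 0.0], [0.0, -1.0]]

pooled = pooled_similarity(query, support)
dense = weighted_dense_similarity(query, support)
```

In this toy case the mismatched background patches cancel the shared object patch under global pooling, while the dense measure still picks up the strong object-to-object match.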
With the wide application of channel visualization technology and UAV inspection in the power system, intelligent image recognition technology has achieved initial success in the power transmission and distributio...
Glyph-based visualization is one of the main techniques for visualizing complex multivariate data. With small glyphs, data variables are typically encoded with relatively low visual and perceptual precision. Glyph des...
The following achievements have been made: in the research of the operation image model of the large power grid, a real-time operation image model of the large power grid has been proposed. Corresponding solutions or ...