ISBN (Print): 9781665488679
In deep learning, unsupervised domain adaptation (UDA) is commonly used when abundant labeled data are not available. Several UDA methods have been proposed to overcome the difficulty of distinguishing between semantically similar classes, such as person vs. rider and road vs. sidewalk. This confusion arises because the domain shift collapses the distance between such classes in the feature space. In this work, we present a versatile approach, text-image correlation-guided domain adaptation (TigDA), which maintains sufficient inter-class distance in the feature space to properly adjust the decision boundaries between classes. In our approach, class-level feature information is extracted through text embeddings of the class names, and the text features are aligned with the image features in a cross-modal manner. The resulting cross-modal features are used to generate pseudo-labels and to compute an auxiliary pixel-wise cross-entropy loss that assists the image encoder in learning the distribution of the cross-modal features. This guiding process widens the distance between similar classes in the feature space, so that an adequate margin for adjusting the decision boundaries is maintained. TigDA achieves the highest performance among UDA methods in both single-resolution and multi-resolution settings, using GTA5 and SYNTHIA as the source domains and Cityscapes as the target domain. Owing to its simplicity and versatility, TigDA is widely applicable for enhancing the self-training capability of most UDA methods.
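To make the guidance step concrete, the following PyTorch sketch shows one plausible reading of the mechanism the abstract describes: per-pixel image features are correlated with class text embeddings, the resulting cross-modal prediction supplies pseudo-labels, and an auxiliary pixel-wise cross-entropy loss pulls the segmentation output toward them. This is an illustrative sketch, not the authors' implementation; the names seg_logits, image_feats, text_embeds and the temperature tau are assumptions.

    import torch
    import torch.nn.functional as F

    def tigda_style_guidance(seg_logits, image_feats, text_embeds, tau=0.07):
        # seg_logits:  (B, C, H, W) prediction of the segmentation head
        # image_feats: (B, D, H, W) per-pixel features from the image encoder
        # text_embeds: (C, D) one text embedding per class name
        img = F.normalize(image_feats, dim=1)
        txt = F.normalize(text_embeds, dim=1)

        # Cross-modal logits: cosine similarity of every pixel to every class prompt.
        xmodal_logits = torch.einsum("bdhw,cd->bchw", img, txt) / tau

        # Pseudo-labels derived from the cross-modal prediction.
        pseudo_labels = xmodal_logits.argmax(dim=1)          # (B, H, W)

        # Auxiliary pixel-wise cross-entropy that pushes the image encoder's
        # prediction toward the distribution of the cross-modal features.
        aux_loss = F.cross_entropy(seg_logits, pseudo_labels)
        return pseudo_labels, aux_loss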
Unsupervised domain adaptation (UDA) is an important research topic in semantic segmentation, where pixel-wise annotations are often difficult to collect in the test environment due to their high labeling cost. Previous UDA-based studies trained their segmentation networks using labeled synthetic data and unlabeled real data as the source and target domains, respectively. However, they often fail to distinguish semantically similar classes, such as person vs. rider and road vs. sidewalk, because these classes are prone to confusion in domain-shifted environments. In this paper, we introduce the Language-Conditioned Masked Segmentation Model (LC-MSM), a new framework that jointly learns context relations and domain-agnostic information for domain-adaptive semantic segmentation. Specifically, we reconstruct semantic labels from a masked image, conditioned on the generalized text embeddings of the corresponding semantic classes from OpenCLIP, which contain domain-invariant knowledge learned from large-scale data. To this end, we correlate the generalized text embeddings with the per-pixel image features of the masked image, which encode the spatial context, thereby injecting domain-agnostic language information into the semantic decoder. This helps our model generalize to the target domain by learning context information within individual training instances while considering cross-domain representations spanning the entire dataset. LC-MSM achieves an unprecedented UDA performance of 71.8 and 62.8 mIoU on GTA→Cityscapes and SYNTHIA→Cityscapes, respectively, corresponding to improvements of +3.5 and +1.9 percentage points over the baseline method.
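The masked, language-conditioned reconstruction idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the released LC-MSM code: it assumes a simple patch-masking routine, an encoder that returns per-pixel features, and frozen OpenCLIP class text embeddings; the names encoder, class_text_embeds, mask_ratio, patch and tau are placeholders.

    import torch
    import torch.nn.functional as F

    def lc_msm_style_loss(encoder, images, labels, class_text_embeds,
                          mask_ratio=0.5, patch=32, tau=0.07):
        # images: (B, 3, H, W) source images; labels: (B, H, W) semantic labels;
        # class_text_embeds: (C, D) frozen OpenCLIP text embeddings of the classes.
        B, _, H, W = images.shape

        # Randomly mask square patches of the input image.
        gh, gw = H // patch, W // patch
        keep = (torch.rand(B, 1, gh, gw, device=images.device) > mask_ratio).float()
        mask = F.interpolate(keep, size=(H, W), mode="nearest")
        masked_images = images * mask

        # Per-pixel features of the masked image; the encoder must use the
        # remaining spatial context to reason about the hidden regions.
        feats = F.normalize(encoder(masked_images), dim=1)         # (B, D, h, w)
        txt = F.normalize(class_text_embeds, dim=1)                # (C, D)

        # Correlate the generalized text embeddings with the per-pixel features
        # to obtain language-conditioned class logits.
        logits = torch.einsum("bdhw,cd->bchw", feats, txt) / tau
        logits = F.interpolate(logits, size=(H, W), mode="bilinear",
                               align_corners=False)

        # Reconstruct the semantic labels of the full image from the masked view.
        return F.cross_entropy(logits, labels, ignore_index=255)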
Diffusion models have recently revolutionized the field of text-to-image generation. Their unique way of fusing text and image information underlies their remarkable ability to generate images that closely match the text. Viewed from another perspective, these generative models encode clues about the precise correlation between words and pixels. This work proposes a simple but effective method that exploits the attention mechanism in the denoising network of text-to-image diffusion models. Without any additional training or inference-time optimization, the semantic grounding of phrases can be obtained directly. We evaluate our method on Pascal VOC 2012 and Microsoft COCO 2014 under the weakly-supervised semantic segmentation setting, where it outperforms prior methods. In addition, the acquired word-pixel correlation generalizes to the learned text embeddings of customized generation methods with only a few modifications. To validate our discovery, we introduce a new practical task called "personalized referring image segmentation" with a new dataset. Experiments in various situations demonstrate the advantages of our method over strong baselines on this task. In summary, our work reveals a novel way to extract the rich multi-modal knowledge hidden in diffusion models for segmentation.
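A rough sketch of how cross-attention maps from a denoising network can be turned into a phrase mask is shown below. It is an assumption-laden illustration, not the paper's implementation: it presumes the attention maps have already been collected (e.g. as tensors of shape (heads, H*W, num_tokens) from several layers and timesteps), and the hooking of a specific diffusion model is library-specific and omitted; ground_phrase, phrase_token_ids and out_hw are hypothetical names.

    import torch
    import torch.nn.functional as F

    def ground_phrase(cross_attn_maps, phrase_token_ids, out_hw, threshold=0.5):
        # cross_attn_maps: list of (heads, H*W, num_tokens) tensors collected
        #   from several layers/timesteps of the denoising network.
        # phrase_token_ids: indices of the tokens that make up the query phrase.
        acc = None
        for attn in cross_attn_maps:
            heads, hw, _ = attn.shape
            side = int(hw ** 0.5)                 # assumes a square spatial grid
            # Average over heads and over the tokens belonging to the phrase.
            m = attn.mean(dim=0)[:, phrase_token_ids].mean(dim=-1)   # (H*W,)
            m = m.reshape(1, 1, side, side)
            # Upsample to a common resolution before aggregating across maps.
            m = F.interpolate(m, size=out_hw, mode="bilinear", align_corners=False)
            acc = m if acc is None else acc + m
        acc = acc / len(cross_attn_maps)

        # Normalize to [0, 1] and threshold to obtain a binary mask for the phrase.
        acc = (acc - acc.min()) / (acc.max() - acc.min() + 1e-8)
        return acc.squeeze(0).squeeze(0) > threshold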