Extracting emergency events from a large amount of unstructured information is essential for improving early warning and emergency response. Existing event extraction methods for specialist fields often rely on well-d...
Differentiable Neural Architecture Search (NAS) provides a promising avenue for automating the complex design of deep learning (DL) models. However, current differentiable NAS methods often face constraints in efficie...
The brain is a vital part of the human central nervous system, and brain tumors seriously threaten human health. For effective treatment, fully automatic brain tumor segmentation is a very significant ...
Point cloud-based large scale place recognition is an important but challenging task for many applications such as Simultaneous Localization and Mapping (SLAM). Taking the task as a point cloud retrieval problem, prev...
We study the dynamics of a family of replicator maps, depending on two parameters. Such studies are motivated by the analysis of the dynamics of evolutionary games under selections. From the dynamics viewpoint, we pro...
ISBN (print): 9798331314385
Recent large-scale text-to-image generative models have attained unprecedented performance, and adaptor modules such as LoRA and DreamBooth have been developed to extend this performance to unseen concept tokens. However, we empirically find that this workflow often fails to accurately depict out-of-distribution concepts. This failure is closely related to the low quality of the training data. To resolve this, we present a framework called Controllable Adaptor Towards Out-of-Distribution Concepts (CATOD). Our framework follows the active learning paradigm, which alternates high-quality data accumulation with adaptor training, enabling a finer-grained enhancement of generative results. The aesthetics score and the concept-matching score are two major factors that affect the quality of synthetic results. One key component of CATOD is a weighted scoring system that automatically balances these two scores, and we offer a comprehensive theoretical analysis of this point. Based on this scoring system, CATOD then determines how to select data and schedule adaptor training. Extensive results show that CATOD significantly outperforms prior approaches, with an 11.10 boost in CLIP score and a 33.08% decrease in the CMMD metric.
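As a rough illustration of the weighted-scoring idea described in the abstract above, the sketch below ranks candidate images by a convex combination of an aesthetics score and a concept-matching score and keeps the top k. The fixed `weight`, the `Candidate` type, the function names, and the sample scores are all hypothetical; the abstract states that CATOD balances the two scores automatically and does not give the formula, so this is a simplified stand-in rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    aesthetic: float      # assumed to be normalized to [0, 1]
    concept_match: float  # assumed to be normalized to [0, 1]


def weighted_score(candidate: Candidate, weight: float) -> float:
    """Convex combination of the two scores (hypothetical fixed weight)."""
    return weight * candidate.aesthetic + (1.0 - weight) * candidate.concept_match


def select_top_k(candidates: list[Candidate], weight: float, k: int) -> list[Candidate]:
    """Return the k candidates with the highest weighted score."""
    ranked = sorted(candidates, key=lambda c: weighted_score(c, weight), reverse=True)
    return ranked[:k]


# Made-up scores, for illustration only.
candidates = [
    Candidate("a", aesthetic=0.9, concept_match=0.2),
    Candidate("b", aesthetic=0.5, concept_match=0.8),
    Candidate("c", aesthetic=0.3, concept_match=0.9),
]
selected = select_top_k(candidates, weight=0.5, k=2)
```

With an equal weight, the candidate that looks best but matches the concept poorly is dropped, which is the kind of trade-off a weighted score is meant to expose.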
ISBN (digital): 9798350303582
ISBN (print): 9798350303599
The advanced sixth-generation (6G) wireless network is considered an indispensable part of the Metaverse, where a substantial volume of communication content is transmitted through multiple modalities, placing significant transmission loads on communication channels. In this paper, we propose a framework for multi-modal semantic communication that uses a hashing-based semantic extraction approach to produce optimal binary signatures (hash codes). Instead of directly using coarse-grained feature fusion methods, we capture deep semantics in a self-attention manner, achieving fine-grained multi-modal feature fusion and thereby strengthening the representation ability of the hash codes. To enhance adaptability in practical situations, we then design a modality-completion module to address missing modalities in data, accommodating scenarios with both single-modal and cross-modal data. We evaluate the proposed semantic extraction framework on two popular multi-modal datasets, compare it with the latest hashing methods, and demonstrate its effectiveness under various channel conditions.
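To make the notion of binary signatures (hash codes) in the abstract above concrete, the sketch below turns a real-valued feature vector into a binary code by sign thresholding and compares two codes by Hamming distance. Sign thresholding is a common generic way to obtain hash codes and is assumed here for illustration; the abstract does not say how the proposed framework binarizes its fused features, and the function names and feature values are hypothetical.

```python
def binarize(features: list[float]) -> list[int]:
    """Map each feature to one bit: 1 if positive, else 0 (assumed rule)."""
    return [1 if value > 0 else 0 for value in features]


def hamming_distance(code_a: list[int], code_b: list[int]) -> int:
    """Count the bit positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


# Made-up feature vectors standing in for fused multi-modal features.
query_code = binarize([0.7, -0.2, 0.1, -0.9])
item_code = binarize([0.4, 0.3, 0.2, -0.5])
distance = hamming_distance(query_code, item_code)
```

Only the short binary codes need to be sent or compared, which is why hashing is attractive when channel load is the bottleneck.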
Trust plays a vital role in ensuring the security of the Social Internet of Things. Due to the increased mobility of intelligent devices, there are frequent interactions between machines and changes in social relation...
Cross-modal matching shows enormous potential to recognize objects across different sensory modalities, which is fundamental to numerous visual-language tasks like image-text retrieval and visual captioning. Existing ...
With the advent of the era of big data, more and more data are accumulated by the second class in colleges and universities, and how to effectively analyze and use them has become a research focus. A novel clusterin...