Understanding and quantifying the capabilities of foundation models, particularly in text-to-image (T2I) generation, is crucial for verifying their alignment with human expectations and practical requirements. However, evaluating T2I foundation models presents significant challenges due to the complex, multi-dimensional psychological factors that influence human preferences for generated images. In this work, we propose MindScore, a multi-view framework for assessing the generation capacity of T2I models through the lens of human preference. Specifically, MindScore decomposes the evaluation into four complementary modules that align with human cognitive processing of images: matching, faithfulness, quality, and realness. The matching module quantifies the semantic alignment between generated images and prompt text, while the faithfulness module measures how accurately the images reflect specific prompt details. Furthermore, we incorporate quality and realness modules to capture deeper psychological preferences, recognizing that unpleasant or distorted images often trigger adverse human responses. Extensive experiments on three T2I datasets with human preference annotations validate the superiority of our proposed MindScore over various state-of-the-art baselines. Our case studies further reveal that MindScore offers valuable insights into T2I generation from a human-centric perspective.
Even though Multiple Input Multiple Output (MIMO) systems are one of the greatest innovations in wireless cellular network technology, they still face many challenges as we progress to 6G, hindering their ability to a...
Image steganography is the art and science of secure communication by concealing information within digital images. In recent years, the techniques of steganographic cost learning have developed rapidly. Although the existing methods can learn satisfactory additive costs, the interplay of different pixels' embedding impacts has not been considered, so the potential of learning may not be fully exploited. To overcome this limitation, in this paper, a reinforcement learning paradigm called JoPoL (joint policy learning) is proposed to extend the idea of additive cost learning to a non-additive situation. JoPoL aims to capture the interactions within pixel blocks by defining embedding policies and evaluating contributions of embedding impacts on a block level rather than a pixel level. Then, a policy network is utilized to learn optimal joint embedding policies for pixel blocks through interactions with the environment. Afterwards, these policies can be converted into joint embedding costs for practical message embedding. The structure of the policy network is designed with an effective attention mechanism and incorporates the domain knowledge derived from traditional non-additive steganographic methods. The environment is responsible for assigning rewards according to the impacts of the sampled joint embedding actions, which are evaluated by the gradient information of a neural network-based steganalyzer. Experimental results show that the proposed non-additive method JoPoL significantly outperforms the existing additive methods against both feature-based and CNN-based steganalyzers over different payloads.
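The abstract does not say how JoPoL converts joint policies into joint costs. As a rough illustration only, the sketch below inverts a Gibbs-form embedding distribution, the relation commonly used in cost-based steganography, so that each action's cost is the log-ratio of the "no change" probability to that action's probability. The function name `costs_from_policy`, the choice of a scale factor of 1, and the two-pixel block are assumptions made for this example, not details from the paper.

```python
import math

def costs_from_policy(policy, null_action):
    """Map action probabilities to costs, assuming p(a) is proportional to exp(-cost(a)).

    `policy` maps each embedding action to its probability; `null_action`
    is the "no change" action, whose cost is fixed at zero.
    """
    p_null = policy[null_action]
    return {action: math.log(p_null / p) for action, p in policy.items()}

# Hypothetical joint policy over a block of two pixels; each action is a
# pair of per-pixel modifications in {-1, 0, +1}. Only a few actions are listed.
joint_policy = {
    (0, 0): 0.70,
    (1, 1): 0.12,    # modifying both pixels in the same direction
    (1, -1): 0.03,   # modifying them in opposite directions
    (1, 0): 0.15,
}
joint_costs = costs_from_policy(joint_policy, null_action=(0, 0))
```

In this toy policy the same-direction action is more probable than the opposite-direction one, so it receives the lower cost; that kind of dependence between pixels is what a purely additive cost cannot express.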
Recommender systems are effective in mitigating information overload, yet the centralized storage of user data raises significant privacy concerns. Cross-user federated recommendation (CUFR) provides a promising distributed paradigm to address these concerns by enabling privacy-preserving recommendations directly on user devices. In this survey, we review and categorize current progress in CUFR, focusing on four key aspects: privacy, security, accuracy, and efficiency. Firstly, we conduct an in-depth privacy analysis, discuss various cases of privacy leakage, and then review recent methods for privacy protection. Secondly, we analyze security concerns and review recent methods for untargeted and targeted attacks. For untargeted attack methods, we categorize them into data poisoning attack methods and parameter poisoning attack methods. For targeted attack methods, we categorize them into user-based methods and item-based methods. Thirdly, we provide an overview of the federated variants of some representative methods, and then review the recent methods for improving accuracy from two categories: data heterogeneity and high-order information. Fourthly, we review recent methods for improving training efficiency from two categories: client sampling and model compression. Finally, we conclude this survey and explore some potential future research topics in CUFR.
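To make the client-sampling idea concrete, here is a minimal sketch of one training round in which the server averages updates from a random subset of clients instead of all of them. It uses plain uniform sampling and unweighted averaging; the function names and the toy updates are hypothetical and are not taken from any method the survey reviews.

```python
import random

def sample_clients(client_ids, fraction, rng):
    """Pick a uniform random subset of clients for this round (at least one)."""
    k = max(1, round(fraction * len(client_ids)))
    return rng.sample(client_ids, k)

def federated_average(updates):
    """Element-wise mean of equally weighted client update vectors."""
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]

def run_round(global_params, client_updates, fraction, rng):
    """Apply the averaged update of the sampled clients to the global parameters."""
    chosen = sample_clients(sorted(client_updates), fraction, rng)
    mean_update = federated_average([client_updates[c] for c in chosen])
    new_params = [g + d for g, d in zip(global_params, mean_update)]
    return new_params, chosen

# Toy updates that four user devices would send back to the server.
client_updates = {
    "a": [1.0, 0.0],
    "b": [3.0, 2.0],
    "c": [5.0, 4.0],
    "d": [7.0, 6.0],
}
params, chosen = run_round([0.0, 0.0], client_updates, fraction=0.5,
                           rng=random.Random(0))
```

With `fraction=0.5` only two of the four devices communicate in the round, which halves the upload traffic at the price of a noisier average.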
Long Short-Term Memory (LSTM) networks are particularly useful in recommender systems since user preferences change over time. Unlike traditional recommender models which assume static user-item interactions, LSTM mod...
Identifying Jackfruit spices using a deep learning model involves leveraging advanced neural network architectures to classify spices accurately. The model is trained on a dataset containing images of various spices c...
Chemistry, as a naturally multimodal discipline, plays a crucial role in various vital fields such as pharmaceutical research and material manufacturing. Therefore, research on artificial intelligence (AI) for chemistry has garnered increasing attention. Despite the rapid development, most of the chemical AI models today mainly focus on single tasks with unimodal input [1].
Authors:
Huang, Tsung-Ren; Hsu, Shin-Min; Fu, Li-Chen
National Taiwan University, Department of Psychology, Center for Artificial Intelligence & Advanced Robotics, Taipei 106319, Taiwan
National Taiwan University, Center for Artificial Intelligence & Advanced Robotics, Department of Electrical Engineering, Department of Computer Science and Information Engineering, Taipei 106319, Taiwan
Being able to recognize emotional intensity is a desirable feature for a facial emotional recognition (FER) system. However, the development of such a feature is hindered by the paucity of intensity-labeled data for m...
Support vector machine (SVM) is a binary classifier widely used in machine learning. However, neglecting the latent data structure in previous SVMs can limit the performance of SVM and its […]. To address this issue, the authors propose a novel SVM with discriminative low-rank embedding (LRSVM) that finds a discriminative latent low-rank subspace more suitable for SVM […]. […] extension models of LRSVM are introduced by imposing different orthogonality constraints to prevent computational inaccuracies. A detailed derivation of the authors' iterative algorithms is given, which essentially solve the SVM on the low-rank […]. […], some theorems and properties of the proposed models are presented by the […]. It is worth mentioning that the subproblems of the proposed algorithms are equivalent to the standard or the weighted linear discriminant analysis (LDA) […]. This indicates that the projection subspaces obtained by the authors' algorithms are more suitable for SVM classification compared to those from the LDA […]. The convergence analysis for the authors' proposed algorithms is also […]. […], the authors conduct experiments on various machine learning data sets to evaluate the […]. The experiment results show that the authors' algorithms perform significantly better than other algorithms, which indicates their superior abilities on classification tasks.
Interference management is one of the most important issues in device-to-device (D2D)-enabled heterogeneous cellular networks (HetCNets) due to the coexistence of massive cellular and D2D devices, in which D2D devices reuse the cellular […]. To alleviate the interference, an efficient interference management approach is to set exclusion zones around the cellular […]. In this paper, we adopt a stochastic geometry approach to analyze the outage probabilities of cellular and D2D users in the D2D-enabled HetCNets. The main difficulties involve three aspects: 1) how to model the location randomness of base stations, cellular and D2D users in practical networks; 2) how to capture the randomness and interrelation of cellular and D2D transmissions due to the existence of random exclusion zones; 3) how to characterize the different types of interference and their impacts on the outage probabilities of cellular and D2D users. We then run extensive Monte-Carlo simulations, which show that our theoretical model is very accurate.
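As a rough illustration of how a Monte-Carlo outage estimate with an exclusion zone can be set up, the sketch below places D2D interferers as a Poisson point process in a disc around a single cellular receiver, silences those inside the exclusion radius, and counts how often the signal-to-interference ratio falls below a threshold. Every modelling choice here (a single receiver at the origin, Rayleigh fading, unit transmit powers, no noise, and all parameter values) is an assumption for the example and not the paper's system model.

```python
import math
import random

def poisson_count(mean, rng):
    """Sample a Poisson(mean) count by counting unit-rate arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def cellular_outage(density, exclusion_radius, trials=2000, seed=0,
                    region_radius=500.0, link_distance=50.0,
                    path_loss_exp=4.0, sir_threshold=1.0):
    """Estimate the outage probability of a cellular receiver at the origin."""
    rng = random.Random(seed)
    mean_points = density * math.pi * region_radius ** 2
    outages = 0
    for _ in range(trials):
        # Desired signal: exponential (Rayleigh) fading times path loss.
        signal = rng.expovariate(1.0) * link_distance ** (-path_loss_exp)
        interference = 0.0
        for _ in range(poisson_count(mean_points, rng)):
            # Distance of a point placed uniformly in the disc.
            dist = region_radius * math.sqrt(rng.random())
            fading = rng.expovariate(1.0)
            # D2D transmitters inside the exclusion zone stay silent.
            if dist > exclusion_radius:
                interference += fading * dist ** (-path_loss_exp)
        if signal < sir_threshold * interference:
            outages += 1
    return outages / trials

no_zone = cellular_outage(density=1e-4, exclusion_radius=0.0, trials=300)
with_zone = cellular_outage(density=1e-4, exclusion_radius=100.0, trials=300)
```

Because both calls use the same seed and draw the same random numbers, they see identical interferer layouts, so `with_zone` can never exceed `no_zone`; the gap between them shows how much the exclusion zone helps under these toy assumptions.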