Sampling is a fundamental method for generating data *** many data analysis methods are developed based on probability distributions, maintaining distributions when sampling can help to ensure good data analysis ***, sampling a minimum subset while maintaining probability distributions is still a *** this paper, we decompose a joint probability distribution into a product of conditional probabilities based on Bayesian networks and use the chi-square test to formulate a sampling problem that requires that the sampled subset pass the distribution test to ensure the ***, a heuristic sampling algorithm is proposed to generate the required subset by designing two scoring functions: one based on the chi-square test and the other based on likelihood *** on four types of datasets with a size of 60000 show that when the significance level, α, is set to 0.05, the algorithm can exclude 99.9%, 99.0%, 93.1%, and 96.7% of the samples based on their Bayesian networks (ASIA, ALARM, HEPAR2, and ANDES), *** subsets of the same size are sampled, the subset generated by our algorithm passes all the distribution tests and the average distribution difference is approximately 0.03; by contrast, the subsets generated by random sampling pass only 83.8% of the tests, and the average distribution difference is approximately 0.24.
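To make the distribution test in the abstract above concrete, here is a minimal sketch of a chi-square goodness-of-fit check on a sampled subset. It is not the paper's algorithm: it tests only the marginal distribution of a single categorical variable, whereas the paper tests conditional probabilities derived from a Bayesian network. The function names and the small critical-value table are assumptions made for illustration.

```python
from collections import Counter

# Chi-square critical values at significance level 0.05,
# keyed by degrees of freedom (standard table values).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(full, subset):
    """Return (statistic, degrees of freedom) comparing the subset's
    category counts with the counts expected from the full dataset."""
    full_counts = Counter(full)
    sub_counts = Counter(subset)
    n_full, n_sub = len(full), len(subset)
    stat = 0.0
    for value, count in full_counts.items():
        expected = n_sub * count / n_full   # expected count in the subset
        observed = sub_counts.get(value, 0)
        stat += (observed - expected) ** 2 / expected
    return stat, len(full_counts) - 1


def passes_distribution_test(full, subset):
    """True if the subset is not significantly different from the
    full dataset at the 0.05 level (hypothetical helper)."""
    stat, dof = chi_square_statistic(full, subset)
    return stat <= CHI2_CRIT_05[dof]


full = ["a"] * 60 + ["b"] * 40
balanced = ["a"] * 6 + ["b"] * 4   # same proportions as the full data
skewed = ["a"] * 10                # drops category "b" entirely
print(passes_distribution_test(full, balanced))
print(passes_distribution_test(full, skewed))
```

A subset that preserves the category proportions yields a statistic of zero and passes, while a subset that drops a category exceeds the critical value and is rejected. With expected counts this small the chi-square approximation is rough, so the example is only meant to show the mechanics.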
Generalized Category Discovery (GCD) is a crucial task that aims to recognize both known and novel categories from a set of unlabeled data by utilizing a few labeled data with only known categories. Due to the lack of...
Multimodal Sentiment Analysis (MSA) is an attractive research that aims to integrate sentiment expressed in textual, visual, and acoustic signals. There are two main problems in the existing methods: 1) the dominant r...
In social choice, all alternatives are ranked by agents to form preferences as linear orders. However, in applications, sometimes some alternatives cannot be ranked, or it is unnecessary to rank them, which leads to unranked alternatives. Hence, without loss of generality, by dividing the set of alternatives into three ranked and unranked subsets, including top-k alternatives, intermediate-r alternatives, and last-l alternatives, the Mallows model on ranked and unranked preferences can be analyzed systematically. Technically, a repeated insertion model is adopted during sampling, and probability distributions are derived for ranked and unranked preferences of alternatives. Experimental results verify the accuracy of the probability distributions for different ranked and unranked preferences of alternatives. Furthermore, in order to solve the preference completion problem where agents have multiple partial rankings, a fuzzy preference completion algorithm, Fuzzy-Multi-Rankings, is proposed, which introduces a fuzzy ranking to complete the target agent's preference in addition to the traditional nearest-neighbor-based methods. Based on the three ranked and unranked preferences, seven cases can be classified and analyzed for fuzzy preference completion. Experiments on the synthetic datasets and MovieLens dataset confirm the effectiveness and efficiency of our proposed Fuzzy-Multi-Rankings algorithm and also verify the accuracy of the evaluated probability distributions for the proposed seven cases.
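The repeated insertion model mentioned in the abstract above can be sketched as follows for the standard Mallows model over full rankings. This is a generic illustration and not the paper's code: it does not handle the top-k, intermediate-r, or last-l unranked subsets. The function name and parameters are assumptions; `phi` denotes the dispersion parameter, assumed to lie in [0, 1].

```python
import random


def sample_mallows_rim(reference, phi, rng=random):
    """Draw one ranking from a Mallows model via repeated insertion.

    The i-th item of the reference ranking is inserted at position j
    (1 <= j <= i) with probability proportional to phi ** (i - j).
    """
    ranking = []
    for i, item in enumerate(reference, start=1):
        positions = range(1, i + 1)
        weights = [phi ** (i - j) for j in positions]
        j = rng.choices(positions, weights=weights, k=1)[0]
        ranking.insert(j - 1, item)   # convert to a 0-based list index
    return ranking


rng = random.Random(0)
print(sample_mallows_rim(["a", "b", "c", "d"], 0.0, rng))  # reference order
print(sample_mallows_rim(["a", "b", "c", "d"], 1.0, rng))  # uniform draw
```

With `phi = 0` every item is appended in last place, so the sample always equals the reference ranking; with `phi = 1` all insertion positions are equally likely and the sample is a uniformly random permutation.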
Knowledge representation learning is usually used in knowledge reasoning and other related fields. Its goal is to use low-dimensional vectors to represent the entities and relations in a knowledge graph. In the proces...
ISBN (digital): 9781665488105
ISBN (print): 9781665488112
Early classification of time series aims to accurately predict the class label of a time series as early as possible, which is significant but challenging in many time-sensitive applications. Existing early classification methods hold a basic closed-world assumption that the classifier must have seen the classes of test samples. However, new samples that do not belong to any trained class may appear in the real world. In this paper, we first address the early classification in an open world and design two detectors to identify which known class or unknown class a sample belongs to. Specifically, based on the observed data, an early known-class detector is designed to determine the known-class confidence and an early unknown-class detector is designed to determine the unknown-class confidence according to the Minimum Reliable Length (MRL) and the Weibull distribution of each class. Experimental results evaluated on real-world datasets demonstrate that the proposed model can identify samples of unknown and known classes accurately and early.
Authors:
Liu, Kang; Xue, Feng; Guo, Dan; Wu, Le; Li, Shujie; Hong, Richang
Hefei University of Technology, School of Computer Science and Information Engineering, Key Laboratory of Knowledge Engineering with Big Data, Intelligent Interconnected Systems Laboratory of Anhui Province, 485 Danxia Road, Hefei 230601, Anhui Province, China
Hefei University of Technology, School of Software, Key Laboratory of Knowledge Engineering with Big Data, Intelligent Interconnected Systems Laboratory of Anhui Province, 485 Danxia Road, Hefei 230601, Anhui Province, China
In most E-commerce platforms, whether the displayed items trigger the user's interest largely depends on their most eye-catching multimodal content. Consequently, increasing efforts focus on modeling multimodal user preference, and the pressing paradigm is to incorporate complete multimodal deep features of the items into the recommendation module. However, the existing studies ignore the mismatch problem between multimodal feature extraction (MFE) and user interest modeling (UIM). That is, MFE and UIM have different emphases. Specifically, MFE is migrated from and adapted to upstream tasks such as image classification. In addition, it is mainly a content-oriented and non-personalized process, while UIM, with its greater focus on understanding user interaction, is essentially a user-oriented and personalized process. Therefore, the direct incorporation of MFE into UIM for purely user-oriented tasks tends to introduce a large amount of preference-independent multimodal noise and contaminate the embedding representations in UIM. This paper aims at solving the mismatch problem between MFE and UIM, so as to generate high-quality embedding representations and better model multimodal user preferences. Towards this end, we develop a novel model, multimodal entity graph collaborative filtering, MEGCF for short. The UIM of the proposed model captures the semantic correlation between interactions and the features obtained from MFE, thus making a better match between MFE and UIM. More precisely, semantic-rich entities are first extracted from the multimodal data, since they are more relevant to user preferences than other multimodal information. These entities are then integrated into the user-item interaction graph. Afterwards, a symmetric linear Graph Convolution Network (GCN) module is constructed to perform message propagation over the graph, in order to capture both high-order semantic correlation and collaborative filtering signals. Finally, the sentiment information fr
Purpose: We attempt to find out whether OA or TA really affects the dissemination of scientific ***/methodology/approach: We design the indicators, hot-degree, and R-index to indicate a topic OA or TA ***, according to the OA classification of the Web of Science (WoS), we collect data from the WoS by downloading OA and TA articles, letters, and reviews published in Nature and Science during 2010–*** papers are divided into three broad disciplines, namely biomedicine, physics, and ***, taking a discipline in a journal and using the classical Latent Dirichlet Allocation (LDA) to cluster 100 topics of OA and TA papers respectively, we apply the Pearson correlation coefficient to match the topics of OA and TA, and calculate the hot-degree and R-index of every OA-TA topic ***, characteristics of the discipline can be *** qualitative comparison, we choose some high-quality papers which belong to Nature remarkable papers or Science breakthroughs, and analyze the relations between OA/TA and citation ***: The result shows that OA hot-degree in biomedicine is significantly greater than that of TA, but significantly less than that of TA in *** on the R-index, it is found that OA advantages exist in biomedicine and TA advantages do in ***, the dissemination of average scientific discoveries in all fields is not necessarily affected by OA or ***, OA promotes the spread of important scientific discoveries in high-quality *** limitations: We lost some citations by ignoring other open sources such as arXiv and *** limitation came from that Nature employs some strong measures for access-promoting subscription-based articles, on which the boundary between OA and TA became *** implications: It is useful to select hot topics in a set of publications by the hot-degree *** finding comprehensively reflects the differences of OA and TA in different disciplines, which is a u
In this work, we address the challenging task of Generalized Referring Expression Comprehension (GREC). Compared to the classic Referring Expression Comprehension (REC) that focuses on single-target expressions, GREC ...
Few-shot relation extraction with none-of-the-above (FsRE with NOTA) aims at predicting labels in few-shot scenarios with unknown classes. FsRE with NOTA is more challenging than the conventional few-shot relation ext...