Latent Dirichlet allocation (LDA) is a topic model widely used for discovering hidden semantics in massive text corpora. Collapsed Gibbs sampling (CGS), as a widely-used algorithm for learning the parameters of LDA, has the risk of privacy leakage. Specifically, word count statistics and updates of latent topics in CGS, which are essential for parameter estimation, could be employed by adversaries to conduct effective membership inference attacks (MIAs). Till now, there are two kinds of methods exploited in CGS to defend against MIAs: adding noise to word count statistics and utilizing inherent privacy. These two kinds of methods have their respective limitations. Noise sampled from the Laplacian distribution sometimes produces negative word count statistics, which render terrible parameter estimation in practice. The inherent privacy could only provide weak guaranteed privacy when defending against MIAs. It is promising to propose an effective framework to obtain accurate parameter estimations with guaranteed differential privacy. The key issue of obtaining accurate parameter estimations when introducing differential privacy in CGS is making good use of the privacy budget such that a precise noise scale is derived. It is the first time that Rényi differential privacy (RDP) has been introduced into CGS, and we propose RDP-LDA, an effective framework for analyzing the privacy loss of any differentially private CGS. RDP-LDA can be used to derive a tighter upper bound on privacy loss than the overestimated results of existing differentially private CGS obtained by ε-DP. In RDP-LDA, we propose a novel truncated-Gaussian mechanism that keeps word count statistics positive. We also propose distribution perturbation, which could provide more rigorous guaranteed privacy than utilizing inherent privacy. Experiments validate that our proposed methods produce more accurate parameter estimation under the JS-divergence metric, and that MIAs against them achieve lower precision and recall.
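As a rough illustration of the count-perturbation idea described above, the sketch below adds truncated (non-negative) Gaussian noise to a topic-word count matrix. The function name, the rejection-sampling loop, and the noise scale `sigma` are illustrative assumptions, not the paper's actual RDP-LDA mechanism or its calibration.

```python
import numpy as np

def truncated_gaussian_counts(counts, sigma, rng=None):
    """Perturb non-negative word count statistics with Gaussian noise,
    re-drawing any entry that would become negative (illustrative only;
    the real RDP-LDA mechanism and its noise calibration differ)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    # Re-sample noise for entries pushed below zero so that the released
    # statistics stay valid (non-negative) for CGS parameter estimation.
    while np.any(noisy < 0):
        mask = noisy < 0
        noisy[mask] = counts[mask] + rng.normal(0.0, sigma, size=mask.sum())
    return noisy

# Example: a small topic-word count matrix (topics x vocabulary).
counts = np.array([[5.0, 0.0, 2.0], [1.0, 3.0, 0.0]])
print(truncated_gaussian_counts(counts, sigma=1.0))
```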
Part-of-Speech (POS) tagging is a basic task in the field of natural language processing. This paper builds a POS tagger based on an improved hidden Markov model, by employing word clustering and syntactic parsing information. Firstly, in order to overcome the defects of the classical HMM, a new statistical model, the Markov family model (MFM), was introduced. Secondly, to solve the problem of data sparseness, we propose a bottom-up hierarchical word clustering algorithm. Then we combine syntactic parsing with POS tagging. The POS tagging experiments show that the improved POS tagging model has higher performance than hidden Markov models (HMMs) under the same testing conditions; the precision is enhanced from 94.642% to 97.235%.
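For readers unfamiliar with HMM-based tagging, the following minimal Viterbi decoder shows the classical baseline that the improved model builds on. The toy tag set and probability tables are made-up values, and none of this reflects the Markov family model or the clustering and parsing extensions described above.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Standard HMM Viterbi decoding: return the most likely tag sequence
    for `words` (log probabilities avoid numerical underflow)."""
    V = [{t: (math.log(start_p[t]) + math.log(emit_p[t].get(words[0], 1e-6)), None)
          for t in tags}]
    for w in words[1:]:
        V.append({t: max(((V[-1][p][0] + math.log(trans_p[p][t])
                           + math.log(emit_p[t].get(w, 1e-6))), p)
                         for p in tags)
                  for t in tags})
    # Backtrack from the best final state.
    best = max(tags, key=lambda t: V[-1][t][0])
    path = [best]
    for layer in reversed(V[1:]):
        path.append(layer[path[-1]][1])
    return list(reversed(path))

# Toy example with two tags and hand-picked probabilities (illustrative only).
tags = ["N", "V"]
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"dogs": 0.4, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.5}}
print(viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p))  # ['N', 'V']
```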
The cross-domain knowledge diffusion from science to policy is a prevalent phenomenon that demands academic attention. To investigate the characteristics of cross-domain knowledge diffusion from science to policy, this study suggests using the citations of policies to scientific articles as a basis for quantifying the diffusion strength, breadth, and speed. The study reveals that the strength and breadth of cross-domain knowledge diffusion from scientific papers to policies conform to a power-law distribution, while the speed follows a log-normal distribution. Moreover, the papers with the highest diffusion strength and breadth and the fastest diffusion speed are predominantly from world-renowned universities, scholars, and top journals. The papers with the highest diffusion strength and breadth are mostly from the social sciences, especially economics, while those with the fastest diffusion speed are mainly from the medical and life sciences, followed by the social sciences. The findings indicate that cross-domain knowledge diffusion from science to policy follows the Matthew effect, whereby individuals or institutions with high academic achievements are more likely to achieve successful cross-domain knowledge diffusion. Furthermore, papers in the field of economics tend to have higher cross-domain knowledge diffusion strength and breadth, while those in the medical and life sciences have faster cross-domain knowledge diffusion speed.
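As a rough sketch of how such indicators could be computed from policy-citation records, the snippet below counts citing policies (strength), distinct policy-issuing bodies (breadth), and the lag to the first policy citation (speed). The record layout and these specific operationalizations are illustrative assumptions, not the study's exact definitions.

```python
from datetime import date

# Hypothetical policy-citation records for one paper: (policy_body, citation_date).
citations = [
    ("WHO", date(2021, 3, 1)),
    ("EU Commission", date(2022, 7, 15)),
    ("WHO", date(2023, 1, 10)),
]
published = date(2020, 6, 1)

strength = len(citations)                                # number of citing policies
breadth = len({body for body, _ in citations})           # distinct policy-issuing bodies
speed = min((d - published).days for _, d in citations)  # days until first policy citation

print(strength, breadth, speed)
```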
A network of many sensors and a base station deployed over a region is considered. Each sensor has a transmission range, an interference range and a carrier sensing range, which are r, αr and βr, respectively. In this paper, we study the minimum-latency conflict-aware many-to-one data aggregation scheduling problem: given the locations of the sensors and the base station, a subset of all sensors, and the parameters r, α and β, find a schedule in which the data of each sensor in the subset can be transmitted to the base station with no conflicts, such that the latency is minimized. We design an algorithm based on maximal independent sets, which has a latency bound of (a+19b)R + Δb − a + 5 time slots, where a and b are two constant integers depending on α and β, Δ is the maximum degree of the network topology, and R is a trivial lower bound on the latency. Since Δ contributes only an additive factor instead of a multiplicative one, our algorithm achieves a nearly constant approximation ratio of (a+19b).
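To give a concrete picture of the maximal-independent-set building block the algorithm relies on, here is a simple greedy MIS construction over a disk-graph model of the sensors. The coordinates, the range, and the greedy pick order are illustrative; the actual scheduling layers and conflict rules are as defined in the paper.

```python
import math

def maximal_independent_set(positions, r):
    """Greedy maximal independent set on the communication graph in which
    two sensors are adjacent iff their distance is at most r (illustrative)."""
    remaining = set(positions)
    mis = []
    while remaining:
        u = min(remaining)                 # any deterministic pick yields a MIS
        mis.append(u)
        remaining = {v for v in remaining
                     if math.dist(u, v) > r}   # drop u and all of its neighbours
    return mis

# Toy deployment: sensor coordinates with transmission range r = 1.0.
sensors = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0), (2.0, 2.0)]
print(maximal_independent_set(sensors, r=1.0))
```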
Recent research has demonstrated how the widespread adoption of collaborative tagging systems yields emergent semantics. In recent years, much has been learned about how to harvest the data produced by taggers for engineering light-weight ontologies. For example, existing measures of tag similarity and tag relatedness have proven to be crucial stepping stones for making latent semantic relations in tagging systems explicit. However, little progress has been made on other issues, such as understanding the different levels of tag generality (or tag abstractness), which is essential for, among others, identifying hierarchical relationships between concepts. In this paper we aim to address this gap. Starting from a review of linguistic definitions of word abstractness, we first use several large-scale ontologies and taxonomies as grounded measures of word generality, including Yago, WordNet, DMOZ and Wikitaxonomy. Then, we introduce and apply several folksonomy-based methods to measure the level of generality of given tags. We evaluate these methods by comparing them with the grounded measures. Our results suggest that the generality of tags in social tagging systems can be approximated with simple measures. Our work has implications for a number of problems related to social tagging systems, including search, tag recommendation, and the acquisition of light-weight ontologies from tagging data.
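As one simple folksonomy-based proxy for tag generality in the spirit described above, the snippet below computes the entropy of each tag's co-occurrence distribution: more general tags tend to co-occur with a broader, more even set of other tags. This particular measure and the toy posts are illustrative assumptions rather than the paper's evaluated methods.

```python
import math
from collections import Counter

# Toy tag assignments: each post is the set of tags one user attached to a resource.
posts = [
    {"music", "jazz"}, {"music", "rock"}, {"music", "guitar"},
    {"jazz", "saxophone"}, {"rock", "guitar"},
]

def cooccurrence_entropy(tag, posts):
    """Entropy of the distribution of tags co-occurring with `tag`
    (used here as a crude generality score: higher = more general)."""
    co = Counter(t for post in posts if tag in post for t in post if t != tag)
    total = sum(co.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in co.values())

for tag in ["music", "jazz", "saxophone"]:
    print(tag, round(cooccurrence_entropy(tag, posts), 3))
```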
Despite its success, similarity-based collaborative filtering suffers from some limitations, such as scalability, sparsity and recommendation accuracy. Previous work has shown that incorporating a trust mechanism into traditional collaborative filtering recommender systems can alleviate these limitations. We argue that trust-based recommender systems face a novel recommendation attack which is different from the profile injection attacks in traditional recommender systems. To the best of our knowledge, there has not been any prior study on recommendation attacks in a trust-based recommender system. We analyze the attack problem, and find that "victim" nodes play a significant role in the attack. Hence, we propose a data provenance method to trace malicious users and identify the "victim" nodes as distrusted users of the recommender system. A study of the defense method is conducted with a dataset crawled from the Epinions website.
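The provenance idea can be pictured with a very small sketch: given a set of identified malicious users, scan the recorded trust edges and mark the users who placed trust in those profiles as "victim" (distrusted) nodes. The data layout, the direction of the trust edge, and this one-hop rule are assumptions for illustration, not the paper's actual provenance method.

```python
# Hypothetical trust graph: trust_edges[u] = set of users that u trusts.
trust_edges = {
    "alice": {"attacker1", "carol"},
    "bob": {"attacker1"},
    "carol": {"dave"},
    "dave": set(),
}
malicious = {"attacker1"}

# Users who established a trust relationship with a malicious profile are
# flagged as "victim" nodes and treated as distrusted by the recommender.
victims = {u for u, trusted in trust_edges.items() if trusted & malicious}
print(victims)  # {'alice', 'bob'}
```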
Dear editor, This letter presents an unsupervised feature selection method based on machine learning. Feature selection is an important component of artificial intelligence and machine learning, which can effectively alleviate the curse of dimensionality problem. However, most of the labeled data is expensive to obtain.
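A minimal example of the unsupervised setting, assuming nothing about the letter's specific method: rank features by their variance computed from unlabeled data and keep the top k, which requires no labels at all. The toy matrix and the variance criterion are illustrative.

```python
import numpy as np

def select_top_variance_features(X, k):
    """Unsupervised feature selection baseline: keep the k columns of X
    with the largest variance (no labels required)."""
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:k])
    return keep, X[:, keep]

# Toy unlabeled data: 4 samples x 3 features; the middle feature is constant.
X = np.array([[1.0, 5.0, 0.2],
              [2.0, 5.0, 0.9],
              [3.0, 5.0, 0.4],
              [4.0, 5.0, 0.8]])
idx, X_reduced = select_top_variance_features(X, k=2)
print(idx, X_reduced.shape)  # [0 2] (4, 2)
```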
Recently, there has been growing interest in the applications of wireless sensor networks. Given a query point, which is a value, find a set of K nodes whose values are nearest to this point. We call this query the v...
Data partitioning techniques are pivotal for optimal data placement across storage devices, thereby enhancing resource utilization and overall system performance. However, the design of effective partition schemes faces multiple challenges, including considerations of the cluster environment, storage device characteristics, optimization objectives, and the balance between partition quality and computational overhead. Additionally, dynamic environments necessitate robust partition detection mechanisms. This paper presents a comprehensive survey structured around partition deployment environments, outlining the distinguishing features and applicability of various partitioning strategies while delving into how these challenges are addressed. We discuss partitioning features pertaining to database schema, table data, workload, and runtime information. We then delve into the partition generation process, segmenting it into initialization and optimization stages. A comparative analysis of partition generation and update algorithms is provided, emphasizing their suitability for different scenarios and optimization objectives. Finally, we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and challenges. This survey aims to foster the implementation, deployment, and updating of high-quality partitions for specific system scenarios.
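To make the basic notion of a partition scheme concrete, the snippet below assigns rows of a table to partitions by hash and by range on a chosen key. The key choice, partition count, and range boundaries are illustrative examples of the decisions the surveyed techniques optimize, not any specific system's scheme.

```python
import bisect

rows = [{"id": i, "amount": a} for i, a in enumerate([5, 42, 17, 99, 63, 8])]

def hash_partition(rows, key, n_parts):
    """Assign each row to one of n_parts partitions by hashing its key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def range_partition(rows, key, boundaries):
    """Assign each row to a partition by where its key falls among sorted boundaries."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[key])].append(row)
    return parts

print([len(p) for p in hash_partition(rows, "id", 3)])
print([len(p) for p in range_partition(rows, "amount", [20, 60])])
```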
This paper presents exploratory subgroup analytics on ubiquitous data: We propose subgroup discovery and assessment approaches for obtaining interesting descriptive patterns and provide a novel graph-based analysis app...