The cross-domain diffusion of knowledge from science to policy is a prevalent phenomenon that demands academic attention. To investigate its characteristics, this study proposes using policy citations to scientific articles as the basis for quantifying diffusion strength, breadth, and speed. The study reveals that the strength and breadth of cross-domain knowledge diffusion from scientific papers to policies follow a power-law distribution, while the speed follows a log-normal distribution. Moreover, the papers with the highest diffusion strength and breadth and the fastest diffusion speed come predominantly from world-renowned universities, scholars, and top journals. The papers with the highest diffusion strength and breadth are mostly from the social sciences, especially economics, while those with the fastest diffusion speed are mainly from the medical and life sciences, followed by the social sciences. The findings indicate that cross-domain knowledge diffusion from science to policy exhibits the Matthew effect: individuals and institutions with high academic standing are more likely to achieve successful cross-domain knowledge diffusion. Furthermore, papers in economics tend to have higher cross-domain diffusion strength and breadth, while those in the medical and life sciences tend to diffuse faster. (86th Annual Meeting of the Association for Information Science & Technology, Oct. 27–31, 2023, London, United Kingdom.)
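As a rough, hedged illustration of how such metrics could be computed from citation records, the sketch below defines strength as the number of citing policy documents, breadth as the number of distinct policy domains among them, and speed as the delay to the first policy citation; the exact operationalization, field names, and sample data are assumptions, not the paper's code.

from dataclasses import dataclass

@dataclass
class PolicyCitation:
    policy_id: str
    domain: str          # policy area of the citing document (assumed field)
    citation_year: int

def diffusion_metrics(pub_year, citations):
    strength = len(citations)                        # how many policies cite the paper
    breadth = len({c.domain for c in citations})     # how many distinct policy domains
    speed = (min(c.citation_year for c in citations) - pub_year
             if citations else None)                 # years until first policy citation
    return strength, breadth, speed

cites = [PolicyCitation("P1", "health", 2019),
         PolicyCitation("P2", "economics", 2020),
         PolicyCitation("P3", "health", 2021)]
print(diffusion_metrics(2017, cites))   # -> (3, 2, 2)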
The processing of XML queries entails the evaluation of various structural relationships. Efficient algorithms for evaluating ancestor-descendant and parent-child relationships have been proposed, whereas the problems of evaluating preceding-sibling/following-sibling and preceding/following relationships remain open. In this paper, we study the structural join and the staircase join for the sibling relationship. First, we introduce the idea of filtering out and minimizing unnecessary reads of elements using the parent's structural information, which can be used to accelerate structural joins for parent-child and preceding-sibling/following-sibling relationships. Second, we propose two efficient structural join algorithms for the sibling relationship. These algorithms achieve optimal join performance: nodes that do not participate in the join can be identified beforehand and skipped using a B+-tree index. In addition, each joined element list is scanned sequentially at most once, and the join results are output in document order. We also discuss the staircase join algorithm for sibling axes. Our study shows that the staircase join for sibling axes performs close to the structural join for sibling axes and shares the same high efficiency. Our experimental results both demonstrate the effectiveness of our optimization techniques for sibling axes and validate the efficiency of our algorithms. To the best of our knowledge, this is the first work to address this problem specifically.
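For intuition, here is a minimal sketch of a preceding-sibling join under region encoding, where each element is a (start, end, parent_start) triple, two elements are siblings iff they share parent_start, and a precedes b iff a.end < b.start. The encoding choice is an assumption for illustration; the paper's B+-tree skipping and staircase variants are not reproduced.

from collections import defaultdict

def preceding_sibling_join(a_list, b_list):
    # Group candidates from a_list by their parent's start position.
    by_parent = defaultdict(list)
    for a in a_list:                       # a = (start, end, parent_start)
        by_parent[a[2]].append(a)
    results = []
    for b in b_list:                       # b = (start, end, parent_start)
        for a in by_parent.get(b[2], ()):  # only same-parent elements qualify
            if a[1] < b[0]:                # a ends before b starts -> a precedes b
                results.append((a, b))
    return results                         # emitted in document order of b

# Example document <r><a/><b/><a/><b/></r>, with r spanning (0, 9).
a_list = [(1, 2, 0), (5, 6, 0)]
b_list = [(3, 4, 0), (7, 8, 0)]
print(preceding_sibling_join(a_list, b_list))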
ISBN:
(print) 9783642235344; 9783642235351
Duplicate detection is well recognized as a crucial task for improving data quality. Related work on this problem mainly proposes efficient approaches on a single machine. However, with the increasing volume of data, the performance of duplicate identification remains far from satisfactory. Hence, we address the problem of duplicate detection over MapReduce, a shared-nothing paradigm. We argue that the performance of duplicate detection with MapReduce depends mainly on the number of candidate record pairs. In this paper, we propose a new signature scheme with a new pruning strategy over MapReduce to minimize the number of candidate record pairs. Our experimental results on both real and synthetic datasets demonstrate that the proposed signature-based method is efficient and scalable.
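A minimal single-machine simulation of the signature-based blocking idea in MapReduce style: map emits (signature, record) pairs, the shuffle groups records by signature, and reduce compares only records within a group, so cost is governed by the number of candidate pairs. The first-character signature and bigram-Jaccard matcher below are crude stand-ins, not the paper's scheme.

from collections import defaultdict
from itertools import combinations

def jaccard(a, b):
    # Similarity over character bigrams.
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(A & B) / len(A | B)

def map_phase(records):
    # Map: emit (signature, record); the lower-cased first character
    # is a deliberately crude block key.
    for rid, text in records:
        yield text.lower()[0], (rid, text)

def reduce_phase(groups, threshold=0.5):
    # Reduce: compare only records that share a signature.
    for _, recs in groups.items():
        for (id1, t1), (id2, t2) in combinations(recs, 2):
            if jaccard(t1.lower(), t2.lower()) >= threshold:
                yield id1, id2

records = [(1, "John Smith"), (2, "Jon Smith"), (3, "Mary Jones")]
groups = defaultdict(list)
for key, rec in map_phase(records):
    groups[key].append(rec)            # stands in for the shuffle step
print(list(reduce_phase(groups)))      # -> [(1, 2)]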
Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Most current approaches to FIM under LDP add "padding and sampling" steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items. The current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate estimations. Thus, an unbiased FIM approach with lower variance is highly desirable. To narrow this gap, we propose an item-level LDP frequency oracle approach, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed-length vector, to which perturbation can subsequently be applied. An FIM approach, called optimized united itemset mining (O-UISM), is proposed to combine the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies. Finally, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.
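For intuition about Hadamard-based frequency oracles, the sketch below implements standard Hadamard randomized response for a single item per user (not the paper's set-valued IHFO construction): each user reports one randomly chosen coordinate of the Hadamard column for their item, flipped with probability 1/(e^eps + 1), and the server debiases and inverts the transform.

import math, random

def ldp_hadamard_report(item, d, eps):
    # Each user samples one Hadamard row j and reports a noisy H[j][item].
    j = random.randrange(d)
    bit = (-1) ** bin(j & item).count("1")        # H[j][item] = (-1)^<j,item>
    p_keep = math.exp(eps) / (math.exp(eps) + 1)
    if random.random() > p_keep:
        bit = -bit                                # randomized-response flip
    return j, bit

def estimate_frequencies(reports, d, eps, n):
    p_keep = math.exp(eps) / (math.exp(eps) + 1)
    scale = 1.0 / (2 * p_keep - 1)                # unbias the flipped bits
    coef = [0.0] * d
    for j, bit in reports:
        coef[j] += bit * scale * d / n            # Hadamard coefficient estimates
    # Inverse transform: f[v] = (1/d) * sum_j coef[j] * H[j][v]
    return [sum(coef[j] * (-1) ** bin(j & v).count("1") for j in range(d)) / d
            for v in range(d)]

random.seed(0)
d, eps, n = 8, 2.0, 20000
items = [random.choice([1, 1, 1, 5]) for _ in range(n)]   # true freqs: 1 -> .75, 5 -> .25
reports = [ldp_hadamard_report(v, d, eps) for v in items]
print([round(f, 2) for f in estimate_frequencies(reports, d, eps, n)])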
Head-driven statistical models for natural language parsing are the most representative lexicalized syntactic parsing models, but they only utilize semantic dependency between words and do not incorporate other semantic information such as semantic collocation and semantic category. Several improvements to this parser are presented. Firstly, valency is an essential semantic feature of words: once the valency of a word is determined, its collocations are clear, and the sentence structure can be directly derived. Thus, a syntactic parsing model combining valency structure with semantic dependency is proposed, building on head-driven statistical parsing models. Secondly, semantic role labeling (SRL) is essential for deep natural language processing, so an integrated parsing approach is proposed that folds semantic parsing into the syntactic parsing process. Experiments with the refined statistical parser show 87.12% precision and 85.04% recall, and the F-measure is improved by 5.68% compared with the head-driven parsing model introduced by Collins.
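A toy sketch of the valency idea (hypothetical numbers and names, not the paper's model): the score of attaching a dependent to a head is conditioned not only on the head word but also on how many argument slots the head still has open.

HEAD_DEP = {("gives", "NP"): 0.6, ("gives", "PP"): 0.3}   # P(dependent label | head)
VALENCY = {"gives": 3}                                    # "gives" expects 3 arguments

def attach_score(head, dep, args_so_far):
    base = HEAD_DEP.get((head, dep), 0.01)
    slots_left = VALENCY.get(head, 1) - args_so_far
    # Open slots keep the base score; attachments beyond the head's
    # valency are penalized tenfold.
    return base if slots_left > 0 else base * 0.1

print(attach_score("gives", "NP", 1))   # open slots remain -> 0.6
print(attach_score("gives", "NP", 3))   # valency saturated -> penalized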
The volume of RDF data has grown very fast within the last five years; e.g., the Linked Open Data cloud has grown from 2 billion to 50 billion RDF triples. With its wonderful scalability, cloud computing platforms like...
Data partitioning techniques are pivotal for optimal data placement across storage devices, thereby enhancing resource utilization and overall system performance. However, the design of effective partition schemes faces multiple challenges, including considerations of the cluster environment, storage device characteristics, optimization objectives, and the balance between partition quality and computational overhead. Moreover, dynamic environments necessitate robust partition detection methods. This paper presents a comprehensive survey structured around partition deployment environments, outlining the distinguishing features and applicability of various partitioning strategies while delving into how these challenges are addressed. We discuss partitioning features pertaining to database schema, table data, workload, and runtime metrics. We then delve into the partition generation process, segmenting it into initialization and optimization stages. A comparative analysis of partition generation and update algorithms is provided, emphasizing their suitability for different scenarios and optimization objectives. Finally, we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and solutions. This survey aims to foster the implementation, deployment, and updating of high-quality partitions for specific system scenarios.
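As a toy illustration of the two-stage generation process the survey describes, the sketch below initializes range-partition boundaries from equal-depth quantiles and then greedily jitters them against a workload cost; the cost model and greedy move are illustrative assumptions, not drawn from any surveyed system.

import random

def init_range_partitions(keys, k):
    # Initialization stage: place k-1 boundaries at equal-depth quantiles.
    keys = sorted(keys)
    step = len(keys) // k
    return [keys[i * step] for i in range(1, k)]

def partition_of(key, bounds):
    return sum(key >= b for b in bounds)

def workload_cost(queries, bounds):
    # Cost model: how many partitions each range query has to touch.
    return sum(partition_of(hi, bounds) - partition_of(lo, bounds) + 1
               for lo, hi in queries)

def optimize(bounds, queries, steps=200):
    # Optimization stage: greedily jitter boundaries to cut workload cost.
    best = list(bounds)
    for _ in range(steps):
        cand = list(best)
        i = random.randrange(len(cand))
        cand[i] += random.choice([-1, 1])
        if cand == sorted(cand) and workload_cost(queries, cand) < workload_cost(queries, best):
            best = cand
    return best

random.seed(1)
keys = list(range(100))
queries = [(24, 26), (60, 70)]        # the first query straddles boundary 25
b0 = init_range_partitions(keys, 4)   # -> [25, 50, 75]
print(b0, optimize(b0, queries))      # the hot boundary moves off the query range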
Part-of-speech (POS) tagging is a basic task in natural language processing. This paper builds a POS tagger based on an improved hidden Markov model, employing word clustering and syntactic parsing information. Firstly, in order to overcome the defects of the classical HMM, a new statistical model, the Markov family model (MFM), is introduced. Secondly, to solve the problem of data sparseness, we propose a bottom-up hierarchical word clustering algorithm. We then combine syntactic parsing with part-of-speech tagging. The POS tagging experiments show that the improved tagging model achieves higher performance than hidden Markov models (HMMs) under the same testing conditions: precision is enhanced from 94.642% to 97.235%.
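For reference, a minimal Viterbi decoder for the classical HMM baseline that the paper improves on (toy probabilities; the Markov family model, word clustering, and parsing integration are not reproduced here):

import math

STATES = ["N", "V"]
TRANS = {("<s>", "N"): 0.7, ("<s>", "V"): 0.3,
         ("N", "N"): 0.4, ("N", "V"): 0.6,
         ("V", "N"): 0.8, ("V", "V"): 0.2}
EMIT = {("N", "dogs"): 0.6, ("V", "dogs"): 0.1,
        ("N", "bark"): 0.1, ("V", "bark"): 0.7}

def viterbi(words):
    # Log-space scores for the first word, then the usual DP recursion.
    prev = {s: math.log(TRANS[("<s>", s)] * EMIT.get((s, words[0]), 1e-6))
            for s in STATES}
    back = []
    for w in words[1:]:
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[(p, s)]))
            cur[s] = prev[best] + math.log(TRANS[(best, s)] * EMIT.get((s, w), 1e-6))
            ptr[s] = best
        prev, back = cur, back + [ptr]
    # Trace back from the best final state through the stored pointers.
    tags = [max(prev, key=prev.get)]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return tags[::-1]

print(viterbi(["dogs", "bark"]))   # -> ['N', 'V']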
Dear editor, This letter presents an unsupervised feature selection method based on machine learning. Feature selection is an important component of artificial intelligence and machine learning, which can effectively alleviate the curse of dimensionality. However, most labeled data is expensive to obtain.
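Since only the opening of the letter is shown, the sketch below merely illustrates the label-free setting with a common baseline criterion, variance ranking: features are scored without labels and the most variable ones are kept. This is not the letter's actual method.

def variances(X):
    # Per-feature variance, computed without any labels.
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [sum((row[j] - means[j]) ** 2 for row in X) / n
            for j in range(len(X[0]))]

def select_top_k(X, k):
    v = variances(X)
    return sorted(range(len(v)), key=lambda j: -v[j])[:k]

X = [[1.0, 0.0, 5.1],   # feature 1 is nearly constant and carries little information
     [2.0, 0.0, 0.3],
     [3.0, 0.1, 2.2]]
print(select_top_k(X, 2))   # -> [2, 0]: keep the two most variable features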
Implementing runtime integrity measurement in an acceptable way is a big challenge. We tackle this challenge by developing a framework called Patos. This paper discusses the design and implementation concepts of our o...