To explore the associations among diseases, pathogenesis, physicians, symptoms and drugs, we adapt a variant of the Apriori algorithm to discover association rules in a dataset of the Qing Court Medical Records. We aim to discover five types of semantic associations: Disease-Pathogenesis-Drug set (DPaD), Disease-Symptoms-Drug set (DSyD), Disease-Drug set (DD), Disease-Physician-Drug set (DPhD) and Disease-Drug Category set (DDC). To address the synonymy and data sparseness problems, we propose a mapping strategy that maps pathogenesis terms to standardized forms and drugs to drug categories. With the mapping strategy, the number of frequent drug sets rises from 287 to 1184. The experimental results indicate that our method with the mapping strategy is an effective way to acquire valuable semantic association rules.
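The effect of mapping items to coarser categories before mining can be illustrated with a minimal sketch. This is textbook Apriori, not the adapted variant the abstract refers to, and the records, drug names, category names and support threshold below are all hypothetical, chosen only to show how merging sparse items raises the number of frequent itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical drug -> category mapping (illustrative only, not taken from the dataset).
DRUG_CATEGORY = {
    "ginseng": "qi-tonic",
    "astragalus": "qi-tonic",
    "licorice": "qi-tonic",
    "coptis": "heat-clearing",
    "scutellaria": "heat-clearing",
}


def map_items(transaction, mapping):
    """Replace each item by its mapped form; unmapped items are kept unchanged."""
    return frozenset(mapping.get(item, item) for item in transaction)


def frequent_itemsets(transactions, min_support):
    """Textbook Apriori: return {itemset: count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = Counter(frozenset([i]) for t in transactions for i in t)
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical records: one symptom plus the prescribed drugs.
records = [
    {"cough", "ginseng", "coptis"},
    {"cough", "astragalus", "scutellaria"},
    {"cough", "licorice", "coptis"},
]

raw = frequent_itemsets(records, min_support=2)
mapped = frequent_itemsets([map_items(r, DRUG_CATEGORY) for r in records], min_support=2)
print(len(raw), len(mapped))
```

On this toy data the raw drug names are too sparse to co-occur often, while the mapped categories appear in every record, so more itemsets clear the same support threshold.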
This paper proposes a novel method for breast cancer diagnosis using the features generated by genetic programming (GP). We developed a new individual combination pattern (Composite individual genetic programming) whi...
Fast convergence speed is a desired property for training topic models such as latent Dirichlet allocation (LDA), especially in online and parallel topic modeling algorithms for big data sets. In this paper, we develo...
PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique for exploring document collections. Due to the increasing prevalence of large datasets, there is a need to improve the scalability of ...
To address the issue that existing methods for Expert Homepage Recognition (EHP) usually identify each page separately, regardless of the relationships among the labels of candidate homepages, this paper, integrated utilizin...
There are a number of leaf recognition methods, but most of them are based on Euclidean space. In this paper, we introduce a new feature description for leaf image recognition, which represents the leaf co...
In view of the characteristics of factoid questions and list questions in Question Answering (QA) systems, this paper proposes a domain entity answer ranking model that integrates multiple features. First, for t...
To address the many different classifications of questions and answers and users' shifts from one interest to another, we propose a personalized user model based on multi-kernel support for vector data doma...
When probability and statistics are used for question classification, classifier training relies only on the frequency of the feature words in the question and does not take into account the semantic relationships be...
Traditional machine-learning algorithms struggle to handle the exceedingly large amount of data generated by the internet. In real-world applications, there is an urgent need for machine-learning algorithms that can handle large-scale, high-dimensional text data. Cloud computing involves the delivery of computing and storage as a service to a heterogeneous community of recipients, and it has recently aroused much interest in industry and academia. Most previous work on cloud platforms focuses only on parallel algorithms for structured data. In this paper, we focus on the parallel implementation of web-mining algorithms and develop a parallel web-mining system that includes a parallel web crawler; parallel text extract, transform and load (ETL) and modeling; and parallel text mining and application subsystems. The complete system enables various real-world web-mining applications for mass data.
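The extract, transform and mine stages described above can be sketched as a small map-and-merge pipeline. This is only an illustration under stated assumptions: the pages are hypothetical in-memory strings standing in for crawler output, a thread pool stands in for a cloud cluster, and all function names are invented for the example rather than taken from the system in the abstract.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical crawl results held in memory; a real crawler would fetch pages over HTTP.
PAGES = {
    "http://example.com/a": "<html><body><h1>Cloud mining</h1><p>Parallel text mining.</p></body></html>",
    "http://example.com/b": "<html><body><p>Text mining on cloud platforms.</p></body></html>",
}

TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z]+")


def extract(html):
    """Extract step: strip HTML tags, leaving the visible text."""
    return TAG_RE.sub(" ", html)


def transform(text):
    """Transform step: lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def process_page(item):
    """Map step: turn one (url, html) pair into per-page term counts."""
    _url, html = item
    return Counter(transform(extract(html)))


def mine(pages, workers=4):
    """Process pages concurrently, then merge the partial counts (reduce step)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_page, pages.items()))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


term_counts = mine(PAGES)
print(term_counts.most_common(3))
```

Each page is processed independently and only the small per-page counters are merged, which is the property that lets the same structure be distributed across machines; threads are used here purely to keep the sketch self-contained.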