ISBN: 3540405046 (print)
Multi-label decision procedures are the target of the supervised learning algorithm we propose in this paper. Multi-label decision procedures map examples to a finite set of labels. Our learning algorithm extends Schapire and Singer's *** and produces sets of rules that can be viewed as trees, like Alternating Decision Trees (invented by Freund and Mason). Experiments show that we gain in both performance and readability by using boosting techniques together with tree representations of large sets of rules. Moreover, a key feature of our algorithm is its ability to handle heterogeneous input data: discrete values, continuous values, and text data.
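As a rough illustration of how a set of rules can act as a multi-label decision procedure, the sketch below sums per-label votes from every rule whose test holds and returns the labels with a positive total. This is only an assumption about the general voting idea, not the algorithm from the paper: the rule format, the attribute names (`length`, `source`, `text`), the label names, and all weights are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Example = Dict[str, Any]
# A rule pairs a test on the example with a real-valued vote for some labels.
Rule = Tuple[Callable[[Example], bool], Dict[str, float]]


def predict_labels(rules: List[Rule], labels: List[str], x: Example) -> List[str]:
    """Return every label whose summed vote is strictly positive."""
    scores = {label: 0.0 for label in labels}
    for condition, votes in rules:
        if condition(x):  # only rules whose test holds contribute
            for label, vote in votes.items():
                scores[label] += vote
    return [label for label in labels if scores[label] > 0]


# Hypothetical rules over heterogeneous attributes (all values are made up).
LABELS = ["sports", "finance"]
RULES: List[Rule] = [
    # Default vote that applies to every example.
    (lambda x: True, {"sports": -0.5, "finance": -0.5}),
    # Test on a continuous attribute.
    (lambda x: x["length"] > 100.0, {"finance": 0.75}),
    # Test on a discrete attribute.
    (lambda x: x["source"] == "newswire", {"sports": 0.25, "finance": 0.25}),
    # Test on a text attribute (word presence).
    (lambda x: "match" in str(x["text"]).lower().split(), {"sports": 1.0}),
]
```

Because each label has its own running total, an example can receive several labels, one label, or none.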
ISBN: 9781450365628 (print)
Breast cancer is one of the diseases with the highest incidence and mortality in the world. Data mining classification techniques can be effective tools for classifying cancer data to facilitate decision-making. The objective of this paper is to compare the performance of different machine learning algorithms in the diagnosis of breast cancer, that is, in determining whether a tumor is benign or malignant. Six machine learning algorithms were evaluated in this research: Bayes Network (BN), Support Vector Machine (SVM), k-nearest neighbors (KNN), Artificial Neural Network (ANN), Decision Tree (C4.5), and Logistic Regression. The algorithms are run using the WEKA tool (Waikato Environment for Knowledge Analysis) on the Wisconsin breast cancer dataset available in the UCI machine learning repository.
ISBN: 9781450394079 (print)
Graphs, which encode pairwise relations between entities, are a kind of universal data structure for a lot of real-world data, including social networks, transportation networks, and chemical molecules. Many important applications on these data can be treated as computational tasks on graphs. Recently, machine learning techniques have been widely developed and used to tame graphs effectively, discovering actionable patterns and harnessing them to advance various graph-related computational tasks. Huge success has been achieved, and numerous real-world applications have benefited from it. However, since today we generate and gather data in a much faster and more diverse way, real-world graphs are becoming increasingly large-scale and complex. More dedicated efforts are needed to propose more advanced machine learning techniques and to deploy them properly for real-world applications in a scalable way. Thus, we organize the 3rd International Workshop on Machine Learning on Graphs (MLoG), held in conjunction with the 16th ACM Conference on Web Search and Data Mining (WSDM), which provides a venue for academic researchers and industry researchers and practitioners to present recent progress on machine learning on graphs.
ISBN: 0780388232 (print)
This paper discusses a new approach for developing a service-oriented infrastructure for distributed data mining applications. The proposed architecture hides the complexity of implementation details and enables users to perform data mining in a utility-like fashion. The service-oriented architecture provides an autonomic data mining framework where self-describing data mining services can be automatically discovered on the Internet. Moreover, this structure allows for the implementation of data mining algorithms for processing data on more than one site in a distributed manner. The performance of the proposed distributed data mining framework is compared to a standard data mining approach to demonstrate its effectiveness.
ISBN: 9781665417907 (print)
Students' employment prospects are an important indicator that can help colleges and universities steadily produce graduates who meet the actual needs of society [1]. However, the definition of students' employment prospects and of the factors that influence them is not obvious. On this basis, this paper selects three types of indicators (career choice, salary, and self-realization) to form an indicator system for students' employment prospects, and uses random forest, SVM, and GBDT models for analysis. The comparative analysis of the data shows that the GBDT model has good predictive ability, which suggests that this model is better suited to the needs of educational data mining.
In the dynamic era of online education, the pursuit of a personalized and effective learning experience is paramount. A transformative approach in online education by integrating Multimodal data mining and data Synthe...
ISBN: 9781538649916 (print)
Research in the data mining area has been increasing continuously. Applying data mining to agriculture, for example to predict rice production for farmers, is still challenging. The objective of this research is to propose a model, built with machine learning techniques, that compares the Decision Tree technique with the Artificial Neural Network (ANN) technique for predicting farmers' rice production. With it, farmers can predict the volume of rice produced and the selling price, which helps them increase their income. The research follows the Cross-Industry Standard Process for Data Mining (CRISP-DM). The models are built with machine learning techniques in experiments on a dataset of farmer records. Model performance is measured with four options: Test Options, Cross-Validation with 10 folds, an 80-20 Split, and Use Training Set. The accuracies obtained with the four options are then averaged. The experimental results show that the technique with the highest accuracy can be helpful for farmers in the real world.
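The evaluation protocol described above (10-fold cross-validation, an 80-20 split, and testing on the training set, followed by averaging) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: a majority-class baseline stands in for the Decision Tree and ANN models, records are not shuffled so that the result is reproducible, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[Sequence[float], str]  # (features, label)


def majority_label(train: List[Record]) -> str:
    """Placeholder 'model': always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def baseline_accuracy(train: List[Record], test: List[Record]) -> float:
    """Fraction of test records whose label matches the baseline prediction."""
    guess = majority_label(train)
    return sum(label == guess for _, label in test) / len(test)


def cross_validation_accuracy(data: List[Record], folds: int = 10) -> float:
    """Mean accuracy over `folds` folds; record j belongs to fold j % folds."""
    scores = []
    for i in range(folds):
        test = [r for j, r in enumerate(data) if j % folds == i]
        train = [r for j, r in enumerate(data) if j % folds != i]
        scores.append(baseline_accuracy(train, test))
    return sum(scores) / folds


def split_accuracy(data: List[Record], train_fraction: float = 0.8) -> float:
    """Train on the first 80% of the records and test on the remaining 20%."""
    cut = int(len(data) * train_fraction)
    return baseline_accuracy(data[:cut], data[cut:])


def training_set_accuracy(data: List[Record]) -> float:
    """Train and test on the same records (an optimistic estimate)."""
    return baseline_accuracy(data, data)


def averaged_accuracy(data: List[Record]) -> float:
    """Average the accuracies of the three evaluation modes."""
    parts = [
        cross_validation_accuracy(data),
        split_accuracy(data),
        training_set_accuracy(data),
    ]
    return sum(parts) / len(parts)
```

In practice the records would normally be shuffled (or stratified) before splitting; the fixed order here only keeps the sketch deterministic.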
Author: Zhang, Fa
Affiliation: Zhuhai Sch, Beijing Inst Technol, Dept Business Adm, Zhuhai, Peoples R China
ISBN: 9781665417907 (print)
Simulation is a common method for studying the behavior of complex systems and revealing their mechanisms. However, complex systems have many parameters, non-linear interactions, and complex evolutionary dynamics, which makes their mechanisms difficult to reveal. In particular, simulation experiments on complex systems may produce a large amount of data, and there is still no effective method for summarizing the macroscopic modes of the system, identifying key factors, and discovering the relationships between input and output variables. This paper proposes an integrated framework that combines data mining and simulation modeling to conduct iterative experimental exploration and analysis of complex systems. Data mining techniques are used in multiple stages of modeling and simulation, including: ETL on raw data; text mining and process mining to build conceptual models; uniform experimental design to generate simulation data; clustering of simulation data to identify macro patterns of the system; and stepwise regression, neural networks, and similar methods to build a meta-model of the complex system. The introduction of data mining can improve the capability and efficiency of complex system modeling and simulation.
ISBN: 9781538678084 (print)
Chronic kidney disease (CKD) is also known as chronic renal disease. It describes conditions that affect the kidneys and reduce a person's ability to stay healthy. Complications can include increased levels in the blood, anemia (low blood count), weak bones, and nerve damage. Early detection and treatment can typically keep chronic kidney disease from getting worse. Data mining is the term used for knowledge discovery from large databases. The task of data mining is to extract regular patterns from historical data and to support future conclusions. It follows from the convergence of several recent trends: the decreasing cost of large data storage devices and the increasing ease of collecting data over networks; the development of robust and efficient machine learning algorithms to process this data; and the decreasing cost of computational power, which enables the use of computationally intensive methods for data analysis. Machine learning is an important task because it benefits many applications, such as analyzing life science outcomes, detecting fraud, and detecting fake users. Various data mining classification approaches and machine learning algorithms have been applied to the prediction of chronic diseases. This paper therefore examines the performance of the Naive Bayes, K-Nearest Neighbour (KNN), and Random Forest classifiers in terms of accuracy, precision, and execution time for CKD prediction. The outcome of the research is that the Random Forest classifier performs better than Naive Bayes and KNN.
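The three criteria named above (accuracy, precision, and execution time) could be computed roughly as in the sketch below. It is not the evaluation code from the paper: the function names are hypothetical, `predict` stands for any trained classifier's per-record prediction function, and precision is computed for a single class chosen as the positive one.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """True positives divided by all records predicted as `positive`."""
    truths_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_of_predicted:
        return 0.0  # convention used here when nothing is predicted positive
    return sum(t == positive for t in truths_of_predicted) / len(truths_of_predicted)


def timed_predict(
    predict: Callable[[Any], str], rows: Sequence[Any]
) -> Tuple[List[str], float]:
    """Run `predict` on every row and return (predictions, elapsed seconds)."""
    start = time.perf_counter()
    predictions = [predict(row) for row in rows]
    return predictions, time.perf_counter() - start
```

Execution time measured this way covers prediction only; training time would need a separate timer around the fitting step.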
ISBN: 9781467395847 (print)
This paper explores opinion mining using supervised learning algorithms to find the polarity of student feedback based on pre-defined features of teaching and learning. The study applies a combination of machine learning and natural language processing techniques to student feedback data gathered from module evaluation survey results of Middle East College, Oman. In addition to providing a step-by-step explanation of how opinion mining from student comments is implemented using the open source data analytics tool Rapid Miner, this paper also presents a comparative performance study of algorithms such as SVM, Naive Bayes, K Nearest Neighbor, and a Neural Network classifier. The data set extracted from the survey is preprocessed and then used to train the algorithms for binomial classification. The trained models are also capable of predicting the polarity of student comments based on extracted features such as examination and teaching. The results are compared across various evaluation criteria to find the best-performing algorithm.
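To make the binomial polarity classification concrete, here is a minimal hand-written sketch of one of the named algorithms, Naive Bayes, applied to short comments. It is not the Rapid Miner process used in the study: the class name, the whitespace tokenizer, the add-one smoothing, and any comments used to train it are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Very simple preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesPolarity:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, comments: List[Tuple[str, str]]) -> "NaiveBayesPolarity":
        # Count how often each label and each (label, word) pair occurs.
        self.class_counts: Counter = Counter(label for _, label in comments)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in comments:
            self.word_counts[label].update(tokenize(text))
        self.vocabulary: Set[str] = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            words = self.word_counts[label]
            denominator = sum(words.values()) + len(self.vocabulary)
            score = math.log(count / total)  # log prior
            for token in tokenize(text):
                if token in self.vocabulary:  # ignore words never seen in training
                    score += math.log((words[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A model would be trained with `NaiveBayesPolarity().fit(pairs)`, where `pairs` is a list of `(comment, label)` tuples with labels such as `"positive"` and `"negative"`, and then queried with `predict(comment)`.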