ISBN: 0769529941 (print)
Data mining, also known as knowledge discovery in databases, has been defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It uses machine learning, statistical, and visualization techniques to discover knowledge and present it in a form that is easily comprehensible to humans. In this paper, the authors first introduce the idea, basic concepts, and process of data mining, and then analyze an example and methods of applying data mining to physical education statistics. Data mining is applied in physical training and evaluation, for example in physical constitution data analysis, the PE industry, and competitive sports. Thus, we believe data mining will become an important task in future scientific research on sports topics.
ISBN: 9781538642382 (print)
Intrusion detection systems play a crucial role in an era in which networks have reached almost every sector. Unfortunately, intrusion detection systems are far from perfect, so researchers keep digging deeper to improve them. In this context, data mining techniques have been heavily exploited for intrusion detection. In this paper, we present a comparative study of data mining techniques for intrusion detection. Specifically, we study the overall performance of these methods as well as the impact of training data size on their results. We use ISCX2012, a realistic dataset that represents today's network traffic to a certain extent, as the benchmark for our experiments. The study shows that relatively old methods outperform some of the techniques currently in wide use by the community. Regarding training dataset size, the investigated methods react differently from one another as more data is added to the training dataset. In addition, the results highlight the importance of attack traffic in the training dataset. Moreover, they strongly suggest the use of Random Forest for intrusion detection because of the linear relationship between its performance and the size of the training dataset.
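The study's central experiment, measuring how a classifier's accuracy changes as the training set grows, can be sketched in a few lines. This is a minimal, hypothetical illustration: it uses synthetic two-feature data in place of ISCX2012 and a simple k-nearest-neighbor classifier as a stand-in for the methods compared in the paper (including Random Forest). All function names and parameters are assumptions, not the authors' code.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (feature vector, label)


def make_synthetic_traffic(n: int, seed: int = 0) -> List[Sample]:
    """Hypothetical stand-in for flow records: two Gaussian blobs labelled
    'normal' and 'attack'. Not derived from ISCX2012."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = "attack" if rng.random() < 0.3 else "normal"
        centre = 2.0 if label == "attack" else 0.0
        features = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((features, label))
    return data


def knn_predict(train: Sequence[Sample], x: Tuple[float, ...], k: int = 5) -> str:
    """Label x by majority vote among its k nearest training samples."""
    nearest = sorted(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def learning_curve(pool: Sequence[Sample], test: Sequence[Sample],
                   sizes: Sequence[int], k: int = 5) -> List[Tuple[int, float]]:
    """Accuracy on a fixed test set when training on the first `size` samples."""
    results = []
    for size in sizes:
        train = pool[:size]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        results.append((size, correct / len(test)))
    return results


if __name__ == "__main__":
    data = make_synthetic_traffic(1000, seed=1)
    train_pool, test_set = data[:800], data[800:]
    for size, acc in learning_curve(train_pool, test_set, [10, 50, 200, 800]):
        print(f"train size {size:4d}: accuracy {acc:.3f}")
```

Holding the test set fixed and varying only the training prefix isolates the effect of training data size. With very small prefixes the training set may contain few attack samples, which echoes the abstract's point about the importance of attack traffic in the training data.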
ISBN: 9781538611449 (print)
Cancer was previously an incurable disease, but advances in technology have made it curable. Oral cancer is an uncontrolled proliferation of mutated cells that can spread to neighboring tissues. In this paper, different data mining algorithms are used to detect oral cancer. Data mining is a prominent technique employed by various health institutions to classify life-threatening diseases such as cancer, dengue, and tuberculosis. In our proposed approach, WEKA is applied with 10-fold cross-validation to compute and collate the output. WEKA provides a large variety of data mining and machine learning algorithms. We first classified the oral cancer dataset and then analyzed various data mining methods in WEKA through its Explorer and Experimenter interfaces. The primary aim is to classify the dataset, extract useful information from the data, and make it easy to choose an appropriate algorithm for building an accurate prognostic model.
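WEKA itself is a Java toolkit, but the 10-fold cross-validation protocol the abstract relies on is language-independent. The following Python sketch, an illustration rather than WEKA's implementation, shuffles the data into ten folds, trains on nine and tests on the remaining one in turn, and averages the per-fold accuracy. The `fit`/`predict` callables and the majority-class baseline are hypothetical placeholders for a real classifier; WEKA typically stratifies folds by class, which this sketch omits for brevity.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and split them into k folds.

    Returns one (train_indices, test_indices) pair per fold; every index
    appears in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin: fold sizes differ by at most 1
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def cross_validate(samples: Sequence, labels: Sequence, fit: Callable, predict: Callable,
                   k: int = 10, seed: int = 0) -> float:
    """Mean accuracy over k folds.

    fit(X, y) must return a model; predict(model, x) must return a label."""
    scores = []
    for train, test in k_fold_indices(len(samples), k, seed):
        model = fit([samples[j] for j in train], [labels[j] for j in train])
        correct = sum(predict(model, samples[j]) == labels[j] for j in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical baseline classifier: always predicts the most common training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

Any classifier can be plugged in by supplying its own `fit` and `predict` functions. Averaging per-fold accuracies is one common convention; pooling all held-out predictions before computing accuracy is another.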
Agriculture is the most significant and ancient profession in India. As India's economy depends mainly on agricultural production, close attention to food production is essential. Soil is a valuable n...
Financial frauds are on the rise globally, causing significant financial losses. This issue has far-reaching consequences, impacting the investment industry, government, and corporate sectors alike. Manual verificatio...
Fuzzy approaches can play an important role in data mining because they provide comprehensible results. In addition, the approaches studied in data mining have mainly been oriented toward highly structured and precise da...
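To illustrate why fuzzy approaches tend to yield comprehensible results, the sketch below maps a numeric attribute onto linguistic terms ("low", "medium", "high") using trapezoidal membership functions, so that a value is described as, say, "high (0.75)" rather than by a crisp cut-off. The terms, breakpoints, and function names are hypothetical; the abstract does not specify which fuzzy method is used.

```python
from typing import Dict, Tuple

# Hypothetical linguistic terms for a numeric attribute on a 0-100 scale.
# Each term is a trapezoid (a, b, c, d): membership rises from 0 at a to 1 at b,
# stays 1 until c, and falls back to 0 at d.
TERMS: Dict[str, Tuple[float, float, float, float]] = {
    "low": (0, 0, 20, 40),
    "medium": (20, 40, 60, 80),
    "high": (60, 80, 100, 100),
}


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Degree (0..1) to which x belongs to the trapezoidal fuzzy set (a, b, c, d)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge; b > a here, so no division by zero
    return (d - x) / (d - c)  # falling edge; d > c here


def fuzzify(x: float) -> Dict[str, float]:
    """Membership degree of x in every linguistic term."""
    return {name: trapezoid(x, *params) for name, params in TERMS.items()}


def describe(x: float) -> str:
    """Human-readable label: the best-matching term and its degree."""
    name, degree = max(fuzzify(x).items(), key=lambda item: item[1])
    return f"{name} ({degree:.2f})"
```

For instance, a value of 30 belongs equally (0.5) to "low" and "medium", while `describe(75)` returns "high (0.75)".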
The widespread use of data mining is a direct result of the practice's early success in more public arenas such as marketing, e-commerce, and retail. Discoveries in healthcare are among them. Data is abundant in heal...
ISBN: 9781467393799 (print)
This paper considers an approach to forecasting temperature parameters in order to avoid overheating of spacecraft equipment at the end of a data transmission session. Algorithms for processing historical data are proposed to determine the temperature values of the spacecraft components at the specified times. The corresponding software is provided. The experiments conducted demonstrated the ability to detect anomalous situations.
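The abstract does not describe its forecasting algorithms, so the sketch below swaps in the simplest possible technique: an ordinary least-squares linear trend fitted to historical temperature samples and extrapolated to the end of the transmission session, raising a flag if the forecast exceeds a limit. The function names, units (seconds, °C), and threshold are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def fit_line(times: Sequence[float], temps: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of temp = slope * t + intercept."""
    n = len(times)
    if n < 2 or n != len(temps):
        raise ValueError("need at least two (time, temperature) pairs")
    mean_t = sum(times) / n
    mean_y = sum(temps) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, temps))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def forecast_overheat(times: Sequence[float], temps: Sequence[float],
                      session_end: float, limit: float) -> Tuple[float, bool]:
    """Extrapolate the fitted trend to session_end.

    Returns the forecast temperature and whether it exceeds the limit."""
    slope, intercept = fit_line(times, temps)
    forecast = slope * session_end + intercept
    return forecast, forecast > limit
```

For example, readings of 20, 22, 24 and 26 °C taken at 0, 60, 120 and 180 s extrapolate to about 40 °C at t = 600 s, which would exceed a hypothetical 35 °C limit.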
Cluster analysis is an unsupervised machine learning task of grouping objects based on some similarity measure. Among clustering algorithms, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) contribut...
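DBSCAN groups points that lie in dense regions and labels isolated points as noise. A compact, unoptimized Python sketch of the standard algorithm follows: a point with at least `min_pts` neighbors within radius `eps` (counting itself) is a core point, clusters grow outward from core points, and points reachable from a cluster but not core themselves become border points. The linear-scan neighborhood query makes the whole run quadratic; practical implementations use a spatial index. It requires Python 3.8+ for `math.dist`.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
UNVISITED, NOISE = -2, -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: expand through it
                seeds.extend(j_neighbors)
    return labels
```

A point first marked as noise can still join a cluster later as a border point, which is why the noise label is provisional until the scan finishes.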
ISBN: 9781728155050 (print)
With the rapid development of Internet technology, all kinds of data are growing exponentially, and how to manage and use these data effectively has become a focus of research in the era of big data. Because traditional single-machine serial processing cannot meet the time requirements of massive data processing, this paper proposes using the Spark computing framework, studies the Bayesian algorithm in data mining, and implements and optimizes a parallel Bayesian algorithm. Spark's in-memory computing makes iterative computation efficient. The computational performance of the parallel program is investigated. Experiments comparing Spark parallel computing with traditional single-machine serial execution show that the algorithm effectively improves the speed of text classification. As the cluster size grows, classification accuracy, time performance, and speedup ratio all improve. The Spark-based parallel Bayesian algorithm is feasible: it overcomes the inability of a traditional single computer to handle large-scale data and can effectively deal with various classification problems.
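Naive Bayes parallelizes well because training reduces to counting, and counts from separate data partitions can simply be added together. The sketch below imitates that map/reduce structure in plain Python: each partition is counted independently (here on a thread pool, which shows the structure but, because of Python's GIL, gives no real speedup), the partial counts are merged, and classification uses Laplace-smoothed log probabilities. It is a hypothetical illustration, not the paper's Spark implementation; on a real cluster the same count-and-merge steps would run as Spark jobs, and Spark MLlib also provides its own `NaiveBayes` classifier.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

Doc = Tuple[List[str], str]  # (tokens, label)


def count_partition(docs: Sequence[Doc]) -> Tuple[Counter, Counter]:
    """'Map' step: per-partition document counts per class and (label, word) counts."""
    class_counts: Counter = Counter()
    word_counts: Counter = Counter()
    for tokens, label in docs:
        class_counts[label] += 1
        for w in tokens:
            word_counts[(label, w)] += 1
    return class_counts, word_counts


def merge(a: Tuple[Counter, Counter], b: Tuple[Counter, Counter]) -> Tuple[Counter, Counter]:
    """'Reduce' step: counts from different partitions simply add up."""
    return a[0] + b[0], a[1] + b[1]


def train_parallel(partitions: Sequence[Sequence[Doc]], workers: int = 4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_partition, partitions))
    class_counts, word_counts = reduce(merge, partials, (Counter(), Counter()))
    vocab = {w for (_, w) in word_counts}
    totals: Counter = Counter()  # total word occurrences per class
    for (label, _), c in word_counts.items():
        totals[label] += c
    return class_counts, word_counts, totals, vocab


def predict(model, tokens: Sequence[str]) -> str:
    class_counts, word_counts, totals, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / n_docs)  # log prior
        for w in tokens:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothed log likelihood of w given the class
            score += math.log((word_counts[(label, w)] + 1) / (totals[label] + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because `merge` is associative and commutative, the result does not depend on how documents are split across partitions, which is what makes the count-based training step safe to distribute.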