ISBN: 9781538678312 (print)
With rapid urbanization and motorization, road traffic accidents have become a severe problem in China. Drivers' operating errors and risk-taking behavior are a leading cause of traffic accidents. As a result, demand for driver traffic safety assessment keeps increasing, especially for professional drivers such as passenger and freight drivers. This work proposes a data mining framework for assessing drivers' traffic safety using their personal information, traffic violation records, and traffic accident records. Model validation and result interpretation are provided, showing the rationality and usefulness of the proposed approach.
ISBN: 9781538625880 (print)
Straggler tasks are commonly considered the major bottleneck in parallel data processing. Previous work mainly focuses on coarse-grained straggler detection and optimization, such as speculative scheduling. However, fine-grained root-cause analysis of straggler tasks is rarely considered. In addition, existing work depends on empirical analysis alone, which offers little useful guidance for performance optimization. In this paper, we propose a new methodology for fine-grained straggler root-cause analysis using machine learning. We collect raw metrics from the Spark event log and a hardware sampling tool, and refine them into high-level metrics for model learning. We then perform root-cause analysis of stragglers with a CART tree. A customized pruning method is also applied to improve analysis accuracy. From the analysis, we derive several new findings beyond the well-known causes of stragglers. Our work provides a new perspective on identifying and understanding inefficiency in parallel data processing programs by applying machine learning techniques to fine-grained root-cause analysis of straggler tasks.
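The core step of a CART-style analysis, choosing the metric and threshold that best separate stragglers from normal tasks, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the metric names (`gc_time_ratio`, `shuffle_read_mb`), the sample values, and the helper functions are all hypothetical, and only a single Gini-based split is computed, with no tree growth and no pruning.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, weighted_gini) for the best binary split.

    rows is a list of dicts mapping a metric name to a numeric value.
    """
    best = None
    n = len(rows)
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical high-level metrics for six tasks (illustrative values only).
tasks = [
    {"gc_time_ratio": 0.05, "shuffle_read_mb": 120},
    {"gc_time_ratio": 0.04, "shuffle_read_mb": 300},
    {"gc_time_ratio": 0.45, "shuffle_read_mb": 130},
    {"gc_time_ratio": 0.50, "shuffle_read_mb": 280},
    {"gc_time_ratio": 0.06, "shuffle_read_mb": 310},
    {"gc_time_ratio": 0.40, "shuffle_read_mb": 110},
]
is_straggler = [False, False, True, True, False, True]

feature, threshold, score = best_split(tasks, is_straggler)
print(feature, round(threshold, 2), score)
```

In this toy data the garbage-collection ratio separates the two groups cleanly, so it is selected as the split; a full tree would apply the same search recursively to each side.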
ISBN: 1424404509 (print)
Data mining (DM) brings together knowledge and theories from several fields, including databases, machine learning, optimization, statistics, and data visualization, and has been applied to various real-life applications. A large number of data mining articles have been published. The goal of this study is to establish an overview of past and current data mining research activities from the titles and abstracts of more than 1400 textual documents collected from premier data mining journals and conference proceedings. Specifically, this study applies document clustering approaches to determine which subjects have been studied over the last several years and which subjects are currently popular, and to describe the longitudinal changes in data mining publications.
ISBN: 9781538642382 (print)
The availability of huge remote sensing image datasets calls for powerful content-based image retrieval techniques for archiving and mining. This paper proposes descriptors based on SIFT (scale-invariant feature transform) combined with linear SVM classification. To build a powerful image classifier using very little training data, image augmentation is usually required to boost classification performance. For this reason, augmented data are used to enlarge the training set for the SVM (support vector machine) classifier. The training data are created using several augmentation techniques, among them anisotropic filtering. We report a first evaluation of the CBIR (content-based image retrieval) system; a second evaluation compares deep learning with the boosted SVM classification.
This paper mainly introduces some machine learning methods used in the field of data mining. Data mining methods are discussed, taking a market segmentation algorithm as an example. This paper presents an improve...
ISBN: 9781538642382 (print)
Intrusion detection systems play a crucial role in an era where networks have reached almost every sector. Unfortunately, intrusion detection systems are far from perfect, and researchers have never stopped working to improve them. In this context, data mining techniques have been widely exploited for intrusion detection. In this paper, we present a comparative study of data mining techniques for intrusion detection. Specifically, we study the overall performance of those methods as well as the impact of training data size on their results. As a benchmark for our experiments we use ISCX2012, a realistic dataset that represents today's network traffic to a certain extent. The study shows that relatively old methods outperform some of the techniques currently in wide use by the community. Regarding the impact of training dataset size, the investigated methods react differently from one another when more data are added to the training dataset. In addition, the results highlight the importance of attack traffic in the training dataset. Moreover, they strongly suggest the use of Random Forest for intrusion detection, because its performance scales linearly with the size of the training dataset.
ISBN: 0769529941 (print)
Data mining, also known as knowledge discovery in databases, has been defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It uses machine learning, statistical, and visualization techniques to discover and present knowledge in a form that is easily comprehensible to humans. In this paper, the authors first introduce the idea, basic concepts, and process of data mining, and then analyze an example and methods of applying data mining to physical statistics. Data mining is applied in physical training and evaluation, such as physical constitution data analysis, the PE industry, and competitive sports. Thus, we think data mining will become an important task in future sports science research.
ISBN: 9781424455379 (print)
Network security has become an increasingly important issue with the rapid development of the Internet. The network intrusion detection system (IDS), as the main security defense technique, is widely used against malicious attacks. Data mining and machine learning technology has been extensively applied in network intrusion detection and prevention systems to discover user behavior patterns from network traffic data. Association rules and sequence rules are the main data mining techniques for intrusion detection. Because frequent itemset mining is the bottleneck of the classical Apriori algorithm, we propose a length-decreasing support method, an improved Apriori algorithm, for data-mining-based intrusion detection. Experimental results indicate that the proposed method is efficient.
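The idea of a length-decreasing support constraint on top of Apriori can be sketched as follows. This is a hedged illustration under stated assumptions, not the method from the paper: the threshold function `min_count`, its parameters, and the event names are hypothetical. Because a length-dependent threshold breaks the usual downward-closure pruning, the sketch prunes the level-wise search with the lowest threshold (`floor`) and applies the length-specific threshold only when deciding which itemsets to report.

```python
from itertools import combinations


def min_count(length, base=4, step=1, floor=2):
    """Hypothetical length-decreasing threshold: longer itemsets need fewer hits."""
    return max(floor, base - step * (length - 1))


def count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, floor=2):
    """Apriori-style level-wise search with a length-decreasing support constraint."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    # Prune with the floor only: it is a lower bound on every length's threshold,
    # so no itemset that could qualify at a greater length is discarded early.
    level = [s for s in (frozenset([i]) for i in items)
             if count(s, transactions) >= floor]
    kept = {}
    k = 1
    while level:
        for s in level:
            c = count(s, transactions)
            if c >= min_count(k, floor=floor):
                kept[s] = c
        survivors = set(level)
        # Join step: unions of two k-itemsets that form a (k+1)-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must have survived.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in survivors
                             for sub in combinations(c, k))}
        level = [c for c in candidates if count(c, transactions) >= floor]
        k += 1
    return kept


# Hypothetical connection records, each reduced to a set of observed events.
records = [
    {"syn", "port_scan", "failed_login"},
    {"syn", "port_scan", "failed_login"},
    {"syn", "port_scan"},
    {"syn", "failed_login"},
    {"syn", "icmp"},
    {"port_scan", "failed_login", "icmp"},
]
result = frequent_itemsets(records)
for itemset, c in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), c)
```

With these toy records, the three-event pattern occurs only twice and would be dropped by a fixed threshold of 3 or 4, yet it is reported here because the threshold for length 3 falls to 2; the single event `icmp`, also seen twice, is not reported because length-1 itemsets need four occurrences.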
Fuzzy approaches can play an important role in data mining, because they provide comprehensible results. In addition, the approaches studied in data mining have mainly been oriented toward highly structured and precise da...
ISBN: 9780615375298 (print)
This research investigates the detection of student meta-cognitive planning processes in real time using log tracing techniques. We use fine- and coarse-grained data distillation, in combination with coarse-grained text replay coding, to develop detectors for students' planning of experiments in Science Assistments, an assessment and tutoring system for scientific inquiry. The goal is to recognize student inquiry planning behavior in real time as the student conducts inquiry in a micro-world; the eventual goal is to provide real-time scaffolding of scientific inquiry.