In this paper, different human/machine strategies are tested in order to evaluate their performance in underwater threat recognition. Sonar images collected with synthetic aperture sonar (SAS) and side scan sonar (SSS) during real mine countermeasures exercises are used. Data are collected over a test area on the Belgian Continental Shelf, where several targets were deployed. Image resolution is divided into three categories: (1) pixel size up to 5 cm, (2) pixel size between 5 cm and 10 cm, and (3) pixel size larger than 10 cm. Soil complexity is also evaluated and used to build different strategies. The results demonstrate the value of treating the human operator as an integral part of the automatic underwater object recognition process, and show how automated algorithms can extend and complement human performance.
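The three resolution categories can be expressed as a small binning helper. This is an illustrative sketch, not the authors' code. The abstract leaves the exact boundaries open, so assigning exactly 5 cm to category 1 and exactly 10 cm to category 2 is an assumption. The function name is hypothetical.

```python
def resolution_category(pixel_size_cm: float) -> int:
    """Map a sonar image's pixel size in cm to resolution category 1, 2, or 3.

    Boundary handling is an assumption: exactly 5 cm falls in category 1
    and exactly 10 cm falls in category 2.
    """
    if pixel_size_cm <= 0:
        raise ValueError("pixel size must be positive")
    if pixel_size_cm <= 5.0:
        return 1  # (1) up to 5 cm pixel size
    if pixel_size_cm <= 10.0:
        return 2  # (2) between 5 cm and 10 cm pixel size
    return 3      # (3) larger than 10 cm pixel size


# Example: bin a few hypothetical image pixel sizes.
print([resolution_category(s) for s in (3.0, 5.0, 7.5, 10.0, 15.0)])  # prints [1, 1, 2, 2, 3]
```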
ISBN (electronic): 9781728100401
ISBN (print): 9781728100418
Dealing with missing values is an important feature engineering task in data science, because missing values can degrade the predictive accuracy of machine learning classification models. However, it is often unclear what causes the missing values in real-life data, that is, which missing-data mechanism produces the missingness. It therefore becomes necessary to evaluate several missing data approaches for a given dataset. In this paper, we perform a comparative study of several approaches for handling missing values: listwise deletion, mean imputation, mode imputation, k-nearest neighbors, expectation-maximization, and multiple imputation by chained equations (MICE). The comparison is performed on two real-world datasets using the following evaluation metrics: accuracy, root mean squared error, receiver operating characteristic (ROC), and F1 score. Most classifiers performed well across the missing data strategies. However, based on the results obtained, the support vector classifier performed marginally better overall on the numerical data, and the naïve Bayes classifier on the categorical data, across the evaluated missing value methods.
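Three of the simpler strategies named above are listwise deletion, mean imputation, and mode imputation. They can be sketched in plain Python. This is an illustration, not the paper's implementation. It assumes that missing values are encoded as None, and the function names are hypothetical. k-NN, expectation-maximization, and MICE imputation are not shown.

```python
from statistics import mean, mode


def listwise_deletion(rows):
    """Keep only rows with no missing values (None)."""
    return [list(row) for row in rows if all(v is not None for v in row)]


def impute_column(rows, col, strategy="mean"):
    """Fill None in column `col` with the column mean (numeric) or mode (categorical)."""
    observed = [row[col] for row in rows if row[col] is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError("strategy must be 'mean' or 'mode'")
    out = []
    for row in rows:
        new_row = list(row)  # copy so the input rows are left unchanged
        if new_row[col] is None:
            new_row[col] = fill
        out.append(new_row)
    return out


# Hypothetical rows: one numeric and one categorical attribute.
data = [[1.0, "a"], [None, "b"], [3.0, None], [5.0, "b"]]
print(listwise_deletion(data))         # [[1.0, 'a'], [5.0, 'b']]
print(impute_column(data, 0, "mean"))  # None in column 0 becomes 3.0
print(impute_column(data, 1, "mode"))  # None in column 1 becomes 'b'
```

Listwise deletion discards half of this toy dataset. Imputation keeps every row, but it fills each gap with a single summary value.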
ISBN (electronic): 9781728130446
ISBN (print): 9781728130453
Chronic kidney disease (CKD) is a disease with a high mortality rate. It results from the loss of kidney function over a long period of time and shows no symptoms in its initial stage. If left untreated, a person may suffer from other complications, such as high blood pressure, anemia, malnutrition, increased risk of cardiovascular disease, cognitive impairment, and impaired physical function. Automated diagnosis using classification algorithms has attracted researchers' interest. In this study, six machine learning algorithms were used for classification, and their prediction performance was compared in terms of training time and F1 score, with and without hyperparameter tuning. Of the six algorithms, k-nearest neighbors (KNN) achieved the best F1 score, 0.992248, and the shortest training time, 46.999 ms. Hyperparameter tuning improved the F1 score of decision trees from 0.96 to 0.99. Overall, machine learning algorithms are an important tool for assessing chronic kidney disease.
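For reference, the F1 score used above is the harmonic mean of precision and recall. The sketch below assumes a binary task with a single positive class (for example, CKD present). The abstract does not state whether the reported scores were averaged over classes. The function name is hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 score: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))  # about 0.667
```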
ISBN (electronic): 9781728126654
ISBN (print): 9781728126661
Student performance plays an important role in measuring student quality, which can be assessed by predicting student performance. Such predictions can be made with data mining techniques, one of which is classification. This research aims to determine which classification model performs best on student performance data. The data are taken from the Student Performance dataset in the UCI Machine Learning Repository. The study uses seven methods: k-nearest neighbor, classification and regression trees, naïve Bayes, AdaBoost, extra trees, Bernoulli naïve Bayes, and random forest. The seven methods are implemented and compared in Python, and their performance is tested with cross-validation. The results of the classification algorithms on the student math data are: k-nearest neighbor 86.52%, classification and regression trees 86.08%, naïve Bayes 84.78%, AdaBoost 88.04%, extra trees 81.30%, Bernoulli naïve Bayes 79.34%, random forest E 87.82%, and random forest G 89.78%. Based on these results, the best classification method is random forest G, at 89.78%.
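Cross-validation, as used above, splits the data into k folds and averages the score obtained on each held-out fold. The following minimal sketch assumes plain shuffled k-fold splitting with accuracy as the metric. The abstract does not give the number of folds. All names, including the majority-class baseline, are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_val_accuracy(fit_predict, X, y, k=5, seed=0):
    """Mean accuracy over k folds; fit_predict(X_train, y_train, X_test) -> predictions."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        correct = sum(1 for p, i in zip(preds, test) if p == y[i])
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_class(X_train, y_train, X_test):
    """Hypothetical baseline: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Toy data: 10 samples, 7 "pass" and 3 "fail".
X = [[i] for i in range(10)]
y = ["pass"] * 7 + ["fail"] * 3
print(cross_val_accuracy(majority_class, X, y, k=5))
```

Any of the seven classifiers could be wrapped as a `fit_predict` function and compared on the same folds.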
Code smells in source code indicate weaknesses in design or implementation. Several detection tools have been developed to detect code smells. However, these tools generally produce different results, because code smells are interpreted subjectively, defined informally, configured by developers, domain-dependent, and based on opinions and experience. To cope with these issues, in this paper we use machine learning techniques, specifically multi-label classification methods, to classify whether given source code is affected by more than one code smell. We conducted experiments on four code smell datasets and transformed them into two multi-label datasets, one at method level and one at class level. Two multi-label classification methods, Classifier Chains and Label Combination, and their ensemble models were applied to the converted datasets using five different base classifiers. The results show that, as a base classifier, the Random Forest algorithm performs better than the Decision Tree, Naive Bayes, Support Vector Machine, and Neural Network algorithms.
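Label Combination, also called label powerset, turns a multi-label problem into an ordinary multi-class one by treating each distinct set of labels as one class. Below is a minimal sketch of that transformation. It assumes that each sample's labels are given as a tuple of 0/1 flags, one per code smell. The function name and the example smells are hypothetical.

```python
def label_combination_encode(label_sets):
    """Label Combination (label powerset): map each distinct label set to one class id."""
    class_of = {}
    encoded = []
    for labels in label_sets:
        key = tuple(labels)
        if key not in class_of:
            class_of[key] = len(class_of)  # new combination -> new class id
        encoded.append(class_of[key])
    decode = {cid: key for key, cid in class_of.items()}
    return encoded, decode


# Hypothetical flags per method: (long_method, feature_envy).
Y = [(1, 0), (1, 1), (0, 0), (1, 1)]
encoded, decode = label_combination_encode(Y)
print(encoded)    # [0, 1, 2, 1]
print(decode[1])  # (1, 1): affected by more than one smell
```

Any multi-class base classifier, such as Random Forest, can then be trained on `encoded`. Its predictions are mapped back to label sets through `decode`.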
Agriculture is the main occupation in India and plays a vital role in the country's economy. Every year, 15.7 percent of crops are lost to insect pests and diseases [1]. These diseases reduce both the quality and the quantity of crops. To maintain plant health, the infection must be identified and appropriate care given. This is difficult to do manually, because the human eye cannot observe the minute variations in the infected part of the leaf. We have therefore developed software in MATLAB [2] that detects plant leaf diseases using image processing techniques. The software is designed so that even a person with no prior knowledge of plants and their diseases can effectively recognize infected leaves. We use k-means clustering to identify the infected region of the plant leaf. The disease recognition process comprises image acquisition, image preprocessing, segmentation, feature extraction, and SVM classification.
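The segmentation step relies on k-means clustering of pixel colors. The authors worked in MATLAB. The sketch below is a plain-Python illustration of Lloyd's k-means on RGB tuples. It assumes that diseased and healthy regions differ in color. The pixel values and function names are hypothetical, and the feature extraction and SVM stages are not shown.

```python
import random


def _nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, k, max_iter=50, seed=0):
    """Lloyd's k-means on tuples of floats; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k randomly chosen input points
    for _ in range(max_iter):
        labels = [_nearest(p, centroids) for p in points]  # assignment step
        new_centroids = []
        for c in range(k):  # update step: mean of each cluster's members
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return [_nearest(p, centroids) for p in points], centroids


# Hypothetical pixels: three greenish (healthy) and two brownish (lesion).
pixels = [(30, 160, 40), (35, 150, 45), (28, 170, 38), (120, 80, 30), (125, 85, 35)]
labels, centroids = kmeans(pixels, k=2)
print(labels)  # green pixels share one label, brown pixels the other
```

Which cluster counts as "infected" has to be decided afterwards, for example by inspecting the centroid colors.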
Data mining is the discovery of interesting and valuable information hidden in large data sets. Data mining, whose range of applications is expanding day by day, is also widely used in the shopping sector. In this paper, a data collection form on shopping habits was prepared and administered to individuals to obtain a data set. The data obtained from this form were analyzed using data mining techniques. The aim was to determine what kinds of products people spend their money on, how the tendency to save money varies by gender, and what people consider important when shopping. Many classification algorithms were used in this study, and J48, Naive Bayes, SMO, and Random Forest were found to be the highest-performing ones. The results revealed that gender and occupation affect the shopping rate and that the budget allocated to shopping varies by gender. In addition, educational status and place of residence were observed not to affect shopping tendency.
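Of the algorithms listed, Naive Bayes is the simplest to illustrate on categorical survey answers. The sketch below is a categorical Naive Bayes with Laplace smoothing. The attributes, answers, and spending labels are invented toy values, not the study's data. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and (attribute, class) value frequencies for categorical Naive Bayes."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    seen_values = defaultdict(set)       # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            seen_values[j].add(value)
    return class_counts, value_counts, seen_values, alpha


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior, using Laplace smoothing."""
    class_counts, value_counts, seen_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for j, value in enumerate(row):
            count = value_counts[(j, label)][value]
            score += math.log((count + alpha) / (n + alpha * len(seen_values[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy survey answers: (gender, occupation) -> spending level.
rows = [("female", "student"), ("female", "employee"), ("male", "employee"),
        ("male", "student"), ("female", "employee")]
labels = ["high", "high", "low", "low", "high"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("female", "employee")))  # high
print(predict_naive_bayes(model, ("male", "student")))     # low
```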
ISBN (print): 9781728116921
A great amount of health-related data is now stored, and its volume keeps increasing. In Panama, diabetes causes a considerable number of deaths every year and is the fifth leading cause of death in the country. Diabetes is one of the diseases with the greatest socio-sanitary impact, owing to its importance, the large number of chronic complications patients develop, and its high mortality rate. Diabetes is a silent disease: every day more people in our country suffer from it, and, unfortunately, many young people are developing it without knowing they have it. The use of innovative technologies such as artificial intelligence (AI) in sensitive areas such as health is increasing every day. New models based on machine learning are growing in number; however, in our countries there are few studies on the subject. Therefore, this research aims to apply various machine learning techniques and determine how these models can help solve health problems.
Gestational diabetes mellitus (GDM) is diabetes that occurs only during pregnancy in women whose glucose tolerance was normal before pregnancy. GDM has been known for a long time, but the specific causes and mechanisms of its occurrence are still unclear, and research on the intelligent diagnosis of GDM is lacking. Because GDM has many adverse effects on pregnant women and fetuses, its early diagnosis is of great significance. Based on data measured at a hospital, this paper implements the intelligent diagnosis of GDM using an improved KNN algorithm and an improved BP (backpropagation) neural network.
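The abstract does not describe how the KNN algorithm was improved. The sketch below therefore shows only the standard k-nearest neighbors baseline: Euclidean distance and a majority vote among the k closest training samples. The two-feature training data are toy values, not hospital measurements, and the function name is hypothetical. In practice, features on different scales would normally be standardized first.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, query, k=3):
    """Standard KNN: majority vote among the k training points nearest to `query`."""
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort (distance, label) pairs by distance only, so labels are never compared.
    ranked = sorted(
        ((math.dist(x, query), label) for x, label in zip(X_train, y_train)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data (not from the study): label 1 = positive, 0 = negative.
X_train = [(4.5, 22.0), (4.8, 24.0), (5.0, 23.0), (6.2, 30.0), (6.5, 31.0), (6.0, 29.0)]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, (6.1, 28.0)))  # 1
print(knn_predict(X_train, y_train, (4.7, 23.0)))  # 0
```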
Companies now continuously analyze and research their available data to minimize personnel and time costs. Within a company, an environment is provided in which employees can submit suggestions for improvement or complaints, with the aim of providing better service. Accordingly, corporate companies have increasingly used "personal suggestion systems" in recent years. Widely used and emerging machine learning technologies are now applied to automate suggestion systems and analyze their content. However, one problem encountered in machine learning is an imbalanced class distribution in the data set, and imbalanced data sets are very common in the real world. In this study, the effects of the ROS (random oversampling), RUS (random undersampling), SMOTE, and ADASYN methods on classification algorithms were analyzed. SMOTE, the best resampling method, and the Gradient Boosting Classifier, which gave the best results, were chosen.
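Random oversampling (ROS) duplicates existing minority samples. SMOTE instead creates new ones by interpolating between a minority sample and one of its nearest minority-class neighbors. The sketch below is a simplified SMOTE that picks base samples at random, and it assumes numeric feature tuples. The function names are hypothetical, and RUS and ADASYN are not shown.

```python
import math
import random


def random_oversample(minority, n_extra, seed=0):
    """ROS: duplicate randomly chosen minority samples."""
    rng = random.Random(seed)
    return [rng.choice(minority) for _ in range(n_extra)]


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: interpolate between a random minority sample and one of
    its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x, excluding x itself
        neighbours = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(x, m),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


minority = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.0)]
print(random_oversample(minority, n_extra=2))
print(smote(minority, n_synthetic=3, k=2))
```

Resampling of either kind would normally be applied to the training split only, so that duplicated or synthetic samples do not leak into the evaluation data.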