ISBN (digital): 9781728163871
ISBN (print): 9781728163888
The examination of credit risk has become crucial in the financial world for avoiding massive losses, since lenders earn no profit when loans are not repaid. It can be thought of as an expansion of the credit distribution measure. To ease the task of investors, a methodology is proposed that uses machine learning models to predict whether a loan should be granted to a customer based on his or her pre-fed attributes. First, the data is preprocessed, followed by feature extraction using LDA and PCA. The models are created using various machine learning algorithms on two datasets of different sizes. It has been observed that logistic regression shows the highest accuracy, followed by random forest classification and KNN. It is also seen that LDA performed better than PCA with all algorithms. Therefore, machine learning regression and classification algorithms have shown reliable results that help money-lenders invest safely.
ISBN (digital): 9781728198026
ISBN (print): 9781728198033
Recently, applying quantized images to machine learning algorithms has been expected to enhance robustness against adversarial examples. In this paper, three quantization methods (linear quantization, Lloyd-Max quantization, and error diffusion) are considered for producing quantized images, and the influence of using the quantized images is discussed in an image classification experiment with typical machine learning algorithms, including deep learning ones.
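Two of the quantization methods named above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: pixels are assumed to be 8-bit gray values in a flat list, the error diffusion is a simplified one-dimensional variant (each pixel's rounding error is pushed entirely onto the next pixel, whereas image-oriented schemes spread it over several 2-D neighbors), and the function names are hypothetical. Lloyd-Max quantization is not shown.

```python
def linear_quantize(pixels, levels):
    """Uniformly quantize 8-bit values onto `levels` evenly spaced gray levels."""
    step = 255 / (levels - 1)  # distance between adjacent output levels
    return [int(round(round(p / step) * step)) for p in pixels]


def error_diffusion_1d(pixels, levels):
    """Quantize a row of pixels, carrying each rounding error to the next pixel.

    Simplified 1-D variant (assumption): the whole error goes to one neighbor.
    """
    step = 255 / (levels - 1)
    out, carry = [], 0.0
    for p in pixels:
        value = p + carry                                  # pixel plus inherited error
        index = min(levels - 1, max(0, round(value / step)))  # clamp to valid level
        quantized = index * step
        carry = value - quantized                          # error passed on to the next pixel
        out.append(int(round(quantized)))
    return out


row = [100, 100, 100, 100]
print(linear_quantize(row, 2))     # every pixel rounds the same way
print(error_diffusion_1d(row, 2))  # output alternates between the two levels
```

With two levels, linear quantization maps the flat gray row to a single level, while error diffusion alternates between the two levels, keeping the local average closer to the original gray than plain rounding does.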
ISBN (digital): 9781728183121
ISBN (print): 9781728183138
Breast cancer (BC) is one of the most common diseases and the leading cause of death among women across the globe. Although many individuals who suffer from breast cancer have no family history, women who have blood relatives suffering from the same disease are at higher risk. Other risk factors for developing breast cancer include aging, genes, dense breast tissue, obesity, and radiation exposure. Tumors are either malignant or benign, and to distinguish between the two, physicians need a reliable diagnostic procedure. Mammography is used to detect breast cancer, but radiologists exhibit significant variation in interpretation. Fine Needle Aspiration Cytology (FNAC) is commonly adopted in the diagnosis of breast cancer. Moreover, early diagnosis is vital for treatment with a better chance of success. Classification and data mining methods are an efficient and effective way of categorizing results, and machine learning models can play a vital role in early prediction. In this paper, we present a prediction of breast cancer with different machine learning algorithms and compare their prediction accuracy, area under the receiver operating characteristic curve (AUC), and performance parameters. For simulation purposes, we use the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. After analysis, the Support Vector Machine (SVM) model achieved 96.25% accuracy with an AUC of 99.4. Further, these algorithms can be modified with their mathematical models to improve the prediction of breast cancer.
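The two headline metrics in this abstract, accuracy and AUC, can be computed directly from labels and classifier scores. The sketch below is a minimal illustration, assuming binary labels coded as 0 (benign) and 1 (malignant); the function names and the toy data are hypothetical and are not taken from the paper. AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 0 = benign, 1 = malignant.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
print(accuracy(labels, predicted), roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for a dataset of a few hundred cases; rank-based formulas give the same value more efficiently on larger data.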
ISBN (digital): 9781728126807
ISBN (print): 9781728126814
Diabetes is a chronic disease. The risk of diabetes is increasing rapidly, harming human health. The proposed model combines two machine learning algorithms, Support Vector Machine and Random Forest, to predict diabetes, using a real dataset collected from Security Force Primary Health Care. The proposed model achieved 98% accuracy and a ROC of 99%. The results show that the Random Forest algorithm has a better accuracy score than Support Vector Machine.
ISBN (digital): 9781728144528
ISBN (print): 9781728144535
Vegetation is one of the most important parts of an ecosystem. It provides oxygen and absorbs carbon dioxide, thereby making the environment suitable for human beings to live in. Information about vegetation is therefore critical. Using remote sensing, this information can be gathered and later used for different purposes. This paper aims to classify vegetation into different types and categories. Three machine learning algorithms, namely K-means, Support Vector Machine (SVM), and Artificial Neural Networks (ANN), have been used to classify vegetation because they are among the most popular and well-known algorithms at present. K-means, being an unsupervised classifier, is compared with the two supervised classifiers, SVM and ANN. Non-vegetation, including buildings, roads, and rivers, is also classified into its respective categories. This classification can be useful in many ways. It can be used by government agencies and authorities to get information about the yield of a specific crop, e.g. tobacco or maize. This information could be very useful for gathering statistics on a crop and its location on the map. These locations can be used for extracting the crops and for future planning. The information about buildings and roads can help in future town planning.
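Of the three algorithms listed, K-means is the only unsupervised one: it groups pixels without labels by alternating between assigning each value to its nearest center and recomputing the centers. The sketch below is a minimal one-dimensional illustration only. The abstract does not say which features were used, so the per-pixel "vegetation index" values, the initial centers, and the function name are all hypothetical assumptions.

```python
def k_means_1d(values, centers, iterations=20):
    """Cluster scalar values by repeated nearest-center assignment and averaging."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    labels = [min(range(len(centers)), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, labels


# Hypothetical per-pixel index values: low = non-vegetation, high = vegetation.
pixels = [0.10, 0.15, 0.20, 0.70, 0.75, 0.80]
centers, labels = k_means_1d(pixels, centers=[0.0, 1.0])
print(centers, labels)
```

The cluster numbers carry no meaning by themselves; mapping cluster 0 or 1 to a land-cover category requires labeled reference data, which is one reason comparing K-means with supervised classifiers such as SVM and ANN is informative.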
ISBN (digital): 9781665422444
ISBN (print): 9781665446686
Electrogastrogram (EGG) is a simple and non-invasive method used in clinical practice for assessing stomach function by observing the gastric myoelectrical activity recorded with electrodes placed on the abdominal surface. The EGG signal is a slowly propagating wave. Based on the dominant frequency, measured in cycles per minute, there are three types of EGG signals: Normogastria, Bradygastria, and Tachygastria. In this study, we used the Logistic Regression (LG), Support Vector Machine (SVM), and K Nearest Neighbor (KNN) machine learning (ML) algorithms to successfully classify two and three types (classes in ML terminology) of EGG signal with high accuracy. Our results show that the SVM algorithm performs best in classifying the two-class and three-class signals, with accuracies of 100% and 92.11% respectively, while the logistic regression and KNN algorithms demonstrate similar, lower performance. The SVM algorithm also achieved maximum F1 score, precision, and recall values of 100% and 92% for the two and three classes of EGG signal, respectively. Area Under the Curve (AUC) scores of 100% and 92% are observed in the two-class and three-class problems, respectively, using the SVM algorithm. Based on our analysis, we conclude that SVM can be implemented successfully to accurately classify multi-class EGG signals.
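The three signal types are defined by the dominant frequency in cycles per minute (cpm), which can be illustrated without any ML model. The sketch below estimates the dominant frequency of a synthetic signal by evaluating spectral power at a grid of candidate frequencies, then applies a threshold rule. The abstract does not give cut-off values, so the 2 to 4 cpm band for Normogastria used here is an assumption, as are the function names, the sampling rate, and the candidate grid. This is not the feature pipeline or classifier used in the study.

```python
import math


def dominant_cpm(samples, sample_rate_hz, candidates_cpm):
    """Return the candidate frequency (in cpm) with the largest spectral power."""
    mean = sum(samples) / len(samples)
    best_cpm, best_power = None, -1.0
    for cpm in candidates_cpm:
        freq_hz = cpm / 60.0
        re = sum((x - mean) * math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
                 for i, x in enumerate(samples))
        im = sum((x - mean) * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
                 for i, x in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_cpm, best_power = cpm, power
    return best_cpm


def classify_egg(cpm, low=2.0, high=4.0):
    """Map a dominant frequency to a class; the thresholds are assumed, inclusive bounds."""
    if cpm < low:
        return "Bradygastria"
    if cpm > high:
        return "Tachygastria"
    return "Normogastria"


# Synthetic 3 cpm (0.05 Hz) sine wave, sampled at 2 Hz for 120 seconds.
signal = [math.sin(2 * math.pi * 0.05 * i / 2.0) for i in range(240)]
candidates = [0.5 * k for k in range(1, 19)]  # 0.5 to 9.0 cpm in 0.5 cpm steps
peak = dominant_cpm(signal, 2.0, candidates)
print(peak, classify_egg(peak))
```

A fixed threshold rule like this only works when the dominant frequency is clean; learned classifiers such as the SVM in the study can combine several signal features when recordings are noisy.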
ISBN (digital): 9781728197852
ISBN (print): 9781728197869
Sentiment analysis, also referred to as opinion mining or emotion extraction, is the classification of emotions within textual data. This technique has been widely used over the years to determine the sentiments and emotions within a particular text. Twitter is a social media platform that is widely used by people to express emotions about particular events. In this paper, we have collected tweets for a number of events, analyzed them using a number of machine learning algorithms such as Naïve Bayes, SVM, Random Forest classifier, and LSTM, and compared the results.
ISBN (digital): 9781728194189
ISBN (print): 9781728194196
This paper considers establishing whether a news article is true or has been faked. To achieve the task accurately, the work compares different machine learning classification algorithms combined with different feature extraction methods. The algorithm and feature extraction method giving the highest accuracy are then used for future prediction of the labels of news headlines. In this work, the algorithm shown to have the highest accuracy was logistic regression, with 71% accuracy when used with the tf-idf feature extraction method.
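The tf-idf feature extraction mentioned above weights a term by how often it occurs in a headline and how rare it is across all headlines. The sketch below uses the plain textbook form, tf = count / headline length and idf = ln(N / document frequency). This is an assumption: the paper does not state which variant it used, and libraries often apply smoothed or normalized versions. The function name and the example headlines are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using unsmoothed tf-idf."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical headlines for illustration.
headlines = ["breaking news today", "fake news story"]
for vector in tf_idf(headlines):
    print(vector)
```

A term that appears in every headline, such as "news" here, receives weight zero in this unsmoothed form, so a downstream classifier like logistic regression sees only the terms that distinguish one headline from another.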
ISBN (digital): 9781728137834
ISBN (print): 9781728137841
Phishing is a malicious form of online theft and needs to be prevented in order to increase the public's overall trust in the Internet. To that end, the authors of this study present their findings on methods of detecting phishing websites. Data mining algorithms along with classifier algorithms are used to achieve a satisfactory result. The classifiers used are the Naïve Bayes, SMO, and J48 algorithms. For feature selection, Gain Ratio Attribute and ReliefF Attribute are selected. The results are presented comparatively. The SMO and J48 algorithms provided satisfactory results in the detection of phishing websites; however, Naïve Bayes performed poorly and is the least recommended method of all.
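Gain ratio, one of the two feature selection criteria named above, scores a feature by its information gain divided by the entropy of the feature's own value distribution (the split information). The sketch below is a minimal illustration for categorical features, assuming that standard definition; it is not the tool or code used in the study, and the feature values and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical data: whether a URL contains an IP address vs. the site label.
labels = ["phish", "phish", "legit", "legit"]
has_ip = ["yes", "yes", "no", "no"]    # perfectly separates the classes
has_https = ["a", "b", "a", "b"]       # carries no information about the label
print(gain_ratio(has_ip, labels), gain_ratio(has_https, labels))
```

Dividing by the split information penalizes features with many distinct values, which would otherwise look artificially informative under plain information gain.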
ISBN (digital): 9781728173665
ISBN (print): 9781728173672
This paper focuses on the data-driven diagnosis of polycystic ovary syndrome (PCOS) in women. For this, machine learning algorithms are applied to a dataset freely available in the Kaggle repository. This dataset has 43 attributes of 541 women, among whom 177 are PCOS patients. First, a univariate feature selection algorithm is applied to find the best features for predicting PCOS. The ranking of the attributes is computed, and it is found that the most important attribute is the ratio of Follicle-stimulating hormone (FSH) and Luteinizing hormone (LH). Next, holdout and cross-validation methods are applied to the dataset to separate the training and testing data. A number of classifiers, such as gradient boosting, random forest, logistic regression, and a hybrid of random forest and logistic regression (RFLR), are applied to the dataset. Results show that the 10 highest-ranked attributes are good enough to predict PCOS. Results also demonstrate that RFLR exhibits the best testing accuracy of 91.01% and a recall value of 90% using 40-fold cross-validation applied to the 10 most important features. Hence, RFLR is suitable for reliably classifying PCOS patients.
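To make the 40-fold cross-validation on 541 records concrete, the sketch below generates the train/test index splits. It is a minimal illustration, assuming contiguous folds with no shuffling or stratification; the abstract does not describe how the folds were built, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(541, 40))
sizes = [len(test) for _, test in folds]
print(min(sizes), max(sizes))   # each test fold holds 13 or 14 records
print(round(177 / 541, 3))      # share of PCOS patients in the dataset
```

With test folds this small and roughly a third of the records being positive, an unstratified fold may contain very few PCOS cases, so per-fold recall can vary widely even when the averaged figure looks stable.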