Setting business policy, analyzing markets, making corporate decisions, detecting fraud, and similar tasks all require analyzing huge amounts of data, generally drawn from different sources. Researchers use data mining to perform such tasks. Data mining techniques find hidden information in large data sources and are used in various fields: artificial intelligence, banking, health and medicine, corruption, legal issues, corporate business, marketing, etc. Special interest is given to association rules, data mining algorithms, decision trees, and distributed approaches. Data is becoming larger and more geographically dispersed, so it is difficult to obtain good results from a single central data source; knowledge discovery has to work with distributed databases. Security and privacy considerations are another factor that discourages working with centralized data. For these reasons, distributed databases are essential for future processing. In this paper, we propose a framework for studying data mining in a distributed environment and for extracting actionable knowledge. We present the levels through which actionable knowledge can be generated and discuss possible tools and techniques for each level.
Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore combining the functional and structural information of genes in protein interaction networks to identify driver genes. We therefore propose a network embedding framework that combines functional and structural information to identify driver genes. First, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, the struc2vec model is used to extract gene features from the mutation integration network, which contains both the functional and the structural information of genes. Finally, machine learning algorithms are used to identify the driver genes. Compared with four previous methods, our method can find gene pairs that are distant from each other through structural similarities and performs better in identifying driver genes for 12 cancers in The Cancer Genome Atlas. We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective on feature selection for identifying novel driver genes.
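The abstract does not say which network propagation algorithm is used. As a rough illustration only, the sketch below assumes random walk with restart, a common choice, on a small undirected toy network. The gene names, the restart probability, and the iteration count are all hypothetical.

```python
def propagate(adjacency, scores, restart=0.5, iterations=50):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours
               (assumed symmetric, every node has at least one neighbour).
    scores:    dict of initial scores, e.g. mutation frequencies.
    Each round, a node passes its score in equal shares to its neighbours,
    then the result is mixed with the initial scores.
    """
    degree = {g: len(neighbours) for g, neighbours in adjacency.items()}
    current = dict(scores)
    for _ in range(iterations):
        updated = {}
        for g, neighbours in adjacency.items():
            incoming = sum(current[h] / degree[h] for h in neighbours)
            updated[g] = (1 - restart) * incoming + restart * scores[g]
        current = updated
    return current


# Hypothetical chain network A - B - C - D with a single mutated gene A.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
mutation = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}
smoothed = propagate(network, mutation)
```

Genes closer to the mutated gene end up with higher smoothed scores, which is the effect a propagation step is meant to provide before features are extracted from the network.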
Employee turnover in the IT industry is among the highest of any industry. Knowing the factors that influence turnover may help reduce this issue in future. One of these factors is job satisfaction, which is composed of two important factors, status and seniority. In this study, correlation and chi-square visualization are used to determine the factors that affect employee turnover. An experiment was carried out to predict turnover on a private IT consultant dataset, comparing three classification algorithms (decision tree, Naive Bayes, and Random Forest). The results show that job duration and positioning are factors that influence employee turnover in a software company.
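To make the chi-square step concrete, here is a minimal sketch of the Pearson chi-square statistic for two categorical variables. The toy data and the names `seniority` and `left` are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter


def chi_square_statistic(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for two
    categorical sequences of equal length."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            # Expected count if the two variables were independent.
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical toy data: 20 employees, seniority level vs. whether they left.
seniority = ["junior"] * 10 + ["senior"] * 10
left = ["yes"] * 8 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 7
stat, dof = chi_square_statistic(seniority, left)
```

With one degree of freedom, a statistic above 3.841 rejects independence at the 5% level. In this toy example that would point to seniority as a factor associated with turnover.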
The rapid development of technology allows people to obtain large amounts of data, which contain both important information and various kinds of noise. Obtaining useful knowledge from data is the central task at this stage of machine learning (ML). Unbalanced classification is currently an important topic in data mining and ML; it has attracted growing attention and is a relatively new challenge for academia and industry. The problem involves classifying data when data are insufficient or when the class distribution is severely skewed. Because of the inherent complexity of unbalanced data sets, new algorithms and tools are needed to convert large amounts of raw data into useful information and knowledge. An unbalanced data set is a special case of the classification problem in which the distribution between classes is uneven, making accurate classification difficult. This article introduces ML-based methods for processing unbalanced data sets, aiming to provide ideas and directions for such processing. It proposes a research strategy comprising data preprocessing, decision tree classification, and the C4.5 algorithm, and uses it to run experiments on processing unbalanced data sets. The experimental results show that the C4.5 decision tree algorithm achieves an accuracy of 94.80% and can be used effectively for processing unbalanced data sets.
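C4.5 chooses its split attributes by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below shows that criterion on a hypothetical unbalanced toy set. It is an illustration of the standard criterion, not the article's experimental code.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting labels by a categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical unbalanced labels: 8 negative, 2 positive.
labels = ["neg"] * 8 + ["pos"] * 2
separating = ["a"] * 8 + ["b"] * 2          # splits the classes perfectly
uninformative = ["a"] * 4 + ["b"] * 4 + ["a", "b"]  # same class mix in each group
```

Here `gain_ratio(separating, labels)` is 1 and `gain_ratio(uninformative, labels)` is 0 up to rounding. C4.5 would therefore prefer the first attribute even though the minority class has only two examples.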
The SARS-CoV-2 pandemic is the current global threat, with an enormous number of deaths. As most countries have constrained resources, particularly for intensive care and oxygen, predicting severity with high accuracy is crucial. Such prediction will help the medical community select the patients who need these constrained resources. The literature shows that using clinical data for this prediction is the common trend and that molecular data is rarely used. As molecular data carry more disease-related information, in this study three types of RNA molecules (lncRNA, miRNA, and mRNA) from SARS-CoV-2 patients are used to predict the severity stage and treatment stage of those patients. Experiments with seven machine learning algorithms and several feature selection techniques show that, for both phenotypes, features selected by feature importance give the best accuracy when combined with a random forest classifier. Furthermore, miRNA and lncRNA give the best performance in severity stage prediction, and lncRNA gives the best performance in treatment stage prediction. As most studies of molecular data use mRNA, this is an interesting finding.
The advent of 5G, which strives to connect more devices at high speed and low latency, has aided the growth of IoT networks. Despite the benefits of IoT, its applications in several facets of our lives, such as smart health, smart homes, and smart cities, have raised security concerns such as Distributed Denial of Service (DDoS) attacks. In this paper, we propose a DDoS mitigation framework for IoT that uses fog computing to ensure fast and accurate attack detection. The fog provides the resources needed to deploy the mitigation framework effectively, which compensates for the limited resources of IoT devices. The mitigation framework uses an anomaly-based intrusion detection method and a database. The database stores signatures of previously detected attacks, while the anomaly-based detection scheme uses the k-NN classification algorithm to detect DDoS attacks. With a database of attack signatures, an attack can be detected faster when the same type of attack is executed again. Evaluations on a DDoS dataset show that the k-NN classification algorithm proposed for our framework achieves satisfactory accuracy in detecting DDoS attacks.
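The two-stage lookup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation. The feature vector (packets per second, mean packet size), the form of a signature (any hashable summary of the traffic), and the training points are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label query by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


class Detector:
    """Checks the signature database first and falls back to k-NN."""

    def __init__(self, train, k=3):
        self.train = train
        self.k = k
        self.signatures = set()  # signatures of previously detected attacks

    def classify(self, signature, features):
        if signature in self.signatures:
            return "attack", "signature"  # fast path for a repeated attack
        label = knn_predict(self.train, features, self.k)
        if label == "attack":
            self.signatures.add(signature)  # remember for next time
        return label, "knn"


# Hypothetical training data: (packets per second, mean packet size in bytes).
training = [
    ((10, 500), "normal"), ((12, 480), "normal"), ((15, 520), "normal"),
    ((900, 60), "attack"), ((950, 64), "attack"), ((1000, 58), "attack"),
]
detector = Detector(training)
first = detector.classify("udp-flood-small-packets", (920, 62))
second = detector.classify("udp-flood-small-packets", (930, 61))
benign = detector.classify("sensor-report", (11, 490))
```

The first flood is caught by k-NN and its signature is stored, so the repeat is answered from the database without running the distance computation.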
An Optical Character Recognition (OCR) system generates a textual representation of handwritten or printed text. OCR for most Indian scripts has been an active research area for the past few decades. Devanagari is one of the most widely used scripts in India and in the world, yet a robust OCR system for it is still lacking despite this research. The aim of this paper is to build an OCR system that classifies handwritten Devanagari numerals. The paper proposes an OCR feature based on a histogram of the angles that dark pixels make with the zonal center of mass: for each zone, the feature bins the angle made by each dark pixel about the zone's center of mass. This newly extracted feature was used to train several classification algorithms: K-Nearest Neighbor, SVM, Linear SVM, Random Forest, Decision Tree, Gradient Boosting, and Gaussian Naive Bayes. We report the efficiency of each algorithm on the new feature. Our experimental results show that the Random Forest model outperforms the other algorithms, with an efficiency of 92.57%.
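One possible reading of this feature is sketched below. The abstract does not give the zone grid, the number of angle bins, or the handling of a pixel that sits exactly on the center of mass, so `zones`, `bins`, and the skip rule are assumptions made for illustration.

```python
import math


def zonal_angle_histogram(image, zones=2, bins=8):
    """Concatenated per-zone histograms of dark-pixel angles.

    image: list of equal-length rows of 0/1 values, where 1 is a dark pixel.
    The image is cut into zones x zones rectangles. In each zone, the angle
    of every dark pixel about the zone's dark-pixel center of mass is binned.
    """
    height, width = len(image), len(image[0])
    zone_h, zone_w = height // zones, width // zones
    feature = []
    for zr in range(zones):
        for zc in range(zones):
            dark = [(r, c)
                    for r in range(zr * zone_h, (zr + 1) * zone_h)
                    for c in range(zc * zone_w, (zc + 1) * zone_w)
                    if image[r][c]]
            hist = [0] * bins
            if dark:
                cy = sum(r for r, _ in dark) / len(dark)
                cx = sum(c for _, c in dark) / len(dark)
                for r, c in dark:
                    if r == cy and c == cx:
                        continue  # angle is undefined at the center itself
                    angle = math.atan2(r - cy, c - cx) % (2 * math.pi)
                    index = int(angle / (2 * math.pi) * bins)
                    hist[min(index, bins - 1)] += 1
            feature.extend(hist)
    return feature


# Hypothetical 3x3 glyph with dark corners, treated as a single zone.
corners = [[1, 0, 1],
           [0, 0, 0],
           [1, 0, 1]]
corner_feature = zonal_angle_histogram(corners, zones=1, bins=4)
```

The resulting fixed-length vector (`zones * zones * bins` values) is what would be passed to a classifier such as Random Forest.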
Diabetes, caused by a rise in the level of glucose in the blood, can be identified from blood samples with many devices. When unnoticed, diabetes may lead to serious conditions such as heart attack and kidney disease. There is therefore a need for solid research and improved learning models in the field of gestational diabetes identification and analysis. SVM is one of the powerful classification models in machine learning, and deep neural networks are similarly powerful among deep learning models. In this work, the authors apply an enhanced support vector machine and a deep neural network to diabetes prediction and screening. In the proposed method, the deep neural network takes its input from the output of the enhanced support vector machine, combining the strengths of both. The dataset contains data for 768 patients with eight major features and a target column with the result "Positive" or "Negative." The experiment was carried out in Python, and the results show that the deep learning model is more efficient for diabetes prediction.
ISBN: 9781538635315 (print)
Unmanned Aerial Vehicles (UAVs) are expected to be connected through cellular networks. As the radio characteristics of airborne UEs differ from those of terrestrial UEs, it is beneficial to identify whether a UE is airborne (on a UAV) or on the ground, so that interference and mobility management can be optimized for UAVs separately from terrestrial UEs. In this paper, we present a classification algorithm that uses existing LTE UE radio measurements to identify whether a UE is airborne or terrestrial. The method is verified with LTE measurements made in a rural area at different heights, including terrestrial measurements. In 3 of the 4 measurement cases the method detects an airborne UE with 99% likelihood, and in the fourth case it still classifies the UE correctly 95% of the time. Classification accuracy can be further improved by taking multiple consecutive samples into account before making a classification decision.
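A back-of-the-envelope sketch of why combining consecutive samples helps is given below. It assumes a simple majority vote and, more strongly, that the per-sample decisions are independent. Consecutive radio measurements are usually correlated, so the real gain is likely smaller. The abstract does not specify the combining rule, so the majority vote is an assumption.

```python
from collections import Counter
from math import comb


def majority_vote(decisions):
    """Return the most frequent decision in a window of samples."""
    return Counter(decisions).most_common(1)[0][0]


def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent decisions is correct,
    when each single decision is correct with probability p (n odd)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


# With the 95% per-sample case from the abstract and a window of 3 samples,
# the vote is correct with probability of about 0.99275 under independence.
three_sample_accuracy = majority_vote_accuracy(0.95, 3)
window_decision = majority_vote(["airborne", "terrestrial", "airborne"])
```

A longer window raises the accuracy further but delays the decision, which matters if interference or mobility handling has to react quickly.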
In this paper, the Extended Coupled Amplitude Delay Lock Loop (ECADLL) architecture, previously introduced as a solution for multipath environments, is revisited and improved to tailor it to spoofing detection. Using a properly defined decision algorithm, the architecture can effectively detect a spoofing attack and distinguish it from other kinds of interference events. The new algorithm classifies these events according to their characteristics. We also introduce a ratio metric detector to reduce the detection latency and the computational load of the architecture.