ISBN (Print): 9781509025008
Individuals, criminals, or even terrorist organizations can use web communication for criminal purposes; to avoid prosecution, they try to hide their identity. To increase the level of safety on the Web, we have to improve author (or web-user) identification and authentication procedures. In the field of web author identification, imbalanced data sets arise rather frequently, when the number of one author's texts significantly exceeds the number of another's. This is a common situation on the modern web: social networks, blogs, emails, etc. Author identification is a kind of classification task. To develop methods, techniques, and tools for web author identification, we have to examine the performance of classification algorithms on imbalanced data sets. In this work, several modern classification algorithms were tested on data sets with various levels of class imbalance and different numbers of available web posts. The best accuracy in all experiments was achieved with the Random Forest algorithm.
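The abstract names Random Forest as the winner on imbalanced authorship data but gives no setup details. A minimal sketch of that kind of experiment, with synthetic features standing in for web-post style features; the 9:1 class ratio and all parameter choices are illustrative assumptions, not the paper's configuration:

```python
# Random Forest on an imbalanced two-author classification task (sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n_major, n_minor = 900, 100          # 9:1 class imbalance, an assumption
X = np.vstack([rng.normal(0.0, 1.0, (n_major, 20)),
               rng.normal(0.7, 1.0, (n_minor, 20))])
y = np.array([0] * n_major + [1] * n_minor)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# class_weight="balanced" reweights the minority author's samples
clf = RandomForestClassifier(n_estimators=200,
                             class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
print(balanced_accuracy_score(y_te, clf.predict(X_te)))
```

Balanced accuracy is used here because plain accuracy is misleading under class imbalance, the central concern of the study.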
ISBN (Digital): 9798331540944
ISBN (Print): 9798331540951
The problem of synthesizing algorithms for classifying quasi-deterministic signals with random parameters is considered. The need to solve such a problem is associated with the widespread occurrence of signal processing cases where the average likelihood ratio is not expressed through elementary functions. It is shown that the computation of the average likelihood ratio statistic can be represented as a sequence of elementary transformations of the input samples. Expressions are found for determining asymptotically sufficient statistics when solving radio engineering problems, including possible simplifications under certain conditions. The corresponding block diagrams of the processors have been obtained, allowing them to operate when both the noise environment and the parameters of the useful signals change. An approximate expression for the unconditional likelihood ratio is found for a wide class of probability distributions of the input values and of the random signal parameters. The resulting classification algorithms are given and their quality is assessed.
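The abstract gives no formulas; for reference, the standard form of the average (unconditional) likelihood ratio for a signal with random parameter vector $\theta$ distributed with prior density $w(\theta)$ over $\Theta$ is

```latex
\Lambda(x) \;=\; \frac{\displaystyle\int_{\Theta} p(x \mid \theta, H_1)\, w(\theta)\, d\theta}{p(x \mid H_0)}
```

The synthesis difficulty the abstract refers to is that this integral over $\theta$ generally has no closed form in elementary functions, which motivates replacing it with a sequence of elementary transformations of the input samples.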
ISBN (Print): 0780383508
Hyperspectral imagery offers a means of uncovering enormous spectral information that can be used for various applications in data exploitation. How effectively such information is used affects the way image analysis algorithms are designed. In this paper, we take up this issue and focus on algorithms designed and developed for target detection and classification in hyperspectral imagery. In order to effectively characterize the information available before and after the data are processed, a priori information and a posteriori information are distinguished according to how the information is obtained. A piece of information is referred to as a priori information if it is provided by known knowledge before the data are processed. On the other hand, a piece of information is referred to as a posteriori information if it is unknown a priori but can be obtained directly from the data in an unsupervised fashion during the course of data processing. Since a priori information is known beforehand, it can be further decomposed into two types, desired and undesired a priori information. Desired a priori information is knowledge that assists, improves, and enhances data analysis, whereas undesired a priori information is knowledge that hinders, interferes with, or disrupts analysis during data processing. This paper investigates how these three types of information play their roles in the design and development of several hyperspectral target detection and classification algorithms. Experiments are also conducted to validate their utility.
In this study, the performance of different hyperspectral classification algorithms on the same training set is investigated. In addition, the effect of the dimension and sampling strategy for training-set selection is demonstrated. Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), and Maximum Likelihood (ML) methods are used. The contribution of combining spatial information with spectral information is observed; mean-shift segmentation and window-weighting methods are used to incorporate spatial information. The high-resolution Pavia University hyperspectral data and the Indian Pines data are used in this study.
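A minimal sketch of the study's comparison protocol: SVM, K-NN, and a Gaussian maximum-likelihood classifier fitted on one shared training set. Synthetic 10-band "pixels" replace the Pavia/Indian Pines scenes, and all sizes and hyperparameters are illustrative assumptions; `QuadraticDiscriminantAnalysis` stands in for per-class Gaussian ML classification.

```python
# Comparing SVM, K-NN and Gaussian ML classifiers on one training set.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
# Three synthetic classes of 10-band spectra (stand-ins for real pixels)
X = np.vstack([rng.normal(m, 0.8, (300, 10)) for m in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          stratify=y, random_state=1)

models = {
    "SVM": SVC(kernel="rbf"),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "ML (Gaussian)": QuadraticDiscriminantAnalysis(),  # per-class Gaussian ML
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, m.predict(X_te)))
```

Holding the training set fixed across models, as above, isolates the effect of the classifier itself, which is the comparison the study performs.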
The article considers the use of various classification algorithms to process data from the integrated sensors of a modern smartphone in order to determine the user's physical activity. To compare the classification algorithms, we use modeling based on a dataset from an experiment with 561 significant characteristics for each type of activity and a total of 10,299 records. The analysis of the sensor data is performed by several classification algorithms: decision trees, the Random Forest algorithm, logistic regression, and support vector machines with linear and kernel functions. In addition to the unmodified algorithms, variants with preliminary optimization of hyperparameters using random search and cross-validation are used. The evaluation metrics are Accuracy, Precision, Recall, F-measure, Mean Absolute Error, and Root Mean Square Error. The estimated metrics and confusion matrices, obtained with Python scripts written using the sklearn library, are provided for each algorithm. Analysis of the presented results shows that the best classification results, both with preliminary hyperparameter optimization and cross-validation and without preliminary tuning, are obtained with the linear support vector machine algorithm.
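The tuning procedure the article describes (random search plus cross-validation over a linear SVM) can be sketched with sklearn, which the article itself uses. The synthetic data below stands in for the 561-feature activity set, and the search range for `C` is an assumption:

```python
# Random-search hyperparameter tuning of a linear SVM with cross-validation.
import numpy as np
from scipy.stats import loguniform
from sklearn.svm import LinearSVC
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import make_classification

# Synthetic stand-in for the smartphone activity dataset
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=2)

# Sample C log-uniformly, score each candidate with 5-fold cross-validation
search = RandomizedSearchCV(
    LinearSVC(max_iter=5000),
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=10, cv=5, scoring="accuracy", random_state=2)
search.fit(X, y)
print(search.best_params_["C"], round(search.best_score_, 3))
```

Random search draws hyperparameter candidates from distributions rather than a fixed grid, which is cheaper when only a few hyperparameters actually matter.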
ISBN (Print): 9781538642559; 9781538642542
Nowadays, early-stage brain tumor detection is essential because many people die from an undiagnosed brain tumor. On the other hand, as the influence of machine learning on our lives and our society grows larger and larger, artificial intelligence may also start playing an important role in medical diagnosis and in supporting doctors and surgeons. This paper focuses on reviewing papers that cover the segmentation, detection, and classification of brain tumors. The common procedure for an algorithm that aims to classify brain tumors on fMRI or MRI scans is: preprocessing the image, for example by removing noise; then segmenting the image, which yields the region that might be a brain tumor; and finally classifying features such as intensity, shape, and texture of this region. Many machine learning approaches to brain tumor detection have already been made. However, these approaches, even though they yield good results, are not yet used in practice. Therefore this research topic remains important and still requires attention.
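The generic preprocess / segment / extract-features / classify procedure described above can be sketched end to end. A synthetic 2-D "scan" with one bright blob stands in for an MRI slice; the mean filter, the threshold value, and the toy nearest-mean classifier are illustrative assumptions, not any of the reviewed methods.

```python
# Sketch of the generic pipeline: denoise -> segment -> features -> classify.
import numpy as np

rng = np.random.default_rng(3)
img = rng.normal(0.2, 0.05, (64, 64))          # background tissue
img[20:32, 24:36] += 0.6                        # bright candidate region

# 1. Preprocessing: simple 3x3 mean filter to suppress noise
pad = np.pad(img, 1, mode="edge")
smooth = sum(pad[i:i + 64, j:j + 64] for i in range(3) for j in range(3)) / 9.0

# 2. Segmentation: a global threshold keeps the candidate region
mask = smooth > 0.5

# 3. Feature extraction: intensity and shape descriptors of the region
features = np.array([smooth[mask].mean(),       # mean intensity
                     mask.sum() / mask.size])   # relative area

# 4. Classification: toy nearest-mean rule against assumed prototypes
prototypes = {"tumor": np.array([0.8, 0.03]),
              "healthy": np.array([0.3, 0.0])}
label = min(prototypes, key=lambda k: np.linalg.norm(features - prototypes[k]))
print(label, features.round(3))
```

Real systems replace each toy step with the stronger components the reviewed papers propose (e.g. learned segmentation and trained classifiers), but the four-stage structure is the same.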
ISBN (Print): 9781424431236
The unichain classification problem asks whether an MDP with finite states and actions is unichain under all deterministic policies. This problem has been proven to be NP-hard. This paper provides polynomial algorithms for this problem when there exists a state in the MDP that is either recurrent under all deterministic policies or absorbing under some action.
According to a World Health Organization report, road traffic accidents cause more than 1.25 million deaths every year, with non-fatal accidents affecting a further 20-50 million people. Several factors contribute to the occurrence of road traffic accidents. In this study, data mining classification techniques are applied to build models (classifiers) that identify accident factors and predict traffic accident severity from previously recorded traffic data. Using WEKA (Waikato Environment for Knowledge Analysis), decision tree (J48, ID3, and CART) and Naïve Bayes classifiers are built to model the severity of injury. The classification performance of all these algorithms is compared based on their results. The experimental results show that the accuracy of the J48 classifier is higher than that of the others.
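The study's decision-tree versus Naive Bayes comparison can be sketched with scikit-learn stand-ins: an entropy-based tree approximating WEKA's C4.5-based J48, and `GaussianNB` for Naive Bayes. The synthetic "accident" records and the three severity classes are assumptions, not the study's data.

```python
# Comparing a J48-like decision tree and Naive Bayes by cross-validation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Synthetic stand-in for accident records with three severity classes
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, n_clusters_per_class=1,
                           random_state=4)

for name, clf in [
    # criterion="entropy" mimics J48's information-gain splitting
    ("J48-like tree", DecisionTreeClassifier(criterion="entropy",
                                             random_state=4)),
    ("Naive Bayes", GaussianNB()),
]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```

Cross-validated accuracy gives a like-for-like comparison of the classifiers, the same measurement the study bases its conclusion on.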
ISBN (Digital): 9781728195742
ISBN (Print): 9781728195759
Predicting peptide-binding proteins is important in understanding biological interactions, protein function analysis, cellular processes, drug design, and even cancer prediction. Experimental prediction methods, despite their operational capabilities, have limitations: they are costly, time-consuming, and hampered by differences between unrecognized protein structures and sequences. The design and development of computational systems for maintaining biological knowledge, building optimal models to represent it, and managing and analyzing big biological data is therefore essential. The authors used machine-learning techniques such as Support Vector Machine (SVM), Random Forest (RF), Decision Tree (C4.5), Decision Tree (ID3), and Gradient Boosting classifiers. Evaluation of the experimental results identified the Support Vector Machine (SVM) classifier with a Radial Basis Function kernel as the best performer, with an accuracy (ACC) of 0.7401 and 0.7599 and a specificity (SPE) of 0.7966 and 0.8088 for 10-fold cross-validation and the independent test set, respectively, using various structure-based and sequence-based features.
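The study's evaluation scheme (an RBF-kernel SVM scored by 10-fold cross-validation on accuracy and specificity) can be sketched as below. The synthetic binary data replaces the structure- and sequence-based protein features, and `C=1.0` is an assumed default, not the study's tuned value.

```python
# RBF-kernel SVM evaluated by 10-fold CV on accuracy and specificity.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.datasets import make_classification

# Synthetic stand-in for peptide-binding vs non-binding protein features
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=5)

# Out-of-fold predictions from 10-fold cross-validation
pred = cross_val_predict(SVC(kernel="rbf", C=1.0), X, y, cv=10)
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
acc = accuracy_score(y, pred)
spe = tn / (tn + fp)                 # specificity = TN / (TN + FP)
print(f"ACC={acc:.3f}  SPE={spe:.3f}")
```

Specificity is derived from the confusion matrix because sklearn has no direct scorer for it; it measures how well non-binding proteins are rejected.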
ISBN (Print): 9781665454025
The application of machine learning to investigate metabolic disorders like diabetes and Alzheimer's Disease (AD), which affect significant populations worldwide, is currently receiving a lot of attention. AD has long been challenging to predict at an early stage. To identify the parameters most effective for predicting Alzheimer's disease, decision trees, support vector machines, XGBoost, random forest, extra trees, AdaBoost, gradient boosting, and voting classifiers were used. In this research, the Extra Trees classifier has been applied, which, from our review, has rarely been explored in the past. This classifier gives the best results in terms of performance among all the implemented techniques. The performance is evaluated using parameters such as precision, recall, F1-score, and accuracy. Using these machine learning methods, especially the Extra Trees algorithm, for early identification of Alzheimer's disease will greatly help lower the annual mortality rate caused by AD.
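The Extra Trees classifier highlighted above is available directly in scikit-learn. A minimal sketch on synthetic tabular data standing in for the AD clinical features; the sample sizes and `n_estimators=200` are illustrative assumptions:

```python
# Extra Trees classifier scored on precision, recall, F1 and accuracy.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (precision_score, recall_score,
                             f1_score, accuracy_score)
from sklearn.datasets import make_classification

# Synthetic stand-in for a binary AD / non-AD clinical feature table
X, y = make_classification(n_samples=500, n_features=15, n_informative=6,
                           random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=6)

# Extra Trees: like Random Forest, but split thresholds are drawn at random
clf = ExtraTreesClassifier(n_estimators=200, random_state=6).fit(X_tr, y_tr)
pred = clf.predict(X_te)
for name, metric in [("precision", precision_score), ("recall", recall_score),
                     ("F1", f1_score), ("accuracy", accuracy_score)]:
    print(f"{name}: {metric(y_te, pred):.3f}")
```

The extra randomization in split selection is what distinguishes Extra Trees from Random Forest and tends to reduce variance at the cost of a small bias.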