ISBN: 9781538630075 (print)
Research on machine learning has flourished for several years, and machine learning algorithms are being applied to a wide range of fields with great success. In this paper, we study how to integrate on-line machine learning into optimization algorithms. In many heuristic optimization algorithms, a common way to reduce execution time and improve solution optimality is to first estimate the quality of a set of candidate solutions and then solve only the promising candidates in detail. Currently, most estimations are performed with empirical equations, whose accuracy depends heavily on how well the equation is designed. We propose an on-line learning based estimator that performs the solution estimation in heuristic algorithms in order to improve estimation accuracy. We then discuss a simple case study in which a local search based heuristic with random start is used, and we propose an on-line estimator that takes the properties of local search into account. The experiments show that the accuracy of the on-line estimator is much higher than that of the static estimator, and also higher than that of a general off-line pretrained learner. Even though the on-line estimator requires additional time for its training, the heuristic algorithm still speeds up by 3.7x without sacrificing optimality.
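The idea of estimating candidates on-line and solving only the promising ones can be sketched as follows. This is a minimal illustration, not the paper's estimator: the class `OnlineLinearEstimator`, the function `random_restart_search`, the single feature (the initial cost of a start point, assumed to be scaled to roughly [0, 1]), the linear model, and the `warmup` parameter are all assumptions made for the example.

```python
class OnlineLinearEstimator:
    """Hypothetical on-line estimator: predicts y ~ w*x + b, one SGD step per sample."""

    def __init__(self, lr=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # Gradient step on the squared error of a single observation.
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def random_restart_search(initial_cost, local_search, starts, warmup=10):
    """Run the expensive local search only from starts predicted to beat the incumbent.

    initial_cost(s) -> cheap feature of a start point (assumed scaled to about [0, 1])
    local_search(s) -> final cost after the expensive local search
    """
    estimator = OnlineLinearEstimator()
    best = float("inf")
    solved = 0
    for i, s in enumerate(starts):
        x = initial_cost(s)
        # After a warm-up phase, skip candidates the estimator considers unpromising.
        if i >= warmup and estimator.predict(x) >= best:
            continue
        y = local_search(s)      # solve the candidate in detail
        estimator.update(x, y)   # train on the outcome just observed
        best = min(best, y)
        solved += 1
    return best, solved
```

Because the estimator is trained only on outcomes observed during the run, it adapts to the instance being solved; the price is that a poorly trained estimator may skip a start that would have led to a better solution, which is why the sketch always solves the first `warmup` candidates.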
Due to the existence of a double-sided asymmetric information problem on the labour market, characterized by a mutual lack of trust between employers and unemployed people, not enough job matches are facilitated by public employment services (PES), which seem to be caught in a low-end equilibrium. In order to act as a reliable third party, PES need to build a good and solid reputation among their main clients by offering better and less time-consuming pre-selection services. The use of machine-learning, data-driven relevancy algorithms that calculate the viability of a specific candidate for a particular job opening is becoming increasingly popular in this field. Based on the Portuguese PES databases (CVs, vacancies, pre-selection and matching results), complemented by relevant external data published by Statistics Portugal and the European Classification of Skills/Competences, Qualifications and Occupations (ESCO), this thesis evaluates the potential application of models such as Random Forests, Gradient Boosting, Support Vector Machines, Neural Network Ensembles and other tree-based ensembles to the job matching activities carried out by the Portuguese PES, in order to understand the extent to which these activities can be improved through the adoption of automated processes. The results seem promising and point to the possible use of robust algorithms such as Random Forests in the pre-selection of suitable candidates, owing to their advantages in terms of accuracy and their capacity to handle large datasets with thousands of variables, including badly unbalanced ones, as well as extensive missing values and many-valued categorical variables.
ISBN: 9781450338967 (print)
This paper investigates methods for the automatic recognition and classification of discrete environmental sounds, with the aim of subsequently applying these methods to the recognition of soundscapes. Research in audio recognition has traditionally focused on the domains of speech and music. Comparatively little research has been done on recognizing non-speech environmental sounds. For this reason, we apply existing techniques that have proven effective in the other two domains. These techniques are comprehensively compared to determine the most appropriate one for addressing the problem of environmental sound recognition.
The relationship between the fatigue crack growth rate (da/dN) and the stress intensity factor range (ΔK) is not always linear, even in the Paris region. The effects of the stress ratio on the fatigue crack growth rate differ between materials. However, most existing fatigue crack growth models cannot handle these nonlinearities appropriately. Machine learning provides a flexible approach to modeling fatigue crack growth because of its excellent nonlinear approximation and multivariable learning ability. In this paper, a fatigue crack growth calculation method is proposed based on three different machine learning algorithms (MLAs): the extreme learning machine (ELM), the radial basis function network (RBFN) and the genetic algorithm optimized back propagation network (GABP). The MLA based method is validated using test data from different materials. The three MLAs are compared with each other as well as with the classical two-parameter model (K* approach). The results show that the predictions of the MLAs are superior to those of the K* approach in accuracy and effectiveness, and that the ELM based algorithm shows the best overall agreement with the experimental data among the three MLAs, owing to its global optimization and extrapolation ability.
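For context, the "linear" behaviour in the Paris region refers to the standard Paris law, which is not stated in the abstract itself. With material constants C and m (symbols assumed here, following common usage), the law and its log-log form read:

```latex
\frac{da}{dN} = C\,(\Delta K)^{m}
\qquad\Longleftrightarrow\qquad
\log\frac{da}{dN} = \log C + m \log \Delta K
```

On log-log axes this is a straight line with slope m, so the nonlinearity the abstract describes corresponds to measured data that bend away from that line.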
ISBN: 9781509037667 (print)
SSH attacks are of various types: SSH port scanning, SSH brute-force attacks, and attacks using a compromised SSH server. Attacks using a compromised server could be DoS attacks, phishing attacks, e-mail spamming and so on. This paper asks whether attacks from a compromised SSH server can be distinguished from other attacks using network flows. In this work, we categorize SSH attacks into two types. The first category consists of all attack activities after a successful compromise of an SSH server. We call these "severe" attacks. The second category includes all attacks leading up to a successful compromise. It consists of SSH port scanning, SSH brute-force attacks, and a compromised SSH server with no activities. We call these "not-so-severe" attacks. We employ machine learning algorithms, namely the Naive Bayes learner, Logistic Regression, the J48 decision tree, and the Support Vector Machine, to classify these attacks. Suitable features were selected based on domain knowledge, a literature survey, and a feature selection technique, and the performance of the machine learning algorithms was evaluated using the metrics accuracy, sensitivity, precision, and F-score.
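The four evaluation metrics named above can be computed from a binary confusion matrix. The following is a minimal sketch using their standard definitions; the function name `binary_metrics`, the label strings, and the choice of "severe" as the positive class are assumptions for illustration, and the F-score shown is the balanced F1 variant.

```python
def binary_metrics(y_true, y_pred, positive="severe"):
    """Accuracy, sensitivity (recall), precision and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_score": f_score,
    }
```

Reporting sensitivity and precision alongside accuracy matters when one class is much rarer than the other, since a classifier can reach high accuracy while missing most of the rare class.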
Big data analytics is an emerging technology that promises better insights from huge and heterogeneous data. It involves selecting a suitable big data storage and computational framework, augmented by scalable machine-learning algorithms. Despite the tremendous buzz around big data analytics and its advantages, an extensive literature survey focused on parallel data-intensive machine-learning algorithms for big data has not been conducted so far. The present paper provides a comprehensive overview of the various machine-learning algorithms used in big data analytics. It attempts to identify the gaps in the work already performed by researchers, thus paving the way for further quality research in parallel scalable algorithms for big data. © 2016 John Wiley & Sons, Ltd.
A well-known problem with modern anti-submarine warfare sonars with narrow beamwidths and wide frequency bandwidths is the frequent occurrence of false alarms, particularly in littoral environments. This increases the workload of sonar operators and also reduces the usefulness of automatic systems such as autonomous underwater vehicles, since their limited communication abilities prevent them from sharing large numbers of contacts. In this paper, four traditional machine learning algorithms are tested on sonar data containing a large number of false alarms together with synthetic submarine echoes. It is shown that some of the algorithms can outperform simple signal-to-noise ratio (SNR) thresholding by a significant margin, but that the performance is highly dependent on the parameter values chosen for each algorithm. These parameters are therefore investigated in order to determine their relative significance.
ISBN: 9781467394178 (print)
The purpose of this research is to implement different machine learning algorithms for optical character recognition. The algorithms take the pixel density of images of handwritten digits as input and produce a predicted label for each handwritten digit. The predicted labels were then compared with the actual labels of the MNIST handwritten digits to determine the accuracy of each algorithm. The machine learning algorithms used in this research are Naïve Bayes, Naïve Bayes with Laplace Smoothing, Sequential Minimal Optimization, C4.5 decision trees and Logistic Regression. The accuracy of each algorithm was calculated, and Logistic Regression was found to be the most accurate of them all for handwritten digits.
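The difference between plain Naïve Bayes and Naïve Bayes with Laplace smoothing lies in how the per-class feature likelihoods are estimated. The sketch below shows the standard additive-smoothing formula; the function name `smoothed_likelihood` and its parameters are assumptions for illustration and are not taken from the paper.

```python
def smoothed_likelihood(count, class_total, n_values, alpha=1.0):
    """Estimate P(feature = v | class) with additive (Laplace) smoothing.

    count       -- training samples of the class in which the feature equals v
    class_total -- training samples of the class
    n_values    -- number of distinct values the feature can take
    alpha       -- smoothing strength; alpha=0 gives the unsmoothed estimate
    """
    return (count + alpha) / (class_total + alpha * n_values)
```

Without smoothing, a pixel value never seen for a given digit in training gets probability zero, and that single zero wipes out the whole product of likelihoods for that class; with `alpha=1.0` the estimate stays small but positive.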
House sales are determined based on the Standard & Poor’s Case-Shiller home price indices and the housing price index of the Office of Federal Housing Enterprise Oversight (OFHEO). These reflect the trends of the US housing market. In addition to these housing price indices, a housing price prediction model can greatly assist in predicting future housing prices and in establishing real estate policies. This study uses machine learning algorithms as a research methodology to develop such a model. To improve the accuracy of housing price prediction, this paper analyzes the housing data of 5359 townhouses in Fairfax County, Virginia, gathered by the Multiple Listing Service (MLS) of the Metropolitan Regional Information Systems (MRIS). We develop a housing price prediction model based on machine learning algorithms such as C4.5, RIPPER, Naïve Bayesian, and AdaBoost and compare their classification accuracy. We then propose an improved housing price prediction model to help a house seller or real estate agent make better informed decisions based on house price valuation. The experiments demonstrate that the RIPPER algorithm consistently outperforms the other models in the accuracy of housing price prediction.
ISBN: 9781509018598 (print)
The machine learning domain has grown quickly in the last few years, in particular in mobile eHealth. In the context of the DINAMO project, we aimed to detect hypoglycemia in Type 1 diabetes patients by using their ECG, recorded with a sport-like chest belt. In order to know whether the data contain enough information for this classification task, we needed to apply and evaluate machine learning algorithms on several kinds of features. We have built a Python toolbox for this purpose. It is built on top of the scikit-learn toolbox and allows a defined set of machine learning algorithms to be evaluated on a defined set of feature extractors, while applying good machine learning practices such as cross-validation and parameter grid search. The resulting framework can be used as a first analysis toolbox to investigate the potential of the data. It can also be used to fine-tune the parameters of machine learning algorithms or of feature extractors. In this paper we explain the motivation for such a framework, present its structure, and show a case study with negative results that we could quickly spot using our toolbox.