Long-term air temperature prediction is of major importance in a large number of applications, including climate-related studies, energy, agriculture, and medicine. This paper examines the performance of two machine learning algorithms, Support Vector Regression (SVR) and Multi-layer Perceptron (MLP), in a problem of monthly mean air temperature prediction from previously measured values at observational stations in Australia and New Zealand and from climate indices of importance in the region. The performance of the two algorithms is discussed and compared to alternative approaches. The results indicate that SVR obtains the best prediction performance among all the algorithms compared in the paper. Moreover, the mean absolute error made by the two algorithms is significantly larger for the last 20 years than for the previous decades, which can be interpreted as a change in the relationship among the prediction variables involved in training the algorithms.
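To make the prediction setup concrete, the sketch below turns a monthly series into (previous values, next value) training pairs and computes the mean absolute error mentioned in the abstract. It is a minimal illustration only: the number of lags, the function names, and the sample values are assumptions, and the paper's actual features (including the climate indices) and models are not reproduced here.

```python
def make_lagged_samples(series, n_lags):
    """Build (inputs, target) pairs where each input is the n_lags previous values.

    n_lags is a hypothetical choice; the abstract does not state how many
    previous months are used.
    """
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])
        targets.append(series[t])
    return inputs, targets


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Made-up monthly means, for illustration only.
monthly_means = [14.0, 15.5, 17.0, 16.0, 13.5]
X, y = make_lagged_samples(monthly_means, n_lags=2)
```

Any regressor (for example an SVR or MLP) could then be fitted on `X` and `y`, and `mean_absolute_error` evaluated separately on different periods to compare error levels across decades.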
ISBN (print): 9781538622131; 9781538622124
Non-technical losses (NTLs) in electrical power grids, which mainly concern electricity theft, can have a major impact on the economies of energy providers and nations. The use of machine learning algorithms to detect NTLs has been widely studied as a way to reduce the cost of on-site inspections of electricity consumers showing suspicious consumption behavior. An issue that has not received enough attention in the research is the imbalance between fraudulent and non-fraudulent data, which can have a major negative impact on the performance of supervised learning methods. Furthermore, most methods proposed in the literature have not been evaluated using meaningful performance measures. We propose a framework that addresses the problem of data imbalance in supervised classification techniques for NTL detection through resampling techniques. Additionally, we present the results of our experimental evaluation using an extensive list of performance metrics, two of which have not been previously reported in the literature: the Matthews Correlation Coefficient and the Fβ-score. Experiments have been carried out using 22 months of electricity consumption data corresponding to over 3,400 industrial and commercial customers in Honduras. Our experimental results show that class imbalance strategies applied to supervised classifiers for NTL detection can significantly improve the quality of predictions.
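The two metrics named above can be computed directly from confusion-matrix counts. The sketch below uses the standard textbook definitions; the function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from counts; beta > 1 weights recall more than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced case: 5 fraud cases among 100 customers and a
# classifier that always predicts "non-fraud". Accuracy would be 95%,
# yet both metrics below report no useful detection.
always_negative_mcc = mcc(tp=0, tn=95, fp=0, fn=5)
always_negative_f2 = f_beta(tp=0, fp=0, fn=5, beta=2.0)
```

This is why such measures are more informative than accuracy on imbalanced data: a trivial majority-class predictor scores zero on both.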
With the Internet of Things, the number of devices connected to the internet is increasing day by day. This leads to a very large amount of data circulating on the internet. Distinguishing normal from abnormal traffic within this volume of data is one of the most common problems. In this study, machine learning approaches were used to determine whether data received over the internet is normal or abnormal. To this end, the KDD Cup 99 data set, which is frequently used in the literature, is classified with the Naive Bayes (NB), Bayes Net (BN), Random Forest (RF), Multilayer Perceptron (MLP), and Sequential Minimal Optimization (SMO) algorithms. The classifiers are compared using false rate, precision, recall, and F-measure metrics along with accuracy. The classification times of the classifiers are also compared.
Iterative machine learning algorithms, e.g., k-means (KM) and expectation maximization (EM), become overwhelmed with big data since all data points are continually and indiscriminately visited while a cost is being minimized. In this work, we demonstrate (1) an optimization approach to reduce the training run-time complexity of iterative machine learning algorithms and (2) an implementation of this framework for the KM algorithm. We call this extended KM algorithm KM*. The experimental results show that KM* outperforms KM on big real-world and synthetic data sets. Lastly, we demonstrate the theoretical elements of our work.
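For context, the sketch below shows plain k-means (Lloyd's algorithm), the baseline the abstract refers to as KM. It is not KM*, whose details are not given here; it only illustrates how the assignment step visits every data point on every iteration. The function name and sample data are assumptions.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means on tuples of floats; returns (centroids, clusters)."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: every point is compared with every centroid,
        # on every iteration, regardless of whether it is likely to move.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(
    [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)],
    initial_centroids=[(0.0, 0.0), (10.0, 0.0)],
)
```

The cost per iteration is proportional to the number of points times the number of centroids, which is the repeated, indiscriminate work that an optimization such as the one described would aim to reduce.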
Many resources today are shared freely through social networks or cloud storage platforms, which helps users acquire data or exchange information. Unfortunately, because participation is unrestricted, some uploaded resources contain advertisements or fraudulent content that force users to visit ad websites or steal users' data. Therefore, a quality evaluation of a resource is needed so that users can judge whether to use or install it. In this paper, we implement a system that evaluates quality based on software install packages and applies four algorithms to forecast quality scores. We conduct an extensive experimental study on a real dataset and find that the prediction can be performed in less than one second (0.002s~0.04s) and with high accuracy (82.84%~90.52%).
Teaching and research are two essential, correlated activities in tertiary education. Applying research successfully to benefit students' teaching and learning is a challenging task for curriculum development. This paper presents a real-world practice of research-led curriculum development for teaching machine learning algorithms. Evaluations show that the proposed research-led teaching method achieves very high student satisfaction and recognition.
ISBN (print): 9788132226710; 9788132226697
Artificial Intelligence (AI) is a field that deals with the study and design of systems capable of observing their environment and performing functions that aim to maximize the probability of success in solving problems. AI captured wide interest and attention from the scientific world and has therefore grown extraordinarily. This in turn increased the focus on machine learning, a field that deals with developing the underlying conjectures of learning and of learning machines. The methodologies and objectives of machine learning have played a vital role in the considerable progress made by AI. Machine learning aims at improving the learning capabilities of intelligent systems. This survey aims to provide theoretical insight into the major algorithms used in machine learning and the basic methodology they follow.
ISBN (print): 9781538678800
Searching a solution space using Stochastic Gradient Descent (SGD) depends on the examples picked at each iteration of the algorithm. Therefore, best practices suggest randomizing the order in which training points are visited after every epoch. This random selection is typically implemented as a random shuffling of the order of the training vectors rather than as genuine random training point selection. The shuffling is usually performed after every epoch, which results in extremely low temporal locality of access to the training set. Indeed, each training point is used once per epoch, and not again before all the other training points have been visited. This means that a cache layer in the memory hierarchy of a modern HPC computer system will be of little benefit to the algorithm unless all the training points fit inside that cache.
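The locality argument can be checked with a small simulation. The sketch below generates the access order produced by per-epoch shuffling and measures the reuse distance, meaning the number of accesses between two consecutive visits to the same training point. The function names, seed, and sizes are illustrative assumptions.

```python
import random


def epoch_shuffled_order(n_points, n_epochs, seed=0):
    """Access order when the training set is reshuffled before every epoch."""
    rng = random.Random(seed)
    indices = list(range(n_points))
    order = []
    for _ in range(n_epochs):
        rng.shuffle(indices)
        order.extend(indices)
    return order


def reuse_distances(order):
    """Accesses elapsed between consecutive visits to the same index."""
    last_seen = {}
    distances = []
    for position, index in enumerate(order):
        if index in last_seen:
            distances.append(position - last_seen[index])
        last_seen[index] = position
    return distances


order = epoch_shuffled_order(n_points=8, n_epochs=3)
distances = reuse_distances(order)
mean_reuse_distance = sum(distances) / len(distances)
```

Under this scheme the mean reuse distance equals the training set size, so a cache holding fewer points than the whole set will mostly have evicted a point by the time it is needed again.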
ISBN (print): 9781509025503
Designing an efficient, accurate, and low-complexity intrusion detection system is a challenging task. The intrusion detection method is the core of an intrusion detection system, and it can be either signature-based or anomaly-based. Signature-based detection has a high detection rate, but it cannot detect novel attacks. Conversely, anomaly-based detection can detect novel attacks, but it has a high false positive rate. Many machine learning techniques have been developed to cope with this problem. These machine learning algorithms build a detection model in a training phase. This paper compares different supervised algorithms for anomaly-based detection. The algorithms have been applied to the KDD99 dataset, which is the benchmark dataset used for anomaly-based detection. The results show that no single algorithm has a high detection rate for every class of the KDD99 dataset. The performance measures used in this comparison are true positive rate, false positive rate, and precision.
ISBN (print): 9781509012824
Machine learning plays a very important role in processing large amounts of structured and unstructured data. A set of algorithms can be used to gain meaningful insights into the data that help in making effective business decisions. Document clustering is a popular machine learning technique used to group unstructured data (text documents) based on content and to further analyze the data to understand the patterns in it. The unstructured data is transformed into semi-structured and then structured data in stages using text mining and clustering (k-means) techniques. Classification is another machine learning technique that can be applied to use cases such as fraud detection and the identification of cross-sell and up-sell opportunities in the banking, financial services, and insurance industry. This paper focuses on the implementation of a document clustering algorithm and a set of classification algorithms (Decision Tree, Random Forest, and Naïve Bayes), along with appropriate industry use cases. The performance of the three classification algorithms is also compared by computing the confusion matrix, from which performance measures such as accuracy, precision, and recall are calculated.
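The step from a confusion matrix to accuracy, precision, and recall can be sketched as follows for a binary case. The labels, function names, and sample predictions are hypothetical and serve only to illustrate the calculation; they do not come from the paper's experiments.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive label."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall from the four counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels for a fraud-detection style use case.
y_true = ["fraud", "ok", "ok", "fraud", "ok"]
y_pred = ["fraud", "fraud", "ok", "ok", "ok"]
counts = confusion_counts(y_true, y_pred, positive="fraud")
metrics = summarize(counts)
```

Running the same two functions on the predictions of each classifier gives directly comparable figures for the three algorithms.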