ISBN: (print) 078037391X
The goal of this paper is to describe a geometrical data preprocessor used in implementing the generalized mode-matching technique (GMMT). Its purpose is to process the geometry specification of the cross-section of a complicated waveguide line in a manner that unifies the processing of the matrix operators required to find the mode basis.
ISBN: (print) 9781509046584
Wind turbine data preprocessing is a key step in wind turbine equipment condition assessment, and it helps to improve data quality and data *** this paper, a data preprocessing method has been proposed based on the neighbor model of least squares support vector machine, with the wind speed data as an *** are strong similarities between the operating conditions of wind turbines with similar wind *** this paper, we use the normal data of multiple wind turbine anemometers and the least squares support vector machine (LS-SVM) method to establish the neighbor model between wind speed data of multiple wind *** model reflects the similarity of the wind speed between *** the model established, the wind speed data containing the outliers will be input to the *** the wind speed data of one unit is abnormal, the similarity relation between the data and its adjacent units' data is *** prediction residual of the wind speed of this unit will be increased significantly by the neighbor model, indicating that the wind speed data is abnormal *** method can realize the recognition of wind turbine abnormal *** on the actual operation data of a wind farm, the validity of the method is verified.
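The residual-based idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: an ordinary least-squares line stands in for the LS-SVM neighbor model, and the function names, the 3-sigma threshold, and the sample wind speeds are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def fit_line(x, y):
    """Least-squares fit y ~ a*x + b (a simple stand-in for LS-SVM)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def flag_outliers(model, sigma, neighbor, target, k=3.0):
    """Flag target readings whose prediction residual exceeds k * sigma."""
    a, b = model
    return [abs(t - (a * n + b)) > k * sigma for n, t in zip(neighbor, target)]

# Hypothetical "normal" wind speeds (m/s) from two adjacent turbines.
neighbor_train = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
target_train = [4.1, 5.0, 6.2, 6.9, 8.1, 9.0]

model = fit_line(neighbor_train, target_train)
a, b = model
train_residuals = [t - (a * n + b) for n, t in zip(neighbor_train, target_train)]
sigma = pstdev(train_residuals)

# New readings: the second target value breaks the similarity relation.
neighbor_new = [5.5, 6.5, 7.5]
target_new = [5.6, 12.0, 7.4]
flags = flag_outliers(model, sigma, neighbor_new, target_new)
print(flags)
```

The model is fitted on normal data only, so a reading that departs from its neighbor produces a large residual and is flagged.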
ISBN: (print) 9781479965755
Web log mining is the most important method in Web data mining, and data preprocessing is its primary task. To find more valuable access patterns, reduce the size of the Web data, and find data on users and even between users, this paper puts forward a Web log data preprocessing method based on users' characteristic interests, and then introduces concepts such as user interest and user interest similarity. Finally, experiments show the superiority and recommendation value of the new method.
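The abstract does not define user interest similarity, so the following is only one plausible reading: each user's interest is represented by page-visit counts taken from the log, and similarity is the cosine of two such count vectors. The log format, function names, and sample entries are hypothetical.

```python
import math
from collections import Counter

def interest_vector(log_entries, user):
    """Count how often a user visited each page (assumed interest measure)."""
    return Counter(page for u, page in log_entries if u == user)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical cleaned log: (user id, requested page).
log = [
    ("u1", "/sports"), ("u1", "/sports"), ("u1", "/news"),
    ("u2", "/sports"), ("u2", "/sports"), ("u2", "/news"),
    ("u3", "/cooking"),
]

sim_12 = cosine_similarity(interest_vector(log, "u1"), interest_vector(log, "u2"))
sim_13 = cosine_similarity(interest_vector(log, "u1"), interest_vector(log, "u3"))
print(sim_12, sim_13)
```

Users with identical visit patterns score close to 1, and users with no pages in common score 0.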
ISBN: (print) 9789811037795; 9789811037788
A business intelligence (BI) system combines operational data with analytical tools to present descriptive and complicated data to groups of decision makers. BI aims to enhance the features and accuracy of the data warehouse for the decision-making process and is widely applied in industry. To achieve that, BI pulls and gathers information from multiple information systems. Data from multiple sources tend to have flaws such as missing values, inconsistent data, and redundant data. Hence, this paper aims to show the data preprocessing techniques used to produce clean, quality data for Universiti Teknologi Malaysia (UTM) research performance analysis. For this study, the required data were provided by UTM management. In the future, this study is expected to compare different data preprocessing techniques and recommend the best one for research performance analysis.
ISBN: (print) 9789881563972
Integrated navigation systems are now very important in all kinds of aircraft, and the integrated navigation algorithm is the core of such a system. The main work in this paper is as follows: (1) Three low-pass filters are introduced to filter out the high-frequency noise in the data collected by sensors, and their performance is compared and analyzed. (2) The Kalman filter parameters are adjusted to improve the performance of the Kalman filter. The final conclusions are: (1) Comparative analysis of the three low-pass filters shows that the Wiener low-pass filter is the most suitable, as it best filters out high-frequency noise. (2) Adjusting the parameters of the Kalman filter yields a set of approximately optimal parameters, which verifies that the performance of the Kalman filter can be improved.
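The two processing stages named in this abstract can be sketched as follows. The low-pass stage here is a first-order recursive filter, used as a generic stand-in and not the Wiener filter the paper prefers. The Kalman filter is a scalar, constant-state version whose tunable parameters are the process noise `q` and measurement noise `r`. The signal and all parameter values are assumptions.

```python
def low_pass(samples, alpha):
    """First-order low-pass: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    if not samples:
        return []
    out = []
    y = samples[0]
    for x in samples:
        y = alpha * x + (1 - alpha) * y
        out.append(y)
    return out

def kalman_1d(measurements, q, r, x0, p0=1.0):
    """Scalar Kalman filter for a constant state with tunable q and r."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q              # predict: state assumed constant, variance grows by q
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update the estimate with the measurement residual
        p = (1 - k) * p        # update the estimate variance
        estimates.append(x)
    return estimates

# Synthetic sensor trace: a true value of 10.0 plus alternating +/-0.5 noise.
signal = [10.0 + (0.5 if i % 2 == 0 else -0.5) for i in range(50)]

smoothed = low_pass(signal, alpha=0.1)
estimates = kalman_1d(signal, q=1e-4, r=0.25, x0=signal[0])
print(smoothed[-1], estimates[-1])
```

Raising `q` relative to `r` makes the Kalman estimate follow the measurements more closely, while lowering it gives stronger smoothing. This ratio is the kind of parameter adjustment the abstract refers to.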
ISBN: (print) 9781665404457
Data mining is the focus of big data applications in various fields, and data preprocessing is a crucial step in the data mining process. With the development of the information society and the application of databases, educational data has seen explosive growth, and the data on poor students has become informative. However, the data on poor students collected by actual student financial aid management systems generally has problems such as missing values, attribute redundancy, and noise. To solve this problem, we propose a novel method called DPBP to preprocess the data. The proposed DPBP approach consists of four stages: the preparation of data, the scoping of characteristics, the combination of characteristics, and the filtering by missing number. First, we prepare the dataset by extracting data. Next, the characteristic range is limited by choosing among the experimental results of feature selection algorithms. Then, the third stage performs feature combination to obtain the feature decomposition sets. Finally, based on accuracy and missing number, we obtain the optimal dataset. A series of experimental results shows that our proposed method significantly improves data quality and stability.
ISBN: (print) 9781479943463
The PhysioNet/CinC 2012 challenge focused on improving patient-specific mortality predictions in the intensive care unit. While most of the focus in the challenge was on applying sophisticated machine learning algorithms, little attention was paid to the preprocessing performed on the data a priori. We compare four standard preprocessing methods with a novel Box-Cox outlier rejection technique and analyze their effect on machine learning classifiers for predicting the mortality of ICU patients. The best machine learning model utilized the proposed preprocessing method and achieved an AUROC of 0.848. In general, the AUROC of models using our novel preprocessing method increased, and this increase was as much as 0.02 in some cases. Furthermore, the use of preprocessing improved the performance of regression models to a higher level than that of non-linear techniques such as random forests. We demonstrate that proper preprocessing of the data prior to use in a prognostic model can significantly improve performance. This improvement can be even greater than that provided by more complex non-linear machine learning algorithms.
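The abstract does not spell out its Box-Cox outlier rejection procedure. One common reading, shown here as an assumption, is to apply the Box-Cox transform to make a skewed variable closer to normal and then reject values more than `k` standard deviations from the transformed mean. The fixed `lam`, the threshold `k`, and the sample values are illustrative. In practice the Box-Cox parameter is usually estimated from the data.

```python
import math
from statistics import mean, pstdev

def box_cox(x, lam):
    """Box-Cox transform for positive x: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def reject_outliers(values, lam=0.0, k=3.0):
    """Keep values whose transformed z-score is within k standard deviations."""
    transformed = [box_cox(v, lam) for v in values]
    mu = mean(transformed)
    sd = pstdev(transformed)
    if sd == 0:
        return list(values)
    return [v for v, t in zip(values, transformed) if abs(t - mu) <= k * sd]

# Hypothetical measurements with one implausible reading (500.0).
readings = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.1, 0.95, 1.05, 1.15, 0.85, 500.0]
cleaned = reject_outliers(readings)
print(len(readings), len(cleaned))
```

With very small samples a 3-sigma rule can never fire, because the largest possible z-score is bounded by the sample size, so the threshold needs to suit the amount of data available.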
ISBN: (print) 9783030419646; 9783030419639
Learning from imbalanced data is a vital challenge for pattern classification. We often face imbalanced data in medical decision tasks where at least one of the classes is represented by only a very small minority of the available data. We propose a novel framework for training base classifiers and preparing the dynamic selection dataset (DSEL) to integrate data preprocessing and dynamic ensemble selection (DES) methods for imbalanced data classification. The DES-KNN algorithm has been chosen as the DES method, and its modifications based on training and validation sets oversampled using SMOTE are discussed. The proposed modifications have been evaluated in computer experiments carried out on 15 medical datasets with various imbalance ratios. The experimental results show that the proposed framework is very useful, especially for tasks characterized by a small imbalance ratio.
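The oversampling step named in this abstract can be illustrated with a minimal SMOTE sketch: each synthetic sample is an interpolation between a minority sample and one of its nearest minority neighbors. This covers only the oversampling idea, not the DES-KNN framework, and the function name, parameters, and data are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Hypothetical minority-class samples with two features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
synthetic = smote(minority, n_new=6)
print(len(synthetic))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies.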
Currently, the rockburst failure mode discrepancy is not taken into account in rockburst prediction, and some defects such as missing values, outlier samples and class imbalance are still present in the rockburst database. To solve the above problems, a three-step rockburst prediction model based on data preprocessing combined with clustering and classification algorithms (PCC) is proposed. The first step is data preprocessing for dealing with missing values and outlier samples. In the second step, the K-means algorithm is introduced to cluster the preprocessed data into two clusters for distinguishing the rockburst failure mode discrepancy. In the third step, Random Forest (RF), Gradient Boosting Decision Tree (GBDT) and Extreme Gradient Boosting (XGB) are employed to construct classification models, respectively, after class balancing. Finally, a comparative analysis between the PCC model and the rockburst prediction model based on data preprocessing and classification algorithms (PC) is conducted, and the analysis is based on 310 samples. Furthermore, the self-proposed metric and traditional metrics are employed to evaluate the model performance. The prediction results demonstrate that the performance of the PCC model is superior to the PC model, and the rockburst failure mode discrepancy should be considered in rockburst prediction.
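The first two steps described in this abstract can be sketched as follows: median imputation fills missing values, then K-means with two clusters groups the preprocessed samples. The classification step (RF, GBDT, XGB after class balancing) is left out. The imputation rule, the deterministic initialization, and the data are assumptions, not the authors' choices.

```python
from statistics import median

def impute_median(rows):
    """Replace None entries with the median of the observed column values."""
    columns = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in columns]
    return [[medians[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def kmeans(points, k=2, iters=20):
    """Plain K-means; centers start at the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical samples with two features; None marks a missing value.
rows = [[1.0, 2.0], [1.2, None], [0.9, 2.1], [1.1, 1.9], [1.3, 2.2],
        [8.0, 9.0], [8.2, 9.1], [8.3, 8.8]]

clean = impute_median(rows)
labels, centers = kmeans(clean, k=2)
print(labels)
```

In the three-step scheme, a separate classifier would then be trained on the samples of each cluster.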
ISBN: (print) 9783030438876; 9783030438869
Learning from non-stationary imbalanced data streams is a serious challenge for the machine learning community. A significant number of works address the classification of non-stationary data streams, but most of them do not take into consideration that real-life data streams may exhibit a high and changing class imbalance ratio, which may complicate the classification task. This work attempts to connect two important, yet rarely combined, research trends in data analysis, i.e., non-stationary data stream classification and imbalanced data classification. We propose a novel framework for training base classifiers and preparing the dynamic selection dataset (DSEL) to integrate data preprocessing and dynamic ensemble selection (DES) methods for imbalanced data stream classification. The proposed approach has been evaluated in computer experiments carried out on 72 artificially generated data streams with various imbalance ratios, levels of label noise, and types of concept drift. In addition, we consider six variations of preprocessing methods and four DES methods. The experimental results show that dynamic ensemble selection, even without any data preprocessing, can outperform a naive combination of the whole pool generated with the use of preprocessing methods. Combining DES with preprocessing further improves the obtained results.