ISBN: (print) 9781479972081
The wide application of Internet and media technology produces ever more data, which has ushered in the era of big data. However, except under some special conditions, it is difficult to extract the needed information directly from the original data. In recent years, the development of machine learning has provided an effective way to solve this problem. Choosing a reasonable feature selection algorithm can lower the error rate without increasing the complexity of the algorithm. At present, feature selection algorithms in machine learning are divided into two categories, Filter and Wrapper. This paper considers the advantages and disadvantages of both categories and studies a combined feature selection algorithm.
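A combined Filter and Wrapper pipeline can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm studied in the paper: the filter stage ranks features by absolute Pearson correlation with the label, and the wrapper stage runs greedy forward selection scored by leave-one-out accuracy of a 1-nearest-neighbour classifier. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def filter_stage(X, y, keep):
    """Filter: rank features by |correlation with the label|, keep the top `keep`."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `features`."""
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = None, None
        for k, other in enumerate(X):
            if k == i:
                continue  # hold out the sample being classified
            dist = sum((row[j] - other[j]) ** 2 for j in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def wrapper_stage(X, y, candidates):
    """Wrapper: greedy forward selection; stop when accuracy no longer improves."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        best_feature = None
        for j in candidates:
            if j in selected:
                continue
            acc = loo_accuracy(X, y, selected + [j])
            if acc > best:
                best, best_feature, improved = acc, j, True
        if improved:
            selected.append(best_feature)
    return selected, best


# Toy data (hypothetical): feature 0 separates the classes, 1 is noise, 2 is constant.
X = [[0.10, 5.0, 1.0], [0.20, 1.0, 1.0], [0.15, 3.0, 1.0],
     [0.90, 4.8, 1.0], [1.00, 1.2, 1.0], [0.95, 3.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]

candidates = filter_stage(X, y, keep=2)        # cheap pre-screening
selected, accuracy = wrapper_stage(X, y, candidates)  # costly search on survivors
print(candidates, selected, accuracy)
```

The point of the combination is cost: the filter discards clearly uninformative features cheaply, so the wrapper, which retrains a classifier for every candidate subset, only searches a small pool.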
The financial fraud detection problem involves the analysis of large financial datasets. The financial statement fraud detection process concentrates on two major aspects: first, identifying the financial variables and ratios, also termed features; second, applying data mining methods to classify organizations into two broad categories, fraudulent and non-fraudulent. If the input dataset contains a large number of irrelevant and correlated features, the computational load of the machine learning technique increases and the effectiveness of the classification outcomes decreases. The feature selection process selects a subset of the most significant attributes or variables that can represent the original data. This selected subset helps in learning the pattern in the data in much less time and with greater accuracy, in order to produce useful information for decision-making. This article briefly states the methods applied in prior studies for selecting features for financial statement fraud detection. It also presents an approach to feature selection using correlation-based filter selection methods, in which feature selection is performed based on an ensemble model, and tests the outcome of the approach by applying mean ratio analysis to financial data of Indian companies.
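To illustrate why correlated features hurt and how a correlation-based filter penalizes them, the sketch below computes a subset merit of the form k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. This is the merit commonly associated with correlation-based feature selection; the abstract does not state which criterion the article uses, so treat this as an assumption. The function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def cfs_merit(columns, target, subset):
    """Merit of a feature subset: rewards relevance, penalizes redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation over the subset.
    r_cf = sum(abs(pearson(columns[j], target)) for j in subset) / k
    # Mean absolute feature-feature correlation over all pairs in the subset.
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical financial ratios, stored column-wise, with a fraud label.
columns = [
    [1, 2, 3, 4, 5, 6],       # ratio A
    [2, 4, 6, 8, 10, 12],     # ratio B: an exact multiple of A, fully redundant
    [3, 1, 2, 6, 4, 5],       # ratio C: relevant and only partly correlated with A
]
target = [0, 0, 0, 1, 1, 1]

print(cfs_merit(columns, target, [0]))     # a single relevant feature
print(cfs_merit(columns, target, [0, 1]))  # adding a redundant copy gains nothing
print(cfs_merit(columns, target, [0, 2]))  # adding complementary information helps
```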
The prediction of solar radiation data is important for countries seeking to reduce their dependence on fossil fuels. Since the development of solar energy systems relies on accurate prediction of solar radiation, this study predicts monthly and daily solar radiation data to contribute to the development of such systems. The study develops a long short-term memory (LSTM) model that can extract temporal features more efficiently than other deep learning models and predict solar radiation data. The new model is called the Read-first LSTM (RLSTM) model. The gate units of the LSTM model are independent, so they may not fully extract the features of long time series. The study therefore addresses this limitation of the LSTM model for predicting solar data. Its main innovation is an improved LSTM model that establishes a collaborative process between gates. While recent studies focus on optimizing LSTM parameters, the current research improves the efficiency of the LSTM gates. Because the gates of the RLSTM collaborate, correlation values and temporal features can be captured effectively. Climate data are used to predict solar radiation in two basins of Iran, the Kashan Plain and the Sefidrood Basin. The Boruta-Random Forest (BRF) feature selection algorithm was used to determine the best input scenario. The RLSTM model was compared with the LSTM model, a recurrent neural network (RNN), a radial basis function neural network (RBFNN), and a bidirectional LSTM (BILSTM) model. The RLSTM model successfully predicted the monthly solar radiation data in the Kashan Plain, decreasing the testing mean absolute error (MAE) of the other models by 5.8%-42%. The RLSTM model also accurately predicted daily data in the Sefidrood Basin. The RLSTM improved the testing index of agreement (IA) of the BILSTM, LSTM, RNN, and RB
Continuous blood pressure (BP) provides essential information for monitoring one's health condition. However, BP is currently monitored using uncomfortable cuff-based devices, which do not support continuous BP monitoring. This paper introduces a blood pressure monitoring algorithm based only on photoplethysmography (PPG) signals, using a deep neural network (DNN). The PPG signals are obtained from 125 unique subjects with 218 records and filtered using signal processing algorithms to reduce the effects of noise such as baseline wandering and motion artifacts. The proposed algorithm is based on pulse wave analysis of PPG signals: it extracts features from various domains and maps them to BP values. Four feature selection methods are applied, yielding four feature subsets. An ensemble feature selection technique is then proposed to obtain the optimal feature set based on majority voting scores from the four subsets. DNN models combined with the ensemble feature selection technique outperformed previously reported approaches that rely only on the PPG signal in estimating systolic blood pressure (SBP) and diastolic blood pressure (DBP). The coefficient of determination (R^2) and mean absolute error (MAE) of the proposed algorithm are 0.962 and 2.480 mmHg, respectively, for SBP and 0.955 and 1.499 mmHg, respectively, for DBP. The proposed approach meets the Advancement of Medical Instrumentation standard for SBP and DBP estimation. Additionally, according to the British Hypertension Society standard, the results attained Grade A for both SBP and DBP estimation. The paper concludes that BP can be estimated more accurately using the optimal feature set and DNN models. The proposed algorithm has the potential to enable mobile healthcare devices to monitor BP continuously.
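The voting step of such an ensemble can be sketched in a few lines. The abstract does not name the four selection methods, the features, or the vote threshold, so the subsets, feature names, and the `min_votes` parameter below are all hypothetical.

```python
from collections import Counter


def vote_features(subsets, min_votes):
    """Keep features chosen by at least `min_votes` of the selection methods."""
    # set() guards against a method listing the same feature twice.
    votes = Counter(feature for subset in subsets for feature in set(subset))
    return sorted(f for f, count in votes.items() if count >= min_votes)


# Hypothetical outputs of four feature selection methods on PPG features.
subsets = [
    ["pulse_width", "systolic_peak", "heart_rate"],
    ["pulse_width", "systolic_peak", "dicrotic_notch"],
    ["pulse_width", "heart_rate", "spectral_entropy"],
    ["pulse_width", "systolic_peak", "heart_rate"],
]

# Requiring 3 of 4 votes keeps only features most methods agree on.
print(vote_features(subsets, min_votes=3))
# → ['heart_rate', 'pulse_width', 'systolic_peak']
```

A higher threshold gives a smaller, more conservative feature set; a lower one approaches the union of all subsets.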
pH detection helps control food quality, prevent spoilage, determine storage methods, and monitor additive levels. In previous studies, colorimetric pH detection involved manual capture of target regions and classification of acid-base categories, which made the process time-consuming. Additionally, some researchers relied solely on R*G*B* or H*S*V* to build regression models, potentially limiting their generalizability and robustness. To address these limitations, this study proposes a colorimetric method that combines pH paper, a smartphone, computer vision, and machine learning for fast and precise pH detection. The computer vision model YOLOv5 quickly captures the target region of the pH paper and automatically categorizes it as either acidic or basic. Subsequently, recursive feature elimination is applied to filter out irrelevant features from R*G*B*, H*S*V*, L*a*b*, Gray, XR, XG, and XB. Finally, support vector regression is used to develop the regression model for pH value prediction. YOLOv5 demonstrated exceptional performance, with a mean average precision of 0.995, classification accuracy of 100%, and detection time of 4.9 ms. The pH prediction model achieved a mean absolute error (MAE) of 0.023 for acidity and 0.061 for alkalinity, a notable advance over the MAE range of 0.03-0.46 observed in previous studies. The proposed approach shows potential for improving the dependability and effectiveness of pH detection, particularly in resource-constrained scenarios.
INTRODUCTION: Skin cancer is an emerging disease worldwide that causes high mortality. Computer-aided systems are designed to detect skin cancer at an early stage. The most crucial step in such systems is the feature selection process, because of its large impact on classification performance. Various feature selection algorithms have been designed previously to find the relevant features in a set of attributes. Yet challenges remain in selecting appropriate features from datasets related to disease prediction. OBJECTIVES: To design a hybrid feature selection algorithm for selecting a relevant feature subspace from dermatology datasets. METHODS: The hybrid feature selection algorithm is designed by integrating the Latent Semantic Index (LSI) with correlation-based feature selection (CFS). To achieve an optimal selection of the feature subset, beetle swarm optimization is used. RESULTS: Statistical metrics such as accuracy, specificity, recall, F1 score, and MCC are calculated. CONCLUSION: The accuracy and sensitivity obtained are 95% and 92%, respectively.
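The metrics listed under RESULTS all derive from a binary confusion matrix, as the following sketch shows. Note that sensitivity, reported in the conclusion, is another name for recall. The confusion-matrix counts are made up for illustration and are not the paper's results; the function name is hypothetical.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero on empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "specificity": ratio(tn, tn + fp),
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # Matthews correlation coefficient: stays informative on imbalanced data.
        "mcc": ratio(tp * tn - fp * fn,
                     sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


# Illustrative counts only (not taken from the paper).
for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.3f}")
```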
ISBN: (print) 9781510675261; 9781510675254
A method for unanticipated fault diagnosis based on IGWO-iForest (Improved Grey Wolf Optimizer-Isolation Forest) is proposed to address various unpredictable problems faced by large telescopes in extreme environments. First, the random forest feature selection algorithm is used to identify the features of the original dataset and eliminate redundant features. Secondly, the differential evolution strategy is introduced into the GWO (Grey Wolf Optimizer) to improve the local search efficiency and accuracy, and the Levy flight strategy is introduced into the GWO to improve the global search ability of the algorithm. Then, the improved IGWO is used to optimize the parameters of the iForest model. Finally, the performance of the model is verified through data collected from a fault diagnosis and self-healing hardware-in-the-loop simulation platform. The experimental results show that the IGWO-iForest algorithm achieves a fault diagnosis accuracy of 99.1%, which demonstrates its higher sensitivity to a small number of unanticipated fault data compared with other anomaly detection algorithms, proving the effectiveness of this method in accurately diagnosing unanticipated faults in telescopes.
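For context on what the iForest model computes, the sketch below shows the standard Isolation Forest anomaly score, s = 2^(−E[h]/c(n)), where E[h] is a sample's mean path length across the trees and c(n) normalizes by the expected path length for a subsample of size n. It covers only the scoring rule; the random forest feature selection and the IGWO parameter search described in the paper are not reproduced, and the function names are hypothetical.

```python
from math import log

EULER_GAMMA = 0.5772156649  # used to approximate harmonic numbers


def average_path_length(n):
    """c(n): expected path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = log(n - 1) + EULER_GAMMA  # H(n-1) is approximately ln(n-1) + gamma
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Score in (0, 1]: near 1 means anomalous, well below 0.5 means normal."""
    if n < 2:
        raise ValueError("subsample size must be at least 2")
    return 2.0 ** (-mean_path_length / average_path_length(n))


n = 256  # a commonly used subsample size, assumed here for illustration
print(anomaly_score(2.0, n))   # isolated after few splits: high score
print(anomaly_score(12.0, n))  # needs many splits to isolate: low score
```

Rare fault samples tend to be isolated after only a few random splits, which is why short path lengths map to high scores.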
In medical diagnostics such as white blood cell (WBC) analysis, scattergram images show the relationships among neutrophils, eosinophils, basophils, lymphocytes, and monocytes in the blood. The distributions of these cells differ between healthy individuals and COVID-19 patients. This study proposes a hybrid CNN model for COVID-19 detection using scattergram images obtained from WBC sub (differential, DIFF) parameters instead of CT or X-ray scans. The data set consists of scattergram images of 335 COVID-19 suspects without chronic disease, collected from the biochemistry department of Elazig Fethi Sekin City Hospital. First, data augmentation is performed by applying HSV (Hue, Saturation, Value) and CIE-1931 (Commission Internationale de l'Éclairage) conversions, so that three different image sets are obtained: raw, CIE-1931, and HSV. Second, feature extraction is applied by giving these images as separate inputs to the CNN model. Finally, the ReliefF feature selection algorithm is applied to determine the most dominant features in the feature vectors, that is, the features that maximize classification accuracy. The resulting feature vector is classified with a high-performance SVM in binary classification. The overall accuracy is 95.2%, and the F1-score is 94.1%. The results show that the method can successfully detect COVID-19 using scattergram images and is an alternative to CT and X-ray scans.
Sensorimotor rhythm-based Brain-Computer Interfaces (BCIs) have been successfully employed for upper limb motor rehabilitation after stroke. In this context, the choice of features that enable appropriate electroencephalographic (EEG) sensorimotor activation and engagement underlying favourable motor recovery becomes crucial. Here, we present a novel feature selection algorithm (GUIDER) designed and implemented to integrate specific requirements related to neurophysiological knowledge and rehabilitative principles. The GUIDER algorithm was tested on an EEG dataset collected from 13 subacute stroke participants. The automatic feature selection procedure using the GUIDER algorithm and the manual feature selection performed by an expert neurophysiologist showed similar performance in terms of both feature selection and classification. Our preliminary findings suggest that the choices of experienced neurophysiologists can be reproduced by an automatic approach. The proposed automatic algorithm could support professional end-users who are not BCI experts, such as therapists and clinicians, and ultimately foster wider use of BCI-based rehabilitation after stroke.
Accurate and efficient recognition of Parkinson's disease is one of the prominent issues in the field of healthcare. Different methods have been proposed in the literature to address this problem. However, existing methods fall short in accurately recognizing Parkinson's disease and suffer from efficiency problems. To overcome these problems, this paper presents a machine-learning-based model for Parkinson's disease recognition. Specifically, a hybrid feature selection algorithm is designed by integrating the Relief and ant-colony optimization algorithms to select relevant features for training the model. A support vector machine is then trained and tested on the selected features to achieve optimal classification accuracy. Additionally, K-fold cross-validation is employed to evaluate the optimal hyper-parameter values of the model. Experimental results on a real-world Parkinson's disease dataset reveal that the proposed system outperforms baseline competitors, accurately recognizing Parkinson's disease and achieving 99.50% accuracy on the selected features. Given the high performance achieved, we highly recommend the proposed method for the recognition of Parkinson's disease (PD).
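The Relief half of such a hybrid can be sketched as below. This is a simplified, deterministic Relief-style ranking that visits every sample once and uses a range-scaled Manhattan distance; it does not include the ant-colony optimization stage, the SVM, or cross-validation, and it is not claimed to match the paper's implementation. The function name and toy data are hypothetical, and each class needs at least two samples.

```python
def relief_weights(X, y):
    """Relief-style feature weights for a two-class problem.

    A feature gains weight when it differs at the nearest sample of the other
    class (nearest miss) and loses weight when it differs at the nearest
    sample of the same class (nearest hit).
    """
    n, d = len(X), len(X[0])
    # Scale each feature difference by the feature's range (1.0 if constant).
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
     [0.8, 0.2], [0.9, 0.8], [1.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
print(weights, ranking)  # the informative feature ranks first
```

In a hybrid scheme, a ranking like this would typically narrow the candidate pool before a search method such as ant-colony optimization explores feature subsets.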