With the rapid growth of social media, users are increasingly engaged in virtual socialization, generating a huge volume of textual and image content. Content such as status updates/tweets, shared posts/retweets, and likes of other posts reflects the online behavior of users. Predicting a user's personality from these digital footprints has become a computationally challenging problem. In a profile-based approach, user-generated textual content can be used to reflect personality in social media. A large number of features from different categories, such as traditional linguistic features (character-level, word-level, structural, and so on), psycholinguistic features (emotional affects, perceptions, social relationships, and so on), or social network features (network size, betweenness, and so on), could be useful for predicting personality traits from social media. According to a widely used personality model, the big-five-factor model (BFFM), the five factors are openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism. Personality prediction is redefined as predicting each of these traits separately from the extracted features. Traditionally, a large number of features is needed to achieve good accuracy on a prediction task, although applying feature selection algorithms may improve the performance of the model. In this article, we compare the performance of five feature selection algorithms, namely the Pearson correlation coefficient (PCC), correlation-based feature subset (CFS), information gain (IG), the symmetric uncertainty (SU) evaluator, and the chi-squared (CHI) method. Performance is evaluated using the classic metrics of precision, recall, F-measure, and accuracy.
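As a rough illustration of the simplest of the five selectors named above, the sketch below ranks features by the absolute Pearson correlation coefficient with a trait score. The feature names, the toy data, and the helper names `pearson` and `rank_features` are hypothetical and are not taken from the article.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return feature names sorted by |PCC| with the target, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three features for five users and one trait score.
features = {
    "word_count": [10, 20, 30, 40, 50],
    "emoji_rate": [5, 3, 4, 1, 2],
    "constant": [1, 1, 1, 1, 1],
}
trait = [1.0, 2.0, 3.0, 4.0, 5.0]
ranking = rank_features(features, trait)
```

A filter of this kind scores each feature independently, so it is cheap but cannot detect redundancy between features; subset evaluators such as CFS address that at higher cost.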
In external-beam radiotherapy, external markers are among the most reliable tools for predicting tumor position in clinical applications. The main challenge in this approach is tracking tumor motion with the highest accuracy, which depends heavily on the location of the external markers; addressing this issue is the objective of this study. Four commercially available feature selection algorithms, namely 1) Correlation-based feature selection, 2) Classifier, 3) Principal Components, and 4) Relief, were proposed to find the optimum location of external markers in combination with two search procedures, "Genetic" and "Ranker". The performance of these algorithms was evaluated using the four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in the lung, three tumors in the liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) used as the prediction model was taken as the metric for quantitatively evaluating the performance of the proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments, and the predefined tumor motion was predicted by ANFIS using the external motion data of the markers in each segment separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimal. Moreover, the accuracy of the proposed feature selection algorithms was compared separately. For this, each tumor motion was predicted using the motion data of the external markers selected by each feature selection algorithm. The Duncan statistical test, followed by an F-test, on the final results showed that all proposed feature selection algorithms have the same accuracy for lung tumors. But for liver tumors, a correlation-based feature selection algorithm, in comb
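The evaluation metric named in this abstract, root mean square error, can be sketched as follows. This is only the generic formula; the ANFIS model itself is not reproduced, and the sample positions are invented for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and actual positions."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical tumor positions (mm) along one axis: model output vs. reference.
predicted_mm = [10.2, 11.0, 12.1, 11.4]
reference_mm = [10.0, 11.5, 12.0, 11.0]
error_mm = rmse(predicted_mm, reference_mm)
```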
ISBN: (print) 9781665423588
Feature selection is an integral part of feature engineering prior to deep learning (DL) model development. The idea is to reduce the complexity of high-dimensional data structures by keeping only relevant information in the data mining process. The critical issue in developing a DL model to predict student performance is the high dimensionality of students' profiles, which results in a DL model with low performance metrics. Student profile data covers different aspects such as demographic information, academic records, technological resources, social attitudes, family background, and/or socio-economic status. Empirically, the diversity of these data produces complexity in terms of dimension. In this paper, we compare the effectiveness of four feature selection algorithms (Information Gain Based, ReliefF, Boruta, and Recursive Feature Elimination) on deep learning models using an educational dataset from Portugal. Effectiveness is measured using the following model performance metrics: training accuracy, validation accuracy, testing accuracy, kappa statistic, and F-measure. Results revealed the robustness of the Boruta algorithm in dimensionality reduction, as it allowed the deep learning model to achieve its highest performance metrics compared with the other feature selection algorithms.
ISBN: (print) 9798350377873; 9798350377866
With the rise of different types of cyber threats, an efficient intrusion detection system (IDS) has become crucial for network security. In this paper, we aim to enhance intrusion detection performance by applying different feature selection (FS) algorithms that identify relevant features in high-dimensional datasets, reduce complexity, and improve model accuracy. We also aim to demonstrate the importance of large and balanced datasets in enhancing intrusion detection performance. To do so, we use several recent datasets, taken individually or combined with each other. After preprocessing the datasets, we apply diverse FS algorithms and train the machine learning models. Performance is evaluated using metrics such as accuracy, precision, recall, and F1-score, with an emphasis on analyzing computational efficiency. The results were conclusive and show the importance of both a balanced dataset and well-chosen FS algorithms in improving intrusion detection.
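The four evaluation metrics listed in this abstract can be computed from confusion counts as in the minimal sketch below. It assumes a binary attack/normal labelling, which the abstract does not specify; the function name `binary_metrics` and the labels are illustrative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = attack, 0 = normal traffic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

On imbalanced traffic, accuracy alone can look high while recall on the attack class stays low, which is one reason to report all four values.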
ISBN: (print) 9781509050475
This paper discusses three case studies involving the use of feature subset selection algorithms, as well as feature ranking, based on real data on student performance in a course of a Bachelor of Computer Science program. The case studies aimed to investigate, as a step prior to the use of data mining algorithms for performance prediction, the effectiveness of feature selection methods for reducing data volume.
Sleep scoring is one of the most important diagnostic methods in psychiatry and neurology. Sleep staging is a time-consuming and difficult task undertaken by sleep experts. This study aims to identify a method that classifies sleep stages automatically and with a high degree of accuracy, and in this way assists sleep experts. The study consists of three stages: feature extraction, feature selection from EEG signals, and classification of these signals. In the feature extraction stage, 20 attribute algorithms in four categories are used, from which 41 feature parameters were obtained. Feature selection is important for eliminating irrelevant and redundant features; in this way, prediction accuracy is improved and the computational overhead of classification is reduced. Effective feature selection algorithms, namely minimum redundancy maximum relevance (mRMR), fast correlation-based feature selection (FCBF), ReliefF, t-test, and Fisher score, are used at the feature selection stage to select the set of features that best represents the EEG signals. The selected features are used as input parameters for the classification algorithms. At the classification stage, five different classification algorithms are applied: random forest (RF), feed-forward neural network (FFNN), decision tree (DT), support vector machine (SVM), and radial basis function neural network (RBF). The results obtained from the different classification algorithms are reported so that computation times and accuracy rates can be compared. Finally, a classification accuracy of 97.03% is obtained using the proposed method. The results show that the proposed method can support the design of a new intelligent sleep-scoring assistance system.
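Of the selectors listed in this abstract, the Fisher score is the easiest to show compactly. The sketch below uses one common definition (between-class scatter of a feature divided by its within-class scatter); the exact variant used in the study is not stated, and the feature values and stage labels are invented.

```python
def fisher_score(values, labels):
    """Fisher score of one feature: between-class over within-class scatter."""
    overall_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between = 0.0
    within = 0.0
    for members in groups.values():
        size = len(members)
        mean = sum(members) / size
        variance = sum((v - mean) ** 2 for v in members) / size
        between += size * (mean - overall_mean) ** 2
        within += size * variance
    if within == 0:
        # Perfectly separated classes score infinity; a constant feature scores 0.
        return float("inf") if between > 0 else 0.0
    return between / within


# Hypothetical EEG-derived features for six epochs from two sleep stages.
stages = [0, 0, 0, 1, 1, 1]
informative = fisher_score([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], stages)
noisy = fisher_score([1.0, 8.0, 3.0, 7.0, 2.0, 9.0], stages)
```

A higher score means the class means are far apart relative to the spread inside each class, so features would be kept in descending order of score.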
Multi-class sentiment classification has a wide range of applications, yet studies on this issue are still relatively scarce. In this paper, a framework for multi-class sentiment classification is proposed, which consists of two parts: 1) selecting important features of texts using a feature selection algorithm, and 2) training a multi-class sentiment classifier using a machine learning algorithm. Experiments are then conducted to compare the performance of four popular feature selection algorithms (document frequency, CHI statistics, information gain, and gain ratio) and five popular machine learning algorithms (decision tree, naive Bayes, support vector machine, radial basis function neural network, and K-nearest neighbor) in multi-class sentiment classification. The experiments are conducted on three public datasets comprising twelve data subsets, and 10-fold cross-validation is used to obtain the classification accuracy for each combination of feature selection algorithm, machine learning algorithm, feature set size, and data subset. Based on the resulting 3600 classification accuracies (4 feature selection algorithms x 5 machine learning algorithms x 15 feature set sizes x 12 data subsets), the average classification accuracy of each algorithm is calculated, and the Wilcoxon test is used to check for significant differences between algorithms in multi-class sentiment classification. The results show that, in terms of classification accuracy, gain ratio performs best among the four feature selection algorithms and support vector machine performs best among the five machine learning algorithms. Similar comparisons are also conducted in terms of execution time. The results should be valuable for improving existing multi-class sentiment classifiers and developing new ones. (C) 2017 Elsevier Ltd. All rights reserved.
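The experimental grid described above can be enumerated to confirm the count of 3600 combinations. The algorithm names come from the abstract; the feature set sizes and data subsets are represented only by placeholder indices, since their actual values are not given here.

```python
from itertools import product

selectors = ["document frequency", "CHI statistics",
             "information gain", "gain ratio"]
classifiers = ["decision tree", "naive Bayes", "support vector machine",
               "RBF neural network", "K-nearest neighbor"]
feature_set_sizes = range(1, 16)  # placeholder indices for the 15 sizes
data_subsets = range(1, 13)       # placeholder indices for the 12 subsets

# One classification accuracy is reported per combination.
grid = list(product(selectors, classifiers, feature_set_sizes, data_subsets))
runs_per_selector = len(grid) // len(selectors)
```

Averaging over the 900 combinations that share a selector (or the 720 that share a classifier) gives the per-algorithm mean accuracies that the abstract compares.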
This paper presents an efficient feature selection method based on Ruzicka similarity to detect and diagnose seizures caused by epilepsy. The proposed approach reduces the feature space while retaining the most relevant fea...
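For reference, Ruzicka similarity between two non-negative vectors is the sum of element-wise minima divided by the sum of element-wise maxima. The sketch below shows only this measure; how the paper turns it into a feature selection criterion is not visible in the truncated abstract, so no such step is attempted, and the sample vectors are invented.

```python
def ruzicka_similarity(x, y):
    """Ruzicka (weighted Jaccard) similarity of two non-negative vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("Ruzicka similarity requires non-negative values")
    total_max = sum(max(a, b) for a, b in zip(x, y))
    if total_max == 0:
        return 1.0  # two all-zero vectors are treated as identical
    return sum(min(a, b) for a, b in zip(x, y)) / total_max


# Hypothetical non-negative feature vectors.
similarity = ruzicka_similarity([1.0, 2.0, 3.0], [2.0, 2.0, 1.0])
```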
Owing to their high classification accuracy and fast computational speed, Deep Neural Networks (DNNs) have been widely used in the design and development of automated Artificial Intelligence (AI) tools for the detection of various diseases. These tools, which are computationally intensive learning models, hold tremendous significance in healthcare. The primary goal of this review is to understand the applicability of DNNs and the methodology for implementing them, including computational costs, for the classification of distinct diseases from disparate medical imaging datasets. This study presents an extensive survey of DNNs along with their various hybridization forms. To this end, the surveyed research papers are grouped into five categories: pretrained DNNs, hyperparameter-tuned optimized DNNs, hybrid DNNs and ML classifiers, hybrid models with optimization techniques, and meta-heuristic-based feature selection DNNs. The major part of this review highlights the significant role of nature-inspired meta-heuristic techniques used for hyperparameter optimization or feature selection in DNNs. Besides the frameworks and computational costs, descriptions of disparate medical image datasets and image preprocessing techniques are also discussed under each category. Furthermore, a comparative analysis is performed for each category on the basis of different parameters, including the type and size of the datasets used, image preprocessing, methodology (as per the category), and performance (in terms of classification accuracy). This study also presents a bibliometric analysis based on the publication count of articles related to hyperparameter-tuned optimized DNNs and meta-heuristic-based feature selection DNNs. This review aims to assist potential AI researchers in choosing the most sound and appropriate DNN-based techniques for disease detection and prediction, all consolidated into a one sing
Automatic modulation classification (AMC) is an essential task in intelligent receivers. AMC over multipath fading channels has two problems. The first is that the higher-order moment (HOM)-based normalized channel coefficient estimator is not valid for some types of digital modulation. The second is poor classification accuracy. This study addresses these challenges in three steps. First, it introduces a novel HOM-based normalized channel coefficient estimator applicable to a broad range of digital modulation schemes. Second, it derives mathematical expressions for the estimated normalized HOMs and higher-order cumulants (HOCs) of the transmitted signal. Finally, it employs feature selection algorithms to identify the most discriminative estimated HOCs for AMC. Simulation results demonstrate that the classification accuracy obtained with the estimated HOCs for M-ary Phase Shift Keying (MPSK) and M-ary Quadrature Amplitude Modulation (MQAM) schemes improves significantly on previous studies. Perfect classification (100% accuracy) is achieved for the 3-tap multipath channel at Signal-to-Noise Ratio (SNR) values exceeding 6 dB, and for the 4-tap multipath channel at SNR values above 7 dB.