Sentiment analysis (SA) is an effective way to examine large volumes of data such as customer reviews, market research, social media posts, online discussions, and customer feedback. Arabic is a rich and complex language, and the coexistence of numerous dialects alongside Modern Standard Arabic (MSA) is the main reason Arabic resources need to be enhanced. This study investigates the impact of stemming and lemmatization methods on Arabic sentiment analysis (ASA) using machine learning, specifically the LightGBM classifier. It employs metaheuristic feature selection algorithms, such as particle swarm optimization, dragonfly optimization, grey wolf optimization, the Harris hawks optimizer, and a genetic algorithm, to identify the most relevant features, and it uses the Optuna hyperparameter optimization framework to find hyperparameter values that improve LightGBM's performance. The different stemming and lemmatization methods, the metaheuristic feature selection algorithms, and Optuna are applied to eleven datasets covering different Arabic dialects. The findings indicate that metaheuristic feature selection with the LightGBM classifier, using a suitable stemming or lemmatization method or a combination of the two, improves LightGBM's accuracy by between 0 and 8%. Optuna hyperparameter optimization with the LightGBM classifier, using a suitable stemming or lemmatization method or a combination of the two, depending on data characteristics, improves accuracy by between 2 and 11%, and it outperforms metaheuristic feature selection in more than 90% of cases. The study also underscores the importance of preprocessing strategies in ASA and highlights the effectiveness of metaheuristic approaches and Optuna hyperparameter optimization in improving LightGBM's performance. This study is of significant importance in the field of ASA, providing valuable insights and d
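To make the idea of metaheuristic (wrapper) feature selection concrete, here is a minimal sketch of a genetic algorithm searching over binary feature masks. It is not the study's implementation: the toy data, the nearest-centroid classifier (swapped in for LightGBM so the example needs only the standard library), the per-feature penalty, and all function names and parameter values are assumptions made for illustration.

```python
import random

def make_data(rng, n=120, n_features=8, informative=(0, 3)):
    """Toy data (assumption): only the `informative` columns depend on the label."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Accuracy of a nearest-centroid classifier restricted to the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    hits = 0
    for x, t in zip(X_te, y_te):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cents[c])) for c in cents}
        hits += min(dist, key=dist.get) == t
    return hits / len(y_te)

def ga_select(fitness, n_features, rng, pop_size=20, generations=15, p_mut=0.1):
    """Genetic search over 0/1 masks: tournament selection, one-point crossover,
    bit-flip mutation, and elitism (the best mask always survives)."""
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_features - 1)
            child = p1[:cut] + p2[cut:]
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = children
        best = max(pop, key=fitness)
    return best

rng = random.Random(0)
X, y = make_data(rng)
X_tr, y_tr, X_te, y_te = X[:80], y[:80], X[80:], y[80:]

def fitness(mask):
    # Held-out accuracy minus a small penalty per selected feature (assumed trade-off).
    return centroid_accuracy(X_tr, y_tr, X_te, y_te, mask) - 0.01 * sum(mask)

best_mask = ga_select(fitness, n_features=8, rng=rng)
print("selected feature indices:", [j for j, keep in enumerate(best_mask) if keep])
```

In a setting like the one the abstract describes, the fitness function would instead score a LightGBM model trained on the selected text features; the other metaheuristics named above differ in how they propose new masks, not in this overall wrapper structure.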
Purpose: While the Chinese securities market is booming, more listed companies are also falling into financial distress, which affects the operation and development of enterprises and jeopardizes the interests of investors. Therefore, it is important to understand how to accurately and reasonably predict the financial distress of ***/methodology/approach: In the present study, ensemble feature selection (EFS) and improved stacking were used for financial distress prediction (FDP). Mutual information, analysis of variance (ANOVA), random forest (RF), genetic algorithms, and recursive feature elimination (RFE) were chosen for EFS to select features. Since information may be lost when the results of the base learners are fed directly into the meta-learner, the features with high importance were fed into the meta-learner together with those results. A screening layer was added to select the meta-learner with better performance. Finally, Optima hyperparameters were used for parameter tuning by the *** empirical study was conducted with a sample of A-share listed companies in China. The F1-score of the model constructed using the features screened by EFS reached 84.55%, representing an improvement of 4.37% compared to the original features. To verify the effectiveness of improved stacking, benchmark model comparison experiments were conducted. Compared to the original stacking model, the accuracy of the improved stacking model was improved by 0.44%, and the F1-score was improved by 0.51%. In addition, the improved stacking model had the highest area under the curve (AUC) value (0.905) among all the compared ***/value: Compared to previous models, the proposed FDP model has better performance, thus bridging the research gap of feature selection. The present study provides new ideas for stacking improvement research and a reference for subsequent research in this field.
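The two ideas in this abstract, combining several feature selectors and passing important original features to the meta-learner alongside the base learners' outputs, can be sketched as follows. The abstract does not say how the selectors' results were combined, so the majority-vote rule here is an assumption, as are the feature names (`x1` to `x6`), the vote threshold, the example selections, and both function names.

```python
from collections import Counter

def ensemble_select(selections, min_votes):
    """Keep features chosen by at least `min_votes` selectors (assumed voting rule)."""
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, v in votes.items() if v >= min_votes)

def build_meta_features(base_outputs, X, important_idx):
    """Meta-learner input: base-learner outputs plus selected original columns."""
    return [list(p) + [row[j] for j in important_idx]
            for p, row in zip(base_outputs, X)]

# Hypothetical output of the five selectors named in the abstract.
selections = {
    "mutual_info":   ["x1", "x2", "x4"],
    "anova":         ["x1", "x3", "x4"],
    "random_forest": ["x1", "x4", "x5"],
    "genetic":       ["x2", "x4", "x6"],
    "rfe":           ["x1", "x2", "x5"],
}
kept = ensemble_select(selections, min_votes=3)
print(kept)

# Two samples, two base learners (made-up probabilities), three original features.
base_outputs = [(0.2, 0.7), (0.9, 0.6)]
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
meta_X = build_meta_features(base_outputs, X, important_idx=[0, 2])
print(meta_X)
```

Each row of `meta_X` would then be used to train the meta-learner, so it sees both what the base learners predicted and the raw values of the features judged most important.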
Mangroves play a significant role in carbon sequestration and storage. Mapping mangrove species and monitoring their condition are crucial for achieving sustainable development goals. Combining multidimensional optical and SAR images with machine learning has become an important approach for mangrove species classification, but challenges remain in feature selection and hyperparameter optimization. In this study, we proposed a novel classification framework that combines a multi-scale variable selection algorithm (MUVR) with a state-of-the-art machine learning hyperparameter optimization method (Optuna) for mapping mangrove species in the Beilun Estuary and Maowei Sea nature reserves using optical and dual-polarization SAR images, and further quantified the scattering characteristics of mangrove species using SAR image time series. We found that: (1) the MUVR algorithm could determine the optimal scale features for different scenarios and mangrove species, and improved the classification performance of machine learning, with an overall accuracy (OA) improvement of 12.85%; (2) the Optuna-optimized CatBoost outperformed the LightGBM and NGBoost algorithms in mapping mangrove species and achieved the highest OA (93.18%). LightGBM was suitable for identifying Aegiceras corniculatum, while CatBoost was suitable for discriminating Avicennia marina, Bruguiera gymnorrhiza, Cyperus malaccensis, Kandelia candel and Sonneratia apetala; (3) SAR images and their derivatives improved the ability to identify mangrove species, and combining multispectral images with SAR-derived features produced better classification; (4) from 2018 to 2020, the backscattering coefficients of mangrove species in VV and VH polarization were concentrated in the ranges 0.053-0.327 and 0.015-0.062, respectively. The coherence coefficients of mangroves displayed a seasonal change trend with the large variations in summer and small variations in
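For readers unfamiliar with the overall accuracy (OA) metric reported above, a minimal sketch of how it is usually computed from a confusion matrix follows. The three-class matrix is made up for illustration and is not data from the study; the function name is also an assumption.

```python
def overall_accuracy(confusion):
    """OA = correctly classified samples (the diagonal) / all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical confusion matrix: rows are reference classes, columns are predictions.
confusion = [
    [50, 2, 3],
    [4, 40, 1],
    [1, 2, 47],
]
oa = overall_accuracy(confusion)
print(f"OA = {oa:.2%}")
```

Because OA pools all classes, a per-species comparison such as the one in finding (2) needs per-class measures as well; OA alone would not show which algorithm suits which species.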