ISBN: (print) 9781665463539
We propose combining sequence classification, the SHAP algorithm, and masked language modeling (MLM) for the task of text style transfer. To handle cases where no parallel source-target pairs are available, we train a BERT-based sequence classification model on the SST-2 task of GLUE for both the source and target domains, and we use SHAP values computed from this classifier to detect and then delete words associated with the original attribute. The deleted tokens are replaced by an MLM trained on the target domain, which retrieves new phrases associated with the target attribute. We also detect the part of speech (POS) of each word in the sentence so that replacements are made at suitable positions without much impact on the semantics. Additionally, we use GloVe to measure the semantic similarity between the word generated by the MLM and the original word, and we trade off content against attribute by using grid search to set their relative weights. Experiments show that our method improves the style conversion rate by 9.7% and achieves a semantic similarity to the original content that is on average 28.2% higher than the best previous system.
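The content-versus-attribute trade-off described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional vectors standing in for GloVe embeddings, the made-up MLM probabilities, and the linear scoring rule `weight * mlm_prob + (1 - weight) * cosine` are all assumptions chosen for illustration. The abstract does not state how the grid search selects the final weight, so the sketch only sweeps the weight and reports which candidate wins at each setting.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_candidate(original_vec, candidates, weight):
    """Pick the replacement word with the highest combined score.

    candidates: list of (word, mlm_prob, vector) tuples (hypothetical data).
    weight:     share given to the target-style MLM probability; the
                remainder goes to similarity with the original word.
    """
    scored = [
        (weight * prob + (1.0 - weight) * cosine(original_vec, vec), word)
        for word, prob, vec in candidates
    ]
    return max(scored)[1]


def sweep_weights(original_vec, candidates, steps=5):
    """Grid over weights in [0, 1]; returns {weight: winning word}."""
    grid = [i / (steps - 1) for i in range(steps)]
    return {w: best_candidate(original_vec, candidates, w) for w in grid}


# Toy stand-ins for GloVe vectors and MLM probabilities (assumed values).
original = [1.0, 0.0]  # vector of the deleted source-style word
candidates = [
    ("great", 0.9, [0.0, 1.0]),   # strong target style, far from original
    ("bad", 0.1, [1.0, 0.1]),     # close to original, weak target style
    ("decent", 0.5, [0.6, 0.8]),  # a compromise between the two
]

if __name__ == "__main__":
    for w, word in sweep_weights(original, candidates).items():
        print(f"weight={w:.2f} -> {word}")
```

A weight of 0 favors the word closest to the original and a weight of 1 favors the word the MLM finds most likely in the target style; intermediate weights can select a compromise.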
Due to environmental factors such as water transparency, subtidal seaweed beds are often challenging to observe directly via satellite. However, the presence of seaweed beds can lead to variations in the concentrations of total suspended matter (TSM), chlorophyll-a (Chl-a), and chromophoric dissolved organic matter (CDOM) in the surrounding waters. This study focuses on the seaweed beds around Gouqi Island, Zhejiang, integrating several months of in-situ water quality sampling data with PlanetScope satellite imagery to develop inversion models for water quality parameters using Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Support Vector Regression (SVR) algorithms. By analyzing the differences in water quality parameters between areas with seaweed beds and those without, we explored the underlying causes of these variations and proposed an indirect method for estimating the distribution range of underwater seaweed. This research not only provides a new perspective and technical approach for marine resource management but also contributes significant foundational data and scientific evidence for the conservation of coastal zone ecosystems.
This study applies machine learning (ML) methods to predict post-impact damage states of reinforced concrete (RC) bridge piers under vehicle collision. A total of 251 datasets of various vehicle-bridge collision scenarios are synthesized for training and testing six supervised ML models: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, eXtreme Gradient Boosting Trees (XGBoost), and Artificial Neural Network (ANN). Comparisons of confusion matrices indicate that SVM, Random Forest, XGBoost, and ANN possess superior and comparable classification capabilities. The ML models also achieve much higher accuracy than existing empirical models in the literature. Furthermore, the Shapley additive explanations (SHAP) algorithm is used to interpret and explain the prediction process of the ML models. In particular, the Shapley value of each feature captures its positive or negative contribution to the ML model's prediction of each damage state; the most influential design variables include impact speed, truck mass, engine mass, and pier diameter. To facilitate the performance-based crashworthiness design of RC bridge piers, end-to-end interactive software is devised to automatically predict impact damage states using the top three ML models for any given design scenario. Real-time interactive illustrations are also provided to elucidate the Shapley value contribution of each design parameter for the Random Forest model to reach each damage state. Finally, the damage state with the highest likelihood among the three ML model predictions is selected as the final damage state.
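The idea that a Shapley value captures each feature's signed contribution can be sketched with a brute-force computation. This is an illustrative assumption, not the study's code: the SHAP library uses faster model-specific estimators, and the linear `toy_damage_score` model, its coefficients, the feature values, and the all-zero baseline below are hypothetical. A feature that is "absent" from a coalition is replaced by its baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every feature coalition.

    value_fn takes a frozenset of feature indices that are "present"
    and returns the model output for that coalition.
    """
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model: features are impact speed, truck mass, pier diameter.
COEFFS = [0.5, 0.3, -0.2]
INSTANCE = [2.0, 1.0, 3.0]
BASELINE = [0.0, 0.0, 0.0]


def toy_damage_score(x):
    """Linear stand-in for a trained model's damage score."""
    return sum(c * v for c, v in zip(COEFFS, x))


def coalition_value(present):
    """Model output with absent features set to their baseline values."""
    x = [INSTANCE[j] if j in present else BASELINE[j] for j in range(len(INSTANCE))]
    return toy_damage_score(x)


if __name__ == "__main__":
    names = ["impact speed", "truck mass", "pier diameter"]
    for name, value in zip(names, shapley_values(coalition_value, 3)):
        print(f"{name}: {value:+.3f}")
```

For a linear model each Shapley value reduces to the coefficient times the feature's deviation from the baseline, and the values sum to the difference between the prediction and the baseline prediction. Enumeration grows exponentially with the number of features, so this form is only practical for small examples.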
Background and aims: Sexually transmitted infections (STIs) are a significant global public health challenge due to their high incidence rate and potential for severe consequences when early intervention is neglected. Research shows an upward trend in absolute cases and DALY numbers of STIs, with syphilis, chlamydia, trichomoniasis, and genital herpes exhibiting an increasing trend in age-standardized rate (ASR) from 2010 to 2019. Machine learning (ML) presents significant advantages in disease prediction, with several studies exploring its potential for STI prediction. The objective of this study is to build male-specific and female-specific STI risk prediction models based on the CatBoost algorithm using data from the National Health and Nutrition Examination Survey (NHANES) for training and validation, with sub-group analysis performed on each STI. The female sub-group also includes human papilloma virus (HPV) *** study utilized data from the National Health and Nutrition Examination Survey (NHANES) program to build male-specific and female-specific STI risk prediction models using the CatBoost algorithm. Data were collected from 12,053 participants aged 18 to 59 years, with general demographic characteristics and sexual behavior questionnaire responses included as features. The Adaptive Synthetic Sampling Approach (ADASYN) algorithm was used to address data imbalance, and 15 machine learning algorithms were evaluated before the CatBoost algorithm was ultimately selected. The SHAP method was employed to enhance interpretability by identifying feature importance in the model's STI risk *** CatBoost classifier achieved AUC values of 0.9995, 0.9948, 0.9923, 0.9996, and 0.9769 for predicting chlamydia, genital herpes, genital warts, gonorrhea, and overall STIs among males. The CatBoost classifier achieved AUC values of 0.9971, 0.972, 0.9765, 1, 0.9485 and 0.8819 for predicting chlamydia, genital herpes, genital warts, gonorrhea