ISBN: (Print) 9781665463539
We propose combining sequence classification, the SHAP algorithm, and masked language modeling (MLM) for the task of text style transfer. To handle cases where no parallel source-target pairs are available, we train a BERT-based sequence classification model on the SST-2 task of GLUE for both the source and target domains; we then use SHAP values computed from this classifier to detect and delete words associated with the original attribute. The deleted tokens are replaced by an MLM trained on the target domain, which retrieves new phrases associated with the target attribute. In addition, we detect the part of speech (POS) of each word in the sentence so that replacements are made at suitable positions without much impact on the semantics. We also use GloVe to measure the semantic similarity between the word generated by the MLM and the original word, and we use grid search over their weighting to trade off content against attribute. Experiments show that our method improves the style conversion rate by 9.7% and achieves a semantic similarity to the original content that is 28.2% higher on average than the best previous system.
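The detect-and-delete step can be illustrated with a minimal sketch. It is not the authors' implementation: the per-word weights in `TOY_WEIGHTS` are a hypothetical stand-in for the trained BERT classifier, `[MASK]` placeholders stand in for the slots an MLM would later fill, and Shapley values are computed exactly by enumerating coalitions, which is only feasible for very short sentences.

```python
from itertools import combinations
from math import factorial

# Hypothetical sentiment weights standing in for a trained classifier.
TOY_WEIGHTS = {"terrible": -2.0, "slow": -1.0, "great": 2.0}

def toy_score(tokens):
    """Toy 'positive sentiment' score: sum of per-word weights."""
    return sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)

def shapley_values(tokens, score=toy_score):
    """Exact Shapley value of each token position, enumerating all coalitions."""
    n = len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude token i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                values[i] += weight * (score(with_i) - score(without))
    return values

def mask_attribute_tokens(tokens, threshold=0.5, mask="[MASK]"):
    """Mask tokens whose Shapley value lowers the score by more than threshold."""
    phi = shapley_values(tokens)
    return [mask if v < -threshold else t for t, v in zip(tokens, phi)]

sentence = ["the", "service", "was", "terrible", "and", "slow"]
masked = mask_attribute_tokens(sentence)
print(masked)
```

Because the toy score is additive, each token's Shapley value equals its weight; a real classifier has interactions between tokens, and in practice an approximating explainer would replace the exhaustive enumeration.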
This study aimed to develop and validate machine learning (ML) models to predict the occurrence of delayed hyponatremia after transsphenoidal surgery for pituitary adenoma. We retrospectively collected clinical data on patients with pituitary adenomas treated with transsphenoidal surgery between January 2010 and December 2020. From January 2021 to December 2022, patients with pituitary adenomas were prospectively enrolled. We trained seven ML models to predict delayed hyponatremia using the clinical variables in the training set. The final model was internally validated using a test set and a prospective dataset. The SHapley Additive exPlanations (SHAP) algorithm was used to determine the significance of each variable in the occurrence of delayed hyponatremia. In the training dataset, the best predictive performance was observed for XGBoost (area under the ROC curve, AUC = 0.821), followed by Random Forest (AUC = 0.8), Logistic Regression (AUC = 0.793), Support Vector Machine (AUC = 0.776), naïve Bayes (AUC = 0.774), K-Nearest Neighbors (AUC = 0.742), and Decision Tree (AUC = 0.717). The AUCs of the XGBoost model for the test and prospective datasets were 0.831 and 0.785, respectively. According to the SHAP algorithm, the important variables for predicting delayed hyponatremia were the difference in pituitary stalk deviation angle, the difference in "measurable pituitary stalk" length before and after surgery, and the difference in blood sodium concentration between the preoperative measurement and postoperative day 2. The XGBoost model was best able to predict delayed hyponatremia after transsphenoidal surgery for pituitary adenomas. The differences in pituitary stalk deviation angle, pre- versus postoperative "measurable pituitary stalk" length, and pre- versus postoperative day 2 blood sodium concentrations were important variables for predicting delayed hyponatremia.
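The model comparison above ranks classifiers by AUC. As a minimal sketch of what that metric measures, the function below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels, scores, and the names `model_a` and `model_b` are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

# Illustrative data only: 1 = event occurred, 0 = no event.
labels = [0, 0, 1, 1, 0, 1]
model_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8, 0.2, 0.9],
    "model_b": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # uninformative scores
}

aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

An uninformative model that assigns every case the same score lands at 0.5, which is the baseline against which values such as 0.821 should be read.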
Urban heat vulnerability (UHV) caused by anthropogenic activities and climate change has given rise to heat-related health issues in urban areas worldwide. Previous studies have extensively examined the simple linear relationship between heat vulnerability indices (HVIs) and the morbidity or mortality of heat-related illnesses, but the nonlinear relationships and interactions among the main HVIs have not yet been fully explored. Based on a vulnerability assessment framework, this paper selects fifteen indicators covering the built environment, sociodemographic and socioeconomic attributes, resource accessibility, and residential thermal comfort, obtained from multisource data. Through the evaluation and analysis of the composite HVI and its dimensions, we found that Qingshan district and the East Lake scenic area contain more communities with high to very high heat vulnerability. We compared the performance of ordinary least squares (OLS) and gradient boosting decision trees (GBDT); the results indicate that GBDT outperforms the OLS model, capturing the nonlinear relationship in the study areas more efficiently and with higher accuracy. When analyzing the HVIs' contributions and interactions with the GBDT model and the SHAP algorithm, nighttime light (NTL), building year (BY), PM2.5, floor area ratio (FAR), number of elderly (>= 65 years) (NE), and urban surface roughness (USR) emerge as the six key indicators of morbidity of heat-related diseases (mean SHAP value > 2.5). These indicators show an evident nonlinear relationship with morbidity, including threshold effects and spatially heterogeneous contributions to the variation in morbidity of heat-related diseases. Our study provides insights into ML modeling of the effect of heat vulnerability on the health of city residents, and into mitigation and adaptation strategies that governments and urban planners can use to develop heat-resilient cities.
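The "mean SHAP value > 2.5" criterion can be sketched as a simple filter over a matrix of per-sample SHAP values. The sketch assumes the criterion refers to the mean absolute SHAP value per feature, which is the usual global-importance summary but is not stated explicitly in the abstract. The numbers and the `toy_feature` column are invented for illustration.

```python
def key_indicators(shap_rows, names, threshold=2.5):
    """Return {name: mean |SHAP|} for features whose mean exceeds threshold."""
    n = len(shap_rows)
    means = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(names)
    }
    return {name: m for name, m in means.items() if m > threshold}

# Rows are samples, columns are features; values are illustrative only.
names = ["NTL", "FAR", "toy_feature"]
shap_rows = [
    [3.0, -2.0, 0.5],
    [-4.0, 4.0, -0.5],
    [2.0, -3.0, 0.0],
]

selected = key_indicators(shap_rows, names)
print(selected)
```

Taking the absolute value matters: the signed mean of the NTL column here is only about 0.33, because positive and negative contributions cancel, while its mean magnitude is 3.0.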
Background and aims: Sexually transmitted infections (STIs) are a significant global public health challenge due to their high incidence rate and potential for severe consequences when early intervention is neglected. Research shows an upward trend in absolute cases and DALY numbers of STIs, with syphilis, chlamydia, trichomoniasis, and genital herpes exhibiting an increasing trend in age-standardized rate (ASR) from 2010 to 2019. Machine learning (ML) presents significant advantages in disease prediction, with several studies exploring its potential for STI prediction. The objective of this study is to build male-specific and female-specific STI risk prediction models based on the CatBoost algorithm using data from the National Health and Nutrition Examination Survey (NHANES) for training and validation, with sub-group analysis performed on each STI. The female sub-group also includes human papilloma virus (HPV) *** study utilized data from the National Health and Nutrition Examination Survey (NHANES) program to build male-specific and female-specific STI risk prediction models using the CatBoost algorithm. Data were collected from 12,053 participants aged 18 to 59 years, with general demographic characteristics and sexual behavior questionnaire responses included as features. The Adaptive Synthetic Sampling Approach (ADASYN) algorithm was used to address data imbalance, and 15 machine learning algorithms were evaluated before ultimately selecting the CatBoost algorithm. The SHAP method was employed to enhance interpretability by identifying feature importance in the model's STI risk *** CatBoost classifier achieved AUC values of 0.9995, 0.9948, 0.9923, 0.9996, and 0.9769 for predicting chlamydia, genital herpes, genital warts, gonorrhea, and overall STIs among males. The CatBoost classifier achieved AUC values of 0.9971, 0.972, 0.9765, 1, 0.9485 and 0.8819 for predicting chlamydia, genital herpes, genital warts, gonorrhea
This study applies machine learning (ML) methods to predict post-impact damage states of reinforced concrete (RC) bridge piers under vehicle collision. A total of 251 datasets covering various vehicle-bridge collision scenarios are synthesized for training and testing six supervised ML models: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, eXtreme Gradient Boosting Trees (XGBoost), and Artificial Neural Network (ANN). Comparisons of confusion matrices indicate that SVM, Random Forest, XGBoost, and ANN possess superior and comparable classification capabilities. The ML models also achieve much higher accuracy than existing empirical models in the literature. Furthermore, the Shapley additive explanations (SHAP) algorithm is used to interpret and explain the prediction process of the ML models. In particular, the Shapley value of each feature captures its positive or negative contribution to the ML model's prediction of each damage state; the most influential design variables include impact speed, truck mass, engine mass, and pier diameter. To facilitate the performance-based crashworthiness design of RC bridge piers, end-to-end interactive software is devised to automatically predict impact damage states using the top three ML models for any given design scenario. Real-time interactive illustrations are also provided to show the Shapley value contribution of each design parameter for the Random Forest model to reach each damage state. Finally, the final damage state is selected as the one with the highest likelihood of damage among the three ML model predictions.
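The final selection step can be sketched as follows, under one possible reading of the abstract: each of the three models predicts an ordinal damage state, and the most severe prediction is kept. The damage-state labels in `DAMAGE_ORDER` and the choice of models in `preds` are hypothetical, since the abstract names neither the states nor the top three models.

```python
# Hypothetical ordinal damage states, from least to most severe.
DAMAGE_ORDER = ["minor", "moderate", "severe", "collapse"]

def final_damage_state(predictions):
    """Pick the most severe state among the per-model predictions."""
    return max(predictions.values(), key=DAMAGE_ORDER.index)

# Illustrative predictions; the model names are placeholders.
preds = {"SVM": "moderate", "RandomForest": "severe", "XGBoost": "moderate"}
final = final_damage_state(preds)
print(final)
```

Taking the most severe prediction is a conservative design choice: it favors overestimating damage over missing it, at the cost of ignoring agreement among the other two models.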