Quantifying the effect of mutations in the BRCA1 gene is useful for understanding their clinical consequences for breast cancer. Machine learning models can be applied to predict the landscape of protein variant effects, which is not always accessible by experiment. In this work, we propose a simple semi-supervised learning method using a Gaussian mixture model to predict ∼90% of the unlabeled missense variants of the BRCA1 gene collected from the ClinVar database. High-quality embeddings, extracted using the latest pre-trained transformer-based protein language model, are used as features of the protein sequences. A statistical test shows that the protein embeddings are effective and robust for predicting pathogenicity. Lower-dimensional representations of the features are then fed into the semi-supervised model. On the labeled test data alone, the model achieves an AUC of 79.27% and an accuracy of 71.58%. Using our defined pathogenic probability score, we find that ∼94% of variants in our unlabeled dataset are well separated into either benign or pathogenic classes. Our scores obtain a moderate Spearman rank correlation with the results of established unsupervised variant effect models. Finally, our approach can potentially be developed for more accurate and biologically reliable predictions of variant effects.
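As a rough illustration of the semi-supervised Gaussian mixture idea described in this abstract, the sketch below fits a two-component mixture to one-dimensional toy values with expectation-maximization. Labeled points keep fixed class memberships and unlabeled points receive soft ones. Everything here is an assumption made for illustration: the function names, the one-dimensional feature, the toy numbers, and the choice to read the posterior of component 1 as a "pathogenic probability score". The abstract does not define the authors' score or their implementation.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def pathogenic_score(x, weights, means, variances):
    """Posterior probability of component 1 (assumed here to mean 'pathogenic')."""
    dens = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return dens[1] / total if total > 0.0 else 0.5

def fit_semi_supervised_gmm(labeled, unlabeled, n_iter=50, min_var=1e-6):
    """Fit a two-component 1-D mixture.

    labeled:   list of (x, label) pairs with label 0 (benign) or 1 (pathogenic);
               each class needs at least one labeled point.
    unlabeled: list of x values.
    Returns (weights, means, variances).
    """
    # Initialize each component from the labeled points of its class.
    weights = [0.5, 0.5]
    means, variances = [], []
    for k in (0, 1):
        xs = [x for x, label in labeled if label == k]
        mean = sum(xs) / len(xs)
        means.append(mean)
        variances.append(max(sum((x - mean) ** 2 for x in xs) / len(xs), min_var))

    for _ in range(n_iter):
        # E-step: hard memberships for labeled points, posteriors for unlabeled ones.
        resp = [(x, (1.0 - label, float(label))) for x, label in labeled]
        for x in unlabeled:
            p1 = pathogenic_score(x, weights, means, variances)
            resp.append((x, (1.0 - p1, p1)))
        # M-step: re-estimate each component from the weighted points.
        for k in (0, 1):
            nk = sum(r[k] for _, r in resp)
            means[k] = sum(r[k] * x for x, r in resp) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for x, r in resp) / nk
            variances[k] = max(var, min_var)
            weights[k] = nk / len(resp)
    return weights, means, variances

# Made-up 1-D feature values for illustration only; not data from the study.
labeled = [(-2.1, 0), (-1.8, 0), (-2.4, 0), (2.0, 1), (1.7, 1), (2.3, 1)]
unlabeled = [-2.2, -1.9, -0.1, 1.9, 2.2]

weights, means, variances = fit_semi_supervised_gmm(labeled, unlabeled)
scores = [pathogenic_score(x, weights, means, variances) for x in unlabeled]
```

Scores near 0 or 1 correspond to variants that fall clearly into one class, while values near 0.5 mark ambiguous ones. A real pipeline would work on multi-dimensional reduced embeddings rather than a single number.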
Typhoid fever is an endemic disease that burdens Indonesia and is a potentially fatal multisystem infection. The bacterium Salmonella typhi is responsible for typhoid fever. Poor sanitation, crowding, and slums are the main factors behind increasing typhoid fever incidence. Environmental factors directly connected to meteorological conditions are the main factor in the breeding of Salmonella typhi. This study aims to identify the correlation between meteorological parameters and the occurrence of typhoid fever. The study was carried out in Jakarta, Indonesia; the Bureau of Meteorology, Climatology, and Geophysics (BMKG) provided the meteorological parameter data, and the Jakarta health surveillance office provided information on typhoid fever hospitalizations from 2019 to 2021. Pearson's correlation was used to investigate the relationship between typhoid fever incidence and the meteorological parameters. Humidity, precipitation, and wind speed are the meteorological parameters that contribute significantly to the occurrence of typhoid fever. These findings might be used as a reference for Indonesia's government in making public policy to prevent typhoid fever in Indonesia.
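For readers unfamiliar with the statistic, the following minimal sketch computes a Pearson correlation coefficient between a meteorological series and a case-count series. The function name `pearson_r` and the monthly numbers are hypothetical and chosen only to show the calculation; they are not the study's data or code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up monthly values for illustration only; not data from the study.
humidity = [70.0, 75.0, 80.0, 85.0, 90.0]
cases = [12, 15, 19, 22, 27]
r = pearson_r(humidity, cases)
```

A value of r close to +1 or -1 indicates a strong linear association, and a value near 0 indicates little linear association. Judging significance additionally requires a hypothesis test, which this sketch omits.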
Tropical diseases are among the infectious diseases that affect Indonesia. Many people die because of tropical diseases, such as dengue hemorrhagic fever (DHF), chikungunya, leprosy, and lymphatic filariasis. The Indo...
Multidisciplinary collaboration between public health, system engineering, and UX can generate solutions to healthcare problems such as stunting. The principle of Agile UX gathers requirements to generate an appli...
The launch of StuntingDB is believed to enliven the role of database management systems (DBMS) in Indonesia's stunting research. However, a novelty in stunting data management that enables parallel project activat...
This study analyzed interactions between Twitter users in conversations regarding Indonesia's state-owned vaccine manufacturer 'Biofarma' in 2021. The primary objective of this study is to identify Key Opi...
Air pollution is a pressing issue in cities, and managing air quality poses a challenge for urban designers and decision-makers. This study proposes a Digital Twin (DT) Smart City integrated with Mixed Reality technology to enhance visualization and collaboration for addressing urban air pollution. The research adopts an applied research approach, with a focus on developing a DT framework. A use case of DT development for Jakarta, the capital of Indonesia, is presented. By integrating air quality data, meteorological information, traffic patterns, and urban infrastructure data, the DT provides a comprehensive understanding of air pollution dynamics. The visualization capabilities of the DT, utilizing Mixed Reality technology, facilitate effective decision-making and the identification of strategies for managing air quality. However, further research is needed to address data management challenges to build a DT for Smart City at scale.
In this digital era, we are exposed to a large amount of data. This includes biological data, which stores information about living organisms, including Deoxyribonucleic acid (DNA), genes, and proteins. With the devel...
Understanding the mechanistic interpretability of mutation effects in a protein can help predict the clinical implications of genetic variants. Hence, computational variant effect predictions that involve structural features of the protein mutations might be suitable in this case. In this work, we focus on the BRCT domains of the BRCA1 gene, which are widely studied in breast cancer research. We retrieved 88 selected missense variants found in the BRCT domains and annotated in both the ClinVar and gnomAD databases. To computationally characterize the pathogenic property of the mutations, we used two types of features extracted from protein structures: the change in Gibbs free energy and a set of features derived from molecular dynamics simulations of each mutant. Using dimensionality reduction and Gaussian mixture model (GMM)-based clustering, we demonstrate that the variants are segregated into two regions that may correspond to their pathogenic status. This method can be a potential computational pipeline for providing a preliminary mechanistic interpretation of mutation effects in terms of their thermodynamic and structural features.
Lactose intolerance is a digestive problem that may threaten the population because milk and dairy products contain nutrients that are essential for the human body. Genetic tests have great potential for detecting lactose intolerance, as they can be used in children and even infants. However, a new approach to analyzing genetic test results is needed to elucidate the single nucleotide polymorphisms (SNPs) that are related to lactose intolerance. In this work, we used machine learning-based feature selection to select the SNPs associated with the lactose tolerance trait from genotyping samples of direct-to-consumer (DTC) genetic tests, obtained from a public database. Recursive Feature Elimination (RFE) with an XGBoost model was used to perform feature selection. We also compared three models, XGBoost, support vector machine (SVM), and random forest (RF), trained on the selected features. Our findings revealed that 20 SNPs (out of 3501) were chosen, with rs4394668 as the most important variable (F-score 0.009). Furthermore, when compared to the RF and SVM models, the XGBoost model had the highest accuracy (0.87). Further studies should be undertaken to elucidate how the selected SNPs may lead to the lactose intolerance trait.
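The recursive feature elimination loop mentioned in this abstract can be sketched in a few lines: fit a model, drop the feature the model considers least important, and repeat until the desired number of features remains. The sketch below is an assumption-laden illustration and not the study's pipeline. It swaps in a small gradient-descent logistic regression for XGBoost, ranks features by absolute coefficient, and uses hypothetical SNP names with a tiny made-up genotype table.

```python
import math

def fit_logistic(X, y, n_iter=300, lr=0.1):
    """Logistic regression by batch gradient descent; returns one weight per column."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        b -= lr * grad_b / len(X)
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
    return w

def recursive_feature_elimination(X, y, names, n_keep):
    """Refit the model and drop the weakest feature until n_keep features remain."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        subset = [[row[j] for j in active] for row in X]
        w = fit_logistic(subset, y)
        # Importance here is |coefficient|; a tree model would supply its own measure.
        weakest = min(range(len(active)), key=lambda i: abs(w[i]))
        del active[weakest]
    return [names[j] for j in active]

# Made-up genotypes coded as allele count minus one (-1, 0, 1); illustration only.
# The two noise columns are mirrored within each class, so they carry no signal.
names = ["snp_signal", "snp_noise_1", "snp_noise_2"]  # hypothetical identifiers
X = [
    [-1, -1,  1],
    [-1,  1, -1],
    [ 0,  1,  0],
    [ 0, -1,  0],
    [ 1,  0,  1],
    [ 1,  0, -1],
    [ 1,  1,  1],
    [ 1, -1, -1],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = recursive_feature_elimination(X, y, names, n_keep=1)
```

In practice one would more likely use an existing implementation, for example scikit-learn's `RFE` class wrapped around a gradient-boosted classifier, rather than a hand-written loop. The point of the sketch is only to show that the model is refit after every elimination, so importances are re-evaluated on the shrinking feature set.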