A wide variety of disciplines contribute to bioinformatics research, including computer science, biology, chemistry, mathematics, and physics. This study uses topic modeling with Latent Dirichlet Allocation (LDA) to determine how many research articles published on arXiv fall under bioinformatics topics and which bioinformatics terms are used most frequently. LDA is a statistical algorithm that discovers topics hidden within large collections of documents. Our research examined 226,453 articles posted on arXiv between January 2023 and January 2024. Of these, more than 10,521 articles were categorized into bioinformatics topics. The largest category is "Mathematical Physics," with 6,352 documents, followed by "Computer Science," with 2,950 documents. The terms 'RNA,' 'sequence,' 'tree,' and 'homology' are the most commonly used terms in bioinformatics. RNA plays a vital role in molecular biology, which explains its prevalence in bioinformatics. Sequence data refer to the order in which nucleotides or amino acids occur in a DNA molecule or a protein.
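To make the topic-modeling step concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is not the authors' implementation, which the abstract does not describe. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are all illustrative assumptions.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns per-document topic counts and per-topic word counts.
    alpha and beta are assumed symmetric Dirichlet priors.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Toy corpus (invented for illustration, not arXiv data).
docs = [
    "rna sequence folding rna structure".split(),
    "sequence homology tree alignment sequence".split(),
    "quantum field operator spectrum operator".split(),
    "spectrum quantum lattice field".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, _ in counts.most_common(3)])
```

On a real corpus, the most frequent words per topic would be inspected to decide which topics correspond to bioinformatics. A library implementation would normally be preferred over this sketch for speed.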
Segmentation is performed manually by physicians, which takes considerable time and may be subject to inter-observer variability. Automating this task can increase efficiency and consistency. Existing studies on meningioma segmentation used data from a limited number of study centers, indicating the need for research on multi-center data to assess generalizability. In this work, two semi-automated methods with bounding box priors, LiteMedSAM and BBU-Net, are evaluated on the Brain Tumor Segmentation (BraTS) 2023 meningioma dataset, collected from five study centers. Preprocessing included exclusion of small tumors, z-score normalization, and extraction of slices that contain tumors, yielding 25,602 2D axial magnetic resonance imaging (MRI) slices. LiteMedSAM is fine-tuned, while BBU-Net is trained from scratch. The models are evaluated using five-fold cross-validation, with data split at the case level. Results show that while U-Net models can achieve performance close to LiteMedSAM, the foundation model performs better overall, scoring above 90% on all evaluation metrics.
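Two of the steps named above, z-score normalization and a case-level five-fold split, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline. The function names, the flat list of intensities, and the case identifiers are hypothetical, and normalizing over all values in a list is a simplification of whatever region the study actually normalized over.

```python
import random
import statistics
from collections import defaultdict


def zscore(values):
    """Rescale intensities to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def case_level_folds(slice_case_ids, k=5, seed=0):
    """Split slice indices into k folds so that all slices of a case share a fold.

    slice_case_ids[i] is the (hypothetical) case identifier of slice i.
    Splitting by case avoids leaking slices of one patient across train and test.
    """
    cases = sorted(set(slice_case_ids))
    random.Random(seed).shuffle(cases)
    fold_of_case = {case: i % k for i, case in enumerate(cases)}
    folds = defaultdict(list)
    for idx, case in enumerate(slice_case_ids):
        folds[fold_of_case[case]].append(idx)
    return [folds[i] for i in range(k)]


# Ten slices drawn from six invented cases.
case_ids = ["c1", "c1", "c2", "c3", "c3", "c3", "c4", "c5", "c6", "c6"]
for fold_index, test_indices in enumerate(case_level_folds(case_ids)):
    print(fold_index, test_indices)
```

Each fold serves once as the test set while the remaining four form the training set.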
In educational institutions, educators are responsible for assessing students' grasp of knowledge through examinations. Creating exam questions, even low-level factoid questions, is time-consuming, especially for inexperienced educators. Therefore, this study aims to create a sequence-to-sequence model using CopyNet, exploiting its copying mechanism to automatically generate Bahasa Indonesia factoid questions and ease the educator's burden. Indonesian records in the TyDi QA dataset are used as the model input. GRU and Bi-GRU are employed as the CopyNet encoder, while LSTM is used as the CopyNet decoder. The model with the GRU encoder achieves BLEU1, BLEU2, BLEU3, BLEU4, and ROUGE-L scores of 0.28, 0.19, 0.14, 0.10, and 0.32, respectively. The model with the Bi-GRU encoder achieves BLEU1, BLEU2, BLEU3, BLEU4, and ROUGE-L scores of 0.26, 0.17, 0.12, 0.09, and 0.30, respectively. Both encoders yield low scores, although the BLEU scores are on par with previous work. Further examination found that the generated questions are often semantically and syntactically incorrect. Adding more records to the dataset and using a more advanced architecture such as CopyBERT are encouraged to improve model performance in future work. Despite these results, this study shows that CopyNet, primarily designed for text summarization and single-turn dialogue, can be adapted for factoid question generation.
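For readers unfamiliar with the BLEU1 to BLEU4 figures reported above, here is a minimal sketch of sentence-level BLEU against a single reference, with clipped n-gram precision and a brevity penalty. It assumes the reported scores are cumulative BLEU-n, which the abstract does not state. It also omits the smoothing and corpus-level aggregation that evaluation toolkits typically apply. The example sentences are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n):
    """Cumulative BLEU-max_n with uniform weights and a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_mean)


# Invented generated question and reference question.
generated = "siapa presiden pertama indonesia".split()
reference = "siapa presiden pertama republik indonesia".split()
for n in (1, 2):
    print(f"BLEU{n} = {bleu(generated, reference, n):.2f}")
```

Because higher-order n-grams match less often, BLEU4 is never larger than BLEU1 under this definition, which is consistent with the decreasing pattern in the reported scores.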
An accurate predictive model of temperature and humidity plays a vital role in many industrial processes that use closed spaces, such as those in agriculture and building management. With the exceptional performance of ...
Depressive disorder (DD) is one of the most prevalent mental disorders in the world and may lead to suicide. To prevent the latter, ubiquitous early detection systems may be effective. Recent studies have sinc...
Following the evolution of technology, researchers have conducted several studies to propose reliable scholarship recommender systems. However, few have explored the data storage systems, which are highly beneficial t...
A vast number of spatiotemporal datasets collected from a wide range of sources has motivated scientists to develop effective approaches to identify interesting patterns hidden in these datasets. In this respect, kern...
In the field of bioinformatics, protein Post-Translational Modification (PTM) site prediction has been widely studied, and Web Information Systems (WIS) have been deployed by researchers for this task. Through a lit...
Box office (BO) income declined by up to 80% in 2020 as the COVID-19 pandemic emerged. To minimize further financial risks, multiplex (multiple cinema complexes) owners need to analyze their potential income for each movie, each week. Therefore, we developed a data mining strategy that allows multiplex owners to analyze and discover insights into how successful produced movies could be. The methodology comprises (1) data loading and exploration, (2) data cleaning, (3) data selection, integration, and transformation using Pentaho, (4) data mining, with results stored in a MySQL database, and (5) pattern evaluation and presentation using Qlik Sense as the Business Intelligence (BI) dashboard. Based on our data mining methodology, we found that drama, comedy, action, and thriller are the favorite genres. We also found that DreamWorks Animation and Pixar Animation Studios are the most popular production houses, although Apatow Productions and Escape Artists have the highest average revenue.
The projected increase in PayLater utilization reaches up to five million people by 2025. To optimize the yearly profit from their PayLater service, fintech companies must examine all possible risks before a unanimous decision is taken. Therefore, we propose a unified decision framework derived from decision theory and the Monte Carlo simulation technique. The framework consists of two schemes: (1) a decision-making scheme and (2) a risk simulation scheme. In our experiments, the framework was able to estimate several alternative decisions and their impacts, analyze the causes of failure and delays in the development of the PayLater service, and run Monte Carlo simulations with up to 10,000 trials. The outputs of this study will benefit decision-makers in fintech initiatives before they launch their PayLater products.
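The combination of decision theory and Monte Carlo simulation can be illustrated with a small sketch: each alternative decision is simulated over many trials, then compared by expected profit and probability of loss. The profit model, parameter names, and numbers below are hypothetical and are not taken from the study. Only the trial count of 10,000 comes from the abstract.

```python
import random
import statistics


def simulate_profit(rng, users, revenue_per_user, default_rate_range,
                    loss_per_default, fixed_cost):
    """One trial of an assumed yearly profit model with an uncertain default rate."""
    default_rate = rng.uniform(*default_rate_range)
    revenue = users * revenue_per_user
    credit_losses = users * default_rate * loss_per_default
    return revenue - credit_losses - fixed_cost


def evaluate(alternatives, trials=10_000, seed=0):
    """Estimate expected profit and loss probability for each alternative."""
    rng = random.Random(seed)
    results = {}
    for name, params in alternatives.items():
        profits = [simulate_profit(rng, **params) for _ in range(trials)]
        results[name] = {
            "expected_profit": statistics.fmean(profits),
            "loss_probability": sum(p < 0 for p in profits) / trials,
        }
    return results


# Two invented alternatives: a cautious launch and an aggressive launch.
alternatives = {
    "cautious": dict(users=1_000, revenue_per_user=10.0,
                     default_rate_range=(0.02, 0.10),
                     loss_per_default=50.0, fixed_cost=2_000.0),
    "aggressive": dict(users=5_000, revenue_per_user=10.0,
                       default_rate_range=(0.05, 0.30),
                       loss_per_default=50.0, fixed_cost=8_000.0),
}
results = evaluate(alternatives)
best = max(results, key=lambda name: results[name]["expected_profit"])
for name, summary in results.items():
    print(name, summary)
print("highest expected profit:", best)
```

Choosing by expected profit alone is one possible decision rule. A risk-averse decision-maker might instead cap the acceptable loss probability before comparing alternatives.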