Search Results - Inner Mongolia University Library

Fast and Effective Clustering Method for Ancestry Estimation

Procedia Computer Science, 2019, vol. 157, pp. 306-312

Authors: Arif Budiarto, Bharuno Mahesworo, James Baurley, Teddy Suparyanto, Bens Pardamean

Affiliations: Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480; Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480; BioRealm LLC, USA; Computer Science Department, BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia

Ancestry estimation, which provides family history information, is one of the most popular services in direct-to-consumer genomic testing. It is also an important task in association studies, where it aims to reduce confounding by ancestry in the relationship between genotypes and disease risk. Several methods have been developed to generate the best estimated ancestry scores, although some of them still suffer from inefficient computation time. In this paper, a method combining KMeans clustering and PCA is proposed to estimate ancestry from SNP genotyping data. This method was compared with a baseline model, fastSTRUCTURE, in terms of clustering quality and computation time. Public data from the 1000 Genomes Project were used to train and evaluate both the proposed model and the baseline model. The proposed model generates clusters with better accuracy than fastSTRUCTURE (91.02% versus 90.39%). More importantly, it reduces computation time by a factor of about 100 compared with fastSTRUCTURE (from 490 seconds to 4.86 seconds).

Keywords: Ancestry Estimation; Population Stratification; Clustering; Bioinformatics; Genomics
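
The approach named in the abstract above, PCA followed by KMeans on SNP genotypes, can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation: the toy genotype matrix, the helper names `first_pc` and `kmeans_1d`, the use of only one principal component, and the min/max centre initialisation are all assumptions made for the example.

```python
import math

def first_pc(rows, iters=200):
    """Project each row onto the first principal component (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Deterministic start vector (assumption); it must not be orthogonal
    # to the leading component, which holds for the toy data below.
    v = [float(j + 1) for j in range(m)]
    for _ in range(iters):
        # w = X^T (X v), i.e. the (unscaled) covariance matrix applied to v
        proj = [sum(c[j] * v[j] for j in range(m)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(m)) for c in centered]

def kmeans_1d(values, k=2, iters=50):
    """Plain k-means on scalar values (k >= 2), centres spread from min to max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Hypothetical genotypes: 6 individuals x 4 SNPs, coded as 0/1/2 allele counts.
genotypes = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [1, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 1, 0, 1],
    [1, 2, 0, 0],
]
scores = first_pc(genotypes)
labels = kmeans_1d(scores, k=2)
print(labels)
```

The cluster numbering is arbitrary because the sign of a principal component is arbitrary; only the grouping of individuals is meaningful. A real analysis would keep several components and use far more SNPs.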

Author Correction: AIVariant: a deep learning-based somatic variant detector for highly contaminated tumor samples

Experimental & Molecular Medicine, 2025, vol. 57, no. 1, p. 284

Authors: Hyeonseong Jeon, Junhak Ahn, Byunggook Na, Soona Hong, Lee Sael, Sun Kim, Sungroh Yoon, Daehyun Baek

Affiliations: Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea; Genome4me Inc., Seoul, Republic of Korea; School of Biological Sciences, Seoul National University, Seoul, Republic of Korea; Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea; AIGENDRUG Co. Ltd., Seoul, Republic of Korea; Department of Software and Computer Engineering, Ajou University, Suwon, Republic of Korea; Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea; Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea

Correspondence: baek@snu.ac.kr

Data Engineering Pipeline to Analyse Jakarta's Air Quality during COVID-19-Caused Lockdown Periods

IOP Conference Series: Earth and Environmental Science, 2021, vol. 794, no. 1

Authors: Reza Rahutomo, Bens Pardamean

Affiliations: Information System Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480; Bioinformatics & Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, BINUS Graduate Program - Master of Computer Science Program, Bina Nusantara University, Jakarta, Indonesia 11480

Jakarta lifted its lockdown after more than 50 days of large-scale social activity restrictions and began a phased opening toward the new normal. Analysing Jakarta's air quality after the lockdown requires a data engineering pipeline. Using time series data acquired from ***, a time-series database system was developed in Python with its fundamental libraries, namely Pandas, NumPy, and SQLite. After the PM2.5 data were pre-processed into hourly averages and grouped by applicable period (pre-lockdown, lockdown, and phase opening), a pattern of PM2.5 in South Jakarta was revealed using the data visualization library Matplotlib. The peak of PM2.5 occurs earlier during lockdown (04:00) and phase opening (02:00) than under normal, pre-lockdown conditions (08:00), although the nadir of PM2.5 still occurs at the same time (16:00 to 17:00).
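
The pre-processing step described above, averaging PM2.5 per hour and grouping by period, can be sketched with Python's built-in `sqlite3` module. The table name `pm25`, its columns, the dates, and the readings are invented for illustration; the paper's actual schema and data source are not given in this record, and the Pandas, NumPy, and Matplotlib parts of the original pipeline are left out.

```python
import sqlite3

# Hypothetical readings: (period, timestamp, PM2.5 concentration).
readings = [
    ("pre-lockdown", "2020-03-01 08:10:00", 60.0),
    ("pre-lockdown", "2020-03-01 08:40:00", 70.0),
    ("pre-lockdown", "2020-03-01 16:05:00", 20.0),
    ("lockdown", "2020-04-20 04:15:00", 55.0),
    ("lockdown", "2020-04-20 04:45:00", 45.0),
    ("lockdown", "2020-04-20 16:30:00", 18.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pm25 (period TEXT, ts TEXT, value REAL)")
conn.executemany("INSERT INTO pm25 VALUES (?, ?, ?)", readings)

# Average PM2.5 for every (period, hour of day) pair.
rows = conn.execute(
    """
    SELECT period,
           CAST(strftime('%H', ts) AS INTEGER) AS hour_of_day,
           AVG(value)
    FROM pm25
    GROUP BY period, hour_of_day
    ORDER BY period, hour_of_day
    """
).fetchall()
conn.close()

# Reshape into {period: {hour: average}}.
hourly = {}
for period, hour, avg in rows:
    hourly.setdefault(period, {})[hour] = avg

# Hour of day with the highest average PM2.5 in each period.
peak_hour = {period: max(hours, key=hours.get) for period, hours in hourly.items()}
print(peak_hour)
```

Doing the aggregation in SQL keeps the raw readings in the database and returns only the small per-hour table, which is then easy to hand to a plotting library.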

Ramadhan short-term electric load: A hybrid model of cycle spinning wavelet and group method data handling (CSW-GMDH)

IAENG International Journal of Computer Science, 2019, vol. 46, no. 4, p. 670

Authors: Caraka, Rezzy Eko; Chen, Rung Ching; Toharudin, Toni; Pardamean, Bens; Bakar, Sakhinah Abu; Yasin, Hasbi

Affiliations: College of Informatics, Chaoyang University of Technology, Taichung City 41349, Taiwan; Department of Statistics, Padjadjaran University, Bandung, Indonesia; College of Informatics, Chaoyang University of Technology, Taichung City 41349, Taiwan; Bioinformatics Data Science Research Center, Bina Nusantara University, Indonesia; BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Indonesia; School of Mathematical Sciences, FST, The National University of Malaysia, Malaysia; Department of Statistics, Diponegoro University, Semarang, Indonesia

In general, nonlinear time series analysis can make data modeling more robust and improve the quality of the results. Wavelet methods have been applied successfully in a wide variety of modeling and forecasting applications. The wavelet transform falls into two categories: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). Cycle spinning, unlike the DWT, is highly redundant, non-orthogonal, and defined naturally for all sample sizes. The Group Method of Data Handling (GMDH) algorithm is a multivariate analysis method that can be used to model and identify uncertainty in both linear and nonlinear systems. In this paper, we explain the combination of à trous wavelet transforms applied with cycle spinning and GMDH on short-term electric load data for the holy month of Ramadhan from 2014 to 2015. © 2019, International Association of Engineers.

Keywords: Discrete wavelet transforms

Minimization-Aware Recursive K∗ (MARK∗): A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape

23rd International Conference on Research in Computational Molecular Biology, RECOMB 2019

Authors: Jou, Jonathan D.; Holt, Graham T.; Lowegard, Anna U.; Donald, Bruce R.

Affiliations: Department of Computer Science, Duke University, Durham, NC, United States; Computational Biology and Bioinformatics Program, Duke University, Durham, NC, United States; Department of Biochemistry, Duke University Medical Center, Durham, NC, United States; Department of Chemistry, Duke University, Durham, NC, United States

ISBN (digital): 9783030170837

ISBN (print): 9783030170820

Protein design algorithms that model continuous sidechain flexibility and conformational ensembles better approximate the in vitro and in vivo behavior of proteins. The previous state of the art, iMinDEE-A∗-K∗, computes provable ε-approximations to partition functions of protein states (e.g., bound vs. unbound) by computing provable, admissible pairwise-minimized energy lower bounds on protein conformations and using the A∗ enumeration algorithm to return a gap-free list of lowest-energy conformations. iMinDEE-A∗-K∗ runs in time sublinear in the number of conformations, but can be trapped in loosely-bounded, low-energy conformational wells containing many conformations with highly similar energies. That is, iMinDEE-A∗-K∗ is unable to exploit the correlation between protein conformation and energy: similar conformations often have similar energy. We introduce two new concepts that exploit this correlation: Minimization-Aware Enumeration and Recursive K∗. We combine these two insights into a novel algorithm, Minimization-Aware Recursive K∗ (MARK∗), that tightens bounds not on single conformations, but instead on distinct regions of the conformation space. We compare the performance of iMinDEE-A∗-K∗ vs. MARK∗ by running the BBK∗ algorithm, which provably returns sequences in order of decreasing K∗ score, using either iMinDEE-A∗-K∗ or MARK∗ to approximate partition functions. We show on 200 design problems that MARK∗ not only enumerates and minimizes vastly fewer conformations than the previous state of the art, but also runs up to two orders of magnitude faster. Finally, we show that MARK∗ not only efficiently approximates the partition function, but also provably approximates the energy landscape. To our knowledge, MARK∗ is the first algorithm to do so. We use MARK∗ to analyze the change in energy landscape of the bound and unbound states of the HIV-1 capsid protein C-terminal domain in complex with camelid VHH, and measure the change in conformati

Keywords: Proteins

An Evaluation of Deep Neural Network Performance on Limited Protein Phosphorylation Site Prediction Data

Procedia Computer Science, 2019, vol. 157, pp. 25-30

Authors: Favorisen Rosyking Lumbanraja, Bharuno Mahesworo, Tjeng Wawan Cenggoro, Arif Budiarto, Bens Pardamean

Affiliations: Department of Computer Science, Faculty of Mathematics and Natural Science, University of Lampung, Jalan Prof. Dr. Sumantri Brojonegoro No.17, 35145 Bandar Lampung, Indonesia; Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, BINUS Graduate Program - Master of Computer Science Program, Bina Nusantara University, Jakarta, Indonesia 11480

One of the most common and important types of post-translational modification (PTM) is phosphorylation. Protein phosphorylation regulates the activation of various enzymes and receptors, including those in signal pathways. Many significant studies have been conducted to predict phosphorylation sites using various machine learning methods. Recently, several researchers have claimed that deep learning based methods are the best methods for phosphorylation site prediction. However, the performance of these methods was backed by the massive training data used in those studies. In this paper, we study the performance of a simple deep neural network on the limited data that was generally used before deep learning was employed. The results show that a deep neural network can still achieve comparable performance in limited-data settings.

Keywords: Phosphorylation Site Prediction; Protein Phosphorylation; Deep Learning; Deep Neural Network

COVID-19 Testing Pipeline: Lesson Learned

IOP Conference Series: Earth and Environmental Science, 2021, vol. 794, no. 1

Authors: Dian Amirulloh, Digdo Sudigyo, Arif Budiarto, Ika Nurlaila, Erlin Listiyaningsih, Andrew Simon, Bens Pardamean

Affiliations: Genetics Indonesia, Jakarta, Indonesia 12940; Bioinformatics & Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480; Information System Department, BINUS Online Learning, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, BINUS Graduate Program - Master of Computer Science Program, Bina Nusantara University, Jakarta, Indonesia 11480

The transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Indonesia appears to be increasing uncontrollably, which urges the government to raise its capacity for disease detection. Real-time polymerase chain reaction (RT-PCR), rapid tests, and computed tomography (CT) scans are the most common methods for determining whether a person has been infected, regardless of whether the common symptoms of Coronavirus Disease 2019 (COVID-19) appear. Among these three, RT-PCR is considered the gold standard for qualitative and quantitative assessment of SARS-CoV-2 detection. The present paper elaborates the framework of Roche's RT-PCR machine as employed specifically for SARS-CoV-2 detection by Genetics Indonesia, which is deemed efficient and relatively quicker than other detection kits. The RT-PCR machine detected SARS-CoV-2 with an RNA amplification curve equal to 10 RNA copies, below the cut-off value of the crossing point (Cp) of the positive control. The paper also describes the implementation of EAV RNA and the LightCycler® 96 RT-PCR System, through which analysis time, the amount of sample required per individual, and the reagents can be reduced accordingly.

An Open-Source Knowledge Graph Ecosystem for the Life Sciences

arXiv, 2023

Authors: Callahan, Tiffany J.; Tripodi, Ignacio J.; Stefanski, Adrianne L.; Cappelletti, Luca; Taneja, Sanya B.; Wyrwa, Jordan M.; Casiraghi, Elena; Matentzoglu, Nicolas A.; Reese, Justin; Silverstein, Jonathan C.; Hoyt, Charles Tapley; Boyce, Richard D.; Malec, Scott A.; Unni, Deepak R.; Joachimiak, Marcin P.; Robinson, Peter N.; Mungall, Christopher J.; Cavalleri, Emanuele; Fontana, Tommaso; Valentini, Giorgio; Mesiti, Marco; Gillenwater, Lucas A.; Santangelo, Brook; Vasilevsky, Nicole A.; Hoehndorf, Robert; Bennett, Tellen D.; Ryan, Patrick B.; Hripcsak, George; Kahn, Michael G.; Bada, Michael; Baumgartner, William A.; Hunter, Lawrence E.

Affiliations: Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States; Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, United States; Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO 80301, United States; AnacletoLab, Computer Science Department, University of Milan, 20122, Italy; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, United States; Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States; Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States; Semanticly Ltd, Athens, Greece; Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15206, United States; Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, United States; Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM 87131, United States; SIB Swiss Institute of Bioinformatics, Basel, Switzerland; Berlin Institute of Health at Charité-Universitätsmedizin, Berlin 10117, Germany; ELLIS European Laboratory for Learning and Intelligent Systems, Germany; Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, United States; Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ 85718, United States; Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO 80045, United States; Janssen Research and Development, Raritan, NJ 08869, United States; Division of General Internal Medicine, University of Colorado School of Medicine, A

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability. © 2023, CC BY.

Keywords: Ecosystems

Anomaly Detection in Climate Data Using Stacked and Densely Connected Long Short-Term Memory Model

Journal of Computers (Taiwan), 2020, vol. 31, no. 4, pp. 42-53

Authors: Saleh Abbas, Bahtiar; Wahyono, Teguh; Heryadi, Yaya; Soeparno, Haryono

Affiliations: Doctor of Computer Science, Binus Graduate Program, Bina Nusantara University, Jakarta, Indonesia; Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Indonesia; Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia; Department of Industrial Engineering, Bina Nusantara University, Jakarta, Indonesia

Climate anomalies are considered an important factor closely related to many disasters that cause heavy human losses, such as airline crashes, wildfires, droughts, and flooding in many areas. Many researchers have projected that rising global temperature will increase drought, especially in mid-latitude areas. Taking those problems into account, studies on anomaly detection in climate are crucial. While climate prediction aims to analyze and model the regular pattern of climate, climate anomaly studies aim to model climate deviation from its previous general patterns. Long Short-Term Memory (LSTM) is employed in this research because it has been shown to work effectively in several anomaly detection studies, especially for data with similar characteristics. This paper presents empirical results of using Basic LSTM, Densely Connected (DC) LSTM, and Stacked-DC LSTM models to detect temperature anomalies as a manifestation of climate change in Semarang City. The results show that Stacked-DC LSTM produced higher accuracy in detecting anomalies than the other two methods. © 2020 Computer Society of the Republic of China. All rights reserved.

Keywords: Long short-term memory

Data-Driven Two-Stage Framework for Identification and Characterization of Different Antibiotic-Resistant Escherichia coli Isolates Based on Mass Spectrometry Data

Microbiology Spectrum, 2023, vol. 11, no. 3, e0347922

Authors: Chia-Ru Chung, Hsin-Yao Wang, Chun-Han Yao, Li-Ching Wu, Jang-Jih Lu, Jorng-Tzong Horng, Tzong-Yi Lee

Affiliations: Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan; Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan; Ph.D. Program in Biomedical Engineering, Chang Gung University, Taoyuan, Taiwan; Department of Biomedical Sciences and Engineering, National Central University, Taoyuan, Taiwan; College of Medicine, Chang Gung University, Taoyuan, Taiwan; Department of Medical Biotechnology and Laboratory Science, Chang Gung University, Taoyuan, Taiwan; Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan; Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan

In clinical microbiology, matrix-assisted laser desorption ionization-time-of-flight mass spectrometry (MALDI-TOF MS) is frequently employed for rapid microbial identification. However, rapid identification of antimicrobial resistance (AMR) in Escherichia coli based on a large amount of MALDI-TOF MS data has not yet been reported. This may be because building a prediction model to cover all E. coli isolates would be challenging given the high diversity of the E. coli population. This study aimed to develop a MALDI-TOF MS-based, data-driven, two-stage framework for characterizing different AMRs in E. coli. Specifically, amoxicillin (AMC), ceftazidime (CAZ), ciprofloxacin (CIP), ceftriaxone (CRO), and cefuroxime (CXM) were used. In the first stage, we split the data into two groups based on informative peaks according to the importance of the random forest. In the second stage, prediction models were constructed using four different machine learning algorithms: logistic regression, support vector machine, random forest, and extreme gradient boosting (XGBoost). The findings demonstrate that XGBoost outperformed the other three machine learning models. The values of the area under the receiver operating characteristic curve were 0.62, 0.72, 0.87, 0.72, and 0.72 for AMC, CAZ, CIP, CRO, and CXM, respectively. This implies that a data-driven, two-stage framework could improve accuracy by approximately 2.8%. As a result, we developed AMR prediction models for E. coli using a data-driven two-stage framework, which is promising for assisting physicians in making decisions. Further, the analysis of informative peaks in future studies could potentially reveal new insights. Based on a large amount of matrix-assisted laser desorption ionization-time-of-flight mass spectrometry (MALDI-TOF MS) clinical data, comprising 37,918 Escherichia coli isolates, a data-driven two-stage framework was established to evaluate the antimicrobial resistance of E. coli. Five antibiotics, including a

Keywords: MALDI-TOF MS; antimicrobial resistance; cephalosporin; cephalosporins; fluoroquinolones; machine learning; matrix-assisted laser desorption ionization-time of flight mass spectrometry; penicillin; penicillins
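
The record above reports model quality as the area under the receiver operating characteristic curve (AUC). As a reminder of what that number measures, the sketch below computes AUC through its rank interpretation: the probability that a randomly chosen positive case (here, a resistant isolate) receives a higher score than a randomly chosen negative one. The function name `roc_auc`, the labels, and the scores are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predictions: 1 = resistant, 0 = susceptible.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of isolates, which is fine for a toy example; for tens of thousands of isolates a sort-based rank formula or a library routine would be the practical choice.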
