Search Results - Inner Mongolia University Library

International Conference on Cybernetics and Intelligent System (ICORIS)

Authors: Joko Pebrianto Trinugroho; Anzaludin Samsinga Perbangsa; Bens Pardamean

Affiliations: Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia; Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia; Computer Science Department, BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia

ISBN: (print) 9781665453967

In this digital era, we are exposed to a large amount of data. This includes biological data, which stores information about living organisms, including deoxyribonucleic acid (DNA), genes, and proteins. With the development of information technology and information systems, most available biological data are stored in online public databases. Many of these databases are free to access and easy to use, which helps users, especially researchers, make use of the data. Among the known public biological databases are the University of California Santa Cruz (UCSC) Genome Browser Database and the Rat Genome Database (RGD). These two databases provide access to biological data from different organisms. This paper aims to describe the technology of public biological databases. It also describes the differences in features between the UCSC Genome Browser Database and the RGD. Our results showed that the UCSC contains much more biological data and features than the RGD. However, the genome browser of UCSC has a complex display, while the RGD has a simple display. Overall, the two databases give users the option to choose the one most suitable for them.

Keywords: Proteins; Databases; Genomics; DNA; Rats; Biology; Organisms

consexpressionR: an R package for consensus differential gene expression analysis

arXiv, 2025

Authors: Costa-Silva, Juliana; Menotti, David; Lopes, Fabricio M.

Affiliations: Department of Informatics, Federal University of Paraná, Rua Coronel Francisco Heráclito dos Santos 100, Paraná 81531-990, Brazil; Department of Computer Science, Bioinformatics Graduate Program, Federal University of Technology - Paraná, Av. Alberto Carazzai 1640 - Cornélio Procópio, Paraná, Postal code: 86300-000, Brazil

Motivation: Bulk RNA-Seq is a widely used method for studying gene expression across a variety of contexts. The significance of RNA-Seq studies has grown with the advent of high-throughput sequencing technologies. Computational methods have been developed for each stage of the identification of differentially expressed genes. Nevertheless, few studies explore the association between different types of methods. In this study, we evaluated the impact of combining methodologies on the results of differential expression analysis. Using two data sets with qPCR data (as a gold-standard reference), seven methods implemented in R packages (EBSeq, edgeR, DESeq2, limma, SAMseq, NOISeq, and KnowSeq) were run and assessed both separately and in association. The results were evaluated against the adopted qPCR data. Results: Here, we introduce consexpressionR, an R package that automates differential expression analysis using a consensus of at least seven methodologies, producing more assertive results with a significant reduction in false positives. Availability: consexpressionR is an R package; source code and support are available on GitHub (https://***/costasilvati/consexpressionR). © 2025, CC BY-NC-ND.
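
The consensus idea in this abstract can be illustrated with a simple vote count. The sketch below is written in Python for illustration only and is not the consexpressionR API; the function name `consensus_genes`, the `min_votes` parameter, and the gene IDs are all hypothetical. It keeps a gene only if enough methods flagged it as differentially expressed.

```python
from collections import Counter


def consensus_genes(calls, min_votes):
    """Return genes flagged as differentially expressed by at least min_votes methods.

    calls maps a method name to the set of gene IDs that method flagged.
    """
    votes = Counter(gene for genes in calls.values() for gene in set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_votes)


# Hypothetical calls from three methods (gene IDs are made up).
calls = {
    "edgeR": {"geneA", "geneB", "geneC"},
    "DESeq2": {"geneA", "geneB"},
    "limma": {"geneA", "geneD"},
}
strict = consensus_genes(calls, min_votes=3)   # genes every method agrees on
lenient = consensus_genes(calls, min_votes=2)  # genes at least two methods agree on
```

Raising `min_votes` trades sensitivity for fewer false positives, which is the trade-off the abstract describes.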

Incorporating Prior Knowledge into Neural Networks through an Implicit Composite Kernel

arXiv, 2022

Authors: Jiang, Ziyang; Zheng, Tongshu; Liu, Yiling; Carlson, David

Affiliations: Department of Civil and Environmental Engineering, Duke University, United States; Division of Natural and Applied Science, Duke Kunshan University, China; Program in Computational Biology and Bioinformatics, Duke University School of Medicine, United States; Department of Civil and Environmental Engineering, Department of Biostatistics and Bioinformatics, Department of Computer Science, Duke University, United States

It is challenging to guide neural network (NN) learning with prior knowledge. In contrast, many known properties, such as spatial smoothness or seasonality, are straightforward to model by choosing an appropriate kernel in a Gaussian process (GP). Many deep learning applications could be enhanced by modeling such known properties. For example, convolutional neural networks (CNNs) are frequently used in remote sensing, which is subject to strong seasonal effects. We propose to blend the strengths of NNs and the clear modeling capabilities of GPs by using a composite kernel that combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties (e.g., seasonality). We implement this idea by combining a deep network and an efficient mapping function based on either Nyström approximation or random Fourier features, which we call Implicit Composite Kernel (ICK). We then adopt a sample-then-optimize approach to approximate the full GP posterior distribution. We demonstrate that ICK has superior performance and flexibility on both synthetic and real-world datasets including a remote sensing dataset. The ICK framework can be used to include prior information into neural networks in many applications. © 2022, CC BY.
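
One of the two mapping functions named in this abstract, random Fourier features, can be sketched in a few lines. This is a minimal illustration, assuming a one-dimensional input and an RBF kernel; the helper names `make_rff`, `dot`, and `rbf` are hypothetical, and the abstract does not specify how the resulting features are combined with the network output, so that step is left out.

```python
import math
import random


def make_rff(num_features, lengthscale, seed=0):
    """Build a random Fourier feature map z(t) whose inner products
    approximate the RBF kernel exp(-(t - t')**2 / (2 * lengthscale**2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density (a Gaussian).
    freqs = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(t):
        return [scale * math.cos(w * t + b) for w, b in zip(freqs, phases)]

    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rbf(t1, t2, lengthscale):
    return math.exp(-((t1 - t2) ** 2) / (2.0 * lengthscale**2))


z = make_rff(num_features=4000, lengthscale=1.0)
approx = dot(z(0.3), z(1.1))            # Monte Carlo estimate of the kernel
exact = rbf(0.3, 1.1, lengthscale=1.0)  # closed-form kernel value
```

The estimate tightens as `num_features` grows; a periodic kernel for seasonality would need a different feature map than the RBF one shown here.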

Keywords: Remote sensing

Bayesian multinomial logistic normal models through marginally latent matrix-T processes

The Journal of Machine Learning Research, 2022, vol. 23, no. 1, pp. 255-296

Authors: Justin D. Silverman; Kimberly Roche; Zachary C. Holmes; Lawrence A. David; Sayan Mukherjee

Affiliations: College of Information Science and Technology, Department of Statistics, and Institute for Computational and Data Science, Penn State University, University Park, PA; Program in Computational Biology and Bioinformatics, Duke University, Durham, NC; Department of Molecular Genetics and Microbiology, Duke University, Durham, NC; Department of Molecular Genetics and Microbiology and Center for Genomic and Computational Biology, Duke University, Durham, NC; Departments of Statistical Science, Mathematics, Computer Science, Biostatistics & Bioinformatics, Duke University, Durham, NC

Bayesian multinomial logistic-normal (MLN) models are popular for the analysis of sequence count data (e.g., microbiome or gene expression data) due to their ability to model multivariate count data with complex covariance structure. However, existing implementations of MLN models are limited to small datasets due to the non-conjugacy of the multinomial and logistic-normal distributions. Motivated by the need to develop efficient inference for Bayesian MLN models, we develop two key ideas. First, we develop the class of Marginally Latent Matrix-T Process (Marginally LTP) models. We demonstrate that many popular MLN models, including those with latent linear, non-linear, and dynamic linear structure, are special cases of this class. Second, we develop an efficient inference scheme for Marginally LTP models with specific accelerations for the MLN subclass. Through application to MLN models, we demonstrate that our inference scheme is both highly accurate and often 4-5 orders of magnitude faster than MCMC.
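
To make the multinomial logistic-normal structure concrete, the following sketch simulates one count vector from a deliberately simplified MLN model. It assumes independent latent normals and a plain softmax link, whereas the models in the paper allow a full covariance structure; the function names and parameter values are hypothetical, and no inference is performed.

```python
import math
import random


def softmax(eta):
    """Map real-valued latent scores to probabilities on the simplex."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


def sample_mln(mean, sd, depth, rng):
    """Draw one count vector: latent normal -> softmax -> multinomial."""
    eta = [rng.gauss(mu, sd) for mu in mean]  # independent latents (simplification)
    probs = softmax(eta)
    draws = rng.choices(range(len(probs)), weights=probs, k=depth)
    counts = [0] * len(probs)
    for category in draws:
        counts[category] += 1
    return probs, counts


rng = random.Random(1)
probs, counts = sample_mln(mean=[0.0, 1.0, -1.0, 0.5], sd=0.5, depth=1000, rng=rng)
```

The non-conjugacy mentioned in the abstract arises because the normal prior on the latent scores and the multinomial likelihood on the counts do not combine into a closed-form posterior.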

Keywords: Bayesian statistics; multivariate analysis; count data; microbiome; gene expression

Ciwars: A Web Server for Antibiotic Resistance Surveillance Using Longitudinal Metagenomic Data

SSRN, 2024

Authors: Emon, Muhit Islam; Cheung, Yat Fei; Stoll, James; Rumi, Monjura Afrin; Brown, Connor; Choi, Joung Min; Moumi, Nazifa Ahmed; Ahmed, Shafayat; Song, Haoqiu; Sein, Justin; Yao, Shunyu; Khan, Ahmad; Gupta, Suraj; Kulkarni, Rutwik; Butt, Ali; Vikesland, Peter; Pruden, Amy; Zhang, Liqing

Affiliations: Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, United States; Fralin Life Science Institute, Virginia Tech, Blacksburg, VA 24060, United States; Department of Civil and Environmental Engineering, Virginia Tech, Blacksburg, VA 24060, United States; The Interdisciplinary PhD Program in Genetics, Bioinformatics and Computational Biology, Virginia Tech, Blacksburg, VA 24060, United States

The rise of antibiotic resistance (AR) poses substantial threats to human and animal health, food security, and economic stability. Wastewater-based surveillance (WBS) has emerged as a powerful strategy for population-level AR monitoring, providing valuable data to guide public health and policy decisions. Metagenomic sequencing is especially promising, as it can yield comprehensive profiles of antibiotic resistance genes (ARGs) and other genes relevant to AR in a single run. However, online analytical platforms to facilitate continuous AR monitoring through longitudinal metagenomic data are lacking. To address this, we introduce Cyberinfrastructure for Waterborne Antibiotic Resistance Surveillance (CIWARS), a web server configured for characterizing key AR trends from longitudinal metagenomic WBS data. CIWARS offers comprehensive ARG and taxonomic profiling, along with detection of anomalous AR indicators over time, aiding in identifying potential events of concern, such as the emergence of resistant strains or outbreaks. Through interactive temporal data visualization, CIWARS enables AR monitoring and demonstrates its potential for informing effective and timely interventions to mitigate the spread and transmission of AR. CIWARS is broadly applicable to longitudinal metagenomic data from any environment and thus could be a valuable tool to support global efforts to combat the evolution and spread of AR, while also guiding agricultural and public health efforts aimed at optimizing antibiotic use when it is needed. The web server is freely available at https://***/. © 2024, The Authors. All rights reserved.

Keywords: Gene encoding

Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation

arXiv, 2025

Authors: Tang, Sophia; Zhang, Yinuo; Tong, Alexander; Chatterjee, Pranam

Affiliations: Department of Biomedical Engineering, Duke University, United States; Management and Technology Program, University of Pennsylvania, United States; Center of Computational Biology, Duke-NUS Medical School, Singapore; Mila, Quebec AI Institute, Canada; Université de Montréal, Canada; Department of Computer Science, Duke University, United States; Department of Biostatistics and Bioinformatics, Duke University, United States

Flow matching in the continuous simplex has emerged as a promising strategy for DNA sequence design, but struggles to scale to higher simplex dimensions required for peptide and protein generation. We introduce Gumbel-Softmax Flow and Score Matching, a generative framework on the simplex based on a novel Gumbel-Softmax interpolant with a time-dependent temperature. Using this interpolant, we introduce Gumbel-Softmax Flow Matching by deriving a parameterized velocity field that transports from smooth categorical distributions to distributions concentrated at a single vertex of the simplex. We alternatively present Gumbel-Softmax Score Matching which learns to regress the gradient of the probability density. Our framework enables high-quality, diverse generation and scales efficiently to higher-dimensional simplices. To enable training-free guidance, we propose Straight-Through Guided Flows (STGFlow), a classifier-based guidance method that leverages straight-through estimators to steer the unconditional velocity field toward optimal vertices of the simplex. STGFlow enables efficient inference-time guidance using classifiers pre-trained on clean sequences, and can be used with any discrete flow method. Together, these components form a robust framework for controllable de novo sequence generation. We demonstrate state-of-the-art performance in conditional DNA promoter design, sequence-only protein generation, and target-binding peptide design for rare disease treatment. © 2025, CC BY-NC-ND.
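
The basic Gumbel-Softmax relaxation behind this abstract can be sketched as follows. This is a generic illustration rather than the paper's interpolant: the exponential schedule in `temperature_at` and its parameters are assumptions, and the function names are hypothetical. It shows how lowering the temperature moves a sample from a smooth point on the simplex toward a single vertex.

```python
import math
import random


def gumbel_noise(n, rng):
    """Draw n standard Gumbel(0, 1) variates via the inverse CDF."""
    return [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12))) for _ in range(n)]


def gumbel_softmax(logits, noise, temperature):
    """Softmax of noise-perturbed logits, sharpened by a small temperature."""
    scores = [(l + g) / temperature for l, g in zip(logits, noise)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_at(t, tau_max=5.0, decay=4.0):
    """Hypothetical schedule: temperature shrinks as t goes from 0 to 1."""
    return tau_max * math.exp(-decay * t)


rng = random.Random(0)
logits = [0.0, 0.5, -0.5, 1.0]
noise = gumbel_noise(len(logits), rng)

# Same noise, two time points: early is smooth, late is close to a vertex.
early = gumbel_softmax(logits, noise, temperature_at(0.0))
late = gumbel_softmax(logits, noise, temperature_at(1.0))
```

Holding the noise fixed while the temperature decays keeps the winning category unchanged and only sharpens the distribution around it.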

Keywords: Velocity distribution

A Convolutional Neural Network-based Ancient Sundanese Character Classifier with Data Augmentation

Procedia Computer Science, 2021, vol. 179, pp. 195-201

Authors: Alam Ahmad Hidayat; Kartika Purwandari; Tjeng Wawan Cenggoro; Bens Pardamean

Affiliations: Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480

With increasing interest in the digitization of ancient manuscripts, ancient character recognition has become one of the most important areas in automated document image analysis. In this regard, we propose a Convolutional Neural Network (CNN)-based classifier to recognize ancient Sundanese characters obtained from a digital collection of Southeast Asian palm leaf manuscripts. In this work, we utilize two different preprocessing techniques for the dataset. The first technique uses geometric transformations, background noise addition, and brightness adjustment to augment the imbalanced samples fed into the classifier. The second technique uses Otsu's threshold method to binarize the characters and applies only the usual geometric transformations for data augmentation. The proposed network is trained on the training set and tested on the testing set under each data augmentation process. Image binarization from the second technique outperforms the first technique, achieving a testing accuracy of 97.74%.
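
Otsu's threshold method, used in the second preprocessing technique, picks the gray level that maximizes the between-class variance of the two resulting pixel groups. The sketch below is a plain-Python version for 8-bit values on a made-up pixel list; it is not the authors' preprocessing code, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximizes between-class variance,
    where class 0 holds values <= t and class 1 holds values > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of intensities in class 0 so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, threshold):
    """Mark pixels brighter than the threshold as 1 and the rest as 0."""
    return [1 if p > threshold else 0 for p in pixels]


# Toy "image": dark background near 10-12 and bright strokes near 200-202.
pixels = [10, 11, 12] * 20 + [200, 201, 202] * 10
t = otsu_threshold(pixels)
mask = binarize(pixels, t)
```

Whether the character strokes end up as 1 or 0 depends on whether the ink is brighter or darker than the leaf background, so real manuscripts may need the mask inverted.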

Keywords: ancient sundanese characters; convolutional neural network; data augmentation; document image analysis; Otsu's threshold method

Convolutional Neural Networks for Scops Owl Sound Classification

Procedia Computer Science, 2021, vol. 179, pp. 81-87

Authors: Alam Ahmad Hidayat; Tjeng Wawan Cenggoro; Bens Pardamean

Affiliations: Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480

Adopting deep learning models for bird sound classification has become common practice for building robust automated bird sound detection systems. In this paper, we employ a four-layer Convolutional Neural Network (CNN) to classify different species of Indonesian scops owls based on their vocal sounds. Two widely used representations of an acoustic signal, the log-scaled mel-spectrogram and the Mel Frequency Cepstral Coefficient (MFCC), are extracted from each sound file and fed into the network separately to compare model performance with different inputs. A more complex CNN that can process the two extracted acoustic representations simultaneously is proposed to provide a direct comparison with the baseline model. The dual-input network is the best-performing model in our experiment, achieving 97.55% Mean Average Precision (MAP). Meanwhile, the baseline model achieves a MAP score of 94.36% for the mel-spectrogram input and 96.08% for the MFCC input.

Keywords: acoustic features; bird sound classification; convolutional neural network; mean average precision; scops owl

Systematic Evaluation of Cross Population Polygenic Risk Score on Colorectal Cancer

Procedia Computer Science, 2021, vol. 179, pp. 344-351

Authors: Bharuno Mahesworo; Arif Budiarto; Bens Pardamean

Affiliations: Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480; Computer Science Department, BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480

The number of findings in cancer genomics research has grown rapidly in the last decade due to the decline in the cost of human sequencing and genotyping. However, the majority of the reported significant markers associated with cancer traits are based on European and East Asian populations. Large populations such as the South Asian and South-East Asian populations are under-represented in genomics research. In this study, we explored the possibility of computing a Polygenic Risk Score (PRS) for colorectal cancer on our test sample based on reported significant Single Nucleotide Polymorphisms (SNPs). The SNPs used to compute the risk score were collected from GWAS Central and the GWAS Catalog. Significant SNPs from the IC3 study were used as a benchmark. The results show that calculating a colorectal cancer risk score using reported significant markers from a different population group is possible. The p-value of our statistical model shows a significant difference between the risk scores of the case and control groups.
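
A polygenic risk score is commonly computed as a weighted sum of risk-allele counts, with each SNP weighted by its reported effect size. The abstract does not give the exact formula the authors used, so the sketch below shows only this standard form; the SNP names, effect sizes, and genotypes are invented for illustration.

```python
def polygenic_risk_score(effect_sizes, dosages):
    """Weighted sum of risk-allele counts over the SNPs present in both inputs."""
    shared = effect_sizes.keys() & dosages.keys()
    return sum(effect_sizes[snp] * dosages[snp] for snp in shared)


# Hypothetical effect sizes (log odds ratios) and genotypes; not real GWAS data.
effect_sizes = {"snp1": 0.20, "snp2": -0.10, "snp3": 0.05}
case = {"snp1": 2, "snp2": 0, "snp3": 1}     # risk-allele counts: 0, 1, or 2
control = {"snp1": 0, "snp2": 2, "snp3": 1}

case_score = polygenic_risk_score(effect_sizes, case)
control_score = polygenic_risk_score(effect_sizes, control)
```

Restricting the sum to shared SNPs matters in cross-population settings, where markers reported in one population may be missing from the genotyping data of another.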

Keywords: Polygenic Risk Score; Cross Population; Colorectal Cancer

Cell2Sentence: Teaching Large Language Models the Language of Biology

41st International Conference on Machine Learning, ICML 2024

Authors: Levine, Daniel; Rizvi, Syed Asad; Lévy, Sacha; Pallikkavaliyaveetil, Nazreen; Zhang, David; Chen, Xingyu; Ghadermarzi, Sina; Wu, Ruiming; Zheng, Zihe; Vrkic, Ivan; Zhong, Anna; Raskin, Daphne; Han, Insu; de Oliveira Fonseca, Antonio Henrique; Caro, Josue Ortega; Karbasi, Amin; Dhodapkar, Rahul M.; van Dijk, David

Affiliations: Department of Computer Science, Yale University, New Haven, CT, United States; School of Engineering Applied Science, University of Pennsylvania, Philadelphia, PA, United States; School of Computer and Communication Sciences, Swiss Federal Institute of Technology Lausanne, Lausanne, Switzerland; Department of Neuroscience, Yale School of Medicine, New Haven, CT, United States; Wu Tsai Institute, Yale University, New Haven, CT, United States; Google, United States; Yale Institute for Foundations of Data Science, New Haven, CT, United States; Yale School of Engineering and Applied Science, New Haven, CT, United States; Roski Eye Institute, University of Southern California, Los Angeles, CA, United States; Yale School of Medicine, New Haven, CT, United States; Cardiovascular Research Center, Yale School of Medicine, New Haven, CT, United States; Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, United States

We introduce Cell2Sentence (C2S), a novel method to directly adapt large language models to a biological context, specifically single-cell transcriptomics. By transforming gene expression data into "cell sentences," C2S bridges the gap between natural language processing and biology. We demonstrate that cell sentences enable the fine-tuning of language models for diverse tasks in biology, including cell generation, complex cell-type annotation, and direct data-driven text generation. Our experiments reveal that GPT-2, when fine-tuned with C2S, can generate biologically valid cells based on cell type inputs, and accurately predict cell types from cell sentences. This illustrates that language models, through C2S fine-tuning, can acquire a significant understanding of single-cell biology while maintaining robust text generation capabilities. C2S offers a flexible, accessible framework to integrate natural language processing with transcriptomics, utilizing existing models and libraries for a wide range of biological applications. Copyright 2024 by the author(s)
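
The abstract does not spell out how expression data become a "cell sentence". One plausible reading, assumed here, is that gene names are listed in order of decreasing expression. The sketch below implements that assumption; the function name `cell_to_sentence`, the `top_k` parameter, and the expression values are hypothetical and not taken from the C2S code.

```python
def cell_to_sentence(expression, top_k=None):
    """Turn a gene -> expression mapping into a space-separated 'cell sentence'.

    Genes with zero expression are dropped; the rest are ordered from highest
    to lowest expression, with ties broken alphabetically.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    genes = [gene for gene, _ in ranked]
    return " ".join(genes[:top_k])  # top_k=None keeps every expressed gene


# Made-up expression counts for a single cell.
cell = {"CD3E": 12, "MS4A1": 0, "CD8A": 7, "GZMB": 7, "ACTB": 30}
sentence = cell_to_sentence(cell)
short_sentence = cell_to_sentence(cell, top_k=2)
```

A text representation like this can be passed to an ordinary tokenizer, which is what lets an existing language model be fine-tuned on it without architectural changes.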

Keywords: Gene expression
