Author affiliations: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China; Shanghai AI Laboratory, Shanghai, China; Harbin Institute of Technology, China; University of Electronic Science and Technology of China, China; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, United States; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, United States; Broad Institute of MIT and Harvard, Cambridge, MA, United States; Zelixir Biotech, Shanghai, China; The CUHK Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen, China
Publication: arXiv
Year, volume, issue: 2022
Core indexing:
Subject: RNA
Abstract: Non-coding RNA structure and function are essential to understanding various biological processes, such as cell signaling, gene expression, and post-transcriptional regulation. These are among the core problems in the RNA field. With the rapid growth of sequencing technology, a massive amount of unannotated RNA sequences has accumulated. In contrast, costly experimental observation yields only a limited number of annotated sequences and 3D structures. Hence, it remains challenging to design computational methods for predicting their structures and functions, and the lack of annotated data and systematic study leads to inferior performance. To resolve this issue, we propose a novel RNA foundation model (RNA-FM) that takes advantage of all the 23 million non-coding RNA sequences through self-supervised learning. With this approach, we find that the pre-trained RNA-FM can infer sequential and evolutionary information of non-coding RNAs without using any labels. Furthermore, we demonstrate RNA-FM's effectiveness by applying it to downstream secondary/3D structure prediction, SARS-CoV-2 genome structure and evolution prediction, protein-RNA binding preference modeling, and gene expression regulation modeling. Comprehensive experiments show that the proposed method improves RNA structural and functional modeling results significantly and consistently. Despite being trained only on unlabelled data, RNA-FM can serve as a foundation model for the field. © 2022, CC BY-NC-ND.