Author affiliations: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States; Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, United States; Department of Pharmacy Practice and Science, R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, United States; University of California Davis, Davis, CA, United States; Department of Computer Science, University of Maryland at College Park, College Park, MD, United States; Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, United States; Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States; African Society for Bioinformatics and Computational Biology, Cape Town, South Africa; Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, United States; Ocean Genomics Inc., Pittsburgh, PA, United States; Florida Atlantic University Harbor Branch Oceanographic Institute, Fort Pierce, Florida, United States; Lethbridge Research and Development Center, Agriculture and Agri-Food Canada, Lethbridge, Canada; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, United States; Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, United States; Melbourne Bioinformatics, University of Melbourne, Parkville, VIC, Australia; Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, CA, United States; Department of Computer Science, University of Montana, Missoula, MT, United States; Chan Zuckerberg Initiative, Redwood City, CA, United States; Department of Population Health and Reproduction, School of Veterinary Medicine, University of California Davis, CA, United States; Department of Molecular Genetics, University of Toronto, ON, Canada; Department of Computer Science, Stony Brook University, NY, United States; Institute of Medical Microbiology and Virology, University Hospital Leipzig, Leipzig, Germany; SecureBio, Cambridge, MA, United States; Department of Computer Science, Johns Hopkins University, Baltimore, MD, United States; Genedata Inc., Lexington, MA, United States
Publication: arXiv
Year/Volume/Issue: 2025
Subject: Biodiversity
Abstract: The volume of biological data generated by the scientific community is growing exponentially, reflecting technological advances and increased research activity. The Sequence Read Archive (SRA) of the National Institutes of Health (NIH), maintained by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), is a rapidly growing public database that researchers use to drive scientific discovery across all domains of life. This growth in available data holds great promise for scientific discovery but also introduces new challenges that scientific communities need to address. As genomic datasets have grown in scale and diversity, a parade of new methods and associated software has been developed to meet these challenges. These methodological advances are vital for fully leveraging next-generation sequencing (NGS) technologies, but their performance trade-offs (especially between speed and accuracy) can be difficult to assess. To lay a foundation for evaluating methods for petabyte-scale sequence search, the Department of Energy (DOE) Office of Biological and Environmental Research (BER), the NIH Office of Data Science Strategy (ODSS), and NCBI held a virtual codeathon, ‘Petabyte Scale Sequence Search: Metagenomics Benchmarking Codeathon’, from September 27 to October 1, 2021, to evaluate emerging solutions for petabyte-scale sequence search. The codeathon brought together experts from national laboratories such as Lawrence Berkeley National Laboratory (LBNL), research institutions such as the Joint Genome Institute (JGI), and universities around the world to (a) develop benchmarking approaches that address the challenges of large-scale analysis of metagenomic data (which make up approximately 20% of SRA), (b) identify potential applications (i.e., use cases) that benefit from SRA-wide searches, along with the tools required to execute those searches, and (c) produce commun