Background: The emergence of next-generation sequencing (NGS) marked a revolution in biological research, enabling comprehensive characterization of the transcriptome and detailed analysis of the epigenome landscape. This technology has made it possible to detect differences across cell types, genotypes, and conditions. Advances in short-read sequencing platforms have produced user-friendly machines that offer high throughput at a reduced cost per base. However, leveraging this data still requires bioinformatics expertise to develop and execute tailored solutions for each specific application. Democratizing access to sequence analysis tools is crucial to empower researchers from diverse fields to harness the full potential of NGS ***2, our enhanced version of UTAP, first published in 2019 (Kohen et al. in BMC Bioinform 20(1):154, 2019), empowers researchers to unlock the mysteries of gene expression and epigenetic modifications with ease. This user-friendly, open-source pipeline, built by unit programmers and deep sequencing analysts, streamlines transcriptome and epigenome data analysis, handling everything from sequences to gene or peak counts and the annotation of differentially expressed genes or genomic regions. Results are delivered in organized folders and rich reports packed with plots, tables, and links for effortless interpretation. Since its debut, UTAP has been embraced by many researchers at the Weizmann Institute and has received over 100 citations, highlighting its scientific *** User-friendly Transcriptome and Epigenome Analysis Pipeline UTAP2 is available to the broader biomedical research community as an open-source installation. With a single image, it can be installed on both local servers and cloud platforms, allowing users to leverage parallel cluster resources. Once installed, UTAP2 enables researchers, even those with limited bioinformatics skills, to efficiently, accurately and reliably analyse transcriptome and epig
Background: The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and significant computational challenges. As the cost of next-generation sequencing (NGS) has decreased, the amount of genomic data has surged globally. However, the cost and complexity of the computational resources required continue to be substantial barriers to leveraging big data. A promising solution to these computational challenges is cloud computing, which provides researchers with the necessary CPUs, memory, storage, and software ***, we present Closha 2.0, a cloud computing service that offers a user-friendly platform for analyzing massive genomic datasets. Closha 2.0 is designed to provide a cloud-based environment that enables all genomic researchers, including those with limited or no programming experience, to easily analyze their genomic data. Version 2.0 of Closha has more user-friendly features than version 1.0. First, the workbench features a script editor that supports Python, R, and shell script programming, enabling users to write scripts and integrate them into their pipelines. This functionality is particularly useful for downstream analysis. Second, Closha 2.0 runs on containers, which execute each tool in an independent environment. This provides a stable environment and prevents dependency issues and version conflicts among tools. Third, users can execute each step of a pipeline individually, allowing them to test applications at each stage and adjust parameters to achieve the desired results. We also updated a high-speed data transmission tool called GBox that facilitates the rapid transfer of large *** analysis pipelines on Closha 2.0 are reproducible, with all analysis parameters and inputs being permanently recorded. Closha 2.0 simplifies multi-step analysis with drag-and-drop functionality and provides a user-friendly interface for genomic scientists to obtain accur
Background: Metagenomics, the whole genome sequencing of microbial communities, has provided insight into complex ecosystems. It has facilitated the discovery of novel microorganisms, explained community interactions and found applications in various fields. Advances in high-throughput and third-generation sequencing technologies have further fuelled its popularity. Nevertheless, managing the vast data produced and addressing variable dataset quality remain ongoing challenges. Another challenge arises from the number of assembly and binning strategies used across studies. Comparing datasets and analysis tools is complex as it requires the quantitative assessment of metagenome quality. The inherent limitations of metagenomic sequencing, which often involves sequencing complex communities, mean community members are challenging to interrogate with traditional culturing methods, leading to many lacking reference sequences. MIMAG standards aim to provide a method to assess metagenome quality for comparison but have not been widely *** address the need for simple and quick metagenome quality assignation, here we introduce the pipeline MAGqual (Metagenome-Assembled Genome qualifier) and demonstrate its effectiveness at determining metagenomic dataset quality in the context of the MIMAG *** MAGqual pipeline offers an accessible way to evaluate metagenome quality and generate metadata on a large scale. MAGqual is built in Snakemake to ensure readability and scalability, and its open-source nature promotes accessibility, community development, and ease of updates. MAGqual is built in Snakemake, R, and Python and is available under the MIT license on GitHub at https://***/ac1513/***
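To make the idea of MIMAG-based quality assignment concrete, the following is a minimal Python sketch, not MAGqual's actual code. It assumes the commonly cited MIMAG draft thresholds for bacterial and archaeal MAGs (high quality: more than 90% completeness, less than 5% contamination, 5S, 16S and 23S rRNA genes present, and at least 18 tRNAs; medium quality: at least 50% completeness and less than 10% contamination; low quality: below 50% completeness and less than 10% contamination). The `Mag` class, its field names, and the `"unclassified"` label for bins with 10% contamination or more are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    """Per-bin summary statistics (hypothetical structure for illustration)."""
    name: str
    completeness: float    # percent, e.g. from a single-copy marker gene tool
    contamination: float   # percent
    has_rrna_5s: bool = False
    has_rrna_16s: bool = False
    has_rrna_23s: bool = False
    trna_count: int = 0


def mimag_quality(mag: Mag) -> str:
    """Assign a MIMAG-style quality tier to one bin.

    Thresholds follow the commonly cited MIMAG draft categories; they are
    assumptions of this sketch, not values read from MAGqual itself.
    """
    has_all_rrna = mag.has_rrna_5s and mag.has_rrna_16s and mag.has_rrna_23s

    if (mag.completeness > 90 and mag.contamination < 5
            and has_all_rrna and mag.trna_count >= 18):
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"
    if mag.completeness < 50 and mag.contamination < 10:
        return "low"
    # Contamination of 10% or more fits none of the three tiers above.
    return "unclassified"


if __name__ == "__main__":
    bins = [
        Mag("bin_1", 95.0, 2.0, True, True, True, 20),
        Mag("bin_2", 95.0, 2.0),          # complete, but no rRNA/tRNA evidence
        Mag("bin_3", 40.0, 3.0),
        Mag("bin_4", 80.0, 15.0),
    ]
    for b in bins:
        print(b.name, mimag_quality(b))
```

Note that a bin with high completeness and low contamination still drops to the medium tier when the rRNA and tRNA evidence is missing, which is why a quality pipeline has to combine completeness and contamination estimates with gene annotation.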
The Andean black cherry (P. serotina) is an underutilized fruit species that could contribute to the development of sustainable food systems in the Andean region. The species displays gametophytic self-incompatibility (GSI), a mechanism controlled by the multiallelic S-locus which prevents crossbreeding between genetically related individuals and hinders breeding efforts. To design effective crosses, breeders require accurate knowledge of the S-haplotypes of parental lines. However, S-haplotype diversity is commonly evaluated using PCR-based methods that fail to accurately discriminate alleles. To address this limitation, we developed a new method to identify S-alleles in P. serotina using nanopore sequencing technology. Our method uses the Native Barcoding protocol and MinION sequencer from Oxford Nanopore Technologies to enable scalable, multiplex amplicon sequencing. For sequence analysis, we developed a bioinformatic pipeline that uses Porechop for sample demultiplexing, MeshClust for sequence alignment and clustering, and the Ugene Consensus algorithm to determine allelic variants. In this study, we evaluated the S-RNase gene of 24 P. serotina accessions using our nanopore sequencing and bioinformatic workflow. Among these accessions, we identified 12 previously reported and 6 putative new S-alleles that could not be identified with existing S-genotyping methods. Five accessions were classified as homozygous, while the other 19 were heterozygous with two or three alleles. Our results demonstrate that nanopore sequencing provides a cost-effective alternative for S-allele profiling that improves on the accuracy of existing PCR-based methods. Because of the versatility of MinION sequencing, the reported workflow can be used to characterize the diversity of other useful genes in the species, which are of relevance for conservation and breeding efforts.
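The final step of such a workflow, deciding whether an accession is homozygous or heterozygous from its clustered amplicon reads, can be sketched as follows. This is an illustrative Python sketch only, not the authors' pipeline: the function name `call_s_genotype`, the per-cluster read-count input, and the 10% minimum read fraction used to discard low-abundance clusters are all assumptions introduced here.

```python
def call_s_genotype(cluster_read_counts, min_fraction=0.10):
    """Call S-alleles and zygosity for one accession.

    cluster_read_counts: dict mapping a cluster/allele label to the number
        of reads assigned to it after demultiplexing and clustering.
    min_fraction: hypothetical cutoff; clusters supported by a smaller share
        of the reads are treated as noise and ignored.
    Returns (sorted list of retained alleles, zygosity label).
    """
    total = sum(cluster_read_counts.values())
    if total == 0:
        return [], "no call"

    alleles = sorted(
        label
        for label, count in cluster_read_counts.items()
        if count / total >= min_fraction
    )
    if not alleles:
        return [], "no call"

    # One supported allele -> homozygous; two or more -> heterozygous.
    zygosity = "homozygous" if len(alleles) == 1 else "heterozygous"
    return alleles, zygosity


if __name__ == "__main__":
    print(call_s_genotype({"S1": 950, "S7": 30, "noise": 20}))
    print(call_s_genotype({"S1": 500, "S4": 450, "chimera": 50}))
```

The read-fraction cutoff is the main design choice: set too low, sequencing errors and chimeras appear as extra alleles; set too high, a genuinely under-amplified allele is missed and a heterozygote is miscalled as homozygous.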
Tracking SARS-CoV-2 Variants of Concern via Whole Genome Sequencing represents a pillar of public health measures for the containment of the pandemic. The ability to track the lineage distribution on a local and global scale leads to a better understanding of immune escape and to adopting interventions to contain novel outbreaks. This scenario poses a challenge for NGS laboratories worldwide that are pressed to have both a faster turnaround time and a high-throughput processing of swabs for sequencing and analysis. In this study, we present an optimization of the Illumina COVID-seq protocol carried out on thousands of SARS-CoV-2 samples at both the wet-lab and dry-lab levels. We discuss the unique challenges related to processing hundreds of swabs per week such as the tradeoff between ultra-high sensitivity and negative contamination levels, cost efficiency and bioinformatics quality metrics. (c) 2022 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/4.0/).
The value of high-throughput germline genetic testing is increasingly recognized in clinical cancer care. Disease-associated germline variants in cancer patients are important for risk management, surveillance, and surgical decisions, and can also have major implications for treatment strategies, since many are in DNA repair genes. With the increasing availability of high-throughput DNA sequencing in cancer clinics and research, there is thus a need to provide clinically oriented sequencing reports for germline variants and their potential therapeutic relevance on a per-patient basis. To meet this need, we have developed the Cancer Predisposition Sequencing Reporter (CPSR), an open-source computational workflow that generates a structured report of germline variants identified in known cancer predisposition genes, highlighting markers of therapeutic, prognostic and diagnostic relevance. A fully automated variant classification procedure based on more than 30 refined American College of Medical Genetics and Genomics (ACMG) criteria represents an integral part of the workflow. Importantly, the set of cancer predisposition genes profiled in the report can be flexibly chosen from more than 40 virtual gene panels established by scientific experts, enabling customization of the report for different screening purposes and clinical contexts. The report can be configured to also list actionable secondary variant findings, as recommended by ACMG. CPSR demonstrates comparable sensitivity and specificity for the detection of pathogenic variants when compared to other algorithms in the field. Technically, the tool is implemented in Python/R, and is freely available through Docker technology. Source code, documentation, example reports and installation instructions are accessible via the project GitHub page: .
ISBN (print): 9783030918132; 9783030918149
Currently, several hundred terabytes of COVID-19 single-cell RNA-seq (scRNA-seq) data are available in public repositories. These data cover multiple tissues, comorbidities, and conditions. We expect this trend to continue, and it is realistic to predict that the amount of COVID-19 scRNA-seq data will increase to several petabytes in the coming years. However, thoughtful analysis of this data requires large-scale computing infrastructures and software systems optimized for such platforms to generate biological knowledge. This paper presents CellHeap, a portable and robust workflow for customizable scRNA-seq analyses, with quality control throughout the execution steps and deployable on supercomputers. Furthermore, we present the deployment of CellHeap on the Santos Dumont supercomputer for analyzing COVID-19 scRNA-seq datasets, and discuss a case study that processed dozens of terabytes of COVID-19 scRNA-seq raw data.
Background The DNA metabarcoding approach has become one of the most used techniques to study the taxa composition of various sample types. To deal with the high amount of data generated by the high-throughput sequencing process, a bioinformatics workflow is required and the QIIME2 platform has emerged as one of the most reliable and commonly used. However, only some pre-formatted reference databases dedicated to a few barcode sequences are available to assign taxonomy. If users want to develop a new custom reference database, several bottlenecks still need to be addressed and a detailed procedure explaining how to develop and format such a database is currently missing. In consequence, this work is aimed at presenting a detailed workflow explaining from start to finish how to develop such a curated reference database for any barcode sequence. Results We developed DB4Q2, a detailed workflow that allowed development of plant reference databases dedicated to ITS2 and rbcL, two commonly used barcode sequences in plant metabarcoding studies. This workflow addresses several of the main bottlenecks connected with the development of a curated reference database. The detailed and commented structure of DB4Q2 offers the possibility of developing reference databases even without extensive bioinformatics skills, and avoids 'black box' systems that are sometimes encountered. Some filtering steps have been included to discard presumably fungal and misidentified sequences. The flexible character of DB4Q2 allows several key sequence processing steps to be included or not, and downloading issues can be avoided. Benchmarking the databases developed using DB4Q2 revealed that they performed well compared to previously published reference datasets. Conclusion This study presents DB4Q2, a detailed procedure to develop custom reference databases in order to carry out taxonomic analyses with QIIME2, but also with other bioinformatics platforms if desired. This work also provides ready-to-
RNA-seq is a sequencing technique that uses next-generation sequencing (NGS) to explore and study the entire transcriptome of a biological sample. NGS-based analyses are mostly performed via command-line interfaces, which is an obstacle for molecular biologists and researchers. Therefore, the higher throughputs from NGS can only be accessed with the help of bioinformatics and computer science expertise. As the cost of sequencing is continuously falling, the use of RNA-seq seems certain to increase. To minimize the problems encountered by biologists and researchers in RNA-seq data analysis, we propose an automated platform with a web application that integrates various bioinformatics pipelines. The platform is intended to enable academic users to more easily analyze transcriptome datasets. Our automated Transcriptome Analysis Platform (aTAP) offers comprehensive bioinformatics workflows, including quality control of raw reads, trimming of low-quality reads, de novo transcriptome assembly, transcript expression quantification, differential expression analysis, and transcript annotation. aTAP has a user-friendly graphical interface, allowing researchers to interact with and visualize results in the web browser. This project offers an alternative way to analyze transcriptome data, by integrating efficient and well-known tools, that is simpler and more accessible to research communities. aTAP is freely available to academic users at https://***/.
We describe a useful workflow for characterizing proteomics experiments incorporating many conditions and abundance data using the popular weighted gene correlation network analysis (WGCNA) approach and functional annotation with the PloGO2 R package, the latter of which we have extended and made available to Bioconductor. The approach can use quantitative data from labeled or label-free experiments and was developed to handle multiple files stemming from data partition or multiple pairwise comparisons. The WGCNA approach can similarly produce a potentially large number of clusters of interest, which can also be functionally characterized using PloGO2. Enrichment analysis will identify clusters or subsets of proteins of interest, and the WGCNA network topology scores will produce a ranking of proteins within these clusters or subsets. This can naturally lead to prioritized proteins to be considered for further analysis or as candidates of interest for validation in the context of complex experiments. We demonstrate the use of the package on two published data sets using two different biological systems (plant and human plasma) and proteomics platforms (sequential window acquisition of all theoretical fragment-ion spectra (SWATH) and tandem mass tag (TMT)): an analysis of the effect of drought on rice over time generated using TMT and a pediatric plasma sample data set generated using SWATH. In both, the automated workflow recapitulates key insights or observations of the published papers and provides additional suggestions for further investigation. These findings indicate that the data set analysis using WGCNA combined with the updated PloGO2 package is a powerful method to gain biological insights from complex multifaceted proteomics experiments.
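As an illustration of how a network topology score can rank proteins within a cluster, here is a minimal Python sketch of the standard unsigned WGCNA-style construction: adjacency is the absolute Pearson correlation raised to a soft-thresholding power, and each protein's connectivity is the sum of its adjacencies to the other proteins in the cluster. This is not the WGCNA or PloGO2 implementation (both are R packages); the function names, the toy abundance profiles, and the choice of power `beta=6` are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def rank_by_connectivity(abundance, beta=6):
    """Rank proteins in one cluster by intramodular connectivity.

    abundance: dict mapping protein ID -> list of abundances across samples.
    beta: soft-thresholding power (assumed value; normally tuned per data set).
    Returns a list of (protein ID, connectivity), most connected first.
    """
    proteins = list(abundance)
    connectivity = {}
    for p in proteins:
        connectivity[p] = sum(
            abs(pearson(abundance[p], abundance[q])) ** beta
            for q in proteins
            if q != p
        )
    return sorted(connectivity.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy cluster: P1-P3 follow the same trend, P4 does not.
    toy = {
        "P1": [1.0, 2.0, 3.0, 4.0],
        "P2": [2.0, 4.0, 6.0, 8.5],
        "P3": [1.0, 2.0, 3.0, 4.2],
        "P4": [4.0, 1.0, 3.0, 2.0],
    }
    for protein, k in rank_by_connectivity(toy):
        print(f"{protein}\t{k:.3f}")
```

Raising the correlation to a power suppresses weak correlations much more strongly than strong ones, so proteins that track the cluster's dominant trend end up at the top of the ranking and become natural candidates for follow-up.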