Identification of coordinate gene expression changes across phenotypes or biological conditions underpins the ability to decode the role of gene expression regulatory networks. Statistically, the identification of these changes can be viewed as a search for groups (most typically pairs) of genes whose expression provides better phenotype discrimination when considered jointly than when considered individually. Such groups are defined as being jointly differentially expressed. In this chapter, several approaches for identifying jointly differentially expressed groups of genes are reviewed and compared on a set of simulations.
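The idea of a gene pair discriminating better jointly than individually can be illustrated with a minimal sketch. It is not one of the approaches reviewed in the chapter: it assumes, purely for illustration, that the joint signal is the within-sample difference between the two genes, and it compares Welch t statistics. The function names `welch_t` and `joint_vs_single` and the toy values are hypothetical.

```python
import statistics


def welch_t(a, b):
    """Welch t statistic for two independent samples."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def joint_vs_single(g1_a, g2_a, g1_b, g2_b):
    """Compare per-gene discrimination with that of the paired difference.

    g1_a, g2_a: expression of genes 1 and 2 in condition A (same samples).
    g1_b, g2_b: expression of genes 1 and 2 in condition B (same samples).
    Assumption: the joint signal is summarized by gene1 - gene2.
    """
    diff_a = [x - y for x, y in zip(g1_a, g2_a)]
    diff_b = [x - y for x, y in zip(g1_b, g2_b)]
    return {
        "gene1": abs(welch_t(g1_a, g1_b)),
        "gene2": abs(welch_t(g2_a, g2_b)),
        "joint_difference": abs(welch_t(diff_a, diff_b)),
    }


# Toy data: both genes share a large sample-specific effect that masks
# the condition effect in each gene alone but cancels in the difference.
result = joint_vs_single(
    g1_a=[1.1, 4.9, 9.0, 13.2], g2_a=[0.0, 4.0, 8.0, 12.0],
    g1_b=[0.0, 4.0, 8.0, 12.0], g2_b=[1.0, 5.1, 8.9, 13.1],
)
print(result)
```

In this toy example the shared sample effect inflates the variance of each gene alone, so only the paired difference separates the two conditions clearly.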
The proper analysis of high-throughput sequencing datasets of mixed microbial communities (meta-transcriptomics) is substantially more complex than for datasets composed of single organisms. Adapting commonly used RNA-seq methods to the analysis of meta-transcriptome datasets can be misleading and may fail to use all the available information in a consistent manner. However, meta-transcriptomic experiments can be investigated in a principled manner using Bayesian probabilistic modeling of the data at a functional level coupled with analysis under a compositional data analysis paradigm. We present a worked example for the differential functional evaluation of mixed-species microbial communities obtained from human clinical samples that were sequenced on an Illumina platform. We demonstrate methods to functionally map reads directly, conduct a compositionally appropriate exploratory data analysis, evaluate differential relative abundance, and finally identify compositionally associated (constant ratio) functions. Using these approaches we have found that meta-transcriptomic functional analyses are highly reproducible and convey significant information regarding the ecosystem.
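A common building block of compositional data analysis is the centered log-ratio (CLR) transform, sketched below for a single sample. This is only an illustration of the paradigm, not the chapter's pipeline: the function name `clr` is hypothetical, and the fixed pseudocount used to handle zero counts is a simplifying assumption standing in for the Bayesian modeling the abstract describes.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's per-function counts.

    Each value is the log of a count relative to the geometric mean of
    the sample, so the result does not depend on sequencing depth.
    The pseudocount (an assumption here) avoids taking log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


shallow = clr([1, 2, 4], pseudocount=0)
deep = clr([10, 20, 40], pseudocount=0)  # same proportions, 10x the depth
print(shallow)
print(deep)
```

With no zeros and no pseudocount, the two samples give the same CLR values, which is the scale invariance that makes ratios, rather than raw counts, the natural unit of comparison.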
We present SCENIC, a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (http://*** ). On a compendium of single-cell data from tumors and brain, we demonstrate that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states. SCENIC provides critical biological insights into the mechanisms driving cellular heterogeneity.
Protein post-translational modifications (PTMs) are essential elements of cellular communication. Their variations in abundance can affect cellular pathways, leading to cellular disorders and diseases. A widely used method for revealing PTM-mediated regulatory networks is their label-free quantitation (LFQ) by high-resolution mass spectrometry. The raw data resulting from such experiments are generally interpreted using specific software, such as MaxQuant, MassChroQ, or Proline. These tools provide data matrices containing quantified intensities for each modified peptide identified. Statistical analyses are then necessary (1) to ensure that the quantified data are of good enough quality and sufficiently reproducible, and (2) to highlight the modified peptides that are differentially abundant between the biological conditions under study. The objective of this chapter is therefore to provide a complete data analysis pipeline for analyzing the quantified values of modified peptides in the presence of two or more biological conditions using the R software. We illustrate our pipeline starting from MaxQuant outputs dealing with the analysis of A549-ACE2 cells infected by SARS-CoV-2 at different time stamps, freely available on PRIDE (PXD020019).
Whole-genome bisulfite sequencing (WGBS) is a popular method for characterizing cytosine methylation because it is fully quantitative and has base-pair resolution. While WGBS is prohibitively expensive for experiments involving many samples, low-coverage WGBS can accurately determine global methylation and erasure at similar cost to high-performance liquid chromatography (HPLC) or enzyme-linked immunosorbent assays (ELISA). Moreover, low-coverage WGBS has the capacity to distinguish between methylation in different cytosine contexts (e.g., CG, CHH, and CHG), can tolerate low-input material (<100 cells), and can detect the presence of overrepresented DNA originating from mitochondria or amplified ribosomal DNA. In addition to describing a WGBS library construction and quantitation approach, here we detail computational methods to predict the accuracy of low-coverage WGBS using empirical bootstrap samplers and theoretical estimators similar to those used in election polling. Using examples, we further demonstrate how non-independent sampling of cytosines can alter the precision of error calculation and provide methods to improve this.
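The two kinds of accuracy estimate mentioned above can be sketched as follows. This is a minimal illustration, not the chapter's code: it assumes the polling-style estimator is the normal-approximation margin of error for a proportion, it treats every cytosine call as independent (the assumption the abstract warns can fail), and the function names and toy data are hypothetical.

```python
import math
import random


def theoretical_margin(methylated, total, z=1.96):
    """Normal-approximation margin of error for a methylation fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = methylated / total
    return z * math.sqrt(p * (1 - p) / total)


def bootstrap_margin(calls, n_boot=1000, seed=0):
    """Half-width of the central 95% bootstrap interval.

    calls: list of 0/1 values, one per sampled cytosine (1 = methylated).
    Resampling single calls assumes they are independent draws.
    """
    rng = random.Random(seed)
    n = len(calls)
    estimates = sorted(sum(rng.choices(calls, k=n)) / n for _ in range(n_boot))
    low = estimates[int(0.025 * n_boot)]
    high = estimates[int(0.975 * n_boot) - 1]
    return (high - low) / 2


# Toy sample: 1000 cytosine calls, half of them methylated.
calls = [1] * 500 + [0] * 500
print(theoretical_margin(sum(calls), len(calls)))
print(bootstrap_margin(calls))
```

When neighboring cytosines come from the same read, their calls are correlated, so resampling whole reads instead of single calls would be the natural adjustment to this sketch.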
Short, interfering RNAs (siRNAs) arise from the processing of long double-stranded RNA (dsRNA) by Dicer enzymes. Dicers generate siRNA duplexes by successive hydrolysis of both strands of the dsRNA phosphodiester backbone at positions determined by measuring 21–24 nucleotides from an exposed dsRNA terminus. Therefore, a population of dsRNAs with precisely identical termini will produce siRNAs spaced at regular, 21–24-nucleotide intervals. This chapter presents an easily customized and generally applicable strategy for identifying loci which produce the “phased” siRNAs diagnostic of such processing. Given the input of a large set of expressed small RNAs and of the corresponding genome or transcriptome from which the small RNAs are derived, the methodology produces a ranking of user-defined loci with respect to their likely production of phased siRNAs. Top-ranked loci are candidates for further computational and biological analyses.
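The notion of phasing can be made concrete with a small sketch that asks how many small RNA start positions at a locus fall into the same register. This is not the chapter's ranking method: the score (fraction of reads in the most populated register), the default 21-nucleotide phase size, the single-strand treatment, and the function name `phase_register_fraction` are all assumptions made for illustration.

```python
from collections import Counter


def phase_register_fraction(starts, phase=21):
    """Return the most populated register and the fraction of reads in it.

    starts: start coordinates of small RNAs mapped to one strand of a locus.
    A register is the start coordinate modulo the phase size; strongly
    phased loci concentrate their reads in a single register.
    """
    if not starts:
        return None, 0.0
    registers = Counter(position % phase for position in starts)
    best_register, best_count = registers.most_common(1)[0]
    return best_register, best_count / len(starts)


# Four reads spaced exactly 21 nt apart plus one out-of-phase read.
print(phase_register_fraction([5, 26, 47, 68, 10]))
```

A real analysis would also need to account for the offset between the two strands of each siRNA duplex and for the number of reads expected in a register by chance.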
The GlycoWorkbench software tool allows users to semiautomatically annotate glycomics MS and MS/MS spectra and MS glycoproteomics spectra. The GlycanBuilder software tool is embedded within GlycoWorkbench, allowing users to draw glycan structures and export images of the drawn structures. This chapter demonstrates how to draw glycan structures within GlycoWorkbench using the GlycanBuilder software tool. It also demonstrates how to use GlycoWorkbench to import MS and MS/MS glycomics spectra and use the cascading annotation feature to annotate both the MS and MS/MS spectra with a single command.
Atomic-level computer simulations are a very useful tool for describing the structure and dynamics of complex biomolecules such as DNA and for providing detail at a resolution that experimental techniques cannot reach. Molecular dynamics (MD) simulations of DNA mechanically distorted by factors such as supercoiling and protein binding are computationally challenging because of the large size of the associated systems and the long timescales involved. However, they are now achievable thanks to the efficient use of GPUs and to improvements in continuum solvation models. Together with concurrent improvements in the resolution of single-molecule experiments, such as atomic force microscopy (AFM), this makes convergence between the two possible. Here we present detailed protocols for doing so: for performing MD simulations of DNA adopting complex three-dimensional arrangements and for comparing the outcome of the calculations with single-molecule experimental data of lower-than-atomic resolution.
Assays profiling nucleosome positioning and occupancy are often coupled with high-throughput sequencing, which generates large data sets. These data sets require processing in specialized computational pipelines to yield useful information. Here, we describe the main steps of such a pipeline and discuss bioinformatic and statistical aspects of assessing data quality, as well as data visualization and further analysis.
Structural variants (SVs) are known to have large functional impacts on phenotypes of agricultural interest, but they have yet to be routinely used for GWAS. Apart from the difficulty in obtaining high-quality SV genotype data for large populations, one of the main hurdles to using SVs for GWAS lies in formatting the genotype data for use with popular GWAS programs. This protocol describes how typical SV genotype data can be formatted for input to three GWAS programs commonly used by the plant genetics community: TASSEL, GAPIT, and mrMLM.