Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although there are numerous CNV detection tools based on shotgun sequencing, their quality varies significantly, leading to performance discrepancies. Therefore, we conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools published over the past decade. Our findings revealed that the majority of mainstream tools employ a similar detection rationale: they calculate the so-called read depth signal from aligned sequencing reads and then segment the signal using either circular binary segmentation (CBS) or a hidden Markov model (HMM). Hence, we compared the performance of these two core segmentation algorithms in CNV detection across varying sequencing depths, segment lengths and complex types of CNVs. To ensure a fair comparison, we designed a parametric model using mainstream statistical distributions, which makes it possible to exclude in advance bias corrections, such as guanine-cytosine (GC) content correction, that are applied during the preprocessing step. The results indicate the following key points: (1) Under ideal conditions, CBS demonstrates high precision, while HMM exhibits a high recall rate. (2) Under practical conditions, HMM is advantageous at lower sequencing depths, while CBS is more competitive than HMM in detecting small variant segments. (3) In cases involving complex CNVs resembling real sequencing data, HMM is more robust than CBS. (4) When facing large-scale sequencing data, HMM takes less time than CBS, while their memory usage is approximately equal. These results can provide important guidance and reference for researchers developing new tools for CNV detection.
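To make the HMM side of that comparison concrete, the following is a minimal sketch of Viterbi decoding over a binned log2 read-depth ratio. It is not the model evaluated in the abstract: the function name `viterbi_segment`, the three state means, the shared Gaussian `sigma`, and the `stay_prob` transition setting are all assumptions chosen for illustration.

```python
import math

def viterbi_segment(signal, means, sigma=0.3, stay_prob=0.99):
    """Label each bin of a log2 depth ratio with its most likely hidden state.

    Assumptions (illustrative only): Gaussian emissions with a shared sigma,
    a uniform start distribution, and one probability of staying in a state.
    """
    if not signal:
        return []
    n_states = len(means)
    stay = math.log(stay_prob)
    switch = math.log((1.0 - stay_prob) / (n_states - 1))

    def log_emit(x, mu):
        # Gaussian log-density with the state-independent constant dropped.
        return -0.5 * ((x - mu) / sigma) ** 2

    score = [-math.log(n_states) + log_emit(signal[0], mu) for mu in means]
    back = []  # back[t][j] = best predecessor of state j at bin t + 1
    for x in signal[1:]:
        new_score, pointers = [], []
        for j, mu in enumerate(means):
            candidates = [score[i] + (stay if i == j else switch)
                          for i in range(n_states)]
            best_i = max(range(n_states), key=lambda i: candidates[i])
            new_score.append(candidates[best_i] + log_emit(x, mu))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical states: single-copy loss, neutral, single-copy gain.
means = [-1.0, 0.0, 0.58]
signal = [0.0] * 10 + [-1.0] * 10 + [0.0] * 10
states = viterbi_segment(signal, means)
```

Runs of identical labels in `states` correspond to segments. A high `stay_prob` discourages short segments, which is one plausible reason an HMM could miss small variants. This is an interpretation of the sketch, not a result from the study.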
Background: circular binary segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal...
We propose a two-step method for the analysis of copy number data. We first define the partitions of genome aberrations and, conditional on the partitions, we introduce a semiparametric Bayesian model for the analysis of multiple samples from patients with different subtypes of a disease. While the biological interest is to identify regions of differential copy numbers across disease subtypes, our model also includes sample-specific random effects that account for copy number alterations between different samples in the same disease subtype. We model the subtype and sample-specific effects using a random effects mixture model. The subtype's main effects are characterized by a mixture distribution whose components are assigned Dirichlet process priors. The performance of the proposed model is examined using simulated data as well as a breast cancer genomic data set.
Currently, comprehensive genetic testing of myeloid malignancies requires multiple testing strategies with high costs. Somatic mutations can be detected by next-generation sequencing (NGS), but copy number variants (CNVs) require cytogenetic methods including karyotyping, fluorescence in situ hybridization and microarray. Here, we evaluated a new method for CNV detection using read depth data derived from a targeted NGS mutation panel. In a cohort of 270 samples, we detected pathogenic mutations in 208 samples and targeted CNVs in 68 cases. The most frequent CNVs were 7q deletion including LUC7L2 and EZH2, TP53 deletion, ETV6 deletion, gain of RAD21 on 8q, and 5q deletion, including NSD1 and NPM1. We were also able to detect exon-level duplications, including the so-called KMT2A (MLL) partial tandem duplication, in 9 cases. In the 63 cases that were negative for mutations, targeted CNVs were observed in 4 cases. Targeted CNV detection by NGS had very high concordance with single nucleotide polymorphism microarray, the current gold standard. We found that ETV6 deletion was strongly associated with TP53 alterations and 7q deletion was associated with mutations in TP53, KRAS and IDH1. This proof-of-concept study demonstrates the feasibility of using the same NGS data to simultaneously detect both somatic mutations and targeted CNVs.
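The abstract does not describe how read depth is converted into a copy number signal. A common generic approach, sketched below under stated assumptions, is to scale each sample by its median depth and take the log2 ratio against a panel of reference samples. The function name `log2_ratios`, the target identifiers, the depth values, and the `floor` guard are hypothetical and are not taken from the study.

```python
import math
from statistics import median

def log2_ratios(sample_depth, reference_depths, floor=1e-9):
    """Per-target log2 ratio of a sample against a reference panel.

    sample_depth: dict mapping target name -> mean read depth.
    reference_depths: list of dicts with the same keys (assumed normal samples).
    floor: small positive value that avoids log2(0) for uncovered targets.
    """
    sample_median = median(sample_depth.values())
    # Scale every reference sample by its own median depth.
    scaled_refs = []
    for ref in reference_depths:
        ref_median = median(ref.values())
        scaled_refs.append({t: d / ref_median for t, d in ref.items()})

    ratios = {}
    for target, depth in sample_depth.items():
        baseline = median(ref[target] for ref in scaled_refs)
        ratio = (depth / sample_median) / max(baseline, floor)
        ratios[target] = math.log2(max(ratio, floor))
    return ratios

# Hypothetical depths: target_3 has half the expected coverage.
sample = {"target_1": 100, "target_2": 100, "target_3": 50,
          "target_4": 100, "target_5": 100}
panel = [{t: 200 for t in sample}, {t: 180 for t in sample}]
ratios = log2_ratios(sample, panel)
```

A ratio near -1 is what a heterozygous deletion would give in a pure diploid sample, and a ratio near 0 indicates no change. Tumor purity and capture bias shift these values in practice, and the sketch ignores both.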
The integration of genomic and epigenomic data is an increasingly popular approach for studying the complex mechanisms driving cancer development. We have developed a method for evaluating both methylation and copy number from high-density DNA methylation arrays. Comparing copy number data from Infinium HumanMethylation450 BeadChips and SNP arrays, we demonstrate that Infinium arrays detect copy number alterations with the sensitivity of SNP platforms. These results show that high-density methylation arrays provide a robust and economic platform for detecting copy number and methylation changes in a single experiment.
We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://***/projects/excavatortool/.
Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes in genomes until, most recently, high-resolution sequence data could be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled the development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.
Background: Variation in DNA copy number, due to gains and losses of chromosome segments, is common. A first step for analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure, which is based on a sequence of nested hypothesis tests, each using the Bayesian information criterion. Results: Our procedure is convenient for analyzing DNA copy number in two general situations: (1) when using data from multiple sources and (2) when using cohort analysis of multiple patients suffering from the same type of cancer. In the first case, data from multiple sources such as different platforms, labs, or preprocessing methods are used to study variation in copy number in the same individual. Combining these sources provides a higher resolution, which leads to a more detailed genome-wide survey of the individual. In this case, we provide a simple statistical framework to derive a consensus molecular signature. In the framework, the multiple sequences from various sources are integrated into a single sequence, and then the proposed segmentation procedure is applied to this sequence to detect aberrant regions. In the second case, cohort analysis of multiple patients is carried out to derive overall molecular signatures for the cohort. For this case, we provide another simple statistical framework in which data across multiple profiles is standardized before segmentation. The proposed segmentation procedure is then applied to the standardized profiles one at a time to detect aberrant regions. Any such regions that are common across two or more profiles are probably real and may play important roles in the cancer pathogenesis process. Conclusions: The main advantages of the proposed procedure are flexibility and simplicity.
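The idea of a circular binary split accepted or rejected by the Bayesian information criterion can be sketched as follows. This is a simplified illustration, not the authors' procedure. The Gaussian BIC form, the parameter counts (1 for a single mean, 4 for two means plus two change-points), the `min_len` constraint, and all function names are assumptions.

```python
import math

def rss(values):
    """Residual sum of squares around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def bic(rss_total, n, n_params):
    # Gaussian BIC up to a constant; the floor avoids log(0) on noiseless data.
    return n * math.log(max(rss_total, 1e-12) / n) + n_params * math.log(n)

def best_circular_split(x, min_len):
    """Return (rss, i, j) for the arc x[i:j] that differs most from the rest.

    The complement x[:i] + x[j:] wraps around the ends, hence "circular".
    """
    n = len(x)
    best = None
    for i in range(n):
        for j in range(i + min_len, n + 1):
            if n - (j - i) < min_len:
                continue  # the complementary arc would be too short
            total = rss(x[i:j]) + rss(x[:i] + x[j:])
            if best is None or total < best[0]:
                best = (total, i, j)
    return best

def segment(x, offset=0, min_len=3):
    """Recursively collect change-point indices where BIC favours a split."""
    n = len(x)
    if n < 2 * min_len:
        return []
    total, i, j = best_circular_split(x, min_len)
    # Nested test: two-mean model (assumed 4 parameters) vs one mean (1).
    if bic(total, n, 4) >= bic(rss(x), n, 1):
        return []
    points = [offset + p for p in (i, j) if 0 < p < n]
    for start, stop in ((0, i), (i, j), (j, n)):
        points += segment(x[start:stop], offset + start, min_len)
    return sorted(set(points))

# Hypothetical profile: a gained region between indices 8 and 14.
profile = [0.1, -0.1] * 4 + [1.1, 0.9] * 3 + [0.1, -0.1] * 4
change_points = segment(profile)
```

The exhaustive search over arcs is cubic in the sequence length and is only suitable for short illustrative inputs. Production implementations use faster search strategies.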
Background: Metastasis is characterized by spreading of neoplastic cells to an organ other than where they originated and is the predominant cause of death among cancer patients. This holds true for melanoma, whose incidence is increasing more rapidly than any other cancer and once disseminated has few therapeutic options. Here we performed whole exome sequencing of two sets of matched normal and metastatic tumor DNAs. Results: Using stringent criteria, we evaluated the similarities and differences between the lesions. We find that in both cases, 96% of the single nucleotide variants are shared between the two metastases indicating that clonal populations gave rise to the distant metastases. Analysis of copy number variation patterns of both metastatic sets revealed a trend similar to that seen with our single nucleotide variants. Analysis of pathway enrichment on tumor sets shows commonly mutated pathways enriched between individual sets of metastases and all metastases combined. Conclusions: These data provide a proof-of-concept suggesting that individual metastases may have sufficient similarity for successful targeting of driver mutations.
DNA copy number change is an important form of structural variation in the human genome. Somatic copy number alterations (CNAs) can cause overexpression of oncogenes and loss of tumor suppressor genes in tumorigenesis. Recent development of SNP array technology has facilitated studies on copy number changes at a genome-wide scale, with high resolution. Quantitative analysis of somatic CNAs on genes has found broad applications in cancer research. Most tumors exhibit genomic instability at chromosome scale as a result of dynamically accumulated genomic mutations during the course of tumor progression. Such higher-level cancer genomic characteristics cannot be effectively captured by the analysis of individual genes. We introduced two definitions of a chromosome instability (CIN) index to mathematically and quantitatively characterize genome-wide genomic instability. The proposed CIN indices are derived from CNAs detected using circular binary segmentation and wavelet transform, and each calculates a score based on both the amplitude and frequency of the copy number changes. We generated CIN indices on ovarian cancer subtypes' copy number data and used them as features to train an SVM classifier. The experimental results show promising, high classification accuracy estimated through cross-validation. An additional survival analysis was performed on the CIN scores extracted from the TCGA ovarian cancer dataset and showed considerable correlation between CIN scores and various events and severity in ovarian cancer development. Currently our methods have been integrated into G-DOC. We expect these newly defined CINs to be predictors in tumor subtype diagnosis and to be a useful tool in cancer research.