Search results - Inner Mongolia University Library

On the core segmentation algorithms of copy number variation detection tools

BRIEFINGS IN BIOINFORMATICS, 2024, Vol. 25, Issue 2, pp. bbae022-bbae022

Authors: Zhang, Yibo; Liu, Wenyu; Duan, Junbo. Affiliations: Xi An Jiao Tong Univ Sch Life Sci & Technol Xian Shannxi Peoples R China; Xi An Jiao Tong Univ Biomed Engn Xian Shannxi Peoples R China; Xi An Jiao Tong Univ Dept Biomed Engn Xian Peoples R China; Xi An Jiao Tong Univ Sch Life Sci & Technol 28 Xianning West Rd Xian 710049 Peoples R China

Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although there are numerous CNV detection tools based on shotgun sequencing, their quality varies significantly, leading to performance discrepancies. Therefore, we conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools from the past decade. Our findings revealed that the majority of mainstream tools employ a similar detection rationale: they calculate the so-called read depth signal from aligned sequencing reads and then segment the signal using either circular binary segmentation (CBS) or a hidden Markov model (HMM). Hence, we compared the performance of those two core segmentation algorithms in CNV detection, considering varying sequencing depths, segment lengths and complex types of CNVs. To ensure a fair comparison, we designed a parametric model using mainstream statistical distributions, which allows for pre-excluding bias correction such as guanine-cytosine (GC) content during the preprocessing step. The results indicate the following key points: (1) Under ideal conditions, CBS demonstrates high precision, while HMM exhibits a high recall rate. (2) Under practical conditions, HMM is advantageous at lower sequencing depths, while CBS is more competitive than HMM in detecting small variant segments. (3) In cases involving complex CNVs resembling real sequencing, HMM is more robust than CBS. (4) When facing large-scale sequencing data, HMM takes less time than CBS, while their memory usage is approximately equal. These findings can provide important guidance and reference for researchers developing new tools for CNV detection.

Keywords: copy number variants next generation sequencing circular binary segmentation hidden Markov model
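
The read depth signal mentioned in the abstract above can be illustrated with a minimal sketch. It assumes reads are given as 0-based start coordinates on one chromosome and that bins have a fixed width; the function name `read_depth_signal` and its parameters are hypothetical and are not taken from any of the tools the paper reviews. No GC or mappability correction is attempted.

```python
import math
from statistics import median

def read_depth_signal(read_starts, chrom_length, bin_size):
    """Count reads per fixed-width bin and return (counts, log2 ratios).

    Ratios are taken against the median count of non-empty bins;
    empty bins get None because log2(0) is undefined.
    """
    n_bins = math.ceil(chrom_length / bin_size)
    counts = [0] * n_bins
    for pos in read_starts:
        # Ignore reads whose start falls outside the chromosome.
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    # Raises statistics.StatisticsError if every bin is empty.
    baseline = median(c for c in counts if c > 0)
    ratios = [math.log2(c / baseline) if c > 0 else None for c in counts]
    return counts, ratios

starts = [0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 30, 31, 32, 33]
counts, ratios = read_depth_signal(starts, chrom_length=40, bin_size=10)
print(counts)
print(ratios)
```

In this toy input the third bin holds half as many reads as the others, so its log2 ratio is -1, which is the kind of drop a segmentation step (CBS or HMM) would then try to delimit.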

A model-based circular binary segmentation algorithm for the analysis of array CGH data

BMC Research Notes, 2011, Vol. 4, Issue 1, pp. 394-394

Authors: Hsu, Fang-Han; Chen, Hung-I H; Tsai, Mong-Hsun; Lai, Liang-Chuan; Huang, Chi-Cheng; Tu, Shih-Hsin; Chuang, Eric Y; Chen, Yidong. Affiliations: Graduate Institute of Biomedical Electronics and Bioinformatics Department of Electrical Engineering National Taiwan University Taipei 106 Taiwan; Greehey Children's Cancer Research Institute University of Texas Health Science Center at San Antonio San Antonio TX 78229 United States; Department of Epidemiology and Biostatistics University of Texas Health Science Center at San Antonio San Antonio TX 78229 United States; Institute of Biotechnology Center for Systems Biology and Bioinformatics National Taiwan University Taipei 106 Taiwan; Graduate Institute of Physiology National Taiwan University Taipei 100 Taiwan; Cathy General Hospital Taipei 106 Taiwan

Background: Circular binary segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test, but evaluating the significance of change-points using permutations involves an extensive computational burden. A recent implementation utilizing a hybrid method and early stopping rules (hybrid CBS) was subsequently proposed to improve speed. However, a time analysis revealed that a major portion of the computation time of the hybrid CBS was still spent on permutation. In addition, what the hybrid method provides is an approximation of the upper or lower bound of the significance, not an approximation of the significance of the change-points itself. Results: We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data sets under the null hypothesis were simulated in advance based on a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data with the GEV parameters, constitute lookup tables (eXtreme model). Using the eXtreme model, the significance of change-points can be evaluated in constant time through a table lookup process. Conclusions: A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms the hybrid CBS by 4× to 20× in computation time without loss of accuracy. Source codes, supplementary materials, supplementary figures, and supplementary tables can be found at http://***/eCBSsupplementary. © 2011 Hsu et al; licensee BioMed Central Ltd.

Keywords: Generalized Extreme Value Generalized Extreme Value Distribution Extreme Value Theory eXtreme Model circular binary segmentation
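
To make the cost described in the abstract above concrete, here is a minimal sketch of the permutation baseline that eCBS is designed to avoid, not of eCBS itself. It computes a maximal arc statistic (the arc mean compared with the mean of the remaining points) and estimates its significance by shuffling. The statistic omits the variance estimate, which is unchanged by permutation and so does not affect the p-value; the function names, the permutation count and the seed are illustrative assumptions.

```python
import random
from itertools import accumulate

def max_arc_statistic(x):
    """Return (stat, i, j) maximizing |Z_ij| over arcs x[i:j]."""
    n = len(x)
    prefix = [0.0] + list(accumulate(x))
    total = prefix[-1]
    best = (0.0, 0, 0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the arc must leave some points outside
            inside = prefix[j] - prefix[i]
            diff = inside / k - (total - inside) / (n - k)
            z = abs(diff) / (1.0 / k + 1.0 / (n - k)) ** 0.5
            if z > best[0]:
                best = (z, i, j)
    return best

def permutation_p_value(x, n_perm=200, seed=0):
    """Share of shuffles whose maximal statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = max_arc_statistic(x)[0]
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_arc_statistic(shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

noise = [0.0, 0.1, -0.1, 0.05, -0.05]
x = noise * 2 + [2.0, 2.1, 1.9, 2.05, 1.95] + noise * 2
stat, i, j = max_arc_statistic(x)
print(i, j, round(stat, 3), permutation_p_value(x))
```

Each permutation repeats the full quadratic scan over arcs, which is why the abstract reports that most of the running time goes to permutation and why a precomputed lookup of the null distribution is attractive.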

A semiparametric Bayesian model for comparing DNA copy numbers

BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2016, Vol. 30, Issue 3, pp. 345-365

Authors: Nieto-Barajas, Luis; Ji, Yuan; Baladandayuthapani, Veerabhadran. Affiliations: ITAM Dept Stat Rio Hondo 1 Mexico City 01080 DF Mexico; NorthShore Univ HealthSyst Biomed Informat 1001 Univ Pl Evanston IL 60201 USA; Univ Chicago Chicago IL 60637 USA; Univ Texas MD Anderson Canc Ctr Dept Biostat 1515 Holcombe Blvd Houston TX 77030 USA

We propose a two-step method for the analysis of copy number data. We first define the partitions of genome aberrations and conditional on the partitions we introduce a semiparametric Bayesian model for the analysis of multiple samples from patients with different subtypes of a disease. While the biological interest is to identify regions of differential copy numbers across disease subtypes, our model also includes sample-specific random effects that account for copy number alterations between different samples in the same disease subtype. We model the subtype and sample-specific effects using a random effects mixture model. The subtype's main effects are characterized by a mixture distribution whose components are assigned Dirichlet process priors. The performance of the proposed model is examined using simulated data as well as a breast cancer genomic data set.

Keywords: Bayesian nonparametrics bivariate spike and slab prior circular binary segmentation comparative genomic hybridization Dirichlet process mixture model random effects

Concurrent detection of targeted copy number variants and mutations using a myeloid malignancy next generation sequencing panel allows comprehensive genetic analysis using a single testing strategy

BRITISH JOURNAL OF HAEMATOLOGY, 2016, Vol. 173, Issue 1, pp. 49-58

Authors: Shen, Wei; Szankasi, Philippe; Sederberg, Maria; Schumacher, Jonathan; Frizzell, Kimberly A.; Gee, Elaine P.; Patel, Jay L.; South, Sarah T.; Xu, Xinjie; Kelley, Todd W. Affiliations: Univ Utah Sch Med ARUP Labs Salt Lake City UT 84112 USA; Univ Utah Sch Med Dept Pathol Salt Lake City UT 84112 USA

Currently, comprehensive genetic testing of myeloid malignancies requires multiple testing strategies with high costs. Somatic mutations can be detected by next generation sequencing (NGS) but copy number variants (CNVs) require cytogenetic methods including karyotyping, fluorescence in situ hybridization and microarray. Here, we evaluated a new method for CNV detection using read depth data derived from a targeted NGS mutation panel. In a cohort of 270 samples, we detected pathogenic mutations in 208 samples and targeted CNVs in 68 cases. The most frequent CNVs were 7q deletion including LUC7L2 and EZH2, TP53 deletion, ETV6 deletion, gain of RAD21 on 8q, and 5q deletion, including NSD1 and NPM1. We were also able to detect exon-level duplications, including so-called KMT2A (MLL) partial tandem duplication, in 9 cases. In the 63 cases that were negative for mutations, targeted CNVs were observed in 4 cases. Targeted CNV detection by NGS had very high concordance with single nucleotide polymorphism microarray, the current gold standard. We found that ETV6 deletion was strongly associated with TP53 alterations and 7q deletion was associated with mutations in TP53, KRAS and IDH1. This proof-of-concept study demonstrates the feasibility of using the same NGS data to simultaneously detect both somatic mutations and targeted CNVs.

Keywords: next generation sequencing copy number variant molecular diagnostics myeloid malignancies circular binary segmentation

Using high-density DNA methylation arrays to profile copy number alterations

GENOME BIOLOGY, 2014, Vol. 15, Issue 2, pp. R30-R30

Authors: Feber, Andrew; Guilhamon, Paul; Lechner, Matthias; Fenton, Tim; Wilson, Gareth A.; Thirlwell, Christina; Morris, Tiffany J.; Flanagan, Adrienne M.; Teschendorff, Andrew E.; Kelly, John D.; Beck, Stephan. Affiliations: UCL UCL Canc Inst London WC1E 6BT England; Royal Natl Orthopaed Hosp Stanmore HA7 4LP Middx England; UCL UCL Med Sch Div Surg & Intervent Sci London WC1E 6BT England

The integration of genomic and epigenomic data is an increasingly popular approach for studying the complex mechanisms driving cancer development. We have developed a method for evaluating both methylation and copy number from high-density DNA methylation arrays. Comparing copy number data from Infinium HumanMethylation450 BeadChips and SNP arrays, we demonstrate that Infinium arrays detect copy number alterations with the sensitivity of SNP platforms. These results show that high-density methylation arrays provide a robust and economic platform for detecting copy number and methylation changes in a single experiment.

Keywords: Copy Number Alteration Copy Number State Copy Number Profile circular binary segmentation Infinium HumanMethylation450 BeadChips

EXCAVATOR: detecting copy number variants from whole-exome sequencing data

GENOME BIOLOGY, 2013, Vol. 14, Issue 10, pp. 1-18

Authors: Magi, Alberto; Tattini, Lorenzo; Cifola, Ingrid; D'Aurizio, Romina; Benelli, Matteo; Mangano, Eleonora; Battaglia, Cristina; Bonora, Elena; Kurg, Ants; Seri, Marco; Magini, Pamela; Giusti, Betti; Romeo, Giovanni; Pippucci, Tommaso; De Bellis, Gianluca; Abbate, Rosanna; Gensini, Gian Franco. Affiliations: Univ Florence Dept Clin & Expt Med Florence Italy; G Gaslini Inst Children Mol Genet Lab Genoa Italy; CNR Inst Biomed Technol Milan Italy; CNR Inst Informat & Telemat LISM Pisa Italy; CNR Inst Clin Physiol Pisa Italy; Careggi Hosp Diagnost Genet Unit Florence Italy; Univ Milan Dipartimento Biotecnol Med & Med Traslaz BIOMETRA Milan Italy; Univ Bologna Dept Med & Surg Sci Med Genet Unit Bologna Italy; Univ Tartu Inst Mol & Cell Biol EE-50090 Tartu Estonia

We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://***/projects/excavatortool/.

Keywords: Singular Value Decomposition Copy Number State Computational Pipeline circular binary segmentation Metastatic Melanoma Cell Line
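
The idea of assigning each position to one of five copy number states with a hidden Markov model can be sketched with a plain Viterbi decoder. This is a simplified, homogeneous HMM and not EXCAVATOR's heterogeneous model, whose transitions are not reproduced here. The state means, the emission standard deviation and the self-transition probability below are all illustrative assumptions.

```python
import math

# Assumed log2-ratio means for copy numbers 0, 1, 2, 3, 4 (hypothetical values).
STATE_MEANS = [-3.0, -1.0, 0.0, 0.585, 1.0]
SIGMA = 0.2    # assumed shared emission standard deviation
STAY = 0.99    # assumed probability of remaining in the same state

def log_gauss(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def viterbi(signal):
    """Most likely state index per observation (signal must be non-empty)."""
    k = len(STATE_MEANS)
    log_stay = math.log(STAY)
    log_move = math.log((1 - STAY) / (k - 1))
    score = [math.log(1 / k) + log_gauss(signal[0], m, SIGMA) for m in STATE_MEANS]
    back = []
    for x in signal[1:]:
        new_score, pointers = [], []
        for s, mu in enumerate(STATE_MEANS):
            # Best predecessor for state s under the stay/move transition costs.
            prev = max(range(k),
                       key=lambda p: score[p] + (log_stay if p == s else log_move))
            trans = log_stay if prev == s else log_move
            new_score.append(score[prev] + trans + log_gauss(x, mu, SIGMA))
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

print(viterbi([0.0, 0.05, -0.05, -1.0, -0.95, -1.05, 0.0, 0.1]))
```

State index 2 corresponds to the normal two-copy state and index 1 to a one-copy loss, so the three values near -1 in the toy signal are decoded as a short deletion.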

Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives

BMC BIOINFORMATICS, 2013, Vol. 14, Issue Sup11, pp. S1-S1

Authors: Zhao, Min; Wang, Qingguo; Wang, Quan; Jia, Peilin; Zhao, Zhongming. Affiliations: Vanderbilt Univ Sch Med Dept Biomed Informat Nashville TN 37232 USA; Vanderbilt Univ Sch Med Dept Canc Biol Nashville TN 37232 USA; Vanderbilt Univ Sch Med Dept Psychiat Nashville TN 37232 USA

Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes in genomes until, most recently, high-resolution sequence data became available through next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.

Keywords: Copy Number Variation Whole Exome Sequencing Whole Genome Sequencing Data Copy Number Variation Region circular binary segmentation

Simple binary segmentation frameworks for identifying variation in DNA copy number

BMC BIOINFORMATICS, 2012, Vol. 13, Issue 1, pp. 1-13

Author: Yang, Tae Young. Affiliation: Myongji Univ Dept Math Yongin 449728 Kyonggi South Korea

Background: Variation in DNA copy number, due to gains and losses of chromosome segments, is common. A first step for analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure, which is based on a sequence of nested hypothesis tests, each using the Bayesian information criterion. Results: Our procedure is convenient for analyzing DNA copy number in two general situations: (1) when using data from multiple sources and (2) when using cohort analysis of multiple patients suffering from the same type of cancer. In the first case, data from multiple sources such as different platforms, labs, or preprocessing methods are used to study variation in copy number in the same individual. Combining these sources provides a higher resolution, which leads to a more detailed genome-wide survey of the individual. In this case, we provide a simple statistical framework to derive a consensus molecular signature. In the framework, the multiple sequences from various sources are integrated into a single sequence, and then the proposed segmentation procedure is applied to this sequence to detect aberrant regions. In the second case, cohort analysis of multiple patients is carried out to derive overall molecular signatures for the cohort. For this case, we provide another simple statistical framework in which data across multiple profiles is standardized before segmentation. The proposed segmentation procedure is then applied to the standardized profiles one at a time to detect aberrant regions. Any such regions that are common across two or more profiles are probably real and may play important roles in the cancer pathogenesis process. Conclusions: The main advantages of the proposed procedure are flexibility and simplicity.

Keywords: Bayesian information criterion circular binary segmentation Consensus molecular signature Overall molecular signature Variation in DNA copy number
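
A circular binary segmentation step that is accepted or rejected by the Bayesian information criterion can be sketched as follows. This is a toy reading of the idea in the abstract above, not the author's procedure: it assumes Gaussian noise with a shared variance, counts two segment means plus the interior change-points as parameters, and rescans every arc at cubic cost. The names `rss`, `bic`, `segment` and the `min_len` parameter are hypothetical.

```python
import math

def rss(values):
    """Residual sum of squares around the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def bic(residual, n, n_params):
    """Gaussian BIC up to a constant; the floor avoids log(0)."""
    return n * math.log(max(residual, 1e-12) / n) + n_params * math.log(n)

def segment(x, offset=0, min_len=2):
    """Return sorted change-point indices found by recursive arc splitting."""
    n = len(x)
    best = None
    for i in range(n):
        for j in range(i + min_len, n + 1):
            if n - (j - i) < min_len:
                continue  # too few points left outside the arc
            cost = rss(x[i:j]) + rss(x[:i] + x[j:])
            if best is None or cost < best[0]:
                best = (cost, i, j)
    if best is None:
        return []
    cost, i, j = best
    cuts = [c for c in (i, j) if 0 < c < n]
    # Keep the arc only if the two-mean model has the lower BIC.
    if bic(cost, n, 2 + len(cuts)) >= bic(rss(x), n, 1):
        return []
    bounds = [0] + cuts + [n]
    found = []
    for lo, hi in zip(bounds, bounds[1:]):
        if lo > 0:
            found.append(offset + lo)
        found.extend(segment(x[lo:hi], offset + lo, min_len))
    return found

noise = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
profile = noise + [1.0 + v for v in noise] + noise
print(segment(profile))
```

The toy profile has a gained region in the middle. A single split point cannot isolate such a region in one step, whereas an arc with two end points can, which is the motivation for the circular form.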

Comparative exome sequencing of metastatic lesions provides insights into the mutational progression of melanoma

BMC GENOMICS, 2012, Vol. 13, Issue 1, pp. 505-505

Authors: Gartner, Jared J.; Davis, Sean; Wei, Xiaomu; Lin, Jimmy C.; Trivedi, Niraj S.; Teer, Jamie K.; Meltzer, Paul S.; Rosenberg, Steven A.; Samuels, Yardena. Affiliations: NHGRI Canc Genet Branch NIH Bethesda MD 20892 USA; NCI Genet Branch NIH Bethesda MD 20892 USA; Washington Univ Sch Med Dept Pathol & Immunol Dis Lab & Genom Med St Louis MO USA; NHGRI Genet Dis Res Branch NIH Bethesda MD 20892 USA; NHGRI NIH Intramural Sequencing Ctr NIH Bethesda MD 20892 USA; NCI Surg Branch NIH Bethesda MD 20892 USA; NHGRI Genome Technol Branch NIH Bethesda MD 20892 USA

Background: Metastasis is characterized by spreading of neoplastic cells to an organ other than where they originated and is the predominant cause of death among cancer patients. This holds true for melanoma, whose incidence is increasing more rapidly than any other cancer and once disseminated has few therapeutic options. Here we performed whole exome sequencing of two sets of matched normal and metastatic tumor DNAs. Results: Using stringent criteria, we evaluated the similarities and differences between the lesions. We find that in both cases, 96% of the single nucleotide variants are shared between the two metastases indicating that clonal populations gave rise to the distant metastases. Analysis of copy number variation patterns of both metastatic sets revealed a trend similar to that seen with our single nucleotide variants. Analysis of pathway enrichment on tumor sets shows commonly mutated pathways enriched between individual sets of metastases and all metastases combined. Conclusions: These data provide a proof-of-concept suggesting that individual metastases may have sufficient similarity for successful targeting of driver mutations.

Keywords: Melanoma Copy Number Variant Exome Sequencing Matched Normal Tissue circular binary segmentation

Computational Analysis of Genome-Wide DNA Copy Number Changes

Author: Lei Song, Virginia Polytechnic Institute and State University

Degree level: Master's

DNA copy number change is an important form of structural variation in the human genome. Somatic copy number alterations (CNAs) can cause overexpression of oncogenes and loss of tumor suppressor genes in tumorigenesis. Recent development of SNP array technology has facilitated studies of copy number changes at a genome-wide scale, with high resolution. Quantitative analysis of somatic CNAs on genes has found broad applications in cancer research. Most tumors exhibit genomic instability at chromosome scale as a result of dynamically accumulated genomic mutations during the course of tumor progression. Such higher-level cancer genomic characteristics cannot be effectively captured by the analysis of individual genes. We introduced two definitions of a chromosome instability (CIN) index to mathematically and quantitatively characterize genome-wide genomic instability. The proposed CIN indices are derived from CNAs detected using circular binary segmentation and wavelet transform, and each calculates a score based on both the amplitude and frequency of the copy number changes. We generated CIN indices from copy number data of ovarian cancer subtypes and used them as features to train an SVM classifier. The experimental results show promising, high classification accuracy estimated through cross-validation. An additional survival analysis was performed on the CIN scores extracted from the TCGA ovarian cancer dataset and showed considerable correlation between CIN scores and various events and severity in ovarian cancer development. Currently our methods have been integrated into G-DOC. We expect these newly defined CINs to be predictors in tumor subtype diagnosis and a useful tool in cancer research.

Keywords: DNA Copy Number Changes circular binary segmentation Haar Wavelet Transform Chromosome Instability Georgetown Database of Cancer
