We study genotype calling algorithms for high-throughput single-nucleotide polymorphism (SNP) arrays. Building on the novel SNP-robust multi-chip average preprocessing approach and the state-of-the-art corrected robust linear model with Mahalanobis distance (CRLMM) approach to genotype calling, we propose a simple modification that better models and combines information across multiple SNPs through empirical Bayes modeling, and that can often significantly improve the genotype calls of CRLMM. Through applications to the HapMap Trio data set and a non-HapMap test set of high-quality SNP chips, we illustrate the competitive performance of the proposed method.
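To make the role of the Mahalanobis distance concrete, the following is a minimal sketch of nearest-cluster genotype calling for a single SNP. It is not the CRLMM implementation or the proposed modification: the two-dimensional feature, the cluster means and covariances, and the function names `mahalanobis_sq` and `call_genotype` are all hypothetical and chosen only for illustration.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point x from a cluster (mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def call_genotype(x, clusters):
    """Return the genotype whose cluster is closest to x, plus all distances."""
    distances = {
        genotype: mahalanobis_sq(x, mean, cov)
        for genotype, (mean, cov) in clusters.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical cluster parameters for one SNP (assumed values, not real data).
clusters = {
    "AA": ((2.0, 0.0), ((0.25, 0.0), (0.0, 0.25))),
    "AB": ((0.0, 0.0), ((0.25, 0.0), (0.0, 0.25))),
    "BB": ((-2.0, 0.0), ((0.25, 0.0), (0.0, 0.25))),
}

genotype, distances = call_genotype((1.8, 0.1), clusters)
print(genotype, distances)
```

In a sketch like this, the per-SNP cluster parameters are where information shared across SNPs would enter: an empirical Bayes approach would shrink each SNP's estimated means and covariances toward values learned from many SNPs before the distances are computed.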
The analysis of high-throughput genotyping data in genome-wide association (GWA) studies has become a standard approach in genetic epidemiology. High-quality data are crucial for the success of these studies. The first step in the statistical analysis is the generation of genotypes from signal intensities, and several approaches have been proposed for obtaining genotypes that are as accurate as possible. For the Affymetrix Genome-Wide Human SNP Array 6.0, the genotype calling algorithms Birdseed and CRLMM are commonly used in applications. After a brief description of the statistical methods behind both algorithms, their usage is described in detail. Links are provided to the software and to sample code for the installation and execution of the algorithms. Additionally, a suggestion for processing the result files is made.
Genome-wide association studies, using hundreds of thousands of single-nucleotide polymorphism (SNP) markers, have become a standard approach for identifying disease susceptibility genes. The change in technology poses substantial computational and statistical challenges, which have been addressed in the quality control, imputation, and population-based measure groups of the Genetic Analysis Workshop 16. The computational challenges pertain to efficient memory management and the computational speed of the statistical procedures, and we discuss an approach for efficient SNP storage. Accuracy and computational speed are relevant for genotype calling, and the results from a comparison of three calling algorithms are discussed. The first statistical challenge is related to statistical quality control, and we discuss two novel quality control procedures. These low-level analyses affect subsequent preparatory steps for high-level analyses, e.g., the quality of genotype imputation approaches. After a genome-wide association study has been conducted with successful replication and/or validation, measures of diagnostic accuracy, including the area under the curve, are investigated. The area under the curve can be constructed from summary data in some situations. Finally, we discuss how the population-attributable risk of a genetic variant that is only measured in a reference data set can be determined. Genet. Epidemiol. 33 (Suppl. 1):S45-S50, 2009. (C) 2009 Wiley-Liss, Inc.
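The abstract does not say which storage scheme is meant. As an illustration of why compact SNP storage matters, here is a minimal sketch of one widely used idea: a genotype takes one of four values (0, 1, or 2 copies of an allele, or missing), so it fits in two bits, and four genotypes fit in one byte. The encoding (3 for missing) and the function names `pack_genotypes` and `unpack_genotypes` are assumptions made for this example and are not taken from the paper.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2 allele copies; 3 = missing) four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"invalid genotype code at index {i}: {g!r}")
        # Each genotype occupies two bits; the first one sits in the lowest bits.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


codes = [0, 1, 2, 3, 2]
packed = pack_genotypes(codes)
print(len(packed), unpack_genotypes(packed, len(codes)))
```

Under this assumed encoding, 500,000 SNPs for one individual take about 125 kB, compared with 500 kB at one byte per genotype.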