Population Genetic Inference From Dense Genotype Data
PI:
Andrew Clark
Co-PIs:
Rasmus Nielsen, Carlos Bustamante
Support:
current
Source:
NIH -- PI Andrew Clark
Location:
Cornell University
Duration:
05/20/04 - 05/19/07
Summary:
The HapMap project will generate an unprecedented volume of human genotype data that will allow a wealth of inferences about human variation extending well beyond the original goals of the project. We propose a series of investigations that center on the following four aims. First, the HapMap SNPs did not arise from complete resequencing of the samples; rather, the data come from assays of previously discovered SNPs. This design imposes an ascertainment bias that may affect many aspects of subsequent analysis, including the spurious appearance of associations, underestimation of LD, and potentially serious underestimation of population structure. We plan to investigate the consequences of uncorrected ascertainment bias and the means for correcting this bias. Second, if one imagines that SNPs arise in the population with a range of selection coefficients, then properties of the distribution of selection coefficients can be estimated with increasing power as the number of SNPs increases. Given its magnitude, the HapMap project will provide an unprecedented ability to infer the role of natural selection in shaping human variation. Third, while the physical map of the human genome is essentially complete, the genetic map remains considerably lower in resolution because meiotic recombination events can only be sampled from limited pedigrees. Even so, the genetic map has recently been greatly improved, opening the opportunity for a much more detailed analysis of the relation between local genetic recombination rate, physical separation of markers, linkage disequilibrium, frequency spectrum, and other population genetic attributes of SNPs. Observation of levels of LD that are not commensurate with the local rate of recombination may be a sign of recent natural selection.
This investigation entails extensive simulation and the development of MCMC approaches to parameter estimation and hypothesis testing, centering on the null hypothesis that variability among genomic regions in local population genetic attributes is explicable by the local recombination rate. Finally, the central purpose of HapMap is to provide a step toward the ultimate goal of identifying allelic variation associated with risk of complex disease. The dense LD map provided by HapMap will enable us to simulate disease associations so as to thoroughly quantify the power of whole-genome LD association inference. These tests will help identify the attributes of local LD that best predict the power of association tests, and will serve as additional guidance to identify regions requiring denser SNP coverage.
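
The ascertainment bias described in the first aim can be illustrated with a small sketch. It is not the project's method; it assumes a deliberately simple discovery scheme in which a SNP is typed only if it is variable in a discovery panel of d chromosomes drawn without replacement from the same n chromosomes that are later genotyped. Under that assumption, a SNP with i derived copies among n is ascertained with probability 1 - [C(i,d) + C(n-i,d)] / C(n,d), and dividing each observed frequency class by this probability recovers the unascertained frequency spectrum up to normalization. The function names and the parameters n and d are hypothetical and chosen for illustration only.

```python
from math import comb


def ascertainment_prob(i, n, d):
    """Probability that a SNP with i derived copies among n chromosomes
    is variable in a discovery panel of d chromosomes drawn without
    replacement from those n (assumed discovery scheme)."""
    if not (2 <= d <= n):
        raise ValueError("panel depth d must satisfy 2 <= d <= n")
    if not (1 <= i <= n - 1):
        raise ValueError("derived count i must satisfy 1 <= i <= n - 1")
    # The SNP is missed when the panel holds only derived or only ancestral copies.
    missed = comb(i, d) + comb(n - i, d)
    return 1.0 - missed / comb(n, d)


def correct_sfs(observed, n, d):
    """Reweight an observed site frequency spectrum (entries for
    i = 1 .. n-1) by the inverse ascertainment probability and
    normalize so the corrected spectrum sums to 1."""
    if len(observed) != n - 1:
        raise ValueError("observed must have n - 1 entries")
    weights = [
        observed[i - 1] / ascertainment_prob(i, n, d)
        for i in range(1, n)
    ]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    n, d = 10, 3
    # Neutral expectation: class i is proportional to 1/i.
    true_sfs = [1.0 / i for i in range(1, n)]
    # Spectrum as it would appear after the assumed discovery step.
    biased = [t * ascertainment_prob(i, n, d)
              for i, t in enumerate(true_sfs, start=1)]
    corrected = correct_sfs(biased, n, d)
    for i, (b, c) in enumerate(zip(biased, corrected), start=1):
        print(f"i={i}  biased={b:.4f}  corrected={c:.4f}")
```

In this toy setting the biased spectrum is depleted of rare alleles, since low-frequency SNPs are the ones most likely to be missed by a small discovery panel. Real discovery schemes are more heterogeneous, so the correction above should be read as an illustration of the principle only.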