|
In the near future, dozens of mammalian and Drosophila genomes will be assembled and publicly
available. Together with large sequencing and genotyping projects already underway to document
genomic variation within species, these assemblies will provide an unprecedented opportunity over
the next few years to study the forces that shape genome evolution.
The intellectual merit of this proposal is the production of broadly applicable population
genetic methods for identifying genes and genomic regions that are involved in adaptive molecular
evolution through the comparison of within- and between-species genomic variation. We propose
to develop methods that serve four important purposes: (1) classify loci (and domains within loci)
into those that evolve neutrally, those that show excess amino acid (or functional non-coding)
variation within species, and those that show excess amino acid (or functional non-coding) differences
among species; (2) partition the relative contributions of mutation bias, protein structure / domain
location, and physico-chemical properties of amino acids to evolutionary exchangeability for all
pairs of amino acids; (3) use the genomic distribution of Single Nucleotide Polymorphism (SNP)
frequencies to differentiate between selective and demographic hypotheses for the evolutionary
history of a given population; and (4) estimate the genomic distribution of selective effects on
non-lethal mutations for different functional categories of mutations. For all of these tasks we will
apply advances in computational statistics and numerical analysis to create powerful, robust, and
broadly applicable population genetic methods. Extensive coalescent and forward simulations with
selection, recombination, multiple mutations at the same nucleotide sites, and context-dependent
mutation rate variation among sites will be used to test the power, robustness, and accuracy of our
methods. We will also compare the power of our approaches to those of existing methods. The
proposed methods will be applied to publicly available genomic data from human, chimpanzee,
dog, mouse, rat, and Drosophila species to identify species-specific changes that are likely to be
involved in molecular adaptation.
The broader impact of the proposed work is the production and/or refinement of two computer
packages (mkprf and prfreq) for analyzing within- and between-species genomic variation
data. The programs and source code will be distributed free of charge and web servers will be
developed for those wishing to use our computational resources to run analyses. We will also
implement design features aimed at making our tools accessible to the broader evolutionary genomics
community. Application of our methods to comparative genomic data will ultimately result in the
identification of genes and genomic regions that are implicated in primate, murine, canine, and
Drosophila adaptive evolution. Our hope is that these genes can be prioritized for further molecular
genetic study. The project will also create opportunities for underrepresented minority students to
become involved with genomics research through participation in the NSF-AGEP-funded Central
New York to Puerto Rico-Mayaguez (CNY-PR) Alliance for Graduate Education. |