Evolutionary Genetics and Genomics: Lecture 2 Flashcards (23 cards)

Mutation rates differ across the genome, between species and by type. Methods of calculating mutation rates can be based on..

- the number of substitutions observed in neutral sequences
- divergence at neutral sequences between two species


Types of neutral sequences include..

- junk DNA
- pseudogenes
- four fold degenerate sites
- ancestral repeats (the fraction of an ancestral repeat that has been deleted reflects the deletion rate)


Four fold degenerate site:

- A third codon position at which any of the four nucleotides still encodes the same amino acid.
- eg) The third base in the codons that encode glycine: GGA, GGC, GGG and GGT all encode glycine.
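The definition above can be checked against the standard genetic code. The following is a minimal sketch: the names `CODON_TABLE`, `is_fourfold` and `fourfold_prefixes` are made up for illustration, and the codon table is the standard nuclear code written out by hand rather than taken from a library.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_fourfold(prefix):
    """True if all four codons sharing this two-base prefix encode one amino acid."""
    return len({CODON_TABLE[prefix + third] for third in BASES}) == 1


# Two-base prefixes whose third position is four fold degenerate.
fourfold_prefixes = sorted(
    a + b for a, b in product(BASES, repeat=2) if is_fourfold(a + b)
)
print(fourfold_prefixes)
```

Under the standard code this lists eight prefixes, including GG (glycine). Mitochondrial and other variant codes would need a different table.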


Mutation rate is not even across chromosomes.. why?

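- The X chromosome has a lower mutation rate than the Y chromosome.
- This is because there are more cell divisions in spermatogenesis than in oogenesis, so the male germline has more opportunities for mutations to occur.
- The Y chromosome is always passed through males, whereas the X chromosome spends only a third of its time in males.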
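The argument above can be put into numbers. This is a rough sketch that assumes the only source of rate differences is the sex of the germline a chromosome passes through: an autosome spends half its time in males, an X chromosome a third, and a Y chromosome all of it. The function name `relative_rates` and the value `alpha = 5` (male rate divided by female rate) are hypothetical, not figures from the lecture.

```python
def relative_rates(alpha):
    """Return (X/autosome, Y/autosome) mutation-rate ratios.

    alpha is the male-to-female mutation rate ratio (an assumed input).
    """
    female = 1.0
    male = alpha * female
    autosome = (male + female) / 2    # half the time in each sex
    x_chrom = (male + 2 * female) / 3  # one third in males, two thirds in females
    y_chrom = male                     # always in males
    return x_chrom / autosome, y_chrom / autosome


x_ratio, y_ratio = relative_rates(alpha=5.0)  # hypothetical alpha
print(f"X/A = {x_ratio:.2f}, Y/A = {y_ratio:.2f}")
```

With any alpha above 1 the ordering comes out as X below autosomes below Y, which matches the card.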


Mutation rates differ between species. A comparison between mice and humans showed

- 67% identity and 33% divergence
- This is an underestimate of the true divergence, because multiple changes at the same site (parallel mutations and revertants) are not visible when two sequences are compared.


Jukes and Cantor model:

- Corrects for multiple hits
- Starting from the observed divergence, it estimates how many changes have actually happened over time, including parallel mutations and revertants, but excluding CpGs (whose elevated mutation rate can bias the estimate)
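As a worked illustration of the multiple-hits correction, the Jukes and Cantor distance is d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. The sketch below applies it to the 33% mouse-human divergence quoted in this deck; the function name is illustrative, and the model's assumption of equal rates for all substitution types is a simplification.

```python
import math


def jukes_cantor_distance(p):
    """Expected substitutions per site given observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


d = jukes_cantor_distance(0.33)
print(round(d, 3))  # larger than the observed 0.33
```

The corrected value is always at least as large as the observed one, and the gap widens as the sequences approach saturation at p = 0.75.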


Species have different mutation rates eg)

- Point mutations (single nucleotide mutations) have occurred twice as frequently in rodents as in humans.
- Use an out-group species to


Indels (insertion/deletion of nucleotides) occurrence:

- Deletions occur more frequently than insertions
- Deletions are bigger than insertions.


How long ago did humans and mice diverge?

65-100 mya
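A divergence time turns a divergence into a rate. As a rough sketch, the average substitution rate per site per year is the distance divided by twice the divergence time, because changes accumulate along both lineages. The example uses the uncorrected 33% mouse-human divergence from this deck, so it gives a lower bound, and it averages over the two lineages even though they evolve at different rates. The function name is illustrative.

```python
def rate_per_site_per_year(distance, divergence_time_years):
    """Average substitution rate along each lineage since the split."""
    return distance / (2 * divergence_time_years)


observed_divergence = 0.33  # uncorrected mouse-human divergence
for time_years in (65e6, 100e6):
    rate = rate_per_site_per_year(observed_divergence, time_years)
    print(f"{time_years / 1e6:.0f} mya: {rate:.2e} substitutions per site per year")
```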


Data published in 2012 by the Encyclopedia of DNA Elements (ENCODE) project suggested that protein-coding sequences are not the only functional part of the genome. What else is functional?

- cis and trans regulatory elements
- DNA replication sites
- Histone modification sites
- The transcriptome was mapped, showing that 75% of the genome is transcribed


What is the definition of function?

- If a TF binds, a transcript is produced, or the chromatin state is altered, does this imply function (as ENCODE seems to assume)?
- If you were to lose the sequence, would there be a fitness cost?


How can you determine what type of selection is occurring?

- calculate divergence rates at synonymous sites, introns, intergenic regions, coding sequence etc.
- compare them with the neutral rate to determine whether purifying (negative) selection or positive selection is occurring, and how much selective constraint there is
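The comparison described above can be sketched as a ratio of a region's divergence to the divergence at putatively neutral sites. Selective constraint is then 1 minus that ratio. The function names, the 10% tolerance band and all numbers below are hypothetical, and reading a ratio above 1 as positive selection is a simplification.

```python
def selective_constraint(d_region, d_neutral):
    """Fraction of mutations removed by selection, relative to neutral sites."""
    return 1 - d_region / d_neutral


def classify(d_region, d_neutral, tolerance=0.1):
    """Crude label from the divergence ratio; tolerance is an arbitrary choice."""
    ratio = d_region / d_neutral
    if ratio < 1 - tolerance:
        return "purifying (negative) selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "no evidence of selection"


d_neutral = 0.10  # hypothetical divergence at neutral sites
d_coding = 0.02   # hypothetical divergence in a coding region
print(round(selective_constraint(d_coding, d_neutral), 2), classify(d_coding, d_neutral))
```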


Divergence studies between D. melanogaster and D. simulans allowed calculation of.. and showed..

- divergence rates of CDS, synonymous sites, untranslated regions (UTRs), introns, intergenic regions and pseudogenes.
- this showed that there is a lot of selective constraint acting on CDS and UTRs, which is why they evolve more slowly.
- Mutations still arise there, but most are deleterious and are removed by purifying (negative) selection.


Phylogenetic footprinting:

- comparing orthologous sequences from different species to find conserved non-coding regions, which are candidate transcription factor binding sites
- Test putative regulatory elements with transgenic experiments
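The idea of scanning an alignment for conserved stretches can be sketched as a sliding window. This is a toy illustration, not a published footprinting tool: the function names, the three made-up aligned sequences, and the window and threshold values are all assumptions.

```python
def conserved_columns(alignment):
    """One flag per column: True if every sequence has the same base and no gap."""
    return [len(set(col)) == 1 and "-" not in col for col in zip(*alignment)]


def conserved_windows(alignment, window, min_fraction):
    """Start positions of windows whose fraction of conserved columns meets the threshold."""
    flags = conserved_columns(alignment)
    starts = []
    for start in range(len(flags) - window + 1):
        fraction = sum(flags[start:start + window]) / window
        if fraction >= min_fraction:
            starts.append(start)
    return starts


# Hypothetical aligned sequences of equal length from three species.
toy_alignment = [
    "ACGTTGACCA",
    "ACGTTGATCA",
    "TCGTTGAGAA",
]
print(conserved_windows(toy_alignment, window=5, min_fraction=1.0))
```

Windows that pass are only candidates. As the card says, putative regulatory elements still need testing with transgenic experiments.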


DNase I footprinting steps
(6 steps)

Fluorescently label one end of the DNA fragment.
1. Region of DNA bound by DNA binding protein
2. Random cleavage by nuclease or chemical
3. Removal of the protein
4. Separation of the DNA strands
5. Separation of the fragments by gel electrophoresis
6. A footprint will show where no cleavage occurred, because of the binding of the DNA binding protein
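Reading the footprint off the gel in step 6 amounts to finding a run of positions with no cleavage. The sketch below assumes the gel has already been turned into a list of cut counts per position along the labelled fragment. That input format, the function name and the numbers are hypothetical.

```python
def find_footprints(cut_counts, min_length=3):
    """Return (start, end) half-open intervals with zero cuts spanning at least min_length."""
    footprints = []
    run_start = None
    for pos, count in enumerate(cut_counts):
        if count == 0:
            if run_start is None:
                run_start = pos  # a protected run begins here
        else:
            if run_start is not None and pos - run_start >= min_length:
                footprints.append((run_start, pos))
            run_start = None
    # Handle a protected run that extends to the end of the fragment.
    if run_start is not None and len(cut_counts) - run_start >= min_length:
        footprints.append((run_start, len(cut_counts)))
    return footprints


cuts = [5, 3, 4, 0, 0, 0, 0, 6, 2, 0, 3]  # hypothetical cut counts per position
print(find_footprints(cuts))
```

The minimum length guards against single uncut positions, which can occur by chance because cleavage is random.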


Phylogenetic shadowing eg) ApoA

- A phylogenetic footprinting technique that uses a number of closely related species
- eg) High plasma levels of its product are a risk indicator for cardiovascular disease. The ApoA gene is found only in Old World monkeys and apes, so there is no mouse gene to compare it with. Chimps alone can't be used either, because there isn't enough divergence from humans.
- Get enough divergence by sequencing lots of closely related Old World species, then identify functional regions based on conservation.


Comparative genomics shows some regions of DNA that are highly or completely conserved between multiple organisms:

- Ultra-conserved sequences: 100% identity
- Conserved non-coding elements: 80% identity
- Conserved non-genic sequences: 70% identity


Conserved Non-Genic sequences (CNGs):

- Are not transcribed
- Don't show the substitution patterns of protein-coding sequences or of non-coding RNA sequences.


Ultra-conserved sequences:

- often clustered
- in gene deserts
- many are near genes that are involved in regulation of transcription or development
- Some overlap with exons, and these are highly enriched for genes involved in RNA binding and splicing regulation


How could you test whether ultra-conserved sequences are functional?

Test for regulatory modules: reporter assay
1. Take the non-coding region and place it upstream of a reporter gene
2. Test whether the reporter is expressed in the same pattern as the gene the region is thought to regulate



- pax6 is a gene involved in eye development
- tested by putting the conserved non-coding sequences near pax6 upstream of the GFP reporter gene
- when this is performed in zebrafish, the reporter shows where pax6 is expressed (in the eye).


The even-skipped (eve) counter-example:

- The stripe 2 enhancer is a regulatory module composed of multiple TF binding sites. It drives even-skipped expression in the 2nd stripe.
- The stripe 2 enhancer sequence is conserved in lots of species, but not in D. pseudoobscura.


Lessons from eve:

- Enhancer function can be conserved without sequence conservation: the D. pseudoobscura enhancer drives expression in the correct place and rescues to the same extent.
- It is functionally equivalent, even though its sequence is not conserved in this species.
- TF binding sites change relative position over evolutionary time