Population and Comparative Genomics Flashcards

1
Q

what is population genomics?

A

gives a comprehensive picture of genetic variation within species by looking at whole genomes

2
Q

what features can we characterize using population genetics?

A
  • demography
  • natural selection (purifying, adaptive, balancing)

3
Q

what is the first stage of gathering population genetics data and what does it entail?

A
  1. hypothesis/query

- need to know what you want to find out

4
Q

what is the second stage of gathering population genetics data and what does it entail?

A
  2. sample collection and DNA extraction
    - choose 100s/1000s of individuals to sample
    - choose geographic/habitat of interest
    - extract genomic DNA
5
Q

what is the third stage of gathering population genetics data and what does it entail?

A
  3. genome sequencing
    - sequence the DNA; each read comes from a section of the genome
    - want lots of reads
    - aim for 5-40x sequence coverage
    - sequence the genome using ‘short’ read technology
    - main issue here is cost
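
A rough way to see how read count, read length and genome size combine into coverage is the usual approximation coverage = (number of reads × read length) / genome size. The sketch below assumes uniform sampling of the genome; the function names and the 100 Mb genome with 150 bp reads are hypothetical values chosen for illustration.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Expected average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average depth."""
    total_bases = target_coverage * genome_size
    return math.ceil(total_bases / read_length)


# Hypothetical example: 100 Mb genome, 150 bp short reads, 30x target depth.
genome_size = 100_000_000
read_length = 150
print(reads_needed(30, read_length, genome_size))
print(mean_coverage(20_000_000, read_length, genome_size))
```

Because cost scales with the number of reads, doubling the target coverage roughly doubles the sequencing needed per individual.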
6
Q

what is the fourth stage of gathering population genetics data and what does it entail?

A
  4. read mapping and ‘variant calling’
    - locate genetic variants (sites of the genome that differ)
    - find where each read matches to the genome
    - looking for polymorphisms
    - use SNPs and indels
    - can map sequence reads to a reference genome and identify sites that differ
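
The idea of comparing mapped reads to a reference can be sketched with a toy example. This is not how real mappers or variant callers work: it assumes each read already comes with its start position on the reference, handles only SNPs (no indels), and uses a made-up `min_support` threshold to ignore mismatches seen in a single read.

```python
from collections import Counter, defaultdict


def find_snps(reference, aligned_reads, min_support=2):
    """Return {position: (ref_base, alt_base, supporting_reads)}.

    aligned_reads is a list of (start, sequence) pairs; positions are 0-based.
    A site is reported when at least `min_support` reads carry the same
    non-reference base there.
    """
    mismatches = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(reference) and base != reference[pos]:
                mismatches[pos][base] += 1

    snps = {}
    for pos in sorted(mismatches):
        alt, support = mismatches[pos].most_common(1)[0]
        if support >= min_support:
            snps[pos] = (reference[pos], alt, support)
    return snps


# Hypothetical reference and reads, invented for illustration.
reference = "ACGTACGTAC"
reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TACGTAC"), (4, "TCGTAC")]
print(find_snps(reference, reads))  # {4: ('A', 'T', 3)}
```

Requiring support from several reads is one simple way to separate real polymorphisms from sequencing errors, which is part of why high coverage is useful.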
7
Q

what is the fifth stage of gathering population genetics data and what does it entail?

A
  5. segregating genetic variants
    - as a result of read mapping you want a list of positions that vary
    - alleles/polymorphisms/variants
8
Q

what is the sixth stage of gathering population genetics data and what does it entail?

A
  6. analysis
    - analyse variant sites alongside trait data to determine which alleles have an effect on a particular trait
    - describing demography
    - detecting selection
    - quantitative genetics like GWAS
9
Q

what is Sanger sequencing?

A
  • small scale (not high throughput)
  • technology of choice for low-to-medium throughput sequencing
  • can use it for one gene
10
Q

what is Illumina?

A
  • produces vast numbers of reads
  • much quicker, short lengths of sequences
  • technology of choice for genome re-sequencing
11
Q

what is PacBio?

A
  • Pacific Biosciences
  • produces larger reads
  • fairly accurate
  • one technology of choice for genome assemblies
12
Q

what is Oxford Nanopore?

A
  • produces very long reads (up to 40,000 nucleotides long)
  • advancing fast but more expensive
  • has the worst error rate
13
Q

what is meant by demography?

A
  • estimates of population size (can also estimate population size backwards through time)
  • population structure (which individuals are more or less closely related)
    • migration and ‘gene flow’ between populations
  • inbreeding/outbreeding rates
14
Q

what is selection in population genetics?

A

identifying which regions of the genome are subject to strong purifying selection (removal of deleterious mutations)

15
Q

what is an example of quantitative genetics?

A

GWAS: which alleles contribute to traits

16
Q

how are demography, selection and quantitative genetics interrelated?

A
  • expanding and shrinking population sizes affect selection
17
Q

what is the concept of genetic diversity in population genetics?

A

different regions of a genome contain different amounts of diversity

18
Q

what are polymorphisms/alleles/variants?

A
  • sites in the genome that differ between individuals of a species
19
Q

what are SNPs?

A
  • single nucleotide polymorphisms
  • these are the most common type of variant

20
Q

what are indels?

A
  • small insertions or deletions
21
Q

what is the human genome composed mostly of?

A

transposons

22
Q

what are examples of structural variants?

A

duplications, rearrangements, large insertions/deletions

23
Q

what is the initial origin of variation?

A

a mutation in one individual

- all polymorphisms start with a single mutation in the population

24
Q

how can polymorphisms move?

A

through space and time within a population

- their frequency will change

25
how will a polymorphism occur in a population?
- start with two separated populations - an allele crosses from one to the other (gene flow) - the mutation is now shared - over time its frequency can increase
26
what are most mutations?
- neutral | - deleterious and therefore lost
27
what happens if variants are physically linked on the chromosome?
they tend to travel together but can become unlinked through recombination
28
what is the concept of GWAS?
- GWAS - lots of data is held in a matrix (0s and 1s) - want to use summary statistics - summarising the information in one number - e.g. average pairwise similarity
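
To make the "matrix of 0s and 1s summarised in one number" idea concrete, here is a minimal sketch. It assumes each row is one sampled haplotype and each column one site, with 0 and 1 marking the two alleles, and it counts pairwise differences (the complement of pairwise similarity), which is the usual form of this diversity statistic. The matrix is invented for illustration.

```python
from itertools import combinations


def average_pairwise_differences(haplotypes):
    """Mean number of sites that differ between two sampled haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in pairs
    )
    return total / len(pairs)


# Hypothetical data: 4 haplotypes (rows) scored at 4 sites (columns).
haplotypes = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
print(average_pairwise_differences(haplotypes))  # 1.5
```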
29
what does S stand for?
the number of segregating sites
30
what is MAF?
- minor allele frequency
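
Both S and MAF can be read straight off a 0/1 haplotype matrix. The sketch below assumes rows are haplotypes and columns are sites; the matrix is made up for illustration.

```python
def segregating_sites(haplotypes):
    """S: the number of columns (sites) where more than one allele is present."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)


def minor_allele_frequencies(haplotypes):
    """Frequency of the less common allele at every site."""
    n = len(haplotypes)
    frequencies = []
    for column in zip(*haplotypes):
        freq_of_1 = sum(column) / n
        frequencies.append(min(freq_of_1, 1 - freq_of_1))
    return frequencies


# Hypothetical data: 4 haplotypes (rows) scored at 4 sites (columns).
haplotypes = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
print(segregating_sites(haplotypes))         # 3
print(minor_allele_frequencies(haplotypes))  # [0.25, 0.25, 0.25, 0.0]
```

The last site is identical in every haplotype, so it does not count towards S and its minor allele frequency is 0.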
31
what is DAF?
derived allele frequency (frequency of the new allele in the population) - need to know the ancestral state - high DAFs are rare as new alleles tend to get lost - a high DAF can suggest adaptation
32
what is the concept of Tajima's D?
describes whether you have more or fewer rare alleles than expected
33
what happens if you have a negative Tajima's D?
- have more rare alleles than you expect - happens when there's a selective sweep (a new mutation spreads throughout the population) - or an expanding population
34
what happens if you have a positive Tajima's D?
- too few rare alleles - signal of balancing selection - shrinking population - population structure
35
what happens if Tajima's D = 0?
neutrally evolving, stable population
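
The sign of Tajima's D comes from comparing two estimates of diversity: the average pairwise difference and the number of segregating sites S scaled by sample size. The sketch below follows the standard published formula (Tajima 1989) and assumes a 0/1 haplotype matrix with rows as haplotypes; both example matrices are invented to show the two signs.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for a 0/1 haplotype matrix; returns None if no site varies."""
    n = len(haplotypes)
    S = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if S == 0:
        return None

    # Average number of pairwise differences between haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)

    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical sample where every variant is a singleton (many rare alleles).
rare = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
        [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
# Hypothetical sample where every variant sits at 50% (few rare alleles).
common = [[1, 1, 1, 1]] * 3 + [[0, 0, 0, 0]] * 3

print(tajimas_d(rare))    # negative
print(tajimas_d(common))  # positive
```

The same number of segregating sites gives opposite signs depending on whether the variants are rare or at intermediate frequency.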
36
what is the concept of population structure?
- when you have individuals more likely to breed with each other than with another set - can see this through genomes
37
how can you look at population structure?
- genomes - act as markers to track evolution - when people etc. move they carry DNA - populations sometimes have small contributions which can't be drawn on a phylogenetic tree
38
what are the rules of population structure?
mutations are rare, drift through populations, and can be reshuffled by recombination
39
what is the concept of purifying selection?
- loss of deleterious alleles - they are removed from the population - carriers are less fit so they die or produce fewer offspring
40
what does the process of purifying selection result in?
- reduces diversity in regions that are important - increases the proportion of rare alleles - causes a negative Tajima's D - purifying selection is expected to be a common event
41
what is the property of most new mutations?
deleterious
42
why do exons have much lower diversity?
- mutations are more likely to be deleterious | - exons have an important function so deleterious mutations are removed quickly
43
what is the concept of adaptive evolution?
- new mutation is helpful and increases to become more common in the population - has similar effects to purifying selection (difficult to differentiate)
44
what does the process of adaptive evolution do?
- reduces diversity around the beneficial allele - increases rare alleles - causes a negative Tajima's D - adaptive selection is expected to be a rare event
45
why is adaptive evolution rare?
a mutation causing a beneficial adaptation through a random change will be rare
46
what is a selective sweep?
over time not only will the beneficial allele become more common but so will the linked alleles
47
what is a haplotype?
- region of the genome with alleles that are linked
48
what is the concept of balancing selection?
- advantage to maintaining more than one allele in a population - very rare - when heterozygotes are fitter - or when rare alleles are advantageous but become less advantageous as they become common
49
what are the results of balancing selection?
- maintains more diversity | - causes a high Tajima's D
50
what is the concept of polygenic selection?
- GWAS shows that most traits are determined by multiple genes - called complex traits - selection acts on all the alleles at once - there is therefore selection on multiple genes - when these traits evolve, many alleles change
51
what is the concept of linkage of alleles on the chromosome?
- when a strongly beneficial allele arises it will 'sweep' through the population - spreads very quickly - alleles close to it will be carried along because they are linked
52
what are the results of linkage of alleles on the chromosome?
- loss of diversity around the sweep - increase in linkage - produces a large loss of genetic diversity (always the same)
53
what happens when recombination occurs?
linked alleles can become unlinked
54
what is comparative genomics?
- the comparison of genomes between species
55
what does comparative genetics involve the analysis of?
- gene orthologs/paralogs, gene family expansions - gene loss/gain - evolutionary rate of genes - conserved genic and non-genic regions - conservation/changes in synteny (gene order)
56
what are orthologs?
genes in different species that descend from a single gene in their common ancestor (separated by speciation)
57
what are paralogs?
genes within a species that descend from a single ancestral gene (separated by duplication)
58
what is the first stage of collecting comparative genomics data?
1. sequence and assemble a genome - choose the organisms of interest - assembly: connecting all short/long sequencing reads into continuous sequences - sequencing machines generally produce short reads
59
what is the second stage of collecting comparative genomics data?
2. annotate your genome (identify gene starts, ends and exons, and identify gene types by homology)
60
what is the third stage of collecting comparative genomics data?
3. align/ compare your genome to others - whole genome alignment - using BLAST to locate similar genes
61
what is comparative genomic data produced on?
a Linux server - large amount of data with a lot of processing required
62
what can be found from comparative genomics?
- which genes have been lost in a lineage - when genes have been gained (created through things like gene fusion) - which are the fastest evolving genes - conserved genic and non-genic regions - how a species may have evolved to adapt to some new niche (how a particular species has evolved and adapted says something about long-term evolution) - the higher the conservation peak, the slower the rate = more conserved (purifying selection removes deleterious alleles)
63
how are diversity and divergence related in comparative genomics?
genetic diversity within species gives rise to divergence between species
64
what is genetic diversity?
differences within species
65
what is divergence?
differences between species
66
what are exons?
coding regions that evolve slowly; mutations in them are most often removed
67
what is an example of genetic diversity giving rise to divergence?
- one population splits into two populations - at some point there is no interbreeding - different alleles become fixed independently as mutations arise
68
what is fixation?
when a polymorphism becomes present in all individuals in a species (or population)
69
what is the concept of evolutionary rate in comparative genomics?
- evolutionary rate is the number of differences that occur over time or how many mutations are fixed in a population over time - measure via alignments from genes and genomes - every genome evolves at a different rate
70
how can evolutionary rates be measured?
- substitutions/year: certain numbers of substitutions per year (have to know the years they've been separated) - substitutions/gene or per site between two or more species
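
Both measures can be sketched from a pairwise alignment. The example assumes two already-aligned sequences of equal length with `-` marking gaps, and divides by twice the separation time because changes accumulate along both lineages since the split. The sequences and the 5-million-year split are hypothetical.

```python
def substitutions_per_site(seq_a, seq_b):
    """Proportion of aligned, ungapped sites that differ between two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differences += 1
    return differences / compared


def rate_per_site_per_year(seq_a, seq_b, years_since_split):
    """Substitutions per site per year, counting time along both lineages."""
    return substitutions_per_site(seq_a, seq_b) / (2 * years_since_split)


# Hypothetical aligned sequences from two species (2 differences in 20 sites).
species_a = "ATGGCCAAGTTTGACCTGAA"
species_b = "ATGGCTAAGTTTGATCTGAA"
print(substitutions_per_site(species_a, species_b))
print(rate_per_site_per_year(species_a, species_b, 5_000_000))
```

This raw count underestimates change between distant species, since a site that changed twice is only seen once.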
71
what is the concept of purifying selection in comparative genomics?
- selection to remove deleterious mutations | - over time this results in slower rates of evolution in regions of the genome with more essential function
72
what are introns?
- non-coding regions within genes - mutations in them are mostly not removed by purifying selection, so they are less conserved
73
what happens if regions are more highly conserved?
- suggests that the regions are more important
74
how can purifying selection be detected in comparative genomics?
- via genome alignment - looking for regions that remain the same between species - can show evolutionary rate: more important regions evolve more slowly and so are conserved
75
what is synonymous change?
does not change the amino acid encoded, and would therefore not usually have a strong functional consequence
76
what is non-synonymous change?
- does change the amino acid encoded for | - more likely to have functional consequence (which will generally be deleterious)
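
Whether a change is synonymous or non-synonymous can be checked by translating the codon before and after the change. The sketch below builds the standard genetic code from its conventional compact string (bases in the order T, C, A, G); the function name is made up for illustration and the example codons are arbitrary.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_change(codon_before, codon_after):
    """Label a codon change by whether the encoded amino acid changes."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "non-synonymous"


print(classify_change("GAA", "GAG"))  # synonymous (Glu -> Glu)
print(classify_change("GAG", "GTG"))  # non-synonymous (Glu -> Val)
```

Most synonymous changes fall at the third codon position, which is why that position tends to vary more between species.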
77
is the rate of synonymous change slower than non-synonymous change?
no
78
what is the concept of adaptive evolution in comparative genetics?
- increase frequency of adaptive allele - some genes/genomic regions evolve to have new/improved functions - this is one path to adaptation - such genes change faster than we expect by chance
79
what tests can be used to measure adaptive evolution in comparative genetics?
1. the dN/dS test | 2. the McDonald-Kreitman test
80
what is the dN/dS test?
- dN: the rate of non-synonymous change - dS: the rate of synonymous change - genes that change their function rapidly may have a higher dN than dS
81
what is the McDonald-Kreitman test?
- use for detecting adaptive change between species | - and for detecting balancing selection within species
82
what is the rate of synonymous change (dS)?
- synonymous change does not affect the protein produced - will have little or no effect on the fitness of the organism and so is selectively neutral and will accumulate - sometimes it can result in a non-optimal codon (rare) - if species are far apart this rate needs to be corrected for multiple hits
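
One common way to correct for multiple hits is the Jukes-Cantor model, which converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The card does not say which correction is intended, so treat this choice of model as an assumption; it also assumes all base changes are equally likely.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from the observed proportion of differences.

    Only defined for 0 <= p < 0.75; at 0.75 the sequences look random to each other.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# The correction is small for similar sequences and grows as they diverge.
for observed in (0.01, 0.10, 0.30, 0.50):
    print(observed, round(jukes_cantor_distance(observed), 4))
```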
83
what is the rate of non-synonymous change (dN)?
- non-synonymous change does affect the protein produced - most will be deleterious and lost - so the dN rate will generally be slower than the dS rate - hence the dN/dS rate is generally less than 1
84
what does it suggest if dN > dS?
- there have been many non-synonymous changes | - this is rare and a signature of adaptive evolution
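
The reading of the dN/dS ratio can be written as a simple rule of thumb. The sketch assumes dN and dS have already been estimated for each gene; the gene names and values are hypothetical, and a real analysis would test whether the ratio differs significantly from 1 rather than compare it directly.

```python
def dn_ds_ratio(dn, ds):
    """Return the ratio dN / dS; dS must be positive."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def interpret_ratio(ratio):
    """Rule-of-thumb reading of the dN/dS ratio."""
    if ratio < 1:
        return "dN < dS: consistent with purifying selection"
    if ratio > 1:
        return "dN > dS: possible adaptive evolution"
    return "dN = dS: consistent with neutral evolution"


# Hypothetical per-gene estimates, made up for illustration.
for gene, dn, ds in [("gene_A", 0.02, 0.20), ("gene_B", 0.30, 0.15)]:
    ratio = dn_ds_ratio(dn, ds)
    print(gene, round(ratio, 2), interpret_ratio(ratio))
```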
85
what is the concept of polygenic selection and genome-scale data in comparative genomics?
- SNPs in many genes can affect one trait - adaptation may cause gradual changes in many genes - can detect this by looking for concerted signals over certain categories of genes that work together
86
what is the assumption of the McDonald-Kreitman test?
tests the assumption that diversity within a species gives rise to divergence between species (assumes there's a stable ratio) - assumes a stable ratio of synonymous and non-synonymous polymorphisms - over time polymorphisms become fixed - gives rise to the same ratio of synonymous and non-synonymous fixed mutations
87
how do you carry out the McDonald-Kreitman test?
- count the sites that are synonymous and non-synonymous - use the chi-squared test - find if the ratio is stable
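
The counting and chi-squared steps can be sketched as a 2x2 table: polymorphisms versus fixed differences, split into non-synonymous and synonymous. The sketch uses a Pearson chi-squared statistic with one degree of freedom and no continuity correction, and also reports the neutrality index, which is an addition not mentioned on the card. All counts are hypothetical, and the function assumes none of the four counts is zero.

```python
import math


def mcdonald_kreitman(pn, ps, dn, ds):
    """Chi-squared test on the McDonald-Kreitman 2x2 table.

    pn, ps: non-synonymous / synonymous polymorphisms within a species
    dn, ds: non-synonymous / synonymous fixed differences between species
    Returns (chi2, p_value, neutrality_index).
    """
    n = pn + ps + dn + ds
    row_totals = (pn + ps) * (dn + ds)
    col_totals = (pn + dn) * (ps + ds)
    chi2 = n * (pn * ds - ps * dn) ** 2 / (row_totals * col_totals)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    neutrality_index = (pn / ps) / (dn / ds)
    return chi2, p_value, neutrality_index


# Hypothetical gene with an excess of non-synonymous fixed differences.
print(mcdonald_kreitman(pn=5, ps=40, dn=20, ds=20))
# Hypothetical gene where both ratios match (1:4 in each class).
print(mcdonald_kreitman(pn=10, ps=40, dn=5, ds=20))
```

With small counts an exact test is often preferred over chi-squared, so treat the p-value here as a rough guide.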
88
what is the result of a McDonald-Kreitman test for a neutrally evolving gene?
- ratio will be consistent
89
what is the result of a McDonald-Kreitman test for an excess of non-synonymous fixed differences (a non consistent ratio)?
adaptive evolution between species
90
what is the result of a McDonald-Kreitman test for an excess of non-synonymous polymorphisms within a species (a non-consistent ratio)?
balancing selection to maintain different non-synonymous differences within species