Week 7 Flashcards

1
Q

What is genomics?

A

The study of the entire hereditary information in an organism, which is mostly encoded in the genome

2
Q

What are genomes?

A

The sequence of all the DNA in a cell

3
Q

What are subsets and products of the genome?

A

Mitogenome - the mitochondrial genome
Exome - all the exons that could potentially be expressed
Transcriptome - the expressed genes (expressed exons) in the particular tissue or set of tissues that you are studying
Proteome - the proteins
Metabolome - other metabolic products…
Microbiome - the combined microorganisms inhabiting a particular environment (e.g. your gut), which can be detected using sequencing strategies

4
Q

What is needed to study a genome?

A

You need its DNA sequence

5
Q

What species were prioritised for sequencing?

A

1 - If it is fuzzy or good to eat, e.g. tomatoes, rice, cows and dogs
2 - If it belongs to an evolutionarily, scientifically, or economically important species, e.g. ants, bees, nematodes, mosquitoes, Arabidopsis and humans

6
Q

How many bacterial genomes have been sequenced?

A

Thousands of species of bacteria (a few hundred dollars per genome, these days)

7
Q

What are examples of genome sequencing projects?

A

Vertebrate Genomes Project - generate error-free genomes of all 66,000 extant vertebrate species
Darwin Tree of Life project - genome sequences of all life living in the UK
Earth BioGenome Project - genome sequences for all life on Earth

8
Q

What is an overview of the shotgun sequencing method?

A

Collect the organism and extract a lot of high-quality DNA (long strands)
Break it into fragments (using enzymes or sound)
Read the fragments with a high-throughput sequencer (currently Illumina, PacBio and Nanopore machines are dominant)
Piece together the fragments (assembly)
Recognise the components (annotation)

9
Q

How large are the DNA fragments created in shotgun sequencing?

A

Roughly 300-500 bases each: de novo genome assembly is like piecing together an encyclopedia from 300-500-letter fragments of sentences

10
Q

What are the cons of shotgun sequencing?

A

It takes a LOT of work and money to make a ‘finished’ genome from raw fragments. Most published genomes are still just tens of thousands of fragments (contigs and scaffolds), but these are long enough to read lots of genes

11
Q

What is the order of assembly in shotgun sequencing?

A

Reads -> Contigs -> Scaffolds -> Chromosomes
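
The first step (reads to contigs) can be illustrated with a toy greedy overlap merge. This is only a sketch: the function names, the `min_len` threshold and the example reads are made up for illustration, and real assemblers use overlap or de Bruijn graphs and have to cope with sequencing errors, repeats and both DNA strands.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap until none overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads  # whatever is left are the contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


# Hypothetical error-free reads sampled from the toy "genome" ATGGCGTACGTTAGC
toy_reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
contigs = greedy_assemble(toy_reads)
print(contigs)
```

With these three reads the merges produce a single contig equal to the toy genome. With gaps in coverage, several contigs would be left over, which is where scaffolding comes in.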

12
Q

What are the pros and cons of long-read sequencing?

A

Pros: less rebuilding needed afterwards, and better for reading repetitive regions of the genome
Cons: currently error-prone

13
Q

What are the read lengths of PacBio and Nanopore sequencing?

A

PacBio - 20-40 kb read lengths
Nanopore - Up to 100 kb read lengths

14
Q

What are the uses of sequencing genomes?

A

Genomes themselves are interesting to study
(Try) to find all the genes involved in a phenotype, not just one or two
To reconstruct deep phylogenies (phylogenomics)
Cancer genomics
Inform conservation strategy

15
Q

What are examples of how the number of genes varies between organisms?

A

Influenza - 11 genes
E. coli - 4,149 genes
Fruit fly - 14,889 genes
Chicken - 16,736 genes
Human - 22,333 genes

16
Q

How can genome size vary?

A

Basic features are similar, but genome size is highly variable
10,000-fold range between fungi and flowering plants
The number of genes varies much less

17
Q

What are examples of genome size and number of genes in plants?

A

Arabidopsis thaliana - ~25,000 genes, 135 Mb genome size
Canopy plant (Paris japonica) - ?? genes, 152,000 Mb genome size

18
Q

What is a case study investigating gene diversity?

A

Three different, identical-looking species of fish
1 is a diploid (Corydoras maculifer), 1 is a tetraploid (Corydoras aragu) and 1 is an unknown
Investigated the variation in the immune genes
Do the increased genome size and copy number lead to differences in the immune genes?

19
Q

What did the investigation into the genetic diversity of the different fish species look at?

A

TLR1 and TLR2
PCR-amplified 2 toll-like receptor genes
2.5 kb each

20
Q

What was the overview of the sampling of the different fish species?

A

Polyploid - n = 30
Diploid - n = 23
Sequenced on NextSeq platform

21
Q

What was the genetic diversity metric used?

A

Single Nucleotide Polymorphisms leading to changes in amino acid sequence

22
Q

What is a synonymous SNP?

A

A SNP that has a change in nucleotide sequence but not in the amino acid sequence

23
Q

What is a non-synonymous SNP?

A

A SNP that has a change in nucleotide sequence causing a change in the amino acid sequence
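
The difference between synonymous and non-synonymous SNPs can be checked with a few lines of Python. This is a minimal sketch assuming the standard genetic code and a SNP given as a codon, a position within it (0, 1 or 2) and the new base. The function name and the example codons are illustrative and are not taken from the fish study.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, new_base):
    """Return 'synonymous' if the amino acid is unchanged, else 'non-synonymous'."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


print(classify_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_snp("GAA", 0, "A"))  # GAA (Glu) -> AAA (Lys)
```

Changes at the third codon position are often synonymous because of the redundancy of the genetic code, which is why the first example leaves the amino acid unchanged.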

24
What is important about haplotypes for genetic diversity?
The number of haplotypes an organism has can suggest its ploidy
25
How can you measure genetic diversity using SNPs and haplotypes?
Sequence the DNA and compare the reads to a target DNA sequence; the sequencing depth shows how many times each position has been sequenced. Then find the SNPs and count them up, e.g. how frequently an A occurs at a given base across the reads
26
What is the difference between the haplotype number of a diploid and a tetraploid?
A SNP read ratio of 50:50 shows that it is a diploid. A SNP read ratio of 25:75 shows that it is a tetraploid
27
What was the difference in SNPs between the diploid and tetraploid fish?
The diploid fish had very few SNPs across all groups, e.g. non-synonymous/synonymous and both TLRs. The tetraploid fish had more SNPs across the board
28
What can show the SNP read ratios for the haplotypes in a tetraploid vs a diploid?
A histogram of SNP read ratios (x axis). A single peak around 0.5 indicates a diploid; two major peaks around 0.25 and 0.75 indicate a tetraploid
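
The read-ratio idea can be sketched in code. This is a rough illustration only: the `(ref_reads, alt_reads)` input format, the 0.375-0.625 window for "near 0.5" and the majority rule are all assumptions made for the example, not the method used in the study.

```python
def snp_read_ratios(counts):
    """Fraction of reads carrying the alternative base at each SNP.

    counts is a list of (ref_reads, alt_reads) pairs, one per SNP.
    """
    return [alt / (ref + alt) for ref, alt in counts if ref + alt > 0]


def guess_ploidy(ratios):
    """Crude call: are most ratios near 0.5, or nearer to 0.25 / 0.75?"""
    if not ratios:
        raise ValueError("need at least one SNP read ratio")
    # 0.375 and 0.625 are the midpoints between 0.25/0.5 and 0.5/0.75
    near_half = sum(1 for r in ratios if 0.375 <= r <= 0.625)
    return "diploid-like" if near_half > len(ratios) / 2 else "tetraploid-like"


# Hypothetical read counts at heterozygous sites
diploid_counts = [(48, 52), (55, 45), (50, 50)]
tetraploid_counts = [(75, 25), (24, 76), (70, 30), (50, 50)]

print(guess_ploidy(snp_read_ratios(diploid_counts)))
print(guess_ploidy(snp_read_ratios(tetraploid_counts)))
```

In practice you would plot the ratios as a histogram and look at the peaks rather than apply a hard threshold, because a tetraploid can also carry 50:50 sites.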
29
What is an example of comparative genomics?
Comparing the genes involved in echolocation between microbats and toothed whales
30
What was the candidate gene for echolocation?
The Prestin gene (important for ‘electromotility’ in the cochlea of the ear), taken from echolocating and non-echolocating bats, an echolocating dolphin, and many (non-echolocating) mammals
31
What is the overview of the phylogenetics of the prestin gene?
Highly suggestive of convergent molecular evolution of a key gene for echolocation
32
What are the problems of using the prestin gene?
It's just one gene, and the trait of echolocation can't be due to one or a few genes. There must have been adaptive changes throughout the genome. Further studies found other examples of convergent genes
33
What is the overview of genome wide comparison of echolocating animals?
Compared whole genomes of 6 species of bat (4 echolocating and 2 non-echolocating), 1 species of (echolocating) dolphin, and 15 other (non-echolocating) mammals. ~20-30,000 genes per species, but their genomes are very fragmented, so they ended up with 2,326 coding sequences (~genes) that were in all 22 species. Made a huge phylogeny from all the genes (known as phylogenomics)
34
What were the results of the genome-wide comparison of the origin of echolocation?
Echolocation evolved independently across species. This was compared with an alternative (false) phylogeny that groups all the echolocating species together (think of the Prestin gene example)
35
How supported was the alternative phylogeny based on genes like prestin?
400-800 genes supported the echolocators-together phylogeny. These 400-800 genes are candidates for genes that were selected to allow echolocation, and they included the 7 candidate genes that have been published before (e.g. prestin). But they also found many other hearing-related and vision-related genes
36
What is the function of resequencing?
Compare individuals within a single species
37
What is the advantage of having a reference genome?
Once you have a high-quality, reference genome, the genomes of subsequent individuals from the same species are much easier to assemble. If you already know the book, you can assemble similar versions of the book from fragments, even if you have fewer fragments
38
How expensive is resequencing?
Resequencing of humans is very common these days and costs less than $1000 per genome
39
What is an example of whole genome resequencing?
Whole genome sequencing of Oreochromis cichlid fish
40
Why did they sequence the whole genomes of a large number of cichlid fish?
To use the resequencing data to identify ideal candidates to breed from for aquariums and hobbyists, reducing the pressure on natural populations of cichlid fish, as their genomes are known
41
What is an example of resequencing used to study evolutionary history?
The relationships among gray wolves and how they compare to domestic dogs. Comparison with ancient wolf DNA showed that modern dogs evolved from an East Eurasian wolf population
42
What is phylogeography?
The field of study concerned with the principles and processes governing the geographical distribution of genealogical lineages, especially those at the intraspecific level
43
What is biogeography?
Biogeography is the study of the patterns and causes of the distribution of living things
44
How can geography impact biology?
Almost all taxa are restricted geographically to some degree. Some are very restricted - endemic to a small area
45
What are the major factors in biogeography?
Major historical factors that influence current distributions include vicariance and dispersal
46
What is vicariance?
Splitting of distributions when ancient landmasses split and separate due to continental drift, or when mountain ranges divide lowland populations
47
What is dispersal?
The movement of organisms or their propagules (e.g. seeds)
48
How can vicariance and dispersal be investigated?
The relative contribution of vicariance and dispersal to current distributions can be investigated using phylogeographic methods
49
What is Wallace's line?
Wallace’s line divides two regions that have separate tectonic histories that have only recently come into contact
50
What is a large factor in biogeographic distributions?
Continental drift offers an explanation for many biogeographic observations
51
What is an example of Wallace's line?
In the Indo-Pacific, the north-west side contains large placental mammals, e.g. the Javan rhino and Sumatran tiger. The south-east side contains marsupials and megapodes (terrestrial birds). The two sides are located on separate tectonic plates
52
What is overview of information used in phylogeography?
Phylogeography generally uses genetic information to examine genealogical history and patterning within species and populations. This information is used to infer relationships of biogeographic areas and species histories
53
What is an overview of the function of phylogeography?
Phylogeography concerns the relationships between gene genealogies, phylogenetics and geography. Its aim is to understand the factors contributing to the formation of genetic population structure
54
What is an example of a use of phylogeography?
Phylogeography can explain the consequences of major historical events that had continent-wide impacts. The effect of glaciation is particularly well studied in Europe. The last ice age had a glacial maximum ~18,000 – 22,000 years ago. This caused fragmentation of populations in refugia (in Europe, the Iberian, Italian and Balkan peninsulas)
55
What is the migration pattern of hedgehogs after the last ice age?
Hedgehogs that refuged in Spain migrated through France to the UK. Hedgehogs that refuged in Italy migrated through central Europe, e.g. Germany, and ended up in Scandinavia. Hedgehogs that refuged in Greece migrated through eastern Europe and ended up in Russia
56
What is the migration pattern of the grasshopper (Chorthippus parallelus) after the last ice age?
Grasshoppers that refuged in Spain stayed in Spain. Grasshoppers that refuged in Italy stayed in Italy. Grasshoppers that refuged in Greece migrated across all of Europe. The Cantabrian Mountains and the Alps stopped the migration of the grasshoppers
57
What is the migration pattern of the bear (Ursus arctos) after the last ice age?
Bears that refuged in Spain migrated through France to the UK and Scandinavia. Bears that refuged in Central Asia migrated through Russia to eastern Europe and northern Scandinavia. Bears that refuged in Greece migrated across the Balkans
58
What is the migration pattern of the chub (Leuciscus cephalus) after the last ice age?
Chub that refuged in Spain stayed in Spain. Chub that refuged in Italy stayed in Italy. Chub that refuged in the Danube spread across Europe, travelling to the UK and central Europe. Chub that refuged in Ukraine spread through Russia and eastern Europe to the Baltics. The Cantabrian Mountains and the Alps stopped the migration of the chub
59
What are the markers used in phylogenetic studies?
Mitochondrial (mtDNA) or chloroplast (cpDNA) vs genomic markers
60
What are considerations for phylogenetic markers?
Polymorphism, recombination, mode of inheritance
61
What are methods used in phylogeography?
Coalescent theory. Nested Clade Analysis (not used much any more)
62
What are the advantages of using mtDNA as a marker?
Effectively neutral markers. High mutation rate means that variation will usually be present. High copy number allows for ease of amplification from limited or archived samples. Effective population size is ¼ that of diploid nuclear genes, so genetic drift occurs faster. No recombination, so each uniparentally inherited haplotype has only one ancestor in the previous generation
63
What are the disadvantages of using mtDNA as a marker?
Uniparentally inherited, so if there are differences between the sexes then there is no information about one sex. In plants, mtDNA is less variable and recombines. cpDNA variation is higher, but not as high as animal mtDNA
64
What is an overview of using nuclear DNA as a marker?
Nuclear DNA markers are recombining and can be under selection. They give a more complete but far more complicated picture. They are more difficult to obtain from archived specimens (fewer copies). Their use is starting to increase with the development of RADseq and genome resequencing
65
What is an overview of Nested Clade phylogenetic analysis?
Invented by Templeton. Aims to identify past demographic events that have shaped the history of a population or populations. Geographically contextualized gene genealogies. Infers the demographic history of each taxon. Prone to false inferences… Not used any more
66
What is the pattern of use of Nested Clade analysis over time?
Its use started in the late 90s and spiked in the early 2000s. It hit its peak by 2008, then sharply declined, and in 2020 no publications used it
67
What is an overview of Coalescent theory?
The tracing of allelic ancestries back to their most recent common ancestor. mtDNA lineages will coalesce on average 4-fold faster than recombining nuclear markers. With nuclear DNA sequences, recombination is possible within and between alleles. The number of potential ancestors of an individual doubles with each generation back
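
The 4-fold figure can be illustrated with a small Wright-Fisher style sketch. It assumes a hypothetical population of N diploid individuals with an equal sex ratio, so there are 2N nuclear gene copies but only N/2 transmitted mtDNA copies (one per female). The function names and the value of N are made up for the example.

```python
import random


def expected_pairwise_coalescence(n_copies):
    """Expected generations until two lineages share an ancestor.

    Each generation back, two lineages pick the same parental copy with
    probability 1 / n_copies, so the waiting time is geometric with mean n_copies.
    """
    return float(n_copies)


def simulate_pairwise_coalescence(n_copies, rng):
    """Draw one coalescence time by stepping back generation by generation."""
    generations = 1
    while rng.random() >= 1.0 / n_copies:
        generations += 1
    return generations


N = 100                  # hypothetical number of diploid individuals
nuclear_copies = 2 * N   # two copies of a nuclear gene per individual
mtdna_copies = N // 2    # one transmitted mtDNA copy per female

rng = random.Random(1)
reps = 2000
sim_nuclear = sum(simulate_pairwise_coalescence(nuclear_copies, rng) for _ in range(reps)) / reps
sim_mtdna = sum(simulate_pairwise_coalescence(mtdna_copies, rng) for _ in range(reps)) / reps

ratio = expected_pairwise_coalescence(nuclear_copies) / expected_pairwise_coalescence(mtdna_copies)
print("expected ratio (nuclear / mtDNA):", ratio)
print("simulated means:", sim_nuclear, sim_mtdna)
```

The simulated means are noisy, but they should sit close to 2N and N/2 generations, giving roughly the 4-fold difference.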
68
What is statistical phylogeography?
Based on coalescent models for parameter estimation and hypothesis testing
69
What is a hypothetical example of the use of statistical phylogeography?
One might want to test two models: Model A that posits that extant populations in the focal taxon arose from a single population that persisted since before the last glacial maximum (LGM), Model B that posits that extant populations descended from two isolated populations that both persisted since before the LGM. A summary statistic is calculated from simulated data sets under each model to obtain a distribution of the summary statistic under each respective model. In this scheme, the probabilities of both models are evaluated with respect to the summary statistic calculated from the empirical data (Knowles, 2001).
70
What is a rough method for statistical phylogeography?
1: A summary statistic is calculated from simulated datasets under each model to obtain a distribution of the summary statistic under the different models. The probabilities of each model are then evaluated with respect to the summary statistic estimated from the empirical data. 2: Full likelihood/Bayesian approach. 3: Approximate Bayesian Computation
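
A toy version of approach 1 (in the spirit of rejection-based Approximate Bayesian Computation) might look like the sketch below. Everything in it is an assumption for illustration: the two models are reduced to "lineages can coalesce straight away" (Model A, one refugium) versus "lineages cannot coalesce until the two refugia merge `split_time` generations ago" (Model B). The summary statistic, the parameter values and the tolerance are all made up, and this is not the procedure from Knowles (2001).

```python
import random


def coalescence_time(n_copies, rng):
    """Generations until two lineages coalesce among n_copies gene copies."""
    t = 1
    while rng.random() >= 1.0 / n_copies:
        t += 1
    return t


def summary_stat(model, rng, n_loci=20, n_copies=50, split_time=200):
    """Mean pairwise coalescence time across loci under the given toy model."""
    # Under Model B the two lineages sit in separate refugia until split_time
    delay = split_time if model == "B" else 0
    return sum(delay + coalescence_time(n_copies, rng) for _ in range(n_loci)) / n_loci


def abc_model_choice(observed, n_sims=500, tolerance=20.0, seed=0):
    """Count how many simulations under each model land close to the observed statistic."""
    rng = random.Random(seed)
    accepted = {"A": 0, "B": 0}
    for model in accepted:
        for _ in range(n_sims):
            if abs(summary_stat(model, rng) - observed) <= tolerance:
                accepted[model] += 1
    return accepted


# Hypothetical "empirical" summary statistic
print(abc_model_choice(observed=245.0))
```

The model whose simulations land near the observed value more often is the better supported one; dividing each count by the total number accepted gives a rough estimate of the relative model probabilities.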
71
What is lineage sorting?
Equilibrium between mutation and drift
72
When is lineage sorting fastest?
Lineage sorting effect greatest with small population sizes (more likely to lose alleles by chance)
73
What impacts the number of haplotypes in a population?
Number of haplotypes in a population is a function of current and historical effective population sizes
74
What is a network of haplotypes?
The most common and widespread allele is in the centre, with single mutation steps descending from it. The size of each circle represents the number of organisms with that mutation. Perpendicular lines on an adjoining line can be used to show that multiple mutations have occurred
75
What can impact haplotype spread?
Few species exist as single, undifferentiated populations. Vicariance and dispersal
76
What is vicariance?
Process of separation due to environmental events (eg. formation of mountain ranges, sea-level changes)
77
What is important about vicariance and dispersal?
Vicariance, together with dispersal ability, is an important determinant of a species' natural geographic range
78
What factors affect divergence between populations?
Gene flow between species (inter-specific hybridization). Genetic drift and gene flow
79
What can impact genetic drift and gene flow?
Isolation by distance. Physical barriers
80
What is the use of haplotype trees in phylogeography?
Correlations between haplotype trees and geographical information can allow inference to be made about genetic processes occurring between populations
81
What is a haplotype network?
A diagram in which each number represents a different haplotype, and the size of a circle is approximately proportional to the number of individuals sequenced containing that haplotype
82
What is a general key for a haplotype network?
Solid circles represent haplotypes that were either not sampled, or are extinct. Lines connecting haplotypes indicate single mutational differences. Circles are coloured for reference to their geographic distribution
83
What does a branch mean in a haplotype tree?
Each branch = 1 mutational step
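
The "one branch = one mutational step" idea can be sketched by linking every pair of haplotypes that differ at exactly one site. This is a simplification for illustration: the sequences are hypothetical and assumed to be aligned and of equal length, and real network methods also infer unsampled intermediate haplotypes, which this sketch does not.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def one_step_network(sequences):
    """Return haplotype counts (circle sizes) and edges between haplotypes one mutation apart."""
    counts = Counter(sequences)
    edges = [
        (a, b)
        for a, b in combinations(sorted(counts), 2)
        if hamming(a, b) == 1
    ]
    return counts, edges


# Hypothetical sample: one common haplotype and three rarer ones
sample = ["AAAA"] * 5 + ["AAAT"] * 2 + ["AATT", "CAAA"]
counts, edges = one_step_network(sample)
print(counts)
print(edges)
```

Here the common haplotype AAAA has the most connections and the largest count, which fits the expectation that frequent, interior haplotypes tend to be older.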
84
What happens when you mix geography and gene tree?
Congruence of geography and gene tree – the most ancient haplotypes located at centre of tree and are geographically the most widespread, most recent haplotypes at tips of tree and localised geographically
85
What breaks down the congruence between geography and gene tree?
Congruence is broken down if there is persistence and differential sorting of haplotypes. Gene flow can break it down further
86
What predictions can be made about haplotype networks?
High-frequency haplotypes are likely to be older. Within a network, older haplotypes are more likely to be interior, whereas newer haplotypes are likely to be peripheral. Haplotypes with multiple connections are likely to be older. Older haplotypes are expected to have a broader geographical distribution (more time to disperse). Haplotypes with only one connection are likely to be connected to haplotypes from the same population (evolved recently, so less time to disperse)
87
What did the haplotypes in African Buffalo show about their behaviour?
The Chobe location is geographically distant from the other sites. Genotypes in Chobe come from different places in the haplotype map. FST = 0.08. High levels of migration
88
What did the haplotypes in Impala show about their behaviour?
The Chobe location is geographically distant from the other sites. FST = 0.1. Fragmentation or isolation by distance. Genotypes in Chobe are clustered on the haplotype map
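
FST values like those quoted for the buffalo and impala measure how much of the total genetic variation lies between populations. The sketch below uses one common textbook definition, FST = (HT - HS) / HT, for a single biallelic locus with equally weighted subpopulations. The cards do not say which estimator the original studies used, and the allele frequencies here are made up.

```python
def fst_biallelic(freqs):
    """FST for one biallelic locus from the allele frequency in each subpopulation.

    H_T is the expected heterozygosity of the pooled population and
    H_S is the mean expected heterozygosity within subpopulations.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    if h_t == 0:
        return 0.0  # no variation at all, so no differentiation to measure
    return (h_t - h_s) / h_t


print(fst_biallelic([0.5, 0.5]))  # identical frequencies: no structure
print(fst_biallelic([0.4, 0.6]))  # mild differentiation
print(fst_biallelic([0.0, 1.0]))  # fixed differences: complete structure
```

Values near 0 mean the populations share alleles freely (lots of migration), while values approaching 1 mean they are strongly differentiated.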
89
What is an overview of the biogeography of freshwater fish in the southeastern USA?
Gulf coast and Atlantic coast freshwater drainages. Large faunal differences between the two drainages, with the biggest split between western and eastern drainages
90
What mtDNA pattern among taxa characterises freshwater fish in the S.E. USA?
A pronounced pattern of intraspecific mtDNA concordance among taxa characterises freshwater fish in the southeastern U.S. This is seen as distinct mtDNA mutations depending on the basin, in species such as Bowfin (Amia calva), Spotted Sunfish (Lepomis punctatus) and Redear Sunfish (Lepomis microlophus)
91
What can be said about the factors affecting the distribution of genetic diversity in S.E. USA freshwater fish?
Theoretically, the same factors that are inferred to have influenced the distribution of genetic diversity in southeastern American freshwater fishes, should have had similar effects in the marine and coastal environment.
92
What was the hypothesis proposed for the distinct subpopulations of S.E freshwater fish?
During glacial advances, an enlarged Floridian peninsula may have contributed to the separation of some Atlantic and Gulf coast populations through creation of a rather isolated pocket of estuarine habitat in the Western Gulf of Mexico.
93
What was used to test whether an enlarged Floridian peninsula may have contributed to the separation of fish populations?
10 unrelated species or species complexes analysed using mitochondrial DNA.
94
What did they discover about the subpopulations of the Gulf and Atlantic fish populations?
Overall, among the 10 coastal and marine species or species complexes surveyed, at least 5 and as many as 8 evidence a fundamental mtDNA subdivision involving Atlantic versus Gulf coast populations
95
What does the outcome of the mtDNA sequencing suggest about the splitting of the populations?
This concordance within and among the faunas of the marine and freshwater environments provides a compelling case for a strong influence of historical biogeographic factors within the southeastern United States
96
What species were the exceptions to the genetic difference between Atlantic and Gulf populations?
Two species, the hardhead catfish and the American eel, showed no phylogenetic subdivision between the Gulf and Atlantic
97
What species have shown differences between the two populations?
Horseshoe crab, American Oyster, Toadfish and Black sea bass
98
What is an example of phylogeography in Hawaii?
Clermontia - phylogenetic position can be used to show potential dispersal events, and can then be combined with a molecular clock to show which is most likely
99
What is an overview of the distribution of tigers?
Tigers historically ranged across Eurasia from the Sunda Islands, west through the Indian subcontinent to the Indus river and north along the Pacific seaboard and a wide swath of central Asia from the Russian Far East to eastern Turkey.
100
What is an overview of the distribution of the Caspian tiger?
Caspian tiger thought to be a subspecies separate from others based on morphological grounds. Became extinct in February of 1970 when the last survivor was shot in Hakkari province, Turkey
101
What are the three distinct routes that the Caspian tiger could have taken to reach its Central Asian range?
A) A southern route, via the Indian subcontinent south of the Himalayan plateau. B) A northern route, settling first the Amur region and then traversing Siberia westward, north of the Mongolian steppe. C) Via the historical “Silk Road” through the Gansu corridor, between the Himalayan Plateau and the Mongolian Gobi desert
102
Which tiger subspecies would the Caspian tiger have an affinity with if it followed route A?
A close molecular affinity would exist between Caspian tigers, P. t. virgata, and Bengal tigers, P. t. tigris
103
Which tiger subspecies would the Caspian tiger have an affinity with if it followed route B?
A northerly migration would predict genetic admixture and similarity of P. t. virgata with South China tigers P. t. amoyensis, as a result of range overlap during the postulated migration.
104
Which tiger subspecies would the Caspian tiger have an affinity with if it followed route C?
The Amur tiger (P. t. altaica) and the Caspian tiger (P. t. virgata) are sister taxa to the Indochinese tiger (P. t. corbetti) from which they are separated by six mitochondrial steps (five for P. t. virgata).