Plant diversity and from bone to genome Flashcards by Lina Håkansson

Why is it important to study plants?

We cannot live without plants! We use them for food, building material and fabric/clothing; they produce useful chemicals (secondary metabolites); we burn them for energy (including as fossil fuels); and they produce much of the oxygen we breathe. Also, they are pretty and make us happy! People who can see plants at work report significantly greater job satisfaction than those who can't, and many substances we like come from plants: cocoa, coffee, alcohol, weed, tobacco etc.

Furthermore, plants are eukaryotes like us, and studying them can teach us a lot! Some of the most important discoveries in biology came from plant studies: e.g. the principles of heredity, the cell (first discovered through looking at plants), and the first described viruses, which were purified from plants.

When was the green revolution? What is it?

The green revolution started in the 1950s (1950-1984). During this time, there were major advances in agriculture, such as modern plant breeding that resulted in high-yield varieties of food crops, and the use of synthetic fertilizer. Together these increased world grain production by 160%!

Plants can also be used to inform us about the ancient world. Give two examples of plant-derived proxies in palaeogenetics.

Plant-derived proxies are:
- Pollen
- Macrofossils
- sedaDNA

Plants have evolved to thrive in diverse habitats, give five examples.

Plants have evolved to thrive in:
- Desert climates: drought, heat and irradiation tolerance
- Grassland
- Alpine climates: cold, dry, low light
- Rain forest: Hot, humid, wet
- Aquatic environments
- Swamp forest
etc.

They have also evolved very diverse forms and functions: flowering plants, trees, grasses etc.

How do we think plants evolved?

Plants are thought to have evolved from an ancestral eukaryotic cell (containing mitochondria) that also acquired a cyanobacterium through endosymbiosis, which later became the chloroplast.

Plants originated in the ocean (like all other life), but when did they colonize land, and what were the major difficulties they encountered there?

Plants colonized land ~600 million years ago and were met by a very harsh environment: first of all high UV irradiation, gravity, pathogens and wind. Later came challenges with herbivores, competition, seed dispersal, pollination etc.

Plants are a highly diverse group, explain the six molecular evolution mechanisms that facilitated this.

Molecular evolution mechanisms:

Promoter duplication: If a promoter is duplicated, the gene it controls can become even more active -> increased expression levels. This can also alter the tissue-specific expression pattern.
Missense mutation: A mutation that changes the amino acid sequence of the resulting protein, which can lead to new functions.
Nonsense mutation (premature stop codon): If a mutation leads to a stop codon instead of an amino acid, it results in a truncated gene product -> often leads to the gene being non-functional.
Intergenic space deletion: Leads to gene fusion, which can result in a new function or a non-functional product.
Transposable elements: Insertion or deletion of DNA, which can lead to new or lost function and adds to overall variation.
Gene duplication: When an ancestral gene is duplicated, you have two copies of the same gene (paralogous genes) and the function is redundant. This allows changes to be tolerated in one of the copies, which can lead to new functions (i.e. neofunctionalization).

All of these mechanisms drive evolution in animals too, but unlike animals, plants are highly adaptive and can tolerate these mutations to a much higher extent!
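
The difference between a missense and a nonsense mutation can be illustrated with a small sketch. It uses the standard genetic code; the example sequences and the function names `translate` and `classify_point_mutation` are made up for illustration, and the classification is simplified (it assumes a single base change inside the coding sequence and does not handle mutations that remove a stop codon).

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify_point_mutation(reference, mutant):
    """Simplified classification of a single base change (hypothetical helper)."""
    ref_protein = translate(reference)
    mut_protein = translate(mutant)
    if mut_protein == ref_protein:
        return "silent"
    if len(mut_protein) < len(ref_protein):
        return "nonsense"  # premature stop codon -> truncated product
    return "missense"      # same length, different amino acid


reference = "ATGGAATGGTAA"  # toy gene: Met-Glu-Trp-stop
print(classify_point_mutation(reference, "ATGGTATGGTAA"))  # GAA -> GTA
print(classify_point_mutation(reference, "ATGGAATGATAA"))  # TGG -> TGA
```

The first mutant swaps one amino acid (missense), while the second introduces a stop codon and truncates the protein (nonsense).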

Name three use cases of plants in aDNA studies.

Plants can be useful for aDNA studies in many different contexts, for example:

Reconstructing past ecosystems: Which plant species were there? What can they say about the environment?
Ancient ecosystem dynamics: How has the ecosystem changed in the past with the introduction of new plant species? Competition?
Biogeography: Plant colonization can help us understand past geographical changes like tectonic plate movements.
Human history and agriculture: aDNA can reveal the wild ancestors of domesticated crops. It can help us understand what was selected for and can be useful in bringing back genes that help with tolerance, e.g. to temperature, drought etc. It can also be used to study human interactions and movements, by seeing how domesticated crops spread.

Historical climate change also shaped plant diversity and evolution, name three biotic and three abiotic stressors for plants.

Biotic stress: Pathogens, Insects, Herbivores

Abiotic stress: Drought, Flood, Extreme temperatures, UV irradiation

Plants are not static: they respond quickly to environmental stimuli and might be even more sensitive than we are. Because they are sessile and cannot move away, they need to respond fast.

Name five types of samples that are suitable for genome sequencing.

Types of samples suitable for genome sequencing:
- Bone
- Tooth
- Nail/claw
- Horn/antler
- Hair
- Skin/tissue
- Pollen & seed
- Eggshell
- Wood
- "Chewing gums"
- Sediments
- Coprolites

Name five different sources for samples that are suitable for genome sequencing.

Sources of samples suitable for genome sequencing:
- Museum collections
- Field work/excavations
- Caves
- Permafrost
- Erosion points in rivers

For all field work, preparation and planning are key, as is learning where to look (erosion points, for example).

Is aDNA rare?

Yes! Often, <1% of the sequences obtained from ancient samples are endogenous; much of the rest is microbial ("but might still be interesting!").

A sample has generally been affected by the environment for a long time after its best-before date:
- DNA damage (fragmentation & alteration)
- Contamination by non-target organisms
- Inhibitory substances

Because of this, we require lab methods that can optimize endogenous DNA recovery, reduce contamination & remove inhibitors.

There is a big variation in DNA preservation between substrates, what is the most important factor for good preservation? Which substrates are the “holy grail” because of this?

The most important factor for DNA preservation is density! Because of their extremely high density, the petrous bone (the dense, hard part of the temporal bone that houses the inner ear), mainly the cochlea, and tooth cementum (the outer layer of the tooth root) are the holy grail, with much higher endogenous content than other bones.

Unfortunately, these are rare to find, but when you do it's great! Bones in general are fairly good, so no worries if you don't find petrous bone.

Why is density so important in DNA preservation?

The denser the substrate, the less exogenous DNA can leach into it.

Explain the workflow from bone to genome briefly.

To get from bone to genome, there are 6 steps of wet lab (before dry lab):

Remove surface contamination: Either by wiping with bleach (sodium hypochlorite), by UV irradiation, or by removing the outer layer with a drill. This ensures that you have a clean surface.
Collect bone material: Drill for powder (faster, but risks heat damage, so use a low speed and pause in between) or cut off a piece and pulverize it (less risk of heat damage). Generally, the higher the drill speed, the less DNA you get.
Pre-treatment: Bleach and/or pre-digestion. This step is not always needed and should be skipped when it isn't, because it is pretty harsh and damages the DNA: you get less DNA but also much less contamination, so it's a trade-off. You also lose complexity (more clones of the same fragments and none of others), so if you are sequencing deeply it is better to accept low endogenous content and keep high complexity; otherwise pre-treatment is fine.
DNA extraction: Digest the bone powder with proteinase K and heat to decalcify the bone and digest proteins and fats, which releases the DNA.
DNA purification:
- Binding buffer (acidic, high salt) + Silica = Immobilize DNA on membrane/beads
- Wash buffer (ethanol) = Remove cellular & inorganic remains from the solution
- Elution buffer (basic, low salt) = Release DNA from silica to a final "pure" solution
So the DNA is first in suspension, then bound to silica (membrane or coated beads), and finally ends up in a purified solution for subsequent analysis.
Amplification & library build: Add adapters and amplify the library.

After the library build, you clean and purify the library, check its size to make sure that you have not just amplified adapters, and then send it for sequencing.

Describe the state of the aDNA after purification. Are there any problems with this state?

aDNA is heavily fragmented and has overhangs at the ends of the fragments. The bases exposed in the single-stranded (ss) overhangs are much more susceptible to reacting with water.

For example, when cytosine reacts with water (deamination), it is converted into uracil (which should not exist in DNA). DNA polymerase interprets uracil as thymine, so an adenine is added on the complementary strand. When this is then amplified, a thymine is inserted at the position of the original cytosine, resulting in a C -> T base substitution. This is a problem, but it is also used to authenticate aDNA (as modern DNA doesn't have an overrepresentation of T and underrepresentation of C at the ends).

Is there a way to handle the problem with base substitutions in aDNA? When is this useful?

Yes! The solution is to use the USER enzyme (uracil DNA glycosylase + endonuclease VIII), which cuts uracil out of the strands -> no more damage. But this also means that the C -> T base substitution can no longer be used for authentication. USER treatment is therefore most useful when authentication matters less, for example when studying an extinct animal such as the mammoth, since there are no modern mammoths that could contaminate the sample.

There are 2 categories of aDNA library build, which?

2 categories of aDNA library build:
- Double-stranded: Faster & cheaper
- Single-stranded: More efficient (at least without USER treatment)

Normal library preparation kits are not suitable for aDNA, as they are inefficient at converting degraded DNA fragments into finished libraries -> problematic for samples with little starting material! Specific protocols have therefore been developed (more time-consuming, but with a higher conversion rate).

How do ssDNA and dsDNA libraries differ in terms of library preparation?

For dsDNA libraries, the aDNA damage creates overhanging ends, so these need to be repaired in order to get a complete dsDNA molecule to sequence. To do this, you ligate universal adapters to the 5’ ends and use those to “fill in” (extend) the 3’ ends until the whole DNA molecule is double-stranded (the classic method developed by Meyer & Kircher).

For ssDNA libraries, you simply denature the strands; there are no more overhangs because the strands are now separated. A major benefit is that damaged molecules can be used as templates = higher complexity. After strand separation, adapters are ligated to all ssDNA molecules, which are then immobilized on magnetic beads. A primer can then bind and extend, and an adapter is ligated to the other end.

Once the adapters are attached, you use amplification primers that bind to the adapters (universal, so you only need one pair) and carry unique indexes (so that you know which read is forward and which is reverse, and which sample each belongs to if you pool samples). Now you can amplify the library before sequencing.

If you are only interested in sequencing specific regions of the DNA, what method can you use? How does it work in general?

If you only want to sequence specific regions of the DNA, you can use hybridization capture. Hybridization capture uses baits: short biotinylated sequences that match the target DNA, e.g. from a specific species. You then add streptavidin-coated magnetic beads to fish out the target DNA, e.g. mtDNA, nuclear exomes or SNPs. This increases the endogenous content quite a lot, but since the baits are constructed from high-quality modern genomes, you risk missing variation that is now lost from the gene pool, and you risk geographic misrepresentation.

What is “multiplexing”?

Multiplexing is when you pool individually indexed DNA libraries before sequencing. Because of the indexes, you know which sample each sequence comes from, so after sequencing you de-multiplex the reads bioinformatically to sort them by sample before analysis.
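
De-multiplexing can be sketched roughly as below. This is only a minimal illustration: the index sequences, sample names and the function `demultiplex` are hypothetical, and real pipelines also tolerate sequencing errors in the index and usually work with two indexes per read.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Sort (index, sequence) pairs into per-sample lists of sequences."""
    by_sample = defaultdict(list)
    for index, sequence in reads:
        # Reads with an unknown index go into a separate bin.
        sample = index_to_sample.get(index, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Hypothetical indexes and samples, for illustration only.
index_to_sample = {"ACGTAC": "bone_1", "TTGCAA": "tooth_2"}
pooled_reads = [
    ("ACGTAC", "GATTACAGATTACA"),
    ("TTGCAA", "CCGGTTAACCGG"),
    ("ACGTAC", "TTAGGCATTAGG"),
    ("GGGGGG", "ACACACACAC"),
]
print(demultiplex(pooled_reads, index_to_sample))
```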

When you receive the genome sequencing data back, in what format is it?

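The sequence data output is in FASTQ format. Each read takes up four lines: the first line (starting with @) is the read identifier with some practical info, the second line is the sequence, the third is a separator (+), and the fourth holds the per-base quality scores.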
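
A FASTQ record conventionally spans four lines (identifier, sequence, a "+" separator and a quality string). The sketch below reads such records; the function name `read_fastq` and the example reads are made up, and it assumes well-formed, unwrapped four-line records.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) for each four-line record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with '+'
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


# Toy data standing in for a real FASTQ file.
example = io.StringIO(
    "@read1 sample=A\nACGTT\n+\nIIII?\n"
    "@read2 sample=A\nGGCA\n+\nII?5\n"
)
for identifier, sequence, quality in read_fastq(example):
    print(identifier, sequence, quality)
```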

When getting the sequencing data back, the first thing to do is to check the quality. What three things do we usually look at?

Quality control:

Mean quality score: Over 30 is good. QS = 30 means an expected 1 error per 1000 nucleotides (99.9% base-call accuracy). This is usually over 30 across the board.
Per base N content: When the machine gets conflicting base calls, it assigns an N. This is usually very low for most samples today.
Adapter content: Allows you to see approximately how long your target DNA is.
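
The quality score is a Phred score, Q = -10 * log10(P), where P is the probability that the base call is wrong. The sketch below converts scores to error probabilities and decodes a FASTQ quality string, assuming the common Phred+33 ASCII encoding; the function names are chosen for illustration.

```python
def phred_to_error_probability(quality_score):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-quality_score / 10)


def decode_quality(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(character) - offset for character in quality_string]


def mean_quality(quality_string, offset=33):
    """Mean Phred score of one read."""
    scores = decode_quality(quality_string, offset)
    return sum(scores) / len(scores)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))
print(decode_quality("II?5"), mean_quality("II??"))
```

So Q20 corresponds to 1 expected error per 100 bases, Q30 to 1 per 1000 and Q40 to 1 per 10 000.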

What is done to the sequencing data after quality control?

After quality control, we perform adapter trimming and merging of overlapping reads. After this the QS is usually even higher, and the reads can be of varying lengths. Here you often plot the read length: for aDNA it is usually around 30-45 bp, and very long reads are likely due to modern contamination.
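
Adapter trimming can be sketched as below. This is a simplified, exact-match version written for illustration (dedicated trimming tools also allow mismatches and merge read pairs); the functions `trim_adapter` and `length_histogram` are hypothetical, and the adapter shown is just the start of a commonly used Illumina adapter sequence, used here as an example.

```python
from collections import Counter


def trim_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter, including a partial adapter at the read end."""
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Otherwise check whether the read ends with the start of the adapter.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


def length_histogram(reads):
    """Count how many reads there are of each length."""
    return dict(Counter(len(read) for read in reads))


adapter = "AGATCGGAAGAGC"  # example adapter prefix
raw_reads = ["ACGTACGTAGATCGGAAGAGCTTT", "ACGTACGTAGATC", "ACGTACGTTTGA"]
trimmed = [trim_adapter(read, adapter) for read in raw_reads]
print(trimmed)
print(length_histogram(trimmed))
```

Plotting the histogram of trimmed read lengths is then a quick way to spot an unexpectedly long-read fraction.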

Where in a read is the quality often the lowest?

The quality is often the lowest towards the ends.

There are two approaches to assembling a genome from the reads, which?

You can assemble the genome from the sequencing data by:
- Mapping to a reference genome: If the species is not extant, use the most closely related genome there is good data for (or use several = variant graph mapping). There are several different methods, for example genome indexing with seeds: small chunks of sequence whose locations an algorithm can look up and match, which is much faster.
- De novo assembly: Recreating the genome just by aligning overlapping sequences. Much harder and takes longer, but necessary when there is no reference or the reference is unknown (more common for microbes). The bigger the genome, the harder it is.
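
The idea of indexing a reference with seeds can be sketched as follows. This is a toy seed-and-extend mapper under simplifying assumptions (exact seeds, mismatches only, no indels, forward strand only), not how any particular mapping tool is implemented; all names and sequences are made up.

```python
from collections import defaultdict


def build_seed_index(reference, seed_length):
    """Record every position at which each seed occurs in the reference."""
    index = defaultdict(list)
    for start in range(len(reference) - seed_length + 1):
        index[reference[start:start + seed_length]].append(start)
    return index


def map_read(read, reference, index, seed_length, max_mismatches=1):
    """Return reference start positions where the read fits (no indels)."""
    hits = set()
    for offset in range(len(read) - seed_length + 1):
        seed = read[offset:offset + seed_length]
        # Look up the seed instead of scanning the whole reference.
        for seed_start in index.get(seed, []):
            start = seed_start - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)


reference = "ACGTTGCATGCCGATAGGCTA"
index = build_seed_index(reference, seed_length=4)
print(map_read("GCATGCCGAT", reference, index, seed_length=4))  # exact match
print(map_read("GCATGACGAT", reference, index, seed_length=4))  # one mismatch
```

Only the candidate positions suggested by a seed hit are compared base by base, which is what makes the lookup fast.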

For sequencing data and genome assembly the terms breadth and depth are used a lot, explain these terms.

The breadth of coverage tells you how much of the genome is covered by the reads. If every base has been sequenced at least once = 100% breadth. The depth of coverage tells you how often the bases have been sequenced: with a depth of 1x, each base has been sequenced once on average. The higher the depth, the more certain you can be that the bases are correct, so for assembling a genome from scratch you want really high coverage. If you are only looking at specific regions, in humans for example, 0.1x can be sufficient. The higher the coverage the better, but aDNA is usually not high coverage.
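
Breadth and depth can be computed from mapped read positions roughly as below. This is a minimal sketch with a made-up function name and toy alignments, given as (start, length) pairs with 0-based starts.

```python
def coverage_stats(genome_length, alignments):
    """Return (breadth, mean depth) for reads given as (start, length) pairs."""
    depth_per_base = [0] * genome_length
    for start, length in alignments:
        for position in range(start, min(start + length, genome_length)):
            depth_per_base[position] += 1
    covered = sum(1 for depth in depth_per_base if depth > 0)
    breadth = covered / genome_length                 # fraction of bases seen at least once
    mean_depth = sum(depth_per_base) / genome_length  # average times each base was seen
    return breadth, mean_depth


# Toy example: a 10 bp "genome" and three reads.
breadth, mean_depth = coverage_stats(10, [(0, 4), (2, 4), (8, 2)])
print(breadth, mean_depth)
```

In this toy case the mean depth is 1x while the breadth is only 80%, which shows that 1x depth "on average" does not mean every base was actually sequenced.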

Are all reads usually mapped to the reference genome when working with aDNA?

No, aDNA samples usually have low amounts of target DNA and a lot of microbial DNA, so most reads are thrown out because they are from microbes.

What is indel realignment?

Indel realignment is a second round of alignment that you do once you have mapped the reads and thrown out those that do not map. It uses an algorithm that more accurately aligns reads with gaps relative to the reference genome, and in the final alignment it fills in gaps using all reads.

How can you tell the difference between good and poor library complexity?

If the complexity is poor, there will be many duplicate reads mapping to the exact same sequence with a lot of gaps in between. If the sample has good complexity, the sequences will overlap and cover more of the genome without being exact duplicates.
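
One rough way to quantify this is the fraction of mapped reads that are exact duplicates. The sketch below treats reads with the same start, end and strand as duplicates; the function name and the toy alignments are hypothetical.

```python
def duplication_rate(alignments):
    """Fraction of mapped reads that duplicate another read's (start, end, strand)."""
    if not alignments:
        return 0.0
    duplicates = len(alignments) - len(set(alignments))
    return duplicates / len(alignments)


# Poor complexity: the same two fragments sequenced over and over.
poor = [(100, 140, "+")] * 8 + [(500, 545, "-")] * 2
# Good complexity: overlapping reads with different start and end points.
good = [(100, 140, "+"), (120, 163, "-"), (150, 188, "+"), (170, 215, "+"), (200, 236, "-")]
print(duplication_rate(poor), duplication_rate(good))
```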

How can we assess the post mortem damage pattern from the reads?

If we plot how often the bases in the reads differ from the reference at each position along the reads, we see that they differ more frequently at the ends of the reads (where the ends are often single-stranded and the bases are exposed to post mortem damage). This results in a "smiley plot", so if you see the smile-shaped plot, it is evidence that the sequenced DNA is ancient.
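
The values behind such a plot can be sketched as below for one read end. This toy version only counts C -> T mismatches at the first few positions of gap-free alignments; the function name and the read/reference pairs are invented for illustration, and a full analysis would also look at the other read end.

```python
def ct_mismatch_frequency(aligned_pairs, positions=5):
    """Per position from the read start: fraction of reference C read as T."""
    c_total = [0] * positions
    c_to_t = [0] * positions
    for read, reference in aligned_pairs:
        for i in range(min(positions, len(read))):
            if reference[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / c_total[i] if c_total[i] else 0.0 for i in range(positions)]


# Toy (read, reference) pairs of equal length, aligned without gaps.
pairs = [
    ("TTGCA", "CTGCA"),
    ("TAGCC", "CAGCC"),
    ("CCGTA", "CCGTA"),
    ("TCGAC", "CCGAC"),
]
print(ct_mismatch_frequency(pairs))
```

An elevated frequency at the first position that drops off towards the read interior is one half of the smile-shaped pattern.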

What is important to account for when using different algorithms for mapping?

It is important to keep in mind that different algorithms have their respective biases, for example the calculations used are based on different thresholds of allowed mismatches, and the longer the read, the more mismatches are allowed. Some algorithms are better for specific read lengths than others, so choose wisely.

Three bioinformatic methods are used to avoid including contaminating sequences during genome assembly, which?

Methods that are used to avoid including contaminating sequences during genome assembly are:
- Bacterial filtering: Checking the reads against a database of bacterial sequence data. Often used to remove bacterial DNA that might otherwise accidentally map to the reference genome (which can happen for short reads).
- Competitive mapping: By mapping reads to sequences that are conserved between human and the reference species (e.g. elephant for mammoths) and removing those from the assembly, you avoid the risk of modern contamination.
- Filtering for post mortem damage: There are algorithms that check how many mismatches there are at the ends of reads and assign each read a PMD score. The higher the PMD score, the more likely the read comes from an ancient molecule. By only working on reads that have positive PMD scores, you avoid modern contamination.

Explain the term SNP calling.

SNP calling allows you to look at sites where most reads show a different base than the reference genome. The higher the depth of coverage, the more sure you can be that the difference is real and not just a sequencing error. This also allows you to see indels and heterozygous sites: if there are two different bases at a site that both differ from the reference, it is likely a heterozygous site.
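
A very naive caller for a single site could look like the sketch below. The thresholds `min_depth` and `min_fraction` are arbitrary values chosen for illustration, and real SNP callers use statistical models that also take base and mapping quality into account.

```python
from collections import Counter


def call_site(reference_base, observed_bases, min_depth=4, min_fraction=0.2):
    """Naive call for one site from the bases of all reads covering it."""
    depth = len(observed_bases)
    if depth < min_depth:
        return "no call"  # too few reads to trust any difference
    counts = Counter(observed_bases)
    # Keep only bases seen often enough to be unlikely sequencing errors.
    alleles = sorted(base for base, n in counts.items() if n / depth >= min_fraction)
    if alleles == [reference_base]:
        return "reference"
    if len(alleles) == 1:
        return "homozygous SNP: " + alleles[0]
    return "heterozygous: " + "/".join(alleles)


print(call_site("C", "CCCCCCCT"))  # a single T among many C reads
print(call_site("C", "TTTTTT"))    # all reads differ from the reference
print(call_site("C", "CCCTTT"))    # two bases in similar proportions
print(call_site("C", "TT"))        # depth too low
```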

How can SNP calling outputs be useful?

By looking at combinations of SNP variants, one can infer how closely related an individual is to another group with a certain set of SNP variants. More similar SNPs = more closely related.
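
As a very rough illustration of the idea, the sketch below computes the fraction of shared SNP variants between two individuals at the sites typed in both. The site names, genotypes and the function `shared_allele_fraction` are hypothetical, and real analyses use population-genetic methods rather than a simple match count.

```python
def shared_allele_fraction(genotypes_a, genotypes_b):
    """Fraction of SNP sites, typed in both individuals, with the same variant."""
    sites = [site for site in genotypes_a if site in genotypes_b]
    if not sites:
        return None  # no overlapping sites to compare
    matches = sum(genotypes_a[site] == genotypes_b[site] for site in sites)
    return matches / len(sites)


# Hypothetical SNP calls for three individuals.
ancient = {"snp1": "A", "snp2": "T", "snp3": "G", "snp4": "C"}
group_x = {"snp1": "A", "snp2": "T", "snp3": "G", "snp4": "T"}
group_y = {"snp1": "G", "snp2": "C", "snp3": "G", "snp4": "T"}
print(shared_allele_fraction(ancient, group_x))
print(shared_allele_fraction(ancient, group_y))
```

In this toy example the ancient individual shares more variants with `group_x` than with `group_y`, which would suggest a closer relationship to `group_x`.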

(35 cards)