Lecture 9 Flashcards
Substitution
F -> S
Changes shape of surface of protein
Problem with looking at two alleles is you don’t know which was the original
Ancestry of descent
Not linear
Different descendants originate from the same common ancestor but have evolved different protein sequences across multiple generations
Attempting to identify tree structure
For example:
True ancestor:
FCYGQLVFTVKEAA
Inferred ancestor:
S A
FWYGRLVFTVKEAA
Descendants:
FWYGRLVFTVKAAA
FWYGRLVSTVKEAA
Difficult to identify true ancestor via tracing
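The example above can be checked with a short Python sketch that lists the mismatching positions between two aligned sequences. The function name `differences` is an illustrative choice, not something from the lecture.

```python
def differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in seq_a, residue in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


true_ancestor = "FCYGQLVFTVKEAA"
inferred_ancestor = "FWYGRLVFTVKEAA"
descendants = ["FWYGRLVFTVKAAA", "FWYGRLVSTVKEAA"]

# Each descendant differs from the inferred ancestor at one position.
for seq in descendants:
    print(seq, differences(inferred_ancestor, seq))

# Changes shared by both descendants are invisible when only descendants are compared.
print(differences(true_ancestor, inferred_ancestor))
```

Both descendants carry W at position 2 and R at position 5, so an ancestor inferred from them keeps those residues, even though the true ancestor had C and Q there.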
Class 1 GPCRs
Total: 811
523 were olfactory GPCRs which were identified by evolutionary relationships
Alignment
Begin with unaligned protein sequences, which are then aligned
Alignments can be scored using either substitution matrices or percentage identity (the share of aligned positions that are identical)
Distance matrix
After alignment, a distance matrix is built holding the genetic distance between every pair of species on the tree, e.g. 0.074 substitutions per site between pig and cow vs 0.618 between pig and virus
p-distance is the proportion of aligned positions that differ between two sequences, e.g. 0.074 would mean 7.4% of aligned positions differ between cow and pig
p-distance = 1 - fractional identity (e.g. 92.6% identity gives 1 - 0.926 = 0.074)
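A minimal Python sketch of the p-distance calculation and the resulting distance matrix. Skipping positions where either sequence has a gap (`-`) is an assumption made here; the notes do not say how gaps are treated. The sequence names and the function names are illustrative.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: positions with a gap in either sequence are ignored.
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in compared) / len(compared)


def distance_matrix(alignment: dict[str, str]) -> dict[frozenset, float]:
    """Map each unordered pair of sequence names to its p-distance."""
    return {
        frozenset((x, y)): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


# The two descendant sequences from the earlier example differ at 2 of 14 positions.
alignment = {"desc1": "FWYGRLVFTVKAAA", "desc2": "FWYGRLVSTVKEAA"}
matrix = distance_matrix(alignment)
print(matrix[frozenset(("desc1", "desc2"))])  # 2 / 14
```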
BLOSUM62 substitution matrix
- Common substitutions score high (positive values)
- Rare substitutions score negatively
For example, alanine -> alanine gives a score of 4
R->R gives score of 5
A with R gives score of -1
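As a sketch of how these scores are used, the Python below sums pair scores over an ungapped alignment. Only the three values quoted above are included; the dictionary `BLOSUM62_SUBSET` and the function names are illustrative, and a real analysis would load the full 20 x 20 matrix.

```python
# Only the three BLOSUM62 entries quoted in these notes (the full matrix is 20 x 20).
BLOSUM62_SUBSET = {("A", "A"): 4, ("R", "R"): 5, ("A", "R"): -1}


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def alignment_score(seq_a: str, seq_b: str) -> int:
    """Sum the pair scores over every column of an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


identical = alignment_score("AR", "AR")  # A/A + R/R
mixed = alignment_score("ARA", "AAR")    # A/A + R/A + A/R
print(identical, mixed)
```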
What forms the closest pair
Species with the smallest p-distance
Then next animal with next smallest p-distance added and so on
This clustering method is called UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
Other types of methods can use median or weighted methods
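The clustering steps above can be sketched in Python as follows. This is a minimal UPGMA that returns only the tree shape as nested tuples, without branch lengths. The pig-cow and pig-virus distances are the ones quoted in these notes; the cow-virus distance is a hypothetical value added so the example runs.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster by repeatedly merging the closest pair; returns a nested-tuple tree.

    distances maps frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of species in it
    d = dict(distances)
    while len(sizes) > 1:
        # Closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance to every other cluster is the size-weighted arithmetic mean.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


distances = {
    frozenset(("pig", "cow")): 0.074,
    frozenset(("pig", "virus")): 0.618,
    frozenset(("cow", "virus")): 0.600,  # hypothetical value, not from the lecture
}
tree = upgma(["pig", "cow", "virus"], distances)
print(tree)
```

Pig and cow are joined first because they have the smallest distance, and the virus is then attached to that pair.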
Bootstrapping
Percentage measure of how confident we are in a given part (branch/clade) of a phylogenetic tree
High values (e.g., > 70%) → Strong support for the branch/clade.
Moderate values (50–70%) → Moderate support (needs caution).
Low values (< 50%) → Weak or unreliable support.
Process of bootstrapping
Start with multiple sequence alignment (MSA).
Resample the columns randomly with replacement to create a new “pseudo-alignment” the same size as the original.
“With replacement” means some columns can appear multiple times, while others might not appear at all.
Rebuild a phylogenetic tree using this pseudo-alignment.
Repeat this process many times (e.g., 100, 500, 1000 times).
For each branch (clade) in the original tree, count how many bootstrap trees contain the same group.
Assign a bootstrap value (support value) to each branch:
\[ \text{Bootstrap support} = \frac{\text{Number of times clade appears}}{\text{Total number of replicates}} \times 100 \]
Example: if a clade appears in 950 out of 1000 replicates, its bootstrap support is 95%.
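The procedure above can be sketched in Python. The "tree builder" here, `closest_pair`, is a deliberately tiny stand-in (an assumption for illustration) that reports a single clade: the two sequences with the fewest mismatches. Any real tree-building method could be plugged in instead, provided it returns a set of clades. All function names are illustrative.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw columns with replacement to build a pseudo-alignment of the same size."""
    length = len(alignment[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def closest_pair(alignment):
    """Toy stand-in for a tree builder: the only clade is the most similar pair."""
    def mismatches(pair):
        i, j = pair
        return sum(a != b for a, b in zip(alignment[i], alignment[j]))
    return {frozenset(min(combinations(range(len(alignment)), 2), key=mismatches))}


def bootstrap_support(clade_count, replicates):
    """Percentage of replicates in which the clade appears."""
    return 100 * clade_count / replicates


def bootstrap(alignment, build_clades, replicates=200, seed=0):
    rng = random.Random(seed)
    counts = {clade: 0 for clade in build_clades(alignment)}  # clades of the original tree
    for _ in range(replicates):
        found = build_clades(resample_columns(alignment, rng))
        for clade in counts:
            counts[clade] += clade in found
    return {clade: bootstrap_support(n, replicates) for clade, n in counts.items()}


# True ancestor and the two descendants from the earlier example.
alignment = ["FCYGQLVFTVKEAA", "FWYGRLVFTVKAAA", "FWYGRLVSTVKEAA"]
support = bootstrap(alignment, closest_pair)
print(support)
print(bootstrap_support(950, 1000))
```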
Summary of tree building
- Start with an alignment
- The alignment is scored with some kind of measure, e.g. p-distance (the proportion of differing positions between species, i.e. 1 - fractional identity) OR a substitution matrix like BLOSUM62, which scores each substitution by how common/rare it is
Tree-building:
- Simple: UPGMA, neighbour joining
- Parsimony-based: Maximum parsimony, minimum evolution - produces lots of trees and selects best one
- Statistical: Maximum likelihood, Bayesian - creates one tree and attempts to improve
- Can be rooted or unrooted
- Conduct bootstrapping or another statistical confidence measure (Bayesian)
Dating trees
Time Tree
Calibrate tree
Example:
63.1 million yrs ago for common ancestor of pigs and cows
Calibrate the tree to identify divergence
Results:
Mammals-Fish - 400.1M years ago
Pig-Cow - 61.3M yrs ago
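Molecular clock to date divergence, using the Pig-Cow example
Count the number of changes between the two species
Divide by 2 (changes accumulate along both lineages), then by the length of the alignment - gives substitutions per site
Divide by 63 (million years since the pig-cow common ancestor) - gives substitutions per site per million years
For the next node, divide its substitutions per site by this rate - gives the date of that node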
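The arithmetic can be sketched in Python. The change counts and the alignment length below are hypothetical numbers chosen to make the example easy to follow; only the 63 million year calibration comes from these notes.

```python
def substitutions_per_site(n_changes: float, alignment_length: int) -> float:
    """Changes per site along one lineage: halve the pairwise count, then normalise."""
    return n_changes / 2 / alignment_length


def clock_rate(n_changes: float, alignment_length: int, calibration_myr: float) -> float:
    """Substitutions per site per million years, from a pair with a known split date."""
    return substitutions_per_site(n_changes, alignment_length) / calibration_myr


def node_age(n_changes: float, alignment_length: int, rate: float) -> float:
    """Estimated age (million years) of the common ancestor of another pair."""
    return substitutions_per_site(n_changes, alignment_length) / rate


ALIGNMENT_LENGTH = 1000   # hypothetical
PIG_COW_CHANGES = 63      # hypothetical count of differences between pig and cow
OTHER_PAIR_CHANGES = 400  # hypothetical count for a deeper node

rate = clock_rate(PIG_COW_CHANGES, ALIGNMENT_LENGTH, calibration_myr=63.0)
age = node_age(OTHER_PAIR_CHANGES, ALIGNMENT_LENGTH, rate)
print(rate, age)
```

With these made-up counts the rate comes out near 0.0005 substitutions per site per million years, and the deeper node is dated to roughly 400 million years.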
Why may molecular clock not be exactly correct?
- Alignment may be (partially) incorrect
- Insufficient information to calibrate molecular clock
- Molecular clock may not tick at a regular rate in the gene (substitution rate varies over time or between lineages)
- Wrong substitution model
- Choose carefully, align well, parametrise carefully, cross fingers
Maximum parsimony method
Obtain a multiple sequence alignment (MSA)
Align your DNA, RNA, or protein sequences.
List all possible tree topologies
Draw all possible unrooted trees for your species.
(Example: 4 species → 3 unrooted trees.)
For each tree, evaluate the number of changes
For each site (column), find the minimum number of evolutionary changes needed.
Use methods like Fitch’s algorithm to do this efficiently.
Sum the changes across all sites
Add up the total number of changes for each tree.
Select the tree with the fewest total changes
The tree requiring the fewest steps is the Maximum Parsimony tree.
If more than one tree has the same number, they are equally parsimonious.
(Optional) Use software for larger datasets
Tools like MEGA, PAUP*, PHYLIP, or TNT can automate tree building.
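For four species the whole procedure is small enough to sketch in Python. The code below scores each of the 3 unrooted trees with Fitch's algorithm and picks the one needing the fewest changes. The four short sequences are hypothetical, and each unrooted tree is rooted on its internal branch purely for counting, which does not change the parsimony score.

```python
def fitch(tree, column):
    """Return (possible ancestral states, minimum changes) for one alignment column.

    tree is a nested tuple; leaves are integer indices into the column.
    """
    if isinstance(tree, int):
        return {column[tree]}, 0
    left_states, left_changes = fitch(tree[0], column)
    right_states, right_changes = fitch(tree[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No state in common: one change is needed on this part of the tree.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all columns."""
    return sum(fitch(tree, column)[1] for column in zip(*alignment))


# Hypothetical aligned sequences for species 0, 1, 2 and 3.
alignment = ["AAGT", "AAGC", "CTGT", "CTGC"]

# The three possible unrooted trees for four species.
trees = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

scores = [parsimony_score(tree, alignment) for tree in trees]
best_tree = trees[scores.index(min(scores))]
print(scores, best_tree)
```

Here the first tree, which groups species 0 with 1 and 2 with 3, needs 4 changes, against 5 and 6 for the other two, so it is the maximum parsimony tree.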