# Molecular Phylogenetics Flashcards

What do these mean? Taxa, clades, branches, nodes, roots

Taxa: entities being compared

Clades: groups of taxa sharing a common ancestor

Branches: reflect evolutionary change

Nodes: points where branches meet

Root: oldest point on the tree

What are the 4 aspects of a tree?

Topology (branching order)

Branch lengths (indication of genetic change)

Root (oldest point on tree)

Confidence (bootstraps/probabilities)

What models of sequence evolution are there?

Jukes & Cantor model: assumes all nucleotides are equally frequent and all changes equally probable, K = -0.75 ln(1 - 4d/3), where d is the observed proportion of sites that differ

Problem: not all changes are equally likely; some bases are more frequent than others and different kinds of substitution occur at different rates

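Kimura 2-parameter model: allows different rates for transitions and transversions, with a higher rate for transitions such as C↔T, K = -0.5 ln[(1 - 2p - q)(1 - 2q)^0.5], where p and q are the proportions of sites differing by a transition and by a transversion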

Tamura-Nei model: allows different rates for purine transitions (A↔G), pyrimidine transitions (C↔T) and transversions, and allows unequal base composition
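The two distance formulas above can be tried out with a short Python sketch. The function names (`count_differences`, `jukes_cantor`, `kimura_2p`) are made up for illustration, and the sketch assumes two aligned DNA sequences of equal length, skipping any column that contains a gap or ambiguous base.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (p, q): proportions of transition and transversion differences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguous bases are ignored
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T stay within one chemical class: transitions
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions / compared, transversions / compared


def jukes_cantor(d):
    """Jukes & Cantor distance from the observed proportion of differences d."""
    return -0.75 * math.log(1 - 4 * d / 3)


def kimura_2p(p, q):
    """Kimura 2-parameter distance from transition (p) and transversion (q) proportions."""
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


p, q = count_differences("AAAAAAAAAA", "GAAAAAAAAC")
print(jukes_cantor(p + q), kimura_2p(p, q))
```

Both corrected distances come out larger than the raw proportion of differences, because they allow for multiple changes at the same site. When transitions make up exactly one third of the differences, the two formulas give the same value.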

How do rates vary in molecular evolution?

Rates vary among genomes - should always use sequences from the same genome to calculate distances

Rates vary among proteins - should always use the same gene/protein to calculate distances, and the same region of it for all species

Rates vary among lineages - the rate constancy assumed by UPGMA is not a safe assumption

What is the maximum parsimony method?

‘Cladistic’ method

Starts from a set of variable character states and aims to find tree with smallest number of character state changes

Only uses ‘informative’ sites

Makes an unrooted tree; there may be more than one equally parsimonious tree

Does not give good estimates of branch lengths
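Counting character state changes on one candidate tree can be sketched with Fitch's algorithm, a standard way of scoring a tree under parsimony. This is a minimal illustration, not a tree search: the nested-tuple tree format and the function names are assumptions made for the example, and the tree is written as rooted only so it can be traversed (the score does not depend on where the root is placed).

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a taxon name (leaf) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # children agree on at least one state: no extra change needed
        return common, left_changes + right_changes
    # children disagree: one change must have happened below this node
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all sites of the alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


def is_informative(column):
    """A site is informative if at least two states each occur at least twice."""
    counts = {}
    for base in column:
        counts[base] = counts.get(base, 0) + 1
    return sum(1 for n in counts.values() if n >= 2) >= 2


alignment = {"A": "AA", "B": "AA", "C": "GA", "D": "GA"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 1
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 2
```

The tree grouping A with B needs one change, the alternative needs two, so the first is preferred. The second site is identical in every taxon and cannot distinguish between trees, which is why only informative sites are used.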

What is the UPGMA (unweighted pair-group method with arithmetic means)?

‘Phenetic’ method

Starts from a matrix of pairwise distances among taxa

Assumes perfect molecular clock

Proceeds by progressively clustering taxa with shortest distances

Doesn’t evaluate all possible trees

Produces tree rooted at midpoint
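The clustering procedure can be sketched in a few lines of Python. This is an illustrative version under assumed conventions: the function name `upgma`, the input format (a dict of pairwise distances keyed by taxon pairs) and the output format (nested tuples of `(left, right, node height)`) are all choices made for the example.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon names.
    dist: dict mapping a pair (a, b) of names to their distance.
    Returns nested tuples (left, right, height); leaves are plain names.
    """
    size = {name: 1 for name in names}  # cluster -> number of taxa in it
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(size) > 1:
        closest = min(d, key=d.get)  # pair of clusters with the shortest distance
        a, b = sorted(closest, key=str)
        # molecular clock assumption: the node sits halfway between the two clusters
        merged = (a, b, d[closest] / 2)
        for other in size:
            if other in (a, b):
                continue
            da = d.pop(frozenset((a, other)))
            db = d.pop(frozenset((b, other)))
            # arithmetic mean over all member taxa, so each taxon counts equally
            d[frozenset((merged, other))] = (
                size[a] * da + size[b] * db
            ) / (size[a] + size[b])
        del d[closest]
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


tree = upgma(["A", "B", "C"], {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6})
print(tree)  # (('A', 'B', 1.0), 'C', 3.0)
```

Only one tree is ever built, because each step simply joins the currently closest pair. Every tip ends up the same distance from the root, which is where the clock assumption shows.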

What is the Neighbour-Joining (NJ) method?

Starts from pairwise distance matrix

Minimum evolution tree (shortest total branch length)

Evaluate all possible trees or take a short cut

Start from a star tree and try all possible positions for a new branch; each time, calculate the branch lengths, sum them to get the total tree length, and choose the tree with the smallest total length

Fast - good for large data sets

Good at recovering the true tree
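The short cut usually used in NJ is a criterion often written Q, which picks the pair of taxa whose joining gives the smallest total tree length without building every tree. The sketch below shows one joining step only. The function name `nj_first_join` and the input format are assumptions for the example, and the five-taxon distances are invented illustration data.

```python
def nj_first_join(names, pair_dist):
    """Pick the first pair of neighbours to join and the two new branch lengths.

    names: list of taxon names.
    pair_dist: dict mapping a pair (a, b) of names to their distance.
    Returns (i, j, length_i, length_j).
    """
    # build a symmetric lookup table from the pair list
    d = {name: {} for name in names}
    for (a, b), value in pair_dist.items():
        d[a][b] = value
        d[b][a] = value
    n = len(names)
    totals = {i: sum(d[i].values()) for i in names}  # summed distance to all others
    best = None
    for x, i in enumerate(names):
        for j in names[x + 1:]:
            # Q is small when i and j are close to each other but far from the rest
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # branch lengths from i and j to their new shared node
    length_i = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    length_j = d[i][j] - length_i
    return i, j, length_i, length_j


distances = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
print(nj_first_join(["a", "b", "c", "d", "e"], distances))  # ('a', 'b', 2.0, 3.0)
```

Here the pair with the shortest raw distance is d and e, which is what UPGMA would cluster first, but NJ joins a and b. The two new branches also have different lengths (2 and 3), so no molecular clock is being assumed.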

What is the maximum likelihood method?

Needs a model of sequence evolution and a criterion (or set of criteria) to choose between alternative trees; evaluates all possible trees

Allows complex models of sequence evolution

Formally evaluates different possible trees

Computer-intensive

For every possible tree, consider at each site in the alignment the probability of the observed bases given each possible nucleotide character state at the ancestral nodes, and sum these over all possible ancestral states

Take the product of the per-site probabilities as the likelihood value for that tree

Choose tree with highest (log) likelihood
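The likelihood of one candidate tree can be sketched as follows. This is a minimal illustration that assumes the Jukes & Cantor model, fixed branch lengths and a made-up tree format (a leaf is a taxon name, an internal node is a tuple of `(child, branch length)` pairs). Real programs also optimise branch lengths and model parameters, which is not attempted here.

```python
import math

BASES = "ACGT"


def jc_prob(same, t):
    """Jukes & Cantor probability of ending in the same / a given different base
    after a branch of length t (expected substitutions per site)."""
    e = math.exp(-4 * t / 3)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial(node, site):
    """Probability of the data below node, for each possible state at node."""
    if isinstance(node, str):
        return {b: 1.0 if b == site[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_probs = partial(child, site)
        for b in BASES:
            # sum over every possible state at the child end of the branch
            result[b] *= sum(jc_prob(b == c, t) * child_probs[c] for c in BASES)
    return result


def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods (the log of the product over sites)."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(length):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        probs = partial(tree, site)
        # each base is equally likely at the root under Jukes & Cantor
        total += math.log(sum(0.25 * probs[b] for b in BASES))
    return total


def quartet(w, x, y, z):
    """Four-taxon tree grouping w with x and y with z (branch lengths are arbitrary)."""
    return ((((w, 0.1), (x, 0.1)), 0.05), (((y, 0.1), (z, 0.1)), 0.05))


alignment = {"A": "AAGC", "B": "AAGC", "C": "GTGC", "D": "GTGC"}
print(log_likelihood(quartet("A", "B", "C", "D"), alignment))
print(log_likelihood(quartet("A", "C", "B", "D"), alignment))
```

The first tree, which groups the taxa with identical sequences, gets the higher (less negative) log-likelihood. Logs are summed rather than multiplying raw probabilities because the product over many sites quickly becomes too small to represent.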

How do you do bootstrapping?

Construct a pseudo-replicate alignment:

- randomly sample sites from the real alignment

- sample with replacement

- until same length as real alignment

Make a tree using the same method

Repeat many times

Record how often each partition (= internal branch) occurs across pseudoreplicates
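The resampling steps can be sketched in Python. The function names are invented for the example, and `build_splits` is a hypothetical callable supplied by the user: it should build a tree from an alignment with the chosen method and return that tree's partitions as a set of hashable items (for instance frozensets of taxon names).

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling whole columns with replacement."""
    length = len(next(iter(alignment.values())))
    # the same column index may be drawn more than once, others not at all
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


def bootstrap_support(alignment, build_splits, replicates=100, seed=0):
    """Percentage of pseudo-replicate trees that contain each partition."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(replicates):
        pseudo = bootstrap_alignment(alignment, rng)
        # build_splits is a hypothetical tree-building step supplied by the caller
        for split in build_splits(pseudo):
            counts[split] = counts.get(split, 0) + 1
    return {split: 100 * n / replicates for split, n in counts.items()}


alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
print(bootstrap_alignment(alignment, random.Random(1)))
```

Sampling column indices once and applying them to every taxon keeps each site intact, so the pseudo-replicate is still a valid alignment of the same length as the real one.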

Why use bootstrapping?

Estimate of how consistent the phylogenetic ‘signal’ is along the alignment

Longer branches likely to have higher values

Values around 75% (or higher) generally taken as ‘meaningful’

What problems can occur with phylogenetic trees?

Long branch attraction

Outgroups

What is long branch attraction?

Unequal rates of evolution cause rapidly evolving lineages to be inferred as closely related, regardless of their true evolutionary relationships

Usually a problem in maximum parsimony

What are some examples of long branch attraction causing problems?

Herpes virus evolution: herpes viruses tend to co-evolve with their hosts, their genes evolve ~10 x faster than mammalian genes, and they occasionally acquire extra genes from the host genome

Long branch attraction made it seem that the BoHV-4 Bo17 gene did not originate from buffalo

Why are outgroups used?

Midpoint rooting - could fail with unequal rates of evolution

Outgroups useful to root trees

(All good phylogenetic methods produce unrooted trees)

An outgroup should be as close as possible to the other species, because a distant outgroup may not find the root of the other species (long branch attraction, or other problems)

But a very close outgroup may not actually be an outgroup (it may fall within the group being studied)