Lecture 10 Flashcards
Gamma distribution and alpha parameters
Alpha determines the shape of the gamma distribution, i.e. how peaked or spread out it is
The gamma curve shows how rate variability is distributed across sites
Alpha is almost universally, and incorrectly, referred to as "gamma", because some popular software lets you choose your value of alpha from a dropdown menu
In alignment, not all sites are equally variable
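As a rough illustration of how alpha shapes among-site rate variation, the Python sketch below draws per-site relative rates from a gamma distribution. The function name sample_site_rates, the number of sites, and fixing the mean rate at 1 (scale = 1/alpha) are assumptions made for this example, not something stated in the lecture.

```python
import random
import statistics


def sample_site_rates(alpha, n_sites=20000, seed=0):
    """Draw relative substitution rates for n_sites alignment columns.

    Assumption: the gamma distribution has shape alpha and scale 1/alpha,
    so the mean relative rate is 1 whatever alpha is.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


# Small alpha: most sites nearly invariant, a few very fast (high variance).
# Large alpha: rates cluster around 1 (low variance).
for alpha in (0.5, 2.0, 10.0):
    rates = sample_site_rates(alpha)
    print(alpha, round(statistics.mean(rates), 2), round(statistics.variance(rates), 2))
```

With this parameterisation the variance of the rates is 1/alpha, so a smaller alpha means sites differ more in how variable they are.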
Relaxed vs strict clocks
Strict clocks - substitution rate (subs/site/year) is the same on all branches of the tree at all times
Relaxed clock - required when the strict-clock assumption (constant subs/site/year) is violated - allows the rate to vary over time and across branches according to a distribution:
- lognormal
- exponential
- random
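The difference between the clock types can be sketched in Python by assigning a rate to each branch. This is only a toy: the function branch_rates, the mean rate of 1e-3 subs/site/year, the sigma value, and the assumption that each branch rate is drawn independently are all illustrative choices, and the "random" clock option is not covered.

```python
import math
import random


def branch_rates(model, n_branches, mean_rate=1e-3, sigma=0.5, seed=0):
    """Return one substitution rate (subs/site/year) per branch.

    Hypothetical helper: mean_rate and sigma are illustrative values.
    """
    rng = random.Random(seed)
    if model == "strict":
        # Same rate on every branch at all times.
        return [mean_rate] * n_branches
    if model == "lognormal":
        # mu is chosen so the lognormal distribution has mean mean_rate.
        mu = math.log(mean_rate) - sigma ** 2 / 2
        return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]
    if model == "exponential":
        # expovariate takes 1/mean as its argument.
        return [rng.expovariate(1.0 / mean_rate) for _ in range(n_branches)]
    raise ValueError(f"unknown clock model: {model}")


for model in ("strict", "lognormal", "exponential"):
    print(model, [f"{r:.2e}" for r in branch_rates(model, 4)])
```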
Rival phylogenetic methods
UPGMA
Minimum evolution - additive method, looks for minimum total tree length
Neighbour joining - Optimises tree length at local rather than global level
Maximum parsimony - looks for minimum number of substitutions
Maximum likelihood - looks for the tree that makes the data most probable under a model of evolution
Bayesian trees - apply Bayesian methods to the probability problem
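To make the parsimony criterion concrete, the sketch below counts the minimum number of substitutions needed at a single site on a fixed tree, using Fitch's small-parsimony algorithm (a technique chosen for this example, not named in the lecture). The taxon names A to D, the nested-tuple tree encoding, and the function name fitch are hypothetical.

```python
def fitch(tree, states):
    """Return (possible_states, min_substitutions) for a binary tree.

    tree: a taxon name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping each taxon name to its character at one site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no extra substitution needed here.
        return shared, left_cost + right_cost
    # Children disagree: one substitution is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(fitch(tree_1, site)[1])  # 1 substitution
print(fitch(tree_2, site)[1])  # 2 substitutions
```

Under maximum parsimony, tree_1 would be preferred for this site because it needs fewer substitutions. A full analysis would sum this count over all sites and search over many trees.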
Bayesian method
- What is the most probable model, given the observed data?
- Aims for the maximum probability of the model given the data, across a series of competing models
Posterior probability p(model|data) = likelihood p(data|model) x prior probability p(model) / marginal likelihood p(data)
Priors: things we do not calculate but assume:
- Inherent probability of data
- Inherent probability of model
Posteriors: Based on priors we calculate:
- Probability of model given data
- Probability of data given model
Bayesian method example
Queen Victoria had a haemophiliac son and 3 daughters who were confirmed carriers
No incidence of haemophilia in royal family before Victoria
Two theories:
- Victoria was new mutation for gene
- Victoria was illegitimate daughter of Sir John Conroy
Bayes theorem in context of Queen Victoria
P(mutant | carrier) = P(carrier | mutant) × P(mutant) / P(carrier)
P(mutant | carrier) = (1 × 0.00012) / 0.0004 = 0.3
P(illegitimate | carrier) = P(carrier | illegitimate) × P(illegitimate) / P(carrier)
P(illegitimate | carrier) = (~0.0004 × 0.06) / 0.0004 = 0.06
Higher probability she was a mutant than illegitimate
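The arithmetic on this card can be checked with a few lines of Python. The function name posterior and the variable names are made up for this sketch; the numbers are the ones given on the card.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(model | data) = P(data | model) * P(model) / P(data)."""
    return likelihood * prior / evidence


p_carrier = 0.0004  # P(carrier), the probability of the data

# Theory 1: Victoria carried a new mutation.
p_mutant = posterior(likelihood=1.0, prior=0.00012, evidence=p_carrier)

# Theory 2: Victoria was illegitimate (likelihood is approximate on the card).
p_illegitimate = posterior(likelihood=0.0004, prior=0.06, evidence=p_carrier)

print(round(p_mutant, 2), round(p_illegitimate, 2))
print("new mutation more probable:", p_mutant > p_illegitimate)
```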
Is Queen Victoria's undoubted carrier status the only data we have?
No - if Victoria had inherited haemophilia from Sir John Conroy, we would expect him to be a haemophiliac
What proportion of male carriers are asymptomatic? - None, but 30% are mild and Conroy lived to 67 and was an army officer
Application of Bayesian theory to phylogenetics
Data is alignment you upload, plus any other fixed parameters you add e.g. tip dates, collection locations
Model is what you select from various menus in BEAST e.g. clocks
What can a phylogenetic model consist of?
Substitution matrix e.g. JTT, BLOSUM, GTR, TN93
Clock model e.g. fixed, random, strict
Population model e.g. static, fast growing, fluctuating
Any other prior e.g. substitution rate, kappa, alpha parameter, initial phylogenetic tree
Application of Bayesian theory
BEAST takes model parameters and uses equation
BEAST subtly adjusts parameters randomly and compares
Iterates repeatedly, working from model that gives best p(data|model) at each stage
Hill-climbing algorithm - strives to get better p(data|model)
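The adjust-compare-iterate loop described on this card can be sketched as a toy hill climb over a single parameter. This is not BEAST's own search code: the binomial likelihood (30 substitutions observed at 100 sites), the starting value, the step size, and the function names are all assumptions chosen to keep the example small.

```python
import math
import random


def log_likelihood(p, k=30, n=100):
    """Toy log p(data|model): k changed sites out of n, per-site probability p."""
    return k * math.log(p) + (n - k) * math.log(1 - p)


def hill_climb(start=0.9, steps=5000, step_size=0.01, seed=1):
    """Randomly nudge the parameter and keep the nudge only if it scores better."""
    rng = random.Random(seed)
    current = start
    best = log_likelihood(current)
    for _ in range(steps):
        proposal = current + rng.uniform(-step_size, step_size)
        if not 0 < proposal < 1:
            continue  # proposal is outside the valid range, skip it
        score = log_likelihood(proposal)
        if score > best:
            current, best = proposal, score
    return current, best


estimate, score = hill_climb()
print(round(estimate, 3), round(score, 2))
```

Once the estimate settles near k/n = 0.3, further random nudges stop improving the score, which is the toy equivalent of convergence.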
Application of Bayesian theory to phylogenetics
BEAST may converge, meaning p(data|model) no longer improves; it must converge if you are to believe the results
Presents posteriors - the optimised value of each parameter, including the optimised tree, usually derived as a consensus of the trees produced in iterations after convergence
You have the best p(data|model), but what about p(model|data)?
Obtained by running the analysis under various competing prior models
Compare posterior probabilities in Tracer e.g.
Strict vs relaxed clock
Constant size vs exponential growth
One substitution model vs another
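One way to picture the step from p(data|model) to p(model|data) is to apply Bayes' theorem across the competing models. The sketch below assumes each run has produced a log value of p(data|model) and that every model gets an equal prior unless told otherwise. The function name and the example numbers are hypothetical and are not Tracer output.

```python
import math


def posterior_model_probs(log_likelihoods, priors=None):
    """Turn log p(data|model) values into p(model|data) for competing models.

    Assumption: equal prior probability for every model if priors is None.
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    # log of p(data|model) * p(model) for each model
    log_joint = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}
    # Subtract the largest value before exponentiating to avoid underflow.
    peak = max(log_joint.values())
    total = sum(math.exp(v - peak) for v in log_joint.values())
    return {n: math.exp(log_joint[n] - peak) / total for n in names}


# Hypothetical log p(data|model) values for two clock models.
runs = {"strict": -5012.4, "relaxed_lognormal": -5008.1}
for name, prob in posterior_model_probs(runs).items():
    print(name, round(prob, 3))
```

The denominator here plays the role of p(data): it is the sum of p(data|model) x p(model) over the models being compared.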