Lecture 6 Flashcards
(41 cards)
What is the phenetic approach based on?
sequences with short pairwise distances cluster
What is the cladistic approach based on? What is its benefit compared to the phenetic approach?
Sequences with many shared characters cluster. The evolutionary process is therefore accounted for implicitly, the method does not rely entirely on a pairwise distance matrix, and correlations that the phenetic approach may miss are taken into account.
The cladistic approach uses ____, which finds the tree needing ____?
parsimony, smallest number of mutations
What is the parsimony score of a tree?
The lowest number of mutations required to explain the sequences at the tips of the tree
What is the parsimony tree?
The tree with the lowest parsimony score; finding it is an optimisation problem
T or F: Rooted trees obtained from the same unrooted tree don't have the same parsimony score.
False, they do
What does the parsimony method do?
For n sequences of length m, it considers every unrooted tree (there are (2n-5)!! of them), calculates the parsimony score of each tree (naively at a cost on the order of 4^(n-1)*m, from enumerating the nucleotides at the internal nodes at every site), and finally outputs the unrooted tree with the lowest score.
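To get a feel for why trying every tree is expensive, the count (2n-5)!! can be evaluated directly. This is a minimal sketch; the function name `num_unrooted_trees` is made up for illustration and is not from the lecture.

```python
def num_unrooted_trees(n: int) -> int:
    """Number of unrooted binary tree topologies on n labelled tips: (2n-5)!!."""
    if n < 3:
        return 1  # with fewer than 3 tips there is only one topology
    count = 1
    # Double factorial: multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_trees(n))
```

The count grows faster than exponentially in n, which is why exhaustive search is only feasible for a handful of sequences.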
How can we improve the second step of the parsimony method?
Using the Fitch algorithm. For each site, we pick a cherry and compare the nucleotide sets of its two tips. If the sets are disjoint (for two tips: the nucleotides differ), we write down their union at the parent and count one mutation. If the sets aren't disjoint, we write down their intersection and don't update the count. We then treat the parent as a new tip and repeat until we reach the root. The resulting count, summed over all sites, is the parsimony score: the minimal number of mutations required to explain the sequences at the tips.
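The union/intersection rule can be sketched in a few lines. This is an illustrative sketch, not the lecture's code: the tree encoding (a tip is a name string, an internal node is a pair of subtrees) and the names `fitch_site` and `parsimony_score` are assumptions made for the example.

```python
def fitch_site(tree, site_states):
    """Return (nucleotide set at this node, mutations below it) for one site.

    tree: a tip name (str) or a pair (left_subtree, right_subtree).
    site_states: dict mapping tip name -> nucleotide at this site.
    """
    if isinstance(tree, str):
        return {site_states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], site_states)
    right_set, right_cost = fitch_site(tree[1], site_states)
    common = left_set & right_set
    if common:
        # Sets overlap: keep the intersection, no extra mutation.
        return common, left_cost + right_cost
    # Sets are disjoint: keep the union and count one mutation.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the per-site Fitch counts over all m sites of the alignment."""
    m = len(next(iter(sequences.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in sequences.items()})[1]
        for i in range(m)
    )


if __name__ == "__main__":
    seqs = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "TCG"}
    print(parsimony_score((("A", "B"), ("C", "D")), seqs))
    print(parsimony_score((("A", "C"), ("B", "D")), seqs))
```

Each internal node is visited once per site, so the work is proportional to (n-1)m. Rooting the same unrooted tree elsewhere, for example `("A", ("B", ("C", "D")))` instead of `(("A", "B"), ("C", "D"))`, gives the same score on this toy alignment.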
How many internal nodes, does a rooted tree on n tips have?
n-1
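What is the running time of the Fitch algorithm?
O((n-1)m), i.e. O(nm): one set operation at each of the n-1 internal nodes, for each of the m sites
How is the parsimony tree found using the Fitch algorithm?
By calculating the parsimony score of each unrooted tree with the Fitch algorithm and keeping the tree with the lowest score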
The parsimony decision problem is an ____ problem
NP-complete
Is the parsimony method statistically consistent or not?
It's not: back substitutions and parallel substitutions are not accounted for, which can lead to long-branch attraction
How was the origin of HIV investigated?
Incidence data only gives an impression of the dynamics since data collection began. Virus sequence data from different host species allows us to infer a phylogenetic tree that is informative about the period before 1980. ML phylogenetic tree inference was used to investigate early HIV.
What are the input and outputs for the ML tree inference?
Input is the sequence alignment. Output is the tree which maximises the probability of the sequences given the tree and the sequence evolution parameters
The ML tree inference requires an ____ model, and the parameters of the model can be ____?
evolutionary, co-estimated
In ML phylogenetics, what is the parameter?
Each unrooted tree with branch lengths
In ML phylogenetics, what do sequences evolve according to?
They evolve according to the parameters provided in the rate matrix Q
What is the inference in the ML method?
Determine the unrooted tree tau (with branch lengths) and the parameters Q that best explain the alignment: max L(tau, Q; D), where D is the sequence alignment
Is the substitution process typically time reversible or not?
yes
What is the running time of the likelihood calculation? How did you get it?
- multiply over all sites: O(m)
- sum over the nucleotides at the n-1 internal nodes: O(4^(n-1))
- multiply over the 2n-2 branches: O(2n-2)
so overall: O(m * 4^n * n)
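The three factors can be read off the likelihood when it is written out in full. The notation below is an assumption made for illustration rather than the lecture's: the tree is rooted at internal node 1, pi is the stationary distribution of Q, E(tau) is the set of 2n-2 branches, t_{uv} is the length of branch (u,v), and x_v is fixed to the observed nucleotide at site s whenever v is a tip.

```latex
L(\tau, Q; D)
  = \prod_{s=1}^{m}
    \sum_{x_1, \dots, x_{n-1} \in \{A, C, G, T\}}
    \pi_{x_1}
    \prod_{(u,v) \in E(\tau)} P_{x_u x_v}(t_{uv}),
\qquad P(t) = e^{Qt}
```

The outer product has m factors, the sum has 4^(n-1) terms, and each term is a product over 2n-2 branches, giving m * 4^(n-1) * (2n-2) operations, which is O(m * 4^n * n).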
How can we improve the ML calculation?
Using Felsenstein's pruning algorithm
What is the time complexity of Felsenstein's pruning algorithm? Explain how.
- each recursion step is a summation over four times four states (constant time), and there is one step per internal node: O(n)
- the recursion needs to be performed for each of the m sites: O(m)
so in total: O(nm)
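The recursion can be sketched as follows. This is a minimal sketch under stated assumptions, not the lecture's implementation: it uses the Jukes-Cantor (JC69) model as a stand-in for a general rate matrix Q, encodes a tip as a name string and an internal node as a pair of (subtree, branch length) entries, and the names `jc69_prob`, `partial_likelihoods` and `log_likelihood` are made up for the example.

```python
import math

NUCLEOTIDES = "ACGT"


def jc69_prob(x, y, t):
    """JC69 probability of going from x to y along a branch of length t
    (t in expected substitutions per site). Assumed model, not from the notes."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def partial_likelihoods(tree, site_states):
    """Return dict: state at this node -> P(tip data below | state)."""
    if isinstance(tree, str):
        # Tip: probability 1 for the observed nucleotide, 0 otherwise.
        return {x: 1.0 if x == site_states[tree] else 0.0 for x in NUCLEOTIDES}
    partial = {x: 1.0 for x in NUCLEOTIDES}
    for child, t in tree:
        child_partial = partial_likelihoods(child, site_states)
        for x in NUCLEOTIDES:
            # 4 x 4 summation: parent state x, child state y.
            partial[x] *= sum(
                jc69_prob(x, y, t) * child_partial[y] for y in NUCLEOTIDES
            )
    return partial


def log_likelihood(tree, sequences):
    """Sum per-site log-likelihoods; the JC69 root distribution is uniform."""
    m = len(next(iter(sequences.values())))
    total = 0.0
    for i in range(m):
        site = {name: seq[i] for name, seq in sequences.items()}
        root = partial_likelihoods(tree, site)
        total += math.log(sum(0.25 * root[x] for x in NUCLEOTIDES))
    return total


if __name__ == "__main__":
    tree = (((("A", 0.1), ("B", 0.1)), 0.05), ("C", 0.2))
    print(log_likelihood(tree, {"A": "ACGT", "B": "ACGA", "C": "ACTT"}))
```

Each internal node does a constant amount of work (4 x 4 terms per child), so one site costs O(n) and the whole alignment O(nm), instead of summing over all 4^(n-1) internal-state assignments.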
The problem of finding a tree and branch lengths with likelihood value ≥ L is ____.
NP-complete