Bioinformatics - Final Exam Content Flashcards
(232 cards)
What are the differences between substitution models?
substitution models differ in which parameters they include: the simplest just use the proportion of sites that differ (hamming distance), others correct for unobserved mutations, some treat transitions and transversions differently, and others add a proportion of invariable sites or gamma distributed rate variation among sites
What parameters are included in substitution models?
- transitions vs transversions
- hamming distance
- jukes and cantor distance (correcting for unobserved mutations)
- equal/unequal base frequencies
- proportion of invariable sites
- gamma distributed rate variation among sites
How do you find the best substitution model?
- the best thing to do is test ALL models and find the one that best fits your sequence data; this is done under the maximum likelihood framework, choosing the model with the lowest BIC or AIC value (lower is better for both)
- once the model is chosen, also run a bootstrap analysis to assess support for the resulting tree
What are the steps to finding the best Tree?
- do a tree search under each model
- calculate the maximum likelihood score of the best tree for each model
- compare them using BIC or AIC scores, which are estimators of relative quality of statistical models
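A minimal sketch of the comparison step, assuming the standard definitions AIC = 2k - 2lnL and BIC = k*ln(n) - 2lnL, where k is the number of free parameters and n is the number of alignment sites. The log-likelihoods and parameter counts below are made-up illustration values (branch lengths are ignored for simplicity), not results from real data.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: lower is better."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n_sites):
    """Bayesian information criterion: lower is better."""
    return k * math.log(n_sites) - 2 * log_likelihood

# Hypothetical (log-likelihood, free parameters) for the best tree under each model.
models = {
    "JC69": (-2500.0, 0),
    "K2P": (-2450.0, 1),
    "GTR+G+I": (-2440.0, 10),
}
n_sites = 1000  # hypothetical alignment length

def best_model(models, n_sites, criterion="bic"):
    """Return the name of the model with the lowest score for the criterion."""
    def score(name):
        log_likelihood, k = models[name]
        if criterion == "aic":
            return aic(log_likelihood, k)
        return bic(log_likelihood, k, n_sites)
    return min(models, key=score)

print(best_model(models, n_sites, "aic"))
print(best_model(models, n_sites, "bic"))
```

With these invented numbers the two criteria disagree, because BIC penalizes extra parameters more heavily than AIC for an alignment of this length.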
How do phylogenetic approaches provide insight on evolution?
phylogeny - compare phylogenies to biogeography and major paleoecological events
evolutionary processes - pattern heterogeneity and selection ratios (dN/dS)
How do we use the Disparity Index (I) to estimate pattern heterogeneity?
- a common WRONG assumption is that sequences evolve homogeneously (under the same conditions and processes)
- we know that sequences evolve differently depending on their location and the pressures acting on them
- we measure pattern heterogeneity via the disparity index
- the disparity index identifies pairs of sequences that evolved under substantially different evolutionary processes
What is the basis for dN/dS ratio tests?
it is a means to test whether selection is occurring: substitution rate outliers will include sequences that affect an organism’s ability to survive and reproduce, so substitution patterns reflect selection, and dN/dS is the best measure we have for this
How do you interpret I (disparity index) statistics?
I = 0 means the sequences evolved under the same processes and pressures
I > 0 means the sequences evolved under different processes and pressures
How do you interpret dN/dS statistics?
dN/dS = 1 : neutral, not undergoing selection
dN/dS > 1 : positive selection, amino acid changes are beneficial and are favored
dN/dS < 1 : purifying selection, amino acid changes are harmful and are removed, so sites stay conserved
Transition
a change between A and G (purine to purine) or between C and T (pyrimidine to pyrimidine)
- in other words, these substitutions are more likely to happen because the base does not change from purine to pyrimidine or vice versa
Transversion
a change between A and C, A and T, G and C, or G and T
- these substitutions happen less frequently and are more serious because the base changes from purine to pyrimidine or vice versa
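The two definitions above can be checked mechanically. This is a small sketch; the function name `classify_substitution` is made up for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(base_before, base_after):
    """Classify a single-nucleotide change as a transition or transversion."""
    a, b = base_before.upper(), base_after.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if a == b:
        return "none"
    # Both purines or both pyrimidines: the base class is preserved.
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    # Otherwise the change crosses between purine and pyrimidine.
    return "transversion"

print(classify_substitution("A", "G"))
print(classify_substitution("G", "T"))
```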
Hamming Distance (Dh)
- the simplest approach to modeling substitutions: it counts the number of differences and divides by the alignment length
- Dh = n / N
- n is the number of sites which are different
- N is the length of the alignment
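A short sketch of Dh = n / N for two aligned sequences of equal length. The example sequences are invented, and gaps are treated like any other character here, which is a simplifying assumption.

```python
def hamming_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ (Dh = n / N)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    n = sum(1 for a, b in zip(seq1, seq2) if a != b)  # differing sites
    return n / len(seq1)                               # divide by alignment length N

print(hamming_distance("ACGTACGT", "ACGTTCGA"))
```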
Jukes and Cantor (1969)
- a model for distance of substitutions which corrects for unobserved mutations
- Djc1969 = -(3/4) ln(1 - (4/3)p)
- p = the proportion of sites which differ between sequences
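A minimal sketch of the correction, using the formula on this card. The input p = 0.25 is just an example value; the formula is undefined once p reaches 0.75, so the sketch rejects those inputs.

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor (1969) distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4 / 3) * p)

p = 0.25
d = jukes_cantor_distance(p)
print(p, round(d, 4))
```

The corrected distance comes out larger than p because it accounts for multiple substitutions at the same site that the raw count cannot see.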
Distance (phylogenetic tree sense)
essentially a measure of how different the sequences in the alignment are, taking into account the differences or substitutions which have occurred
Proportion of invariable sites (I)
- a parameter that can significantly improve model fit
- (I) is the proportion of sites in the dataset that are static and unchanging
gamma distribution (G)
- a parameter that can significantly improve model fit
- indicates that rates of substitution vary among sites following a gamma distribution
BIC value
Bayesian information criterion (lowest scored model is best)
AIC value
Akaike information criterion (lowest scored model is best)
pattern heterogeneity
if two sequences evolved under the same processes their nucleotide composition will be similar, however if they evolved under separate pressures their nucleotide composition will reflect that
dN/dS ratio
- a highly important and common approach for testing if selection has occurred
- nonsynonymous subs per site / synonymous subs per site
- = 1 : neutral, not undergoing selection
- > 1 : positive selection, amino acid changes are beneficial and are favored
- < 1 : purifying selection, amino acid changes are harmful and are removed, so sites stay conserved
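The interpretation rules can be sketched as a small helper. The `tolerance` around 1 is an arbitrary illustrative assumption; real analyses use a statistical test rather than a fixed cutoff, and the dN and dS inputs here are invented.

```python
def interpret_dn_ds(dn, ds, tolerance=0.05):
    """Label a dN/dS ratio; `tolerance` is an arbitrary band around 1."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    ratio = dn / ds
    if abs(ratio - 1) <= tolerance:
        return "neutral"
    if ratio > 1:
        return "positive selection"
    return "purifying selection"

print(interpret_dn_ds(0.02, 0.20))
print(interpret_dn_ds(0.30, 0.10))
```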
disparity index (I)
the observed difference in evolutionary patterns for a pair of sequences based on nucleotide composition
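- I = (1/2) * sum over i of (xi - yi)^2 - Nd
- xi = composition (count) of the ith nucleotide in sequence X
- yi = composition (count) of the ith nucleotide in sequence Y
- Nd = composition distance expected under homogeneity (the number of observed differences between the two sequences)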
values associated with disparity index:
I = 0 -> same evolutionary pressures
I > 0 -> different evolutionary pressures
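A rough sketch of the disparity index for one pair of aligned DNA sequences. It assumes one reading of the formula: xi and yi are nucleotide counts in each sequence, and Nd is the number of sites at which the two sequences differ. The sequences are invented, and the significance test used in practice is omitted.

```python
from collections import Counter

def disparity_index(seq_x, seq_y):
    """Composition distance minus the number of observed differences."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    x_counts = Counter(seq_x.upper())
    y_counts = Counter(seq_y.upper())
    # (1/2) * sum of squared differences in nucleotide counts
    composition_distance = 0.5 * sum(
        (x_counts[base] - y_counts[base]) ** 2 for base in "ACGT"
    )
    # Nd, assumed here to be the number of differing sites
    n_d = sum(1 for a, b in zip(seq_x.upper(), seq_y.upper()) if a != b)
    return composition_distance - n_d

print(disparity_index("ACGT", "ACGT"))
print(disparity_index("AAAAAAAA", "AAAACCCC"))
```

Under this reading the raw value can also come out slightly negative by chance, so only clearly positive values point to different evolutionary pressures.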
neutral theory of molecular evolution (Kimura 1968)
- most mutations are neutral or “nearly neutral”
- it is a basic principle that differences in fecundity lead to natural selection and fixation of mutations
- substitution patterns reflect selection
synonymous
- substitution where the amino acid stays the same
- more likely to be neutral
nonsynonymous
- substitution where the amino acid changes
- more likely to change phenotype
- positive selection may result from a beneficial change in phenotype
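To make the synonymous / nonsynonymous distinction concrete, the sketch below translates two codons with the standard genetic code and compares the amino acids. The helper name `classify_codon_change` is made up, and the example codons are arbitrary.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_codon_change(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "nonsynonymous"

print(classify_codon_change("CTT", "CTC"))  # Leu -> Leu
print(classify_codon_change("GAA", "GTA"))  # Glu -> Val
```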