Part 2 - Lecture 5/6 - Substitution Scoring Matrices Flashcards
(39 cards)
What gives us an alignment and a similarity score for entire sequences?
global pairwise alignment
What gives us the alignment and score for parts of sequences?
local pairwise alignment
What can precisely indicate interesting residues (nucleotides or amino acids)?
multiple sequence alignment (MSA)
What does MSA depend on?
the accuracy of the pairwise alignments, which in turn depends on the scoring scheme
What are the desirable features of a scoring matrix?
-think in terms of mutation and selection
-consider whether nucleotides and amino acids should be treated differently, and which properties matter most
If you start with a high confidence alignment what do you get?
-no gaps (spaces)
-hopefully very few mismatches
-the sequences can be called related sequences
How do you calculate the ways to choose k elements from a set of n elements?
nCk = n! / (k!(n-k)!)
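When counting substitutions, n is typically the number of sequences in one alignment column and k = 2 (pairs). A minimal Python sketch of the factorial formula, checked against the standard library's math.comb; the helper name pairs_per_column is hypothetical.

```python
import math


def n_choose_k(n: int, k: int) -> int:
    """Ways to choose k elements from n, computed as n! / (k! (n-k)!)."""
    if k < 0 or k > n:
        return 0
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


def pairs_per_column(n_sequences: int) -> int:
    """Hypothetical helper: number of sequence pairs in one alignment column."""
    return n_choose_k(n_sequences, 2)


if __name__ == "__main__":
    print(n_choose_k(5, 2))      # 10
    print(math.comb(5, 2))       # 10, standard-library equivalent (Python 3.8+)
    print(pairs_per_column(4))   # 6
```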
How do you move from substitution counts to probabilities?
multiply the number of nucleotide pairs per column by the number of columns, then divide each substitution count by that product
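A minimal sketch of that normalization in Python, assuming an ungapped alignment of equal-length sequences and counting each column's pairs as unordered (so A/G and G/A are the same substitution). The function name pair_frequencies is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_frequencies(alignment: list[str]) -> dict[tuple[str, str], float]:
    """Turn substitution counts from an ungapped alignment into frequencies.

    Each column contributes C(n, 2) unordered pairs, so the total number of
    pairs is C(n, 2) * number_of_columns.
    """
    n = len(alignment)
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must have equal length (no gaps)")
    counts = Counter()
    for col in range(length):
        column = [s[col] for s in alignment]
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1  # unordered pair
    total = comb(n, 2) * length
    return {pair: c / total for pair, c in counts.items()}


freqs = pair_frequencies(["ACGT", "ACGA", "ACTT"])
# 3 sequences -> 3 pairs per column, 4 columns -> 12 pairs in total
```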
Which exons are important?
the first and last exons
What happens to the part of a gene after a stop codon?
it remains part of the mRNA after splicing (the 3' untranslated region) but is not translated
Why make a substitution scoring matrix?
it was necessary because there was no way to build gene databases at single-nucleotide resolution
Why could you have a long untranslated part of a gene?
RNA polymerase does not begin transcription at the start codon, so untranslated sequence can precede the coding region
What happens more frequently, has less effect on function, and is penalized less in a scoring matrix than a transversion?
transition
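Transitions swap a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T); transversions cross between the two classes. A minimal Python sketch of a nucleotide scoring function that penalizes transitions less; the score values are hypothetical, chosen only to show the ordering.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical scores: transitions are penalized less than transversions.
SCORES = {"match": 2, "transition": -1, "transversion": -2}


def substitution_type(a: str, b: str) -> str:
    """Classify an aligned nucleotide pair."""
    if a == b:
        return "match"
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def score_pair(a: str, b: str) -> int:
    """Score an aligned nucleotide pair using the hypothetical SCORES table."""
    return SCORES[substitution_type(a.upper(), b.upper())]
```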
In a real scoring matrix why are values scaled so that the highest entry is 100?
makes things easier to calculate
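A minimal Python sketch of this kind of rescaling, assuming the matrix is stored as a dict of pair scores with a positive largest entry: every entry is multiplied by the same factor so that the largest becomes 100, then rounded to an integer. The function name and the input values are hypothetical.

```python
def scale_to_max_100(matrix: dict[tuple[str, str], float]) -> dict[tuple[str, str], int]:
    """Rescale all entries by one factor so the highest entry becomes 100."""
    top = max(matrix.values())
    if top <= 0:
        raise ValueError("highest entry must be positive to scale it to 100")
    factor = 100 / top
    return {pair: round(value * factor) for pair, value in matrix.items()}


raw = {("A", "A"): 0.4, ("A", "G"): 0.1, ("A", "C"): 0.04}  # hypothetical values
scaled = scale_to_max_100(raw)
```

Because every entry is multiplied by the same factor, the relative ordering of the scores is unchanged.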
How do amino acids affect protein structure?
hydrophobic residues tend to be buried inside and hydrophilic residues sit on the surface, which shapes the fold; not all parts of the protein are equally important; pay attention to hydrogen bonding and whether a residue is acidic, basic, polar, or nonpolar; ask whether a substituted amino acid can play the same chemical role
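If two unrelated sequences are i.i.d., what is the match probability pN, and what is the expected number of matches?
pN = 0.25 (with equal base frequencies, 4 × 0.25² = 0.25); the expected number of matches is 0.25 × the alignment length, i.e. about 1/4 of the positions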
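For i.i.d. sequences, the chance that two positions match is the sum over bases of p_a², and the expected number of matches is that probability times the length. A minimal Python sketch; the function names are hypothetical.

```python
def match_probability(freqs: dict[str, float]) -> float:
    """P(match) at one position for two independent i.i.d. sequences: sum of p_a^2."""
    return sum(p * p for p in freqs.values())


def expected_matches(freqs: dict[str, float], length: int) -> float:
    """Expected number of matching positions in an ungapped alignment."""
    return match_probability(freqs) * length


uniform = {base: 0.25 for base in "ACGT"}
print(match_probability(uniform))      # 0.25
print(expected_matches(uniform, 100))  # 25.0
```

With unequal base frequencies the sum of squares is larger than 0.25, so chance matches become more common.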
What is the null hypothesis for two sequences S1 and S2?
S1 and S2 have no more similarity than expected by chance
What is the alternative hypothesis for two sequences S1 and S2?
S1 and S2 are related: they are more similar than expected by chance
What is testing hypotheses equivalent to?
comparing models: it lets us compare two models that describe the relationship between two factors
What is the probability of the two sequences arising by chance under H0?
What is the probability of the two sequences under H1, the alternative hypothesis?
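These two cards have no answers recorded. Under the standard formulation (an assumption here, since the lecture's notation is not shown), for an ungapped alignment of length n with residues x_i in S1 and y_i in S2, background frequencies p_a for the unrelated model, and joint pair probabilities q_ab for the related model:

```latex
P(S_1, S_2 \mid H_0) = \prod_{i=1}^{n} p_{x_i}\, p_{y_i}
\qquad
P(S_1, S_2 \mid H_1) = \prod_{i=1}^{n} q_{x_i y_i}
```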
What is the likelihood ratio and what does its value represent?
the ratio P(S1, S2 | H1) / P(S1, S2 | H0); for example, a value of 5 means the aligned sequences are 5 times more likely to have arisen under the related model than under the unrelated model
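A minimal Python sketch of the likelihood ratio for an ungapped alignment, assuming the standard model where H0 multiplies background frequencies p_a and H1 uses joint pair probabilities q_ab. The q values below are hypothetical. The computation sums per-position log ratios, which is why log-odds scores can simply be added along an alignment.

```python
import math


def likelihood_ratio(s1: str, s2: str,
                     p: dict[str, float],
                     q: dict[tuple[str, str], float]) -> float:
    """P(S1, S2 | H1) / P(S1, S2 | H0) for an ungapped alignment."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment requires equal-length sequences")
    # Sum of per-position log-odds, then exponentiate back to a ratio.
    log_ratio = sum(math.log(q[(a, b)] / (p[a] * p[b])) for a, b in zip(s1, s2))
    return math.exp(log_ratio)


BASES = "ACGT"
p = {b: 0.25 for b in BASES}
# Hypothetical related model: matches 0.15 each, the 12 mismatches share 0.4.
q = {(a, b): (0.15 if a == b else 0.4 / 12) for a in BASES for b in BASES}

print(likelihood_ratio("AC", "AC", p, q))  # about 5.76: favors the related model
print(likelihood_ratio("A", "C", p, q))    # about 0.53: favors the unrelated model
```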
What does it mean if our starting data is symmetric?
-we assume no species is an ancestor of the others
-caveat: biological substitution rates are not all symmetric; dinucleotides are not in equilibrium
How did we use our original scoring scheme?
-sum the scores of the individual alignment positions to get the score of the whole alignment
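A minimal Python sketch of additive scoring over an ungapped alignment: each aligned pair is scored independently and the results are summed. The +1/-1 scheme in simple_score is hypothetical; any per-pair scoring function (for example a substitution matrix lookup) can be passed in instead.

```python
from typing import Callable


def alignment_score(s1: str, s2: str, score: Callable[[str, str], int]) -> int:
    """Total score of an ungapped alignment: the sum of per-position scores."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment requires equal-length sequences")
    return sum(score(a, b) for a, b in zip(s1, s2))


def simple_score(a: str, b: str) -> int:
    """Hypothetical scheme: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


print(alignment_score("ACGT", "ACGA", simple_score))  # 2
```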