Part 2 - Lecture 5/6 - Substitution Scoring Matrices Flashcards

(39 cards)

1
Q

What gives us an alignment and a score for similarity for an entire sequence?

A

global pairwise alignment

2
Q

What gives us the alignment and score for parts of sequences?

A

local pairwise alignment

3
Q

What can precisely indicate interesting residues (nucleotides or amino acids)?

A

multiple sequence alignment (MSA)

4
Q

What does MSA depend on?

A

the accuracy of the pairwise alignments, which in turn depends on the scoring

5
Q

What are the desirable features of a scoring matrix?

A

-it lets us think in terms of mutation and selection
-it reflects whether nucleotides and amino acids should be treated differently, and which properties matter the most

6
Q

If you start with a high confidence alignment what do you get?

A

-no gaps or spaces
-hopefully very few mismatches
-sequences that can be called related

7
Q

How do you calculate the ways to choose k elements from a set of n elements?

A

nCk = n! / (k! (n-k)!)
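
A quick way to check the formula is to code it directly. The sketch below is illustrative only: the helper name n_choose_k and the 5-sequence example are assumptions, not part of the lecture.

```python
import math

def n_choose_k(n: int, k: int) -> int:
    """Number of ways to choose k elements from n, using the factorial formula."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Example: number of sequence pairs in one column of a 5-sequence alignment (5C2).
pairs_per_column = n_choose_k(5, 2)
print(pairs_per_column)   # 10
print(math.comb(5, 2))    # 10, the standard-library equivalent
```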

8
Q

How do you move from substitution counts to probabilities?

A

take the number of pairs of nucleotides in a column (nC2 for n sequences), multiply it by the number of columns, and divide each count by that product
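
The normalization can be sketched in a few lines. This is a minimal example under assumptions: the alignment is ungapped, pairs are counted as unordered, and the function names and toy alignment are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_counts(alignment):
    """Count unordered nucleotide pairs within each column of an ungapped alignment."""
    counts = Counter()
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts

def pair_probabilities(alignment):
    """Divide each pair count by (pairs per column) * (number of columns)."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    pairs_per_column = n_seqs * (n_seqs - 1) // 2   # nC2
    total_pairs = pairs_per_column * n_cols
    return {pair: c / total_pairs for pair, c in pair_counts(alignment).items()}

toy_alignment = ["ACGT", "ACGA", "ATGT"]   # hypothetical high-confidence alignment
probs = pair_probabilities(toy_alignment)
for pair, prob in sorted(probs.items()):
    print(pair, prob)
```

With 3 sequences and 4 columns there are 3 * 4 = 12 pairs in total, so the probabilities sum to 1.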

9
Q

Which exons are important?

A

the first and last exons

10
Q

What happens to the part of a gene after a stop codon?

A

it will be part of the mRNA after splicing but will not be translated

11
Q

Why make a substitution scoring matrix?

A

we had to do this because there was no way to make databases of genes down to a single nucleotide position of DNA

12
Q

Why could you have a long untranslated part of a gene?

A

RNA polymerase not starting at some position

13
Q

What happens more frequently, has less of an effect on function, and is less penalized in a scoring matrix than a transversion?

A

transition
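
For reference, a transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); any other change is a transversion. A small sketch, with a hypothetical function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(a: str, b: str) -> str:
    """Label a pair of DNA nucleotides as a match, transition, or transversion."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "match"
    pair = {a, b}
    # Both bases in the same chemical class means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

print(classify_substitution("A", "G"))   # transition
print(classify_substitution("A", "C"))   # transversion
```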

14
Q

In a real scoring matrix, why are values scaled so that the highest entry is 100?

A

makes things easier to calculate
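
One way such scaling could be done is sketched below. The raw scores are invented for illustration, and rounding to integers is an assumption about what "easier to calculate" means here; the sketch also assumes the largest raw score is positive.

```python
def scale_matrix(scores, top=100):
    """Rescale scores so the largest entry equals `top`, then round to integers."""
    largest = max(scores.values())
    factor = top / largest
    return {pair: round(value * factor) for pair, value in scores.items()}

# Hypothetical raw scores, not taken from any real matrix.
raw = {("A", "A"): 0.91, ("A", "G"): -0.31, ("A", "C"): -1.14}
scaled = scale_matrix(raw)
print(scaled)
```

Scaling by a positive constant keeps the ranking of the entries, so the relative penalties are unchanged.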

15
Q

How do amino acids affect protein structure?

A

hydrophobic residues go inside and hydrophilic residues go outside, which affects shape; not all parts of the protein are equally important; need to pay attention to H bonding and to acidic, basic, polar, and nonpolar character; ask whether a substituted amino acid can play the same role in a chemical sense

16
Q

If two unrelated sequences are i.i.d., what is pN, and what does that mean for the expected number of matches?

A

pN = 0.25; 1/4 of the positions are expected to match
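
The 1/4 figure follows from summing, over the four nucleotides, the chance that both sequences show the same one. A minimal check, assuming uniform and independent nucleotide frequencies (the function name and the length of 100 are illustrative):

```python
def match_probability(freqs):
    """Probability that two independent positions carry the same nucleotide."""
    return sum(p * p for p in freqs.values())

uniform = {n: 0.25 for n in "ACGT"}
p_match = match_probability(uniform)
print(p_match)                 # 0.25

alignment_length = 100         # hypothetical length
print(p_match * alignment_length)   # 25.0 expected matches
```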

17
Q

What is the null hypothesis for two sequences S1 and S2?

A

S1 and S2 have no more similarity than expected by chance

18
Q

What is the alternative hypothesis for two sequences S1 and S2?

A

S1 and S2 are more similar than expected by chance, so they appear to be related

19
Q

What is testing hypotheses equivalent to?

A

comparing models; allows us to compare two models which describe relationships between two factors

20
Q

What is the probability of two sequences arising by chance under H0?

21
Q

What is the probability of two sequences under H1, the alternative hypothesis?

22
Q

What is the likelihood ratio and what does its value represent?

A

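the probability of the sequences under the related model divided by their probability under the unrelated model; e.g. a value of 5 means the sequences are 5X more likely to have arisen from our related model than from our unrelated model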
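
A likelihood ratio of this kind can be computed for a toy case. Everything numeric below is an assumption made for illustration: p holds background frequencies for the unrelated model, q holds joint pair probabilities for the related model, and positions are treated as independent.

```python
import math

# Toy models (invented numbers).
p = {n: 0.25 for n in "ACGT"}
q = {(a, b): (0.16 if a == b else 0.03) for a in "ACGT" for b in "ACGT"}

def likelihood_ratio(s1, s2, q, p):
    """P(alignment | related model) / P(alignment | unrelated model)."""
    ratio = 1.0
    for a, b in zip(s1, s2):
        ratio *= q[(a, b)] / (p[a] * p[b])
    return ratio

ratio = likelihood_ratio("ACGT", "ACGA", q, p)
print(ratio)   # greater than 1 favors the related model
```

Here each match contributes a factor of 0.16 / 0.0625 = 2.56 and each mismatch a factor of 0.03 / 0.0625 = 0.48.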

23
Q

What does it mean if our starting data is symmetric?

A

-no species are ancestors of others
-substitutions are not all symmetric in their biological rates - dinucleotides are not in equilibrium

24
Q

How did we use our original scoring scheme?

A

-add scores corresponding to different alignment positions
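
Adding per-position scores can be sketched as follows. The specific values +1 and -1 are assumptions; the deck only says that good positions score positive and bad positions score negative.

```python
def alignment_score(s1, s2, match=1, mismatch=-1):
    """Sum a per-position score over an ungapped pairwise alignment."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have the same length")
    return sum(match if a == b else mismatch for a, b in zip(s1, s2))

print(alignment_score("ACGT", "ACGA"))   # 3 matches, 1 mismatch: 2
```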

25
Q

In the original scoring scheme, what scores were good and bad positions given?

A

good positions: positive score; bad positions: negative score

26
Q

What is μ (mu) in the original scoring scheme?

A

the relative weight of matches and mismatches

27
Q

How do you factor the likelihood ratio to emphasize individual positions?

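
The deck gives no answer for this card. One common factoring, offered here as a sketch, assumes that alignment positions are independent. The symbols are assumptions: a_i and b_i are the residues of S1 and S2 at position i, q is the joint pair probability under H1, p is the background frequency under H0, and L is the alignment length.

```latex
\frac{P(S_1, S_2 \mid H_1)}{P(S_1, S_2 \mid H_0)}
  = \frac{\prod_{i=1}^{L} q_{a_i b_i}}{\prod_{i=1}^{L} p_{a_i}\, p_{b_i}}
  = \prod_{i=1}^{L} \frac{q_{a_i b_i}}{p_{a_i}\, p_{b_i}}
```

Each factor in the final product is the contribution of a single alignment position.
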
28
Q

Is any information lost by taking the log of likelihood ratios?

A

No information is lost by taking the log of likelihood ratios, because the log preserves their order

29
Q

Why is it better to add logs than to multiply ratios of probabilities?

A

it is computationally easier for both computers and humans

30
Q

What is the log likelihood scoring scheme?

A

just take the log of each term in the matrix

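
Taking the log of each term and then adding gives the same answer as multiplying the ratios and taking the log at the end. The sketch below uses invented toy models (p for the unrelated model, q for the related model) and base-2 logs, both of which are assumptions.

```python
import math

# Toy models (invented numbers).
p = {n: 0.25 for n in "ACGT"}
q = {(a, b): (0.16 if a == b else 0.03) for a in "ACGT" for b in "ACGT"}

def log_odds_matrix(q, p):
    """Take the log of each likelihood-ratio term q_ab / (p_a * p_b)."""
    return {(a, b): math.log2(q_ab / (p[a] * p[b])) for (a, b), q_ab in q.items()}

def log_odds_score(s1, s2, scores):
    """Score an ungapped alignment by adding the per-position log terms."""
    return sum(scores[(a, b)] for a, b in zip(s1, s2))

scores = log_odds_matrix(q, p)
total = log_odds_score("ACGT", "ACGA", scores)

# Multiplying the ratios first and taking the log afterwards gives the same value.
product = math.prod(q[(a, b)] / (p[a] * p[b]) for a, b in zip("ACGT", "ACGA"))
print(total, math.log2(product))
```

In this toy matrix, matches get positive scores and mismatches get negative scores, because their ratios are above and below 1 respectively.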
31
Q

What does a log score greater than zero mean?

A

it is a high-confidence alignment (the likelihood ratio is above 1, so the related model is favored)

32
Q

Should match scores always be equal?

A

no, they do not have to be

33
Q

Should there be only one scoring matrix?

A

no, because the rates and types of changes vary between species over generations due to evolution

34
Q

Can we make different scoring matrices for different situations?

A

yes, you can begin with high-confidence alignments that correspond to different time periods

35
Q

What can we use a scoring matrix to get?

A

pairwise alignment

36
Q

What can we use a pairwise alignment to get?

A

an MSA (multiple sequence alignment)

37
Q

What do we use our MSA (multiple sequence alignment) as the basis of?

A

a scoring matrix

38
Q

What is true of function inference using sequence similarity?

A

(1) it works very well; (2) there can be a problem of drift in biases; (3) if that problem is recognized and still persists, it may be inherent to genomics

39