Part 2 - Lecture 5/6 - Substitution Scoring Matrices Flashcards

1
Q

What gives us an alignment and a similarity score for entire sequences?

A

global pairwise alignment

2
Q

What gives us the alignment and score for parts of sequences?

A

local pairwise alignment

3
Q

What can precisely indicate interesting residues (nucleotides or amino acids)?

A

multiple sequence alignment (MSA)

4
Q

What does MSA depend on?

A

the accuracy of the pairwise alignments, which in turn depends on the scoring scheme

5
Q

What are the desirable features of a scoring matrix?

A

-we can think in terms of mutation and selection
-whether nucleotides and amino acids should be treated differently, and which of their properties matter most

6
Q

If you start with a high-confidence alignment, what do you get?

A

-no gaps or spaces
-hopefully very few mismatches
-the sequences can be called related sequences

7
Q

How do you calculate the ways to choose k elements from a set of n elements?

A

nCk = n! / (k!(n-k)!)
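
A minimal Python sketch of the formula, checked against the standard library's `math.comb`. The helper name `n_choose_k` is hypothetical, and the example of 5 sequences giving C(5, 2) residue pairs per column is only an illustration.

```python
import math

def n_choose_k(n: int, k: int) -> int:
    """Number of ways to choose k elements from n: n! / (k! (n-k)!)."""
    if k < 0 or k > n:
        return 0
    # Integer division is exact because k!(n-k)! always divides n!.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Example: 5 aligned sequences give C(5, 2) pairs of residues in one column.
print(n_choose_k(5, 2))   # 10
print(math.comb(5, 2))    # 10, same result from the standard library
```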

8
Q

How do you move from substitution counts to probabilities?

A

take the number of nucleotide pairs per column, multiply it by the number of columns, and divide each substitution count by that product (the total number of pairs)
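
A sketch of that normalization, assuming an ungapped alignment given as equal-length strings and counting each unordered pair of residues within a column once, so each column contributes C(n, 2) pairs for n sequences. The function name `pair_probabilities` and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb

def pair_probabilities(alignment: list[str]) -> dict[tuple[str, str], float]:
    """Turn substitution counts from an ungapped alignment into probabilities."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    counts: Counter = Counter()
    for col in range(n_cols):
        column = [seq[col] for seq in alignment]
        # Count every unordered pair of residues in this column.
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    # Total pairs = (pairs per column) * (number of columns).
    total_pairs = comb(n_seqs, 2) * n_cols
    return {pair: c / total_pairs for pair, c in counts.items()}

alignment = ["ACGT", "ACGA", "GCGT"]  # toy alignment: 3 sequences, 4 columns
probs = pair_probabilities(alignment)
print(probs[("A", "A")])  # 0.08333333333333333 (1 of 12 pairs)
```

Here 3 sequences give 3 pairs per column and 4 columns give 12 pairs in total, so the resulting probabilities sum to 1.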

9
Q

Which exons are important?

A

the first and last exons

10
Q

What happens to the part of a gene after a stop codon?

A

it remains part of the mRNA after splicing (as the 3' untranslated region) but is not translated

11
Q

Why make a substitution scoring matrix?

A

we had to, because there was no way to build databases of genes accurate to a single nucleotide position of DNA

12
Q

Why could you have a long untranslated part of a gene?

A

RNA polymerase starting transcription at a position upstream of where translation starts

13
Q

What happens more frequently, has less of an effect on function, and is penalized less in a scoring matrix than a transversion?

A

transition
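
To make the distinction concrete, here is a sketch of a nucleotide scoring function in which transitions (purine to purine, A<->G, or pyrimidine to pyrimidine, C<->T) are penalized less than transversions. The numeric scores are hypothetical placeholders chosen only to show the ordering match > transition > transversion; they are not values from the lecture.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical scores: only their ordering matters for this illustration.
MATCH, TRANSITION, TRANSVERSION = 2, -1, -2

def substitution_type(a: str, b: str) -> str:
    """Classify an aligned nucleotide pair as match, transition, or transversion."""
    if a == b:
        return "match"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def score_pair(a: str, b: str) -> int:
    """Score a pair so that transitions cost less than transversions."""
    kind = substitution_type(a, b)
    return {"match": MATCH, "transition": TRANSITION, "transversion": TRANSVERSION}[kind]

for a, b in [("A", "A"), ("A", "G"), ("C", "T"), ("A", "C")]:
    print(a, b, substitution_type(a, b), score_pair(a, b))
```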

14
Q

In a real scoring matrix why are values scaled so that the highest entry is 100?

A

makes things easier to calculate

15
Q

How do amino acids affect protein structure?

A

-hydrophobic residues go on the inside and hydrophilic residues on the outside, which affects shape
-not all parts of the protein are equally important
-pay attention to H bonding and to acidic, basic, polar, and nonpolar character
-ask whether a substituted amino acid can play the same role in a chemical sense

16
Q

If two unrelated sequences are i.i.d., what is pN, and what is the expected number of matches?

A

pN = 0.25 (assuming equal base frequencies); 1/4 of the aligned positions are expected to be matches
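
The expected match probability is the sum over bases of the squared base frequencies; with equal frequencies of 1/4 this is 4 x (1/4)^2 = 1/4. The sketch below computes that value and checks it against a simulation. The function name, the sequence length, and the random seed are arbitrary choices for illustration.

```python
import random

def expected_match_prob(freqs: dict[str, float]) -> float:
    """P(match) at one position for two i.i.d. sequences: sum over bases of q_a^2."""
    return sum(q * q for q in freqs.values())

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p_n = expected_match_prob(uniform)   # 0.25
length = 10_000
print(p_n, p_n * length)             # expected number of matches over `length` positions

# Simulation check with two random, unrelated sequences.
rng = random.Random(0)
s1 = rng.choices("ACGT", k=length)
s2 = rng.choices("ACGT", k=length)
observed = sum(a == b for a, b in zip(s1, s2))
print(observed / length)             # close to 0.25
```

With unequal base frequencies the same function gives a value above 0.25, which is why the 1/4 figure depends on the equal-frequency assumption.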

17
Q

What is the null hypothesis for two sequences S1 and S2?

A

S1 and S2 have no more similarity than expected by chance

18
Q

What is the alternative hypothesis for two sequences S1 and S2?

A

S1 and S2 are related: they are more similar than expected by chance

19
Q

What is testing hypotheses equivalent to?

A

comparing models: it lets us compare two models that describe the relationship between two factors

20
Q

What is the probability of two sequences arising by chance under H0?

A
21
Q

What is the probability of two sequences under H1 (the alternative hypothesis)?

A
22
Q

What is the likelihood ratio and what does its value represent?

A

the ratio P(S1, S2 | H1) / P(S1, S2 | H0); for example, a value of 5 means the sequences are 5 times more likely to have arisen from our related model than from our unrelated model
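
A sketch of computing such a ratio, assuming the common textbook formulation rather than the lecture's exact notation: under the unrelated model each aligned pair has probability q_a * q_b from background frequencies, under the related model it has a joint pair probability p_ab, and the ratio is the product over positions of p_ab / (q_a * q_b). All probabilities below are made-up numbers for illustration.

```python
# Hypothetical background frequencies for the unrelated model (H0).
q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical related-model (H1) pair probabilities over the 16 ordered pairs:
# each of the 4 matches gets P_MATCH and the 12 mismatches share the rest equally.
P_MATCH = 0.2
P_MISMATCH = (1 - 4 * P_MATCH) / 12

def p_pair(a: str, b: str) -> float:
    """Probability of the aligned pair (a, b) under the related model."""
    return P_MATCH if a == b else P_MISMATCH

def likelihood_ratio(s1: str, s2: str) -> float:
    """P(S1, S2 | H1) / P(S1, S2 | H0) for an ungapped alignment."""
    ratio = 1.0
    for a, b in zip(s1, s2):
        # Each position contributes its own factor p_ab / (q_a * q_b).
        ratio *= p_pair(a, b) / (q[a] * q[b])
    return ratio

print(likelihood_ratio("ACGT", "ACGA"))  # > 1: the related model is favoured
```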

23
Q

What does it mean if our starting data is symmetric?

A

-no species is treated as the ancestor of another, so a substitution has no direction (a to b counts the same as b to a)
-caveat: biologically, substitutions are not all symmetric in their rates; dinucleotides are not in equilibrium

24
Q

How did we use our original scoring scheme?

A

-add up the scores of the individual alignment positions

25
Q

In the original scoring scheme, what scores were good and bad positions given?

A

good positions - positive score
bad positions - negative score

26
Q

What is mu (μ) in the original scoring scheme?

A

the relative weight of matches and mismatches
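
A sketch of that original scheme, assuming each match scores +1 and each mismatch scores -μ, so the alignment score is the sum of per-position scores, i.e. (#matches) - μ * (#mismatches). The +1 match value and the example μ values are assumptions for illustration.

```python
def simple_score(s1: str, s2: str, mu: float) -> float:
    """Sum per-position scores: +1 for a match, -mu for a mismatch (ungapped)."""
    score = 0.0
    for a, b in zip(s1, s2):
        # Good positions add a positive score, bad positions a negative one.
        score += 1.0 if a == b else -mu
    return score

# 3 matches and 1 mismatch give 3 - mu.
print(simple_score("ACGT", "ACGA", mu=1.0))  # 2.0
print(simple_score("ACGT", "ACGA", mu=2.5))  # 0.5
```

A larger μ makes each mismatch weigh more heavily relative to a match.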

27
Q

How do you factor the likelihood ratio to emphasize individual positions?

A
28
Q

Is any information lost by taking the log of likelihood ratios?

A

No information is lost by taking the log of likelihood ratios: the log is one-to-one, so the original ratio can always be recovered

29
Q

Why is it better to add logs than to multiply ratios of probabilities?

A

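it is computationally easier for both computers and humans: sums replace products, and summing logs avoids the numerical underflow that comes from multiplying many small probabilities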
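
A small illustration with made-up numbers: multiplying many small probabilities underflows to 0.0 in double-precision floating point, while the sum of their logs stays finite. The 2000 positions at probability 0.01 are hypothetical values chosen to trigger the underflow.

```python
import math

# Hypothetical per-position probabilities: 2000 positions, each 0.01.
probs = [0.01] * 2000

product = 1.0
for p in probs:
    product *= p
print(product)   # 0.0: 0.01**2000 = 1e-4000 is far below the smallest double

log_sum = sum(math.log(p) for p in probs)
print(log_sum)   # about -9210.34, easily representable

# The log is one-to-one, so a representable ratio is recovered by exp.
ratio = 5.0
print(math.exp(math.log(ratio)))  # approximately 5.0
```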

30
Q

What is the log likelihood scoring scheme?

A

take the log of each likelihood-ratio term in the matrix; the alignment score is then the sum of these log-odds terms
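
A sketch of building a log-odds matrix, assuming the common textbook form s(a, b) = log(p_ab / (q_a * q_b)), where p_ab is a related-model pair probability and q_a, q_b are background frequencies, with an alignment scored by summing s over positions. Log base 2 (scores in bits) is an arbitrary choice, and all probabilities are invented for illustration.

```python
import math

BASES = "ACGT"
# Hypothetical background frequencies, deliberately non-uniform.
q = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
# Hypothetical related-model probabilities: diagonal entries listed here,
# every ordered mismatch pair gets 0.03, so all 16 entries sum to 1.
p_diag = {"A": 0.20, "C": 0.12, "G": 0.12, "T": 0.20}

def p_pair(a: str, b: str) -> float:
    """Related-model probability of the aligned pair (a, b)."""
    return p_diag[a] if a == b else 0.03

# Log-odds score for every pair: log2(p_ab / (q_a * q_b)).
score = {(a, b): math.log2(p_pair(a, b) / (q[a] * q[b])) for a in BASES for b in BASES}

def alignment_score(s1: str, s2: str) -> float:
    """Sum of log-odds scores over the positions of an ungapped alignment."""
    return sum(score[(a, b)] for a, b in zip(s1, s2))

print(round(score[("A", "A")], 2), round(score[("C", "C")], 2))
print(alignment_score("ACGT", "ACGA"))  # > 0: the related model is favoured
```

Because the background frequencies differ, the A:A and C:C match scores come out different, and a positive total means the likelihood ratio is greater than 1.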

31
Q

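What does a log-odds score greater than zero mean?

A

the likelihood ratio is greater than 1: the alignment is more likely under the related model than by chance, which supports it being a high-confidence alignment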

32
Q

Should match scores always be equal?

A

no, they do not have to be; for example, an A:A match can score differently from a G:G match

33
Q

Should there be only one scoring matrix?

A

no; it depends on which species we are comparing, because the rates and types of changes vary between species over generations of evolution

34
Q

Can we make different scoring matrices for different situations?

A

yes; you can begin with high-confidence alignments that correspond to different time periods of divergence

35
Q

What can we use a scoring matrix to get?

A

pairwise alignment

36
Q

What can we use a pairwise alignment to get?

A

a multiple sequence alignment (MSA)

37
Q

What do we use our MSA (multiple sequence alignment) as the basis of?

A

a scoring matrix

38
Q

What do we know about function inference using sequence similarity?

A

(1) it works very well, (2) biases can drift into it, and (3) if a drift is recognized and persists, it may be inherent to genomics

39
Q
A