Part 2 - Lecture 5/6 - Substitution Scoring Matrices Flashcards

Question 1

Q

What gives us an alignment and a score for similarity for an entire sequence?

Answer

A

global pairwise alignment

Question 2

Q

What gives us the alignment and score for parts of sequences?

Answer

A

local pairwise alignment

Question 3

Q

What can precisely indicate interesting residues of nucleotides and amino acids?

Answer

A

multiple sequence alignment (MSA)

Question 4

Q

What does MSA depend on?

Answer

A

accuracy in pairwise alignment - which depends on scoring

Question 5

Q

What are the desirable features of a scoring matrix?

Answer

A

-we can think in terms of mutation and selection
-should we treat nucleotides and amino acids differently, and which properties matter the most for each?

Question 6

Q

If you start with a high confidence alignment what do you get?

Answer

A

-have no gaps or spaces
-hopefully see very few mismatches
-can call these sequences related sequences

Question 7

Q

How do you calculate the ways to choose k elements from a set of n elements?

Answer

A

nCk = n! / (k! (n-k)!)
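
As a quick check of the formula, here is a minimal Python sketch. The helper name `n_choose_k` and the 5-sequence example are illustrative assumptions, not taken from the lecture.

```python
import math

def n_choose_k(n: int, k: int) -> int:
    """Number of ways to choose k elements from a set of n elements."""
    if k < 0 or k > n:
        return 0
    # n! / (k! * (n-k)!), using integer division since the result is always whole
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Hypothetical example: number of distinct sequence pairs in one column
# of an alignment containing 5 sequences.
pairs_per_column = n_choose_k(5, 2)
```

The standard library function `math.comb(n, k)` computes the same quantity directly.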

Question 8

Q

How do you move from substitution counts to probabilities?

Answer

A

take the number of pairs of nucleotides in a column, multiply it by the number of columns, and divide each count by that product
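
The normalization can be sketched in Python as follows. This is only an illustration under stated assumptions: the alignment is gap-free, pairs are counted as unordered (A/G and G/A are the same pair), and the function name `pair_probabilities` and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_probabilities(alignment):
    """Convert pair counts from a gap-free alignment into pair probabilities."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    counts = Counter()
    # Count every unordered pair of residues within each column.
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    # Pairs per column (n choose 2) times the number of columns.
    total_pairs = (n_seqs * (n_seqs - 1) // 2) * n_cols
    return {pair: count / total_pairs for pair, count in counts.items()}

# Toy alignment (assumed data): 3 sequences, 4 columns -> 3 * 4 = 12 pairs.
toy_alignment = ["ACGT", "ACGT", "ATGT"]
probs = pair_probabilities(toy_alignment)
```

Because every pair in every column is counted exactly once, the resulting probabilities sum to 1.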

Question 9

Q

Which exons are important?

Answer

A

the first and last exons

Question 10

Q

What happens to the part of a gene after a stop codon?

Answer

A

it will be part of the mRNA post splicing but will not be expressed

Question 11

Q

Why make a substitution scoring matrix?

Answer

A

had to do this because there was no way to make databases of genes down to a single nucleotide position of DNA

Question 12

Q

Why could you have a long untranslated part of a gene?

Answer

A

RNA polymerase not starting at some position

Question 13

Q

What happens more frequently, has less of an effect on function, and is less penalized in a scoring matrix than a transversion?

Answer

A

transition
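
For reference, a transition swaps a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T); any other substitution is a transversion. A small Python sketch of that classification, with the function name `substitution_type` chosen here for illustration:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(a: str, b: str) -> str:
    """Classify a pair of DNA bases as a match, transition, or transversion."""
    if a == b:
        return "match"
    pair = {a, b}
    # Same chemical class (both purines or both pyrimidines) -> transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"
```

Of the 6 unordered mismatched pairs, 2 are transitions (A/G, C/T) and 4 are transversions.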

Question 14

Q

In a real scoring matrix why are values scaled so that the highest entry is 100?

Answer

A

makes things easier to calculate

Question 15

Q

How do amino acids affect protein structure?

Answer

A

hydrophobic residues go inside and hydrophilic residues go outside, which affects shape; not all parts of the protein are important; need to pay attention to H bonding and to acidic, basic, polar, and nonpolar properties; will a substituted amino acid be able to play the same role in a chemical sense?

Question 16

Q

If two unrelated sequences are i.i.d., what is pN, and what does that mean for the expected number of matches?

Answer

A

pN = 0.25; 1/4 of the aligned positions are expected to be matches
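
The 1/4 figure follows because two independent residues match with probability equal to the sum of the squared nucleotide frequencies: 4 x 0.25 x 0.25 = 0.25. A minimal Python sketch, where the function name and the sequence length of 100 are assumptions for illustration:

```python
def match_probability(freqs):
    """Probability that two independently drawn residues are identical."""
    return sum(p * p for p in freqs.values())

# Uniform nucleotide frequencies, pN = 0.25 for each base.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p_match = match_probability(uniform)

# Expected matches between two unrelated aligned sequences of length 100.
expected_matches = p_match * 100
```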

Question 17

Q

What is the null hypothesis for two sequences S1 and S2?

Answer

A

S1 and S2 have no more similarity than expected by chance

Question 18

Q

What is the alternative hypothesis for two sequences S1 and S2?

Answer

A

S1 and S2 are more similar than expected by chance, so they appear to be related

Question 19

Q

What is testing hypotheses equivalent to?

Answer

A

comparing models; allows us to compare two models which describe relationships between two factors

Question 20

Q

What is the probability for two sequences by chance under H0?

Question 21

Q

What is the probability for two sequences by chance under H1, the alternative hypothesis?

Question 22

Q

What is the likelihood ratio and what does its value represent?

Answer

A

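the probability of the sequences under the related model (H1) divided by their probability under the unrelated model (H0); for example, a value of 5 means that the sequence is 5X more likely to have arisen from our related model than from our unrelated model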
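
One common way to compute such a ratio for a gap-free alignment is to multiply, position by position, the probability of the aligned pair under the related model by the inverse of its probability under the unrelated (independent) model. The sketch below assumes that form; the pair probabilities `q` and background frequencies `p` are made-up numbers, not values from the lecture.

```python
import math

def likelihood_ratio(s1, s2, q, p):
    """P(alignment | related model) / P(alignment | unrelated model)."""
    ratio = 1.0
    for a, b in zip(s1, s2):
        # Related model: joint pair probability q[(a, b)].
        # Unrelated model: independent residues, p[a] * p[b].
        ratio *= q[(a, b)] / (p[a] * p[b])
    return ratio

bases = "ACGT"
p = {b: 0.25 for b in bases}  # assumed uniform background
# Assumed related model: 80% of pairs are matches, 20% spread over mismatches.
q = {(a, b): (0.2 if a == b else 0.2 / 12) for a in bases for b in bases}

lr = likelihood_ratio("ACGT", "ACGA", q, p)
```

With these assumed numbers each match contributes a factor of 3.2 and each mismatch a factor of about 0.27, so a ratio above 1 favors the related model.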

Question 23

Q

What does it mean if our starting data is symmetric?

Answer

A

-no species are ancestors of others
-substitutions are not all symmetric in their biological rates - dinucleotides are not in equilibrium

Question 24

Q

How did we use our original scoring scheme?

Answer

A

-add scores corresponding to different alignment positions

Question 25

Q

In the original scoring scheme what score were good and bad positions given?

Answer

A

good positions - positive score
bad positions - negative score

Question 26

Q

What is mu or u in the original scoring scheme?

Answer

A

the relative weight of matches and mismatches
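
A simple additive scheme of this kind can be sketched as below. The exact form is an assumption made for illustration: each match scores +1 and each mismatch scores -mu, so mu sets how heavily mismatches weigh against matches. The function name `simple_score` is hypothetical.

```python
def simple_score(s1: str, s2: str, mu: float = 1.0) -> float:
    """Sum per-position scores: +1 for a match, -mu for a mismatch."""
    return sum(1.0 if a == b else -mu for a, b in zip(s1, s2))

# Three matches and one mismatch with mu = 2 gives 3 - 2 = 1.
example = simple_score("ACGT", "ACGA", mu=2.0)
```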

Question 27

Q

How do you factor the likelihood ratio to emphasize individual positions?

Question 28

Q

Is any information lost by taking the log of likelihood ratios?

Answer

A

No information is lost by taking the log of likelihood ratios

Question 29

Q

Why is it better to add logs than to multiply ratios of probabilities?

Answer

A

it is computationally easier for both computers and humans
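
One concrete reason on the computer side is floating-point underflow: a product of many small values collapses to zero, while the sum of their logs stays representable. A short Python illustration with made-up per-position values:

```python
import math

# Hypothetical per-position values for a 500-column alignment.
values = [0.01] * 500

product = 1.0
for v in values:
    product *= v  # underflows to 0.0 in double precision

# Summing logs keeps the same information in a representable number.
log_sum = sum(math.log(v) for v in values)
```

Because the logarithm is monotonic, ranking alignments by summed logs gives the same order as ranking them by the original product.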

Question 30

Q

What is the log likelihood scoring scheme?

Answer

A

just take the log of each term in the matrix
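
As a sketch, taking the log of each entry of a matrix of likelihood ratios gives additive scores. The ratio values below are invented for illustration, and base 2 is an assumed choice of logarithm.

```python
import math

def log_odds_matrix(ratio_matrix):
    """Take log2 of every entry in a matrix of likelihood ratios."""
    return {pair: math.log2(r) for pair, r in ratio_matrix.items()}

# Made-up likelihood ratios for a few nucleotide pairs.
ratios = {("A", "A"): 4.0, ("A", "G"): 0.5, ("A", "C"): 0.25}
scores = log_odds_matrix(ratios)

# Adding scores over positions equals the log of the multiplied ratios.
total = scores[("A", "A")] + scores[("A", "G")]
```

Pairs that are more likely under the related model (ratio above 1) get positive scores, and pairs that are less likely get negative scores.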

Question 31

Q

What does a score of more than zero mean for the log value?

Answer

A

is a high confidence alignment

Question 32

Q

Should match scores always be equal?

Answer

A

no they do not have to be

Question 33

Q

Should there be only one scoring matrix?

Answer

A

no, because it depends on which species we are comparing: the rates and types of changes vary between species over generations due to evolution

Question 34

Q

Can we make different scoring matrices for different situations?

Answer

A

yes, you can begin with high confidence alignments that correspond to different time periods

Question 35

Q

What can we use a scoring matrix to get?

Answer

A

pairwise alignment

Question 36

Q

What can we use a pairwise alignment to get?

Answer

A

a MSA or multiple sequence alignment

Question 37

Q

What do we use our MSA or multiple sequence alignment as the basis of?

Answer

A

a scoring matrix

Question 38

Q

What is the function inference using sequence similarity?

Answer

A

(1) it works very well, (2) we can have a problem of drift in biases, and (3) if it is recognized and persists, it may be inherent to genomics
