Part 2 - Lecture 1 - Global Pairwise Sequence Alignment Flashcards

(38 cards)

1
Q

In global pairwise sequence alignment, what does "sequence" mean?

A

a string over an assumed alphabet: it could be DNA (A, C, G, T), RNA, or amino acids

2
Q

What are the four criteria to define alignment?

A
  1. one sequence is positioned above the other
  2. spaces may be inserted into the sequences
  3. spaces may not appear on top of each other
  4. after inserting spaces, the sequences must have the same length
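
The criteria that can be checked mechanically (equal length after inserting spaces, and no space on top of a space) can be sketched as a small validity check. This is only an illustration: the function name `is_alignment` and the use of `-` as the space character are assumptions, not part of the lecture.

```python
def is_alignment(top, bottom, gap="-"):
    """Return True if two gapped rows satisfy the alignment criteria.

    `top` and `bottom` are the two rows as written one above the other,
    with `gap` (an assumed symbol) marking each inserted space.
    """
    # After inserting spaces, both rows must have the same length.
    if len(top) != len(bottom):
        return False
    # A space may not appear on top of another space.
    return all(not (a == gap and b == gap) for a, b in zip(top, bottom))
```

For example, `is_alignment("AC-GT", "A-CGT")` holds, while `is_alignment("A-CG", "A-CT")` fails because both rows have a space in the second column.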
3
Q

What does global mean in global pairwise sequence alignment?

A

we align the sequences over their entire length

4
Q

What does pairwise mean in global pairwise sequence alignment?

A

we restrict our attention to two sequences at a time (other methods can align more sequences at once)

5
Q

Out of these four examples which are alignments and which are not?

A

-all are alignments except the one in the top left-hand corner, because it has spaces on top of each other

6
Q

What should alignments reveal?

A

biological relationships

7
Q

Why might we align sequences?

A

-Do known sequences align well with ours? This lets us check whether we have discovered a new gene
-What about the parts that do not align at all?
-We can gather biological and evolutionary insights from the parts that align well

8
Q

What is sequence similarity strong evidence of?

A

similar biological function

9
Q

What are some sources of biological differences?

A

-substitution (point mutation)
-insertion or deletion of a short sequence (indel): we cannot tell whether something was inserted in one sequence or deleted from the other, so we call it an indel

10
Q

What is a segmental duplication?

A

duplicated blocks of genomic DNA ranging in size from 1 to 200 kb

11
Q

What is an inversion?

A

when a section of DNA breaks off and reattaches to the chromosome in reversed order

12
Q

What is a transposition?

A

a discrete section of DNA is moved from one location in the genome to another

13
Q

What is a translocation?

A

one piece of a chromosome breaks off and attaches to another chromosome

14
Q

What are some sources of technical differences in alignment?

A

-sequencing machines make mistakes
-different technologies lead to different errors: Illumina has fewer indels and more SNVs and substitutions from PCR, while PacBio and Nanopore have more indels because they are long-read sequencing technologies
-PCR is a major factor

15
Q

How do we score alignments, and what do high scores indicate?

A

-high scores indicate better alignments
-a score is assigned to each position separately, and the alignment score is the sum over positions

-identity (match) = +1
-substitution (mismatch) = -u
-indel = -S
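
The scoring scheme above can be sketched as a function that walks an already-gapped alignment column by column. The name `score_alignment`, the parameter names `mismatch_penalty` and `indel_penalty` (standing in for the two penalty symbols), their default values, and the `-` space character are all assumptions made for illustration.

```python
def score_alignment(top, bottom, mismatch_penalty=1.0, indel_penalty=1.0, gap="-"):
    """Sum per-position scores: +1 match, -mismatch_penalty, -indel_penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0.0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            score -= indel_penalty      # a letter paired with a space
        elif a == b:
            score += 1.0                # identity (match)
        else:
            score -= mismatch_penalty   # substitution (mismatch)
    return score
```

With the default penalties, `score_alignment("AC-GT", "ACTGA")` has three matches, one indel, and one mismatch, so the positions sum to 1.0.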

16
Q

What defines "best" when we are trying to get the best alignment?

A

modeling, probability, and statistics define what "best" means

17
Q

What finds the best alignment?

18
Q

What is an alignment matrix?

A

if the two sequences have lengths n and m, the matrix has n rows and m columns, plus one more row and one more column for a space added at the beginning of each sequence, giving (n+1) x (m+1) cells
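
As a rough sketch of the matrix shape only, the helper below builds an empty grid for two sequences. The name `empty_alignment_matrix` and the choice to put one sequence along the left (rows) and the other along the top (columns) are assumptions for illustration.

```python
def empty_alignment_matrix(left, top):
    """Build a zero-filled grid with one extra row and column.

    Rows follow the sequence written on the left, columns follow the
    sequence written along the top; row 0 and column 0 stand for the
    space added at the beginning of each sequence.
    """
    rows = len(left) + 1
    cols = len(top) + 1
    return [[0.0] * cols for _ in range(rows)]
```

For sequences of lengths 3 and 5 this gives a grid of 4 rows by 6 columns.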

19
Q

What do alignments correspond to?

20
Q

What does an alignment path look like with scoring?

A

match is +1
indel is -delta, e.g. -1
substitution is -u, e.g. -0.15

21
Q

How do you calculate the best score for alignment?

A
  1. calculate the best score for prefixes of the two sequences
  2. update incrementally from there
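
The two steps above can be sketched with the standard dynamic-programming fill for global alignment (often called Needleman-Wunsch, a name not used in these cards). Here `T[i][j]` holds the best score for aligning the first `i` letters of one sequence with the first `j` letters of the other. The function name, parameter names, and default penalties are assumptions for illustration.

```python
def best_global_score(left, top, mismatch_penalty=1.0, indel_penalty=1.0):
    """Return the best global alignment score of `left` and `top`."""
    n, m = len(left), len(top)
    # T[i][j]: best score for the prefixes left[:i] and top[:j].
    T = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty prefix costs one indel per letter.
    for i in range(1, n + 1):
        T[i][0] = -indel_penalty * i
    for j in range(1, m + 1):
        T[0][j] = -indel_penalty * j
    # Each cell is updated from its three already-computed neighbors.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = left[i - 1] == top[j - 1]
            diag = T[i - 1][j - 1] + (1.0 if same else -mismatch_penalty)
            up = T[i - 1][j] - indel_penalty
            back = T[i][j - 1] - indel_penalty
            T[i][j] = max(diag, up, back)
    return T[n][m]
```

With the default penalties, `best_global_score("ACGT", "AGT")` is 2.0: three matches and one indel.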
22
Q

If T is a function that gives the score of the best alignment, what is the first step toward formalizing it?

23
Q

What is the formal notation for the general recurrence relation?

25
What is the approximate size of the human genome?
about 3 billion bases
26
27
When we calculate the score in each cell of the matrix how do we keep a record of which neighboring cell we used?
we can represent this as an arrow pointing back, up, or diagonally back and up
28
What does following the arrows allow us to do?
write out the alignment
29
What does moving upward mean?
inserting a space in the sequence written along the top
30
What does moving backward mean?
inserting a space in the sequence written on the left
31
What does moving diagonally mean?
no space is inserted: one letter is placed on top of the other
32
How do you perform a traceback?
  1. start in the bottom right
  2. follow the arrows to the top left
  3. each arrow adds a position to the alignment
  4. moving past a row or column consumes that row or column
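
The traceback steps can be sketched as follows, assuming the matrix is filled with the standard dynamic-programming recurrence and an arrow is recorded in every cell. The function name `global_align`, the arrow labels, the default penalties, the `-` space character, and the tie-breaking order (diagonal first) are all assumptions for illustration.

```python
def global_align(left, top, mismatch_penalty=1.0, indel_penalty=1.0, gap="-"):
    """Return (aligned_top, aligned_left, score) for a best global alignment."""
    n, m = len(left), len(top)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    arrow = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], arrow[i][0] = -indel_penalty * i, "up"
    for j in range(1, m + 1):
        score[0][j], arrow[0][j] = -indel_penalty * j, "back"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 1.0 if left[i - 1] == top[j - 1] else -mismatch_penalty
            candidates = [
                (score[i - 1][j - 1] + step, "diag"),
                (score[i - 1][j] - indel_penalty, "up"),
                (score[i][j - 1] - indel_penalty, "back"),
            ]
            # Record which neighbor gave the best score (first one wins ties).
            score[i][j], arrow[i][j] = max(candidates, key=lambda c: c[0])
    # Traceback: start in the bottom right and follow arrows to the top left.
    top_row, left_row = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = arrow[i][j]
        if move == "diag":      # one letter on top of the other
            top_row.append(top[j - 1])
            left_row.append(left[i - 1])
            i, j = i - 1, j - 1
        elif move == "up":      # space in the sequence along the top
            top_row.append(gap)
            left_row.append(left[i - 1])
            i -= 1
        else:                   # "back": space in the sequence on the left
            top_row.append(top[j - 1])
            left_row.append(gap)
            j -= 1
    # Positions were collected from the end, so reverse them.
    return "".join(reversed(top_row)), "".join(reversed(left_row)), score[n][m]
```

With the default penalties, `global_align("AGT", "ACGT")` returns the rows `ACGT` and `A-GT` with score 2.0.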
33
What is time complexity?
-a function of sequence length: how the total amount of work scales, given the work done for each individual cell
34
What is the time complexity for global pairwise alignment?
time complexity is O(n^2) for two sequences of length n (O(nm) for lengths n and m), because we compute about n^2 entries in the matrix -the amount of work for each individual cell is constant: we look at its three neighboring cells and decide the score based on them
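
To make the quadratic scaling concrete, the sketch below simply counts how many cells are visited when filling the matrix, assuming one constant-time update per cell. The function name `count_cell_updates` is hypothetical.

```python
def count_cell_updates(n, m=None):
    """Count the cells of an (n+1) x (m+1) alignment matrix, one by one."""
    if m is None:
        m = n  # two sequences of the same length
    updates = 0
    for _ in range(n + 1):
        for _ in range(m + 1):
            updates += 1  # constant work: compare three neighboring cells
    return updates
```

Doubling the length from 10 to 20 raises the count from 121 to 441, which is roughly a fourfold increase.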
35
What is space complexity?
the space we need is proportional to the size of the alignment matrix, because all values must be stored
36
What is the space complexity for global pairwise alignment?
the required space is quadratic, O(n^2), because it scales like the square of the length of the sequences
37
What is never less than the space complexity?
time complexity - every value we store takes work to compute and write, so the work is at least as large as the space used
38