Sequence Alignment Flashcards

1
Q

what is sequence alignment?

A

bioinformatic method to arrange sequences in order to identify regions of similarity and evolutionary relationships

2
Q

what is a query sequence

A

an unknown, uncharacterized sequence that we want to align to a known sequence or database

3
Q

what is a reference sequence

A

the known sequence the query is aligned to

4
Q

why do we align sequences

A

to search for close matching sequences
to assign function to genes, proteins, genomes (annotation)
to infer evolutionary relationships
to determine residue-residue correspondences

5
Q

what are the two alignment types

A

global and local

6
Q

what is global alignment

A

sequences are aligned over their entire length, from beginning to end
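The card does not name an algorithm. Global alignment is classically computed with Needleman-Wunsch dynamic programming, so the sketch below uses that as an assumption. The function name is illustrative and the default scores (+1, 0, -1) are borrowed from the simple scoring system later in this deck.

```python
def global_alignment_score(a, b, match=1, mismatch=0, gap=-1):
    """Best score for aligning all of a against all of b (Needleman-Wunsch)."""
    # Row 0: aligning an empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    # Global alignment reads the score in the final cell: both ends are included.
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

Because every residue of both sequences must be placed, the length difference alone forces at least one gap in this example.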

7
Q

what is local alignment

A

only local regions with highest level of similarity are aligned
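The card does not name an algorithm. Local alignment is classically computed with Smith-Waterman dynamic programming, so the sketch below uses that as an assumption. The function name and the scores (+1, -1, -1) are illustrative; a negative mismatch score is chosen here so that dissimilar regions pull the score down.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score over all pairs of substrings of a and b (Smith-Waterman)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, which is
            # what makes the result local rather than global.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Only the shared "ACGT" core contributes to the score.
print(local_alignment_score("TTTTACGTGGGG", "CCCCACGTAAAA"))
```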

8
Q

what are homologous sequences

A

sequences that evolved from a common ancestor; they often have similar 3D structure and function

9
Q

what is the threshold for homologous sequences

A

more than 70% identity

10
Q

what is pairwise alignment

A

aligns two sequences at a time; gaps are introduced to find the best match

11
Q

what is dotplot alignment

A

a graphical representation of the sequence similarity between two sequences, rows correspond to residues of one sequence, columns of the other
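A minimal sketch of the dot matrix described above, assuming the simplest rule: a dot wherever the two residues are identical, with no window or threshold filtering. The function names are illustrative.

```python
def dotplot(seq_rows, seq_cols):
    """Boolean matrix: entry [i][j] is True when the residues match."""
    return [[r == c for c in seq_cols] for r in seq_rows]


def render(matrix):
    """Draw the matrix as text, '*' for a match and '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


# Rows follow the first sequence, columns the second.
print(render(dotplot("GATTACA", "GATTACA")))
```

Identical sequences give an unbroken main diagonal; stretches of similarity between different sequences show up as shorter diagonal runs.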

12
Q

what are some measures of sequence similarity

A

the Hamming distance and the Levenshtein distance

13
Q

what is Hamming distance

A

for two strings of equal length, the number of positions at which the characters differ; insertions and deletions are not allowed
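A minimal sketch of this definition. The function name is illustrative, and raising an error on unequal lengths is one reasonable choice, not the only one.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    # Compare position by position; no shifting, so no insertions or deletions.
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("GATTACA", "GACTATA"))
```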

14
Q

what is Levenshtein distance

A

minimum number of edit operations required to change one string into another, counting insertions, deletions and substitutions
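A minimal sketch of this definition using the standard dynamic-programming recurrence, with every edit costing 1. The function name is illustrative.

```python
def levenshtein_distance(a, b):
    """Minimum number of single-character edits turning a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (x != y)  # free when characters match
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(levenshtein_distance("kitten", "sitting"))
```

Unlike the Hamming distance, the two strings may differ in length.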

15
Q

what controls the cost of insertions and deletions when computing alignments

A

gap penalties

16
Q

what are the two parameters used for gap penalties

A

gap opening and gap extension
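A sketch of how the two parameters are commonly combined into an affine gap penalty. The default values and the function name are hypothetical. Conventions also differ between tools on whether the first gap position is charged the extension cost as well; this sketch assumes it is not.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Cost of one gap of the given length under an affine model."""
    if length <= 0:
        return 0.0
    # Opening is paid once; each further position pays the cheaper extension.
    return gap_open + gap_extend * (length - 1)


# One gap of length 5 is cheaper than five separate gaps of length 1.
print(affine_gap_penalty(5), 5 * affine_gap_penalty(1))
```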

17
Q

what scores does a simple scoring system give to a match, mismatch and gap penalty

A

match = +1
mismatch = 0
gap penalty = -1
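A minimal sketch that applies these three scores to an alignment that has already been made. It assumes gaps are written as "-" and that both aligned strings have the same length; the function name is illustrative.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=0, gap=-1):
    """Sum the per-column scores of a gapped pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            score += gap       # column contains a gap
        elif x == y:
            score += match     # identical residues
        else:
            score += mismatch  # different residues
    return score


# 4 matches and 2 gap columns: 4 * (+1) + 2 * (-1)
print(score_alignment("ACGT-A", "AC-TTA"))
```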

18
Q

what does a simple scoring system not take into account

A

the influence of molecular evolution
probability of replacing an amino acid with another similar one
purine-pyrimidine transversions are less frequent than purine-purine and pyrimidine-pyrimidine transitions

19
Q

what is a transition-transversion matrix used for

A

aligning nucleotide sequences when accounting for a higher probability of transition than transversion

20
Q

what is a transition

A

a purine replaced by a purine, or a pyrimidine by a pyrimidine (A <-> G, C <-> T)

21
Q

what is a transversion

A

a purine replaced by a pyrimidine, or a pyrimidine by a purine (A or G <-> C or T)
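A minimal sketch that classifies a DNA substitution as a transition or a transversion. It assumes the four bases A, C, G and T; the names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'identical', 'transition' or 'transversion' for two DNA bases."""
    ref, alt = ref.upper(), alt.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if ref == alt:
        return "identical"
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"), classify_substitution("A", "C"))
```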

22
Q

what is PAM ineffective at identifying?

A

distant relationships

23
Q

how identical are two sequences 1 PAM apart

A

99% identical

24
Q

what is the lowest PAM level that produces a correct alignment

A

PAM250 with ~20% sequence identity

25
Q

what matrix is used to identify distant relationships

A

BLOSUM matrix

26
Q

how identical should sequences aligned using BLOSUM62 be

A

62%

27
Q

what is the main difference between PAM and BLOSUM

A

every BLOSUM matrix is calculated directly from observed alignments; higher PAM matrices are extrapolated from PAM1

28
Q

is BLOSUM45 suited to more or less divergent sequences than BLOSUM80

A

more divergent

29
Q

are BLOSUM matrices based on global or local alignments

A

local

30
Q

are PAM matrices based on global or local alignments

A

global

31
Q

what are heuristic alignments?

A

fast but approximate methods for finding the alignment with the highest score, not guaranteed to find the best alignment

32
Q

which is more sensitive for nucleic acid sequences, FASTA or BLAST?

A

FASTA

33
Q

what is the fastest and most widely used heuristic tool for pairwise sequence comparison

A

BLAST

34
Q

what is % Identity in BLAST

A

number of identical residues divided by the number of matched residues, ignoring gaps

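A minimal sketch of this calculation on an already aligned pair, assuming gaps are written as "-". By default it follows the definition on this card and leaves gap columns out of the denominator. Tools differ on this point, so the hypothetical `count_gap_columns` flag switches to dividing by the full alignment length instead.

```python
def percent_identity(aligned_a, aligned_b, count_gap_columns=False):
    """Percentage of compared alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        has_gap = x == "-" or y == "-"
        if has_gap and not count_gap_columns:
            continue  # gap columns are left out of the denominator
        compared += 1
        if not has_gap and x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("ACGT", "AGGT"))
```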
35
Q

what are the Positives reported by BLAST

A

fraction of aligned residues that are identical or similar

36
Q

what is the bit-score (Max score)

A

highest alignment score between the query sequence and the database segments

37
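Q

what is the E value

A

the expected number of hits with an equal or better score that would occur by chance in a database of this size; it indicates how likely the similarity is due to chance
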
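A sketch of how an E value relates to a bit score, using the standard relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. This is a simplification: it uses raw lengths, whereas BLAST applies corrected (effective) lengths. The function name is illustrative.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E value from a bit score and the size of the search space."""
    search_space = query_length * database_length
    # Each extra bit of score halves the number of hits expected by chance.
    return search_space * 2.0 ** (-bit_score)


# Same search space, higher bit score: far fewer chance hits expected.
print(expected_hits(30, 250, 1_000_000), expected_hits(60, 250, 1_000_000))
```

The same bit score gives a larger E value in a larger database, which is why E values from different searches are not directly comparable.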
38
Q

is a low E value good or bad?

A

good - more significant

39
Q

what is PSI-BLAST?

A

Position-Specific Iterated BLAST; it searches a database iteratively to find distantly related sequences