Sequence Alignment Flashcards

1
Q

what is sequence alignment?

A

bioinformatic method to arrange sequences in order to identify regions of similarity and evolutionary relationships

2
Q

what is a query sequence

A

an unknown, uncharacterized sequence that we want to align to a known sequence or database

3
Q

what is a reference sequence

A

the known sequence the query is aligned to

4
Q

why do we align sequences

A

to search for close matching sequences
to assign function to genes, proteins, genomes (annotation)
to infer evolutionary relationships
to determine the residue-residue correspondences

5
Q

what are the two alignment types

A

global and local

6
Q

what is global alignment

A

sequences are aligned along their entire length, from beginning to end
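
As a minimal sketch of end-to-end alignment, the score of the best global alignment can be computed with dynamic programming (the Needleman-Wunsch approach). The function name and the scores used here (match +1, mismatch 0, gap -1) are assumptions chosen for illustration, not values fixed by the method.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = 0, gap: int = -1) -> int:
    """Score of the best end-to-end alignment of a and b (linear gap cost)."""
    # Row 0: aligning an empty prefix of a against b costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # column 0: prefix of a aligned against nothing
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # residue of a aligned to a gap
            left = curr[j - 1] + gap  # residue of b aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 2: three matches, one gap
```

Because every residue of both sequences must be placed, the unmatched C is charged as a gap rather than being left out.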

7
Q

what is local alignment

A

only local regions with highest level of similarity are aligned
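
A minimal sketch of the same idea in code, using the Smith-Waterman recurrence: cell scores are never allowed to drop below zero, so the best-scoring region can start and end anywhere. The function name and the scores (match +1, mismatch -1, gap -1) are assumptions for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 1,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best-scoring local region shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # The 0 option restarts the alignment instead of carrying a negative score.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(local_alignment_score("AAAACGTAAA", "CCCACGTCCC"))  # 4: the shared ACGT
```

The flanking residues differ, so only the shared ACGT region contributes to the score.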

8
Q

what are homologous sequences

A

sequences that evolved from a common ancestor; they typically have similar 3D structure and function

9
Q

what is the threshold for homologous sequences

A

more than 70% identity

10
Q

what is pairwise alignment

A

aligns two sequences at a time, gaps introduced to find the best match

11
Q

what is dotplot alignment

A

a graphical representation of the similarity between two sequences: rows correspond to the residues of one sequence, columns to the residues of the other, and a dot marks each position where the two residues match
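
A dotplot can be sketched as a text grid. This is a minimal illustration with no window or threshold filtering; the function name and the `*` and `.` symbols are assumptions.

```python
def dotplot(seq_rows: str, seq_cols: str) -> list[str]:
    """Return one text line per residue of seq_rows; '*' marks identical residues."""
    return [
        "".join("*" if r == c else "." for c in seq_cols)
        for r in seq_rows
    ]


for line in dotplot("ACG", "ACG"):
    print(line)
# *..
# .*.
# ..*
```

Identical sequences produce an unbroken main diagonal; shifted or broken diagonals suggest insertions, deletions, or repeats.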

12
Q

what are some measures of sequence similarity

A

the Hamming distance and the Levenshtein distance

13
Q

what is hamming distance

A

for two strings of equal length, the number of positions at which the characters differ (no insertions or deletions are allowed)
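
A minimal sketch of this count in Python; the function name is an assumption.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("GATTACA", "GACTATA"))  # 2 (positions 3 and 6 differ)
```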

14
Q

what is levenshtein distance

A

the minimum number of edit operations (insertions, deletions, and substitutions) required to change one string into another
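
A minimal sketch of the standard dynamic-programming computation, keeping only two rows of the table at a time; the function name is an assumption.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
```

Unlike the Hamming distance, this works for strings of different lengths.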

15
Q

what controls the cost of insertions and deletions when computing alignments

A

gap penalties

16
Q

what are the two parameters used for gap penalties

A

gap opening and gap extension
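
As a minimal sketch, the two parameters can be combined into an affine gap cost. The penalty values (10 and 1) are hypothetical, and the convention assumed here charges the opening penalty for the first gap position and the extension penalty for each additional one; some tools charge the extension penalty for every position instead.

```python
def affine_gap_cost(length: int, gap_open: int = 10, gap_extend: int = 1) -> int:
    """Cost of one contiguous gap of the given length (assumed convention)."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    # First position pays gap_open; each further position pays gap_extend.
    return gap_open + gap_extend * (length - 1)


print(affine_gap_cost(1))  # 10
print(affine_gap_cost(3))  # 12
```

With these values one gap of length 3 costs 12 while three separate gaps of length 1 cost 30, so alignments with fewer, longer gaps are favored.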

17
Q

what scores does a simple scoring system give to a match, mismatch and gap penalty

A

match = +1
mismatch = +0
gap penalty = -1
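
A minimal sketch applying these three scores to a pair of already-aligned sequences; the function name and the use of `-` as the gap character are assumptions.

```python
def simple_score(aligned_a: str, aligned_b: str, match: int = 1,
                 mismatch: int = 0, gap: int = -1) -> int:
    """Sum the column scores of two aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


# Six matches, one mismatch, one gap: 6 * 1 + 0 - 1 = 5
print(simple_score("GATT-ACA", "GACTTACA"))  # 5
```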

18
Q

what does a simple scoring system not take into account

A

the influence of molecular evolution
the probability of replacing an amino acid with another similar one
purine-pyrimidine transversions are less frequent than purine-purine and pyrimidine-pyrimidine transitions

19
Q

what is transition-transversion matrix used for

A

aligning nucleotide sequences when accounting for a higher probability of transition than transversion

20
Q

what is transition

A

a substitution within the same base class: purine to purine (A ↔ G) or pyrimidine to pyrimidine (C ↔ T)

21
Q

what is transversion

A

a purine replaced by a pyrimidine, or vice versa
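
A minimal sketch that classifies a DNA substitution as a transition or a transversion; the function and constant names are assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x: str, y: str) -> str:
    """Classify a DNA base change as 'match', 'transition', or 'transversion'."""
    x, y = x.upper(), y.upper()
    for base in (x, y):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if x == y:
        return "match"
    # Both bases in the same chemical class: transition.
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("A", "C"))  # transversion
```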

22
Q

what is PAM ineffective at identifying?

A

distant relationships

23
Q

how identical are two sequences 1 PAM apart

A

99% identical

24
Q

what is the lowest PAM level that produces a correct alignment

A

PAM250 with ~20% sequence identity

25
Q

what matrix is used to identify distant relationships

A

BLOSUM matrix

26
Q

how identical should sequences aligned using BLOSUM62 be

A

62%

27
Q

what is the main difference between PAM and BLOSUM

A

in BLOSUM, all matrices are directly calculated, no extrapolation like PAM

28
Q

Is BLOSUM45 more or less divergent than BLOSUM80

A

more divergent

29
Q

are blosum matrices based on global or local alignments

A

local

30
Q

are PAM matrices based on global or local alignments

A

global

31
Q

what are heuristic alignments?

A

fast but approximate methods for finding the alignment with the highest score, not guaranteed to find the best alignment

32
Q

which is more sensitive for nucleic acid sequences: FASTA or BLAST?

A

FASTA

33
Q

what is the fastest and most widely used heuristic tool for pairwise sequence comparison

A

BLAST

34
Q

what is % Identity in BLAST

A

number of identical residues divided by the number of matched residues ignoring gaps
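
A minimal sketch of this definition, computed from two aligned sequences with `-` as the gap character (an assumption, as is the function name). Other conventions divide by the full alignment length, gaps included, so reported values can differ.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues as a percentage of aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


# 6 identical residues out of 7 ungapped columns
print(round(percent_identity("GATT-ACA", "GACTTACA"), 1))  # 85.7
```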

35
Q

what are the positives reported by BLAST

A

fraction of residues that are identical or similar

36
Q

what is the bit-score (Max score)

A

highest alignment score between the query sequence and the database segments

37
Q

what is the E value

A

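the number of hits with a score at least this high that would be expected by chance when searching a database of that size; it indicates how likely it is that the similarity occurred by chance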
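
For reference, the E value is commonly written using Karlin-Altschul statistics as below. Here $m$ is the query length, $n$ is the database length, $S$ is the raw alignment score, $S'$ is the bit score, and $K$ and $\lambda$ are constants that depend on the scoring system. BLAST's actual computation uses effective lengths, so treat this as a sketch of the relationship rather than the exact calculation.

```latex
E = K \, m \, n \, e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\quad\Longrightarrow\quad
E = m \, n \, 2^{-S'}
```

Each additional bit of score halves the E value, and searching a larger database raises it proportionally.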

38
Q

is a low E value good or bad?

A

good - more significant

39
Q

what is PSI-BLAST?

A

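Position-Specific Iterated BLAST: searches a database for distantly related sequences by iteratively building a position-specific scoring matrix from the hits of the previous round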