Bio Background Flashcards

(51 cards)

1
Q

DNA

A

Deoxyribonucleic acid; encodes the genetic program of prokaryotes and eukaryotes.

Long polymer made of nucleotides, each of which carries one base.

2
Q

Four DNA bases

A

Cytosine (C)
Guanine (G)
Adenine (A)
Thymine (T)

Each base is attached to a sugar-phosphate unit to form a nucleotide

3
Q

Purines

A

Adenine + Guanine (AG) - two fused rings

4
Q

Pyrimidines

A

Cytosine, Thymine and Uracil (RNA) - single ring

5
Q

Base pairing

A

Double helix stabilized by:
- Hydrogen bonds
- Base stacking interactions among nucleotides

A-T (2 hydrogen bonds)
C-G (3 hydrogen bonds)

6
Q

Structure

A
  • Backbone is alternating sugars and phosphates
  • Bases pair in the center through hydrogen bonds
  • Grooves between the strands expose binding sites for proteins such as transcription factors
  • Strands are antiparallel
  • 5’ end - phosphate group
  • 3’ end - hydroxyl group

Top strand: 5’ -> 3’ (Watson)
Bottom strand: 3’ -> 5’ (Crick)
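
A small Python sketch of what antiparallel strands and base pairing imply: given one strand, the other strand read 5' -> 3' is its reverse complement. The name reverse_complement is an illustrative choice, not part of the deck.

```python
# Complement table from the base-pairing rules: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(watson: str) -> str:
    """Return the opposite strand, written 5' -> 3'."""
    # Reverse because the strands are antiparallel, then complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(watson))

print(reverse_complement("ATGC"))  # GCAT
```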

7
Q

Replication

A

The new strand is synthesized in the 5’ to 3’ direction; the template strand is read from 3’ to 5’

8
Q

Proteins

A

Linear polymer of amino acids linked by peptide bonds

20 different types of amino acids

9
Q

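Amino acids

A

Alanine (Ala, A), Cysteine (Cys, C), Aspartic acid (Asp, D), Glutamic acid (Glu, E), Phenylalanine (Phe, F), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Lysine (Lys, K), Leucine (Leu, L), Methionine (Met, M), Asparagine (Asn, N), Proline (Pro, P), Glutamine (Gln, Q), Arginine (Arg, R), Serine (Ser, S), Threonine (Thr, T), Valine (Val, V), Tryptophan (Trp, W), Tyrosine (Tyr, Y)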

10
Q

Protein structure

A

Primary - sequence of amino acids
Secondary - local spatial arrangement due to backbone interactions; short stretches of alpha helices and beta sheets
Tertiary - long-range 3D folding of one chain through side-chain interactions
Quaternary - several chains fold around one another

11
Q

Ribbon diagrams

A

3D structures adopted by amino acid chains
Coiled ribbon = alpha helix
Arrow ribbon = beta strand
Thin string = loops

12
Q

Central Dogma Molecular Biology

A

DNA - (Transcription) > RNA - (Translation) > Protein

The amino acid sequence of a protein is determined by the nucleotide sequence of the mRNA, which is in turn determined by the nucleotide sequence of the DNA

13
Q

Gene

A

Region of DNA that controls a hereditary characteristic.

Corresponds to a single mRNA, which will be translated into a protein

In eukaryotes, exons are interrupted by introns (non-coding sequences)

14
Q

RNA

A

Like DNA, but the sugar is ribose
Uracil replaces thymine
Single stranded

mRNA - transcribed from DNA -> translated into protein
tRNA - used in translation
rRNA - helps ribosomes assemble proteins

15
Q

Transcription

A

Initiation - RNA polymerase binds to the promoter site on the DNA and unzips the double helix
Elongation - free nucleotides pair with the template strand; uracil is used in place of thymine
Termination - a sequence signals termination, the RNA transcript is released, and the DNA zips up again

TAA, TAG, TGA -> stop codons (written as DNA; they end translation)
ATG -> start codon (begins translation)
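
A minimal Python sketch of these signals, assuming the input is the coding strand written 5' -> 3'. The helper names transcribe and first_orf are illustrative assumptions, and real transcription termination is more involved than a codon scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"

def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with uracil in place of thymine."""
    return coding_strand.replace("T", "U")

def first_orf(dna: str) -> str:
    """Return the stretch from the first ATG to the first in-frame stop codon."""
    start = dna.find(START_CODON)
    if start == -1:
        return ""
    # Step through the sequence codon by codon, in frame with the ATG.
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]
    return ""  # no in-frame stop codon found

orf = first_orf("CCATGAAATAGGG")
print(orf, transcribe(orf))  # ATGAAATAG AUGAAAUAG
```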

16
Q

Pattern matching

A
  • Naive
  • Finite Automata
  • KMP
  • Boyer Moore
  • Suffix Tree
  • Suffix Array
  • Generalized suffix tree and array
17
Q

Pattern matching - naive

A

Brute-force algorithm
Slide the pattern over the text one position at a time and compare at each shift
n = len(T)
m = len(P)
Time complexity O((n-m+1) * m) worst case
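
A short Python sketch of the brute-force matcher. It reports 0-based shifts, and the name naive_search is an illustrative choice.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Try every shift s and compare the pattern against text[s:s+m]."""
    n, m = len(text), len(pattern)
    hits = []
    for s in range(n - m + 1):          # n-m+1 candidate shifts
        if text[s:s + m] == pattern:    # up to m character comparisons
            hits.append(s)
    return hits

print(naive_search("atcatat", "at"))  # [0, 3, 5]
```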

18
Q

Pattern matching - Finite automata

A

Sigma = alphabet, n = len(T), m = len(P)
Matching time O(n)
Preprocessing (building delta): O(m * |Sigma|) with the best construction
Total: O(n + m * |Sigma|)

Finite-Automaton-Matcher(T, delta, m)
begin
  n := length(T);
  q := 0;
  for i := 1 to n do
  begin
    q := delta(q, T[i]);
    if q = m then
      print "P occurs from position " + (i-m+1)
  end;
end;

Naive construction of delta, worst-case time O(m^3 * |Sigma|):
build_delta(P, Sigma)
begin
  m := length(P);
  for q := 0 to m do
    for each a in Sigma do
    begin
      k := min(m+1, q+2);
      repeat
        k := k-1;
      until P[1..k] is a suffix of P[1..q]a;
      delta(q, a) := k;
    end;
end

Suffix function logic:
suffix(x) = length of the longest prefix of P that is also a suffix of the input x read so far

P = at
suffix(atcat) = 2
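
A runnable Python sketch of the matcher and the naive delta construction. It uses 0-based positions. The names fa_search and the list-of-dicts layout for delta are illustrative assumptions.

```python
def build_delta(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """Naive transition table: delta[q][a] = suffix(P[:q] + a)."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            k = min(m, q + 1)
            # Shrink k until P[:k] is a suffix of P[:q] + a (k = 0 always works).
            while not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            delta[q][a] = k
    return delta

def fa_search(text: str, pattern: str, alphabet: str) -> list[int]:
    """Scan the text once, following delta; state m means a full match."""
    delta = build_delta(pattern, alphabet)
    m = len(pattern)
    q = 0
    hits = []
    for i, ch in enumerate(text):
        q = delta[q].get(ch, 0)  # characters outside the alphabet reset to state 0
        if q == m:
            hits.append(i - m + 1)
    return hits

print(fa_search("atcatat", "at", "acgt"))  # [0, 3, 5]
```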

19
Q

Pattern matching - KMP

A

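Based on the prefix function of P: for each q, the length of the longest proper prefix of P that is also a suffix of P[1..q]
Matching time O(n)
Prefix function O(m)

Avoids testing useless shifts and avoids precomputing the delta function

Runs in linear time, O(n + m)
Good for short patterns with repetitions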
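
A Python sketch of KMP with its prefix function. It uses 0-based positions, and the function names are illustrative.

```python
def prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of P that is a suffix of P[:q+1]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]               # fall back to the next shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi

def kmp_search(text: str, pattern: str) -> list[int]:
    if not pattern:
        return []
    pi = prefix_function(pattern)
    hits = []
    k = 0                               # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and pattern[k] != ch:
            k = pi[k - 1]               # skip shifts that cannot match
        if pattern[k] == ch:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = pi[k - 1]               # keep going to find overlapping matches
    return hits

print(prefix_function("ababaca"))      # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("atcatat", "at"))     # [0, 3, 5]
```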

20
Q

Pattern matching - Boyer Moore

A

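Bad character rule (bcr) table: O(|alphabet| + m)
Good suffix rule (gsr) table: O(m)

Total, worst case: O( (n-m+1) * m + |alphabet|)

Compares P against T from right to left

Bad character rule - on a mismatch, shift P to the right until the mismatched text character lines up with its rightmost occurrence in P, or move P past that character if it does not occur in P

Good suffix rule - uses the characters already matched at the end of the pattern
Case 1 - the matched suffix occurs again elsewhere in P: shift to align that occurrence, based on the delta array
Case 2 - the matched suffix does not occur again: use the prefix function to align the longest prefix of P that matches a suffix of the matched part, which shifts P by almost all remaining characters

Often sublinear in practice, because text characters are skipped
Usually faster for large P and T with repetitions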
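
A simplified Python sketch that uses only the bad character rule. The good suffix rule is deliberately left out, so this is not the full Boyer-Moore algorithm. Positions are 0-based and the function names are illustrative.

```python
def bad_char_table(pattern: str) -> dict[str, int]:
    """Rightmost index of each character in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}

def bm_bad_char_search(text: str, pattern: str) -> list[int]:
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = bad_char_table(pattern)
    hits = []
    s = 0
    while s <= n - m:
        j = m - 1
        # Compare right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += 1                      # conservative shift after a match
        else:
            # Align the mismatched text character with its rightmost
            # occurrence in P; always move at least one position.
            s += max(1, j - last.get(text[s + j], -1))
    return hits

print(bm_bad_char_search("atcatat", "at"))  # [0, 3, 5]
```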

21
Q

Suffix tree

A

n = |T|, m = |P|, k = number of occurrences
preprocessing O(n)
search O(m+k)
total: O(n+m+k)

22
Q

Ukkonen

A

Builds the tree through a sequence of implicit suffix trees
- Suffix links: connect substrings using links between internal nodes
- Edge-label compression: store a pair of indices instead of the substring, O(1) space per edge

23
Q

Generalized Suffix Tree

A

Time complexity: O(|alphabet of two strings| * n)
Concatenate the two strings with unique separators and apply Ukkonen's algorithm to build the suffix tree T

Used for:
- common substrings
- text comparison
- palindromes
- finding the longest suffix of S1 that is also a prefix of S2, and vice versa

24
Q

Pattern matching - Suffix array

A

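Construction O(n): traverse the suffix tree depth first in lexical order

Binary search over the sorted suffixes to find match occurrences
Complexity O(m log n)
Random strings: O(m + log n) expected
m = |P|

mlr accelerator (minimum left right): mlr = min(l, r), where l and r are the lengths of the prefix of P already matched at the left and right ends of the search interval; comparisons resume at position mlr + 1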
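
A Python sketch of searching with a suffix array. The array is built here by plain sorting, which is much slower than the O(n) suffix-tree traversal on the card, and the mlr accelerator is not included. Function names are illustrative.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes, in lexical order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def sa_search(text: str, sa: list[int], pattern: str) -> list[int]:
    """Two binary searches bracket the suffixes that start with the pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                      # first suffix whose m-prefix is >= P
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                      # first suffix whose m-prefix is > P
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

sa = build_suffix_array("banana")
print(sa)                               # [5, 3, 1, 0, 4, 2]
print(sa_search("banana", sa, "ana"))   # [1, 3]
```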

25
Q

Assumption of sequence alignment

A

Life is monophyletic
Biological entities share a common ancestry

26
Q

Phylogenetic similarity

A

Homology: similarity due to evolution
Analogy: similarity due to analogous function

27
Q

Homology

A

A is homologous to B if they are related by divergence from a common ancestor
Paralogues -> different but related functions in one species
Orthologues -> same function in different species

28
Q

Analogy

A

A is analogous to B if they have a similar function but different origins
Convergent evolution -> different species in a similar environment adapt to the same function

29
Q

Types of Mutations

A

Transition
- A <-> G, between two purines
- C <-> T, between two pyrimidines
Transversion (mixed, one purine and one pyrimidine)
- C <-> G
- A <-> T
Deletions
Insertions
Inversions
- A-T: adenine turns to thymine
- G-C: guanine turns to cytosine

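
A small Python sketch that classifies a single-base substitution as a transition or a transversion from the purine and pyrimidine classes. The function name is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(a: str, b: str) -> str:
    """Transition: same chemical class. Transversion: purine <-> pyrimidine."""
    if a == b:
        return "match"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(classify_substitution("A", "G"))  # transition
print(classify_substitution("A", "T"))  # transversion
```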
30
Q

Maximum parsimony hypothesis

A

Among all explanations, the simplest one is preferred.
Explain the absence or presence of nucleotides with the minimum number of evolutionary changes
It is easier to delete n bases at one site than one base at each of n sites

31
Q

Types of alignments

A

Global alignment - comparative and evolutionary studies
Local alignment - database searching and retrieval; ignores distantly related biological regions and focuses on evolutionarily conserved signals of similarity

32
Q

Sequence alignment

A

Pairwise alignment - two sequences - exact solution
Multiple sequence alignment - 3 or more sequences - approximate (heuristic) solutions

33
Q

Gaps

A

At the ends - missing data
Internal - deletion or insertion

34
Q

Types of alignment

A

Manual alignment
Dot Matrix

35
Q

Dot Matrix

A

match -> diagonal step through a dot
mismatch -> diagonal step through an empty cell
gap in top seq -> vertical step
gap in left seq -> horizontal step

Can be run with different window sizes, stringencies, and alphabet sizes
stringency = a dot is drawn if at least h chars in the window are identical

Advantages:
- simple
- trial and error to explore
Disadvantages:
- expensive for large sequences
- may not find the best alignment
- qualitative analysis only

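
A Python sketch of a dot matrix with a window and a stringency threshold. The parameter names window and stringency follow the card, while the function name and the "*" / "." rendering are illustrative assumptions.

```python
def dot_matrix(top: str, left: str, window: int = 1, stringency: int = 1) -> list[str]:
    """One row per window start in `left`, one column per window start in `top`."""
    rows = []
    for i in range(len(left) - window + 1):
        row = ""
        for j in range(len(top) - window + 1):
            # Count identical characters inside the two windows.
            same = sum(left[i + k] == top[j + k] for k in range(window))
            row += "*" if same >= stringency else "."
        rows.append(row)
    return rows

for line in dot_matrix("ACGT", "ACGT"):
    print(line)
```

Runs of dots along a diagonal mark matching stretches. Raising the window and stringency filters out isolated chance matches.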
36
Q

Scoring matrices and Gap penalties

A

Scoring system:
- Gap penalty -> gap-opening penalty, gap-extension penalty
- Scoring matrix M(a,b) -> match (a = b), mismatch (a != b)
Based on the additive property of the score, which implies position independence

37
Q

Gap penalty

A

Fixed gap-penalty system -> 0, 1, or a constant
Linear gap-penalty system -> gamma(g) = -g*d, the gap length g multiplied by a constant d
Affine score -> opening cost (d), extension cost (e): gamma(g) = -d - (g-1)*e, with e < d
Logarithmic gap-penalty system -> the gap-extension penalty increases with the logarithm of the gap length

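
A Python sketch of the linear and affine formulas, with g the gap length, d the opening (or per-position) cost, and e the extension cost. The function names are illustrative.

```python
def linear_gap(g: int, d: float) -> float:
    """gamma(g) = -g * d"""
    return -g * d

def affine_gap(g: int, d: float, e: float) -> float:
    """gamma(g) = -d - (g - 1) * e, with e < d"""
    return -d - (g - 1) * e

# One gap of length 3 versus three separate gaps of length 1 (d = 10, e = 1):
print(affine_gap(3, 10, 1))      # -12
print(3 * affine_gap(1, 10, 1))  # -30
```

With an affine score, one long gap is penalized less than several short ones.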
38
Q

Scoring matrices

A

- Identity scoring -> match 1, mismatch 0
- DNA scoring -> match 3, transition 2, transversion 0
- Chemical similarity scoring -> higher scores for amino acids based on chemical similarity: size, charge, hydrophobicity
- Observed matrices -> analyze substitution frequency

39
Q

Scoring matrices

A

DNA:
- M(a,b) > 0 if match
- M(a,b) <= 0 if mismatch
Amino acids:
- PAM -> Percent/Point Accepted Mutation; probability that a pair is due to homology and not to chance
- BLOSUM -> substitution matrices for amino acids, from direct observation of blocks of proteins having similar functions

40
Q

PAM

A

Built from closely related proteins (at least 85% identical)
PAM1 - 1 substitution in 100 amino acid residues (1%)
PAM-N matrix - going through N percent mutations, e.g. PAM250 -> 250 evolutionary steps
Positive score -> common replacement
Negative score -> unlikely replacement
Low N -> short sequences, strong local similarities
High N -> long sequences, weak similarities

PAM60 -> close relations, 60% identity
PAM120 -> general use, 40% identity
PAM250 -> distant relations, 20% identity

41
Q

BLOSUM

A

Based on the Blocks database: observed local alignments (blocks)
Families of proteins with identical function and highly conserved protein domains
Identify motifs -> blocks of local alignments
BLOSUM62 is the default matrix in BLAST 2.0
BLOSUMn is based on sequences that are at most n percent identical; higher n means more closely related

BLOSUM62 -> general use
BLOSUM80 -> close relations
BLOSUM45 -> distant relations

42
Q

PAM vs BLOSUM

A

top = distantly related proteins
PAM250 ~= BLOSUM45
PAM200 ~= BLOSUM52
PAM160 ~= BLOSUM60
PAM120 ~= BLOSUM80
PAM100 ~= BLOSUM90
bottom = closely related sequences

42
Q

PAM vs BLOSUM best ones

A

BLOSUM -> best for local alignments
BLOSUM62 -> majority of weak protein similarities
BLOSUM45 -> long and weak alignments
PAM250 -> sequences with 17-27% identity
BLOSUM62 -> moderately distant proteins
BLOSUM50 -> FASTA searches

43
Q

Hamming Distance

A

Permits only substitutions (positive cost)
Defined if |A| = |B|; 0 <= d(A,B) <= |A|

X = aaaccd, Y = abcccd
d(X,Y) = 2, two substitutions necessary

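
A Python sketch of the Hamming distance, checked against the example on the card. The function name is illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("aaaccd", "abcccd"))  # 2
```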
44
Q

Levenshtein Distance (Edit distance)

A

Permits insertions, deletions and substitutions (positive costs)
0 <= d(A,B) <= max(|A|, |B|)

X = aaaccd, Y = abccd
d(X,Y) = 2, one substitution and one deletion

45
Q

Episode distance

A

Permits only insertions (positive cost)
d(A,B) = |B| - |A| if B can be obtained from A by insertions alone, otherwise inf

X = aaccd, Y = abbaccd
d(X,Y) = 2, two insertions

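
A Python sketch of the episode distance. B is reachable from A by insertions alone exactly when A is a subsequence of B, which is what the helper checks. The function name is illustrative.

```python
import math

def episode_distance(a: str, b: str) -> float:
    """|B| - |A| if A is a subsequence of B, otherwise infinity."""
    remaining = iter(b)
    # `ch in remaining` consumes the iterator up to the first occurrence of ch,
    # so the characters of A must appear in B in order.
    is_subsequence = all(ch in remaining for ch in a)
    return len(b) - len(a) if is_subsequence else math.inf

print(episode_distance("aaccd", "abbaccd"))  # 2
```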
46
Q

Dynamic programming - Pair alignment - Edit distance

A

D[i,0] = i
D[0,j] = j
D[i,j] = min( D[i-1,j-1] + f(i,j), D[i-1,j] + 1, D[i,j-1] + 1 )
f(i,j) = 0 if match, else 1

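
A Python sketch that fills the table with this recurrence. Strings are 0-indexed in Python, so A[i] on the card is a[i-1] here. The function name is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                     # delete i characters of A
    for j in range(m + 1):
        D[0][j] = j                     # insert j characters of B
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + f,   # match or substitution
                          D[i - 1][j] + 1,       # deletion
                          D[i][j - 1] + 1)       # insertion
    return D[n][m]

print(edit_distance("aaaccd", "abccd"))  # 2
```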
47
Q

Needleman-Wunsch - Pair alignment

A

Complexity:
- Space O(nm)
- Time to build the table O(nm)
- Backtrack O(n+m)

D[i,0] = i
D[0,j] = j
D[i,j] = min( D[i-1,j-1] + f(i,j), D[i-1,j] + 1, D[i,j-1] + 1 )
f(i,j) = 0 if match, else 1

Follow the path back from D[n,m] to D[0,0]:
vertical step -> align a symbol in A with a gap in B
horizontal step -> align a symbol in B with a gap in A
diagonal step -> match or mismatch

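
A Python sketch of the table fill plus the backtrack, using the unit-cost (distance) form of the recurrence shown on the card. It places A along the first index, and when several paths tie it prefers the diagonal step. Both choices, and the function name, are assumptions of this sketch.

```python
def needleman_wunsch(a: str, b: str) -> tuple[int, str, str]:
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + f, D[i - 1][j] + 1, D[i][j - 1] + 1)

    # Backtrack from D[n][m] to D[0][0], rebuilding the alignment in reverse.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1   # symbol of A vs gap
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1   # symbol of B vs gap
    return D[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))

score, top, bottom = needleman_wunsch("aaaccd", "abccd")
print(score)
print(top)
print(bottom)
```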
48
Q

Similarity Score - Semiglobal alignment

A

y = gap penalty
ro(i,j) = score from the scoring matrix

D[i,0] = i*y
D[0,j] = j*y
D[i,j] = max( D[i-1,j-1] + ro(i,j), D[i-1,j] + y, D[i,j-1] + y )

Semiglobal: gaps at the ends cost 0 (B on top, A on the left)
D[0,j] = 0 -> gap at the beginning of B (first column)
D[n,j] = 0 -> gap at the tail of B (last column)
D[i,0] = 0 -> gap at the beginning of A (first row)
D[i,m] = 0 -> gap at the tail of A (last row)

49
Q

Smith-Waterman - Local alignment

A

y = gap penalty
ro(i,j) = score from the scoring matrix

D[i,0] = 0
D[0,j] = 0
D[i,j] = max( D[i-1,j-1] + ro(i,j), D[i-1,j] + y, D[i,j-1] + y, 0 )

For example: match 1, mismatch -1, gap y = -1

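
A Python sketch of this recurrence with the example scores (match 1, mismatch -1, gap -1) as defaults. The backtrack starts at the highest-scoring cell and stops at the first 0. The function name and the tie-breaking order are assumptions of this sketch.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = -1) -> tuple[int, str, str]:
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ro = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + ro, D[i - 1][j] + gap,
                          D[i][j - 1] + gap, 0)
            if D[i][j] > best:
                best, best_pos = D[i][j], (i, j)

    # Backtrack from the best cell until a 0 is reached.
    i, j = best_pos
    row_a, row_b = [], []
    while i > 0 and j > 0 and D[i][j] > 0:
        ro = match if a[i - 1] == b[j - 1] else mismatch
        if D[i][j] == D[i - 1][j - 1] + ro:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif D[i][j] == D[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))

print(smith_waterman("xxacgtyy", "zzacgtww"))  # (4, 'acgt', 'acgt')
```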
50
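Q

Needleman-Wunsch vs Smith-Waterman

A

Smith-Waterman (SW):
- finds segments in the two sequences that are similar
- first row and first column = 0
- negative scores are set to 0
- backtrack begins at the highest score and moves up and left until it reaches a 0

Needleman-Wunsch (NW):
- aligns two complete sequences
- first row and first column = accumulated gap penalties
- scores can be negative
- backtrack begins at cell (n,m) and ends at (0,0)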