Dynamic Programming Flashcards
What Is Dynamic Programming?
Optimizing recursive functions so that previously computed values are stored and do not require recomputing.
What is Memoizing?
Caching the previous computed into an array, check if it was computed already then don’t need to waste time again. Sort of like memorizing
Type of Pattern Matching For Dynamic
It is Approximate Pattern Matching
Resulting in usual O(nm) time being a bit slower and O(n) space
Two types of Algorithms for Approximate Pattern Matching
Global Comparison O(mn)
Local Comparison O(mn)
Why Use a Global Algorithm?
Aligns entire two seq. of roughly equal size,
Why Use Local algorhtm?
Used for less similar strings, suspected to contain similar substring, like spell check
What is the Edit Distance Problem?
ED(X, Y) is the smallest amount of insertions, deletions, and substitutions to transform X to Y.
Ex.
X = GTTACTCGA
Y = GCTTGCCG
G - T T A C T C G A
| | | | | |
G C T T G C - C G -
ED(X, Y) = 4
Input: Two strings
Edit Distance Algorithm
Algorithm EditDistance(X, m, Y, n)
1: for j ← 0 to n do A[0, j] ← j
2: for i ← 1 to m do A[i, 0] ← i
3: for i ← 1 to m do
4: for j ← 1 to n do
5: A[i, j] ← A[i−1, j−1] + δ(X[i], Y[j])
6: A[i, j] ← min(A[i, j], A[i−1, j] + 1)
7: A[i, j] ← min(A[i, j], A[i, j−1] + 1)
8: return A[m, n]
Edit Distance Complexity
Time Complexity O(mn)
Space Complexity O(mn)
- Can improve Space with only storing last two
“it is very unlikely”. a better version made
Edit Distance Explain
You make the big problem into a bunch of subproblems we do using the prefixes of both strings
Only comparing the last letter
Have a 2d Matrix that stores all possible subproblems turning any prefix of X into any prefix of Y. If the last letter same then it is the same ED(X[1..M-1],Y[1..N-1]. Go through the array solve for each sub-problem then return the main problem which is A[m,n].
Also use this lemma for a start:
For every X: ED(X, ε) = ED(ε, X) = |X|,
ED(X[1. .m], Y[1. .n]) = min{
ED(X[1. .m), Y[1. .n)) + δ(X[m], Y[n]),
ED(X[1. .m), Y[1. .n]) + 1,
ED(X[1. .m], Y[1. .n)) + 1
},
δ(X[m], Y[n]), (same letter = 1 else 0)
X[M-1] X[M]
Y[n-1] -> [ REPLACE ] [ INSERT ]
Y[n] -> [ DELETE ] [ HERE]
How to get Alignment from Edit Distance
Alignment:
A B C D E F
A Z C - E D
| |
Start from A[m,n] and traverse to A[0,0], moving only left, up, or upper left in the direction of the minimum value.
L, D, D’, U
D if the value remains the same, D’ if the value changes.
Store the string during the traversal,
then reverse IT.
Meaning of directions:
L: Add new character in X
U: Add new character in Y
D: Keep the same character, no change
D’: Change the character from X to Y
Iterate through string:
L: Character above, - below.
D: Same character above and below. use ‘|’ between
D’: Character above, different character below.
U: - above, character below
A B C D E F 0 1 2 3 4 5 6 A 1 0 1 2 3 4 5 Z 2 1 1 2 3 4 5 C 3 2 2 1 2 3 4 E 4 3 3 2 2 2 3 D 5 4 4 3 2 3 3 A B C D E F 0* 1 2 3 4 5 6 A 1 0* 1 2 3 4 5 Z 2 1 1* 2 3 4 5 C 3 2 2 1* 2* 3 4 E 4 3 3 2 2 2* 3 D 5 4 4 3 2 3 3*
3 -> 2 -> 2 -> 1 -> 1 -> 0 -> 0
D’ D L D D’ D
Reverse:
D D’ D L D D’
Alignment:
A B C D E F
A Z C - E D
| |
Edit Distance With Better Space but Alignment Gone
Only Store last two rows/columns are kept, space can be reduced to O(n) Space
Algorithm EditDistance(X, m, Y, n)
1: b ← 0
2: for j ← 0 to n do A[b, j] ← j
3: for i ← 1 to m do
4: b ← 1 − b
5: A[b, 0] ← i
6: for j ← 1 to n do
7: A[b, j] ← A[1−b, j−1] + δ(X[i], Y[j])
8: A[b, j] ← min(A[b, j], A[1−b, j] + 1)
9: A[b, j] ← min(A[b, j], A[b, j−1] + 1)
10: return A[b, n]
Longest Common Subsequence
Longest Sequence of non-consecutive letters in two strings
EX:
TWO STRINGS
ABCDAF
ACBCF
LONGEST SEQUENCE: ABCF
LCS = 4
Very similar problem to Edit Distance as we have to solve for every prefix of both in a double matrix.
Longest Common Subsequence LEMMA Explain
The Lemma is basically knowing that if the two end letters are the same it is basically LCS of both of the Strings with the last letter removed + 1. If they are not the same then it will be the max LCS of all combo of removing the last letter for both strings. Since the Non-consecutive sequence, the sequence continues even after the last doesn’t match. as we are comparing the whole prefix with the last element
ADD when the last letter is the same, for edit distance we add when they are different
LCS(X[1..m],Y[1..n]) = max {
LCS(X[1..m),Y[1..n)) + f(X[m], Y[n]),
LCS(X[1..m],Y[1..n)),
LCS(X[1..m),Y[1..n])
}
Longest Common Subsequence Algorithm
Algorithm LCS(X, m, Y, n)
1: for j ← 0 to n do A[0, j] ← 0
2: for i ← 1 to m do A[i, 0] ← 0
3: for i ← 1 to m do
4: for j ← 1 to n do
5: A[i, j] ← A[i−1, j−1] + δ(X[i], Y[j])
6: A[i, j] ← max(A[i, j], A[i−1, j])
7: A[i, j] ← max(A[i, j], A[i, j−1])
8: return A[m, n]
Longest Common Subsequence Time Complexity
Time Complexity O(mn)
Space Complexity O(mn)
Longest Common Subsequence Alignment
Start at A[m,n] What you do is only put the letter the result string when the current [row][col] is the max out of [row-1][coll],[row][col-1],[row-1][col-1]. and then move in the direction of the diagonal or [row-1][col-1]
if it is not the max then don’t print just go to the row col that is equal to the current [row][col]. after you traverse all the way to A[0,0] reverse the string then it will be aligned
Longest Common Subsequence Better Space, No Alignment
Complexity with this
Time Complexity: O(mn)
Space: O(n)
Algorithm LCS(X, m, Y, n)
1: b ← 0
2: for j ← 0 to n do A[b, j] ← 0
for i ← 0 to n do A[i, b] ← 0
3: for i ← 1 to m do
4: b ← 1 − b
6: for j ← 1 to n do
7: A[b, j] ← A[1−b, j−1] + δ(X[i], Y[j])
8: A[b, j] ← max(A[b, j], A[1−b, j])
9: A[b, j] ← max(A[b, j], A[b, j−1])
10: return A[b, n]
Approximate Pattern Matching
COMPLEXITY:
TIME: O(mn)
SPACE: O(mn)
Input: pattern and text
Output: Index positions where Edit distance is between some amount
USES EDIT DISTANCE SO ALGO VERY SIMILAR;
Algorithm APRROXPM(X, m, Y, n)
1: for j ← 1 to n do A[0, j] ← 0
2: for i ← 1 to m do A[i, 0] ← i
3: for i ← 1 to m do
4: for j ← 1 to n do
5: A[i, j] ← A[i−1, j−1] + δ(X[i], Y[j])
6: A[i, j] ← min(A[i, j], A[i−1, j] + 1)
7: A[i, j] ← min(A[i, j], A[i, j−1] + 1)
8:for j ← 1 to n do
9: if A[m, j] ≤ t then output(j)
//WANT TO KNOW ED OF THE FULL PATTERN IN THE TEXT, SO ALREADY A[m, j]
Subset Sum Problem
Input: Array of positive integers and a positive integer T
Output: TRUE FALSE if there exist a subset of the array that its sum is T
EX:
■ Input: X = [4, 2, 7, 4, 9, 2], T = 20
■ Output: TRUE (20 = 9 + 7 + 4)
■ Input: X = [4, 2, 7, 4, 9, 2], T = 40
■ Output: FALSE
■ Input: X = [4, 2, 7, 4, 9, 2], T = 5
■ Output: FALSE
Subset Sum LEMMA
IF T = 0 It will always be TRUE
IF the Array is empty it will always be FALSE
When T> 0 and array_size > 0 then :
SSum(X, n, T) = {
IF X[n] <= T:
SSum(X,n-1,T or SSum(X,n-1,T-X[n]
OTHERWISE: SSum(X,n-1,T) }
- If the last element we are examining is less than or equal to T then we check if we already hit TRUE OR FALSE. IF TRUE we out then it is true for every element after. IF FALSE then this element might be a possibility so we SSum(X, n-1, T- X[n]).
We are breaking the array into smaller pieces to find the piece with the subset sum, starting from last and checking before.
If the X[n] or last element we are examining is greater than T then it not in solution so just check the element before
ex.
X = [2, 3, 7, 8, 10]
T = 11
expected: TRUE (11 = 3 + 8)
Initial setup for S[][]
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T - - - - - - - - - - - 3 T - - - - - - - - - - - 7 T - - - - - - - - - - - 8 T - - - - - - - - - - - 10 T - - - - - - - - - - -
First iteration:
Compare X[i] = 2 with 1, 2 is not <= 1, inherit row above
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T F - - - - - - - - - - 3 T - - - - - - - - - - - 7 T - - - - - - - - - - - 8 T - - - - - - - - - - - 10 T - - - - - - - - - - -
Second iteration:
Compare X[i] = 2 with 2, 2 <= 2, get row above OR j-X[i] in above row.
S[i-1, j] = F
S[i-1, j-X[i]] = S[i-1, j-2] = T
= T
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T F T - - - - - - - - - 3 T - - - - - - - - - - - 7 T - - - - - - - - - - - 8 T - - - - - - - - - - - 10 T - - - - - - - - - - -
Final iteration:
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T F T F F F F F F F F F 3 T F T T F T F F F F F F 7 T F T T F T F T F T T F 8 T F T T F T F T T T T T 10 T F T T F T F T T T T T
Check bottom right value = TRUE
SubSet Sum Complexity
Time Complexity: O(nT)
Space: O(nT) can be O(T) by only storing the last cols
SubSet Sum Algorithm
Algorithm SubsetSum(X, n, T)
1: for i ← 0 to n do S[i, 0] ← TRUE
2: for j ← 1 to T do S[0, j] ← FALSE
3: for i ← 1 to n do
4: for j ← 1 to T do
5: if X[i] ≤ j then
6: S[i, j] ← S[i−1, j] or S[i−1, j−X[i]]
7: else S[i, j] ← S[i−1, j]
8: return S[n, T]
Detailed Solution to the SubSet Sum Algo
To print out the solution to the algorithm
Prints the exact values that make the subset sum
def PrintSubsetSum(X, n, T):
SubsetSum(X, n, T)
if S[n, T] == FALSE: return i = n j = T while j > 0: if S[i - 1, j] == True: i -= 1 else: print(X[i]) j -= X[i] i -= 1
Basically going up the T in the 2d array because once true all the bottoms are true
ex.
0 1 2 3 4 5 6 7 8 9 10 11 T F F F F F F F F F F F 2 T* F T F F F F F F F F F 3 T F T T* F T F F F F F F 7 T F T T* F T F T F T T F 8 T F T T F T F T T T T T* 10 T F T T F T F T T T T T*
Column jumping happens at X[i] = 8 and 3.