Diff Flashcards
(20 cards)
What is the goal of the diff algorithm?
To compute the minimum sequence of insertions and deletions to transform sequence A into sequence B.
What edit operations are used in the diff problem?
Insert after a position and delete at a position.
What is a subsequence?
A sequence formed by deleting zero or more elements without changing order.
What is a common subsequence?
A subsequence that appears in both sequences A and B.
Why is finding the LCS useful in diff?
Because maximizing the LCS minimizes the number of edits required.
What is the Bellman equation for LCS?
l(i,j) = 1 + l(i-1,j-1) if a_i = b_j, else max(l(i-1,j), l(i,j-1)); l(i,0) = l(0,j) = 0.
What is the time complexity of LCS via dynamic programming?
Θ(m · n), where m and n are the lengths of the sequences.
How do you recover the actual LCS from the DP table?
By tracing backwards from the bottom-right cell, following the matches.
How is diff modeled as a graph problem?
As a grid where right = delete, down = insert, and diagonal = match (zero-cost).
What graph algorithm can be used to solve diff efficiently?
Dijkstra’s algorithm with cost 1 for insert/delete and 0 for match.
What is the size of the graph used in the diff problem?
Θ(m · n) vertices and edges.
What optimization helps Dijkstra in this scenario?
Using a bucket queue for constant-time priority queue operations.
How many diagonals are relevant when the edit distance is D?
Only diagonals from -D to D.
What is the key idea in Myers’ algorithm?
Track the furthest x-position on each diagonal for each number of edits d.
What is the time complexity of Myers’ algorithm?
Θ(D · min(m, n)), where D is the edit distance.
Why is Myers’ algorithm efficient for small D?
Because it limits the search space to a narrow band of relevant diagonals.
What does Git use for its diff implementation?
Myers’ algorithm due to its efficiency and practicality.
What is patience diff in Git?
A strategy that preserves large unchanged blocks for better readability.
Why is patience diff useful for code diffs?
Because it highlights logically related blocks and minimizes noise.
What does the DP table for LCS represent?
Lengths of LCS for each prefix pair of A and B.