Longest Common Subsequence

Longest common subsequence (LCS) is a classic problem of dynamic programming. The problem is to find out the longest subsequence which presents in the two given sequences. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous.

The solution of LCS is delicate, and the idea can be applied to many other problems. I will walk through these problems in this post.

I. Solution of LCS

The key is to break down the problem into a collection of simpler subproblems.

1
2
3
4
5
6
7
8
Let mark[i][j] represent the length of the LCS of s1(0..i-1) and s2(0..j-1).

When i = 0 && j = 0, mark[i][j] = 0, because the LCS of "" and "" is "";
When i = 0 && j > 0, mark[i][j] = 0, because the LCS of "" and s2(0..j-1) is "";
When i > 0 && j = 0, mark[i][j] = 0, because the LCS of s1(0..i-1) and "" is "";
When i > 0 && j > 0,
if s1[i-1] = s2[j-1], mark[i][j] = mark[i-1][j-1] + 1;
otherwise, mark[i][j] = max{mark[i-1][j], mark[i][j-1]};

If we need to return the subsequence, we can use a matrix to record the path we take in the if/else above and then recover the subsequence according to it.

1
2
3
4
When i > 0 && j > 0,
if s1[i-1] = s2[j-1], mark[i][j] = mark[i-1][j-1] + 1, trace[i][j] = 0;
if mark[i-1][j] >= mark[i][j-1], mark[i][j] = mark[i-1][j], trace[i][j] = 1;
otherwise, mark[i][j] = mark[i][j-1], trace[i][j] = 2;

1
2
3
4
5
6
7
8
9
10
11
12
13
int i = s1.length();
int j = s2.length();
StringBuilder subsequence = new StringBuilder();
while (i > 0 && j > 0) {
if (trace[i][j] == 0) {
subsequence.insert(0, s1.charAt(i - 1));
i--; j--;
} else if (trace[i][j] == 1) {
i--;
} else if (trace[i][j] == 2) {
j--;
}
}

II. Applying the idea of LCS

i. Longest Common Substring

It is a bit different from Longest Common Subsequence. Substring requires to be contiguous. But the solution is similar. Note that this is not a problem of dp, the answer is not mark[s1.length()][s2.length()], but the max value in the matrix.

1
2
3
4
5
6
7
8
Let mark[i][j] represent the length of the LCSubstring of s1(0..i-1) and s2(0..j-1).

When i = 0 && j = 0, mark[i][j] = 0, because the LCSubstring of "" and "" is "";
When i = 0 && j > 0, mark[i][j] = 0, because the LCSubstring of "" and s2(0..j-1) is "";
When i > 0 && j = 0, mark[i][j] = 0, because the LCSubstring of s1(0..i-1) and "" is "";
When i > 0 && j > 0,
if s1[i-1] = s2[j-1], mark[i][j] = mark[i-1][j-1] + 1;
otherwise, mark[i][j] = 0;

A similar problem on leetcode: 718. Maximum Length of Repeated Subarray.

ii. Interleaving String

97. Interleaving String: given s1, s2, s3, find whether s3 is formed by the interleaving of s1 and s2.

We can directly apply the idea of LCS to this problem. Just be aware of the index.

1
2
3
4
5
6
7
8
9
10
Let mark[i][j] represent if the interleaving of s1(0..i-1) and s2(0..j-1) can form s3(0..i+j-1).

When i = 0 && j = 0, mark[i][j] = true, because "" interleaving "" can get "";
When i = 0 && j > 0, mark[i][j] = true, because "" interleaving s2(0..j-1) can form s3(0..j-1);
When i > 0 && j = 0, mark[i][j] = true, same as above;
When i > 0 && j > 0,
if s1[i-1] = s3[i+j-1] && s2[j-1] = s3[i+j-1], mark[i][j] = mark[i-1][j] || mark[i][j-1];
if s1[i-1] = s3[i+j-1] && s2[j-1] != s3[i+j-1], mark[i][j] = mark[i-1][j];
if s1[i-1] != s3[i+j-1] && s2[j-1] = s3[i+j-1], mark[i][j] = mark[i][j-1];
otherwise, mark[i][j] = false;

III. Matching

i. Regular expression matching

10. Regular Expression Matching: decide if the pattern p matches the string s. ‘.’ matches any single character; ‘*’ matches zero or more of the preceding element.

1
2
3
4
5
6
7
8
9
Let mark[i][j] be true if p(0..j-1) matches s(0..i-1); otherwise it is false.

When i = 0 && j = 0, mark[i][j] = true, because "" matches "";
When i = 0 && j > 0, mark[i][j] = mark[i][j-2] && p[j-1] = '*', e.g. "a*" matches "", but "aa*" does not match "";
When i > 0 && j = 0, mark[i][j] = false, because "" does not match s(0..i-1);
When i > 0 && j > 0,
if s[i-1] = p[j-1] || p[j-1] = '.', mark[i][j] = mark[i-1][j-1];
if p[j-1] = '*', mark[i][j] = mark[i][j-2] || mark[i-1][j] && (s[i-1] = p[j-2] || p[j-2] = '.'), e.g. "abb" vs. "ab*";
otherwise, mark[i][j] = false;

ii. Wilcard matching

44. Wildcard Matching: decide if the pattern p matches the string s. ‘?’ matches any single character; ‘*’ matches any sequence of characters (including the empty sequance).

1
2
3
4
5
6
7
8
9
Let mark[i][j] be true if p(0..j-1) matches s(0..i-1); otherwise it is false.

When i = 0 && j = 0, mark[i][j] = true, because "" matches "";
When i = 0 && j > 0, mark[i][j] = mark[i][j-1] && p[j-1] = '*', e.g. "**" matches "", but "aa*" does not match "";
When i > 0 && j = 0, mark[i][j] = false, because "" does not match s(0..i-1);
When i > 0 && j > 0,
if s[i-1] = p[j-1] || p[j-1] = '?', mark[i][j] = mark[i-1][j-1];
if p[j-1] = '*', mark[i][j] = mark[i][j-1] || mark[i-1][j], e.g. "abcd" vs. "ab*";
otherwise, mark[i][j] = false;

IV. Edit distance

i. Delete distance

583. Delete Operation for Two Strings: given two words word1 and word2, find the minimum number of steps required to make word1 and word2 the same, where in each step you can delete one character in either string.

The idea is to find the LCS and remove the rest characters. Or we can directly apply the idea of LCS to this problem.

1
2
3
4
5
6
7
8
Let mark[i][j] represent the number of characters to be deleted in order to make s1(0..i-1) and s2(0..j-1) the same.

When i = 0 && j = 0, mark[i][j] = 0, because "" and "" are already the same;
When i = 0 && j > 0, mark[i][j] = j, because all the characters in s2(0..j-1) have to be deleted;
When i > 0 && j = 0, mark[i][j] = i, same as above;
When i > 0 && j > 0,
if s1[i-1] = s2[j-1], mark[i][j] = mark[i-1][j-1];
otherwise, mark[i][j] = min{mark[i-1][j], mark[i][j-1]} + 1;

ii. Edit distance

72. Edit Distance: given two words word1 and word2, find the minimum number of steps required to make word1 and word2 the same, where in each step you can insert/delete/edit one character in either string.

The solution is to apply the idea of LCS to this problem. As the insert operation is the same as the delete operation, we can just use either one.

1
2
3
4
5
6
7
8
Let mark[i][j] represent the number of characters to be deleted/edited in order to make s1(0..i-1) and s2(0..j-1) the same.

When i = 0 && j = 0, mark[i][j] = 0, because "" and "" are already the same;
When i = 0 && j > 0, mark[i][j] = j, because all the characters in s2(0..j-1) have to be deleted;
When i > 0 && j = 0, mark[i][j] = i, same as above;
When i > 0 && j > 0,
if s1[i-1] = s2[j-1], mark[i][j] = mark[i-1][j-1];
otherwise, mark[i][j] = min{mark[i-1][j], mark[i][j-1], mark[i-1][j-1]} + 1;

V. Palindrome

i. Longest Palindromic Subsequence

516. Longest Palindromic Subsequence: find the longest palindromic subsequence’s length in the given string s.

One solution is to reverse string s and find the LCS of s and the reversed string. Time complexity is O(s.length() * s.length()).

Another solution is to apply the idea of LCS.

1
2
3
4
5
6
7
8
Let mark[i][j] represents the length of the longest palindromic subsequence in s(i..j).

When i = j, mark[i][j] = 1;
When i != j && s[i] = s[j]
if i+1 = j, mark[i][j] = 2;
otherwise, mark[i][j] = mark[i+1][j-1] + 2;
When i != j && s[i] != s[j]
mark[i][j] = max{mark[i+1][j], mark[i][j-1]};

Time complexity is: O(s.length() * s.length()).

ii. Longest Palindromic Substring

5. Longest Palindromic Substring: find the longest palindromic substring in the given string s.

A straight-forward solution is to traverse the array and check if an element is the center of a palindrome.

1
2
3
4
5
6
7
8
9
10
11
12
for (int i = 0; i < s.length(); i++) {
int left = i - 1, right = i + 1;
while (left >= 0 && right < s.length() && s.charAt(left) == s.charAt(right)) {
left--; right++;
}
//...
int left = i, right = i + 1;
while (left >= 0 && right < s.length() && s.charAt(left) == s.charAt(right)) {
left--; right++;
}
//...
}

Time complexity: O(s.length() * s.length()).

Another solution is to apply the idea of LCS.

1
2
3
4
5
6
7
Let mark[i][j] represents the length of the longest palindromic substring in s(i..j).

When i = j, mark[i][j] = 1;
When i != j && s[i] = s[j]
if i+1 = j, mark[i][j] = 2;
if mark[i+1][j-1] = j-i-1, mark[i][j] = mark[i+1][j-1] + 2;
Otherwise, mark[i][j] = 0;

Time complexity is: O(s.length() * s.length()).

Note that the trick that reverses the string and then finds the longest common substring does not work here. This is a counterexample: “abcdkecba”.