KMP Algorithm
https://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/
For string pattern matching, brutal force way needs to match the start character at every possible index i. The worst case complexity of this Naive algorithm is O(m(n-m+1))
*m is the length of the pattern
KMP (Knuth Morris Pratt) Pattern Searching
It is O(n) space and O(n) worst case time complexity. The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text of next window. We take advantage of this information to avoid matching the characters that we know will anyway match.
In the next step, we compare next window of txt with pat.
txt = "AAAAABAAABA"
pat = "AAAA" [Pattern shifted one position]
This is where KMP does optimization over Naive. In this
second window, we only compare fourth A of pattern
with fourth character of current window of text to decide
whether current window matches or not. Since we know
first three characters will anyway match, we skipped
matching first three characters.
To do this, we need the preprocessing to find out:
lps[i] = the longest proper prefix of pat[0..i]
which is also a suffix of pat[0..i].
Searching Algorithm:
Preprocessing Algorithm:
We need the lookup of lps[len-1] whenever there is a mismatch because of several scenarios:
AAACAAAA
ACDBACDE ACDBACDB
pat[] = "AAACAAAA"
len = 3, i = 7.
Since pat[len] and pat[i] do not match and len > 0,
set len = lps[len-1] = lps[2] = 2
len = 2, i = 7.
Since pat[len] and pat[i] match, do len++,
store it in lps[i] and do i++.
len = 3, lps[7] = 3, i = 8
public int[] getKMPLookupTable(String s) {
int[] lookup = new int[s.length()];
lookup[0] = 0;
int i=1, len = 0;
while(i<s.length()) {
if(s.charAt(len)==s.charAt(i)) {
len++; lookup[i] = len;
i++;
} else if(len!=0) {
len = lookup[len-1];
} else {
i++;
}
}
return lookup;
}
Applications
Find Palindrome
We can use KMP proper prefix-suffix lookup to find the longest palindrome starting at index 0 of a string.
c a t a c b # b c a t a c
0 0 0 0 1 0 0 0 1 2 3 4 5