Horspool Algorithm

Let's checkout Horspool's algorithm for pattern matching in strings. Say we have a pattern P of length m and also say that we know the alphabet.

When you are asked to compare two strings, you probably just use operator== and if that is forbidden, often we just tend to use a sliding window to compare both character sequences.


The 'easy' way uses a shift of 1, meaning that we move our pattern character by character to the right, comparing both sequences again and again. Horspool's algorithm naturally extends this idea by pre-computing an amount of shifts we may do.

Consider the pattern 777777774. If we compare it with our text and have a mismatch, we can certainly shift our pattern more than one place (ignoring edge cases), since our pattern consists of an overly large amount of sevens. The computation of the maximum shift we can make by character is rather easy:

std::map<char, std::size_t> shifts(const std::string& sigma,
                                   const std::string& pattern)
    std::map<char, std::size_t> tmp;
    for(auto c : sigma)
        tmp[c] = pattern.size(); 
    for(std::size_t i = 0; i < pattern.size() - 1; ++i)
        tmp[pattern[i]] = pattern.size() - i - 1;
    return tmp;

The matching can then be done from left-to-right through the text, but right-to-left through the pattern.

Matthis Kruse
Masters Student

My research interests include programming languages and compilers.