Klein und fein: Der Boyer-Moore-Horspool Algorithmus

Posted by Chris on November 01, 2011

No Gravatar

Zugegeben, ich verstand den Boyer-Moore Algorithmus nicht ganz, was alleine schon ein Grund war, eine einfachere Variante zu implementieren. Der Boyer-Moore-Horspool ist so eine Variante. Glücklicherweise zeigt sich, dass die Horspool-Vereinfachung die Textsuche im durchschnittlichen Fall nicht oder nicht viel langsamer werden lässt (rein subjektive Einschätzung!). Neugierige sollten den Code 1:1 kopieren und verwenden können mit einem beliebigen Java-Compiler. Und bitte einfach den Programmierstil ausblenden, es ging mir nur darum, den Algorithmus zu testen und wenigstens halbwegs zu verstehen. Heute braucht man sich kaum mehr selber um Implementierungen von Suchalgorithmen zu bemühen, die meisten Programmierumgebungen bringen entsprechende Bibliotheksfunktionen seit vielen Jahren mit.

public class BoyerMooreHorspoolSearch {
	private char[] text;
	private int text_len;
	private static final int ALPHABET_SIZE = 256;
 
	public BoyerMooreHorspoolSearch() {
		String str = "overeovere The oxen-brown Fox jumped over the Fox-lake. It (fox) didn't" +
				" know why, but it was kinda fun overemphasized! xxxxxxxxxxx aaaaaaa";
		this.text = str.toCharArray();
		text_len = text.length;
		System.out.println("DEBUG: text_len: " + text_len);
	}
 
	public BoyerMooreHorspoolSearch(String text) {
		this.text = text.toCharArray();
		text_len = this.text.length;
	}
 
	private void search(final String pattern) {
		char[] pat = pattern.toCharArray();
		int pat_len = pat.length;
		int[] occurrenceTable = initOccurrence(pat);
		int pos = 0; // our text index position
		int countloops = 0;
 
		while (pos <= text_len-pat_len) {
			int j = pat_len-1; // our pattern index position
			// As long as the pattern chars are hits (we start at the
			// end of the pattern) => go one backwards (to the left).
			while (j >= 0 && pat[j] == text[pos+j]) j--; // hits
			if (j < 0) report(pattern, pos);
 
			// We get here, when: a) a full hit was found
			// and b) nothing was found or partially pattern was found
			// in either case, we have to check, how far to the right we
			// can jump now.
			//
			// Now we do this in one step: take the jump-length from
			// the occurrenceTable indicating pat_len, if the character
			// of the current pos in text is _not_ in the pattern and
			// giving the difference from the right to the rightmost character
			// in the pattern, if the character at pos in text _is_ in our
			// pattern
			pos += occurrenceTable[text[pos+pat_len - 1]];
			countloops++;
		}
		System.out.println("DEBUG: Loops: " + countloops);
	}
 
	private int[] initOccurrence(final char[] pat) {
		int[] occurrenceTable = new int[ALPHABET_SIZE];
		char a;
		int pat_len = pat.length;
 
		for (a = 0; a < ALPHABET_SIZE; a++)
			occurrenceTable[a] = pat_len;
 
		// We fill the table at the position of the ASCII char
		// (for example occurrenceTable[97] for 'a') with the
		// index value of it's last occurrence in the pattern.
		for (int j = 0; j < pat_len-1; j++)
			occurrenceTable[pat[j]]=pat_len - j - 1;
 
		return occurrenceTable;
	}
 
	private void report(final String pat, final int pos) {
		System.out.println("Found \"" + pat + "\" at position "
				+ (pos+1) + " (index " + pos + ") in the text!");
	}
 
	// Driver
	/**
	 * @param args
	 */
	public static void main(String[] args) {
//		BoyerMooreHorspoolSearch bmh = new BoyerMooreHorspoolSearch("Is there a fox?");
		BoyerMooreHorspoolSearch bmh = new BoyerMooreHorspoolSearch();
 
		bmh.search("Fox");
		bmh.search("fox");
		bmh.search(" fox"); // this should _not_ be found
		bmh.search("ox");
		bmh.search("xxx");
		bmh.search("overe");
	}
Trackbacks

Use this link to trackback from your own site.

Comments

Leave a response

Comments

This site is using OpenAvatar based on