Click here to Skip to main content
15,028,402 members
Articles / General Programming / Algorithms
Posted 8 Nov 2010

Tagged as


30 bookmarked

Approximate String Matching - Row-wise Bit-parallelism Algorithm (BPR)

Rate me:
Please Sign up or sign in to vote.
4.67/5 (8 votes)
14 Nov 2010CPOL2 min read
One of the best methods for solving approximate string matching problem


Approximate string matching, also called "string matching allowing errors", is the problem of finding a pattern p in a text T when a limited number k of differences is permitted between the pattern and its occurrences in the text.


The oldest solution to the problem relies on dynamic programming. For example, see Levenstein distance implementation on Code Project ( or in Wikipedia(

An alternative and very useful way to consider the approximate search problem is to model the search with a nondeterministic finite automaton (NFA). Consider the NFA for k = 2 errors. Every row denotes the number of errors seen. Every column represents matching a pattern prefix. Horizontal arrows represent matching a character (i.e., if the pattern and text characters match, we advance in the pattern and in the text). All the others increment the number of errors by moving to the next row: Vertical arrows insert a character in the pattern (we advance in the text but not in the pattern), short diagonal arrows substitute a character (we advance in the text and pattern), and long diagonal arrows delete a character of the pattern (they are e-transitions, since we advance in the pattern without advancing in the text). The initial self-loop allows an occurrence to start anywhere in the text. The automaton signals (the end of) an occurrence whenever a rightmost state is active.


BPR Algorithm

Bit-parallel algorithms are a simple and fast way to encode NFA states thanks to their higher locality of reference. Here is the pseudo-code for this algorithm that is described in the book "Flexible Pattern Matching in Strings" (see References for more information).

BPR (p =, T =, k)
1.    Preprocessing
2.        For c e S Do B[c] <- 0m
3.        For j e 1 ... m Do B[pj] <- B[pj] | 0m-j10j-1
4.    Searching
5.       For i e 0 ... k Do Ri <- 0m-i1i
6.       For pos e 1 ... n Do
7.            oldR <- R0
8.            newR <- ((oldR << 1) | 1) & B[tpos]
9.            R0 <- newR
10.           For i e 1 ... k Do
11.              newR <- ((Ri << 1) & B[tpos]) | oldR | ((oldR | newR) << 1)
12.              oldR <- Ri, Ri <- newR
13.           End of for
14.           If newR & 10m-1 <> 0 Then report an occurrence at pos
15.      End of for

For example, let's search for pattern "rain" in "brain" text with 2 errors. We start with B=











and { R0=0000 R1=0001 R2=0011 }. And here is step by step run:

b - 0000

R0- 0000

R1- 0000

R2- 0011

r - 0001

R0- 0001

R1- 0010

R2- 0100

a - 0010

R0- 0010

R1- 0111

R2- 1110

Occurrence at position 3

i - 0100

R0- 0100

R1- 1110

R2- 11111

Occurrence at position 4

n - 1000

R0- 1000

R1- 1100

R2- 1110

Occurrence at position 5

C# Implementation

public static void BPR(string pattern, string text, int errors)
            int[] B = new int[ushort.MaxValue];
            for (int i = 0; i < ushort.MaxValue; i++) B[i] = 0;
            // Initialize all characters positions
            for (int i = 0; i < pattern.Length; i++)
                B[(ushort)pattern[i]] |= 1 << i;
            // Initialize NFA states
            int[] states = new int[errors+1]; 
            for(int i= 0; i <= errors; i++)
                states[i] = (i == 0) ? 0 : (1 << (i - 1) | states[i-1]);
            int oldR, newR;
            int exitCriteria = 1 << pattern.Length -1;

            for (int i = 0; i < text.Length; i++)
                oldR = states[0];
                newR = ((oldR << 1) | 1) & B[text[i]];
                states[0] = newR;

                for (int j = 1; j <= errors; j++)
                    newR = ((states[j] << 1) & B[text[i]]) | oldR | ((oldR | newR) << 1);

                    oldR = states[j];
                    states[j] = newR;
                if ((newR & exitCriteria) != 0) 
                    Console.WriteLine("Occurrence at position {0}", i+1);




This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Dima Statz
Israel Israel
Name: Statz Dima
Fields of interest: software

Comments and Discussions

GeneralMy vote of 4 Pin
Đỗ Hồng Ngọc23-Nov-10 5:50
MemberĐỗ Hồng Ngọc23-Nov-10 5:50 
GeneralRe: My vote of 4 Pin
Dima Statz23-Nov-10 23:13
MemberDima Statz23-Nov-10 23:13 
Hey WhiteRose1611,

Q: "When we work with char, it should be considered as int but not ushort"
A: Why do you propose to use 32bit integer to store 16bit char? Does it make sense to you?

Q: "And, using a full array (ushort.MaxValue) is not a good idea.
Why don't we use a Hastable or Dictionary? It won't take time too much because pattern.Length is just small normally."

A: Pay attention, if you are going to re-implement it for your personal purposes,
it seems that you did not understand what is going on here. The array length is not function of pattern length, it depends
on char values range which is Pow(2,16). Your another idea of using Dictionary does not make sense also, because it will add
some penalty to execution time.

Good luck, and do not hesitate to contact me if you need any help.
Dima S

GeneralRe: My vote of 4 Pin
codeprojectqas24-Nov-10 0:15
Membercodeprojectqas24-Nov-10 0:15 
GeneralRe: My vote of 4 Pin
Dima Statz24-Nov-10 0:27
MemberDima Statz24-Nov-10 0:27 
GeneralRe: My vote of 4 Pin
Richard Deeming24-Nov-10 9:18
mveRichard Deeming24-Nov-10 9:18 
GeneralRe: My vote of 4 Pin
Dima Statz15-Dec-10 2:59
MemberDima Statz15-Dec-10 2:59 
GeneralMy vote of 5 Pin
Mark Jerde9-Nov-10 2:28
MemberMark Jerde9-Nov-10 2:28 
Questionhow to find the start point of the approximate string [modified] Pin
zxj1987061613-Jun-11 21:42
Memberzxj1987061613-Jun-11 21:42 
GeneralShort and sweet, but.. Pin
leppie8-Nov-10 23:02
Memberleppie8-Nov-10 23:02 
GeneralRe: Short and sweet, but.. Pin
Dima Statz23-Nov-10 23:17
MemberDima Statz23-Nov-10 23:17 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.