Click here to Skip to main content
15,884,537 members
Articles / Programming Languages / C# 4.0
Tip/Trick

Levenshtein Edit Distance Algorithm

Rate me:
Please Sign up or sign in to vote.
4.97/5 (16 votes)
16 Dec 2013CPOL2 min read 42.7K   1.1K   17   8
Implement the Levenshtein Edit Distance algorithm.

Introduction

This source code is for understanding the Levenshtein Edit Distance algorithm easily. When I began to search this algorithm, I was so scared to implement it. I read all of the articles about this algorithm and I really didn't understand anything. When my research was deepening, I was trying to figure out the secret of this common algorithm. When I finished implementing it, I decided to write an article on CodeProject.com. I am hoping my implementation would be useful for people who work to understand The Levenshtein Edit Distance algorithm.  

Background

This algorithm is so commonly used for Microsoft Outlook or Microsoft Word  Spell Checking systems and Google. So thanks Vladimir Levenshtein for this algorithm. I will not try to explain the algorithm's details to you. The purpose of this article is for you to follow steps and see the flow chart of algorithm easily.  

About the Algorithm 

The edit distance between two strings is defined as the minimum number of edit operations required to transform one string into another. Let’s assume that the first string is named as the target string and the second string is named as the source string. We want to convert the source string to target. An edit is one of three operations: a delete (a character from the source string), an insert (a character from the target string), and a substitute (a character from the source string with a character from the target string). There is a fourth operation, named as match, which does not count as an edit. Consider two input strings "activate" and "caveat". Below you can see one possible transformation. In the example, a transformation is represented by a string consisting of the characters M for match, S for substitute, D for delete, and I for insert. 

D M D S M I M M D

a  c   t  i   v    a  t   e   

    c      a  v e a  t   

Image 1

Figure 1: Levenshtein Edit Distance algorithm GUI

Using the code   

I will share some code snippets of my implementation here:

In our implementation:

  • Delete cost = 0.7
  • Insert cost = 0.7
  • Substitute cost = 1.0

Match does not have a cost. 

C++
Reference = Ref_text_box.Text;
Hypotheses = Hyp_text_box.Text;

distance_table = new double[Reference.Length + 1, Hypotheses.Length + 1];
for (int i = 0; i <= Reference.Length; i++)
    distance_table[i, 0] = i * 0.7;

for (int j = 0; j <= Hypotheses.Length; j++)
    distance_table[0, j] = j * 0.7;

for (int i = 1; i <= Reference.Length; i++)
{
    for (int j = 1; j <= Hypotheses.Length; j++)
    {
        if (Reference[i - 1] == Hypotheses[j - 1])//if the letters are same 
            distance_table[i, j] = distance_table[i - 1, j - 1];
        else //if not add 1 to its neighborhoods and assign minumun of its neighborhoods 
        {
            distance_table[i, j] = Math.Min(Math.Min(distance_table[i - 1, j - 1] + 1, 
              distance_table[i - 1, j] + 0.7), distance_table[i, j - 1] + 0.7);
        }
    }

}
//create table
//Compute Distance
String distance_string = " ";
String distance_result;
int i = Reference.Length, j = Hypotheses.Length;

while (true)
{
    this.dataGridView1.Rows[i].Cells[j+1].Style.BackColor = Color.LightGreen;

    if (i == 0 && j == 0)
    {
        this.dataGridView1.Rows[i].Cells[j].Style.BackColor = Color.LightGreen;

        distance_string_textbox.Text = distance_string;
        char[] a = distance_string.ToCharArray();
        Array.Reverse(a);
        distance_result = new string(a);
        distance_string_textbox.Text = distance_result;
        break;
    }
    else if (i == 0 && j > 0)
    {
        this.dataGridView1.Rows[i].Cells[j+1].Style.BackColor = Color.LightGreen;
        distance_string += "d";//delete
        j--;
    }
    else if (i > 0 && j == 0)
    {
        this.dataGridView1.Rows[i].Cells[j+1].Style.BackColor = Color.LightGreen;
        distance_string += "i";//insert
        i--;
    }
    else
    {
        if (distance_table[i - 1, j - 1] <= distance_table[i - 1, j] && 
                distance_table[i - 1, j - 1] <= distance_table[i, j - 1])
        {
            this.dataGridView1.Rows[i].Cells[j+1].Style.BackColor = Color.LightGreen;
            if (distance_table[i - 1, j - 1] == distance_table[i, j])
                distance_string += "m"; //match
            else
                distance_string += "s"; //substitue

            i--;
            j--;
        }

        else if (distance_table[i - 1, j] < distance_table[i, j - 1])
        {
            this.dataGridView1.Rows[i ].Cells[j+1].Style.BackColor = Color.LightGreen;
            distance_string += "i"; //insert
            i--;

        }
        else if (distance_table[i, j - 1] < distance_table[i - 1, j])
        {
            this.dataGridView1.Rows[i].Cells[j +1].Style.BackColor = Color.LightGreen;
            distance_string += "d"; //delete
            j--;
        }
    }
}

Points of Interest

This code snippet will teach to understand the Levenshtein Edit Distance algorithm easily. You can try some words and sentences to compare each other and you can follow the distance table to understand how to compute the distance of your strings.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Student University Of Kocaeli
Turkey Turkey
I am student University of Kocaeli and my department is Electronics and Telecommunication Engineering. It's my last year. So my interests are software developing and hardware developing for embedded platform.

Comments and Discussions

 
BugBug-Report Pin
Mr.PoorEnglish1-Sep-17 21:28
Mr.PoorEnglish1-Sep-17 21:28 
QuestionDifferent Costs Pin
Member 127231496-Sep-16 4:37
Member 127231496-Sep-16 4:37 
QuestionValue 0.7 or 1? Pin
coder_spree1-May-15 1:15
coder_spree1-May-15 1:15 
QuestionWhy 0.7 cost? Pin
10BRG23-May-14 3:34
10BRG23-May-14 3:34 
AnswerRe: Why 0.7 cost? Pin
furkanavcu23-May-14 10:02
professionalfurkanavcu23-May-14 10:02 
AnswerRe: Why 0.7 cost? Pin
furkanavcu7-Feb-15 3:49
professionalfurkanavcu7-Feb-15 3:49 
GeneralRe: Why 0.7 cost? Pin
10BRG3-Mar-15 1:49
10BRG3-Mar-15 1:49 
Generalother algorithm Pin
Robert Frank18-Dec-13 17:23
Robert Frank18-Dec-13 17:23 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.