Edit Distance using Dynamic Programming

Edit Distance is quite a interesting and popular problem. Here I present an efficient bottom up C++ program to solve it.

Problem – We are given 2 strings. We have to find the “edit distance” or the cost of converting one string to other. We are allowed 3 operations – Insert, Delete, Replace. All 3 have equal cost that is 1 unit.

Example – Edit distance between “abc” and “abd” is 1. We replaced “c” with “d”. You can also see this as “Delete c”, then “Insert d” which would give a edit distance of 2. But since we try to minimize edit distance, we consider it as single operation that is Replace. Few more examples-
EditDistance(“ab”, “bc”) = 2
EditDistance(“man”, “woman”) = 2
EditDistance(“c”, “java”) = 4
EditDistance(“abcd”, “acd”) = 2

One of the above 4 examples has wrong answer! Tell me in comments which one is wrong,  and what is the correct answer ;)

We will use bottom up version of dynamic programming, as it is much cleaner and efficient (than top down). If the strings have length N and M, we will need a table of size (N+1)*(M+1). First we need to initialize the top row i.e TABLE[0][i] = i and first column i.e TABLE[i][0] = i. These are the base cases in which one of the string is empty. Now filling the rest of the TABLE column wise is simple-

TABLE[i][j] = min(
Here cost = 0 if STR1[i] = STR2[j], 1 if unequal

And the C++ function-

int EditDistance(string word1, string word2)
    int i, j, l1, l2, m;
    l1 = word1.length();
    l2 = word2.length();
    vector< vector<int> > t(l1 + 1, vector<int>(l2 + 1));

    for (i = 0; i <= l1; i++)
        t[i][0] = i;
    for (i = 1; i <= l2; i++)
        t[0][i] = i;

    for (i = 1; i <= l1; i++)
        for (j = 1; j <= l2; j++)
            m = min(t[i-1][j], t[i][j-1]) + 1;
            t[i][j] = min(m, t[i-1][j-1] + (word1[i-1] == word2[j-1] ? 0 : 1));
    return t[l1][l2];

The TABLE for “money” and “monkey” looks like->


1 is the final answer.

About these ads
This entry was posted in Algorithm, Programming and tagged , . Bookmark the permalink.

7 Responses to Edit Distance using Dynamic Programming

  1. Jeremy W. Murphy says:

    EditDistance(“abcd”, “acd”) = 1

    Have you seen the C++ implementation on Rosetta Code?
    I have measured it as four times faster than your implementation. So I’m not sure where your claim of “very efficient” comes from. :)

    • vinayakgarg says:

      Thanks for pointing this.
      It would be useful if you also provide the code for comarison along with input. And do mention your OS, compiler and flags passed.

      • Jeremy W. Murphy says:

        I just used a very simple regime for testing and a handful of strings up to about length 10. I’ve also made some changes to the Rosetta Code implementation to make it even faster, which I will try to get uploaded to Rosetta Code.
        Linux, x86_64, gcc-4.7.2, -O2 -march=native -std=c++0x
        This is the guts of it, I’ve left out the window dressing for brevity:

        high_resolution_clock::time_point const t0(high_resolution_clock::now());
        for (size_t i(0); i < 10000000; ++i)
        edit_distance_three(s1, s2); // I have several edit_distance_xxxxx functions.
        high_resolution_clock::time_point const t1(high_resolution_clock::now());
        cout << "Total time: " << duration_cast(t1 – t0).count() << " milliseconds.\n";
        cout << "Average time: " << duration_cast(t1 – t0).count() / 10000000.0 << " nanoseconds.\n";

  2. Brett says:

    Could you explain what line 18 of your code does, please?

  3. ajay says:

    WOW very clearly explained :D Thanks a lot :D

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s