Edit Distance is quite a interesting and popular problem. Here I present an efficient bottom up C++ program to solve it.

**Problem** – We are given 2 strings. We have to find the “edit distance” or the cost of converting one string to other. We are allowed 3 operations – Insert, Delete, Replace. All 3 have equal cost that is 1 unit.

**Example** – Edit distance between “abc” and “abd” is 1. We replaced “c” with “d”. You can also see this as “Delete c”, then “Insert d” which would give a edit distance of 2. But since we try to minimize edit distance, we consider it as single operation that is Replace. Few more examples-

EditDistance(“ab”, “bc”) = 2

EditDistance(“man”, “woman”) = 2

EditDistance(“c”, “java”) = 4

EditDistance(“abcd”, “acd”) = 2

One of the above 4 examples has wrong answer! Tell me in comments which one is wrong, and what is the correct answer ;)

We will use bottom up version of dynamic programming, as it is much cleaner and efficient (than top down). If the strings have length N and M, we will need a table of size (N+1)*(M+1). First we need to initialize the top row i.e TABLE[0][i] = i and first column i.e TABLE[i][0] = i. These are the base cases in which one of the string is empty. Now filling the rest of the TABLE column wise is simple-

TABLE[i][j] = min(

TABLE[i][j-1]+1,

TABLE[i-1][j]+1,

TABLE[i-1][j-1]+cost,

)

Here cost = 0 if STR1[i] = STR2[j], 1 if unequal

And the C++ function-

int EditDistance(string word1, string word2)
{
int i, j, l1, l2, m;
l1 = word1.length();
l2 = word2.length();
vector< vector<int> > t(l1 + 1, vector<int>(l2 + 1));
for (i = 0; i <= l1; i++)
t[i][0] = i;
for (i = 1; i <= l2; i++)
t[0][i] = i;
for (i = 1; i <= l1; i++)
{
for (j = 1; j <= l2; j++)
{
m = min(t[i-1][j], t[i][j-1]) + 1;
t[i][j] = min(m, t[i-1][j-1] + (word1[i-1] == word2[j-1] ? 0 : 1));
}
}
return t[l1][l2];
}

The TABLE for “money” and “monkey” looks like->

1 is the final answer.

### Like this:

Like Loading...

*Related*

EditDistance(“abcd”, “acd”) = 1

Have you seen the C++ implementation on Rosetta Code?

http://rosettacode.org/wiki/Levenshtein_distance#C.2B.2B

I have measured it as four times faster than your implementation. So I’m not sure where your claim of “very efficient” comes from. :)

Thanks for pointing this.

It would be useful if you also provide the code for comarison along with input. And do mention your OS, compiler and flags passed.

I just used a very simple regime for testing and a handful of strings up to about length 10. I’ve also made some changes to the Rosetta Code implementation to make it even faster, which I will try to get uploaded to Rosetta Code.

Linux, x86_64, gcc-4.7.2, -O2 -march=native -std=c++0x

This is the guts of it, I’ve left out the window dressing for brevity:

high_resolution_clock::time_point const t0(high_resolution_clock::now());

for (size_t i(0); i < 10000000; ++i)

edit_distance_three(s1, s2); // I have several edit_distance_xxxxx functions.

high_resolution_clock::time_point const t1(high_resolution_clock::now());

cout << "Total time: " << duration_cast(t1 – t0).count() << " milliseconds.\n";

cout << "Average time: " << duration_cast(t1 – t0).count() / 10000000.0 << " nanoseconds.\n";

Could you explain what line 18 of your code does, please?

Line 18 considers the cost of replacement. So if word1[i-1] == word2[j-1] then the cost of replacement is 0 otherwise it is 1.

Thank you for your quick response!

WOW very clearly explained :D Thanks a lot :D