How are we going to compute minimum edit distance? The standard algorithm is with dynamic programing. Down programming is a

tabular method of computation. And what we’re gonna do, is we’re gonna compute D,

the distance between two strings X and Y, X of length N, Y of length M, by combining

solutions to sub-problems. And combining solutions to sub-problems, is the

intuition of all dynamic programming algorithms. The intuition is, is very

simple. We’re gonna for small prefixes of length I of string X, and J of string Y,

we’ll compute the difference between those strings and we’ll compute our larger

Distances for larger strings based on those previously computed smaller values.

In other words, we’re going to compute the distance I-j between prefixes of string X

of length I and prefixes of string. Y of length J for all I and J, and we’ll end up

in the end with the di, the distance. So let’s look at the actual equation. Here’s

the equation for defining the minimum at a distance. And I’ve given you levenshtein

distance which is the distance when there is a cost of one for insertion, one for

deletions, and two for substitutions. So let’s look at the initialization

condition. So the, we first say that. Any characters in X, so this is the X string

for it, for the I character is an X string. The distance between those and the

null string in Y is the cost of deleting each of those characters. So the cost of

those is, is the length of the string. We’re deleting each character. And

similarly for inserting all the characters into Y. To create the string y. The

distance between the null string x and the string of y is just a length of y. The

insertion cost of y. And then, we’ll do the recurrence relation. So, walking

through string x and walking through string y. We’ll have that the distance in

any particular cell of our matrix is going to be the minimum way of getting to that

cell from three previous cells. If we go from the string I, that’s one shorter so

we’re deleting one more thing in i. To make a J. Or we’re inserting one thing

into J to make it longer. Or we’re substituting between the previous string I

of length I, X of length I minus one and Y of length J minus one. We’re adding in a

new character. If it’s the same in both strings we have a cost of zero. If it’s

different we have the substitution cost of two. And then at the end, the distance

between the two strings is simply the It’s simply the, the DFNM, the upper right

corner of the matrix. So here’s our table. And we can fill in each element of the

table from using this equation that tells us the deletion cost, the insertion

cost and the substitution cost. So let’s do that. I put the equation up here in the

corner. So we want to know what’s the distance between the null string of

intention and the null string of execution, obviously zero. The null string

I to the string nothing is still the cost of deleting an i. That’s one. So now let’s

try to compute whats the cost of converting IN to E. While intuitively we

expect it’s going to be a deletion and a substitution. So let’s see if that works

out. Alright, so the, Element in this cell is the minimum of three values. It’s this

distance plus one, this distance plus one, or this distance plus either two if I and

zero are different, or zero if they are the same. Well, they’re different, so it’s

the minimum of I+1, which is two. One+1 which is two, or zero + two which is two,

so we have two. So we’re gonna write two in this [inaudible]. Similarly, if we

wanna know the distance between I, N and E, it’s the minimum distance, of I-N to

nothing, plus one. So two+1 or three. Or, the different distance between I and E

plus the cost of adding in that N or three. Or the cost of having just an I and

adding in that n to e substitution which is two or three. So again, we have three

here. We have a two and we have a three. And if we continue along this manner,

again, in each case looking at the previous cells and using this equation

over here, we’ll slowly end up with. So, [sound]. And if we continue along this, in

this manner, we’re gonna end up with the following complete table. So. Every cell

in this table, let’s take this cell, tells you the cost of sub, of ca, of at a

distance, of editing the string INTE and turning it into the string EXE and that

means that this value here in the upper right corner is the cost, the at a

distance, between intention and execution, the cost of turning intention into

execution and we see the value eight, which we earlier said was the

levenstein distance. So we have levenstein distance equals eight. That’s our

algorithm for computing minimum edit distance.