2   2   Computing Minimum Edit Distance 5 54
Articles,  Blog

2 2 Computing Minimum Edit Distance 5 54


How are we going to compute minimum edit distance? The standard algorithm is with dynamic programing. Down programming is a
tabular method of computation. And what we’re gonna do, is we’re gonna compute D,
the distance between two strings X and Y, X of length N, Y of length M, by combining
solutions to sub-problems. And combining solutions to sub-problems, is the
intuition of all dynamic programming algorithms. The intuition is, is very
simple. We’re gonna for small prefixes of length I of string X, and J of string Y,
we’ll compute the difference between those strings and we’ll compute our larger
Distances for larger strings based on those previously computed smaller values.
In other words, we’re going to compute the distance I-j between prefixes of string X
of length I and prefixes of string. Y of length J for all I and J, and we’ll end up
in the end with the di, the distance. So let’s look at the actual equation. Here’s
the equation for defining the minimum at a distance. And I’ve given you levenshtein
distance which is the distance when there is a cost of one for insertion, one for
deletions, and two for substitutions. So let’s look at the initialization
condition. So the, we first say that. Any characters in X, so this is the X string
for it, for the I character is an X string. The distance between those and the
null string in Y is the cost of deleting each of those characters. So the cost of
those is, is the length of the string. We’re deleting each character. And
similarly for inserting all the characters into Y. To create the string y. The
distance between the null string x and the string of y is just a length of y. The
insertion cost of y. And then, we’ll do the recurrence relation. So, walking
through string x and walking through string y. We’ll have that the distance in
any particular cell of our matrix is going to be the minimum way of getting to that
cell from three previous cells. If we go from the string I, that’s one shorter so
we’re deleting one more thing in i. To make a J. Or we’re inserting one thing
into J to make it longer. Or we’re substituting between the previous string I
of length I, X of length I minus one and Y of length J minus one. We’re adding in a
new character. If it’s the same in both strings we have a cost of zero. If it’s
different we have the substitution cost of two. And then at the end, the distance
between the two strings is simply the It’s simply the, the DFNM, the upper right
corner of the matrix. So here’s our table. And we can fill in each element of the
table from using this equation that tells us the deletion cost, the insertion
cost and the substitution cost. So let’s do that. I put the equation up here in the
corner. So we want to know what’s the distance between the null string of
intention and the null string of execution, obviously zero. The null string
I to the string nothing is still the cost of deleting an i. That’s one. So now let’s
try to compute whats the cost of converting IN to E. While intuitively we
expect it’s going to be a deletion and a substitution. So let’s see if that works
out. Alright, so the, Element in this cell is the minimum of three values. It’s this
distance plus one, this distance plus one, or this distance plus either two if I and
zero are different, or zero if they are the same. Well, they’re different, so it’s
the minimum of I+1, which is two. One+1 which is two, or zero + two which is two,
so we have two. So we’re gonna write two in this [inaudible]. Similarly, if we
wanna know the distance between I, N and E, it’s the minimum distance, of I-N to
nothing, plus one. So two+1 or three. Or, the different distance between I and E
plus the cost of adding in that N or three. Or the cost of having just an I and
adding in that n to e substitution which is two or three. So again, we have three
here. We have a two and we have a three. And if we continue along this manner,
again, in each case looking at the previous cells and using this equation
over here, we’ll slowly end up with. So, [sound]. And if we continue along this, in
this manner, we’re gonna end up with the following complete table. So. Every cell
in this table, let’s take this cell, tells you the cost of sub, of ca, of at a
distance, of editing the string INTE and turning it into the string EXE and that
means that this value here in the upper right corner is the cost, the at a
distance, between intention and execution, the cost of turning intention into
execution and we see the value eight, which we earlier said was the
levenstein distance. So we have levenstein distance equals eight. That’s our
algorithm for computing minimum edit distance.

Leave a Reply

Your email address will not be published. Required fields are marked *