How are we going to compute interim at a

distance? The standard algorithm is with dynamic programing. Down programming is a

tabular method of computation. And what we’re gonna do, is we’re gonna compute D,

the distance between two strings X and Y, X of length N, Y of length M, by combining

solutions to sub-problems. And combining solutions to sub-problems, is the

intuition of all dynamic programming algorithms. The intuition is, is very

simple. We’re gonna for small prefixes of length I of string X, and J of string Y,

we’ll compute the difference between those strings and we’ll compute our larger

Distances for larger strings based on those previously computed smaller values.

In other words, we’re going to compute the distance I-j between prefixes of string X

of length I and prefixes of string. Y of length J for all I and J, and we’ll end up

in the end with the di, the distance. So let’s look at the actual equation. Here’s

the equation for defining the minimum at a distance. And I’ve given you levenshtein

distance which is the distance when there is a cost of one for insertion, one for

deletions, and two for substitutions. So let’s look at the initialization

condition. So the, we first say that. Any characters in X, so this is the X string

for it, for the I character is an X string. The distance between those and the

null string in Y is the cost of deleting each of those characters. So the cost of

those is, is the length of the string. We’re deleting each character. And

similarly for inserting all the characters into Y. To create the string y. The

distance between the null string x and the string of y is just a length of y. The

insertion cost of y. And then, we’ll do the recurrence relation. So, walking

through string x and walking through string y. We’ll have that the distance in

any particular cell of our matrix is going to be the minimum way of getting to that

cell from three previous cells. If we go from the string I, that’s one shorter so

we’re deleting one more thing in i. To make a J. Or we’re inserting one thing

into J to make it longer. Or we’re substituting between the previous string I

of length I, X of length I minus one and Y of length J minus one. We’re adding in a

new character. If it’s the same in both strings we have a cost of zero. If it’s

different we have the substitution cost of two. And then at the end, the distance

between the two strings is simply the It’s simply the, the DFNM, the upper right

corner of the matrix. So here’s our table. And we can fill in each element of the

table from using this equation that tells us the [inaudible] cost. The insertion

cost. And the substitution cost. So let’s do that. I put the equation up here in the

corner. So we want to know what’s the distance between the null string of

intention and the null string of execution, obviously zero. The null string

I to the string nothing is still the cost of deleting an i. That’s one. So now let’s

try to compute whats the cost of converting IN to E. While intuitively we

expect it’s going to be a deletion and a substitution. So let’s see if that works

out. Alright, so the, Element in this cell is the minimum of three values. It’s this

distance plus one, this distance plus one, or this distance plus either two if I and

zero are different, or zero if they are the same. Well, they’re different, so it’s

the minimum of I+1, which is two. One+1 which is two, or zero + two which is two,

so we have two. So we’re gonna write two in this [inaudible]. Similarly, if we

wanna know the distance between I, N and E, it’s the minimum distance, of I-N to

nothing, plus one. So two+1 or three. Or, the different distance between I and E

plus the cost of adding in that N or three. Or the cost of having just an I and

adding in that n to e substitution which is two or three. So again, we have three

here. We have a two and we have a three. And if we continue along this manner,

again, in each case looking at the previous cells and using this equation

over here, we’ll slowly end up with. So, [sound]. And if we continue along this, in

this manner, we’re gonna end up with the following complete table. So. Every cell

in this table, let’s take this cell, tells you the cost of sub, of ca, of at a

distance, of editing the string inte and turning it into the string exe and that

means that this value here in the upper right corner is the cost, the at a

distance, between intention and execution, the cost of turning intention into

execution and we see the value eight, which we earlier said was the, the

levenstein distance. So we had levenstein distance equals eight. That’s our

algorithm for computing [inaudible] at a

## 9 Comments

## Venkata R Edara

excellent explanation

## Kunal8683

Good one , what if we had one more operation as swap ??

## maqusss

2:20 – "inserting one thing into j" – dont understand what oyu mean. You cant insert anything in prefix of y.

## fasshocker

Hi, I have a question. Why did you insert in row 6, cloumn 4 the value 5. I thought it should be 4. If you compare E to E then the numbers are raised by 0, so the minimum value of 4,5 and 6 is 4. What is my mistake?

## james hudson

i dont get why you take 1 way from the min

## Alej

You are talking about "INTE" and "EXE" i believe.

You only add zero to the entry to the southwest (diagonal left-down) if the letters match.

Going from "INT" and "EX" to "INTE" and "EXE" doesn't change edit distance.

The entry to the left which is "INTE" and "EX" has an edit distance of 4

Going from this to "INTE" and "EXE" adds another edit, which brings the total to 5.

Try writing out the edits between the prefixes.

## DreamTurtle S

I have no idea why you are adding 2 for the case where X(i) != Y(j)

It clearly states in the pseudo code of the Levenshtein wiki that the cost will be "1" if they are not equal (see the else clause in the pseudo code for Levenshtein distance)

I believe you chose to give the cost of substitution 2 because you view it as a combination of delete + insert of a new letter, while the I believe the true Lev method sees substitution as 1 operation with a cost of 1.

## Karan Gill

The total edit distance should be 5.. See INTENtion and EXECUtion

## stephenftw7

This code is wrong. Substitutions don't cost 2.