Rajesh Rao: Computing a Rosetta Stone for the Indus script


I’d like to begin with a thought experiment. Imagine that it’s 4,000 years into the future. Civilization as we know it has ceased to exist — no books, no electronic devices, no Facebook or Twitter. All knowledge of the English language and the English alphabet has been lost. Now imagine archeologists digging through the rubble of one of our cities. What might they find? Well perhaps some rectangular pieces of plastic with strange symbols on them. Perhaps some circular pieces of metal. Maybe some cylindrical containers with some symbols on them. And perhaps one archeologist becomes an instant celebrity when she discovers — buried in the hills somewhere in North America — massive versions of these same symbols. Now let’s ask ourselves, what could such artifacts say about us to people 4,000 years into the future? This is no hypothetical question. In fact, this is exactly the kind of question we’re faced with when we try to understand the Indus Valley civilization, which existed 4,000 years ago. The Indus civilization was roughly contemporaneous with the much better known Egyptian and the Mesopotamian civilizations, but it was actually much larger than either of these two civilizations. It occupied the area of approximately one million square kilometers, covering what is now Pakistan, Northwestern India and parts of Afghanistan and Iran. Given that it was such a vast civilization, you might expect to find really powerful rulers, kings, and huge monuments glorifying these powerful kings. In fact, what archeologists have found is none of that. They’ve found small objects such as these. Here’s an example of one of these objects. Well obviously this is a replica. But who is this person? A king? A god? A priest? Or perhaps an ordinary person like you or me? We don’t know. But the Indus people also left behind artifacts with writing on them. 
Well no, not pieces of plastic, but stone seals, copper tablets, pottery and, surprisingly, one large sign board, which was found buried near the gate of a city. Now we don’t know if it says Hollywood, or even Bollywood for that matter. In fact, we don’t even know what any of these objects say, and that’s because the Indus script is undeciphered. We don’t know what any of these symbols mean. The symbols are most commonly found on seals. So you see up there one such object. It’s the square object with the unicorn-like animal on it. Now that’s a magnificent piece of art. So how big do you think that is? Perhaps that big? Or maybe that big? Well let me show you. Here’s a replica of one such seal. It’s only about one inch by one inch in size — pretty tiny. So what were these used for? We know that these were used for stamping clay tags that were attached to bundles of goods that were sent from one place to the other. So you know those packing slips you get on your FedEx boxes? These were used to make those kinds of packing slips. You might wonder what these objects contain in terms of their text. Perhaps they’re the name of the sender or some information about the goods that are being sent from one place to the other — we don’t know. We need to decipher the script to answer that question. Deciphering the script is not just an intellectual puzzle; it’s actually become a question that’s become deeply intertwined with the politics and the cultural history of South Asia. In fact, the script has become a battleground of sorts between three different groups of people. First, there’s a group of people who are very passionate in their belief that the Indus script does not represent a language at all. These people believe that the symbols are very similar to the kind of symbols you find on traffic signs or the emblems you find on shields. There’s a second group of people who believe that the Indus script represents an Indo-European language. 
If you look at a map of India today, you’ll see that most of the languages spoken in North India belong to the Indo-European language family. So some people believe that the Indus script represents an ancient Indo-European language such as Sanskrit. There’s a last group of people who believe that the Indus people were the ancestors of people living in South India today. These people believe that the Indus script represents an ancient form of the Dravidian language family, which is the language family spoken in much of South India today. And the proponents of this theory point to that small pocket of Dravidian-speaking people in the North, actually near Afghanistan, and they say that perhaps, sometime in the past, Dravidian languages were spoken all over India and that this suggests that the Indus civilization is perhaps also Dravidian. Which of these hypotheses can be true? We don’t know, but perhaps if you deciphered the script, you would be able to answer this question. But deciphering the script is a very challenging task. First, there’s no Rosetta Stone. I don’t mean the software; I mean an ancient artifact that contains in the same text both a known text and an unknown text. We don’t have such an artifact for the Indus script. And furthermore, we don’t even know what language they spoke. And to make matters even worse, most of the texts that we have are extremely short. So as I showed you, they’re usually found on these seals that are very, very tiny. And so given these formidable obstacles, one might wonder and worry whether one will ever be able to decipher the Indus script. In the rest of my talk, I’d like to tell you about how I learned to stop worrying and love the challenge posed by the Indus script. I’ve always been fascinated by the Indus script ever since I read about it in a middle school textbook. And why was I fascinated? Well, it’s the last major undeciphered script in the ancient world.
My career path led me to become a computational neuroscientist, so in my day job, I create computer models of the brain to try to understand how the brain makes predictions, how the brain makes decisions, how the brain learns and so on. But in 2007, my path crossed again with the Indus script. That’s when I was in India, and I had the wonderful opportunity to meet with some Indian scientists who were using computer models to try to analyze the script. And so it was then that I realized there was an opportunity for me to collaborate with these scientists, and so I jumped at that opportunity. And I’d like to describe some of the results that we have found. Or better yet, let’s all collectively decipher. Are you ready? The first thing that you need to do when you have an undeciphered script is try to figure out the direction of writing. Here are two texts that contain some symbols on them. Can you tell me if the direction of writing is right to left or left to right? I’ll give you a couple of seconds. Okay. Right to left, how many? Okay. Okay. Left to right? Oh, it’s almost 50/50. Okay. The answer is: if you look at the left-hand side of the two texts, you’ll notice that there’s a cramping of signs, and it seems like 4,000 years ago, when the scribe was writing from right to left, they ran out of space. And so they had to cram the sign. One of the signs is also below the text on the top. This suggests the direction of writing was probably from right to left, and so that’s one of the first things we know, that directionality is a very key aspect of linguistic scripts. And the Indus script now has this particular property. What other properties of language does the script show? Languages contain patterns. If I give you the letter Q and ask you to predict the next letter, what do you think that would be? Most of you said U, which is right. Now if I asked you to predict one more letter, what do you think that would be? Now there’s several thoughts. There’s E. 
It could be I. It could be A, but certainly not B, C or D, right? The Indus script also exhibits similar kinds of patterns. There are a lot of texts that start with this diamond-shaped symbol. And this in turn tends to be followed by this quotation marks-like symbol. And this is very similar to the Q and U example. This symbol can in turn be followed by these fish-like symbols and some other signs, but never by these other signs at the bottom. And furthermore, there are some signs that really prefer the end of texts, such as this jar-shaped sign, and this sign, in fact, happens to be the most frequently occurring sign in the script. Given such patterns, here was our idea. The idea was to use a computer to learn these patterns, and so we gave the computer the existing texts. And the computer learned a statistical model of which symbols tend to occur together and which symbols tend to follow each other. Given the computer model, we can test the model by essentially quizzing it. So we could deliberately erase some symbols, and we can ask it to predict the missing symbols. Here are some examples. You may regard this as perhaps the most ancient game of Wheel of Fortune. What we found was that the computer was successful in 75 percent of the cases in predicting the correct symbol. In the rest of the cases, typically the second best guess or third best guess was the right answer. There’s also a practical use for this particular procedure. A lot of these texts are damaged. Here’s an example of one such text. And we can use the computer model now to try to complete this text and make a best-guess prediction. Here’s an example of a symbol that was predicted. And this could be really useful as we try to decipher the script by generating more data that we can analyze. Now here’s one other thing you can do with the computer model. So imagine a monkey sitting at a keyboard. I think you might get a random jumble of letters that looks like this.
Such a random jumble of letters is said to have a very high entropy. This is a physics and information theory term. But just imagine it’s a really random jumble of letters. How many of you have ever spilled coffee on a keyboard? You might have encountered the stuck-key problem — so basically the same symbol being repeated over and over again. This kind of a sequence is said to have a very low entropy because there’s no variation at all. Language, on the other hand, has an intermediate level of entropy; it’s neither too rigid, nor is it too random. What about the Indus script? Here’s a graph that plots the entropies of a whole bunch of sequences. At the very top you find the uniformly random sequence, which is a random jumble of letters — and interestingly, we also find the DNA sequence from the human genome and instrumental music. And both of these are very, very flexible, which is why you find them in the very high range. At the lower end of the scale, you find a rigid sequence, a sequence of all A’s, and you also find a computer program, in this case in the language Fortran, which obeys really strict rules. Linguistic scripts occupy the middle range. Now what about the Indus script? We found that the Indus script actually falls within the range of the linguistic scripts. When this result was first published, it was highly controversial. There were people who raised a hue and cry, and these people were the ones who believed that the Indus script does not represent language. I even started to get some hate mail. My students said that I should really seriously consider getting some protection. Who’d have thought that deciphering could be a dangerous profession? What does this result really show? It shows that the Indus script shares an important property of language. So, as the old saying goes, if it looks like a linguistic script and it acts like a linguistic script, then perhaps we may have a linguistic script on our hands. 
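To make the two ideas above concrete (a model of which symbols tend to follow which, and entropy as a measure of how rigid or random a sequence is), here is a minimal Python sketch. It is not the model used in the research described in the talk: the sign names and toy texts are invented for illustration, the model only counts adjacent pairs of signs, and the function names `bigram_counts`, `fill_gap` and `conditional_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigram_counts(texts):
    """Count how often each sign is immediately followed by each other sign."""
    follows = defaultdict(Counter)
    for text in texts:
        for prev, nxt in zip(text, text[1:]):
            follows[prev][nxt] += 1
    return follows


def fill_gap(follows, before, after):
    """Rank candidates for one erased sign sitting between `before` and `after`.

    A candidate scores count(before -> cand) * count(cand -> after);
    candidates never seen in that position (score 0) are dropped.
    """
    scores = {}
    for cand, n in follows.get(before, Counter()).items():
        score = n * follows.get(cand, Counter())[after]
        if score > 0:
            scores[cand] = score
    return sorted(scores, key=lambda cand: -scores[cand])


def conditional_entropy(texts):
    """Entropy in bits of the next sign given the previous sign."""
    follows = bigram_counts(texts)
    total = sum(sum(counter.values()) for counter in follows.values())
    h = 0.0
    for counter in follows.values():
        n_prev = sum(counter.values())
        for count in counter.values():
            # -sum over pairs of p(prev, next) * log2 p(next | prev)
            h -= (count / total) * math.log2(count / n_prev)
    return h


# Invented toy "texts"; the sign names are placeholders, not real readings.
toy_texts = [
    ["diamond", "quote", "fish", "jar"],
    ["diamond", "quote", "fish", "fish", "jar"],
    ["diamond", "quote", "strokes", "fish", "jar"],
    ["strokes", "fish", "jar"],
]
signs = ["diamond", "quote", "fish", "jar"]
rigid = [["jar"] * 20]                                # stuck key: one sign repeated
flexible = [[a, b] for a in signs for b in signs]     # every pair equally likely

follows = bigram_counts(toy_texts)
print(fill_gap(follows, "quote", "jar"))              # ['fish']
print(f"{conditional_entropy(rigid):.2f}")            # 0.00
print(f"{conditional_entropy(toy_texts):.2f}")        # 0.49
print(f"{conditional_entropy(flexible):.2f}")         # 2.00
```

On this toy data the patterned texts land between the fully rigid sequence (0 bits) and the sequence where any of four signs can follow any other (2 bits), which mirrors the "neither too rigid nor too random" picture. With a real corpus one would need far more data and some form of smoothing for sign pairs that were never observed.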
What other evidence is there that the script could actually encode language? Well linguistic scripts can actually encode multiple languages. So for example, here’s the same sentence written in English and the same sentence written in Dutch using the same letters of the alphabet. If you don’t know Dutch and you only know English and I give you some words in Dutch, you’ll tell me that these words contain some very unusual patterns. Some things are not right, and you’ll say these words are probably not English words. The same thing happens in the case of the Indus script. The computer found several texts — two of them are shown here — that have very unusual patterns. So for example the first text: there’s a doubling of this jar-shaped sign. This sign is the most frequently-occurring sign in the Indus script, and it’s only in this text that it occurs as a doubling pair. Why is that the case? We went back and looked at where these particular texts were found, and it turns out that they were found very, very far away from the Indus Valley. They were found in present day Iraq and Iran. And why were they found there? What I haven’t told you is that the Indus people were very, very enterprising. They used to trade with people pretty far away from where they lived, and so in this case, they were traveling by sea all the way to Mesopotamia, present-day Iraq. And what seems to have happened here is that the Indus traders, the merchants, were using this script to write a foreign language. It’s just like our English and Dutch example. And that would explain why we have these strange patterns that are very different from the kinds of patterns you see in the text that are found within the Indus Valley. This suggests that the same script, the Indus script, could be used to write different languages. The results we have so far seem to point to the conclusion that the Indus script probably does represent language. If it does represent language, then how do we read the symbols? 
That’s our next big challenge. So you’ll notice that many of the symbols look like pictures of humans, of insects, of fishes, of birds. Most ancient scripts use the rebus principle, which is, using pictures to represent words. So as an example, here’s a word. Can you write it using pictures? I’ll give you a couple seconds. Got it? Okay. Great. Here’s my solution. You could use the picture of a bee followed by a picture of a leaf — and that’s “belief,” right. There could be other solutions. In the case of the Indus script, the problem is the reverse. You have to figure out the sounds of each of these pictures such that the entire sequence makes sense. So this is just like a crossword puzzle, except that this is the mother of all crossword puzzles because the stakes are so high if you solve it. My colleagues, Iravatham Mahadevan and Asko Parpola, have been making some headway on this particular problem. And I’d like to give you a quick example of Parpola’s work. Here’s a really short text. It contains seven vertical strokes followed by this fish-like sign. And I want to mention that these seals were used for stamping clay tags that were attached to bundles of goods, so it’s quite likely that these tags, at least some of them, contain names of merchants. And it turns out that in India there’s a long tradition of names being based on horoscopes and star constellations present at the time of birth. In Dravidian languages, the word for fish is “meen” which happens to sound just like the word for star. And so seven stars would stand for “elu meen,” which is the Dravidian word for the Big Dipper star constellation. Similarly, there’s another sequence of six stars, and that translates to “aru meen,” which is the old Dravidian name for the star constellation Pleiades. And finally, there’s other combinations, such as this fish sign with something that looks like a roof on top of it. 
And that could be translated into “mey meen,” which is the old Dravidian name for the planet Saturn. So that was pretty exciting. It looks like we’re getting somewhere. But does this prove that these seals contain Dravidian names based on planets and star constellations? Well not yet. So we have no way of validating these particular readings, but if more and more of these readings start making sense, and if longer and longer sequences appear to be correct, then we know that we are on the right track. Today, we can write a word such as TED in Egyptian hieroglyphics and in cuneiform script, because both of these were deciphered in the 19th century. The decipherment of these two scripts enabled these civilizations to speak to us again directly. The Mayans started speaking to us in the 20th century, but the Indus civilization remains silent. Why should we care? The Indus civilization does not belong to just the South Indians or the North Indians or the Pakistanis; it belongs to all of us. These are our ancestors — yours and mine. They were silenced by an unfortunate accident of history. If we decipher the script, we would enable them to speak to us again. What would they tell us? What would we find out about them? About us? I can’t wait to find out. Thank you. (Applause)

100 Comments

  • mooveegee

    Rajesh Rao didn't mention the existence of the rongorongo tablets from Easter Island, which also contain a verified version of the Indus Valley script, with much longer texts, which would give the computers & researchers much longer strings to work with! I hope this finds its way into the dialogue on this fascinating project.

  • Hans Van Der Linde

    Well for starters, if it really is a type of clay stamp then they are naturally reading it wrong.
    On the stamp it is from right to left but when stamped then it is obviously from left to right, the question is — WHEN are we looking at the stamp and WHEN are we looking at the stamped "note"

  • Sakthi Kumaran

    indus valley symbols found in #keezhadi excavations in tamil nadu….. its most likely to be the mother of all Dravidian languages….. see this link below…

    https://www.facebook.com/photo.php?fbid=2754561971241342&set=a.129184967112402&type=3&theater

    https://www.hindustantimes.com/india-news/new-study-connects-tamil-nadu-with-indus-valley-civilisation/story-ESlR55vEIZQPvq2Q0jXeVP.html?fbclid=IwAR1p3-6oKzHgDa2idUACybBC7JMktp_cZEdx5tZn5Ex0AZhFJKyvCCpzHHo

  • Cyan Diaz

    oooooo, new script, I love scripts. I'm going to write that font in and decipher it for you now. i'll get the sample texts from the British Museum. They may if anybody does have it.

  • zohar

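    Those are not the ancestors of the Indians, just as the Yellow Emperor is not the ancestor of the Chinese, and Chiyou is not the ancestor of the Miao. Just look at their appearance and you can tell they were destroyed and no longer exist on the surface of the earth. Ancient India had nuclear weapons, which is to say that even a civilization that reaches very advanced technology will still be destroyed. Apart from God, who has the power to wipe the surface of the earth clean?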

  • ANILKUMAR PINGILI

    That seal :Fish means through water, it's definitely not ox nor cow it's buffalo cuz it's horns bent , that animal is with bisexual organs ( transporting male and female animals ) and those are not tied , upper count is for male , lower for female animals , Finally it's written in left to right .

    And

    Through this video

    Those symbols

    Crabs , birds , doves , peacocks , fishes ( types like cat fish etc …) Tortoise , ,pots , mud pots …

    THOSE ARE "TRADE" SYMBOLS LEFT OVER NOW.

    Might we lost main things buried in Earth still .
    Ok

    Like today we see upside only = glass.

    Fragile = glass .

    Green Dot for veg .

    Not in flight symbols ( for batteries etc )

    Bye

    Etc .

  • JP Abraham

    The biggest lie he said was that the Indus Valley civilization does not belong to the South Indians or the North Indians but belongs to all of us. That's the biggest LIE! The Indus Valley civilization is a Tamil civilization. Tamils ruled all of India in ancient days. This moron should get his head checked!

  • Muzaffar Ahmed

    Brahui is still spoken across a vast area of Pakistan, from Sindh province to across the border with Afghanistan. Quetta city is the capital of the Brahui.

  • SENTHIL MURUGAN

    Someone, if you know how to contact this man, please reply or send him this message.
    In the excavations at Keezhadi, Tamil Nadu, India (South India), various scripts were found inscribed on pottery and stones that resemble the Indus script to a very high degree. If he could analyse those too with his computer model, perhaps we could decipher the whole language.
    P.S.: those scripts date back 2,700 years.

  • adamdgr80

    Great research! I think more scholars need to work on deciphering the Indus Valley script. Most of the world ignores it; that's sad.

  • rajkumar kumar

    Ancient Tamil Civilization has been found in Keezhadi, Tamil Nadu State, South India. It may be in same time period as Indus Valley or it may be older than Indus Valley. The world has to find entire history of it..

  • Ashok Piano lover

    Elu meen, aaru meen is Tamil numbers 7, 6 … That's cool to know that Indus valley civilization used Tamil sounds in their language, if it's true.

  • Mike Moloch

    Give this man a bottle of water FFS! I DO NOT want to hear the ASMR version of his dry sticky tongue trying to break free from his palate! JTFC!

  • Mithun Rts

    Check Keezhadi (கீழடி), Adichanallur (ஆதிச்சநல்லூர்) archeological sites in Tamil Nadu. They strongly resemble Indus Valley. There are inscriptions in Tamizhi (ancient Tamil script) too.

  • Narendra

    In each and every ancient civilization, people talked about stars and star systems; maybe he is heading in the right direction to solve this language puzzle.

  • bluemooner heart

    How is it that the Sino-Tibetan language family was ignored in the hypotheses (as if its languages don't exist in India, and as if people don't migrate for a change in habitat)? The speakers did not all come from China but from India. Maybe studying these languages from India (especially northeast India) at least, and their folkloristics, could also help in a hypothesis 4.

  • experience science

    This civilisation was the native civilization of India🇮🇳. DRAVIDIAN… LONG LIVE THE NATIVE DRAVIDIAN…. Don't be slave, learn our true HISTORY.

  • Balu N

    It is told that Dravidian languages are…
    1)Tamil… Very old. Archeological evidence 900BC.
    2)Kannada…6th century
    3)Telugu…10th
    4)Malayalam…15th
    5)Tulu…No idea
    So, The speaker mentions which language as Dravidian language?

  • Padmanaban Nagarajan

    These symbols are in Thamili (Tamil); look at the Keeladi archaeological site's relation with the Indus Valley civilization. You can read Tamil with that; you can't read Sanskrit because it had no letters till the 7th century.

  • THE MULTIVERSE LORD

    It is written from left to right because it is a seal. After stamping, the print comes out reversed, so it can then be read from left to right.

  • Bob Gillis

    "… it is the last major undecipherable script in the world."

    I would beg to differ; as far as I know, the Mayan hieroglyphs have not been deciphered.

  • JP Abraham

    Most Indian writers and historians are dishonest and totally untrustworthy. The North Indian elite are always seen to take an ideological or political view that serves their vested interests. Tony Joseph is another fraud, denying that the ancient Indian civilization was a Tamil civilization.

    Those who reject the Indo-Aryan migration theory are the Hindutva people, who are scared of being accused of being the descendants of Aryan invaders/migrants.

    They are fearful of being seen as an ethnicity not native to India, different from the native Dravidians. They are also fearful of being seen as having a culture inferior to the Tamil civilization. The Tamils are the authentic Indians.

  • Rovingscot

    Recently there was a documentary about Alexander following the river to India; it is thought to be the river Saraswati. Whether it had already dried up or was still flowing, there was a geological event which blocked and/or changed its course.
    If you follow the course of the Indus to where it moves away towards Mohenjo-daro and Harappa and other settlements of the period, it follows abandoned cities of the period to Indian Gujarat and then to the Arabian Sea; this must have been the course of the river Saraswati.

  • Ramanathan RNJ

    Everyone will encourage your findings which relates Indus civilization containing closer Tamil language ties ( dravidian ancestry) except Archeological survey of India ….

  • Ali Ismael

    So India threw the indus valley away to the Pakistanis. Intelligent Indians !! 4000 years ago Sanskrit existed, but no one could read Indus script, not a single head ! Some timeline glitch somewhere !

  • Vetrivel

    https://youtu.be/kwYxHPXIaao?t=812 – fourth row – Fish to Birds – Evolution of life. This is same as how current era(2019 ) understands.

  • Akhil Seth

    sign language is used for deaf ppl , why can't that be used as a text language itself? , and if it was in 3d..i mean like books for blind.. so it can solve the purpose for all at once. very creative way.

  • Ateeq Ahmed

    Nice presentation but do use the correct map. You have shown Pakistan part of Kashmir in India which is incorrect depiction of actual situation.

  • United مسّلم

    Hindu gods came from turkmenistan
    Great Arya Family
    Allama Iqbal Arya Kashmiri Brahman
    Muslim… people need to see his PhD paper to know about the Great Aryan Family & their love for spirituality/metaphysics. ANATOLIA, Turkey to Pakistan, including united Kashmir & united Punjab, except some of our brothers… THE whole Arya family is monotheist Abrahamic/Muslim.
    DEAR local/Dravidian UP, Bihar, Bengal & South: we are sorry for all the injustice done from the past to this day. Now it's your turn to negate these Turkmenistani gods….
    سچ کہہ دوں اے برہمن! گر تو برا نہ مانے
    تیرے صنم کدوں کے بت ہو گئے پرانے
    (Sanam kada: idol house, temple.)
    May I tell the truth O Brahman! If it does not displease you! The idols of your temple have become anachronistic.

    ONE God One Human Nation
    نکتہ ئی میگویم از مردان حال
    امتان را "لا" جلال "الا" جمال
    میں صاحب حال بزرگوں کی بات بتاتا ہوں ، امتوں کے لیے لا جلال ہے اور الا جمال (لا سے غیر اللہ کی نفی ہے اور الا میں اللہ تعالے کے سامنے جھک جانا۔
    I tell thee a significant point known only to the people of ecstasy: For nations, negation expresses power, affirmation expresses beauty.
    (People of ecstasy, mardan-i-hal: Persons who pass through different states in their spiritual experiences. This phrase stands here for people engaged in dynamic activity in contrast to mardan-i-qal, people who only talk and do nothing. Jalal: Power. Jamal: Beauty.)
    ہر کہ اندر دست او شمشیر لاست
    جملہ موجودات را فرمانرواست
    جس کے ہاتھ میں لا کی شمشیر ہے وہ ساری موجودات کا فرمانروا ہے۔
    He who has the sword of negation in his hands is the ruler of all the universe.
    (Translated by Bashir Ahmed dar)

  • senthilrajan K

    Remember, this is not a Dravidian language. This is Tamil. Don't hide the past with names. Dravidian is just 300 years old. Tamil is 20,000 years old.

  • anonymous opinions

    The combination of strong oral traditions, largely perishable writing materials, a lack of massive empires, and hot, humid weather all contribute to there being so few ancient Indian writing samples.

  • Aurovrata Venet

    What about the Saraswati River part of the civilisation? Much recent work in the last 30 years shows the Indus civilisation should really be renamed the Saraswati River civilisation.
