May 01, 2003

pattern recognition in natural language

The following snippet of text has been circulating on the net for a while: "... randomising letters in the middle of words [has] little or no effect on the ability of skilled readers to understand the text. This is easy to denmtrasote. In a pubiltacion of New Scnieitst you could ramdinose all the letetrs, keipeng the first two and last two the same, and reibadailty would hadrly be aftcfeed. My ansaylis did not come to much beucase the thoery at the time was for shape and senqeuce retigcionon. Saberi's work sugsegts we may have some pofrweul palrlael prsooscers at work. The resaon for this is suerly that idnetiyfing coentnt by paarllel prseocsing speeds up regnicoiton. We only need the first and last two letetrs to spot chganes in meniang."
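The quoted paragraph spells out its own recipe: shuffle the letters of each word while keeping the first two and last two in place. Below is a minimal Python sketch of that procedure. It is not the program used to produce the quote. The function names, the `keep` parameter, and the choice to treat a "word" as a run of ASCII letters are all assumptions of this sketch.

```python
import random
import re
from typing import Optional


def scramble_word(word: str, keep: int = 2, rng: Optional[random.Random] = None) -> str:
    """Shuffle the interior of `word`, leaving the first `keep` and last
    `keep` letters in place. Words with fewer than two interior letters
    have nothing to shuffle and are returned unchanged."""
    if rng is None:
        rng = random.Random()
    n = len(word)
    if n - 2 * keep < 2:
        return word
    middle = list(word[keep:n - keep])
    rng.shuffle(middle)
    return word[:keep] + "".join(middle) + word[n - keep:]


def scramble_text(text: str, keep: int = 2, seed: Optional[int] = None) -> str:
    """Scramble every run of ASCII letters in `text`. Spaces, digits and
    punctuation pass through untouched. A fixed `seed` makes output repeatable."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(), keep, rng), text)


print(scramble_text("Randomising letters in the middle of words", keep=2, seed=42))
```

Passing `keep=1` gives the more familiar variant that fixes only the first and last letters. A shuffle can land back on the original order, especially for words with short or repetitive interiors.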

You can also try writing texts with the vowels replaced by hyphens or, as I once did, replacing the vowels in a digitized recording of spoken text with schwas. L-ng--g- -s -n -m-z-ngl- c-mpl-x th-ng.
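The hyphen version is a single character substitution. In this sketch the vowel set is an assumption: it includes y, which is what reproduces the example sentence above exactly.

```python
# Assumption: y counts as a vowel, which reproduces the example sentence exactly.
VOWELS = set("aeiouyAEIOUY")


def hyphenate_vowels(text: str) -> str:
    """Replace every vowel with a hyphen, leaving all other characters alone."""
    return "".join("-" if ch in VOWELS else ch for ch in text)


print(hyphenate_vowels("Language is an amazingly complex thing."))
# L-ng--g- -s -n -m-z-ngl- c-mpl-x th-ng.
```

Treating y as a vowel means words such as "yes" lose their first letter too. Remove y and Y from the set if that matters.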

[Addendum 09/29/03: See also these entries: c-n y-- r--d th-s?, ashslay ottedday, visual word recognition, and 50-millisecond segments.]

Posted by jim at May 1, 2003 01:43 PM
Comments

I just came across that one (Sept 03). I noticed that the sample given slowed me down by about 20%. I attributed my ability to still read it more to the brain's powerful predictive algorithms and to the redundancy built into natural language.

It's also true that we use the shape of the words (but don't tell the natural language people about that - things are bad enough as it is). Cover the bottom half of a line of text, and you can still make out 80% to 90% of it. Cover the top half, and you're out of luck.

I took your sample and converted it to upper case (easy in UNIX). Here's the result:

RANDOMISING LETTERS IN THE MIDDLE OF WORDS [HAS] LITTLE OR NO EFFECT ON THE ABILITY OF SKILLED READERS TO UNDERSTAND THE TEXT. THIS IS EASY TO DENMTRASOTE. IN A PUBILTACION OF NEW SCNIEITST YOU COULD RAMDINOSE ALL THE LETETRS, KEIPENG THE FIRST TWO AND LAST TWO THE SAME, AND REIBADAILTY WOULD HADRLY BE AFTCFEED. MY ANSAYLIS DID NOT COME TO MUCH BEUCASE THE THOERY AT THE TIME WAS FOR SHAPE AND SENQEUCE RETIGCIONON. SABERI'S WORK SUGSEGTS WE MAY HAVE SOME POFRWEUL PALRLAEL PRSOOSCERS AT WORK. THE RESAON FOR THIS IS SUERLY THAT IDNETIYFING COENTNT BY PAARLLEL PRSEOCSING SPEEDS UP REGNICOITON. WE ONLY NEED THE FIRST AND LAST TWO LETETRS TO SPOT CHGANES IN MENIANG.

It doesn't seem to be that much harder to read, but that's probably because I already know the text.

Someday soon, I'm going to write a program to do that sort of shuffling, and pick samples from the millions of pages of text on etext. One problem will be when the shuffle results in another real word (lots of words have meaningful anagrams).
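One way to handle the anagram problem: a shuffle can turn one word into another only if the two share the fixed outer letters and the same multiset of interior letters. A sorted-interior key therefore finds every collision without generating permutations. The sketch below assumes one letter is kept at each end and uses a small hypothetical word list in place of a real dictionary. `signature`, `build_index`, `real_word_collisions` and `safe_scramble` are illustrative names.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Signature = Tuple[str, str, str]


def signature(word: str, keep: int = 1) -> Signature:
    """Fixed head, sorted interior letters, fixed tail: every shuffle of
    `word` that keeps `keep` letters at each end has this same key."""
    w = word.lower()
    n = len(w)
    return (w[:keep], "".join(sorted(w[keep:n - keep])), w[n - keep:])


def build_index(dictionary: Iterable[str], keep: int = 1) -> Dict[Signature, Set[str]]:
    """Group dictionary words by signature."""
    index: Dict[Signature, Set[str]] = defaultdict(set)
    for w in dictionary:
        index[signature(w, keep)].add(w.lower())
    return index


def real_word_collisions(word: str, index: Dict[Signature, Set[str]], keep: int = 1) -> Set[str]:
    """Dictionary words (other than `word`) that a shuffle could accidentally produce."""
    return index.get(signature(word, keep), set()) - {word.lower()}


def safe_scramble(word: str, index: Dict[Signature, Set[str]], keep: int = 1,
                  rng: Optional[random.Random] = None, attempts: int = 20) -> str:
    """Reshuffle until the result is neither the original nor another real word.
    Give up and return `word` unchanged after `attempts` tries."""
    rng = rng if rng is not None else random.Random()
    n = len(word)
    if n - 2 * keep < 2:
        return word  # nothing to shuffle
    banned = real_word_collisions(word, index, keep) | {word.lower()}
    for _ in range(attempts):
        middle = list(word[keep:n - keep])
        rng.shuffle(middle)
        candidate = word[:keep] + "".join(middle) + word[n - keep:]
        if candidate.lower() not in banned:
            return candidate
    return word


index = build_index(["trial", "trail", "form", "from", "salt", "slat", "words"])
print(real_word_collisions("trial", index))               # {'trail'}
print(safe_scramble("form", index, rng=random.Random(0)))  # form
```

With a real word list, the index can be built once and reused. When no safe arrangement exists, as with form/from, the function returns the word unchanged instead of looping forever.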

A logical next step would be to leave out a certain percent of letters (like your "e" example, except actually omitting them). That would give something like

"Lngg is n amzngl cmplx thng"

Now we're getting into the area of instant messaging. I think there's a connection.
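Leaving out a percentage of letters can be sketched in a similar way. Here each interior letter is deleted independently with probability `rate`. Protecting the first and last letters is an assumption added for this sketch, not part of Mike's proposal (his example even drops the initial vowel of "an"). The function name is hypothetical.

```python
import random
import re
from typing import Optional


def drop_letters(text: str, rate: float, seed: Optional[int] = None) -> str:
    """Delete each interior letter of every word with probability `rate`.
    First and last letters are always kept (an assumption of this sketch),
    so every word stays anchored."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)

    def thin(match):
        w = match.group()
        if len(w) <= 2:
            return w
        # rng.random() is in [0, 1), so rate=0 keeps everything and rate=1 drops everything
        kept = [ch for ch in w[1:-1] if rng.random() >= rate]
        return w[0] + "".join(kept) + w[-1]

    return re.sub(r"[A-Za-z]+", thin, text)


print(drop_letters("Language is an amazingly complex thing.", rate=0.3, seed=3))
```

At `rate=1.0` only the outer letters survive (for instance, "complex" becomes "cx"), which gives a rough upper bound on how far the idea can be pushed.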

Posted by: Mike on September 17, 2003 11:15 AM

Mike: I have to agree with you. I think both the first text and this one work mostly because of the cues: the first and last letters, the same word length, the actual letters being present but jumbled, and, just as importantly, the growing ability to predict the following words as the sentence winds down to its period. Keep me informed of any further developments.

Posted by: jim on September 17, 2003 05:48 PM

Actually, I am pretty sure it is my research people are talking about, as it was my letter to the New Scientist. It was research for my PhD at the University of Nottingham (submitted in 1976, available online I am told), titled The Significance of Letter Position in Word Recognition. It shows clearly that word shape (the theory at the time) is not that significant, as shuffling the letters changes the shape substantially. I did 36 or so experiments with all kinds of changes, keeping the first and last letters in position, or the first two and last two letters. It should be easy to write a programme which does this now; for me it meant learning machine code and compressing everything to get it in. Now I just help people invent things.
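The claim that shuffling changes word shape can be made concrete with a crude shape signature. This sketch classifies each letter as rising above x-height (ascenders, plus all capitals), descending below the baseline, or neither. The letter classes and function names are simplifying assumptions that ignore font differences and details such as the dot on i.

```python
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape(word: str) -> str:
    """Return one class per character: 'a' = reaches above x-height
    (ascender or capital), 'd' = descends below the baseline, 'x' = neither."""
    classes = []
    for ch in word:
        if ch.isupper() or ch in ASCENDERS:
            classes.append("a")
        elif ch in DESCENDERS:
            classes.append("d")
        else:
            classes.append("x")
    return "".join(classes)


def shape_distance(a: str, b: str) -> int:
    """Count positions where two equal-length words differ in shape class."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(p != q for p, q in zip(shape(a), shape(b)))


print(shape("letters"), shape("letetrs"), shape_distance("letters", "letetrs"))
# axaaxxx axaxaxx 2
```

For "letters" and its scrambled form "letetrs" from the quoted text, two of seven positions change class. In this model every all-caps word maps to a uniform signature, so upper-casing a text, as Mike did above, removes the shape cues while leaving the letters themselves intact.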

Thanks for all the interest in my PhD after all these years. At the time I could not get it published, as the reviewers were the ones whose theories my results contradicted!

Oh well.

Graham

Posted by: Dr Graham Rawlinson on September 19, 2003 01:43 AM

Hello.
I'm French, rather fluent in English, but it's still not my native language. I first read such a scrambled text in French, and I read it almost as fast as usual (maybe 2/3 of my normal speed).

But here, with a scrambled English text, I had some difficulties: I recognized most of the words easily, but at least 10% of them required a significant delay, as if I were solving a puzzle.

I think that UNSKILLED readers (even in their native language) have the same kind of delays and difficulties reading NORMAL texts: word recognition is then often replaced by raw sound recognition.

Jerome.

Posted by: Jerome on September 19, 2003 01:57 AM

Dr Rawlinson: Nice to hear from you. The text of the dissertation is not available online; what I accessed was only its bibliographic information.

Posted by: jim on September 19, 2003 07:49 AM

ouy rae lal iidots...

Posted by: dd on September 19, 2003 08:00 AM

Your anonymous pain grieves me. Get well soon.

Posted by: jim on September 19, 2003 10:17 AM

I am also bilingual, French/Persian, and have been living in the UK for 2 years with a medium level of fluency in English (and some knowledge of German and Spanish). I also first came across the text in French and read it easily. I found the English version much more difficult, as my vocabulary in English is much more limited. My family and I spent 2 years in Germany before arriving in the UK. My husband is Australian and fluent in German and French, which is the common language we speak at home. Our youngest daughter, now five and a half, attended a German kindergarten for six months before we moved, and she was speaking German more or less fluently. When we arrived in the UK she had to learn English; at first she ignored the new language and kept speaking German. Then, after 3 months, she switched to English for good, and now it seems as if she has never spoken a word of German. This summer, when I bought her a talking book in French, she listened to it and then asked me if it was German. Even though I try to keep French going at home, French is losing its predominance because of homework, the media and her English friends.
I am also surprised that she is in the highest reading group in her class (Year 1). She reads very well, but I have noticed that she actually guesses the words more than she reads them. Sometimes the words she reads don't correspond to the letters at all; she simply guesses from the meaning of the text. She has an older sister, now 7 and in Year 3, who was considered slow and unable to focus, but after we worked with her and changed her school, she now gets good results and works well. I regret that, even though we live in Surrey among many multinational families and send our children to private schools with small classes, the teachers are not prepared to deal with bilingual pupils.
Thanks.

Posted by: Taubman Tara on September 22, 2003 05:51 AM

I looked into my child's dyslexia in the late '80s and reread an article by Chin-Chance (197x) in Scientific American that looked at word length, shape, etc. It demonstrated the shift from immature to mature reading techniques, the mature ones relying more on the beginnings and ends of words. He (she?) did some saccade work to show how the eye jumps to fixed places in the word, regardless of word length. I think anyone interested in this should search it out.
I have a photocopy, which I can't find at the moment, but I will get it digitized.

Posted by: andy on September 23, 2003 09:28 AM

Here's a small program (with source) for testing and examples.
http://www.planet-source-code.com/vb/scripts/showcode.asp?txtCodeId=48723&lngWId=1

Posted by: Paul on September 23, 2003 11:16 PM

On
http://lsto.gmxhome.de/hardlycrypt.html
you will find a three-line Perl script and a JavaScript implementation for converting a text online.
Try it online!

Sorry, it's in German.

Daniel

Posted by: Daniel on October 14, 2003 02:03 PM

As many of you know, Hebrew is commonly written without vowels, and has been for more than 2,000 years. Until I saw this meme on a website, I never realized that English might eventually economize on letters as well, or at least adopt some other form of shorthand. What do you think about this?

Posted by: Jason E Schaitel on October 22, 2003 01:58 PM

Jason: Yes, languages have a lot of redundancy built into them. There are also writing systems that have even less to do with sound (e.g., Chinese).
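One way to see how much work the vowels do is to strip them and count how many words in a list collapse onto the same consonant skeleton, since those are the cases a vowel-less spelling leaves to context. The word list here is a tiny hypothetical sample, the function names are illustrative, and the sketch treats only a, e, i, o and u as vowels.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

VOWELS = set("aeiou")  # assumption: y is treated as a consonant here


def skeleton(word: str) -> str:
    """Lower-case the word and drop its vowels."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


def ambiguous_skeletons(words: Iterable[str]) -> Dict[str, List[str]]:
    """Map each skeleton shared by two or more distinct words to those words."""
    groups = defaultdict(set)
    for w in words:
        groups[skeleton(w)].add(w.lower())
    return {sk: sorted(ws) for sk, ws in groups.items() if len(ws) > 1}


sample = ["bat", "bet", "but", "cat", "dog", "language"]  # hypothetical word list
print(skeleton("language"))          # lngg
print(ambiguous_skeletons(sample))   # {'bt': ['bat', 'bet', 'but']}
```

Running `ambiguous_skeletons` over a full English word list would show how many vowel-less spellings need surrounding context to be read, though it says nothing about how readers actually resolve them.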

Posted by: jim on October 23, 2003 07:33 AM