Three Letters Re: Forget Codes: Using Constructed Languages for Secure Communication

Jim:
The article on constructed languages [by Snow Wolf] was fascinating. Just two concerns: An outsider might be able to crack your code based on repeated grammar. As was mentioned in the letter, “sentences follow the common subject-verb-object pattern”. This pattern is predictable and could help a very intelligent decoder. Also, your activity can be observed after communication, which helps an observer work out what your terms mean.

Both of these concerns can be mitigated with re-aligning, as mentioned in the letter. So take care not to overlook that step.

Finally, if every tip in this article (such as re-aligning and custom grammar) were practiced, and on top of this was layered a nice encryption method, such as was described in the 9/11/12 letter, you’d seriously give an enemy a run for their money!

I know this is true, for during WWII, Navajo-speakers were employed for code talking; that is, the messages were first translated into Navajo and then encrypted. Navajo almost fully qualifies as a constructed language. The following is from Wikipedia:

“Navajo was an attractive choice for code use because few people outside the Navajo themselves had ever learned to speak the language. Virtually no books in Navajo had ever been published. Outside of the language itself, the Navajo spoken code was not very complex by cryptographic standards and would likely have been broken if a native speaker and trained cryptographers worked together effectively. The Japanese had an opportunity to attempt this when they captured Joe Kieyoomia in the Philippines in 1942 during the Bataan Death March. Kieyoomia, a Navajo Sergeant in the U.S. Army, but not a code talker, was ordered to interpret the radio messages later in the war. However, since Kieyoomia had not participated in the code training, the messages made no sense to him. When he reported that he could not understand the messages, his captors tortured him. Given the simplicity of the alphabet code involved, it is probable that the code could have been broken easily if Kieyoomia’s knowledge of the language had been exploited more effectively by Japanese cryptographers. The Japanese Imperial Army and Navy never cracked the spoken code.”

Jim,
The recent submission, “Forget Codes…”, while interesting, seems to neglect one rather important point: what the author is suggesting IS a code, and a fairly simple one at that!

Rather than substituting symbols for letters or letters for each other, this code is substituting words for other words. That the substituted words are made up isn’t of any consequence at all.

What is proposed is thus a substitution cipher and, like all such ciphers, it can and will be cracked by a determined individual or group. It is more complex than the simple Caesar Ciphers we used as children to keep our “secret clubs” secret, but it’s not a secure cipher by any means.

All that is needed to crack it is a sufficient collection of enciphered phrases and some indication of their meaning. These meanings could be gotten by intercepting the enciphered communication and observing events before or after the communication. The group using the code could even be baited by an enemy into using words – for example, if I walk down the road near their BOL and drop a handful of ammunition on the ground, I can bet the encoded word for “ammunition” will be used by their patrol when they report back in. Knowing their word for ammunition could be valuable, no? If the situation is such that I could safely allow myself to be observed while walking down the road, I might also get the words for “man”, “stranger” or “dropped”. From there the process of deciphering unknown words snowballs.

Using the examples provided by the author:

puq tf urr (There’s a man in the house.)

cg wzn (A stranger is coming.)

igy cg tf urr (Shoot the stranger in the house.)

aok cg tf f (Watch out for a stranger in a vehicle.)

puq fh bx tf urr (A man with a gun is in the house.)

…and with NO reference to the key, which is now out of sight, I can see that the word “house” is used in sentences 1, 3 and 5. The only code words used in all three sentences are tf and urr. One of those means house. Those sentences also have something else in common, as there is another word repeated – that is the state of “in-ness” – being in the house. A look at sentence 4 disambiguates: it is lacking a reference to “house” and is also missing the word “urr”. Urr thus means “house”, leaving “tf” to refer to in-ness. As further confirmation, sentence 4 refers to someone who is “in” a vehicle and contains the word “tf”. Tf thus definitely means “in”. A little more thought along the same lines reveals that the “man” in sentences 1 and 5 is represented by the word “puq” and that the remaining words in sentence 5, “fh bx”, mean “with a gun”. A larger sample would be needed to tease those two words apart. It would probably only take another sentence or two before the word “with” appeared without “gun”, answering that question.
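For readers who want to see that reasoning mechanically, here is a minimal Python sketch of the same elimination process. It assumes the observer has already guessed the rough meaning of each intercepted sentence; the one-word English tags and the function name candidates are my own illustrative choices, not anything from the original article.

```python
# Each intercept pairs a coded sentence with the concepts an observer
# believes it conveyed. The tags are simplified guesses (an assumption),
# based on the five example sentences above.
intercepts = [
    ("puq tf urr", {"man", "in", "house"}),
    ("cg wzn", {"stranger", "coming"}),
    ("igy cg tf urr", {"shoot", "stranger", "in", "house"}),
    ("aok cg tf f", {"watch", "stranger", "in", "vehicle"}),
    ("puq fh bx tf urr", {"man", "with", "gun", "in", "house"}),
]

def candidates(concept, intercepts):
    """Return code words that appear in every sentence mentioning the
    concept and in no sentence that does not mention it."""
    with_concept = [set(code.split()) for code, tags in intercepts if concept in tags]
    without_concept = [set(code.split()) for code, tags in intercepts if concept not in tags]
    if not with_concept:
        return set()
    common = set.intersection(*with_concept)
    for words in without_concept:
        common -= words
    return common

for concept in ("house", "in", "man", "gun"):
    print(concept, sorted(candidates(concept, intercepts)))
# house ['urr']
# in ['tf']
# man ['puq']
# gun ['bx', 'fh']
```

Note that “gun” still has two candidates, because “with” and “gun” only ever appear together in this small sample. One more intercept containing either concept alone would separate them.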

The plaintext is the key! Given enough samples, the key can be extracted from the text.

This cipher could be very useful in a short-term situation with a transient enemy, but it would very quickly become useless against a long-term neighboring enemy. Suggesting that it could resist the efforts of a government is craziness.

The only way a cipher like this can remain secure is if all of the facts conveyed using it also remain unknown to the observer. This is a common weakness among substitution ciphers. Whether it is letter, digram or trigram frequency analysis for letter substitution ciphers – or analysis of the use and reuse of code words for word substitution ciphers – the weakness is the same. With a more secure cipher, knowing some of the plaintext (or in this case, the information conveyed) doesn’t get you even one step closer to deciphering the /next/ bit of text.
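As a rough illustration of the letter, digram and trigram counting mentioned above, here is a short Python sketch. The function name ngram_counts is hypothetical, and for simplicity it ignores spaces and punctuation, so n-grams run across word boundaries; a real analyst might treat word boundaries differently.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count runs of n consecutive letters in text, ignoring case,
    spaces and punctuation (a simplifying assumption)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))

sample = "puq tf urr igy cg tf urr puq fh bx tf urr"
print(ngram_counts(sample, 1).most_common(3))  # single letters
print(ngram_counts(sample, 2).most_common(3))  # digrams
print(ngram_counts(sample, 3).most_common(3))  # trigrams
```

In a letter substitution cipher, the most common ciphertext letters and letter groups tend to line up with the most common ones in the underlying language, which is what gives the analyst a starting point.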

Those interested in the subject of encryption would do well to check out “Cryptanalysis – A Study of Ciphers and Their Solution” by Helen Fouché Gaines. It is a well-regarded “beginning to intermediate” text on many cipher schemes, some quite difficult to crack. “Applied Cryptography” by Bruce Schneier contains great coverage and explanations of security and encryption, especially with regard to electronic communication.

Finally, as far as I know there is only one cipher that is known to be unbreakable if properly implemented, and that is the “one time pad”. When I say unbreakable I mean unbreakable even by the wealthiest and most powerful governments. It is extremely simple but suffers from a few difficulties and limitations, the primary one being that the keys must be exchanged before any encoding can take place. Two others are that it requires the generation of a truly random collection of data used as the encryption key (the pad), and that pads can *never* be reused (or you’ll introduce the very same weakness I illustrated above). It is well worth looking into and if you decide to use it, generate and exchange pads *now*. If you can’t build a device to collect cosmic noise for random data then decent pad data can be (or used to be) gotten from www.random.org. In the event someone intercepts your pad data, it is unlikely they will also be the person out to raid your BOL!
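To show how simple the mechanics are, here is a minimal Python sketch of a one time pad using XOR on bytes. This is only one common way to combine message and pad, and it is my assumption rather than anything specified above. The function names are hypothetical, and the pad here comes from Python's secrets module (the operating system's cryptographic random generator), which stands in for the hardware noise source described above and is not the same thing.

```python
import secrets

def make_pad(length: int) -> bytes:
    """Generate pad bytes from the OS random source (an assumption;
    a hardware noise source would be the stricter choice)."""
    return secrets.token_bytes(length)

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad. The same call both encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"stranger in the house"
pad = make_pad(len(message))          # must be shared in advance, used once
ciphertext = otp_xor(message, pad)
recovered = otp_xor(ciphertext, pad)  # the receiver applies the same pad
```

The reuse problem is easy to see here: if two messages are XORed with the same pad, XORing the two ciphertexts together cancels the pad and leaves the XOR of the two plaintexts, which an analyst can then attack.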

Best, – Matt R.

James:
I was surprised to see you publish the article “Forget Codes: Using Constructed Languages.” It has to be one of the most dangerously flawed pieces of writing I have seen on your web site. It seems based on an understanding of cryptography and mathematics set shortly after the Victorian era of hieroglyphics decryption. We have come a LONG way since then. The author is WRONG, and following his advice leaves one’s communications completely vulnerable. I do not leave my argument up to a difference of style or opinion. I do not base my argument on petty infighting of Glock vs Everything else, or other arcane arguments that appear on Internet fora. My argument is based on undergraduate level mathematics and statistics.

Yes, constructed language will serve to keep conversation “secure” in the setting of overheard conversations in the local watering hole. For that matter, I can’t follow the conversation of the waiters at my local Cantonese restaurant. Constructed language might even serve a small group’s security purposes in the local AO. However, make no mistake, the concept is tragically flawed when discussed in terms of security and cryptography.

By its very nature, what is being discussed is a substitution cipher. Yes, the author suggested playing some games in the construct and its linguistic/grammatical foundations. There is also an attempt to change the “hash” on a pseudo-random basis, or even to change keys on some time period (t). Ultimately, should we follow the author’s advice and not even substitute for each word in the dictionary, but instead for a common subset of oft-used nouns and actions, we are talking about a frequency breakage of a mere (by the author’s suggestion) 300 factors. Let’s be generous and quadruple this to 1200 words. Or change the hash 3 times, and come up with a factor of 3600. We would not even require computer horsepower to break this “code” using modern mathematics. It can be done by anyone with a basic background in statistics, a few pages of notepaper, and 5-10 pages of message intercepts or transcripts to analyze.

I heartily agree with the philosophy of grounding ourselves in secure communications. But please, dear reader, do NOT create a security system that is based on radically flawed assumptions. Heck, do not even trust me on this topic. If you are serious about security, do your own research. You will likely find that the constructed language concept was debunked shortly after Turing moved beyond water filled tubes and the first computers began number crunching. I should also note that there are now linguistic breakages, as opposed to purely statistical ones (I hate his politics, but Chomsky is brilliant on language commonalities). Turing machines used brute force; now we have algorithms to assist, along with the exponential increase in brute-force computing power described by Moore’s Law.

If you are truly interested in secure communication, there are many excellent and free resources.
-The book Cryptonomicon by Stephenson is an excellent novel, and contains an appendix on creating a Solitaire code based on decks of cards.
-Bruce Schneier, one of the world’s experts on cryptography, has an excellent blog and a free monthly newsletter. In it, he discusses politics, security theater, snake oil ideas in security, cryptography, software, etc. Free, excellent, and from one of the modern day godfathers in the field.
-Human Rights Watch (say what you will about their politics) has an excellent resource for folks working in hostile environments who require secure comms from the field.
-PGP and OpenPGP (likely breakable by large resources such as the NSA) are free, and there are numerous reputable resources discussing their implementation.
-Read up on Onion routing (not entirely secure, but a good step amongst many needed), one time pads (very secure, but laborious, and should be implemented with a second authentication factor), key lengths, and hash functions.
-Open source philosophy of security, i.e. public testing of all mathematical and programming functions. Also see: ISECOM.

In closing, I could completely pull apart the suggestion of security via constructed language using mathematical arguments and logical analysis. Let’s just leave it at this – PLEASE do some research before you accept that suggestion as gospel to be deployed in securing your loved ones. My entire purpose is to save lives, and letting that article stand is like me not shouting FIRE in a burning building. It is a flawed course of action, and potentially a fatal one.

Wishing God’s blessings of peace and health to all. – CypherPunkPrepper

JWR Replies: I agree completely that substitution ciphers and constructed languages only provide a very weak form of encryption. They might suffice if your opponent is just a criminal looter gang, but they absolutely will not hold up to the scrutiny of any government agency.