The Voynich Ninja
The incompatibility of Voynichese with natural human language - Printable Version




The incompatibility of Voynichese with natural human language - RenegadeHealer - 09-03-2020

"The text of the Voynich Manuscript is incompatible with natural human language." I have seen some variation on this conclusion in a lot of recent papers and commentary. This belief seems very much in vogue in the past ~10y, especially among scientists well experienced in wielding statistical tools to analyze information. I wouldn't say it's a consensus yet. But it does seem to be an increasingly popular view among VMS scholars who tolerate absolutely no deviation from the scientific method. Should this incompatibility come to be a consensus among those who truly understand the properties of Voynichese, a serious belief that Voynichese represents some way to write some natural human language will become an immediate indicator of someone who needs to do much more reading, and drop any cranky preconceptions, before adding anything of value to the conversation.

I have no background in coding or information science. My knowledge of statistics is rudimentary, and my knowledge of linguistics is only as a lifelong amateur enthusiast. In reading the latest by Alin Jonas, JKP, Marco Ponzi, Torsten Timm, Brian Cham, Donald Fisk, and a few others, I tried my best to wrap my head around the metrics of natural language specimens, and how these authors' specimens differ markedly from those of the VMS. I've been impressed with these efforts, and think all of these authors make a good case that Voynichese as a vessel capable of holding natural human linguistic communication is, as yet, an unsupported premise. This conclusion doesn't go unchallenged, but I've noticed that increasingly, supporters of this incompatibility are comfortable meeting most challenges with some variation of, "You don't really understand what you're arguing against." or "You don't know what you don't know." I know that I don't know. I could believe that the compatibility vs. incompatibility of Voynichese with natural human language is a false equivalence, and that proponents of compatibility are only numerous and vocal because of how few people have truly taken the time to understand how the book's glyphs are arranged. On the other hand, I'm open to the possibility that proponents of incompatibility are something like an ideological echo chamber, who reach the same conclusion only because they start from the same set of assumptions, and don't keep company with researchers who don't share those same assumptions. To someone who isn't really qualified to argue either side, it feels like a bit of a Rashomon effect.

Reddit.com has a forum called [link not preserved]. The idea is admitting your ignorance and humbly inviting an expert to provide friendly, simplified, layperson-accessible explanations of technical questions. It seems that if Voynichese is highly unlikely to represent natural human language, there should be a way of explaining this to fledgling researchers and enthusiasts that any person of average intelligence could grasp. Incompatibilists, I know you're very tired of trying to explain your conclusion to would-be Voynicheros who just don't want to hear it. So just a link is fine. If incompatibility is indeed robustly supported, what post or other short piece of writing should be promoted as required reading for newbies who *do* want to hear it, and *don't* want to waste their time building a theory that's already been duly ruled out?

One important consideration here is what it would take to falsify the statement "Voynichese is incompatible with natural human language". To the benefit of anyone mounting a sincere and well-informed challenge to the incompatibility idea, it's a negative statement. All that's really needed to falsify a negative statement is one good contrary example. For example, if I say "There are no black swans," all one has to do is find me one black swan to prove me wrong. Similarly, all someone would have to do to falsify "Voynichese is incompatible with natural human language" is find one example of human writing (which can be reliably and replicably converted to human speech and vice versa) that measures similarly to Voynichese on all of the relevant metrics. This is easier said than done, of course. It's not helped by an unfortunate paradox: those most qualified to perform and interpret meaningful statistical analysis on written language are also likely to have acquired these skills through an education that put blinders, which they may not even be aware of, on their ideas of typical written language. Human language of the written kind is used in a lot of ways that have no literary merit and seldom make it into the historical record, after all.

In summary, I think the arguments for Voynichese's incompatibility with natural human language are not as widely and well understood as they deserve to be, and I don't think that's entirely due to willful ignorance. How can this viewpoint be better worded and promulgated, so that anyone out there with the chops to dispute it (or confirm it!) understands what they're replying to?


RE: The incompatibility of Voynichese with natural human language - davidjackson - 09-03-2020

It's clearly incompatible with natural human language.

But that doesn't mean it isn't a natural human language in some way.

Let me simplify.

What I mean, is that if you try to compare Voynichese to any natural human language (which means Latin, Greek, old French, medieval High German, Hebrew, whatever) it just doesn't fit. And the rhythm of Voynichese isn't like any natural human language. I don't think there are any sensible theories out there still arguing that it is a "natural human language".

But that doesn't mean it's nonsense. We can postulate dozens of "solutions" where it is legible.

It's a cipher, or a code. It's an artificial language. It's a potpourri of natural language with an individualistic shorthand (which I think is probably the solution a lot of the more serious minded currently have at the back of their minds at the moment). It's a natural language with shorthand. It's Latin written backwards and put through a mirror with some codes. It's really Elvish, but we've killed all the Elves. etc.

Or, of course, it could be nonsense.

The trouble is - we cannot currently prove or disprove any of these "solutions". Why? They are too generic. Each option has loads of "sub solutions": they are a matrix of solutions and every single sub solution has to be examined on its own merits.

When you plump for a specific solution, then that method can be analysed and proven or discarded. But then the theory author can make a small modification, and bring it back, and we have to start again. It's like a big game of whack-a-mole.

I suppose one day we'll find we've gone through every single possible solution, and then we can sit back and feel slightly content we never actually "solved" it. (Or we actually "solve" it, and have to find a new hobby!)


RE: The incompatibility of Voynichese with natural human language - Koen G - 09-03-2020

It's like David says, you can't really prove that the VM does not in some way contain natural language. It's like proving a negative.

How I would put it is this: we know almost for certain that the way Voynichese is written, the way characters form patterns, the way words do or do not form patterns, does not correspond to the way other written language behaves. But as David says, there are many plausible possibilities for some linguistic meaning to still be in the text.


RE: The incompatibility of Voynichese with natural human language - farmerjohn - 09-03-2020

To my mind the question is mostly terminological, but there is a bit more to it... Answering it implicitly suggests that if it is natural, then we will apply methods specific to natural languages; if unnatural, then methods specific to unnatural languages. So answering it is not very informative, but rather restrictive.

As an example, there is a book consisting only of words beginning with the letter A. Is it written in natural language?

A trickier example. Imagine that the VMS was obtained from some text in natural language using a trivial transformation T:
natural language -> T -> Voynichese
It's often shown that T cannot be a simple substitution.
Now imagine that the VMS was written using several transformations:
natural language -> T1 -> T2 -> ... -> Voynichese
Assume some Tn is a simple substitution, but the composition T1 -> T2 -> ... is not. Can we say the VMS is written using a simple substitution cipher? How restrictive would a definite answer be here?
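
To make the composition point concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption and not a claim about how the VMS was produced: T1 is a fixed letter shift (a simple substitution), and T2 is a hypothetical second step that shifts each letter by its position in the text. The helper is_simple_substitution only checks whether a plaintext/ciphertext pair is consistent with a one-to-one letter mapping.

```python
import string

ALPHABET = string.ascii_lowercase

def t1_substitution(text, shift=3):
    """T1 (assumed): a simple substitution, each letter always maps to the same letter."""
    table = str.maketrans(ALPHABET, ALPHABET[shift:] + ALPHABET[:shift])
    return text.translate(table)

def t2_positional(text):
    """T2 (hypothetical): shift each letter by its index in the text."""
    out = []
    for i, ch in enumerate(text):
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + i) % 26])
        else:
            out.append(ch)  # leave spaces and other symbols untouched
    return "".join(out)

def is_simple_substitution(plain, cipher):
    """True if the pair is consistent with a one-to-one symbol mapping."""
    forward, backward = {}, {}
    for p, c in zip(plain, cipher):
        if forward.setdefault(p, c) != c or backward.setdefault(c, p) != p:
            return False
    return True

plain = "aaaa bbbb"
step1 = t1_substitution(plain)   # output of T1 alone
step2 = t2_positional(step1)     # output of T1 followed by T2

print(step1, is_simple_substitution(plain, step1))
print(step2, is_simple_substitution(plain, step2))
```

One step in the chain is a simple substitution, yet the end result no longer behaves like one, which is why a yes/no answer about "simple substitution" says little about the chain as a whole.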

Overall VMS most probably is in superposition of several states and lacks clear colors.


RE: The incompatibility of Voynichese with natural human language - -JKP- - 09-03-2020

In my opinion, if you take the glyphs character by character (as individual entities) and read the spaces as literal, then the patterns do not resemble natural language in terms of frequency, or in terms of the glyph positions within words, or in terms of proximity to neighboring glyphs.
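
For readers who want to see what such measurements look like in practice, the following is a rough Python sketch of the three kinds of counts mentioned above: how often a symbol occurs at the start of a token, at the end of a token, and next to each neighboring symbol. The function name positional_profile and the toy English sample are assumptions for illustration only; a real study would run this on a transcription of the VMS and on comparison texts.

```python
from collections import Counter

def positional_profile(words):
    """Count word-initial symbols, word-final symbols, and adjacent symbol pairs."""
    first, last, pairs = Counter(), Counter(), Counter()
    for w in words:
        if not w:
            continue  # skip empty tokens
        first[w[0]] += 1
        last[w[-1]] += 1
        pairs.update(zip(w, w[1:]))  # each symbol with its right-hand neighbor
    return first, last, pairs

# Toy sample, treating spaces as literal word breaks (as described above).
sample = "the cat sat on the mat".split()
first, last, pairs = positional_profile(sample)

print(first.most_common(3))
print(last.most_common(3))
print(pairs.most_common(3))
```

Comparing these profiles between two texts is one simple way to ask whether their symbols behave alike, without assuming anything about what the symbols mean.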

If there is natural language in the VMS, some kind of processing seems to me to be absolutely essential to unravel or reconstruct it (I don't know which and it may even be a two-step process, or perhaps there is something in the shape variations that carries meaning that is not being recorded in transcripts).

In other words, I do not believe that the glyphs are chosen for convenience to represent sounds in a language that might be extinct or might not have had a written alphabet. You can't just read it by relating the characters to different sounds. I think it is a devised character set (in the sense of being analytical rather than evolving from spoken words).


Something is MISSING if one tries to do a literal interpretation into natural language, and something is INCONSISTENT about the way the patterns line up (it reminds me of ciphers where you shift the corresponding key every few words or sentences EXCEPT it doesn't seem applicable here because the positional-patterns would shift as well and they don't).


After saying this, I do sometimes wonder, however, if there is an occasional real-word here or there (e.g., loanwords from another language?) but it's so easy for a large block of text (200 folios) to include a certain percentage of natural-language patterns by sheer coincidence, that I think this might be a chimera, an artifact of the text-generation process. I also sometimes wonder if there are a few phrases artfully hidden within filler, but if they are, they are dashedly difficult to tease out.


RE: The incompatibility of Voynichese with natural human language - -JKP- - 09-03-2020

RenegadeHealer Wrote:In summary, I think the arguments for Voynichese's incompatibility with natural human language are not as widely and well understood as they deserve to be, and I don't think that's entirely due to willful ignorance. How can this viewpoint be better worded and promulgated, so that anyone out there with the chops to dispute it (or confirm it!) understands what they're replying to?


I think this post deserves a separate answer because it's a good question.

Part of the problem is that most of the existing transcripts turn Voynichese into something THAT LOOKS like natural language.

Take the Takahashi transcription, for example, the most commonly used transcript for "solutions" and for computational attacks.

The Takahashi transcription uses the EVA alphabet, which was designed to be easy to remember and type. How was it made easy to remember and type? BY CONVERTING A CONSONANT-LOOKING SHAPE INTO A VOWEL-LOOKING SHAPE. The tokens look more like words, which are easier to scan and stick more easily in our heads. This encourages the brain to think of the glyphs as consonants and vowels.


So then what happens? Voynichese takes on the look of a language. We see "words" like oteedy. But there is no oteedy in the VMS. It's o‡cc89, which is going to look less like a word to most people.


How this influences statistical studies...

So take a vowel-consonant-balance study that uses the Takahashi transcript. Attacks are based on artificial "words" like "oteedy", and for reasons I don't understand many of the researchers assume "o" and "e" actually are vowels. Even if the VMS were a simple substitution cipher (which I doubt) YOU CAN'T MAKE THIS ASSUMPTION. A one-letter shift (from abcde to bcdef) immediately changes the "vowel-consonant" appearance of even the simplest ciphers.
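
The one-letter-shift point can be checked with a small Python sketch. It is only an illustration on Latin letters, and the names shift_letters and vowel_mask are made up for this example. It shifts every letter by one place (abcde becomes bcdef) and then marks which positions look like vowels before and after.

```python
import string

VOWELS = set("aeiou")

def shift_letters(text, k=1):
    """Shift every lowercase letter k places along the alphabet."""
    lower = string.ascii_lowercase
    table = str.maketrans(lower, lower[k:] + lower[:k])
    return text.translate(table)

def vowel_mask(text):
    """Mark each letter as V (looks like a vowel) or C (looks like a consonant)."""
    return "".join(
        "V" if ch in VOWELS else "C" if ch.isalpha() else ch
        for ch in text
    )

plain = "the quick brown fox jumps over the lazy dog"
shifted = shift_letters(plain, 1)

print(plain)
print(vowel_mask(plain))
print(shifted)
print(vowel_mask(shifted))
```

After the shift, none of the original vowel positions still looks like a vowel, so any vowel-consonant statistic computed on the surface shapes describes the alphabet used to write the text down, not the underlying message.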


As long as transcripts are designed in a way that makes Voynichese LOOK like language, new researchers will probably start out with assumptions about its structure that have more to do with the transcription alphabet than they do with actual VMS text.


RE: The incompatibility of Voynichese with natural human language - Koen G - 09-03-2020

That's a good point, JKP. But if you replace everything in Voynichese with what it would be in a normal MS, don't you still get something language-like? I mean o stays o, a is a, bench is "ci" or "cr" or whatever, ending abbreviations are developed, 8 is d... you wouldn't be too far off from something that's pronounceable. Your biggest problems would probably be minims and gallows (which admittedly form a large part of the text).


RE: The incompatibility of Voynichese with natural human language - -JKP- - 09-03-2020

It's language-like because we want to see it that way. Years of reading trains our brain to see it that way.

That's why I wrote o‡cc89 rather than using the EVA font. We really should be seeing them as symbols rather than letters until we know what they are. They might be numbers. They might be something else.

At some point perhaps they convert to letters, but as it stands, it does not make sense to assume they are words, and changing "c"-shape to "e" gives people the wrong impression. There are no "e" shapes in the VMS.


RE: The incompatibility of Voynichese with natural human language - RenegadeHealer - 10-03-2020

@davidjackson, Koen G, farmerjohn, and -JKP-, that's very helpful, and clears up a lot of the confusion I (and probably many others) have had with the incompatibility premise. I see a common theme in all four of your responses, and so a more precise way to state the conclusion could go something like this:

  • The patterns of glyph arrangement in the VMS are not consistent with any known samples of unprocessed natural human language.

David and Koen, I agree with you that the statement "The VMS's text contains no meaning" needs to be teased apart from the statement "The VMS's text contains no unprocessed natural human language." It's very easy for a layperson to conflate these two statements, especially when there are researchers who believe the former (and whose work is motivated to varying degrees by this belief), but have found support for the latter statement in their work. The only way to falsify "The VMS's text contains no meaning" is a reliable and replicable extraction of meaning from the text. As long as such an extraction remains elusive or contestable, the possibility of meaninglessness is always on the table. Incompatibility with unprocessed natural language, however, is a much more parsimonious claim which is much easier to falsify, as I described. And so the fact that it has resisted all attempts at falsification makes it a fairly robust statement.


The questions that follow from this, in my mind:
  1. If Voynichese is the output of a transformation (or a series of transformations) applied to natural human language, as farmerjohn phrases it, what could this transformation consist of?
  2. What are the clues that it was a specific sort of transformation, as opposed to another?
  3. If the likely transformation algorithm can be ascertained, is it one that lends itself to reverse engineering, given the text of the extant VMS folios as the only raw data? (I.e. no key or other documents written in this script ever found)

I did some reading recently on [link not preserved], which is an interesting read for anyone working with undeciphered writing. The argument basically states that language understandable to only one person cannot be deemed meaningful. It seems to me that if the VMS performed the function of real human symbolic communication, but was purposely designed to hide this meaning very well from all but a select few (or only one person), then barring the discovery of additional written materials connected to the VMS, one could argue that it has become meaningless, even if it wasn't meaningless at the time it was created.

By way of analogy, let's say I posted a message on the Internet, using a complex encryption tool like a PGP key, and published it using identity-obfuscating technologies like a VPN and the Tor browser. It seems to me that if I never made the PGP key available to anyone (including the intended recipients of the message), and successfully obfuscated myself as the source of the message, then my encrypted message would be meaningless. The mathematical operations involved in decrypting and tracing the source of the message would be so impractical as to be essentially impossible. Therefore, there would be no hope for anyone attempting to extract meaning from it, and it would be indistinguishable from a stochastically generated string of nonsense for all who tried.


RE: The incompatibility of Voynichese with natural human language - Koen G - 10-03-2020

It could certainly be the case that the VM has become meaningless - its possible meaning lost forever. I am increasingly concerned that this may in fact be the case.

There are some ancient scripts we still can't read, even though we have more data. We know the type of script and some of the types of texts, we know they were used in administration etc. We know the archaeological context.

For the VM, we don't know what kind of script we are dealing with (syllables? Alphabet? Weird cipher? Abbreviations?) We don't know what the subject is, if any. We don't know where it was made, we don't know the language or even language family spoken by its makers (marginalia in "bad German" are not conclusive evidence...). We have nothing to go on.

Even when you know much more background than we do about the VM, and even when it is guaranteed that there is meaning, it is very hard to decipher an unknown script.