21-05-2026, 07:56 AM
(20-05-2026, 11:26 PM)nablator Wrote: (20-05-2026, 06:22 PM)kckluge Wrote: All of which is getting into the weeds. The point is that *if* Voynichese were a cipher that breaks up words into smaller chunks, the process that breaks them up is unlikely to be syllabification (and likely isn't deterministic in general?) due to the extremely low TTR that results.
The TTR of Voynichese can be reduced as much as you need... by simplifications, equivalences, re-spacing. However the babble-like sequences of similar words would not produce plausible Latin (or Chinese).
Let me start by saying that we are in vehement agreement that the distribution of the number of words between instances of words with a length-normalized edit distance below some threshold (which is a fancy way of saying "babble-like sequences of similar words") is an incredibly significant statistical signature of the Voynich text that any theory about how the text is generated needs to reproduce. It's for Stolfi to address that issue -- quantitatively -- in the context of explaining/defending his theory.
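For anyone who wants to see what that signature looks like in practice, here is a minimal sketch of one way to measure it. The threshold (0.34), the look-ahead window (`max_gap`), and the choice to count only the nearest similar word are my assumptions for illustration, not an established definition, and the six-word sample is just a toy list of EVA-style strings.

```python
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word."""
    return edit_distance(a, b) / max(len(a), len(b), 1)

def similar_word_gaps(tokens, threshold=0.34, max_gap=10):
    """Histogram of how many words sit between each token and the
    nearest following token that is 'similar' to it."""
    gaps = Counter()
    for i, word in enumerate(tokens):
        stop = min(i + max_gap + 2, len(tokens))
        for j in range(i + 1, stop):
            if normalized_distance(word, tokens[j]) <= threshold:
                gaps[j - i - 1] += 1
                break
    return gaps

# Toy sample only; a real test needs a full transliteration.
sample = ["daiin", "daiin", "chedy", "qokeedy", "shedy", "qokedy"]
gaps = similar_word_gaps(sample)
```

Running the same function over a candidate generated text and over the Voynich transliteration gives two histograms that can be compared directly, which is the quantitative comparison I am asking for.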
As for the TTR stuff...I'm acutely aware that this is the "Chinese" theory thread (and not, for instance, the (AFAIK non-existent) "verbose cipher" theory thread), but moderators and readers please bear with me because what follows will, in fact, wrap back around to being on topic...
When you say, "The TTR of Voynichese can be reduced as much as you need... by simplifications, equivalences, re-spacing" I'll at least conditionally agree. You could, for example, assume that all the glyphs are either nulls or homophones of a single underlying glyph, in which case the TTR would be 1/NumTokens. While that may be true, it's not a terribly useful observation, and the underlying problem (at least from my POV, YMMV) is that methodologically it's a bass-ackwards way of thinking about the issue for reasons that:
1) tie into the whole "if you have a linguistic/cryptographic theory about the Voynich text, then actually applying that theory to translate/decipher the Voynich text is literally the last thing you should be doing, not the first" position that people have probably seen me rant about any number of times in the old mailing list/comments on Nick's blog/the Ninja, and
2) offers a good chance to provide an illustration of what I think methodologically is a better approach, that
3) results in a critique of Stolfi's approach here
In the context of the experiment I was talking about, the question is *not* "can you fiddle with the Voynich text to lower the TTR?" The question is "if you assume that Voynichese represents a natural language (in this case, Latin) where the words have been broken apart into syllables and then enciphered with a (probably to no one's surprise) verbose cipher, how do the statistical characteristics of that cipher text match up with the characteristics of Voynichese?" In that context, can you do things to the enciphered text that would raise the TTR? Sure, but the problem is to find a way to do that such that you both
1) raise the TTR to a level that is quantitatively consistent with the Voynich text, without
2) (and this is the really important kicker) screwing up the quantitative agreement of the cipher text with any of the *other* characteristics of the Voynich text.
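To make the shape of that experiment concrete, here is a rough sketch under loudly stated assumptions: `crude_chunks` is a stand-in that splits after each vowel run and is *not* a real Latin syllabifier, the `VERBOSE` table is a made-up two-glyph-per-letter substitution, and the sample is just the opening sentence of Caesar's Gallic War. TTR also depends on text length, so a real comparison needs a corpus of comparable size to the Voynich text.

```python
import re
import string

def ttr(tokens):
    """Type-token ratio: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def crude_chunks(word):
    """Split after each vowel run (hypothetical stand-in for syllabification)."""
    return re.findall(r"[^aeiou]*[aeiou]+|[^aeiou]+$", word)

# Hypothetical verbose cipher: each plaintext letter becomes two glyphs.
FIRST, SECOND = "okdsqy", "aeilr"
VERBOSE = {c: FIRST[i // 5] + SECOND[i % 5]
           for i, c in enumerate(string.ascii_lowercase)}

def encipher(chunk):
    """Apply the one-to-one verbose substitution letter by letter."""
    return "".join(VERBOSE[c] for c in chunk)

SAMPLE = ("gallia est omnis divisa in partes tres quarum unam incolunt "
          "belgae aliam aquitani tertiam qui ipsorum lingua celtae "
          "nostra galli appellantur")

words = SAMPLE.split()
chunks = [c for w in words for c in crude_chunks(w)]
cipher_tokens = [encipher(c) for c in chunks]

print("TTR of words:           ", round(ttr(words), 3))
print("TTR of chunks:          ", round(ttr(chunks), 3))
print("TTR of enciphered chunks:", round(ttr(cipher_tokens), 3))
```

Note that a one-to-one substitution cannot change the TTR by itself; in this sketch it is the chunking step that moves it. Anything added afterwards to push the TTR back up has to be checked against the other statistics as well.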
Could you raise the TTR of the enciphered Latin -- by the quantitatively necessary amount -- by (say) introducing homophones? Sure -- but in reducing the predictability of the next glyph what is that going to do to the entropy values? Is that going to mess with how Zipfian the "word" frequency distribution is? These things aren't independent variables that you can arbitrarily fiddle with in isolation, they're coupled. To the extent that the answers to those questions are (probably) "raise them unacceptably" and "yes", the low TTR -- completely independently from any other problems like "babble-like sequences of similar words" -- suggests (as a preliminary result, to the extent that Latin may or may not be typical) that we can probably rule out "it's enciphered syllables in a natural language text." Or at least "...in an Indo-European language text."
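The coupling is something you can measure rather than argue about. A minimal sketch, assuming a simple bigram estimate of conditional glyph entropy and a hypothetical homophone table in which uppercase letters stand in for alternate glyphs: apply the homophones at random, then recompute TTR and entropy on the same tokens. The function names and the toy tokens are mine, and what this prints on a toy input says nothing about what happens on a real corpus.

```python
import math
import random
from collections import Counter

def ttr(tokens):
    """Type-token ratio: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def conditional_entropy(text):
    """H(next glyph | previous glyph) in bits, estimated from bigram counts."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), n in pairs.items():
        h -= (n / total) * math.log2(n / firsts[a])
    return h

def add_homophones(tokens, homophones, rng):
    """Replace each glyph by a randomly chosen member of its homophone set."""
    return ["".join(rng.choice(homophones.get(g, g)) for g in tok)
            for tok in tokens]

rng = random.Random(0)  # fixed seed so runs are repeatable
tokens = "chedy shedy chedy qokedy chedy shedy qokedy chedy".split()
noisy = add_homophones(tokens, {"e": "eE", "d": "dD"}, rng)

for label, toks in (("before", tokens), ("after", noisy)):
    print(label, round(ttr(toks), 3),
          round(conditional_entropy(" ".join(toks)), 3))
```

Because each homophone maps back to exactly one plain glyph, the number of distinct tokens can only stay the same or grow, so the TTR never goes down. The entropy and the rank-frequency curve are the parts that have to be checked empirically on a real corpus.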
Whether or not there is a different way of breaking words in an underlying natural language text apart and enciphering them that produces a ciphertext that is quantitatively consistent with all the "greatest hits" properties of Voynichese is an open question.
All of which leads to wrapping this back around to discussing Stolfi's theory (and tying it into the whole "babble-like sequences of similar words" text signature issue). It's not *impossible* that Stolfi has stumbled into a solution with his approach, but that's certainly not where I'd be putting my money in Kalshi or Polymarket at this point. IMHO -- and he is obviously convinced otherwise -- I think it is far more likely that he has fallen prey to the siren call of the "crib" that has lured so many mariners sailing on the treacherous seas of the Voynich Mss to their doom. What he *should* be doing -- again, IMHO -- is taking texts in one or more SE Asian languages (dealer's choice), assigning some scheme for representing them with Voynich glyphs, simulating whatever "confused ignorant scribe" error processes he thinks are there, and then showing that you wind up with something that -- quantitatively, and for all the "greatest hits" properties (including "babble-like sequences of similar words") -- looks like Voynichese.
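In outline, the kind of harness I mean looks something like the sketch below. Everything in it is a placeholder: the `confusions` table, the error rate, and the two metrics are hypothetical and are not taken from Stolfi's posts. A real run would plug in his actual glyph mapping and error model, and would include the full list of "greatest hits" statistics.

```python
import random

def ttr(tokens):
    """Type-token ratio: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_word_length(tokens):
    """Average number of glyphs per token."""
    return sum(map(len, tokens)) / len(tokens) if tokens else 0.0

def scribe_noise(tokens, confusions, rate, rng):
    """With probability `rate`, swap a glyph for one of its look-alikes
    (hypothetical error model)."""
    out = []
    for tok in tokens:
        glyphs = [rng.choice(confusions[g])
                  if g in confusions and rng.random() < rate else g
                  for g in tok]
        out.append("".join(glyphs))
    return out

def compare(candidate, reference, metrics):
    """Return {metric name: (candidate value, reference value)}."""
    return {name: (fn(candidate), fn(reference))
            for name, fn in metrics.items()}

METRICS = {"ttr": ttr, "mean_word_length": mean_word_length}

rng = random.Random(1)
candidate = scribe_noise(["daiin", "chedy", "daiin", "shedy"],
                         {"a": "o", "s": "c"}, rate=0.1, rng=rng)
reference = ["daiin", "chedy", "qokedy", "shedy"]  # toy stand-in
report = compare(candidate, reference, METRICS)
```

The point of the `compare` table is that every metric is reported side by side, so a change that fixes one number while breaking another is visible immediately.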
Please excuse any typos/word skips -- it's late...