[split] What can the structural peculiarities of the VMS tell us about the nature of the underlying text

RE: How should we deal with LLMs on the forum? - Jorge_Stolfi - 07-05-2026

(06-05-2026, 09:09 AM)JoJo_Jost Wrote: The underlying structure of the texts is also very difficult to reconcile with a generator; in fact, it is precisely this structure that points to an actual language that just happens to have this structure by chance.

I agree. The "gibberish" theories have several big problems.

Quote:The differences between the sections would be easiest to explain with different lists.

AFAIK, changes of topic and style alone could account for those differences.

Quote:The LAAFU features are stronger than previously described—the initial letters produce significantly longer and significantly shorter lines for p vs. o. They favor certain subsequent letters/bigrams and tend to reject others. So something is going on here,

Yes, there are many statistical and structural anomalies around line breaks. But, again, that does not imply LAAFU. Even the trivial line breaking algorithm will generate such anomalies -- a fact that was observed only recently. All the observed anomalies may be just side effects of the actual line breaking "algorithm" used by the Scribe, which includes things like abbreviations, stretching and shrinking of spaces, fancification of glyphs, etc. This possibility is discussed in [link] and is far from having been adequately explored.

All the best, --stolfi

RE: How should we deal with LLMs on the forum? - nablator - 07-05-2026

(06-05-2026, 09:09 AM)JoJo_Jost Wrote: If it were a hoax, why would anyone go to the trouble of arranging it differently in every section? That is completely unnecessary for a hoax.

Getting a particular result does not require intentional, conscious choices rather than emergent properties from a generation process, with some degree of freedom in initial conditions (seeds).

(07-05-2026, 08:23 AM)Jorge_Stolfi Wrote: All the observed anomalies may be just side effects of the actual line breaking "algorithm" used by the Scribe, which includes things like abbreviations, stretching and shrinking of spaces, fancification of glyphs, etc.

Natural languages did not need any algorithm in manuscripts. Some did not even use hyphens; even in printed books (incunables), hyphens were put at the end of the line only ~50% of the time when a word was broken in two, if hyphens were used at all. The fact that the VMS is different argues against a natural language.

Example: [link]: no hyphens but syllables were not broken, which requires minimal adjustment if you want perfect alignment on the right; that is not a requirement in the VMS.

RE: How should we deal with LLMs on the forum? - JoJo_Jost - 07-05-2026

(07-05-2026, 09:09 AM)nablator Wrote: Getting a particular result does not require intentional, conscious choices rather than emergent properties from a generation process, with some degree of freedom in initial conditions (seeds).

Yeah, I get that—you could possibly explain Currier A and B as a drift, but not the significant differences between the sections. Or am I wrong about that?

RE: How should we deal with LLMs on the forum? - ReneZ - 07-05-2026

(07-05-2026, 09:22 AM)JoJo_Jost Wrote: but not the significant differences between the sections

They can be quantified to some extent, but not explained. At least not yet.

RE: How should we deal with LLMs on the forum? - nablator - 07-05-2026

(07-05-2026, 09:22 AM)JoJo_Jost Wrote:
(07-05-2026, 09:09 AM)nablator Wrote: Getting a particular result does not require intentional, conscious choices rather than emergent properties from a generation process, with some degree of freedom in initial conditions (seeds).

Yeah, I get that—you could possibly explain Currier A and B as a drift, but not the significant differences between the sections. Or am I wrong about that?

For example, Q13 has few ody, even fewer oda, and almost no eod (maybe just one, or none because there is a small space in oteeo dy [link]), unlike Q20. How odd is that (pun intended)? Large discrepancies between large sections are impossible to explain away as normal statistical fluctuations of the same Currier B language; there must be something causing them. It could be a consequence of variable constraints set by initial conditions and generation rules; I don't see why not.

RE: How should we deal with LLMs on the forum? - Jorge_Stolfi - 07-05-2026

(07-05-2026, 09:09 AM)nablator Wrote: Natural languages did not need any algorithm in manuscripts

Sorry, by "trivial algorithm" I meant "if the next word fits in the current line, write it there, otherwise break a new line and write it there." As discussed in that linked thread, this algorithm alone creates anomalous statistics (glyphs, digraphs, words, etc) over the first few words of each line.
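As a minimal sketch of the "trivial algorithm" just described (not a model of the Scribe's actual procedure), the Python below wraps a word stream greedily. The function name break_lines, the 40-column width, and the random filler words are assumptions made for illustration only; none of them comes from the thread or from any transcription. The last lines compare the mean length of line-initial words with the overall mean: longer words are more likely to overflow and get pushed to the next line, so even this rule alone skews statistics at line start.

```python
import random


def break_lines(words, width):
    """Greedy line breaking: write the next word on the current line if it
    fits, otherwise start a new line and write it there."""
    lines, current, used = [], [], 0
    for word in words:
        # A word costs one extra column for the preceding space,
        # unless it is the first word on the line.
        needed = len(word) if not current else used + 1 + len(word)
        if current and needed > width:
            lines.append(current)          # close the full line
            current, used = [word], len(word)
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(current)
    return lines


# Hypothetical input: random "words" of 1 to 10 identical characters.
rng = random.Random(0)
words = ["x" * rng.randint(1, 10) for _ in range(20000)]
lines = break_lines(words, width=40)

mean_all = sum(len(w) for w in words) / len(words)
mean_first = sum(len(line[0]) for line in lines) / len(lines)
print(f"mean word length, all words:    {mean_all:.2f}")
print(f"mean word length, line-initial: {mean_first:.2f}")
```

With this input the line-initial mean comes out larger than the overall mean, although the word stream itself has no positional structure at all. That only illustrates the kind of side effect in question; whether it accounts for the anomalies actually observed in the VMS is a separate matter.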

The Scribe's "algorithm" was more complicated than that, since it apparently included the options of abbreviating certain words (like iin to m), stretching and shrinking spaces, etc. And he may have done other things, like changing forms of glyphs at the start of the line. It has not been shown yet that the line break anomalies cannot be explained as side effects of a non-trivial but plausible line-breaking "algorithm".

Quote:[Many medieval manuscripts broke words across lines without hyphens.] The fact that the VMS is different argues against a natural language.

I don't follow the logic.

Quote:Example: [link]: no hyphens but syllables were not broken

Well, if the Scribe's previous experience with Latin had conditioned him to avoid splitting syllables, and he did not understand the Voynichese script, then he would have avoided splitting words on the VMS, since he could not tell where the syllable boundaries were.

Or if he had been told by the Author that each Voynichese word was in fact a single syllable...

All the best, --stolfi

RE: How should we deal with LLMs on the forum? - Jorge_Stolfi - 07-05-2026

(07-05-2026, 10:48 AM)nablator Wrote: Large discrepancies between large sections are impossible to explain away as normal statistical fluctuations of the same Currier B language

Please see a [link] on the A/B thread.

RE: How should we deal with LLMs on the forum? - nablator - 07-05-2026

(07-05-2026, 03:27 PM)Jorge_Stolfi Wrote: I don't follow the logic.

There was no need to modify the text in a "complicated" way so it would fit in the available space in any text of any manuscript or book, ciphered or not, especially when it didn't need to be perfectly justified (the VMS isn't). So why would there be a need for a "complicated" Scribe's "algorithm" for writing Voynichese if it represents normal text of an unknown language?

Quote:Well, if the Scribe's previous experience with Latin had conditioned him to avoid splitting syllables, and he did not understand the Voynichese script, then he would have avoided splitting words on the VMS, since he could not tell where the syllable boundaries were.

The "trivial algorithm" and a little flexibility in space size would be sufficient. No perfect justification is required, so no complicated word breaking algorithm is needed.

RE: How should we deal with LLMs on the forum? - Jorge_Stolfi - 07-05-2026

(07-05-2026, 05:35 PM)nablator Wrote: There was no need to modify the text in a "complicated" way so it would fit in the available space in any text of any manuscript or book, ciphered or not

Many "professional" scribes apparently strove to get even right margins, with varying amounts of effort and success. Even in [link] the Author/Scribe used hyphenation, filler dashes, and letter flourishes towards that goal. Scribes could also use abbreviations for that purpose, as in [link] (line 6 from bottom).

Quote:especially when it didn't need to be perfectly justified (the VMS isn't)

The Scribe obviously tried to produce an even right margin. The gaps and overflows are generally only a couple of characters wide.

Quote:so why would there be a need for a "complicated" Scribe's "algorithm" for writing Voynichese if it represents normal text of an unknown language? [...] No perfect justification needed

The Author naturally wanted the book to look as nice and "professional" as possible. (Needless to say, it ended up very far from that ideal.) That was the whole point of making the copy on vellum, and having all those decorative images. Writing the text with even right margins would have been part of that goal.

Quote: no complicated word breaking algorithm needed.

The Scribe's line breaking "algorithm" definitely was more complicated than the trivial one. The simplest explanation for the prevalence of m at line end is that it is an abbreviation for some longer ending, probably iin, that the Scribe could use anywhere, but used specifically when some word ending in iin would not fit in the current line but the same word with m would. There may be other abbreviations used for the same purpose, like ld.
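The abbreviation hypothesis can be sketched as a small change to greedy line breaking. This is only an illustration under stated assumptions: the function name break_lines_abbrev, the 12-column width, and the four sample tokens are made up for the example and do not reproduce any line of the manuscript. The sketch also models only the line-fit case, not the freedom to use m anywhere.

```python
def break_lines_abbrev(words, width, long_end="iin", short_end="m"):
    """Greedy line breaking with one hypothetical escape hatch: if a word
    ending in `long_end` does not fit on the current line but its
    abbreviated form (ending in `short_end`) does, write the abbreviation."""
    lines, current, used = [], [], 0
    for word in words:
        gap = 1 if current else 0          # space before a non-initial word
        overflows = current and used + gap + len(word) > width
        if overflows and word.endswith(long_end):
            short = word[: -len(long_end)] + short_end
            if used + gap + len(short) <= width:
                word = short               # abbreviated form fits: use it
        if current and used + gap + len(word) > width:
            lines.append(current)          # still too long: break the line
            current, used, gap = [], 0, 0
        current.append(word)
        used += gap + len(word)
    if current:
        lines.append(current)
    return lines


# Illustrative tokens only (assumed input, not a transcribed line).
sample = "qokeedy daiin chedy okaiin".split()
print(break_lines_abbrev(sample, width=12))
# -> [['qokeedy', 'dam'], ['chedy', 'okaiin']]
```

Under this rule the abbreviated ending can only ever appear as the last word of a line, while words that fit keep the full ending. Comparing that prediction with the actual positions of m in a transcription would be one way to test the idea.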

My experience transcribing the text left me with the impression that glyphs as well as spaces get compressed near the end of the line, which may cause words to be joined in the transcription file. Conversely, spaces seemed stretched out at line start, causing some words to be split in certain contexts. These accidents would create anomalies in some statistics. (I wonder if someone has tried to verify this hunch, e.g. by counting glyphs within two 20 mm wide bands along the right and left rails.)

The Scribe may have had some knowledge of how the script worked. For one thing, he apparently was allowed/instructed to replace some letters by p or f on paragraph head lines. This knowledge could have enabled and caused him to modify the spelling around line breaks. As a hypothetical example, he might have been told that he could omit a word-initial y after a word-final y, as redundant; except after a line break, when the word-initial y would have to be written regardless -- leading to an excess of word-initial y there.

All the best, --stolfi

RE: How should we deal with LLMs on the forum? - JoJo_Jost - 07-05-2026

That's a really interesting theory - but does it actually fit the distribution of m?