[split] What can the structural peculiarities of the VMS tell us about the nature of the underlying text




RE: [split] What can the structural peculiarities of the VMS tell us about the nature ... - Grove - 08-05-2026

Is there anything more obscure than the VMS?

RE: How should we deal with LLMs on the forum? - Jorge_Stolfi - 08-05-2026

(08-05-2026, 12:54 PM)pfeaster Wrote:
(06-05-2026, 06:48 AM)Jorge_Stolfi Wrote: That "code switching by line" is the LAAFU hypothesis.
No, that's not the LAAFU hypothesis. The LAAFU hypothesis is merely that lines are a "functional unit" -- which is to say that each line is composed in such a way that different positions within it display different characteristics, and that these differences can shed light on how the underlying system works and are worth documenting and exploring for that reason.

Sorry, I stand corrected.

However, another way of saying what you wrote above is that the encoding is sensitive to line breaks. Thus the LAAFU hypothesis would allow the encoding to be different for each line, even though it does not imply it.

Quote:What the existence of such patterns might reveal is another question beyond that -- and, indeed, the question of this newly broken-off thread. But the investigation itself should, I believe, be analogous to something like frequency analysis: we first find out what all the patterns are, and then we try to account for them.

That "agnostic" approach is good as a first step. But after observing the existence of anomalies, trying to precisely describe them is likely to be mostly a waste of time. A more effective way to proceed is to formulate hypotheses about their cause and design the best tests for refuting either each hypothesis or its competitors. That is how one would proceed if Fourier analysis showed an anomalous vibration in an engine.

Quote:1. There's evidence that the text was written to fill available space on the specific pages we have (I agree).

Yes. It seems very unlikely that the Author could have predicted the line breaks (and figure intrusions -- which, IIUC, trigger anomalies similar to those of line breaks). See page [link] and f112v, for example. Clearly the Author -- not just the Scribe -- intended the line breaks to be selected based on the available space.

Quote: 2. Someone would have been crazy to write directly on parchment -- there'd be too much risk of mistakes that would need to be corrected -- so there must have been an earlier draft.

I agree too. But, more than simple "quillos" ("typos, but with a quill"), composing a text usually involves lots of deleting, rewriting, crossing out, changing and transposing words and sentences, etc. So there must have been at least one "draft" version of the VMS prior to the final copy.

And another reason for creating a draft, and recruiting a scribe to put it to vellum, is that not everybody would be able to write small letters in a nice handwriting.

And yet another reason is that writing nice letters by hand is slow and tedious work.

Quote:3. The text we have is so riddled with mistakes that whoever wrote it must not have understood what they were writing -- so this must have been a copyist Scribe separate from the Author.

The presumed spelling errors (like words with anomalous structure) are just one item of evidence for this "Ignorant Scribe" hypothesis. There are also layout mistakes, like that two-column text on f34r.

Also, if one assumes that the Scribe was distinct from the Author, then it is more likely that the Scribe was taught only the graphic alphabet -- not the encoding or language. Not certain, sure; just more likely. Why would the Author bother to teach the encryption method or foreign language to that hypothetical assistant?

Quote:4. That last scenario is incompatible with meaningful line patterning because line breaks originate in the copy and wouldn't have been present in the earlier draft; therefore there must not be any meaningful line patterning, and any line patterning must be superficial and ultimately insignificant.

Yes.

Said another way, the LAAFU hypothesis, together with the assumption that line breaks are determined by line length, implies that (1) the "functional" aspect of line breaks is a matter of encoding, not a matter of content (like verses of poetry, items of a catalog, etc); and (2) whoever put pen to vellum knew the encoding, and applied it on the fly, after choosing the line breaks.

I find that scenario very unlikely, given the other considerations above.

Quote:5. If we can show through simulations that some rule-based protocol for introducing line breaks into any text could ever produce statistical anomalies of any kind at the starts and ends of lines, then we can conclude that this is the correct explanation for any and all such anomalies in the VMS, without needing to account for specific positional differences any more concretely.

That is much stronger than what I meant.

Rather: since even the trivial line-breaking algorithm has been shown to create statistical anomalies around the line breaks, the mere existence of such anomalies does not prove the LAAFU hypothesis. There is still the alternative hypothesis that all the anomalies are consequences of the Scribe's line-breaking "algorithm", which is almost certainly more sophisticated than the trivial one.

Therefore, evidence for LAAFU would have to be anomalies that cannot be explained as side effects of any plausible line-breaking algorithm.

Conversely, if a simulation of some plausible line-breaking algorithm were to generate anomalies like those seen in the VMS, the LAAFU hypothesis could be ignored as unnecessary.
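A minimal sketch of the simplest such simulation, assuming a toy "flat" text whose token lengths are independent random draws, and a greedy wrap standing in for the trivial line-breaking algorithm. The function names, the width of 40 characters and the length range 1..8 are illustrative assumptions, not anything measured on the VMS.

```python
import random

def trivial_line_break(tokens, width):
    """Greedy wrap: a token that does not fit on the current line starts a new one."""
    lines, current, used = [], [], 0
    for tok in tokens:
        need = len(tok) if not current else used + 1 + len(tok)  # +1 for the separator
        if current and need > width:
            lines.append(current)
            current, used = [tok], len(tok)
        else:
            current.append(tok)
            used = need
    if current:
        lines.append(current)
    return lines

def mean_len(tokens):
    return sum(len(t) for t in tokens) / len(tokens)

rng = random.Random(0)
# Toy "flat" text: token lengths drawn independently and uniformly from 1..8
tokens = ["x" * rng.randint(1, 8) for _ in range(20000)]
lines = trivial_line_break(tokens, width=40)

first = [ln[0] for ln in lines[1:]]          # skip line 1: its first token was not pushed down
rest = [t for ln in lines for t in ln[1:]]   # every token that is not line-initial
print("mean length at line-start:", mean_len(first))
print("mean length elsewhere:    ", mean_len(rest))
```

Even though the input has no positional structure at all, the line-initial tokens come out longer on average, because a long token is more likely to be the one that fails to fit. A more plausible Scribe's algorithm would be tested the same way, by swapping out `trivial_line_break`.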

Quote:A more rigorous exercise in support of the line-break hypothesis might involve taking some actual section of the VMS and presenting for it (1) a hypothetical, statistically flat "author's draft" version of the text without line breaks, or with different line breaks; and (2) a simple set of rules for converting that text into a "scribe's copy" that displays line-based patterns identical to (or very close to) the ones we actually see.

(08-05-2026, 01:11 PM)pfeaster Wrote: And then a follow-up experiment: take the inferred "author's draft" and run it through the same set of copying / line-break rules, but now with the length available for each line increased to something like 1.3 times its current capacity. The line breaks will now mostly fall in different places. Does the "copied" result still display the same line-positional patterns as before?

Yes. The flat version (1) could be obtained by taking each parag, discarding the head line, undoing the abbreviation of iin to m (and/or any other guessed abbreviation), and joining the lines into a single token stream. That token stream would then be fed to the assumed line-breaking algorithm with various page widths.
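The procedure just described can be sketched roughly as follows. This is only an illustration under stated assumptions: tokens are separated by "." or spaces, the head line is simply the first line of the parag, every word-final m is treated as an abbreviated iin, and the line breaker is the trivial one. The names `flatten_parag` and `rebreak` are hypothetical, and the sample lines are made up, not actual VMS lines.

```python
def flatten_parag(lines, expand_m=True):
    """Turn one parag (a list of transcribed lines) into a flat token stream."""
    stream = []
    for line in lines[1:]:                       # discard the head line
        for tok in line.replace(".", " ").split():
            if expand_m and tok.endswith("m"):
                tok = tok[:-1] + "iin"           # undo the guessed iin -> m abbreviation
            stream.append(tok)
    return stream

def rebreak(tokens, width):
    """Trivial line breaking: a token that does not fit starts a new line."""
    lines, current = [], ""
    for tok in tokens:
        candidate = tok if not current else current + "." + tok
        if current and len(candidate) > width:
            lines.append(current)
            current = tok
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

# Made-up sample lines in an EVA-like notation, for illustration only
parag = ["pchedy.qokeedy.shedy.dam", "qokaiin.chedy.okam", "daiin.shey.qokal.dam"]
flat = flatten_parag(parag)
for width in (20, 26):   # 26 is 1.3 times 20, as in the follow-up experiment
    print(width, rebreak(flat, width))
```

With the wider setting the breaks fall in different places, which is the condition the follow-up experiment needs. Restricting the m expansion to line-final tokens only would be an equally defensible variant.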

@quimqu and I have been doing such experiments in another thread. We observed that the line-breaking algorithm does generate anomalies at the new breaks. But, as of the last posts, we still had not managed to erase the anomalies of the old breaks. Clearly there is more going on than just iin -> m.

All the best, --stolfi

RE: [split] What can the structural peculiarities of the VMS tell us about the nature ... - tavie - 09-05-2026

Agree with Patrick. When people refer to 'LAAFU', they are not necessarily claiming the code switches from line to line. I don't know anyone who believes that, and I don't recognize the types of LAAFU set out [link]. LAAFU was a phrase used by Currier, and we can't be sure exactly what he meant by it. Certainly, a possible inference is that the system varies line by line. But that jars with the consistency of word types elsewhere in the line.

Several people on here simply use the phrase to mean that Voynichese is not flat, and that something unusual is going on at the line level that needs to be investigated. Emma May Smith [link]

On the "line break algorithm": if the scribe did regularly break the line rather than find a way to squeeze in a longer word (which has not been proved, and whose interaction with claims of abbreviation at line end also needs careful consideration), we still need to explain

1. why line start word types often do not resemble the word types that the scribe failed to squeeze in (i.e. ones underperforming at Line End)
2. why several line start word types are rarely seen in the middle of the row (we can see why they won't be at Line End but why not elsewhere?)
3. why daiin is more common at Line End and chol is not in Herbal A despite both being similar size (and indeed why initial d word types are more common the closer you get to Line End)
4. why the line start word appears to be impacted in many cases by the word above it

This points to multiple "mechanisms" at play, not merely a line break algorithm. Altering the composition of the line start word that has been pushed from the end of the line above is not a side effect of it being shunted down: there's a logic to longer words being demoted to the start of the next line but not for this to automatically cause their composition to be altered. Something - or somethings - else must be behind this.

There's still likely some medium and high hanging fruit up in the tree for us to pick that might shed further light on these "somethings", and that's not a waste of time.

As a final note for any newcomers, we have accumulated a few LAAFU threads now which are worth reading:

[link]
[link]
[link]

RE: [split] What can the structural peculiarities of the VMS tell us about the nature ... - MarcoP - 09-05-2026

(09-05-2026, 12:44 AM)tavie Wrote: 2. why several line start word types are rarely seen in the middle of the row (we can see why they won't be at Line End but why not elsewhere?)

In particular, it's interesting that line-start words contain bigrams (like [link]) that are quite rare elsewhere.

[link] was that "there could be a process which adds [y, d] to words beginning [ch, sh] when they are in a linestart position".

It's unclear that Voynichese is a cipher, so we don't know if there is an underlying text. Personally, I doubt that properties like this could derive from the hypothetical underlying text.

[Attachment: yd-bench.jpg]

RE: [split] What can the structural peculiarities of the VMS tell us about the nature ... - Jorge_Stolfi - 09-05-2026

(09-05-2026, 12:44 AM)tavie Wrote: On the "line break algorithm", if the scribe did regularly break the line rather than find a way to squeeze in a longer word

That would be the "trivial" line-breaking algorithm (TLA).

Quote:... multiple "mechanisms" at play, not merely a line break algorithm

You mean, not just the plain TLA. But it is pretty clear that the Scribe's line-breaking "algorithm" was more complicated than that.

Quote:why line start word types often do not resemble the word types that the scribe failed to squeeze in (i.e. ones underperforming at Line End) [...] Altering the composition of the line start word that has been pushed from the end of the line above is not a side effect of it being shunted down: there's a logic to longer words being demoted to the start of the next line but not for this to automatically cause their composition to be altered.

Not so. A confirmed effect of the TLA, in any text and any language, is that the first token of each line will tend to be longer than average. It means that the frequency of any word type at line-start will probably be different from its frequency at other places along the line. That in turn implies that most character and n-gram statistics will be different there too, because the frequency of a character or digraph in a text is largely determined by its presence or absence in the most common words of that text.

For instance, words that start with qo tend to be longer than words that don't start with qo. That alone would cause the frequency of qo at line-start to be higher than elsewhere.
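A toy illustration of that last point, assuming a hypothetical ten-word vocabulary in which the three qo-words happen to be the longest, a "flat" text made of independent draws from it, and the plain TLA at an arbitrary width of 40 characters. None of the numbers are VMS statistics.

```python
import random

# Hypothetical toy vocabulary: the qo-words are the longest entries
VOCAB = ["qokeedy", "qokaiin", "qotedy", "chey", "shedy", "daiin", "ol", "ar", "dy", "or"]

def wrap(tokens, width):
    """Trivial line breaking: a token that does not fit starts a new line."""
    lines, cur, used = [], [], 0
    for t in tokens:
        if cur and used + 1 + len(t) > width:
            lines.append(cur)
            cur, used = [], 0
        used += len(t) + (1 if cur else 0)   # separator only between tokens
        cur.append(t)
    if cur:
        lines.append(cur)
    return lines

def qo_rate(tokens):
    """Fraction of tokens that begin with 'qo'."""
    return sum(t.startswith("qo") for t in tokens) / len(tokens)

rng = random.Random(0)
text = [rng.choice(VOCAB) for _ in range(20000)]   # independent draws: a "flat" text
lines = wrap(text, width=40)

starts = [ln[0] for ln in lines[1:]]               # tokens pushed down by a line break
others = [t for ln in lines for t in ln[1:]]       # all non-initial tokens
print("qo rate at line-start:", qo_rate(starts))
print("qo rate elsewhere:    ", qo_rate(others))
```

The qo rate at line-start comes out clearly above the rate elsewhere, purely as a side effect of the length bias, with no rule that mentions qo or line position.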

Quote:why daiin is more common at Line End

Earlier I claimed that TLA would also cause the last few words of each line to be shorter than average. But I was wrong; TLA by itself has no effect on word type probabilities at line-end, only at line-start. And it also does not, by itself, affect the word distribution of the second token of each line.

However, even the plain TLA can have such side effects if there is any correlation between the lengths of consecutive tokens. For instance, if a certain "short" word W (like "the" in English) tends to be followed by "long" words, TLA will cause the frequency of W to be enhanced at line-end.

By a similar process, if there is a positive correlation between the lengths of consecutive words, the second token of a line will be longer than average, too; although not so much as the first token. And that effect will extend to the next few tokens, but with decreasing magnitude.
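The line-end effect can be checked on a toy text as well. The sketch below assumes an artificial language in which a short word W is always followed by one particular long word, and compares it with a shuffled copy of the same tokens (same word frequencies, no correlation). The vocabulary, the width of 40 and the function names are all illustrative assumptions.

```python
import random

W, LONG, FILLERS = "ww", "l" * 9, ["aaa", "bbbb", "ccccc"]

def make_text(n_draws, rng):
    """Toy text in which the short word W is always followed by the long word."""
    out = []
    for _ in range(n_draws):
        tok = rng.choice([W] + FILLERS)
        out.append(tok)
        if tok == W:
            out.append(LONG)
    return out

def wrap(tokens, width):
    """Trivial line breaking: a token that does not fit starts a new line."""
    lines, cur, used = [], [], 0
    for t in tokens:
        if cur and used + 1 + len(t) > width:
            lines.append(cur)
            cur, used = [], 0
        used += len(t) + (1 if cur else 0)
        cur.append(t)
    if cur:
        lines.append(cur)
    return lines

def line_end_enrichment(tokens, width=40):
    """Share of W among line-final tokens, divided by its overall share."""
    lines = wrap(tokens, width)
    ends = [ln[-1] for ln in lines[:-1]]   # the last line ends with the text, not a break
    return (ends.count(W) / len(ends)) / (tokens.count(W) / len(tokens))

rng = random.Random(1)
text = make_text(24000, rng)
shuffled = text[:]
rng.shuffle(shuffled)                      # same tokens, correlation destroyed
print("correlated text:", line_end_enrichment(text))
print("shuffled text:  ", line_end_enrichment(shuffled))
```

In the correlated text W is over-represented at line-end, while in the shuffled copy the ratio stays close to 1. The shuffle is the useful control here: it separates what the TLA does to the sequence from what it does to the bare word frequencies.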

Quote:why several line start word types are rarely seen in the middle of the row (we can see why they won't be at Line End but why not elsewhere?) [...] why initial d word types are more common the closer you get to Line End)

Again, the Scribe's line breaking algorithm (SLA) is definitely more complicated than the plain TLA. There are good reasons to suspect that one of its extra features is the optional abbreviation iin -> m if it would let the Scribe squeeze one more word in. But we don't know what other tricks it has.
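A minimal sketch of a TLA extended with that one suspected trick. The exact rule is an assumption: a word ending in iin that does not fit is retried with iin written as m, and nothing else about the real SLA is modeled. The function name and the sample tokens are made up for illustration.

```python
def scribe_line_break(tokens, width):
    """Trivial line breaking plus one hypothetical trick: a word ending in
    'iin' that does not fit is retried with 'iin' abbreviated to 'm'."""
    lines, current = [], ""
    for tok in tokens:
        candidate = tok if not current else current + "." + tok
        if current and len(candidate) > width and tok.endswith("iin"):
            short = current + "." + tok[:-3] + "m"
            if len(short) <= width:
                # Squeezed in. At most one character of space is left,
                # so the abbreviated word necessarily ends the line.
                candidate = short
        if current and len(candidate) > width:
            lines.append(current)
            current = tok
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

# Made-up token stream, for illustration only
tokens = ["qokeedy", "chedy", "daiin", "shedy", "okaiin", "dar"]
print(scribe_line_break(tokens, width=18))
# -> ['qokeedy.chedy.dam', 'shedy.okaiin.dar']
```

Because the abbreviation saves only two characters, this rule by itself confines m to line-end, and it leaves iin words that already fit (like okaiin above) untouched.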

Quote:and chol is not in Herbal A despite both being similar size

Different sections are about vastly different topics. That alone is expected to cause large differences in the frequencies of all word types -- not just "content" words but also "function" words. And that in turn is expected to cause large differences in character and digraph statistics. As for Herbal-A and Herbal-B, the simplest explanation is that they were sourced from two separate herbals, each with its peculiar style, vocabulary, etc. Possibly, but not necessarily, in two different dialects.

For that reason, it is wiser to keep LAAFU investigations separate from language A/B investigations. And to limit all statistics to one type of text (parags) from just one section, treating Herbal-A and Herbal-B as separate sections.

Quote:why the line start word appears to be impacted in many cases by the word above it

Indeed, this is not an expected consequence of TLA by itself. But there may be other explanations that do not imply LAAFU.

As noted above, even plain TLA should cause qo-words to be more common at line-start than elsewhere. But now suppose that the grammar and/or the nature of the paragraphs is such that there cannot be more than one qo-word per sentence (or that the qo-words will tend to cluster together), and the average sentence length is 20 words. Since a full-width line has about 10 words, these hypothetical facts together would largely prevent two consecutive lines from starting with qo.

And note also that Herbal has many parags with narrower rails, where this possible explanation would work even better.

(09-05-2026, 08:43 AM)MarcoP Wrote: In particular, it's interesting that line-start words contain bigrams (like [link]) that are quite rare elsewhere. [link] was that "there could be a process which adds [y, d] to words beginning [ch, sh] when they are in a line-start position".

I recently proposed such a hypothetical mechanism somewhere. Suppose that the language has semantic tones (like Mandarin) or semantic pitch (like Swedish), and [aoy] denote pitch levels low-mid-high. If a word ends with y pitch and the next one starts with y pitch, there would be no need to write the y twice. Thus the Scribe may have been told that, in such cases, he can omit the second y. Unless there is a line break, in which case the second y should be written to help the reader.
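As a sketch only, that hypothetical rule could be written as below. It assumes the Scribe works from "full" forms already divided into lines, and that the only change is dropping a word-initial y after a word ending in y on the same line. The function name and the sample words are made up.

```python
def apply_y_elision(lines):
    """lines: list of lines, each a list of 'full' words.

    Drops a word-initial 'y' when the previous word on the same line ends
    in 'y'. Line-initial words are never changed, so the 'y' survives
    right after a line break."""
    out = []
    for line in lines:
        written = []
        for i, word in enumerate(line):
            follows_y = i > 0 and line[i - 1].endswith("y")
            if follows_y and word.startswith("y") and len(word) > 1:
                word = word[1:]           # omit the redundant second y
            written.append(word)
        out.append(written)
    return out

# Made-up "full" forms, for illustration only
full = [["qokeedy", "ychedy", "daiin"], ["ychedy", "ychol"]]
print(apply_y_elision(full))
# -> [['qokeedy', 'chedy', 'daiin'], ['ychedy', 'chol']]
```

Under this rule a form like ychedy would show up in the written text only at line-start, even though the underlying word is the same in both positions.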

All the best, --stolfi

RE: [split] What can the structural peculiarities of the VMS tell us about the nature ... - Stefan Wirtz_2 - 09-05-2026

I would see m as the „ending variant“ of z.
Most, but not all, vord-ending letters show a final slash or curl at their end.
Likewise, I understand y to be the end variant of a, maybe even of o.
g could be an end variant of d;
but I am not completely sure whether g and m are really two different characters, or just m written in different styles by the scribe(s).
All in all, these are candidates for positional variation.
I am well aware that d and o also do „ending jobs“, and that y, m and g also appear at non-final positions.
So these characters may fulfill some extra task in addition, without losing their main meaning.

RE: [split] What can the structural peculiarities of the VMS tell us about the nature ... - DG97EEB - 09-05-2026

(09-05-2026, 10:06 PM)Stefan Wirtz_2 Wrote: I would see m as the „ending variant“ of z.
Most, but not all, vord-ending letters show a final slash or curl at their end.
Likewise, I understand y to be the end variant of a, maybe even of o.
g could be an end variant of d;
but I am not completely sure whether g and m are really two different characters, or just m written in different styles by the scribe(s).
All in all, these are candidates for positional variation.
I am well aware that d and o also do „ending jobs“, and that y, m and g also appear at non-final positions.
So these characters may fulfill some extra task in addition, without losing their main meaning.

Oh dear Stefan, you seem to have upset Diane somehow....

RE: [split] What can the structural peculiarities of the VMS tell us about the nature ... - Stefan Wirtz_2 - 09-05-2026

(09-05-2026, 10:34 PM)DG97EEB Wrote: Oh dear Stefan, you seem to have upset Diane somehow....

What? Which Diane?
With those few letters here? How?

RE: [split] What can the structural peculiarities of the VMS tell us about the nature ... - tavie - 10-05-2026

(I don't know either but let's try to stay on topic and away from any off-forum drama, please!)

RE: [split] What can the structural peculiarities of the VMS tell us about the nature ... - DG97EEB - 10-05-2026

(09-05-2026, 11:03 PM)Stefan Wirtz_2 Wrote:
(09-05-2026, 10:34 PM)DG97EEB Wrote: Oh dear Stefan, you seem to have upset Diane somehow....
What? Which Diane?
With those few letters here? How?

Apologies, I naively assumed that you'd know what I meant. I was having a bit of fun. I meant Diane O'Donovan, of course, who has a habit of naming individuals on her blog, which I personally find very rude. She posted something last night and referenced you directly. My comment was intended to be amusing. Sometimes I forget that British humour doesn't always translate, especially out of context. [link]