The Voynich Ninja

Full Version: A family of grammars for Voynichese
(21-08-2025, 05:05 PM)Mauro Wrote: SLOT 1: ch sh y
SLOT 2: eee ee e q a
SLOT 3: o
SLOT 4: iii ii i d 
SLOT 5: y p f k l r s t cth ckh cph cfh n m

Slot 4 looks interesting to me. I cannot think of a context in the manuscript where both i+ and d would appear valid; I wonder what led to them being grouped together. Maybe this will be clarified in the per-section grammars. Since these grammars are mutually incompatible (they perform poorly on other sections, as you said), the generic grammar is probably a mix of the individual grammars.
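For concreteness, a looped slot grammar like the one quoted above can be checked mechanically: each slot becomes an optional alternation in a regular expression, and the whole slot sequence may repeat (the "looping"). A minimal sketch in Python; the slot contents are taken from the quote, while the looping behaviour is an assumption about how the grammar is meant to be applied:

```python
import re

# Slots from the quoted grammar; multi-character options are sorted
# longest first so e.g. 'eee' is tried before 'ee' and 'e'.
SLOTS = [
    ["ch", "sh", "y"],
    ["eee", "ee", "e", "q", "a"],
    ["o"],
    ["iii", "ii", "i", "d"],
    ["cth", "ckh", "cph", "cfh", "p", "f", "k", "l", "r", "s", "t", "n", "m", "y"],
]

# One pass through the slots, every slot optional; '+' adds the looping.
ONE_PASS = "".join(
    "(?:%s)?" % "|".join(sorted(slot, key=len, reverse=True)) for slot in SLOTS
)
GRAMMAR = re.compile("(?:%s)+" % ONE_PASS)

def accepts(word: str) -> bool:
    """True if the looped slot grammar can generate `word`."""
    return bool(word) and GRAMMAR.fullmatch(word) is not None
```

For example, `accepts("qokedy")` holds because the word parses as one pass producing qok and a second producing edy.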

Also interesting that o has its own slot. And y appears in two slots; it looks like it's the only character that does this. As far as I remember, there used to be some ideas about there being two distinct y glyphs, word-initial y and word-final y, that look almost the same but function differently.

I think this work is very important for possible deciphering attempts.
(21-08-2025, 08:50 PM)oshfdk Wrote:
(21-08-2025, 05:05 PM)Mauro Wrote: SLOT 1: ch sh y
SLOT 2: eee ee e q a
SLOT 3: o
SLOT 4: iii ii i d 
SLOT 5: y p f k l r s t cth ckh cph cfh n m

Slot 4 looks interesting to me. I cannot think of a context in the manuscript where both i+ and d would appear valid; I wonder what led to them being grouped together.

Remember that being grouped together means they are alternatives to each other in that position. But d is interesting: I tried with many different characters allowed to move freely (r s t k l m n cth ckh...) and only 'd' did not cluster with all the other 'consonants'.


(21-08-2025, 08:50 PM)oshfdk Wrote: Maybe this will be clarified in the per-section grammars.

I hoped that too, but I could come to nothing which was satisfying (d always seems to cluster by itself, though, in both Currier languages, but in different positions). I cannot really say much at the moment; it's been a couple of months since I last reasoned on the grammars (in the previous post I just wanted to report a negative result), but as promised I'll rummage through the data again and give some examples and thoughts.

(21-08-2025, 08:50 PM)oshfdk Wrote: Since these grammars are mutually incompatible (perform poorly on other sections, as you said), probably the generic grammar is a mix of individual grammars.

Yes, surely it is. But I hoped that the grammars of the Currier B sections (which comprise the biggest part of the text) were more similar among themselves and to the 'whole VMS' grammar than they actually turned out to be. It's also true, though, that the looping mechanism does not help in understanding how different two grammars are; e.g. [a b c d] and [d a b c], when looped, are in practice the same grammar.
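The point about rotated slot orders can be checked directly: encode each grammar as a looped regex (a sketch, assuming single-character optional slots) and compare which strings the two accept.

```python
import re
from itertools import product

def looped(slots):
    # Each slot is optional, and the slot sequence may repeat any
    # number of times (the 'looping').
    one_pass = "".join(f"(?:{s})?" for s in slots)
    return re.compile(f"(?:{one_pass})+")

g1 = looped(["a", "b", "c", "d"])
g2 = looped(["d", "a", "b", "c"])  # the same slots, rotated

# Both accept exactly the same strings (here: every string over
# {a,b,c,d}, since looping lets each pass contribute as little as
# one character).
words = ["".join(p) for n in range(1, 5) for p in product("abcd", repeat=n)]
same = all(
    (g1.fullmatch(w) is None) == (g2.fullmatch(w) is None) for w in words
)
```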


(21-08-2025, 08:50 PM)oshfdk Wrote: Also interesting that o has its own slot. And y appears in two slots, looks like it's the only character that does this. As far as I remember, there used to be some ideas about there being two distinct y glyphs - word initial y and word ending y, that look almost the same, but function differently.

I did not check the behaviour of o much; I'll give it a look tomorrow. y is interesting: all the 'best' grammars calculated on the whole VMS (hundreds of them) have a y in both the first and the last slot, and I thought this gave good support to the initial-y final-y theory. But incredibly, the grammars calculated on each section almost always have just one y. I don't know why: it may be just an artefact of the 'looping', or it may depend on the basic grammar structures I fed to the Monte Carlo engine as seeds, or on the VMS not following the initial-y final-y theory after all.


(21-08-2025, 08:50 PM)oshfdk Wrote: I think this work is very important for possible deciphering attempts.

I thank you a lot, but I'm not that optimistic at this point. Everything progressed well until I hit the usual wall of haziness and endless intractable complications, without any clear foothold in sight. But at least I think I systematized the field of slot grammars a bit with the critique of the efficiency metric, and I explored a pretty interesting and fascinating grammar landscape. If any of this can be of help to others, I'll be very glad.

For my part, I'll write down the results I got from the study of the sections in a future post (feel free to ask if you want more information, raw data, or whatever), then I'll continue my current pause from slot grammar and word-chunkification studies, hoping to stumble on some new idea I am actually able to test, someday.
Just very briefly, I would see one of the main purposes of 'looping' to be to avoid having the same character in several slots. This does not seem to be achieved, and I suspect (just guessing) that this is the result of trying to achieve too high a coverage.

I think that one of the main problems with the Naibbe approach (which the slot approach does not suffer from) is the need to be able to split words into parts, where the different possibilities for each part are somewhat limited in number. I don't think it can be done. Even a short slot system will easily generate large numbers of such apparent parts. No need to list them...
(22-08-2025, 12:39 AM)ReneZ Wrote: Just very briefly, I would see one of the main purposes of 'looping' to be to avoid having the same character in several slots. This does not seem to be achieved, and I suspect (just guessing) that this is the result of trying to achieve too high a coverage.

Trying to achieve too high a coverage could indeed be the problem. I remember you already remarked on this here some time ago, and I actually made some tests considering only the word types which appear a minimum number of times in the text (hapax legomena alone make up ~69% of the word types), which automatically decreases coverage (and may help with scribal/transcription errors). But I did not find anything which stood out as possibly significant.
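The effect of a minimum-count cutoff on coverage is easy to compute from a token list; a minimal sketch (the tiny token list in the test is a stand-in, not VMS data):

```python
from collections import Counter

def type_stats(tokens, min_count=2):
    """Summarize how a frequency cutoff on word types affects coverage."""
    counts = Counter(tokens)
    types = len(counts)
    hapax = sum(1 for c in counts.values() if c == 1)
    kept_tokens = sum(c for c in counts.values() if c >= min_count)
    return {
        "types": types,
        # Share of word *types* occurring exactly once.
        "hapax_share": hapax / types,
        # Share of the running text still explained after dropping
        # the rare types.
        "token_coverage": kept_tokens / len(tokens),
    }
```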

(22-08-2025, 12:39 AM)ReneZ Wrote: I think that one of the main problems with the Naibbe approach (which the slot approach does not suffer from) is the need to be able to split words into parts, where the different possibilities for each part are somewhat limited in number. I don't think it can be done. Even a short slot system will easily generate large numbers of such apparent parts. No need to list them...

My approach too, ultimately, aimed to split words into chunks, in the hope that the chunks could be the basic units of the encoding method (or of the meaningless-text generator, or whatever). I tried to do this with the Nbits metric, while the Naibbe approach used (if I'm not mistaken) a trial-and-error strategy. I would frame this generic approach like this: under the hypothesis that VMS words are composed of chunks (which may, or may not, be true), is it possible to reconstruct objectively what those chunks are? It's not at all clear to me whether this is mathematically possible, and under which conditions; i.e. a text like [qok qokedy edy qok edyqok edyedy qokqok] looks readily tokenizable as [qok, edy], while a jumble of random characters is obviously hopeless. The VMS lies somewhere in between. Natural languages lie somewhere in between too, and in their case they lend themselves to being tokenized into syllables (*), but I can't really say if this is also the case for the VMS, or what the objective function could be, nor if it's possible to find a tokenization scheme which stands out above all others (which I failed to find with my looping grammars and Nbits). As you say, it might well be impossible.

(*) I actually made some tests of the 'looping slot grammar' approach on natural languages, using grammars inspired by the onset-nucleus-coda structure of phonetic syllables. IIRC it worked reasonably well for Latin and Italian, not as well for English.
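For a toy text like the [qok qokedy ...] example above, the "reconstruct the chunks objectively" question can even be brute-forced: search for the smallest inventory of substrings that segments every word. A purely illustrative sketch (this exhaustive search obviously does not scale to the real VMS vocabulary):

```python
from itertools import combinations

TEXT = "qok qokedy edy qok edyqok edyedy qokqok".split()

def segmentable(word, chunks):
    # Can `word` be written as a concatenation of `chunks`?
    if word == "":
        return True
    return any(word.startswith(c) and segmentable(word[len(c):], chunks)
               for c in chunks)

# All substrings of all words are candidate chunks.
substrings = {w[i:j] for w in TEXT
              for i in range(len(w)) for j in range(i + 1, len(w) + 1)}

# Brute force: find the smallest inventory covering every word.
best = None
for size in range(1, 4):
    for inv in combinations(sorted(substrings), size):
        if all(segmentable(w, inv) for w in TEXT):
            best = set(inv)
            break
    if best:
        break
```

On this toy text the search recovers {qok, edy}, matching the intuition; whether any comparable objective function singles out a chunk inventory for the real text is exactly the open question.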
Sorry for the delay in posting the grammars, but I've caught a fever and just can't do it now. Be patient.
(23-08-2025, 07:28 PM)Mauro Wrote: Sorry for the delay in posting the grammars, but I've caught a fever and just can't do it now. Be patient.

Get well soon!

I don't think there is any hurry. Certainly not as far as I'm concerned.
These are the best grammars I found (for the whole VMS and for the different sections):

[attachment=11329]

The most notable thing is that [l r s k t p f ckh cth cph cfh] always seem to cluster together (only [r] does not cluster, in the Balneological section). [n m] often go together with the other 'consonants', while [d] is always separated.

This is how the 'best' grammars parse some words into chunks (I had hoped the chunks would always stay the same, which would have been promising, but it did not happen):
[attachment=11330]
(22-08-2025, 12:39 AM)ReneZ Wrote: Just very briefly, I would see one of the main purposes of 'looping' to be to avoid having the same character in several slots. This does not seem to be achieved, and I suspect (just guessing) that this is the result of trying to achieve too high a coverage.

I think that one of the main problems with the Naibbe approach (which the slot approach does not suffer from) is the need to be able to split words into parts, where the different possibilities for each part are somewhat limited in number. I don't think it can be done. Even a short slot system will easily generate large numbers of such apparent parts. No need to list them...

To chime in here, there is enormous benefit to the slot approach, so it's no wonder that so many people over the years have converged on slot-like explanations of how VMS word types are constructed. It's clearly getting at something important. But as you know, whether slot grammars were used to construct VMS word types is distinct from whether individual slots themselves represent units of meaning.

To reconcile the VMS's observed entropy, token length distribution, and type length distribution with the statistics of pretty much any known European language, a VMS-mimic cipher often has to be verbose in its mappings between plaintext alphabet letters and Voynichese glyphs. In the context of many proposed slot grammars, this immediately implies that individual plaintext letters are mapped to glyph strings formed by groups of adjacent or near-adjacent slots. Taking Zattera (2022), for example, the "prefix" qok is formed from slots 0, 1, and 3. If one VMS slot mapped approximately to one plaintext letter—which would undoubtedly make the whole thing easier to keep in one's head and readily use—then it'd be challenging, and therefore improbable, to achieve the VMS's anomalously low entropy.
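For reference, the character-level statistics in question are the first-order entropy and the conditional (second-order) entropy, the quantity usually cited as anomalously low for the VMS. A minimal sketch of how they are computed (illustrative; not tuned to any particular transcription):

```python
import math
from collections import Counter

def char_entropies(text):
    """Return (h1, h2): unigram entropy and conditional bigram
    entropy, in bits per character."""
    uni = Counter(text)
    n = len(text)
    h1 = -sum(c / n * math.log2(c / n) for c in uni.values())

    bi = Counter(zip(text, text[1:]))          # adjacent pairs
    first = Counter(a for a, _ in zip(text, text[1:]))
    n2 = n - 1
    # H(X2 | X1) = -sum p(a,b) * log2 p(b|a)
    h2 = -sum(c / n2 * math.log2(c / first[a]) for (a, _), c in bi.items())
    return h1, h2
```

A perfectly alternating text like "ababab" has h1 = 1 bit but h2 = 0 bits: each character fully determines the next, which is the direction a verbose cipher pushes the statistics.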

After having gone through the process of designing the Naibbe cipher, my personal suspicion is that the VMS slot grammars provide compressed, high-level summaries of how lists of various affixes were iteratively constructed. In other words, the VMS's creator(s) could have easily and iteratively created the affixes okolk, otolt, qok, qolk, qot, and qolt, which can then be collectively described using a four-slot structure: q|o|l|k/t. This doesn't mean that the slot grammar was formalized prior to the affixes' creation, but rather that the slot grammar summarizes the soft "rules" that were intuitively used during the affixes' creation. Along these same lines, I suspect that one could make pretty serious headway toward a slot-grammar formalism for Polygraphia III.
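As a quick check on the example, all six affixes listed above are generated by the looped q|o|l|k/t structure (each slot optional, sequence repeatable, so okolk parses as ok + olk); a sketch:

```python
import re

# The four-slot structure q|o|l|k/t from the post above, looped:
# each slot is optional and the sequence may repeat.
grammar = re.compile("(?:q?o?l?[kt]?)+")

affixes = ["okolk", "otolt", "qok", "qolk", "qot", "qolt"]
all_fit = all(grammar.fullmatch(a) is not None for a in affixes)
```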

With regard to breaking up words, I admit it's tricky, but there could theoretically be rules in place to aid the reader, such as initially reading words left-to-right as concatenations of the smallest possible number of the longest possible valid affixes.
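That reading rule is essentially greedy longest-match; a sketch of what it could look like (the rule and the affix inventories below are hypothetical, for illustration only):

```python
def read_word(word, affixes):
    """Split `word` left-to-right, taking the longest valid affix at
    each step; returns None if the word can't be read this way."""
    parts, i = [], 0
    ordered = sorted(affixes, key=len, reverse=True)
    while i < len(word):
        for a in ordered:
            if word.startswith(a, i):
                parts.append(a)
                i += len(a)
                break
        else:
            return None  # no valid affix fits at this position
    return parts
```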
Just to avoid misunderstanding, I do not think that the creator(s) of the Voynich MS used a slot approach to generate their vocabulary. I strictly see it as a potential model to (hopefully) better understand the composition of the words.

The same is true for the Naibbe approach. It is (or can be) a model that may tell us something about the text.
One huge benefit of Mauro's approach is that it's possible to objectively compare the goodness of fit of different grammars and the approach has very few assumptions. Basically, as long as the transcription is considered correct, I think it's possible to identify an optimal grammar for any section of the manuscript. It's not very clear how to interpret the results, but for many tentative cipher schemes it can be possible to quickly cross-check them against the grammar to see whether the scheme makes sense.
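A sketch of the cross-check idea: given a candidate slot grammar and a list of word types, measure what fraction it accepts. This is only one axis of goodness of fit (the other being how much the grammar over-generates); the grammar and word list in the test are illustrative.

```python
import re

def coverage(slots, word_types):
    """Fraction of word types accepted by a looped slot grammar."""
    one_pass = "".join(
        "(?:%s)?" % "|".join(sorted(s, key=len, reverse=True)) for s in slots
    )
    grammar = re.compile("(?:%s)+" % one_pass)
    hits = sum(1 for w in word_types if w and grammar.fullmatch(w))
    return hits / len(word_types)
```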

It's interesting that q and a are grouped together everywhere except Pharma and Herbal B. I wonder why ch and Sh are absent from Pharma and Cosmological.