A family of grammars for Voynichese

RE: A family of grammars for Voynichese - Jorge_Stolfi - 30-12-2025

(29-12-2025, 03:33 PM)Grove Wrote: Don’t e’s only precede d’s, or o’s or a’s?

I feel that the unpaired e is a suffix for benches or gallows, rather than a prefix for other glyphs; but right now I don't have a strong justification for that feeling.

Offhand, I would say that there seem to be many words that begin with d, but few that begin with ed.

If my model is valid, the pair ed should occur "only" in the suffix part of the word, after a gallows or bench. It should "never" occur before a gallows or a bench.

Quote:[and is it not true] that a’s only precede r’s l’s and n’s (or i benches)?

There seem to be many constraints between what I call the "O" glyphs (a o y) and the "K" (non-"O") elements around them. My model does not say anything about that, since I strip the "O"s before parsing the other elements according to the crust-mantle-core pattern. Maybe some "O"s are indeed prefix or suffix modifiers for the "K"s, and should be incorporated in them (like I included the e suffix). For instance, qo should maybe count as a single element. I don't know; I should look into that.

But note that, according to my model, one can have one or two "O"s before the first "K" or after the last "K". So it cannot be that "O"s are always prefixes of "K", or always suffixes.

Also there seem to be a good number of tokens that are just "O"s without any "K"s. Maybe mostly y as an isolated word? And maybe also o with a dubious space after it?

All the best, --stolfi

RE: A family of grammars for Voynichese - Bluetoes101 - 31-12-2025

It probably also doesn't help that transcriptions are not entirely accurate, especially with regard to a, o, y, but also others.
The MSI scans clearly show several mistakes in the first paragraph of the MS, so I'm not sure how literally stats can be taken from transcriptions for this sort of in-depth analysis.

Examples using - ZL v. 3b.

"ase"

Filename: ase.JPG Size: 17.81 KB 30-12-2025, 11:56 PM

When I was working on my curve-line(ish) stuff, I ran into a lot of issues with errors, so I re-transcribed pages and then tallied up how much the changes benefitted me overall. The score was generally around 0 (as I would find a fair few examples that worked against me too), but it's worth bearing in mind that stats derived from a transcription for very specific examples may not always be entirely accurate. Maybe subjective, but personally I find this to be "ytaiin" considering the surrounding "a"s. You can see the inward-curving right-side stroke; maybe it is "o", yet it is "ataiin" for stats.

Filename: at.JPG Size: 73.68 KB 31-12-2025, 12:07 AM

This is just to say that if you wish to understand how ch differs from ih, I found you have to sit and look at the page rather than at what stats based on transliterations say. Does the scribe have "flicky" i's that might be mistaken for c, for example? If there's enough evidence of this in the surrounding text, there is a decent argument that the transliteration is incorrect.

Another example of this may be found in the first paragraph also.

Filename: ii.JPG Size: 64.1 KB 31-12-2025, 12:15 AM

Top one is ii, the other is ic (sh with modification added - "ydaraishy"). Why?

Another from the first paragraph - "soiin"...

Filename: soi.JPG Size: 33.84 KB 31-12-2025, 12:26 AM

You have to go look at the text for stuff like ih rather than ch, the transliterations and stats derived from them are not accurate.

RE: A family of grammars for Voynichese - ReneZ - 31-12-2025

(30-12-2025, 10:49 PM)Jorge_Stolfi Wrote: Offhand, I would say that there seem to be many words that begin with d, but few that begin with ed.

If my model is valid, the pair ed should occur "only" in the suffix part of the word, after a gallows or bench. It should "never" occur before a gallows or a bench.

In my (only partially published) approach, I agree and I 'tentatively explain' it by positing that sequences of e are post-fixes of ch, Sh and the gallows. ('Tentatively explain' just means: 'model').

Example: chol can be 'split up' as ch - ol

chedy cannot be split up as ch - edy but as che - dy

RE: A family of grammars for Voynichese - Grove - 31-12-2025

F85r1 has a few words where e isn’t preceded by a gallows or bench. They follow an o.
Soeeedy
Oeeseary
Oees

I’m hoping they aren’t the only examples of e-series as a prefix to d o s or a.

RE: A family of grammars for Voynichese - ReneZ - 31-12-2025

(31-12-2025, 12:51 AM)Grove Wrote: F85r1 has a few words where e isn’t preceded by a gallows or bench. They follow an o.
Soeeedy
Oeeseary
Oees

Indeed. These are also seen in zodiac labels, especially around Libra.
(This sort of seasonal behaviour is also interesting in itself).

I have some ideas about this as well, but I'm still wondering about that. Also, all of these hypotheses are almost impossible to 'prove'.

RE: A family of grammars for Voynichese - Grove - 31-12-2025

(31-12-2025, 01:41 AM)ReneZ Wrote:
(31-12-2025, 12:51 AM)Grove Wrote: F85r1 has a few words where e isn’t preceded by a gallows or bench. They follow an o.
Soeeedy
Oeeseary
Oees

Indeed. These are also seen in zodiac labels, especially around Libra.
(This sort of seasonal behaviour is also interesting in itself).

I have some ideas about this as well, but I'm still wondering about that. Also, all of these hypotheses are almost impossible to 'prove'.

True, and I’ve just arrived with my decades old baggage!

RE: A family of grammars for Voynichese - Jorge_Stolfi - 31-12-2025

(31-12-2025, 12:51 AM)Grove Wrote: F85r1 has a few words where e isn’t preceded by a gallows or bench. They follow an o.
soeeedy

oeeseary

oees

I’m hoping they aren’t the only examples of e-series as a prefix to d o s or a.

In my model, ee is a bench, like ch and sh, and may have an e suffix. So I parse the first and third words into elements as

{s}{o}{eee}{d}{y} with a valid CMC pattern, DXD
{o}{ee}{s} with a valid CMC pattern, XD

The word oeeseary fails my model not because of the ee but because of the se, since I don't allow an e suffix for s alone (only on sh). My guess is that the se is a scribal error for sh. That is, the word should have been {o}{ee}{sh}{a}{r}{y} which fits my model with the valid CMC pattern XXD.
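
The pattern strings quoted above (DXD, XD, XXD) can be reproduced mechanically from the brace notation. The following is only a minimal sketch, assuming the element classes listed later in this thread ("O", "Q", "D", "X", "G", "H", "N"); the names `element_class` and `k_pattern` are hypothetical, and the code only derives the pattern string. It does not check whether that pattern is CMC-valid.

```python
import re

# Element classes as listed by the author elsewhere in this thread.
# All names below are illustrative, not taken from the author's parser.
SINGLE = {"a": "O", "o": "O", "y": "O", "q": "Q",
          "d": "D", "l": "D", "r": "D", "s": "D"}
BENCH = re.compile(r"(ch|sh|ee)e?")        # benches, optional e suffix
GALLOWS = re.compile(r"[ktpf]e?")          # simple gallows, optional e suffix
PLATFORM = re.compile(r"c[tkpf]h[eh]?")    # platform gallows, optional e or h
CODA = re.compile(r"i{0,3}[nm]|i{1,3}r")   # codas as originally listed


def element_class(elem):
    """Return the class letter of one element, or raise ValueError."""
    if elem in SINGLE:
        return SINGLE[elem]
    for cls, pattern in (("X", BENCH), ("G", GALLOWS),
                         ("H", PLATFORM), ("N", CODA)):
        if pattern.fullmatch(elem):
            return cls
    raise ValueError(f"not an element of the model: {elem!r}")


def k_pattern(braced):
    """Map '{s}{o}{eee}{d}{y}' to its class string with the "O"s stripped."""
    elems = re.findall(r"\{([a-z]+)\}", braced)
    classes = [element_class(e) for e in elems]
    return "".join(c for c in classes if c != "O")


print(k_pattern("{s}{o}{eee}{d}{y}"))     # DXD
print(k_pattern("{o}{ee}{s}"))            # XD
print(k_pattern("{o}{ee}{sh}{a}{r}{y}"))  # XXD
```

Calling `element_class("se")` raises `ValueError`, which mirrors the statement that s alone does not take an e suffix.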

Incidentally, I see a lot of evidence of retracing on that page (or most any other page), which must have contributed to errors. For instance, just below that oeeseary there is an ykeeeody where the y was incorrectly retraced as a. A transcriber could easily miss the faint tail of the original y (as the Retracer apparently did) and read it as a. But that is another thread...

All the best, --stolfi

RE: A family of grammars for Voynichese - Grove - 31-12-2025

If ee is a bench then is ii a bench as well? I think it is clear that transcriptions are error prone due to fading and overwriting by a potential retracer who either had no idea what the correct character should be or was correcting a perceived error. Do we have any idea of the dating difference between original and retraced inks?

A little struggle for me is where the ee bench is in an eee because an e suffix could belong to the preceding character giving e ee or following this ee bench giving ee e. How does one know what character gets the suffix?

Happy New Year!

John

RE: A family of grammars for Voynichese - Jorge_Stolfi - 31-12-2025

(31-12-2025, 01:46 PM)Grove Wrote: If ee is a bench then is ii a bench as well?

Including ii as a bench ("X") would allow words like iiky or iiody, which do not seem to occur. Thus I accept i, ii, and iii only as parts of the coda ("N") elements, which in the CMC model can occur only at the end of the word (apart from following "O", maybe).

Quote:Do we have any idea of the dating difference between original and retraced inks?

I have no firm answer for that. However, the motivation for retracing presumably was that the original traces had faded to the point that they were hard to read. Thus I suppose it must have been several decades to a couple of centuries after the original writing. That would be consistent with the Retracer being ignorant not only about the meaning of the text and figures, but even about the Voynichese alphabet -- as indicated by the nature of his mistakes.

On the other hand, I doubt that the Jesuits cared enough about this book to commission such a restoration. Thus I would guess that the major restoration happened between ~1470 and ~1630. The worried owner could have been Marci, Barschius, Sinapius, ... or someone else before them.

Quote:A little struggle for me is where the ee bench is in an eee because an e suffix could belong to the preceding character giving e ee or following this ee bench giving ee e. How does one know what character gets the suffix?

Good observation; indeed my definition of "element" is ambiguous on that point. My parser resolves the ambiguity as follows: if the next two EVA letters are ch, sh, or ee, followed by exactly one e, then those three letters are one element. If the ch, sh, or ee is followed by two or more e, then the next element is just the two letters.

That is,

chdy is {ch}{d}{y}
chedy is {che}{d}{y}
cheedy is {ch}{ee}{d}{y}
cheeedy is {ch}{eee}{d}{y} (not {che}{ee}{d}{y}, which would be the other possibility)
cheeeedy is {ch}{ee}{ee}{d}{y} (not {che}{eee}{d}{y})

This ambiguity is only relevant for the CMC model if there is a {ch} or {sh} followed by four e in a row, as in the last example; because, with my disambiguation rule, the result has three benches, which is forbidden, instead of two, which is OK. But I don't see any such word in my transcription. There are several words with four e, like deeees, but that is not ambiguous since the only possible parsing is {d}{ee}{ee}{s}, which is CMC-valid.
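
The disambiguation rule can be sketched as a greedy left-to-right splitter. This is not the actual parser, only an illustration under stated assumptions: the function names are hypothetical, and applying the same "exactly one e" test to the gallows, as well as taking an h suffix on a platform gallows whenever an h follows, are guesses, since the rule above is stated only for ch, sh, and ee.

```python
import re

PLATFORM = ("cth", "ckh", "cph", "cfh")
# Elements that may take a suffix; longer spellings are tried first.
HOSTS = PLATFORM + ("ch", "sh", "ee") + ("k", "t", "p", "f")
SINGLES = "aoyqdlrs"                      # circles, q, dealers
CODA = re.compile(r"i{0,3}[nm]|i{1,3}r")  # codas as originally listed


def e_run(word, pos):
    """Length of the run of consecutive 'e' letters starting at pos."""
    end = pos
    while end < len(word) and word[end] == "e":
        end += 1
    return end - pos


def split_elements(word):
    """Split an EVA word into elements; raise ValueError if impossible."""
    elems, i = [], 0
    while i < len(word):
        host = next((h for h in HOSTS if word.startswith(h, i)), None)
        coda = CODA.match(word, i)
        if host:
            j = i + len(host)
            if e_run(word, j) == 1:
                j += 1   # exactly one e follows: it is a suffix
            elif host in PLATFORM and word.startswith("h", j):
                j += 1   # assumed handling of the h suffix
        elif coda:
            j = coda.end()
        elif word[i] in SINGLES:
            j = i + 1
        else:
            raise ValueError(f"cannot parse {word!r} at position {i}")
        elems.append(word[i:j])
        i = j
    return elems


for w in ("chdy", "chedy", "cheedy", "cheeedy", "cheeeedy", "deeees"):
    print(w, split_elements(w))
```

With these assumptions the six words print the same splits as listed above, and `split_elements("oeeseary")` raises `ValueError` at the lone e after s.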

All the best, and Happy New Year! --stolfi

On the "ir" codas - Jorge_Stolfi - 26-02-2026

In previous posts I had proposed that the elements of the Voynichese script were

"O" the /circles/ {a} {o} {y}.
"Q" just {q}.
"D" the /dealers/ {d} {l} {r} {s}.
"X" the /benches/ {ch} {sh} {ee} with optional 'e' suffix.
"G" the /simple gallows/ {k} {t} {p} {f} with optional 'e' suffix.
"H" the /platform gallows/ {cth} {ckh} {cph} {cfh} with optional 'e' or 'h' suffix.
"N" the /codas/ {n} {in} {iin} {iiin} {m} {im} {iim} {iiim} {ir} {iir} {iiir}.

Together with the seven-slot ("crust-mantle-core", CMC) model, this model for the elements implies many restrictions on the words, such as "the 'ed' digraph cannot occur at the start of a word" and "the i glyph must be followed by i, n, m, or r".
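
The two restrictions quoted above are simple enough to test directly on raw EVA strings, without a full parser. The sketch below is illustrative only: the function name `violations` is hypothetical, and "edy" and "daiiy" are made-up test strings, not attested words.

```python
import re

STARTS_WITH_ED = re.compile(r"^ed")
# An i that is NOT followed by i, n, m, or r (including a word-final i).
BAD_I = re.compile(r"i(?![inmr])")


def violations(word):
    """List which of the two quoted restrictions a word breaks."""
    found = []
    if STARTS_WITH_ED.search(word):
        found.append("starts with ed")
    if BAD_I.search(word):
        found.append("i not followed by i, n, m, or r")
    return found


for w in ("daiin", "dair", "chedy", "edy", "daiiy"):
    print(w, violations(w))
```

Passing both checks does not make a word valid under the model; these are only two of its many consequences.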

I had included {iiir} and {iiim} for symmetry with {iiin}, but had already observed that they practically do not occur:

    count     freq  element       count     freq  element
   113.50  0.00091  {n}          868.25  0.00697  {m}
  1665.50  0.01336  {in}          40.00  0.00032  {im}
  3779.00  0.03032  {iin}         15.00  0.00012  {iim}
   159.00  0.00128  {iiin}         1.00  0.00001  {iiim}

   487.75  0.00391  {ir}
   130.50  0.00105  {iir}
     1.00  0.00001  {iiir}

(The numbers are total counts in running text, and frequencies relative to all elements.)

It is also worth noting that {iir} is less common than {ir}, even though the pattern of {in} and {iin} is the opposite.
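
The asymmetry can be made concrete with two ratios. This is just arithmetic on the counts from the table above; the variable names are illustrative.

```python
# Counts copied from the table of coda elements above.
counts = {"in": 1665.5, "iin": 3779.0, "ir": 487.75, "iir": 130.5}

ratio_n = counts["iin"] / counts["in"]   # double-i versus single-i before n
ratio_r = counts["iir"] / counts["ir"]   # double-i versus single-i before r

print(f"iin/in = {ratio_n:.2f}")
print(f"iir/ir = {ratio_r:.2f}")
```

So {iin} is a bit more than twice as common as {in}, while {iir} is only about a quarter as common as {ir}.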

I was unhappy with those {ir} and {iir} elements anyway because I could not let {r} alone be a coda element, as {n} and {m} were. Making {r} a coda element would have created a serious ambiguity in the parsing, since the final r in words like otar could be parsed either as a dealer (slot 6) or a coda (slot 7), and the choice would affect the limit on the number of dealers of the CMC model. Excluding {r} from the codas avoided that ambiguity but made the set of elements less symmetrical.

But now I have become convinced that many, if not all, of the hundreds of ir endings are in fact scribal errors for iin. These errors could have been caused by the Author's sloppy "cursive" handwriting on the draft, where an in could easily be confused with an r:

Filename: 2026-02-11-234905-daiin-to-dair.jpg Size: 19.36 KB 26-02-2026, 01:17 PM

Thus I have decided to remove the elements {ir}, {iir}, and {iiir} from my model. When parsing words into elements, I may label any word that ends in ir as invalid, or "auto-correct" it by replacing the final ir by iin, as appropriate.

If ir is in fact a valid ending for Voynichese words, the auto-correction of ir to iin will introduce some errors. But I am convinced that it will also fix many errors, thus it is unlikely to make the puzzle harder to crack.
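
The two options for words ending in ir (reject or auto-correct) can be sketched as a small preprocessing step before parsing. The function name `handle_final_ir` and its `mode` parameter are hypothetical; the replacement is applied literally as described, swapping the final ir for iin.

```python
def handle_final_ir(word, mode="correct"):
    """Treat a final 'ir' as a presumed scribal error for 'iin'.

    mode="correct" replaces the final ir by iin;
    mode="reject" returns None to mark the word as invalid.
    Words that do not end in ir are returned unchanged.
    """
    if mode not in ("correct", "reject"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not word.endswith("ir"):
        return word
    if mode == "reject":
        return None
    return word[:-2] + "iin"


print(handle_final_ir("dair"))             # daiin
print(handle_final_ir("dair", "reject"))   # None
print(handle_final_ir("otar"))             # otar (final r without i is kept)
```

Note that a final r without a preceding i, as in otar, is deliberately left alone here.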

However, that decision now implies an ambiguity about the r without preceding i at the end of words like otar. The "sloppy draft" argument should apply here too: presumably many of those final rs are actually ins. Should I declare them illegal too? That would exclude a lot of common words. Should I auto-correct them too to in? That may be a sensible choice, but (unlike the case of final ir) I don't have any evidence that those are mostly errors. I will let them stand for now...

All the best, --stolfi