The Voynich Ninja

Full Version: Atoms and elements of the Voynichese alphabet
Voynich texts still spark off discussions about their structure, especially about the length of Vords, the understanding of gaps and possible sentences or chapters.
In three places, Voynichese characters appear lined up as single symbols. We don't know the purpose of these line-ups.
These sources are the second-outer ring of folio 57v, the second leading column of 66r, and the leading column of 76r.
Whatever the purpose of those lines of single symbols is, the symbols are obviously used in some stand-alone function here.

[Image: 26e20ed6561747157d6caea0c67d5218.jpg]

[Image: d4f8549d8c759b90762fdb25760c74f3.jpg]

A bit surprisingly, "air" appears here within the line of single letters; most people consider it a combination of up to 3 different Voynichese characters, a, i and r, but we have to face the fact that here "air" is an "atomistic", stand-alone sign. It also appears (very rarely) in the main text.

Therefore, it is set together with the single characters and some combinations of them in this alphabet table:

[Image: 6db97ce1389c7568d8d62c9550585a61.jpg]

Some of the most frequent characters are not used on 57v, 66r and 76r, but can be found at various positions in the main text, so we can assume that they "have a life of their own"; these are framed in blue.

Most people regard cFe, cPe, cTe, cKe, ee, eee, ii, iN, ir, iiN, iir as combined characters as well; these are framed in red here.

But at least cTe appears stand-alone or in an end position in the text (also very rarely), so it might be an element with its own function as well:
(middle line, mid position)
[Image: 42d41cb685fd9b33b9c8b09035937135.jpg]

(lowest Vord)
[Image: e63a0f09d2db5056eac8e36118d17bb1.jpg]

At position 9 in the image of the "4x17 sequences", a clear distinction is made between f (first 2 sequences) and p (last 2 sequences); this has been discussed already, and we have to live with the fact that the sequences are not completely identical. But for the identification of stand-alone letters this doesn't matter here.
I would see the strange < at position 16 as just a somewhat harsh variant of the c found in the other 3 sequences, not as a new combination or symbol.

So, in total the table shows 28 single characters, 7 rather clear combinations and 4 combinations that may have a function of their own.
The Voynichese alphabet may thus add up to 39 letters; even if we take out the 6 rarest symbols, which appear almost nowhere else in the script, that still leaves 33 characters available and (more or less) in use.

But can there be even more?
I did not include the variants of "d" here, like 8, D, j or such: those appear to me to be variations of style and scribe rather than symbols of their own.

Here is some kind of "8" party happening:

[Image: 459f9b296646f6f8235566ba3302b24c.jpg]

It looks as if all of these were done by the same scribe, who did not care too much about writing "8" exactly. I wouldn't extend an alphabet with these.
... deleted.....
(15-06-2026, 04:02 PM)Stefan Wirtz_2 Wrote: A bit surprisingly, "air" appears here within the line of single letters; most people consider it a combination of up to 3 different Voynichese characters, a, i and r, but we have to face the fact that here "air" is an "atomistic", stand-alone sign. It also appears (very rarely) in the main text.

.....

Most people regard cFe, cPe, cTe, cKe, ee, eee, ii, iN, ir, iiN, iir as combined characters as well; these are framed in red here.

....

But can there be even more?

If 'in', 'iin', 'ir', 'iir', then why not 'im', 'iim', 'ig', 'il', 'iil', 'is', 'iis', 'ik', 'iik' etc.?
(15-06-2026, 04:02 PM)Stefan Wirtz_2 Wrote: A bit surprisingly, "air" appears here within the line of single letters; most people consider it a combination of up to 3 different Voynichese characters, a, i and r, but we have to face the fact that here "air" is an "atomistic", stand-alone sign. It also appears (very rarely) in the main text.

I believe that every ending ir is a rather common error by the Scribe, caused by him mis-reading an in in the Author's sloppy handwriting as r.  (But not every r is such a "quillo"!) 

That would explain why the ending is is quite rare, while ir is quite common and r and s are otherwise similar: because ien, unlike iin, is not a valid ending.

Quote:Most people think of CFe, CPe, CTe, CKe, ee, eee, ii, in, ir, iin, iir also being combined characters [...] But can there be even more?

In my view a single e is a modifier that can follow k, t, Ch, Sh, ee, CTh, CKh, CPh, and CFh (but not p and f or any other letter), turning those nine letters into nine additional ones: ke, te, Che, She, eee, CThe, etc. 

On the other hand I believe that p and f are not distinct letters, but only ornate versions of other letters or digraphs -- sort of like our capital letters, but used in rather different ways.

And ee may be just a lazy version of Ch with the ligature omitted (like a Latin "i" missing the dot).

And CTHh, CKHh etc. are CThe, CKhe etc. with an accidentally over-extended ligature line.

And m is a Scribal abbreviation for iin (and maybe for in too).

And b, u, g are not distinct letters, but merely malformed versions of other letters. Ditto for Ih, IKh etc., and any other i that is not in an in or iin or iiin.

And q or qo is not a letter, but a symbol that means "and" -- like English "&" but with the grammar of Arabic "wa-".

So my proposal for the Voynichese alphabet is

a   o   y    d   l   r   s 
Ch   Sh   ee   Che   She   eee
k   t   ke   te
CKh   CTh   CThe   CKHe
n   in   iin   iiin(?)

Those in the last row can occur only at the end of a word.  There are many other restrictions but not worth mentioning now.
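
The alphabet above can be turned into a small parser, as a rough sketch only. It assumes words are written in the same capitalised notation used in this post, reads the last entry of the fourth row as CKhe (the spelling used earlier in the post), and applies only the one restriction stated here (last-row letters are word-final). The function name `tokenize` is made up for illustration.

```python
from functools import lru_cache

# Letters as listed in the proposal; "CKhe" is an assumed reading of the
# last entry in the fourth row.
ALPHABET = [
    "a", "o", "y", "d", "l", "r", "s",
    "Ch", "Sh", "ee", "Che", "She", "eee",
    "k", "t", "ke", "te",
    "CKh", "CTh", "CThe", "CKhe",
    "n", "in", "iin", "iiin",
]
# Last row of the proposal: allowed only at the end of a word.
WORD_FINAL_ONLY = {"n", "in", "iin", "iiin"}
# Try longer letters first, but backtrack when that leads to a dead end.
LETTERS = sorted(ALPHABET, key=len, reverse=True)


def tokenize(word):
    """Split a word into letters of the proposed alphabet, or return None."""

    @lru_cache(maxsize=None)
    def parse(pos):
        if pos == len(word):
            return ()
        for letter in LETTERS:
            if not word.startswith(letter, pos):
                continue
            end = pos + len(letter)
            if letter in WORD_FINAL_ONLY and end != len(word):
                continue
            rest = parse(end)
            if rest is not None:
                return (letter,) + rest
        return None

    result = parse(0)
    return None if result is None else list(result)


print(tokenize("daiin"))  # ['d', 'a', 'iin']
print(tokenize("kee"))    # ['k', 'ee'] (needs backtracking past 'ke')
print(tokenize("daiir"))  # None: a bare i is not a letter in this proposal
```

Words that come back as None are the ones this proposal would have to explain as malformed glyphs, abbreviations, or the q/qo symbol.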

All the best, --stolfi
(15-06-2026, 08:26 PM)Mauro Wrote: If 'in', 'iin', 'ir', 'iir', then why not 'im', 'iim', 'ig', 'il', 'iil', 'is', 'iis', 'ik', 'iik' etc.?

Maybe, why not?

But the main differences are:
- in and iin appear rather often in the VMS, mostly but not exclusively in an end position. They very often follow an "a", but there are some exceptions. They never show up stand-alone.
n functions as a stand-alone letter in the columns and in the main text.
So I see in and iin as combined, but as new characters with a "value" of their own, and I count them as elements of the basic VMS alphabet.

- nearly the same for "ii"

- ir and iir are technically combinations of r with i and ii. I wouldn't have considered these atoms or elements, but since air has a fixed stand-alone meaning and use, I took them into the table as values of their own as well; they sometimes also appear in the main text without an "a", but remain red-framed second-class letters in my table.

- why not im, iim, ig, il, iil, is, iis, ik, iik
I understand those strings as combinations; I have not seen a stand-alone usage yet, nor a very high rate of use within the text (you may call them 'molecules' if you want to). In the end, all Vords are combinations of basic alphabetical elements; I tried to concentrate on and condense those elements here.
(16-06-2026, 02:58 AM)Jorge_Stolfi Wrote: I believe that every ending ir is a rather common error by the Scribe, caused by him mis-reading an in in the Author's sloppy handwriting as r.  (But not every r is such a "quillo"!)

No, I don't do that.
Claiming that one of the most common combinations is just a steady error while trying to write an even more common combination?
That would be an assumption I can't follow.

[..]
(16-06-2026, 02:58 AM)Jorge_Stolfi Wrote: In my view a single e is a modifier that can follow k, t, Ch, Sh, ee, CTh, CKh, CPh, and CFh (but not p and f or any other letter), turning those nine letters into nine additional ones: ke, te, Che, She, eee, CThe, etc. [..]

I don't have a concept of "modifiers" in an alphabet at this moment. Could be possible, or just not.
In my view, e follows c to form ce, and it also forms ee and eee. At least eee must mean something other than just three times e, because we have no real linguistic model for a single letter repeated three times.
e (or ce!) surrounds the gallows, and I think something like iFi is just an imprecise writing of eFe, so I did not take such combinations into the table.
But nothing more for me.

(16-06-2026, 02:58 AM)Jorge_Stolfi Wrote: On the other hand I believe that p and f are not distinct letters, but only ornate versions of other letters or digraphs -- sort of like our capital letters, but used in rather different ways. [..]

The 4x17 sequence, where only p and f disturb the perfect order, makes it tempting to drop the difference between the two gallows. I wouldn't do that: both versions occur many times in the text, even together within the same line. Such effort to write 1 character in different ways...?

[..]
(16-06-2026, 02:58 AM)Jorge_Stolfi Wrote: And m is a Scribal abbreviation for iin (and maybe for in too).

There isn't the slightest hint anywhere for that reading, is there?

(16-06-2026, 02:58 AM)Jorge_Stolfi Wrote: And b, u, g are not distinct letters, but merely malformed versions of other letters.  Ditto for Ih,IKh etc, and any other i that is not in an in or iin or iiin.

I would agree that b is just an n, and that u is one of the very few real typos in the text.
But the scribe(s) made too much effort to distinguish between m and g, as here:

[Image: f81a639b3cc9fe2692599bb3f8ec5640.jpg]

A completely different breed of letters, from one and the same scribe.
We see g quite often, and clearly written -- that is nothing like m at all...
(16-06-2026, 12:34 PM)Stefan Wirtz_2 Wrote: is, iis, ik, iik

These character strings go against the curve line system rules that seem to govern many words in the manuscript. e stroke characters seem to like to follow other e stroke characters. The same goes for i stroke characters.

(16-06-2026, 01:15 PM)Stefan Wirtz_2 Wrote:
(16-06-2026, 02:58 AM)Jorge_Stolfi Wrote: I believe that every ending ir is a rather common error by the Scribe, caused by him mis-reading an in in the Author's sloppy handwriting as r.  (But not every r is such a "quillo"!)
Claiming that one of the most common combinations is just a steady error while trying to write an even more common combination?  That would be an assumption I can't follow.

I have statistical and other reasons to suspect that ir is a variation of iin.  Starting from the fact that frequencies of the endings ir, iir, and iiir are roughly proportional to those of iin, iiin, and iiiin (the last in each set being zero).  

As I pointed out elsewhere, the n glyph is very often drawn by the Scribe himself with a round bottom rather than a sharp corner. If a round n is raised a little and touches the preceding i, one gets a well-drawn r. Thus my guess is that this accident often happened in the Author's draft, due to his poor handwriting; and that led the Scribe to often write ir instead of iin. Since iin is very common, that made the erroneous ir very common too.

Quote:
Quote:On the other hand I believe that p and f are not distinct letters, but only ornate versions of other letters or digraphs -- sort of like our capital letters, but used in rather different ways.

The 4x17 sequence, where only p and f disturb the perfect order, makes it tempting to drop the difference between the two gallows. I wouldn't do that: both versions occur many times in the text, even together within the same line. Such effort to write 1 character in different ways...?

The Scribe obviously felt it necessary to embellish the first word or two of each parag.  I have seen manuscripts where the entire first line of each parag is written in a different font.  In the VMS, the puffs (p and f) are overly common in the first line of each parag.  And someone posted here another manuscript where the Scribe would now and then replace an ordinary letter, even in the middle of a word, with a tall letter that looked a lot like p or f, only more elaborate; and apparently the version of this tall letter was chosen randomly, irrespective of the letter that it replaced.  Putting all that together, I think that the hypothesis "puffs are ornate versions of other Voynichese letters" is quite plausible -- and I am not bothered by our failure to identify which letters they stand for.

Quote:
Quote:And m is a Scribal abbreviation for iin (and maybe for in too).
There isn't the slightest hint anywhere for that reading, is there?

In my transcription file, the pair am occurs ~810 times at the end of a word, and only ~20 times elsewhere. That ratio is similar to that of aiin, which occurs ~3890 times at the end of a word but only ~40 times elsewhere. I take that as a hint (not proof, of course) that m may be equivalent to iin.

However, m occurs 760 times at the end of a line (counting figure intrusions as ends of lines), and only ~340 times at any other place. Whereas iin occurs ~530 times at the end of a line, but ~3550 in all other places. I take these numbers as a hint that, if m is indeed equivalent to iin, it was used by the Scribe instead of iin mostly when space was tight (or he was feeling lazy).
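
For readers who want to see the two comparisons as proportions, here is a minimal sketch that uses only the counts quoted in this post (most of them approximate). The helper name `share` is just for illustration.

```python
# Counts as quoted in the post (mostly approximate, marked "~" there).
word_position = {
    "am":   {"word_end": 810,  "elsewhere": 20},
    "aiin": {"word_end": 3890, "elsewhere": 40},
}
line_position = {
    "m":   {"line_end": 760, "elsewhere": 340},
    "iin": {"line_end": 530, "elsewhere": 3550},
}


def share(first, second):
    """Fraction of all occurrences that fall into the first category."""
    return first / (first + second)


word_end_share = {
    glyphs: share(c["word_end"], c["elsewhere"])
    for glyphs, c in word_position.items()
}
line_end_share = {
    glyphs: share(c["line_end"], c["elsewhere"])
    for glyphs, c in line_position.items()
}

for glyphs, value in word_end_share.items():
    print(f"{glyphs:5s} word-final share: {value:.1%}")
for glyphs, value in line_end_share.items():
    print(f"{glyphs:5s} line-final share: {value:.1%}")
```

With these inputs, am and aiin are both word-final in roughly 98-99% of cases, while m is line-final in about 69% of cases against about 13% for iin.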

Quote:But the scribe(s) made too much effort to distinguish between m and g

We don't know whether the Scribe was choosing one or the other deliberately, or was just alternating at random.  

The letters p and f may or may not have a hook at the end of the horizontal arm. Those hooks required a bit of extra effort to write; does that mean that they are significant? Most Voynichologists seem to have accepted the view of the early transcribers, that the hooks are just meaningless ornamentation. Why not have the same view of the difference between g and m?

Anyway, g is so rare that mapping all g to m will not make decipherment much harder. We routinely accept that u = v in Latin manuscripts, even though they may be distinct in some cases.  On the other hand, if they are indeed the same, counting them separately will only introduce meaningless and distracting noise in the data. 

But I admit that g may be distinct from m.  In my file, g occurs ~110 times at the end of a word and only ~10 times elsewhere; but ~90 times at the end of a line and ~30 elsewhere.  So, even if it is distinct from m, it looks like another abbreviation that the Scribe used when he was short of space (or energy).  Maybe an abbreviation of aiin, oiin, or daiin?
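
Counts of this kind could be reproduced along the following lines. This is only a sketch under loud assumptions: the input format (one text line per string, words separated by ".") is hypothetical, the sample lines are made-up toy data rather than real transcription text, and figure intrusions are not treated as line ends here.

```python
from collections import Counter


def position_counts(lines, glyph):
    """Tally occurrences of `glyph` by word position and by line position."""
    counts = Counter()
    for line in lines:
        words = [w for w in line.split(".") if w]
        for idx, word in enumerate(words):
            start = word.find(glyph)
            while start != -1:
                at_word_end = start + len(glyph) == len(word)
                # Line-final means: ends the last word of the line.
                at_line_end = at_word_end and idx == len(words) - 1
                counts["word_end" if at_word_end else "word_other"] += 1
                counts["line_end" if at_line_end else "line_other"] += 1
                start = word.find(glyph, start + 1)
    return counts


# Made-up toy lines in a hypothetical dot-separated format.
toy_lines = ["daiin.chol.dam", "otam.shedy.daiin", "okam"]

m_counts = position_counts(toy_lines, "m")
iin_counts = position_counts(toy_lines, "iin")
print(dict(m_counts))
print(dict(iin_counts))
```

The two tallies are kept separate on purpose, since the post compares word-final against elsewhere and line-final against elsewhere as two different splits.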

All the best, --stolfi
(16-06-2026, 08:36 PM)Jorge_Stolfi Wrote:
(16-06-2026, 01:15 PM)Stefan Wirtz_2 Wrote:
(16-06-2026, 02:58 AM)Jorge_Stolfi Wrote: I believe that every ending ir is a rather common error by the Scribe, caused by him mis-reading an in in the Author's sloppy handwriting as r.  (But not every r is such a "quillo"!)
Claiming that one of the most common combinations is just a steady error while trying to write an even more common combination?  That would be an assumption I can't follow.

I have statistical and other reasons to suspect that ir is a variation of iin.  Starting from the fact that frequencies of the endings ir, iir, and iiir are roughly proportional to those of iin, iiin, and iiiin (the last in each set being zero).  

The proportional frequency relationship between "iir" and "iin" that you use as evidence for scribal error is not specific to this pair. It is a universal feature of the Voynich text. Every pair of similar frequently used words shows the same proportional relationship (see Timm 2014):

chedy (501) → lchedy (119) → olchedy (38) → qolchedy (10)
shedy (426) → lshedy ( 42) → olshedy (23) → qolshedy ( 2)

daiin (863) / dain (211)
daiir ( 24) / dair (106)
 aiin (469) /  ain ( 89)
 aiir ( 23) /  air ( 74)

chol  (396) / shol  (186)
cheol (172) / sheol (114)

If proportional frequency between similar words means "scribal error," then "dain" is an error for "daiin," "shol" is an error for "chol," "lchedy" is an error for "chedy," and so on for thousands of word pairs throughout the manuscript. The entire vocabulary becomes errors.

These proportional frequencies are explained by the observation that similar words co-occur within the same contexts in the Voynich text. If "shedy" is used on the same pages as "chedy", it is hardly surprising that their word frequencies correspond. This is also what quimqu's recent Levenshtein distance 1 analysis confirmed: similar words show +95% page similarity and +123% paragraph similarity over frequency-matched controls. Similar words appear together — and their frequencies reflect that co-occurrence.
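
To make "Levenshtein distance 1" concrete, here is a small sketch of the pair test alone; it is not a reconstruction of quimqu's analysis, and the function name is invented for illustration. Two words are at distance 1 if one substitution, insertion or deletion of a single character turns one into the other.

```python
from itertools import combinations


def is_edit_distance_one(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equally long) word
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # One substitution: the rest after the mismatch must agree.
        return a[i + 1:] == b[i + 1:]
    # One insertion into a: the rest of a must match b after the extra character.
    return a[i:] == b[i + 1:]


words = ["chedy", "shedy", "lchedy", "chol", "shol", "daiin", "dain"]
pairs = [(a, b) for a, b in combinations(words, 2) if is_edit_distance_one(a, b)]
print(pairs)
```

On this small list taken from the examples above, the pairs found are chedy/shedy, chedy/lchedy, chol/shol and daiin/dain.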

"These observations make it possible to predict the occurrence and the frequency of similarly spelled words. For instance, if it is known that 'chedy' is frequent, it is possible to predict that 'shedy' is also frequently used although less frequently than 'chedy'" (Timm 2014, p. 6).
(16-06-2026, 09:40 PM)Torsten Wrote: The proportional frequency relationship between "iir" and "iin" that you use as evidence for scribal error is not specific to this pair.

The proportionality is not between "iir" and "iin" but between ir, iir, iiir (1, 2, 3 i) and iin, iiin, iiiin (2, 3, and 4 i):

  ir   607  iin   4109
  iir  167  iiin   173
  iiir   2  iiiin    1

That is, frequency-wise, -ir (n+1 i's) is much more like -iin (n+2 i's) than -in (n+1 i's).

Those are the facts.  Scribal error because of visual similarity between in and r is my proposed explanation for those facts.

Quote: Every pair of similar frequently used words shows the same proportional relationship (see Timm 2014):

chedy (501) → lchedy (119) → olchedy (38) → qolchedy (10)
shedy (426) → lshedy ( 42) → olshedy (23) → qolshedy ( 2)

daiin (863) / dain (211)
daiir ( 24) / dair (106)
 aiin (469) /  ain ( 89)
 aiir ( 23) /  air ( 74)

chol  (396) / shol  (186)
cheol (172) / sheol (114)

Sorry, I don't see how those numbers support your claim (that similar words have similar or proportional frequencies).  If anything, they refute it.   For instance, by the numbers above, the ratio chedy : shedy is just ~1.2, but the ratio lchedy : lshedy is ~2.8.  
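
The ratios can be checked with a few lines, as a sketch using only the counts quoted above for the chedy and shedy chains.

```python
# Counts as quoted above, paired by prefix.
ch_chain = {"chedy": 501, "lchedy": 119, "olchedy": 38, "qolchedy": 10}
sh_chain = {"shedy": 426, "lshedy": 42, "olshedy": 23, "qolshedy": 2}

ratios = {}
for (ch_word, ch_count), (sh_word, sh_count) in zip(ch_chain.items(), sh_chain.items()):
    ratios[f"{ch_word}:{sh_word}"] = ch_count / sh_count

for pair, ratio in ratios.items():
    print(f"{pair:20s} {ratio:.2f}")
```

If the frequencies were strictly proportional, the four ratios would be about the same; with these counts they range from about 1.2 to 5.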

Quote:These proportional frequencies are explained by the observation that similar words co-occur within the same contexts in the Voynich text.

Even if these claims were correct, the second claim would not be an explanation for the first; it would be another observation, a detailing of it.  

Your proposed explanation is that the VMS text was generated by your proposed copy-and-mutate method, and the alleged proportionalities are a mechanical consequence of that. Well, as I wrote before, I see several problems with that theory.

All the best, --stolfi