The Voynich Ninja
Character Classes - Printable Version


Character Classes - Emma May Smith - 19-09-2016

Some characters in the Voynich script behave like one another. For example, an instance of [ch] can usually be replaced with [sh]. We could consider them to be members of the same 'class' of characters. We might think of numerous tests by which such similarity could be measured. Which character classes have been proposed, and which could we agree on?

The characters [ch, sh] look alike and act alike. Stolfi placed [ee] with them, but I'm not certain that is valid because they cannot always replace one another.

The 'gallows' characters form another group, which can itself be subdivided into smaller classes depending on the presence of a bench and/or two legs. This is the most commonly cited group, but is it really valid?

Stolfi suggested the 'dealers', [d, l, r, s], which often occur in similar contexts. Within that class I would suggest that [d, s] and [l, r] are natural subgroups.

Stolfi also suggested the 'circles', [a, o, y], which in his theory were highly mobile. I think that if we consider [a] to be a variant of [y], then [y/a, o] make a fairly coherent class of character.

I think that being able to see characters as members of classes helps in analyzing the text because it gives us another level of insight. If we take some or all of the classes mentioned above as valid, we can see that words such as [oteal] and [ykeeor] have some structural similarity.
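To make the comparison concrete, here is a minimal sketch that rewrites a word as a string of class labels. The single-letter labels (B, G, D, O, E) are made up for illustration, and treating any run of [e] as one unit is an assumption rather than something argued above. Benched gallows such as [ckh] are not handled.

```python
import re

# Class labels are hypothetical shorthand; memberships follow the groups above.
GLYPH_CLASS = {
    "ch": "B", "sh": "B",                      # bench characters
    "k": "G", "t": "G", "p": "G", "f": "G",    # gallows
    "d": "D", "l": "D", "r": "D", "s": "D",    # Stolfi's 'dealers'
    "a": "O", "o": "O", "y": "O",              # Stolfi's 'circles'
}

# Split a word into ch, sh, runs of e, or single characters.
TOKEN = re.compile(r"ch|sh|e+|.")

def skeleton(word):
    """Replace each glyph by its class label; unclassified glyphs are kept as-is."""
    out = []
    for tok in TOKEN.findall(word):
        if tok.startswith("e"):
            out.append("E")  # assumption: any run of e counts as one unit
        else:
            out.append(GLYPH_CLASS.get(tok, tok))
    return "".join(out)

for w in ("oteal", "ykeeor"):
    print(w, skeleton(w))  # both words map to OGEOD
```

Two words with the same skeleton differ only by substitutions within a class, which is one way to read "structural similarity" here.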


RE: Character Classes - ThomasCoon - 19-09-2016

I agree with you on almost all of these. My research in the VMS centers around copying the text by hand (I have 100 pages copied) so I see these patterns.

I think the Gallows make up a group because of the patterns of letters around them - they may even be the same character:

[k] [t] [p] [f]

[ok] [ot] [op] [of]

[ko] [to] [po] [fo]

[yk] [yt] [yp] [yf?]

[kch] [tch] [pch] [fch]

[ckh] [cth] [cph] [cfh]

[ke] [te] [pe] [fe]

(At the same time, there is nothing in the text like: [kr] [tr] [pr] [fr], or [kl] [tl] [pl] [fl], [kiiin], [tiin], etc.)


So these four appear in the same set of combinations and are also absent from the same set of combinations.
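A check like this could be run against any transliteration. The sketch below assumes the text is already available as a list of EVA words; the frame list is taken from the combinations above, and the sample words are illustrative strings only, not counts from the manuscript.

```python
GALLOWS = "ktpf"

# Frames from the list above; "{g}" marks where the gallows goes.
FRAMES = ["o{g}", "{g}o", "y{g}", "{g}ch", "c{g}h", "{g}e", "{g}r", "{g}l"]

def frame_counts(words, frames=FRAMES, gallows=GALLOWS):
    """For each frame and gallows, count the words containing that string."""
    table = {}
    for frame in frames:
        table[frame] = {
            g: sum(frame.format(g=g) in w for w in words) for g in gallows
        }
    return table

# Illustrative word list only; a real test needs a full transliteration.
sample = ["okchy", "otchy", "opchedy", "ofchy", "ckhy", "ykeey", "otol"]
table = frame_counts(sample)

for frame, row in table.items():
    print(f"{frame:6}", row)
```

If the four gallows really share a class, each row should be either populated for all four or empty for all four. Rows where [k, t] and [p, f] diverge would point to a subdivision.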

Mary D'Imperio says that the "conclusion seems inescapable" that [p] = [f] because they correspond in the famous repeating strings on page f57r.


Also, one thing I wonder about: [ot] and [ok] appear frequently, but so do [yk] and [yt]. I've been wondering whether they are the same, with the author using two variants to disguise the text.


RE: Character Classes - Emma May Smith - 19-09-2016

There must be a distinction between the gallows groups [k, t] and [f, p]. The latter are much more restricted in where they appear within the text and almost never take [e] directly after them.


RE: Character Classes - ThomasCoon - 19-09-2016

(19-09-2016, 03:39 PM)Emma May Smith Wrote: There must be a distinction between the gallows groups [k, t] and [f, p]. The latter are much more restricted in where they appear within the text and almost never take [e] directly after them.

It is true that there are restrictions on where they appear, but that distinction may be wholly orthographic (an ornamentation for the first lines in paragraphs). If not, is there a linguistic explanation for why a phoneme could appear commonly in the first ~10 words uttered and then only sporadically afterwards?

You're right about [pe] and [fe]. I didn't notice that. If [ch] = [ee], that could solve the problem, but I agree that's not proven.

Also, another popular "environment" for these letters:
[okch] (613 occurrences)
[otch] (590 occurrences)
[opch] (359 occurrences)
[ofch] (83 occurrences)


RE: Character Classes - Emma May Smith - 19-09-2016

Surely the different placement of [k, t] and [f, p] is a reason to put them in slightly different classes? The whole idea is to break down the script into smaller groups of characters which work most similarly. So while all gallows are similar, there needs to be a subdivision within that group.

One of the potentially useful things from classification of characters is judging assigned values of different theories. So we might expect [ch, sh] to have similar values, and [k, t] too. The characters [f, p] would need to have some slightly bigger difference between them and [k, t].


RE: Character Classes - ThomasCoon - 19-09-2016

(19-09-2016, 07:51 PM)Emma May Smith Wrote: Surely the different placement of [k, t] and [f, p] is a reason to put them in slightly different classes? The whole idea is to break down the script into smaller groups of characters which work most similarly. So while all gallows are similar, there needs to be a subdivision within that group.

One of the potentially useful things from classification of characters is judging assigned values of different theories. So we might expect [ch, sh] to have similar values, and [k, t] too. The characters [f, p] would need to have some slightly bigger difference between them and [k, t].

Ah, I'm sorry, I misinterpreted - if your aim is to break up the Voynich script, then I agree that [p] and [f] do unique things. It doesn't automatically follow that they should have different spoken values from [t] and [k], but that is also a possibility.


RE: Character Classes - Diane - 19-09-2016

I don't know if members have heard of Philip Neal's Regex.

Perhaps there is some way to find the original posting to Santacoloma's mailing list, but otherwise, I wrote a brief note about it.  Neal's observations have always proven enduring; I expect the same will be true of this.

btw - when I said I wasn't sure if it was his original view, I was being cautious - because I hadn't asked.  Neal's manners on such things are impeccable - no possible cause to doubt him, himself!


RE: Character Classes - ReneZ - 20-09-2016

This had escaped my attention. I should add it to the relevant page at my site.

A similar, but not identical, regular expression is described by Philip on [link not preserved].
That one is specific to the B language.

One interesting aspect of such a regexp is that it could generate the binomial distribution of word type length, as described by Stolfi.
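One simple way a slot pattern could give a binomial shape: if each of n optional slots were filled independently with the same probability p, the number of filled slots would follow Binomial(n, p). This is an idealization and not something claimed in the thread. Real slots offer different numbers of options, some glyphs take more than one character, and Stolfi's observation concerns word types. The values n = 10 and p = 0.5 below are arbitrary illustrative choices.

```python
from math import comb

def binomial_pmf(n, p):
    """Probability of exactly k filled slots out of n, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Assumed values for illustration only: 10 slots, each filled half the time.
pmf = binomial_pmf(10, 0.5)

for k, prob in enumerate(pmf):
    print(k, f"{prob:.3f}", "#" * round(prob * 100))
```

Comparing such a curve with the observed lengths would show how far the equal-probability assumption is from the text.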


RE: Character Classes - MarcoP - 20-09-2016

Philip Neal Wrote:The regular expression is something like
^[qd_][aoy_][lr_][ktpfKTPF_][CS_][eE_][d][ao_][lrmn_][y_]

(C =ch; S =sh; E =ee; KTPF = the complex gallows)
...
In other words, you choose any one character from each set in square brackets and rewrite the _ as zero, for instance qo_k_Ed__y -> qokeedy. The null character _ can occur anywhere....
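The rewriting step can be sketched in a few lines. The abbreviations C, S and E are as given in the quote. Expanding K, T, P, F to [ckh, cth, cph, cfh] is an assumption about what "the complex gallows" means in EVA.

```python
# Abbreviations from the quoted post. The expansions of K, T, P, F are an
# assumption (benched gallows in EVA); "_" is the null character.
EXPAND = {
    "C": "ch", "S": "sh", "E": "ee",
    "K": "ckh", "T": "cth", "P": "cph", "F": "cfh",
    "_": "",
}

def expand(template):
    """Turn a ten-slot template into the word it stands for."""
    return "".join(EXPAND.get(ch, ch) for ch in template)

print(expand("qo_k_Ed__y"))  # qokeedy
```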

If the null character can appear anywhere, should the regex have [d_] instead of [d]?
As it is written, the regex cannot generate Voynichese words that don't contain "d" (actually, the majority of the words).

If I understand correctly, the regex also misses the words in which "ch" or "sh" occur directly or indirectly before one of the gallows (about 20% of the words).

It would be interesting to see how much of Voynichese can be captured with such a simple expression....
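A rough way to try this, assuming the slot notation is converted into an ordinary regular expression in which a slot containing _ becomes optional. The helper names are hypothetical, the expansion of K, T, P, F to benched gallows is an assumption, and the word list is a handful of illustrative strings rather than a real transliteration.

```python
import re

# Slots as quoted; "d" alone makes the seventh slot mandatory.
SLOTS = ["qd_", "aoy_", "lr_", "ktpfKTPF_", "CS_", "eE_", "d", "ao_", "lrmn_", "y_"]
# Variant with the seventh slot made optional, as suggested above.
LOOSE_SLOTS = SLOTS[:6] + ["d_"] + SLOTS[7:]

# Assumed expansions of the abbreviations into EVA strings.
EXPAND = {"C": "ch", "S": "sh", "E": "ee",
          "K": "ckh", "T": "cth", "P": "cph", "F": "cfh"}

def build_regex(slots):
    """Compile a slot list into a regex; a slot containing "_" is optional."""
    parts = []
    for slot in slots:
        alts = [re.escape(EXPAND.get(ch, ch)) for ch in slot if ch != "_"]
        group = "(?:" + "|".join(alts) + ")"
        parts.append(group + ("?" if "_" in slot else ""))
    return re.compile("".join(parts))

def coverage(words, pattern):
    """Fraction of words that the pattern matches in full."""
    return sum(bool(pattern.fullmatch(w)) for w in words) / len(words)

strict = build_regex(SLOTS)
loose = build_regex(LOOSE_SLOTS)

# Illustrative strings only, not a sample of the manuscript.
sample = ["qokeedy", "okal", "chedy", "chckhy"]
print("strict:", coverage(sample, strict))
print("loose: ", coverage(sample, loose))
```

On this toy list the strict form rejects [okal] because it has no [d], and both forms reject [chckhy] because [ch] comes before the gallows, which are the two gaps noted above.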


In general, I think the problem of defining word structure (similarly to Philip Neal's regex) is contiguous but not identical to the identification of "character classes". For instance, in Latin the phonetically similar "n" and "m" have very different positional statistics. The two letters have roughly the same number of occurrences, but "n" appears as the last letter in about 1% of the words that contain it, while "m" appears as the last letter in about 50% of the words with at least an "m".
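The statistic used here (of the words containing a letter, the share that end in it) is easy to compute for any word list. A minimal sketch, with a toy list of Latin words that is far too small to reproduce the percentages quoted:

```python
def final_share(words, letter):
    """Among words containing `letter`, the fraction that end with it."""
    containing = [w for w in words if letter in w]
    if not containing:
        return 0.0
    return sum(w.endswith(letter) for w in containing) / len(containing)

# Toy word list for illustration; the resulting figures mean nothing by themselves.
sample = ["rosam", "mare", "multum", "nomen", "annus", "manus", "terram", "nunc"]

for letter in ("m", "n"):
    print(letter, final_share(sample, letter))
```

The same function applies unchanged to Voynichese glyphs, which allows positional behaviour to be compared separately from class membership.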


RE: Character Classes - Koen G - 20-09-2016

Marco, you are of course right about the [d]; that must have been a typo.

Perhaps the addition of a relatively small number of brackets could fix the regex.

If such an expression is found that captures, say, 95% of words, this could also be used to test deviant words and where they occur. I bet in labels :)

What you say about Latin m and n is very interesting. I wouldn't have thought that such extreme statistics were found in Latin...