The Voynich Ninja

Voynichese Verifier
Suppose a computer program were created; call it the 'Voynichese Verifier'.

It would quantify how comparable a given text is to Voynichese [#1].

What would be the minimum requirements for the code to check?

- check all words against a Voynichese dictionary
- entropy levels

What else?

Note #1:
Obviously it would not be 100% accurate, since Voynichese is unknown, but it would be consistent and work as a rough guide.
For instance, you could use it to compare the output of Timm & Schinner's 'self-citation' method with that of Mike Roe's generic word.
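A minimal Python sketch of the two checks in the opening list, assuming the text is an EVA-style transliteration with words separated by spaces, and that the "Voynichese dictionary" is simply a set of attested word types (for example, collected from a transliteration file). All function names are hypothetical. It reports the share of word tokens found in the dictionary, plus first-order character entropy and second-order (conditional) entropy, i.e. the entropy of a character given the one before it.

```python
import math
from collections import Counter

def dictionary_coverage(words, dictionary):
    """Share of word tokens that appear in a reference set of Voynichese word types."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in dictionary) / len(words)

def char_entropy(text):
    """First-order Shannon entropy in bits per character (spaces count as characters)."""
    counts = Counter(text)
    n = len(text)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy(text):
    """Second-order entropy H(next char | previous char) in bits per character."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # how often each character occurs as the left member of a pair
    n = len(text) - 1
    return sum(-(c / n) * math.log2(c / firsts[a]) for (a, _b), c in pairs.items())

# Toy usage: the 'dictionary' is a made-up three-word set, not real reference data.
sample = "daiin chedy qokeedy daiin shedy"
print(dictionary_coverage(sample.split(), {"daiin", "chedy", "shedy"}))  # 0.8
print(round(char_entropy(sample), 3), round(conditional_entropy(sample), 3))
```

With a real dictionary, the coverage figure depends heavily on which transliteration and which sections the word list was built from.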
- average word length
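Average word length is quick to add. A sketch (hypothetical function name, words assumed to be separated by whitespace) that also returns the full word-length distribution, since two texts can share the same mean but differ in spread:

```python
from collections import Counter

def word_length_stats(text):
    """Return (mean word length, {length: share of word tokens with that length})."""
    lengths = [len(w) for w in text.split()]
    if not lengths:
        return 0.0, {}
    counts = Counter(lengths)
    mean = sum(lengths) / len(lengths)
    dist = {k: counts[k] / len(lengths) for k in sorted(counts)}
    return mean, dist

mean, dist = word_length_stats("daiin chedy ol")
print(mean, dist)
```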
Do you mean the text should be comparable to Voynichese words, or to Voynichese text?

If the latter, I'd add the alliteration effects, as well as various line patterns, i.e. the behaviour we see that depends on the position of the word in both the line and the paragraph. But line patterns are not always easy to compare, since there are differences between scribes, and even between quires that seem to be by the same scribe.
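One simple way to probe these line patterns is to compare which glyphs begin line-initial words with which glyphs begin all other words. The sketch below assumes one transliterated line per string and, for simplicity, treats each character as a glyph, so EVA 'ch' and 'sh' are counted under 'c' and 's'; a real version would need a proper glyph tokenizer, a paragraph-start flag, and the scribe/quire split mentioned above. The function name is hypothetical.

```python
from collections import Counter

def initial_glyph_by_position(lines):
    """Share of each word-initial character among line-initial words vs. all other words.

    Returns {char: (share among line-initial words, share among other words)}.
    """
    first, rest = Counter(), Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue
        first[words[0][0]] += 1          # the word that opens the line
        for w in words[1:]:
            rest[w[0]] += 1              # every later word on the line
    n_first = sum(first.values()) or 1   # avoid dividing by zero on empty input
    n_rest = sum(rest.values()) or 1
    return {ch: (first[ch] / n_first, rest[ch] / n_rest)
            for ch in sorted(set(first) | set(rest))}

lines = ["pchedy qokeedy daiin", "tshedy qokedy chol", "daiin chol shol"]
for ch, (line_start, elsewhere) in initial_glyph_by_position(lines).items():
    print(ch, round(line_start, 2), round(elsewhere, 2))
```

Large gaps between the two shares (as expected for gallows such as p and t at line start) are the kind of positional effect a verifier could score.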
Ah yes, I meant Voynichese text, like a block of 400 or so vords.

By alliteration you mean things like Jackson sequences or quasi-repetition? Yep (the code to check for that... phew... a monster).
Line patterns, like line-start and paragraph-start vords/glyphs? Fair point (maybe a bit much for version 0.1).
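Detecting Jackson sequences in full is indeed a bigger job, but a crude proxy for quasi-repetition is cheap: the share of adjacent word pairs that are identical or differ by a single edit (Levenshtein distance). This is a stand-in chosen for illustration, not a definition of Jackson sequences; the function names and the `max_dist` threshold are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions), one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def near_repeat_rate(words, max_dist=1):
    """Share of adjacent word pairs that are identical or within `max_dist` edits."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(edit_distance(a, b) <= max_dist for a, b in pairs) / len(pairs)

print(near_repeat_rate("qokeedy qokedy qokedy daiin chol".split()))  # 0.5
```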
(27-09-2022, 07:15 PM) RobGea wrote: If some computer program were created, call it the 'Voynichese Verifier' [...]

I can describe what I have used, without claiming that it is the best method.

I have computed the similarity of bigram distributions. These are basically the (approximately) 25 × 25 numbers that make up the colourful square pictures on [link], and they add up to one. This works by far the best if spaces are included in the frequencies. (The word-length distribution is then also partially captured.)

For the similarity, I used the [link], which is between 0 and 1, where higher values mean greater similarity. The characters have to be sorted from high to low individual character frequency. One can force the space character into the first position, but usually this is where it ends up anyway.

This is sensitive enough to notice the differences between the Currier languages and their dialects.
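A sketch of this comparison under two assumptions: the similarity measure is the Bhattacharyya coefficient (which is what the reply below guesses), and sorting characters by frequency serves to align texts written in different alphabets, so that cell (i, j) means "the i-th most frequent character followed by the j-th most frequent". Texts are padded with a space at each end so that word-initial and word-final characters also pair with the space. Function names and the 25-character cut-off are illustrative.

```python
import math
from collections import Counter

def rank_bigram_distribution(text, size=25, space_first=True):
    """Bigram frequencies indexed by character frequency rank, spaces included.

    Only bigrams whose two characters fall within the top `size` ranks are kept,
    and the kept cells are normalised to sum to one.
    """
    s = " " + " ".join(text.split()) + " "
    freq = Counter(s)
    # Most frequent first; ties broken alphabetically so the order is reproducible.
    order = sorted(freq, key=lambda ch: (-freq[ch], ch))
    if space_first:
        order.remove(" ")        # always present because of the padding
        order.insert(0, " ")
    rank = {ch: i for i, ch in enumerate(order[:size])}
    counts = Counter((rank[a], rank[b]) for a, b in zip(s, s[1:])
                     if a in rank and b in rank)
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}

def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p * q) over cells: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(math.sqrt(p[cell] * q.get(cell, 0.0)) for cell in p)

# Same structure in two different alphabets gives a coefficient of (about) 1.0.
print(bhattacharyya_coefficient(rank_bigram_distribution("ab ab"),
                                rank_bigram_distribution("xy xy")))
```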
Hi Rene,
Years ago I experimented with the Bhattacharyya distance (D_B on [link]), but I dropped it because it ranges from 0 to infinity. On the other hand, the Bhattacharyya coefficient (BC) fits your description, so I guess this is what you are referring to: range 0 to 1, "where higher values mean greater similarity".
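For reference, the two quantities, for distributions p and q over the same cells x (here, the bigram cells):

$$
BC(p,q) = \sum_{x} \sqrt{p(x)\,q(x)} \in [0,1], \qquad D_B(p,q) = -\ln BC(p,q) \in [0,\infty]
$$

D_B becomes infinite when the two distributions share no cell (BC = 0), which is the unbounded range mentioned above; BC = 1 exactly when the two distributions are identical.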



Ideally, trigrams could add something. For instance, by including spaces as suggested by Rene, one could measure the high frequency of y.q (a word ending in y followed by a word starting with q) and other word-boundary effects. If one wanted to include line-boundary effects, the approach proposed by Rene could be extended with a symbol for line breaks.
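A sketch of the trigram extension, assuming one transliterated line per string. Spaces are kept as symbols, and line breaks are marked with a hypothetical extra symbol ('/' here, assumed not to occur in the transliteration). A word ending in y followed by one starting with q then shows up as the trigram ('y', ' ', 'q'), while a y at the end of a line followed by a q at the start of the next produces line-boundary trigrams instead.

```python
from collections import Counter

LINE_BREAK = "/"  # hypothetical line-break symbol; must not occur in the transliteration

def trigram_distribution(lines, mark_line_breaks=True):
    """Relative trigram frequencies with word spaces (and optionally line breaks) as symbols."""
    joiner = " " + LINE_BREAK + " " if mark_line_breaks else " "
    s = " " + joiner.join(" ".join(line.split()) for line in lines) + " "
    counts = Counter(zip(s, s[1:], s[2:]))
    total = sum(counts.values())
    return {tri: c / total for tri, c in counts.items()}

dist = trigram_distribution(["daiin chey", "qokedy ol"])
print(dist.get(("y", " ", "q"), 0.0), dist.get(("y", " ", LINE_BREAK), 0.0))
```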

One could also consider the MATTR (moving-average type-token ratio) measures [link]. These of course require a sufficient number of words, but the most peculiar feature of Voynichese shows up with very small windows (w = 5), and 400 words should be enough for that.
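MATTR slides a window of w words across the text, computes the type-token ratio (distinct words divided by w) in each window, and averages the results. A minimal sketch with w = 5 as the default; falling back to the plain type-token ratio when the text is shorter than the window is an assumption of this sketch, not part of the measure's definition.

```python
def mattr(words, window=5):
    """Moving-average type-token ratio: mean of (distinct words / window) over all windows."""
    if len(words) < window:
        return len(set(words)) / len(words) if words else 0.0
    ratios = [len(set(words[i:i + window])) / window
              for i in range(len(words) - window + 1)]
    return sum(ratios) / len(ratios)

print(mattr("daiin daiin chedy qokeedy daiin ol".split()))
```

For a 400-word sample this is only about 400 small set computations, so speed is not a concern.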

For whatever measure, there is the problem mentioned by Tavie of the different "dialects", e.g. Currier A, Currier B, pharmacese (the small-plants aka pharma section belongs to A but has a distinctively high frequency of EVA:eol), etc. One should compare the candidate text with the individual dialects, but the boundaries between the dialects are blurred. One possibility is to pick sections where both the illustrations and the scribe are homogeneous (e.g. "Herbal A", "pharma", "Q13").
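Comparing the candidate against each section separately, rather than against the manuscript as a whole, could look like the sketch below. The vocabulary-overlap (Jaccard) score is only a stand-in chosen for brevity; any of the measures discussed in this thread could be passed as `similarity` instead. The section texts in the example are invented toy strings, not real transliterations.

```python
def jaccard(a_words, b_words):
    """Vocabulary overlap |A & B| / |A | B| between two word lists."""
    a, b = set(a_words), set(b_words)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_sections(candidate, sections, similarity=jaccard):
    """Score the candidate against each reference section; best match first."""
    cand = candidate.split()
    scores = {name: similarity(cand, text.split()) for name, text in sections.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy reference texts keyed by section names used above (contents invented).
sections = {"Herbal A": "daiin chol chor shol", "Q13": "qokedy qokeedy shedy chedy"}
print(rank_sections("qokedy shedy ol", sections))
```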
(28-09-2022, 08:48 AM) MarcoP wrote: For whatever measure, there is the problem mentioned by Tavie of the different "dialects", e.g. Currier A, Currier B, pharmacese (the small-plants aka pharma section belongs to A but has a distinctively high frequency of EVA:eol), etc.
There is a similar problem inside Currier A when it is analyzed by page. Two examples:
The frequency of [link] is very low or even null on some pages, such as [link] and f36r, and very high on others, such as [link] and f31v.
The frequency of 89 [link]: none in f25v, very common in [link] and f26v.
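Page-level differences like these can be measured by counting a glyph sequence on each page and normalising by page length. A sketch, assuming each page's transliterated text is available as a string keyed by an identifier; since the glyphs in the examples above did not survive, the pattern, page ids and texts below are placeholders.

```python
def per_page_rate(pages, pattern):
    """Occurrences of `pattern` per 1000 glyph characters (spaces excluded) on each page.

    str.count is used, so overlapping matches are not counted separately.
    """
    rates = {}
    for page, text in pages.items():
        n_chars = len(text.replace(" ", ""))
        rates[page] = 1000 * text.count(pattern) / n_chars if n_chars else 0.0
    return rates

print(per_page_rate({"page_1": "chol daiin", "page_2": "ol shol qol"}, "ol"))
```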
(02-10-2022, 07:18 PM) Juan_Sali wrote: There is a similar problem inside Currier A when it is analyzed by page. Two examples. [...]

It seems you have been misled by the fact that, in the herbal, A and B bifolios have been mixed in an apparently random way. The differences you point out are not "inside Currier A"; they are some of the differences between A and B. [link]; f31r, f31v, f26r and [link] are B, while f14v, [link] and [link] belong to A. According to Lisa Fagin Davis' analysis, the scribes are also different: scribe 2 for f31 and f26, scribe 1 for f14, f36 and f25.
I've been going through some 15th-century heraldic texts recently. I can't read that stuff; maybe one in a hundred can be validated. [Bregenz is unusual.] One of the few things that works, however, is the use of the German 'von'. It's about *how* the word was written. In some texts it is 'vo' with a horizontal bar above the 'o', as an abbreviation. In other examples, the final curve of the 'n' descends well below the line. And in others again, the final 'n' rises like the Voynichese 'am'.

They are only the same word because they *can be read*, not because of their similar orthography.

So, if the VMS 'languages' are correlated with the 'scribes', then some of the differences between the languages could be due to spelling and/or orthography. Is that how things look?
(03-10-2022, 08:29 PM) R. Sale wrote: [Bregenz is unusual.]

Unusual?
Bregenz is a city 80 km from me.
[link]

And in Alemannic:
[link]