The Voynich Ninja

Full Version: Experimental replica of VMS properties with a given corpus
Pages: 1 2 3 4 5 6 7
(11-04-2019, 11:00 PM)bi3mw Wrote: @Anton: Sorry, [link] again the corpus.

I did an entropy calculation. If spaces are counted as characters, I get:

Code:
single-char:    4.014
bigram:         7.256
conditional:    3.242

If spaces are just treated as separators, and only characters within words are considered, then:

Code:
single-char:    3.992
bigram:         7.101
conditional:    3.109
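
The post does not say how these numbers were computed. What follows is a minimal sketch under one assumption: that "conditional" is the bigram entropy minus the single-character entropy. Both sets of figures above fit that reading (7.256 - 4.014 = 3.242 and 7.101 - 3.992 = 3.109). The function names and the `spaces_as_chars` switch are made up for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a Counter of observed items."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def char_entropies(text, spaces_as_chars=True):
    """Return (single-char, bigram, conditional) entropies in bits.

    spaces_as_chars=True  : the space is one more symbol of the alphabet.
    spaces_as_chars=False : spaces only separate words; bigrams never
                            cross a word boundary.
    """
    words = text.split()
    # Either one long string with single spaces, or each word on its own.
    units = [" ".join(words)] if spaces_as_chars else words

    singles = Counter()
    bigrams = Counter()
    for unit in units:
        singles.update(unit)                   # count characters
        bigrams.update(zip(unit, unit[1:]))    # count adjacent pairs

    h_single = entropy(singles)
    h_bigram = entropy(bigrams)
    # Assumption: conditional entropy taken as the difference of the two.
    return h_single, h_bigram, h_bigram - h_single
```

Calling `char_entropies(corpus)` and `char_entropies(corpus, spaces_as_chars=False)` on the same text would give the two variants side by side. On very short samples the difference can come out negative because of edge effects, so this only makes sense for texts of at least a few thousand characters.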

Looking at Table 2 on this page: [link]
the above values match almost exactly with the Latin texts of Pliny and Mattioli.
My calculations:

h0 = 4.70
h1 = 4.01
h2 = 3.24

Same values as Rene's :)

h0 - h1 = 0.69
h1 - h2 = 0.77

For the VMS (Herbal), h1 - h2 is 1.5 ... 1.9; see here: [link]
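
As a quick check of the arithmetic above, here is a small sketch. It assumes h0 is the usual zeroth-order entropy, log2 of the number of distinct symbols (the post does not define it), so 2**h0 gives back the alphabet size. The function name `entropy_steps` is hypothetical.

```python
def entropy_steps(h0, h1, h2):
    """Return (implied alphabet size, h0 - h1, h1 - h2).

    Assumes h0 = log2(number of distinct symbols), which is the common
    definition but is not stated in the post.
    """
    return 2 ** h0, h0 - h1, h1 - h2


size, drop_01, drop_12 = entropy_steps(4.70, 4.01, 3.24)
print(round(size))        # 26
print(round(drop_01, 2))  # 0.69
print(round(drop_12, 2))  # 0.77
```

So the corpus loses 0.77 bits going from h1 to h2, against the 1.5 ... 1.9 quoted for the VMS (Herbal).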

@ JKP

Actually, it makes sense to take any contemporary text from a similar field (herbal, pharma, alchemy...) and transcribe ten thousand characters, to have at least some benchmark to begin with.
In principle, it should be possible to find a decent OCR of an early printed book with abbreviations. Of course, the OCR will not correctly transcribe the abbreviations, but it could work on "regular" characters [one would have to edit the abbreviations, hopefully with a semi-automatic process]. The quality of the [link] is terrible, but maybe there are other texts with better scans and better results.

[attachment=2808]


archive.org OCR:
vero cadebbtl^ pnacccebat birccte in alm mfenus bns trespcdcs.£t farfsj bSiat fcjrcalamos Cijrcdietes 
Ocbaftili.tresci: pno latcre-^r eres ei allo^d^ furiusobUq.^uoufcB Etmgcret ad alttmdxnebafhCJln 
baftiU to quamwz qpbi infUr mios.l^uos aUd pom^s oicnt.5lta ® boo apbi pofm coira almfa#
No, I think that's not suitable. A printed book would be only moderately abbreviated, and we can produce virtually the same result by taking e.g. bi3mw's text and introducing some common abbreviations ourselves. I have done that before, and apart from a slight increase of entropy there was nothing interesting.

What we need is some heavily abbreviated stuff, which would be a manuscript.
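
For anyone who wants to repeat the "introduce abbreviations ourselves" experiment mentioned above, here is a rough sketch. The abbreviation table is purely illustrative and not taken from the thread: each uppercase letter is an arbitrary stand-in for a single scribal sign, and the input is assumed to be lowercase Latin. `abbreviate` and `single_char_entropy` are hypothetical names.

```python
import math
import re
from collections import Counter

# Illustrative table only (an assumption, not the list used in the thread).
# Uppercase letters stand in for single abbreviation glyphs.
ABBREVIATIONS = [
    (r"\bper", "P"),   # word-initial "per"
    (r"\bpro", "R"),   # word-initial "pro"
    (r"\bcon", "C"),   # word-initial "con"
    (r"rum\b", "M"),   # word-final "-rum"
    (r"us\b", "U"),    # word-final "-us"
]


def abbreviate(text):
    """Replace each pattern with its one-symbol stand-in, in table order."""
    for pattern, symbol in ABBREVIATIONS:
        text = re.sub(pattern, symbol, text)
    return text


def single_char_entropy(text):
    """First-order character entropy in bits, spaces included."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


sample = "vero candelabrum procedebat inferius"
print(abbreviate(sample))
print(single_char_entropy(sample), single_char_entropy(abbreviate(sample)))
```

A one-line sample says nothing about the size of the effect; the comparison only becomes meaningful on a text of ten thousand characters or more.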
(12-04-2019, 09:13 AM)MarcoP Wrote: ...

archive.org OCR:
vero cadebbtl^ pnacccebat birccte in alm mfenus bns trespcdcs.£t farfsj bSiat fcjrcalamos Cijrcdietes 
Ocbaftili.tresci: pno latcre-^r eres ei allo^d^ furiusobUq.^uoufcB Etmgcret ad alttmdxnebafhCJln 
baftiU to quamwz qpbi infUr mios.l^uos aUd pom^s oicnt.5lta ® boo apbi pofm coira almfa#

That is pretty bad OCR, and it's not even handwritten text.

vero cadebbtl^ pnacccebat birccte in alm mfenus bns trespcdcs.£t farfsj bSiat fcjrcalamos Cijrcdietes
vero ca'delabri q' procedebat directe in altu' inferius h'ns tres pedes. Et sursu[m] he'b at sex calamos egredie'tes
vero candelabri qui procedebat directe, etc.
Heavily abbreviated text has a lot of variation in the characters.
Most texts weren't heavily abbreviated (I'm not sure all the scribes even knew all the abbreviations). They used the abbreviations I listed above and sometimes only half of those.

I'm not sure it has to be heavily abbreviated, but I'm definitely interested in hearing what Anton has in mind.
(12-04-2019, 09:47 AM)Anton Wrote: What we need is some heavily abbreviated stuff, which would be a manuscript.

Any Pseudo-Apuleius that would suit you?
How much text do you feel it should be (this is a question for everyone) to be useful in terms of comparison?

A page? 10 pages? 20? 50? 100?
I also quickly made the bigram plot for the "Corpus_2" text:

[attachment=2814]

It is very similar to the other ones for Latin on [link].
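
The attachment is not reproduced here. Assuming the plot is a grid of bigram frequencies with symbols ordered from most to least frequent (a guess, since the post does not describe it), the underlying table could be built roughly like this. `bigram_matrix` is a hypothetical name.

```python
from collections import Counter


def bigram_matrix(text):
    """Return (symbols, matrix) for a bigram frequency grid.

    symbols : characters sorted by descending single-character frequency
    matrix  : matrix[i][j] is the relative frequency of symbols[i]
              immediately followed by symbols[j]
    Runs of whitespace are collapsed to one space, which is kept as a symbol.
    """
    normalized = " ".join(text.split())
    singles = Counter(normalized)
    symbols = [char for char, _ in singles.most_common()]

    pairs = Counter(zip(normalized, normalized[1:]))
    total = sum(pairs.values())

    matrix = [
        [pairs[(a, b)] / total if total else 0.0 for b in symbols]
        for a in symbols
    ]
    return symbols, matrix
```

The rows and columns can then go to any plotting tool; putting two texts through the same function makes the grids directly comparable.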
Quote:Any Pseudo-Apuleius that would suit you?

If you ask my personal preference, then I'd be interested in some High German text, not in a Latin one. Something like this: [link]. This one is only moderately abbreviated, but I have seen another MS on e-codices recently, very abbreviated, though I can't recall what it was.

Quote:How much text do you feel it should be (this is a question for everyone) to be useful in terms of comparison?

A page? 10 pages? 20? 50? 100?

That's not a question of the number of pages, but rather of the number of characters. I think ten thousand would be quite enough. Several tens of thousands are definitely enough.