The Voynich Ninja
Experimental replica of VMS properties with a given corpus - Printable Version


RE: Experimental replica of VMS properties with a given corpus - ReneZ - 12-04-2019

(11-04-2019, 11:00 PM)bi3mw Wrote: @Anton: Sorry, [link] again the corpus.

I did an entropy calculation. If spaces are counted as characters, I get:

Code:
single-char:    4.014
bigram:         7.256
conditional:    3.242

If spaces are just treated as separators, and only characters within words are considered, then:

Code:
single-char:    3.992
bigram:         7.101
conditional:    3.109

Looking at Table 2 on this page: [link]
the above values match almost exactly with the Latin texts of Pliny and Mattioli.
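
For anyone who wants to reproduce figures of this kind, here is a minimal sketch in Python. It is not necessarily how the numbers above were computed. It assumes that the conditional value is simply bigram entropy minus single-character entropy (which is consistent with the figures above: 7.256 - 4.014 = 3.242 and 7.101 - 3.992 = 3.109), and that "only characters within words" means no bigram is counted across a word boundary. The function names are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a Counter of observed items."""
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())


def char_entropies(text, spaces_as_chars=True):
    """Return (single-char, bigram, conditional) entropy in bits."""
    words = text.split()
    if spaces_as_chars:
        # One stream in which a single space separates the words.
        streams = [" ".join(words)]
    else:
        # Each word is its own stream, so no bigram crosses a word boundary.
        streams = words
    singles = Counter()
    bigrams = Counter()
    for s in streams:
        singles.update(s)
        bigrams.update(zip(s, s[1:]))
    h_single = entropy(singles)
    h_bigram = entropy(bigrams)
    # Assumption: conditional entropy taken as bigram minus single-char entropy.
    return h_single, h_bigram, h_bigram - h_single
```

Calling `char_entropies(text, spaces_as_chars=False)` on the corpus text would give the second set of values under these assumptions. On very short samples the difference can come out misleadingly low, so it is only meaningful on a text of reasonable length.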


RE: Experimental replica of VMS properties with a given corpus - Anton - 12-04-2019

My calculations:

h0 = 4.70
h1 = 4.01
h2 = 3.24

Same values as Rene's :)

h0 - h1 = 0.69
h1 - h2 = 0.77

For the VMS (Herbal), h1 - h2 is 1.5 ... 1.9; see here: [link]
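
The differences above can be checked with a few lines of Python. This is only a sketch: it assumes that h0 is the zero-order entropy, i.e. log2 of the number of distinct symbols, so that 2**h0 gives the implied alphabet size. The variable names are made up for illustration.

```python
from math import log2

# Values as posted above (bits per character).
h0, h1, h2 = 4.70, 4.01, 3.24

# Assumption: h0 = log2(alphabet size), so the alphabet size is about 2**h0.
implied_alphabet = round(2 ** h0)
h0_check = round(log2(implied_alphabet), 2)

drop_first = round(h0 - h1, 2)   # loss due to uneven character frequencies
drop_second = round(h1 - h2, 2)  # loss due to dependence on the previous character

print(implied_alphabet, h0_check, drop_first, drop_second)
```

Under that assumption, h0 = 4.70 corresponds to 26 distinct symbols.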

@ JKP

Actually, it makes sense to take any contemporary text from a similar field (herbal, pharma, alchemy...) and transcribe ten thousand characters of it, to have at least some benchmark to begin with.


RE: Experimental replica of VMS properties with a given corpus - MarcoP - 12-04-2019

In principle, it should be possible to find a decent OCR of an early printed book with abbreviations. Of course, the OCR will not correctly transcribe the abbreviations, but it could work on "regular" characters [one should edit the abbreviations, hopefully with a semi-automatic process]. The quality of the [link] is terrible, but maybe there are other texts with better scans and better results.

   


archive.org OCR:
vero cadebbtl^ pnacccebat birccte in alm mfenus bns trespcdcs.£t farfsj bSiat fcjrcalamos Cijrcdietes 
Ocbaftili.tresci: pno latcre-^r eres ei allo^d^ furiusobUq.^uoufcB Etmgcret ad alttmdxnebafhCJln 
baftiU to quamwz qpbi infUr mios.l^uos aUd pom^s oicnt.5lta ® boo apbi pofm coira almfa#


RE: Experimental replica of VMS properties with a given corpus - Anton - 12-04-2019

No, I think that's not suitable. A printed book would be only moderately abbreviated, and we can produce virtually the same result by taking e.g. bi3mw's text and introducing some common abbreviations ourselves. I have done that before, and apart from a slight increase in entropy there was nothing interesting.

What we need is some heavily abbreviated stuff, which would be a manuscript.
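
A rough sketch of what "introducing some common abbreviations ourselves" could look like in Python. This is not the procedure actually used: the substitution table is hypothetical, and the replacement symbols are arbitrary placeholders standing in for scribal signs. The point is only that each abbreviated letter group becomes a single character before the entropy is measured.

```python
import re
from collections import Counter
from math import log2

# Hypothetical table: (regex for a letter group, one placeholder symbol).
# The input is lowercased first, so uppercase placeholders cannot collide.
ABBREVIATIONS = [
    (r"\bper", "P"),
    (r"\bpro", "R"),
    (r"\bcon", "9"),
    (r"que\b", "Q"),
    (r"us\b", "9"),
    (r"\bet\b", "&"),
]


def abbreviate(text):
    """Apply the placeholder abbreviations in table order."""
    text = text.lower()
    for pattern, sign in ABBREVIATIONS:
        text = re.sub(pattern, sign, text)
    return text


def h1(text):
    """Single-character entropy in bits, spaces counted as characters."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * log2(n / total) for n in counts.values())
```

Comparing `h1(text.lower())` with `h1(abbreviate(text))` on the same corpus shows how much such a table shifts the single-character entropy.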


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 12-04-2019

(12-04-2019, 09:13 AM)MarcoP Wrote: ...

archive.org OCR:
vero cadebbtl^ pnacccebat birccte in alm mfenus bns trespcdcs.£t farfsj bSiat fcjrcalamos Cijrcdietes 
Ocbaftili.tresci: pno latcre-^r eres ei allo^d^ furiusobUq.^uoufcB Etmgcret ad alttmdxnebafhCJln 
baftiU to quamwz qpbi infUr mios.l^uos aUd pom^s oicnt.5lta ® boo apbi pofm coira almfa#

That is pretty bad OCR, and it's not even handwritten text.

vero cadebbtl^ pnacccebat birccte in alm mfenus bns trespcdcs.£t farfsj bSiat fcjrcalamos Cijrcdietes
vero ca'delabri q' procedebat directe in altu' inferius h'ns tres pedes. Et sursu[m] he'b at sex calamos egredie'tes
vero candelabri qui procedebat directe, etc.


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 12-04-2019

Heavily abbreviated text has a lot of variation in the characters.
Most texts weren't heavily abbreviated (I'm not sure all the scribes even knew all the abbreviations). They used the abbreviations I listed above and sometimes only half of those.

I'm not sure it has to be heavily abbreviated, but I'm definitely interested in hearing what Anton has in mind.


RE: Experimental replica of VMS properties with a given corpus - MarcoP - 12-04-2019

(12-04-2019, 09:47 AM)Anton Wrote: What we need is some heavily abbreviated stuff, which would be a manuscript.

Any Pseudo-Apuleius that would suit you?


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 12-04-2019

How much text do you feel it should be (this is a question for everyone) to be useful in terms of comparison?

A page? 10 pages? 20? 50? 100?


RE: Experimental replica of VMS properties with a given corpus - ReneZ - 12-04-2019

I also quickly made the bigram plot for the "Corpus_2" text:

   

It is very similar to the other ones for Latin on [link].


RE: Experimental replica of VMS properties with a given corpus - Anton - 12-04-2019

Quote:Any Pseudo-Apuleius that would suit you?

If you ask my personal preference, then I'd be interested in some High German text, not in a Latin one. Something like this: [link]. This one is only moderately abbreviated, but I recently saw another MS on e-codices, very heavily abbreviated, and I can't recall which one it was.

Quote:How much text do you feel it should be (this is a question for everyone) to be useful in terms of comparison?

A page? 10 pages? 20? 50? 100?

That's not a question of the number of pages, but rather of the number of characters. I think ten thousand would be quite enough. Several tens of thousands are definitely enough.