• Experimental replica of VMS properties with a given corpus
  • RE: Experimental replica of VMS properties with a given corpus

    ReneZ > 12-04-2019, 08:44 AM

    (11-04-2019, 11:00 PM)bi3mw Wrote: @Anton: Sorry, [link] again the corpus.

    I did an entropy calculation. If spaces are counted as characters, I get:

    Code:
    single-char:    4.014
    bigram:         7.256
    conditional:    3.242

    If spaces are just treated as separators, and only characters within words are considered, then:

    Code:
    single-char:    3.992
    bigram:         7.101
    conditional:    3.109

    Looking at Table 2 on this page: [link]
    the above values match almost exactly with the Latin texts of Pliny and Mattioli.
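For anyone who wants to reproduce this kind of figure on their own text, here is a minimal sketch. It is not the tool used for the numbers above. It assumes the conditional value is the bigram entropy minus the single-character entropy (which is consistent with the figures quoted: 7.256 - 4.014 = 3.242 and 7.101 - 3.992 = 3.109), and the function names and the `spaces_as_chars` switch are made up for illustration.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of a frequency table (a Counter)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def char_entropies(text, spaces_as_chars=True):
    """Return (single-char, bigram, conditional) entropy in bits.

    spaces_as_chars=True:  the space is one more symbol, and bigrams
                           run across word boundaries.
    spaces_as_chars=False: spaces only separate words, and bigrams are
                           taken inside each word.
    Assumption: conditional = bigram - single.
    """
    words = text.lower().split()
    units = [" ".join(words)] if spaces_as_chars else words
    singles = Counter()
    bigrams = Counter()
    for unit in units:
        singles.update(unit)
        bigrams.update(zip(unit, unit[1:]))
    h_single = shannon_entropy(singles)
    h_bigram = shannon_entropy(bigrams)
    return h_single, h_bigram, h_bigram - h_single


# Tiny demonstration; a real run would pass the whole corpus as one string.
sample = "vero candelabri qui procedebat directe"
print(char_entropies(sample, spaces_as_chars=True))
print(char_entropies(sample, spaces_as_chars=False))
```

A sample this short gives meaningless values; it only shows the two ways of handling spaces.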
  • RE: Experimental replica of VMS properties with a given corpus

    Anton > 12-04-2019, 09:07 AM

    My calculations:

    h0 = 4.70
    h1 = 4.01
    h2 = 3.24

    Same values as Rene's :)

    h0 - h1 = 0.69
    h1 - h2 = 0.77

    For the VMS (Herbal), h1 - h2 is 1.5 ... 1.9; see here: [link]
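A quick arithmetic check of how the numbers in this thread fit together. The only assumption is that h0 is log2 of the alphabet size; an alphabet of 26 symbols reproduces the quoted 4.70, but the thread does not state the alphabet size.

```python
import math

# Values quoted in this thread (bits per character, spaces counted).
h1 = 4.014        # single-character entropy
h_bigram = 7.256  # bigram (character-pair) entropy
h2 = 3.242        # conditional entropy

# Assumption: h0 = log2(alphabet size); 26 symbols give the quoted 4.70.
h0 = math.log2(26)

print(round(h0, 2))       # 4.7
print(round(h0 - h1, 2))  # 0.69
print(round(h1 - h2, 2))  # 0.77
print(round(h1 + h2, 3))  # 7.256, i.e. the quoted bigram value
```

The last line shows that the bigram value is just the sum of the single-character and conditional values.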

    @ JKP

    Actually, it makes sense to take any contemporary text from a similar field (herbal, pharma, alchemy...) and transcribe ten thousand characters, to have at least some benchmark to begin with.
  • RE: Experimental replica of VMS properties with a given corpus

    MarcoP > 12-04-2019, 09:13 AM

    In principle, it should be possible to find a decent OCR of an early printed book with abbreviations. Of course, the OCR will not correctly transcribe the abbreviations, but it could work on "regular" characters [one should edit the abbreviations, hopefully with a semi-automatic process]. The quality of the [link] is terrible, but maybe there are other texts with better scans and better results.

       


    archive.org OCR:
    vero cadebbtl^ pnacccebat birccte in alm mfenus bns trespcdcs.£t farfsj bSiat fcjrcalamos Cijrcdietes 
    Ocbaftili.tresci: pno latcre-^r eres ei allo^d^ furiusobUq.^uoufcB Etmgcret ad alttmdxnebafhCJln 
    baftiU to quamwz qpbi infUr mios.l^uos aUd pom^s oicnt.5lta ® boo apbi pofm coira almfa#
  • RE: Experimental replica of VMS properties with a given corpus

    Anton > 12-04-2019, 09:47 AM

    No, I think that's not suitable. A printed book would be only moderately abbreviated, and we can produce virtually the same result by taking e.g. bi3mw's text and introducing some common abbreviations ourselves. I have done that before, and apart from a slight increase in entropy there was nothing interesting.

    What we need is some heavily abbreviated stuff, which would be a manuscript.
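The "introduce abbreviations ourselves" experiment could be sketched roughly as below. This is not the procedure that was actually used. The substitution rules are hypothetical placeholders (digits and `&` standing in for scribal signs, which assumes the source text contains none of those characters), and the conditional entropy is again taken as bigram entropy minus single-character entropy.

```python
import math
import re
from collections import Counter

# Hypothetical rules, for illustration only: each replaces a common Latin
# letter group with one stand-in symbol. Not a scholarly abbreviation table.
ABBREVIATIONS = [
    (re.compile(r"us\b"), "9"),   # word-final -us
    (re.compile(r"rum\b"), "4"),  # word-final -rum
    (re.compile(r"\bcon"), "7"),  # word-initial con-
    (re.compile(r"que\b"), "&"),  # word-final -que
]


def abbreviate(text):
    """Apply every substitution rule in order and return the new text."""
    for pattern, symbol in ABBREVIATIONS:
        text = pattern.sub(symbol, text)
    return text


def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def conditional_entropy(text):
    """Bigram entropy minus single-character entropy, spaces counted."""
    return entropy(Counter(zip(text, text[1:]))) - entropy(Counter(text))


print(abbreviate("conventus herbarum atque"))  # 7vent9 herba4 at&
```

To compare, one would call `conditional_entropy(text)` and `conditional_entropy(abbreviate(text))` on the full corpus string.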
  • RE: Experimental replica of VMS properties with a given corpus

    -JKP- > 12-04-2019, 09:51 AM

    (12-04-2019, 09:13 AM)MarcoP Wrote: ...

    archive.org OCR:
    vero cadebbtl^ pnacccebat birccte in alm mfenus bns trespcdcs.£t farfsj bSiat fcjrcalamos Cijrcdietes 
    Ocbaftili.tresci: pno latcre-^r eres ei allo^d^ furiusobUq.^uoufcB Etmgcret ad alttmdxnebafhCJln 
    baftiU to quamwz qpbi infUr mios.l^uos aUd pom^s oicnt.5lta ® boo apbi pofm coira almfa#

    That is pretty bad OCR, and it's not even handwritten text.

    vero cadebbtl^ pnacccebat birccte in alm mfenus bns trespcdcs.£t farfsj bSiat fcjrcalamos Cijrcdietes
    vero ca'delabri q' procedebat directe in altu' inferius h'ns tres pedes. Et sursu[m] he'b at sex calamos egredie'tes
    vero candelabri qui procedebat directe, etc.
  • RE: Experimental replica of VMS properties with a given corpus

    -JKP- > 12-04-2019, 09:58 AM

    Heavily abbreviated text has a lot of variation in the characters.
    Most texts weren't heavily abbreviated (I'm not sure all the scribes even knew all the abbreviations). They used the abbreviations I listed above and sometimes only half of those.

    I'm not sure it has to be heavily abbreviated, but I'm definitely interested in hearing what Anton has in mind.
  • RE: Experimental replica of VMS properties with a given corpus

    MarcoP > 12-04-2019, 10:31 AM

    (12-04-2019, 09:47 AM)Anton Wrote: What we need is some heavily abbreviated stuff, which would be a manuscript.

    Any Pseudo-Apuleius that would suit you?
  • RE: Experimental replica of VMS properties with a given corpus

    -JKP- > 12-04-2019, 10:44 AM

    How much text do you feel it should be (this is a question for everyone) to be useful in terms of comparison?

    A page? 10 pages? 20? 50? 100?
  • RE: Experimental replica of VMS properties with a given corpus

    ReneZ > 12-04-2019, 10:58 AM

    I also quickly made the bigram plot for the "Corpus_2" text:

       

    It is very similar to the other ones for Latin on [link].
  • RE: Experimental replica of VMS properties with a given corpus

    Anton > 12-04-2019, 11:01 AM

    Quote:Any Pseudo-Apuleius that would suit you?

    If you ask my personal preference, then I'd be interested in some High German text, not in a Latin one. Something like this: [link]. This one is only moderately abbreviated, but I recently saw another MS on e-codices that was very abbreviated; I can't recall which one it was.

    Quote:How much text do you feel it should be (this is a question for everyone) to be useful in terms of comparison?

    A page? 10 pages? 20? 50? 100?

    That's not a question of the number of pages, but rather of the number of characters. I think ten thousand would be quite enough. Several tens of thousands are definitely enough.
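One way to check whether a given number of characters is "enough" is to compute the statistic on growing prefixes of the transcription and see where it stops moving. A minimal sketch, with assumptions: the conditional entropy is taken as bigram entropy minus single-character entropy, the function names are made up, and `corpus.txt` in the usage note is a hypothetical file name.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def conditional_entropy(text):
    """Bigram entropy minus single-character entropy, spaces counted."""
    return entropy(Counter(zip(text, text[1:]))) - entropy(Counter(text))


def entropy_by_sample_size(text, sizes):
    """Conditional entropy of the first n characters, for each n in sizes.

    Sizes larger than the text are skipped.
    """
    return {n: conditional_entropy(text[:n]) for n in sizes if n <= len(text)}
```

Usage would be something like `entropy_by_sample_size(open("corpus.txt").read(), [1000, 5000, 10000, 50000])`. Estimates from short samples tend to come out too low, because many possible bigrams have not been seen yet, so the interesting point is where the values level off.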