The Voynich Ninja
Experimental replica of VMS properties with a given corpus - Printable Version



RE: Experimental replica of VMS properties with a given corpus - bi3mw - 02-12-2019

Hi @Marco, would you estimate that the sum of the abbreviations is more or less 7.5% of the total text?


RE: Experimental replica of VMS properties with a given corpus - Koen G - 02-12-2019

Thanks, Marco! I don't want to imagine how much work it was to get these results.

The TTR difference is not huge, but still larger than I would have expected - it definitely has an effect. 

When I saw the shifted graph, my reaction was the same as your remark that the correction is likely excessive. The transformation is from raw transcription towards normalized text, but we don't quite know what an EVA text would look like if it were normalized. Or to what extent we are normalizing it already in the first place?

Still, it's good to know the kind of shift normalization causes.


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 02-12-2019

(02-12-2019, 12:31 PM)Aga Tentakulus Wrote:
[Image: attachment.php?aid=3756]
...

I agree that these may be different. One of my transcripts records the differences (and uses a font that shows them).


RE: Experimental replica of VMS properties with a given corpus - MarcoP - 02-12-2019

(02-12-2019, 12:58 PM)bi3mw Wrote: Hi @Marco, would you estimate that the sum of the abbreviations is more or less 7.5% of the total text?

Hi Matthias, there are different ways to answer your question; I hope one of these is what you mean:
Code:
.                              abbreviated  non-abbreviated  % abbreviated (of total)
abbreviated words in ms            1344           790            63.0%
abbreviation characters in ms      1661          9571            14.8%

.                              ms length    printed length   ms / printed
text length in characters         11232         13048            86.1%

14.8% of the characters correspond to abbreviation marks, rather than symbols of the Latin alphabet.
The abbreviated text is 13.9% shorter than the printed text, so on average a single abbreviation character seems to stand for about two ordinary characters. Also, much of abbreviating is actually about saving horizontal space by writing characters as superscripts: if one writes ī ('i' with a macron, i_ in my notation) for "in", two symbols are still used, but they now appear one above the other instead of one after the other. In other words, I expect the saved horizontal space to be considerably more than 14%.
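
The percentages above, and the "about two characters per abbreviation mark" estimate, can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are made up for illustration, and it assumes that the whole difference between the printed and the manuscript length is due to abbreviation marks.

```python
# Counts copied from the table in this post (ten transcribed pages).
abbreviated_words, plain_words = 1344, 790
abbreviation_chars, plain_chars = 1661, 9571
printed_length = 13048  # characters in the printed (expanded) text

ms_length = abbreviation_chars + plain_chars  # 11232
word_share = abbreviated_words / (abbreviated_words + plain_words)
char_share = abbreviation_chars / ms_length
length_ratio = ms_length / printed_length
chars_saved = printed_length - ms_length  # 1816

# Assumption: all of the saving comes from abbreviation marks, so each mark
# stands for itself plus its share of the saved characters.
chars_per_mark = (chars_saved + abbreviation_chars) / abbreviation_chars

print(f"abbreviated words:       {word_share:.1%}")
print(f"abbreviation characters: {char_share:.1%}")
print(f"ms length / printed:     {length_ratio:.1%}")
print(f"printed chars per mark:  {chars_per_mark:.2f}")
```

With these counts the last figure comes out at about 2.09, which is where "two ordinary characters" per abbreviation character comes from.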

(02-12-2019, 02:15 PM)Koen G Wrote: When I saw the shifted graph, my reaction was the same as your remark that the correction is likely excessive. The transformation is from raw transcription towards normalized text, but we don't quite know what an EVA text would look like if it were normalized. Or to what extent we are normalizing it already in the first place?

Hi Koen,
I would add that VMS values should probably be shifted by a smaller amount because they are smaller than Bonaventura's values: e.g. I could have computed the shift as a percentage rather than as an absolute value. The shifted graph is just a way to present the problem visually and of course must not be taken too seriously.
As you say, we don't know much about what normalized Voynichese could look like. This subject is somewhat related to the recent ch/sh discussion started by Davidsch. Also, I wonder if quasi-reduplication was really meant to be the same as reduplication (of course assuming that there is some meaning somewhere). I doubt we can say much about these problems without fully understanding the text.
I am more optimistic that EVA does not add much spurious normalization (i.e. TTR reduction) to the original text; conversely, it is almost certain that all Voynich transcriptions contain many hundreds of errors. The number of uncertain spaces in the Zandbergen-Landini transcription is a good indication of how serious the problem is.

________________

About the effort for this transcription, I should probably refrain from complaining: the number of characters in these ten pages is not much more than 5% of the VMS. Others have done much more work than I did! Since I typically work on texts that I am trying to understand, it could be that knowing the meaning beforehand made things more tedious than usual, taking away the pleasure of "discovery". Another side of the problem is that I defined the encoding system as I went, changing my mind a few times about some details. Having a clear transcription system in mind before one starts would certainly be more efficient.


RE: Experimental replica of VMS properties with a given corpus - bi3mw - 02-12-2019

@Marco: I meant the proportion of abbreviation characters in the manuscript (14.8%). Thank you for the list!


RE: Experimental replica of VMS properties with a given corpus - Koen G - 02-12-2019

(02-12-2019, 04:44 PM)MarcoP Wrote: it could be that knowing the meaning beforehand made things more tedious than usual, taking away the pleasure of "discovery". Another side of the problem is that I defined the encoding system as I went, changing my mind a few times about some details. Having a clear transcription system in mind before one starts would certainly be more efficient.

If you're up for it, you could always repeat the exercise on a new section of the Trinity herbal ;-)


RE: Experimental replica of VMS properties with a given corpus - MarcoP - 04-12-2019

An example of how TTR increases because different occurrences of the same word are abbreviated differently. Here "secundum" is abbreviated as:
  • scdm (with a macron)
  • s;m (using a different abbreviation symbol and only writing the first and last characters)
  • 2m (similar to the English 2nd)
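
The effect on TTR can be illustrated with a toy calculation. The sketch below is not the procedure used for the figures in this thread: the token sequence is invented, and the expansion table is a hypothetical mapping of the three spellings listed above to "secundum".

```python
def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens)

# Hypothetical expansion table for the three spellings listed above.
EXPANSIONS = {"scdm": "secundum", "s;m": "secundum", "2m": "secundum"}

# Made-up token sequence, for illustration only (not taken from the manuscript).
raw = ["scdm", "quod", "s;m", "quod", "2m", "quod"]
normalized = [EXPANSIONS.get(token, token) for token in raw]

raw_ttr = type_token_ratio(raw)                # 4 types over 6 tokens
normalized_ttr = type_token_ratio(normalized)  # 2 types over 6 tokens
print(f"raw TTR: {raw_ttr:.2f}, normalized TTR: {normalized_ttr:.2f}")
```

In the raw sequence the three spellings count as three different types; after expansion they collapse into one, so the normalized TTR is lower.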



RE: Experimental replica of VMS properties with a given corpus - -JKP- - 04-12-2019

Yes, very true. Sometimes the same word will be abbreviated three different ways in the same line.


RE: Experimental replica of VMS properties with a given corpus - nickpelling - 04-12-2019

(02-12-2019, 10:58 AM)Aga Tentakulus Wrote: Here's the sample I posted. This was generated in about 15 minutes, but I could probably get closer to Voynichese if I took more time:

Word-initial ii-, yal-, yor-, eeol- and qool- (each of which appears in this sample) are all rarities in the "real thing", as are word-initial yar- and yol-. Note that qoor- appears slightly more often than qool- [8 times rather than 5 times], but they're both pretty rare.

In short, word-initial y- seems (like so many other Voynichese glyphs) to have its own idiosyncratic set of rules that govern what glyphs immediately follow it. But that's probably a topic for another thread. ;-)
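
For anyone who wants to check word-initial frequencies like these against a transcription, a minimal Python sketch follows. It is not necessarily how the counts above were obtained: the function name is hypothetical and the sample words are invented EVA-like strings, so the printed numbers mean nothing in themselves.

```python
from collections import Counter

def count_word_initial(words, prefixes):
    """Count how many words start with each of the given glyph sequences."""
    counts = Counter({prefix: 0 for prefix in prefixes})
    for word in words:
        for prefix in prefixes:
            if word.startswith(prefix):
                counts[prefix] += 1
    return counts

PREFIXES = ["ii", "yal", "yor", "yar", "yol", "eeol", "qool", "qoor"]

# Made-up EVA-like words, for illustration only; a real count would read the
# words from a transcription file instead.
sample_words = ["yaly", "qooly", "qoorar", "daiin", "qoory", "chedy"]

for prefix, n in count_word_initial(sample_words, PREFIXES).items():
    print(f"{prefix}-: {n}")
```

Running the same function over a generated sample and over a real transcription would show which word-initial sequences are over-represented in the sample.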


RE: Experimental replica of VMS properties with a given corpus - MarcoP - 10-12-2019

[Link] has diplomatic transcriptions of eight different manuscripts of Chaucer's work (selectable by the short IDs on the top right). It looks like a useful resource for getting a grasp of the impact of scribal preferences.

A few examples for the first lines of the Reeve's Tale:

Hg:
AT Trompyngtoū / nat fer fro Cantebrygge
Ther gooth a brook / and ouer that a brygge
Vp on the which brook ther stant a Melle

El:
AT Trumpyngtoū / nat fer fro Cantebrigge
Ther gooth a brook / and ouer that a brigge
Vp on the which brook / ther stant a Melle

Cp:
AT Trumpyngtoū nought fer fro Cantabregge
Ther goþ a brook and ouer þat a bregge
Vppon þe whiche brook þer stant a Melle

Ha4:
AT Trompyngtoū nat fer fro Cantebrigge
Ther goth a brook and ouer þat a brigge
Vpon þe whiche brook þer stant a melle
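
A rough way to quantify these differences is to compare the sets of word forms in each witness. The Python sketch below uses only the three lines quoted above; lower-casing and dropping the virgule ("/") are assumptions made for this illustration, not part of the diplomatic transcriptions.

```python
WITNESSES = {
    "Hg": "AT Trompyngtoū / nat fer fro Cantebrygge "
          "Ther gooth a brook / and ouer that a brygge "
          "Vp on the which brook ther stant a Melle",
    "El": "AT Trumpyngtoū / nat fer fro Cantebrigge "
          "Ther gooth a brook / and ouer that a brigge "
          "Vp on the which brook / ther stant a Melle",
    "Cp": "AT Trumpyngtoū nought fer fro Cantabregge "
          "Ther goþ a brook and ouer þat a bregge "
          "Vppon þe whiche brook þer stant a Melle",
    "Ha4": "AT Trompyngtoū nat fer fro Cantebrigge "
           "Ther goth a brook and ouer þat a brigge "
           "Vpon þe whiche brook þer stant a melle",
}

def word_types(text):
    """Lower-cased set of word forms, ignoring the virgule ('/')."""
    return {token.lower() for token in text.split() if token != "/"}

types = {ms: word_types(text) for ms, text in WITNESSES.items()}
pooled = set().union(*types.values())        # forms found in any witness
shared = set.intersection(*types.values())   # forms found in every witness

print("in Hg but not in El:", sorted(types["Hg"] - types["El"]))
for ms, forms in types.items():
    print(f"{ms}: {len(forms)} types")
print(f"pooled: {len(pooled)} types, shared by all four: {len(shared)}")
```

Pooling the four witnesses gives many more types than any single witness has, although they all carry the same three lines of text: the extra types come from spelling preferences alone.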