The Voynich Ninja
Experimental replica of VMS properties with a given corpus - Printable Version


Experimental replica of VMS properties with a given corpus - bi3mw - 11-04-2019

For some time I have been thinking about a fundamental question. Have you ever tried to use a (medieval) corpus to generate text that has similar characteristics to the VMS using cryptographic methods? The problem is approached here from the other side, so to speak. I don't mean random text, but meaningful text that can be decrypted again. For example, is it possible to replicate the steep curve of word lengths? Are immediate, multiple word repetitions possible?

It is therefore a matter of experimentally replicating the known properties of the VMS with a given corpus (modeling). Does that make sense in your opinion, or is that too experimental?
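
The two properties named above (the word-length curve and immediate word repetitions) can be measured on any candidate text before comparing it with the VMS. The following is only a minimal Python sketch: the function names are made up for illustration, words are assumed to be separated by whitespace, and the sample string is an invented Voynichese-like line, not a quote from a transcription.

```python
from collections import Counter


def word_length_distribution(text):
    """Return {word length: share of all words} for whitespace-separated words."""
    words = text.split()
    if not words:
        return {}
    counts = Counter(len(w) for w in words)
    return {length: counts[length] / len(words) for length in sorted(counts)}


def immediate_repetitions(text):
    """Count positions where a word is identical to the word directly before it."""
    words = text.split()
    return sum(1 for prev, cur in zip(words, words[1:]) if prev == cur)


# Invented sample line (assumption, for illustration only).
sample = "qokedy qokedy dal qokedy qokedy shedy"
print(word_length_distribution(sample))
print(immediate_repetitions(sample))
```

Running the same two functions on an encrypted corpus and on a VMS transcription gives numbers that can be compared directly.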


RE: Experimental replica of VMS properties with a given corpus - nablator - 11-04-2019

Quote:Have you ever tried to use a (medieval) corpus to generate text that has similar characteristics to the VMS using cryptographic methods?
Yes. Generating encrypted pseudo-Voynichese is very helpful, not only to reassure oneself that the method seems to work, but also to create data for decryption: the result can be checked easily, as the plaintext is known.


RE: Experimental replica of VMS properties with a given corpus - Koen G - 11-04-2019

So... which steps are certainly required to do it?


RE: Experimental replica of VMS properties with a given corpus - Anton - 11-04-2019

I haven't even seen statistical calculations across medieval corpora of comparable topics. Say, take an early 15th c. German pharmacopoeia and calculate entropies.

This is due to the absence of transcriptions, of course.


RE: Experimental replica of VMS properties with a given corpus - bi3mw - 11-04-2019

I always use this cleaned extract from "Regimen sanitatis Magnini Mediolanensis (vol. 1)". Thematically it is quite appropriate, but an excerpt from a composite manuscript would of course be better.


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 11-04-2019

Quote:bi3mw: Have you ever tried to use a (medieval) corpus to generate text that has similar characteristics to the VMS using cryptographic methods?

Yes.

The sample I posted in René's thread about generating Voynichese was an attempt (a quick one) to create text using a repeatable, explainable system that resembles Voynichese as closely as possible (I think I could get a little closer if I took more time to polish it), and I'm fairly sure the sample I chose to encrypt was from a medieval text.

Here's the sample I posted. This was generated in about 15 minutes, but I could probably get closer to Voynichese if I took more time:

[Image: attachment.php?aid=2722]

There is one thing in particular that I was not able to recreate in such a short time that gives away that it is not perfect Voynichese, but since it is natural language and it can be readily decrypted, perhaps it's good enough for the purposes of example.


RE: Experimental replica of VMS properties with a given corpus - Anton - 11-04-2019

Hi bi3mw,

This sample still needs some normalization, such as removing line breaks and converting everything to lowercase (unless we want to count upper and lower case as different characters).

If you are interested, I can calculate the entropy values tomorrow; I have Matlab scripts for that. (I could do it today, but it's late and I'm going to sleep.)

But I don't expect anything extraordinary here, since the excerpt is already expanded into "normal" Latin. What would be more characteristic and authentic would be to use an abbreviated Latin or German example, since that was the way the stuff was written back then.
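
For readers who want to try the normalization and entropy calculation themselves, here is a rough Python sketch. It is not the Matlab script mentioned above, the function names are hypothetical, and it assumes the usual definitions: single-character Shannon entropy, and conditional entropy computed as the entropy of character pairs minus the entropy of the first characters.

```python
import math
from collections import Counter


def normalize(text):
    """Lowercase the text and replace line breaks and whitespace runs with one space."""
    return " ".join(text.lower().split())


def _entropy(counts, n):
    """Shannon entropy in bits for a table of counts that sum to n."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def char_entropy(text):
    """First-order entropy: bits per single character."""
    return _entropy(Counter(text), len(text)) if text else 0.0


def conditional_entropy(text):
    """Second-order entropy: H(next char | previous char) = H(pairs) - H(first chars)."""
    n = len(text) - 1
    if n < 1:
        return 0.0
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return _entropy(pairs, n) - _entropy(firsts, n)


cleaned = normalize("Regimen\nSanitatis  Magnini")
print(cleaned)
print(char_entropy(cleaned), conditional_entropy(cleaned))
```

Whether spaces are counted as characters changes the values, so the same choice should be made for every text being compared.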


RE: Experimental replica of VMS properties with a given corpus - bi3mw - 11-04-2019

@Anton: Sorry, here [link] is the corpus again.


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 11-04-2019

(11-04-2019, 10:39 PM) Anton wrote: ...

But I don't expect anything extraordinary here, since the excerpt is already expanded into "normal" Latin. What would be more characteristic and authentic would be to use an abbreviated Latin or German example, since that was the way the stuff was written back then.

Yes, good point. It might be a good idea to have both an expanded and abbreviated version in each language.


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 12-04-2019

bi3, I've been thinking about Anton's point about abbreviated text.

If there were an alternate text with abbreviations, which ones should they be?
  • There were zillions of abbreviations.
  • The VMS seems more consistent than inconsistent (there are many common patterns repeated throughout the manuscript).
  • Not all scribes abbreviated heavily.

So, my thought was, there were certain abbreviations that almost everyone used. Maybe it would be helpful to have a list of those.


The extremely common ones in numerous languages were:
  • y (con- com- -us -um)
  • p with a strike through the descender (per-)
  • p with a loop through the descender (pro-)
  • -by (usually stood for -bus)
  • dz or bz (a rotated "m" that looks like a z; it usually stood for -em or -um and its homonyms, and lazy scribes sometimes used it for -rum)
  • 4 ("-rum" symbol is actually an "r" with a fancy tail but looks a bit like a modern 4 with an open top)
  • a symbol that looks like 2 (usually superscripted) represented -ur or -tur
  • k (abbreviation for Item in Latin, Italian, German, Czech, and English but sometimes meant other things in other languages)
  • -ris/-tis/-cis/-gis - this is actually just an -is symbol, a loop with a tail, added to various letters. They resemble VMS g and m.
  • The "smoke" symbol (a wiggly, vertical, or hooked macron) usually represented re/er/ir/ri (in early medieval manuscripts this was drawn like an upside-down EVA-l; in later medieval ones, it was like a smoke symbol, a straight macron, or a curved macron, depending on the scribe)
  • The macron, or curved macron, indicated missing letters. There was usually no difference in meaning between straight and curved macrons; scribes did whatever was comfortable to write. The macron most often represented m or n, but it could be almost anything, including multiple letters. It could also cut through ascenders.
  • The tail was in a sense a connected macron. If the missing letters were near the end of the word, it was easier to draw a tail rather than lifting the pen and drawing a line over the letters.
The Latin language also included the following common ones:
  • single- or double-character abbreviations for the very common "q" words like qui, quo, quodam, quomodo, quibus, etc., which were usually a q with a strike-through or loop through the stem, a small "o", or the common bz (-bus) ending
Spanish very often included a c or co abbreviation for con (this sometimes shows up in other languages, as well).

Even if a scribe used abbreviations sparingly, they usually used the ones above. In fact, y was so common that it was sometimes added to the end of the alphabet in pen tests. Note that it was added to the end of the alphabet on VMS [link] at the bottom of the column text.

There is also evidence in the VMS that whoever designed it appears to have been familiar with 9, m, g, macrons, and tail abbreviations (even if they mean something different in the VMS).


This might seem like a lot of abbreviations, but it's actually only a dozen or so that most scribes used fairly consistently even if they didn't use the others.
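
To produce an abbreviated version of an expanded Latin corpus, a few of the abbreviations from the list could be applied mechanically. The following Python sketch is only an illustration under loud assumptions: the ASCII stand-ins ("9", "p-", "p~", "b9", "4") are arbitrary placeholders chosen here, not real scribal glyphs, the rule table covers only a handful of the entries, the rules are applied blindly to every matching word, and the test phrase is made up.

```python
import re

# Hypothetical ASCII stand-ins for a few of the marks listed above (assumption).
# Order matters: -bus and -rum are handled before the generic -us/-um ending.
RULES = [
    (re.compile(r"\bco[nm]"), "9"),   # con-/com- at the start of a word
    (re.compile(r"\bper"), "p-"),     # p with a strike through the descender
    (re.compile(r"\bpro"), "p~"),     # p with a loop through the descender
    (re.compile(r"bus\b"), "b9"),     # -bus ending
    (re.compile(r"rum\b"), "4"),      # -rum symbol
    (re.compile(r"u[sm]\b"), "9"),    # -us/-um ending
]


def abbreviate(text):
    """Apply each substitution rule in turn to lowercase, expanded Latin text."""
    for pattern, mark in RULES:
        text = pattern.sub(mark, text)
    return text


# Made-up test phrase (assumption, for illustration only).
print(abbreviate("pro omnibus communibus per annum"))
```

A real scribe abbreviated less uniformly than this, so the output is at best a crude approximation of an abbreviated text for statistical comparison.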