The Voynich Ninja
Experimental replica of VMS properties with a given corpus


RE: Experimental replica of VMS properties with a given corpus - bi3mw - 12-04-2019

To get an idea of how frequent abbreviations are, I have read [link].

On page 7 is written:

Quote:...
While 46 different abbreviations are used on fol. 11r, there are 62 different abbreviations on fol. 144r. This results from the fact that the text on fol. 144r is 3.25 times longer in terms of the number of glyphs than the text on fol. 11r. In other words, 7.5% of the glyphs on fol. 11r are abbreviations, while there is a similar amount of abbreviations on fol. 144r (7%).
...

Of course, it cannot be said whether this result represents an average.

Edit: Unfortunately, [link] is currently being digitized and therefore cannot be viewed yet.


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 12-04-2019

Very interesting paper, bi3.

I haven't followed the progress of software development of abbreviation-aware software, so I enjoyed reading it.


As for the magnitude of abbreviations, it's true that human-reading and software-reading are quite different. The software has to be able to recognize individual variations, but fortunately, in terms of human reading, it's easy for us to recognize patterns. The abbreviation for pre (p with crossed descender or macron) and the abbreviation for pro (p with looped descender) are different but are easily perceived by a human as two flavors of the same idea, abbreviations in the same category.

Similarly, if you learn pre and pro then even if you've never seen or heard about the common q abbreviations, as soon as you see them, it's immediately apparent that they work in approximately the same way.


RE: Experimental replica of VMS properties with a given corpus - bi3mw - 12-04-2019

(12-04-2019, 12:23 PM)-JKP- Wrote: Very interesting paper, bi3.

I haven't followed the progress of software development of abbreviation-aware software, so I enjoyed reading it.
...

Yes, [link] is the result of a research project of the Universities of Bremen and Berlin.
It looks like you could download Diptychon [link] after requesting a password (by email).


RE: Experimental replica of VMS properties with a given corpus - nablator - 12-04-2019

(12-04-2019, 09:47 AM)Anton Wrote: No I think that's not suitable. A printed book would be something only moderately abbreviated, and we can produce virtually the same result by taking e.g. bi3mw's text and introducing some common abbreviations ourselves. I have done that before and apart from a slight increase of entropy there was nothing interesting.

What we need is some heavily abbreviated stuff, which would be a manuscript.

Did anyone ever manage to use heavily abbreviated stuff that cannot be printed as input for encryption? If not, why do we need it? As far as I know, 15th century ciphers use only a few of the most common abbreviations.


RE: Experimental replica of VMS properties with a given corpus - Koen G - 12-04-2019

Nablator: since so many Voynichese glyphs look and behave like abbreviation signs, it may be worth taking into account? 

This is probably a stupid suggestion, but might it be possible to take a standard text and artificially mass-introduce abbreviations with find and replace?
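
A minimal sketch of what such a find-and-replace pass could look like. The rule table and the placeholder signs ("9", "#", "%") are assumptions made up for illustration, not a real scribal abbreviation system and not anything used earlier in this thread:

```python
import re

# Hypothetical rule table: (regex, replacement sign). These are illustrative
# stand-ins, not a real scribal abbreviation system.
ABBREVIATION_RULES = [
    (r"us\b", "9"),   # word-final -us -> a 9-like sign
    (r"\bpro", "#"),  # word-initial pro- -> placeholder for a "pro" sign
    (r"\bper", "%"),  # word-initial per- -> placeholder for a "per" sign
]


def abbreviate(text, rules=ABBREVIATION_RULES):
    """Apply each find-and-replace rule in order and return the new text."""
    for pattern, sign in rules:
        text = re.sub(pattern, sign, text)
    return text


print(abbreviate("dominus probus per omnia"))
```

A real experiment would need a much larger rule table; the order of the rules matters because each pass works on the output of the previous one.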


RE: Experimental replica of VMS properties with a given corpus - Anton - 12-04-2019

Quote:Did anyone ever manage to use heavily abbreviated stuff that cannot be printed as input for encryption? If not why do we need it? As far as I know 15th century ciphers use only a few of the most common abbreviations.

In the first place, there are researchers who believe that the VMS is exactly that - an input abbreviated in a way that we can't read (yet).

In the second place, we need it for a benchmark. Heavily abbreviated stuff was one of the ways manuscripts were written back then. Instead of that, the VMS has been compared with, and benchmarked against (in terms of character statistics), the King James Bible, The Origin of Species, and similar out-of-time texts.

Quote:This is probably a stupid suggestion, but might it be possible to take a standard text and artificially mass-introduce abbreviations with find and replace?

As I said, I have done that on a limited scale. I took a text and introduced 9-s, I think. I believe I have posted the results in the forum.


RE: Experimental replica of VMS properties with a given corpus - nablator - 12-04-2019

(12-04-2019, 05:08 PM)Koen G Wrote: Nablator: since so many Voynichese glyphs look and behave like abbreviation signs, it may be worth taking into account?

It would be naive to assume that they are abbreviations, despite the resemblances. There are far too many y, and g and m are mostly at the end of lines...

The topic is about cryptography: whether the input text is abbreviated or not does not matter. Enciphering maps symbols to other symbols, whatever they are, and for that they have to be printable in the first place. Some (very rare) printed books managed to keep most of the "heavy" abbreviated Latin intact.


RE: Experimental replica of VMS properties with a given corpus - Anton - 12-04-2019

Yes, actually we have wandered a bit off-topic (my fault), but I can't agree that the result does not depend on whether the input is abbreviated or not. It could depend on that. That, in turn, depends on the exact enciphering technique. That is best explained by the example of a simple substitution cipher, where the result is a 1-to-1 mapping. Of course, if two plain texts (abbreviated and non-abbreviated) differ in properties, then ciphertexts will differ accordingly.
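
To illustrate the substitution-cipher point, here is a rough sketch assuming a random monoalphabetic key over a-z and single-character entropy as the measured property. All function names are made up for this example. Because the mapping is 1-to-1, the ciphertext has the same character counts as the plaintext (just under different labels), so whatever the abbreviations did to the plaintext statistics carries over unchanged:

```python
import math
import random
import string
from collections import Counter


def char_entropy(text):
    """Shannon entropy in bits per character of the character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def make_substitution_key(alphabet, seed=0):
    """Build a random 1-to-1 mapping of the alphabet onto itself."""
    shuffled = list(alphabet)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(alphabet, shuffled))


def encipher(text, key):
    """Replace each character via the key; unknown characters stay as they are."""
    return "".join(key.get(ch, ch) for ch in text)


key = make_substitution_key(string.ascii_lowercase)
plain = "dominus vobiscum et cum spiritu tuo"
abbreviated = "domin9 vobiscum et cum spiritu tuo"  # hypothetical 9 for -us

for label, text in (("plain", plain), ("abbreviated", abbreviated)):
    cipher = encipher(text, key)
    print(label, char_entropy(text), char_entropy(cipher))
```

For each input the two printed entropies match, so any difference between the abbreviated and non-abbreviated plaintexts shows up identically in their ciphertexts. With other enciphering techniques this need not hold.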


RE: Experimental replica of VMS properties with a given corpus - MarcoP - 13-04-2019

(12-04-2019, 11:45 AM)bi3mw Wrote: To get an idea of how frequent abbreviations are, I have read [link].

On page 7 is written:

Quote:...
While 46 different abbreviations are used on fol. 11r, there are 62 different abbreviations on fol. 144r. This results from the fact that the text on fol. 144r is 3.25 times longer in terms of the number of glyphs than the text on fol. 11r. In other words, 7.5% of the glyphs on fol. 11r are abbreviations, while there is a similar amount of abbreviations on fol. 144r (7%).
...

Of course, it cannot be said whether this result represents an average.


The early printed Nuremberg Chronicle fragment I posted above appears to be similar in this respect: 23/270=8.5% of the glyphs are abbreviations. I guess that a highly abbreviated Latin text should have >10% abbreviations, maybe as high as 20%?


RE: Experimental replica of VMS properties with a given corpus - Anton - 13-04-2019

To consider the amount of abbreviation as the percentage of glyphs does not seem to me methodologically correct. What about ligatures? What about the fact that average word length is different for different languages? The purpose of abbreviation is to make things shorter. So I would calculate the compression (contraction) ratio instead.
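
As a rough sketch of the two measures side by side: the function names are invented here, and counting every non-whitespace character as one glyph is an assumption (ligatures, for instance, would need their own handling). The contraction ratio needs the same passage in both expanded and abbreviated form:

```python
def glyph_count(text):
    """Count non-whitespace characters; each is treated as one glyph (assumption)."""
    return sum(1 for ch in text if not ch.isspace())


def abbreviation_share(abbreviation_glyphs, total_glyphs):
    """Fraction of glyphs that are abbreviation signs."""
    return abbreviation_glyphs / total_glyphs


def contraction_ratio(expanded, abbreviated):
    """Glyphs in the abbreviated text divided by glyphs in its full expansion."""
    full = glyph_count(expanded)
    if full == 0:
        raise ValueError("expanded text contains no glyphs")
    return glyph_count(abbreviated) / full


# Glyph share for the Nuremberg Chronicle fragment quoted above: 23 of 270.
print(round(100 * abbreviation_share(23, 270), 1))

# Contraction ratio for a toy pair; "9" and "~" are placeholder signs.
print(contraction_ratio("dominus vobiscum", "domin9 vobisc~"))
```

A ratio of 1.0 means no shortening at all, and lower values mean heavier abbreviation, which makes texts in languages with different average word lengths easier to compare than a raw glyph percentage.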