The Voynich Ninja
Experimental replica of VMS properties with a given corpus - Printable Version




RE: Experimental replica of VMS properties with a given corpus - ChenZheChina - 26-04-2019

(12-04-2019, 01:28 PM)bi3mw Wrote:
(12-04-2019, 12:23 PM)-JKP- Wrote: Very interesting paper, bi3.

I haven't followed the progress of software development of abbreviation-aware software, so I enjoyed reading it.
...

Yes, "[link]" is the result of a research project of the Universities of Bremen and Berlin.
It looks like you could download Diptychon [link] after requesting a password (by email).

Hi bi3mw,

This tool seems very interesting.

Is it possible to apply it to the VMS so that it automatically extracts every letter?


RE: Experimental replica of VMS properties with a given corpus - bi3mw - 26-04-2019

Hi @ChenZhe,

I wanted to register for the download, but no confirmation came back. As far as I know, recognition of single letters should be possible. One should take a closer look at the manual to be sure.


RE: Experimental replica of VMS properties with a given corpus - ChenZheChina - 26-04-2019

(26-04-2019, 10:43 AM)bi3mw Wrote: Hi @ChenZhe,

I wanted to register for the download, but no confirmation came back. As far as I know, recognition of single letters should be possible. One should take a closer look at the manual to be sure.

I emailed them too, and I am waiting for their reply with the password.


RE: Experimental replica of VMS properties with a given corpus - nablator - 26-04-2019

I managed to compile and run the source from [link], but the example project does not load. No luck with a new project either. Maybe something is missing. Anyway, it seems to be an OCR tool with built-in Latin shapes, not suited to Voynichese.

   

   


RE: Experimental replica of VMS properties with a given corpus - bi3mw - 10-06-2019

If someone is still interested in Diptychon for other purposes, I have sent a request to another address and received the password for the Windows installer (Dr. Björn Gottfried). Just send me a PM for the email address.


RE: Experimental replica of VMS properties with a given corpus - MarcoP - 02-12-2019

During the last couple of months or so, I have been slowly working on transcribing part of an abbreviated Latin manuscript, trying to preserve as much of the abbreviations as I could. The experiment is based on St. Gallen, Stiftsbibliothek [link], in particular the Soliloquium by Bonaventura, which starts at p.181.


I encountered a number of difficulties: in particular, word spacing and differences between glyphs are not always clear-cut. I am not entirely happy with the result, but since this kind of work is rather boring, especially when you have already done several "passes" over the text, I decided to stop here.

Currently, I have transcribed ten pages: a little more than 13000 characters, 2000 words.
I have processed the transcription to create a "clean" version: I removed upper-case characters, kept only the spaces present in the manuscript, removed punctuation, and joined the two halves of words that were hyphenated at line breaks.


These are the non-alphabetic symbols that I used:

^  missing 'r', usually as a superscript mirrored 'c' but also other shapes
?  hooked curl, similar to ^ (typically 'tr' ligature)
~  'wavy' macron
_  macron
;  comma-like truncated word ending
"  superscript 'v' (typically in natVra)
)  curl macron ligature
*  beginning of superscript word ending
&  loop or vertical bar marking truncation (e.g. -2& -rum, -t& -tis)
#  double macron (for double 's' in 'esse')
+  crossed p for 'per'
|  long-s
1  superscript 'i'
2  both 'et' and round-r
3  final 'm', but also |3 for 'sed'
4  arabic numeral (similar to l in this ms)
9  con- / -us

These symbols appear in the transcription, but were removed from the clean version:

basically the only punctuation used in the ms
space missing in manuscript    
%  space added in manuscript (space in the clean version)
=  hyphenated word at line break (rejoined in the clean version)
:  hyphen missing in manuscript (still split in the clean version)
{ } notes   
< >  text deleted / corrected in ms
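
The cleaning step described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the script actually used: the function name is made up, "removed upper-case characters" is read as lower-casing, notes in { } are dropped together with their content, and only the < > markers themselves are stripped. The symbols for punctuation and for "space missing in manuscript" are not legible in the list above, so they are passed in as hypothetical parameters.

```python
import re


def clean_transcription(text, punctuation="", missing_space=""):
    """Rough sketch of the 'clean' version (all names are hypothetical).

    punctuation   -- characters to drop as punctuation (not given in the post)
    missing_space -- marker for 'space missing in manuscript' (not given)
    """
    # Drop editorial notes {...}; removing their content is an assumption.
    text = re.sub(r"\{[^{}]*\}", "", text)
    # Strip only the < > markers themselves (assumption: content is kept).
    text = text.replace("<", "").replace(">", "")
    # '=' at a line end marks a hyphenated word: rejoin the two halves.
    text = re.sub(r"=\s*\n\s*", "", text)
    # ':' at a line end marks a missing hyphen: the word stays split.
    text = re.sub(r":\s*\n\s*", " ", text)
    # '%' marks a space added in the manuscript: keep it as a space.
    text = text.replace("%", " ")
    # The manuscript has no space here, so the words are joined.
    if missing_space:
        text = text.replace(missing_space, "")
    # Remove whatever characters the caller declares to be punctuation.
    for ch in punctuation:
        text = text.replace(ch, "")
    # Assumed reading of "removed upper-case characters": lower-case everything.
    text = text.lower()
    # Collapse line breaks and runs of spaces into single spaces.
    return " ".join(text.split())
```

Note that a generic punctuation filter would not work here, because characters such as ; ? " and & carry abbreviation meaning in this transcription and must survive the cleaning.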

For comparison, I have also edited [link] (Bonaventura Opera 08 -1898- Opuscula Varia Ad Theologiam Mysticam Et Res Ordinis Fratrum Minorum Spectantia) in order to remove the greatest differences from the manuscript version (words were occasionally added / deleted / moved in the two texts). I have "cleaned" the printed edition similarly to what I did with the transcription.
Here is an example that illustrates the usage of most symbols (before "cleaning"):
   

I have overlaid the TTR results onto one of the plots [link]. For reference, I have included two VMS sections and two "extreme" Latin texts. As one could expect, the printed edition of Bonaventura has a considerably lower TTR than the transcription, yet the difference is not as large as that between different languages, nor as that between very different styles in the same language. With respect to W=1000, TTR differences are:
transcription-printed = 0.11
Virgil-Vulgate = 0.27
VMS_Q20-VMS_Q13 = 0.14
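
For readers who want to reproduce this kind of number, here is a minimal Python sketch of a windowed type-token ratio. How W=1000 is applied is not spelled out in this post, so the sketch assumes W is a window of 1000 word tokens and averages the TTR over consecutive non-overlapping full windows; the function name is hypothetical.

```python
def windowed_ttr(tokens, window=1000):
    """Mean type-token ratio over consecutive, non-overlapping windows.

    Assumption: W is a window length in word tokens; a trailing partial
    window is ignored so that every ratio uses the same denominator.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    ratios = []
    for start in range(0, len(tokens) - window + 1, window):
        chunk = tokens[start:start + window]
        # Types = distinct word forms, tokens = all words in the window.
        ratios.append(len(set(chunk)) / window)
    if not ratios:
        raise ValueError("text is shorter than one window")
    return sum(ratios) / len(ratios)
```

With a cleaned text, the tokens would simply be `clean_text.split()`. A transcription of about 2000 words yields only two full windows at W=1000, so the resulting value is a fairly coarse estimate.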

   

In the plot on the right, I have shifted the three transcription samples (Bonaventura, Q20, Q13) so that the transcription of Bonaventura matches the printed edition. The result is that VMS samples, which already were at the bottom of the Latin cloud, are moved towards the bottom of the whole cloud, near English, for instance.
My impression is that this correction is likely to be excessive: Voynichese appears to be more regular than the cursive script of Sang.942, and the number of glyphs appears to be smaller than in a regular alphabet, while an abbreviation system adds to the alphabet. Also, some of the additional variability may be due to my own inconsistencies in the transcription.
Anyway, I think we should consider the possibility that TTR measured on printed texts is reduced by the "normalization" process that takes place when abbreviations and scribal inconsistencies are removed.


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 02-12-2019

That's a lot of work, Marco. I commend you on the effort and the information.


RE: Experimental replica of VMS properties with a given corpus - Aga Tentakulus - 02-12-2019

Here's the sample I posted. This was generated in about 15 minutes, but I could probably get closer to Voynichese if I took more time:

[Image: attachment.php?aid=2722]

There is one thing in particular that I was not able to recreate in such a short time that gives away that it is not perfect Voynichese, but since it is natural language and it can be readily decrypted, perhaps it's good enough for the purposes of example.
[/quote]

If you had twins now, it'd be almost perfect.


RE: Experimental replica of VMS properties with a given corpus - -JKP- - 02-12-2019

Just glancing at it, there are three kinds of things that jumped out at me as being not quite Voynichese, but overall, it is quite close. Did you do a verbose cipher?


RE: Experimental replica of VMS properties with a given corpus - Aga Tentakulus - 02-12-2019

We know one-to-one decryption isn't working.
Not even if I disassemble a combination again.
The Templar code already contains twins. Same characters with a mark. In this case it is a point.
The form of a glyph should give the impression of a foreign language.
A twin is supposed to cause confusion.
I think the VM text is too monotonous for what it pretends to be.
I think I'm dealing with twins, apart from pre- and final syllables, as well as combinations.
This is also the reason why I would never work with EVA code.
A possible error source of 30%+ is too big for me.
Example VM: