The Voynich Ninja

All kinds of graphs related to Voynichese will be collected here, accompanied where possible by an explanation from the maker.
If you've seen or made an interesting graph that should be included, you can submit it on another subforum.
Don't forget to include the author and/or the original location.

Julian Bunn posted a graph which maps glyph frequencies by folio. Red dots are Currier A, blue dots are Currier B. For the technical details, see his blog.

[attachment=1839]

And from a different perspective:
[attachment=1840]

As Julian notes, "It’s clear that the red and blue are well separated, reinforcing Currier’s assignments. Thus this is independent support of Currier’s theory."

Graph by Marco, with his explanation:

Quote:The two attached histograms are based on the length of word occurrences (not word "types" as in Rene's graph discussed above).

Latin has a spike for words 2 characters long. These correspond to a number of hugely frequent particles like prepositions (in, ex, ab, de...), conjunctions (et, ac...), pronouns (id, tu, me...).
Latin also has a long tail of long words, which of course become less frequent as length grows. In the Vulgate, 9% of the words are 10 or more characters long.

Voynichese has a simpler distribution, with a single peak at length 5 (matching the central, but secondary, peak for Latin). Long words are rarer: less than 1% are 10 or more characters long.

The mapping from Voynichese to Latin should stretch the histogram in two opposite directions, possibly by making short words shorter and long words longer.
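The token-based length histogram Marco describes can be sketched in a few lines of Python. This is only an illustration, assuming whitespace-separated words and one character per glyph; the function names and the short Latin sample are made up for the example and are not Marco's code or data.

```python
from collections import Counter


def token_length_histogram(text):
    """Share of word occurrences (tokens, not unique types) per word length."""
    tokens = text.split()
    if not tokens:
        return {}
    counts = Counter(len(token) for token in tokens)
    return {length: counts[length] / len(tokens) for length in sorted(counts)}


def share_at_least(hist, min_len):
    """Fraction of tokens that are min_len or more characters long."""
    return sum(share for length, share in hist.items() if length >= min_len)


# Hypothetical sample, only to show the call; a real test needs a full text.
sample = "in principio creavit deus caelum et terram"
hist = token_length_histogram(sample)
for length, share in hist.items():
    print(length, round(share, 3))
print("share of tokens with 9+ characters:", round(share_at_least(hist, 9), 3))
```

Counting tokens rather than types is what lets very frequent short words such as "et" or "in" produce the spike at length 2.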

Graph by Marco:

Quote:It is very much simplified with respect to the graphs posted by VViews: I have only produced five-bar histograms (first bar is beginning of word, last bar is end of word). The colouring of the graphs was done manually (more frequent letters are darker).
This is based on unique words in Takahashi’s transcription. I have applied the following substitutions:
ckh -> K
cth -> T
cfh -> F
cph -> P
ch -> C
sh -> S
iin -> M
in -> N
eee -> W
ee -> E
I only include plots for the 25 most frequent “characters”.
The process was rather complex and (as always) I can't rule out that I have made errors.
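For anyone who wants to try something similar, here is a rough Python sketch of the substitution step and the five-bar position count. The substitution table is taken from the quote above; everything else is an assumption: the function names are invented, and the way a character's position is mapped onto five bars (first character in bar 0, last character in bar 4, the rest spread proportionally) is just one plausible choice, not necessarily what Marco did.

```python
from collections import defaultdict

# Substitutions from the post, applied in the listed order so that longer
# patterns win ("iin" before "in", "eee" before "ee").
SUBSTITUTIONS = [
    ("ckh", "K"), ("cth", "T"), ("cfh", "F"), ("cph", "P"),
    ("ch", "C"), ("sh", "S"), ("iin", "M"), ("in", "N"),
    ("eee", "W"), ("ee", "E"),
]


def merge_glyphs(word):
    """Replace multi-letter transcription groups with single characters."""
    for old, new in SUBSTITUTIONS:
        word = word.replace(old, new)
    return word


def positional_bins(words, n_bins=5):
    """Per character, count occurrences in each of n_bins word positions.

    Assumption: positions are scaled so the first character falls in the
    first bin and the last character in the last bin (rounded half up).
    """
    bins = defaultdict(lambda: [0] * n_bins)
    for word in set(words):  # unique words only, as in the post
        merged = merge_glyphs(word)
        last = len(merged) - 1
        for i, ch in enumerate(merged):
            if last == 0:
                b = 0
            else:
                b = (2 * i * (n_bins - 1) + last) // (2 * last)
            bins[ch][b] += 1
    return dict(bins)


# Hypothetical mini word list in EVA-style transcription.
for ch, counts in sorted(positional_bins(["daiin", "chedy", "qokeedy"]).items()):
    print(ch, counts)
```

Limiting the output to the 25 most frequent characters and colouring the bars would be separate steps on top of this.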

Davidsch shared a table which compares various stats for over 70 known ciphers.
The VM stands out in a number of ways, not least by its length. Only the [link] comes close; all others are 1% or less of the VM's length.

Another graph by Marco. For the full explanation, please see the original post.

(29-01-2018, 09:12 PM) MarcoP wrote: These graphs represent the expected (red) and actual (green) counts for consecutive words in which the last letter of the first word is the same as the first letter of the second word. For instance (where '.' represents word boundaries):

but.these
others.said
seventh.hour

[...]

In my opinion, this evidence (if confirmed) could support the word-boundary transformations discussed by [link].

It is clear that, if what we observe in the VMS is due to euphonic transformations, these are quite different from those that happen in Latin languages. In these languages, transformations are limited to short and frequent words (mostly prepositions and articles) and typically affect the end of the words. The phenomena discussed by Emma affect longer words and mostly seem to happen at the beginning of words (I am thinking in particular of the [link] on the basis of the ending of the preceding word).
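The expected-versus-actual comparison in Marco's post can be approximated as follows. The elided part of the quote does not say how the expected counts were derived, so this sketch makes its own assumption: word-final and word-initial letters are treated as independent, and the expected count is the number of matches that independence would predict. The function name is invented for the example.

```python
from collections import Counter


def boundary_matches(words):
    """Return (actual, expected) counts of adjacent word pairs in which the
    last letter of the first word equals the first letter of the second.

    Assumption: 'expected' is computed as if final and initial letters were
    independent; this is not necessarily the method used in the original post.
    """
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0, 0.0
    actual = sum(1 for a, b in pairs if a[-1] == b[0])
    last_counts = Counter(a[-1] for a, _ in pairs)
    first_counts = Counter(b[0] for _, b in pairs)
    expected = sum(last_counts[c] * first_counts[c] for c in last_counts) / len(pairs)
    return actual, expected


# The three example pairs from the post, strung together as one toy sequence.
words = "but these others said seventh hour".split()
actual, expected = boundary_matches(words)
print("actual:", actual, "expected:", round(expected, 2))
```

On a real text, an actual count well below or above the expected one is what would hint at something happening at word boundaries; the toy sequence here is far too short to show anything.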

David Jackson posted some graphs demonstrating that Voynichese obeys Heaps' law. Please refer to the original thread for his full explanation.

Quote:In linguistics, Heaps' law (also called Herdan's law) is an empirical law which describes the number of distinct words in a document (or set of documents) as a function of the document length (so called type-token relation).

Heaps' law means that as more instance text is gathered, there will be diminishing returns in terms of discovery of the full vocabulary from which the distinct terms are drawn.

Quote:These rapid tests should not be relied upon for any real conclusions, but they do seem to point towards Voynichese tokens having unique individual identities. There is more work to be done in analysing random and cipher texts, but I hope the tools I include in this post may help other researchers carry out their own tests.
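As a rough illustration of the type-token relation described above (not David's tools, which are in the original thread), the Python sketch below builds the vocabulary growth curve and fits the usual form of Heaps' law, V(n) = K * n^beta, by least squares on the log-log values. The function names and the toy token list are assumptions made for the example.

```python
import math


def type_token_curve(tokens):
    """List of (tokens read so far, distinct types seen so far)."""
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((n, len(seen)))
    return curve


def fit_heaps(curve):
    """Least-squares fit of log V = log K + beta * log n; returns (K, beta).

    Needs at least two points on the curve.
    """
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


# Hypothetical toy input; a real test would use a full transcription.
tokens = "a b a c b d a e".split()
curve = type_token_curve(tokens)
k, beta = fit_heaps(curve)
print(curve[-1], round(k, 3), round(beta, 3))
```

A beta below 1 is the "diminishing returns" mentioned in the quote: new types keep appearing, but more slowly as the text gets longer.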

Koen G
