Hey everyone,
I'm new to the forum, so I'm posting some of my findings here in a bit of a data-dump, sorry about that.
Here's the experiment:
1. I measured the word-length and character frequencies of the Voynich and created a random corpus of equal length that has the same word-length and character-frequency histograms but is otherwise effectively random. So, an excerpt of my random Voynich simulant looks like this:
whereas, as we all know, real Voynich looks like this
My random Voynich simulant has the same word length frequency and same character frequency as the real Voynich, but it's gibberish.
As a control, I used the Latin Bible for comparison. Here's my random biblical Latin simulant excerpt:
…snfeluvu tddr eeee nuqnc tt osiem ili clac udieouenl eaula tnmeu…
and here's real Latin:
...in principio creavit Deus caelum et terram terra autem erat…
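For anyone who wants to reproduce step 1, here is a minimal sketch of one way to build such a simulant. It is an assumption, not necessarily how the corpora above were made: it shuffles all the letters and all the word lengths and then re-slices, which keeps both histograms and the total length exactly the same. The name `make_simulant` and the `seed` parameter are made up for illustration.

```python
import random
from collections import Counter


def make_simulant(text, seed=0):
    """Scramble `text` while keeping its word-length and character histograms.

    Assumption: words are whitespace-separated and every non-space
    character counts as a letter. This is one possible construction,
    not necessarily the one used in the post.
    """
    rng = random.Random(seed)
    words = text.split()
    lengths = [len(w) for w in words]          # word-length sequence
    chars = [c for w in words for c in w]      # every letter, in order
    rng.shuffle(lengths)                       # randomise word-length order
    rng.shuffle(chars)                         # randomise letter order
    out, pos = [], 0
    for n in lengths:
        out.append("".join(chars[pos:pos + n]))
        pos += n
    return " ".join(out)


real = "in principio creavit Deus caelum et terram terra autem erat"
fake = make_simulant(real)

# Both histograms are preserved exactly, only the arrangement changes.
same_chars = Counter(fake.replace(" ", "")) == Counter(real.replace(" ", ""))
same_lengths = sorted(map(len, fake.split())) == sorted(map(len, real.split()))
```

Sampling letters and word lengths independently from the measured distributions would also work, but then the histograms only match approximately.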
2. Then, I trained an LSTM on the random corpora (in both the control and Voynich cases). This LSTM therefore learned to predict the character and word-length frequencies that we see in the actual Voynich (or Bible in the control case).
3. Then, I exposed the model to the actual Voynich and plotted, character by character, the degree to which the model was surprised by the next word it saw after being trained on randomness. So, the higher the orange line, the higher the apparent order in the text.
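To make the "surprise" measurement in step 3 concrete, here is a rough sketch of a per-character surprisal curve. Everything in it is an assumption: the post does not say how surprise was computed, so this uses the standard definition (negative log2 of the probability the model gave to the character that actually came next) and a trailing moving average for the plotted line. The `unigram_model` function is only a stand-in so the example runs without a deep-learning library; a trained LSTM would be plugged in as a different `prob(context, char)` function.

```python
import math
from collections import Counter


def unigram_model(train_text):
    """Stand-in for the trained LSTM: add-one smoothed character frequencies."""
    counts = Counter(train_text)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by unseen characters

    def prob(context, char):
        # A unigram model ignores `context`; an LSTM would condition on it.
        return (counts.get(char, 0) + 1) / (total + vocab)

    return prob


def surprisal_curve(prob, text, window=5):
    """Return (raw, smooth): surprisal in bits per character and its
    trailing moving average over `window` characters."""
    raw = [-math.log2(prob(text[:i], text[i])) for i in range(len(text))]
    smooth = []
    for i in range(len(raw)):
        chunk = raw[max(0, i - window + 1): i + 1]
        smooth.append(sum(chunk) / len(chunk))
    return raw, smooth


# Excerpts taken from the post; far too short for real conclusions.
simulant = "snfeluvu tddr eeee nuqnc tt osiem ili clac udieouenl eaula tnmeu"
real = "in principio creavit Deus caelum et terram terra autem erat"

prob = unigram_model(simulant)
raw, smooth = surprisal_curve(prob, real)
```

The `smooth` list is what would be plotted against character position. The slice `text[:i]` is fine for a short excerpt but is quadratic in the text length, so a full corpus would want a bounded context instead.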
Here's the biblical control
And here's the Voynich
Interestingly, they both share comparable levels of order relative to their random controls. In the Voynich case, there are pretty clear spikes in orderedness around f57, which is not surprising as that concentric-circle diagram has the same sequence repeated 4 times on the 3rd ring from the inside.
I'd be curious to hear anyone's thoughts on this or if there are any other ideas for experiments like this.