
Analysis of Voynich Average Cross-Entropy Loss

Trithemius > 6 hours ago

Hey everyone,

I'm new to the forum, so I'm posting my findings here as something of a data dump; sorry about that.

Here's the experiment:

1. I looked at the word-length and character frequencies of the Voynich and created a random corpus of equal length that has the same word-length and character-frequency histograms but is effectively random. An excerpt of my random Voynich simulant looks like this:



whereas, as we all know, the real Voynich looks like this:



My random Voynich simulant has the same word-length and character frequencies as the real Voynich, but it's gibberish.

As a control, I used the Latin Bible. Here's an excerpt of my random biblical-Latin simulant:

…snfeluvu tddr eeee nuqnc tt osiem ili clac udieouenl eaula tnmeu…

and here's the real Latin:

...in principio creavit Deus caelum et terram terra autem erat…
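One way to build a simulant like the one in step 1 is to shuffle: permute the original word lengths, shuffle the pool of all non-space characters, and deal those characters back into words of the shuffled lengths. This is a minimal sketch under that assumption (the post doesn't say how the simulant was actually generated); `make_simulant` and `histograms` are hypothetical names. Because it only permutes, both histograms and the total length match the source exactly rather than just in expectation.

```python
import random
from collections import Counter

def histograms(text: str) -> tuple[Counter, Counter]:
    """Return (word-length histogram, character histogram), ignoring spaces."""
    words = text.split()
    return Counter(len(w) for w in words), Counter(c for w in words for c in w)

def make_simulant(text: str, seed: int = 0) -> str:
    """Hypothetical helper: random text with the same word-length and
    character histograms as `text`, but no ordering between characters."""
    words = text.split()
    lengths = [len(w) for w in words]
    chars = [c for w in words for c in w]
    rng = random.Random(seed)
    rng.shuffle(lengths)  # random order of word lengths
    rng.shuffle(chars)    # destroys any character-level structure
    out, pos = [], 0
    for n in lengths:
        # Deal the next n shuffled characters into a word of length n
        out.append("".join(chars[pos:pos + n]))
        pos += n
    return " ".join(out)

source = "in principio creavit Deus caelum et terram terra autem erat"
simulant = make_simulant(source, seed=1)
print(simulant)
assert histograms(simulant) == histograms(source)
```

Sampling lengths and characters independently from the histograms (e.g. with `random.choices`) would also work, but then the histograms only match approximately.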

2. Then I trained an LSTM on the random corpora (in both the control and Voynich cases). The LSTM therefore learned to predict the character and word-length frequencies that we see in the actual Voynich (or the Bible, in the control case).
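Training an LSTM needs a framework such as PyTorch. As a dependency-free illustration of the same role (a model fit only on the random simulant that returns p(next character | preceding characters)), here is an add-one smoothed character n-gram model. This is a swapped-in, much simpler technique than the LSTM in the post, and `CharNGram` is a hypothetical name. Trained on a shuffled simulant, either model can learn little beyond the character frequencies and roughly how often a word ends.

```python
from collections import Counter, defaultdict

class CharNGram:
    """Add-one smoothed character n-gram model: a simple stand-in for the
    LSTM (it only looks at the previous `order - 1` characters)."""

    PAD = "\x02"  # start-of-text padding for the first contexts

    def __init__(self, order: int = 3):
        self.k = order - 1                   # context length
        self.counts = defaultdict(Counter)   # context -> next-char counts
        self.vocab: set[str] = set()

    def fit(self, text: str) -> "CharNGram":
        self.vocab.update(text)
        padded = self.PAD * self.k + text
        for i, ch in enumerate(text):
            # padded[i:i+k] is exactly the k characters preceding text[i]
            self.counts[padded[i:i + self.k]][ch] += 1
        return self

    def prob(self, history: str, ch: str) -> float:
        """p(ch | last k characters of history). Characters never seen in
        training share one extra 'unknown' slot, so probabilities over
        vocab + unknown sum to 1."""
        padded = self.PAD * self.k + history
        ctx = padded[len(padded) - self.k:]
        nexts = self.counts.get(ctx, Counter())
        total = sum(nexts.values())
        return (nexts[ch] + 1) / (total + len(self.vocab) + 1)

model = CharNGram(order=3).fit("qokeedy qokedy dal qokedy")
print(model.prob("qo", "k"))
```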

3. Then I exposed the model to the actual Voynich and plotted, character by character, how surprised the model (trained only on randomness) was by what it saw next. The higher the orange line, the higher the apparent order in the text.
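The "surprise" in step 3 is per-character cross-entropy, -log2 p(next character | preceding text). A minimal sketch of such a trace, assuming the plotted curve is a trailing moving average of that quantity; the post doesn't state the window size, log base, or whether the orange line is raw loss or loss relative to the control. `surprisal_trace`, `prob`, and `window` are hypothetical names, and any model exposing next-character probabilities can be plugged in as `prob`.

```python
import math
from typing import Callable

def surprisal_trace(text: str, prob: Callable[[str, int], float],
                    window: int = 500) -> tuple[list[float], list[float]]:
    """Per-character surprisal in bits, -log2 p(text[i] | text[:i]), plus a
    trailing moving average over `window` characters (the plotted curve).

    `prob(text, i)` is any model's probability for text[i] given what precedes it.
    """
    bits = [-math.log2(prob(text, i)) for i in range(len(text))]
    smoothed, running = [], 0.0
    for i, b in enumerate(bits):
        running += b
        if i >= window:
            running -= bits[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return bits, smoothed

def toy(t: str, i: int) -> float:
    """Toy model: 'a' has probability 1/2, anything else 1/8."""
    return 0.5 if t[i] == "a" else 0.125

bits, curve = surprisal_trace("abab", toy, window=2)
print(bits, curve)  # [1.0, 3.0, 1.0, 3.0] [1.0, 2.0, 2.0, 2.0]
```

The mean of `bits` over the whole text is the average cross-entropy in bits per character; PyTorch's cross-entropy loss reports the same quantity in nats (multiply bits by ln 2). For a fixed-context model, having `prob` look only at the last few characters before `i` keeps the trace linear in the text length.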

Here's the biblical control:



And here's the Voynich:



Interestingly, both show comparable levels of order relative to their random controls. In the Voynich case there are fairly clear spikes in orderedness around f57, which is not surprising, since the concentric-circle diagram there repeats the same sequence four times on the third ring from the inside.

I'd be curious to hear anyone's thoughts on this, or any ideas for other experiments like it.
RE: Analysis of Voynich Average Cross-Entropy Loss

quimqu > 6 hours ago

Interesting. It makes sense: you have matched "only" the word-length and character frequencies, and that produces words without any internal structure. The Voynich language has very low entropy, meaning the next character is strongly constrained by the current one. In your output, the next character is completely random (if you calculate the entropy, it will be really high). So the Voynich has "words", or things that look like words, and your output has... well, random characters...
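The entropy comparison suggested above can be made concrete with two numbers: the single-character entropy H(X) and the conditional entropy H(X_{i+1} | X_i) estimated from bigram counts. For a shuffled simulant the two should be close, since the next character barely depends on the current one; text where the next character is strongly constrained shows a much lower conditional entropy. A minimal sketch; the function names are hypothetical, the text must have at least two characters, and these plug-in estimates from counts are biased low on short texts.

```python
import math
from collections import Counter

def unigram_entropy(text: str) -> float:
    """H(X) in bits per character."""
    n = len(text)
    return sum(c / n * math.log2(n / c) for c in Counter(text).values())

def conditional_entropy(text: str) -> float:
    """H(X_{i+1} | X_i) in bits, from bigram counts (needs len(text) >= 2)."""
    pairs = Counter(zip(text, text[1:]))   # (current, next) counts
    firsts = Counter(text[:-1])            # counts of each 'current' char
    n = len(text) - 1                      # number of bigrams
    # Sum of p(a, b) * log2(1 / p(b | a))
    return sum(c / n * math.log2(firsts[a] / c) for (a, _), c in pairs.items())

structured = "abababababab"
print(unigram_entropy(structured), conditional_entropy(structured))  # 1.0 0.0
```

Running both functions on the real text and on its simulant shows how much of the character-level structure the shuffle removes.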