04-03-2026, 10:40 AM
Hello folks
I recently put together a little tool to aid with analysing the text. It's a project I'd wanted to do for a long while, and I finally used some AI coding tools to speed up the process. The tool takes the ZL transcription as input, along with configuration for how you want to pre-process the glyphs, and outputs a ton of aggregations for interrogation.
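To illustrate what "configuration for how you want to pre-process the glyphs" can mean, here is a minimal sketch, not the tool's actual code. It assumes the transcription text has already been stripped of IVTFF locus tags and inline comments, and it uses a hypothetical `MULTIGRAPHS` list to decide which EVA sequences (e.g. `ch`, `sh`, `ckh`) count as single glyphs. Treating `,` (uncertain space) as a word break is also an assumption.

```python
import re

# Hypothetical preprocessing config: EVA sequences to treat as single glyphs.
# This list is an illustrative assumption, not the tool's actual default.
MULTIGRAPHS = ["cth", "ckh", "cph", "cfh", "ch", "sh"]


def tokenize_word(word, multigraphs=MULTIGRAPHS):
    """Split an EVA word into glyph tokens, preferring the longest multigraph match."""
    ordered = sorted(multigraphs, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for mg in ordered:
            if word.startswith(mg, i):
                tokens.append(mg)
                i += len(mg)
                break
        else:
            # No multigraph matched at this position: take a single character.
            tokens.append(word[i])
            i += 1
    return tokens


def tokenize_line(line, multigraphs=MULTIGRAPHS):
    """Split a line on '.' and ',' separators and tokenize each word."""
    return [tokenize_word(w, multigraphs) for w in re.split(r"[.,]", line) if w]


print(tokenize_line("qokeedy.chedy,shckhy"))
# [['q', 'o', 'k', 'e', 'e', 'd', 'y'], ['ch', 'e', 'd', 'y'], ['sh', 'ckh', 'y']]
```

Passing `multigraphs=[]` falls back to one token per EVA character, which makes it easy to compare how much a given glyph-grouping choice changes the downstream statistics.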
Tool: [link]
Code & docs: [link]
Currently, it shows the following:
* Basic transition probabilities (including word/line/paragraph/page boundaries)
* N-gram analyser (1-grams, 2-grams, 3-grams)
* Word position preference of glyphs
* Page position preference of glyphs (by character number / line number)
* Page position preference of glyphs (by physical position - data from Voynichese.com)
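For readers curious how the first few items in the list above are typically computed, here is a minimal sketch under stated assumptions. It is not the tool's implementation: the boundary markers `<w>` and `<l>` are hypothetical, input is assumed to be already tokenized (a list of lines, each a list of words, each a list of glyphs), and paragraph/page boundaries are left out for brevity, though they would be added the same way.

```python
from collections import Counter, defaultdict

# Hypothetical boundary markers; the actual tool may encode boundaries differently.
WORD_BREAK = "<w>"
LINE_BREAK = "<l>"


def token_stream(lines):
    """Flatten tokenized lines (each a list of words, each word a list of glyphs)
    into one stream with explicit word and line boundary markers."""
    stream = []
    for words in lines:
        for j, word in enumerate(words):
            if j > 0:
                stream.append(WORD_BREAK)
            stream.extend(word)
        stream.append(LINE_BREAK)
    return stream


def transition_probabilities(stream):
    """Estimate P(next | current) from adjacent pairs in the stream."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(stream, stream[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur, followers in counts.items():
        total = sum(followers.values())
        probs[cur] = {nxt: n / total for nxt, n in followers.items()}
    return probs


def ngram_counts(stream, n):
    """Count n-grams (as tuples) over the stream, boundary markers included."""
    return Counter(tuple(stream[i:i + n]) for i in range(len(stream) - n + 1))


def word_position_counts(lines):
    """Count how often each glyph occurs word-initially, medially or finally;
    a glyph forming a one-glyph word is counted as 'only'."""
    pos = defaultdict(Counter)
    for words in lines:
        for word in words:
            if len(word) == 1:
                pos[word[0]]["only"] += 1
                continue
            for k, glyph in enumerate(word):
                if k == 0:
                    pos[glyph]["initial"] += 1
                elif k == len(word) - 1:
                    pos[glyph]["final"] += 1
                else:
                    pos[glyph]["medial"] += 1
    return pos


# Toy input, not real manuscript data.
lines = [[["d", "a", "i", "n"], ["d", "y"]], [["o", "l"]]]
stream = token_stream(lines)
probs = transition_probabilities(stream)
print(probs["d"])                               # {'a': 0.5, 'y': 0.5}
print(dict(word_position_counts(lines)["d"]))   # {'initial': 2}
```

Keeping boundaries as ordinary tokens in the stream means "glyph before a word break" or "glyph after a line break" fall out of the same transition table with no special casing.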
While most of this has been done before, the more novel part of the tool is that it also produces these visualisations for a large number of subsets of the manuscript: by language (A, B), by scribe (1, 2, 3, 4, 5), and by illustration. The cool part is that you can then compare charts side by side (e.g., Currier A vs B, or Hand 2 vs Hand 4), and even "diff" them to see at a glance where the largest differences lie. This was largely a response to the common critique that interesting signal may be lost when analyses aggregate over non-homogeneous pages of text.
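One simple way to "diff" two subsets is to normalise each subset's counts to probabilities and subtract, ranking glyphs by the size of the gap. The sketch below is a hedged illustration of that idea, not necessarily how the tool does it. The `pages` records and their `lang` field are hypothetical; in practice the labels would come from the transcription's page metadata.

```python
from collections import Counter


def relative_frequencies(glyphs):
    """Normalise glyph counts for one subset into probabilities."""
    counts = Counter(glyphs)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}


def diff_distributions(freq_a, freq_b):
    """Return (glyph, p_a - p_b) pairs, largest absolute difference first.
    A glyph missing from one subset counts as probability 0 there."""
    glyphs = set(freq_a) | set(freq_b)
    diffs = [(g, freq_a.get(g, 0.0) - freq_b.get(g, 0.0)) for g in glyphs]
    # Sort by magnitude, breaking ties alphabetically so output is deterministic.
    return sorted(diffs, key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical page records with toy glyph lists (not real manuscript data).
pages = [
    {"lang": "A", "glyphs": ["a", "a", "b"]},
    {"lang": "B", "glyphs": ["a", "b", "c"]},
]
by_lang = {}
for page in pages:
    by_lang.setdefault(page["lang"], []).extend(page["glyphs"])

for glyph, delta in diff_distributions(relative_frequencies(by_lang["A"]),
                                       relative_frequencies(by_lang["B"])):
    print(f"{glyph}: {delta:+.3f}")
```

Comparing normalised frequencies rather than raw counts matters here, since the subsets (e.g. individual scribes) differ a lot in how much text they contain.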
If you want to change the preprocessing, that can't be done in the hosted web app itself, but it can be done by cloning the repo and making the changes. I may also be able to take requests here. There are plenty of other things it could show, which could be added in future releases: entropy, LAAFU (line as a functional unit) effects, word boundaries, glyph equivalence. Again, these things have been done before, but I believe the most valuable part is being able to interact with them and to compare sections quickly and visually.
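Of the possible additions, entropy is probably the simplest to sketch. Below is a minimal, hedged example of the usual first-order entropy H(X) and conditional (second-order) entropy H(next | current), both as plug-in estimates from counts. This is an assumption about how such a feature might work, not a description of planned code. Plug-in estimates are biased downward on small samples, which is worth keeping in mind when comparing small subsets such as a single scribe.

```python
import math
from collections import Counter


def unigram_entropy(stream):
    """H(X) in bits: plug-in estimate from single-glyph frequencies."""
    counts = Counter(stream)
    total = len(stream)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conditional_entropy(stream):
    """H(next | current) in bits: plug-in estimate from adjacent pairs."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])  # how often each glyph starts a pair
    total = sum(pairs.values())
    h = 0.0
    for (cur, nxt), n in pairs.items():
        # p(cur, nxt) * log2 p(nxt | cur), summed and negated
        h -= (n / total) * math.log2(n / firsts[cur])
    return h


text = list("daindaiin")  # toy stream, not real manuscript data
print(round(unigram_entropy(text), 3), round(conditional_entropy(text), 3))
```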
Do let me know if it's useful, if you find any bugs, or if you have any suggestions.
This was inspired by the writings of Nick Pelling, Rene Zandbergen, Patrick Feaster, Sean Palmer, Emma May Smith, Marco Ponzi, and many others.