The Voynich Ninja

Full Version: The oddities of the bigram "ed" pt. 4 : The Chunking of Scribes
I understand the situation.
If you are using the transliteration files in the IVTFF format, you can get all the information you need from the variables ($I $Q $H etc.) in the page headers.
Among others:
$I gives the illustration type
$Q the quire letter, where A=1 and T=20, and indeed I am skipping 16 and 18.
$H the hand according to Lisa Fagin Davis (1-5)
$B a number for the bifolio/sheet in the quire.
By concatenating the quire letter and the bifolio number, you get a 2-char ID for each sheet.
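For anyone who wants to try this, here is a minimal Python sketch of reading those header variables and building the 2-char sheet ID. It assumes page headers look roughly like `<f1r>  <! $Q=A $B=1 $H=1>`; the sample line is illustrative and not copied from a real file, and the function names (`parse_page_headers`, `sheet_id`, `quire_number`) are made up for this example.

```python
import re

# One "$X=value" variable inside a page header comment.
VAR_RE = re.compile(r"\$([A-Z])=([^\s>]+)")
# A page header: a locator without a "." followed by a "<! ... >" comment.
# (Assumed layout; line locators such as <f1r.1,@P0> contain a "." and are skipped.)
PAGE_RE = re.compile(r"^<([^.>]+)>\s*<!(.*)>\s*$")


def parse_page_headers(lines):
    """Return {folio: {variable letter: value}} for every page header found."""
    pages = {}
    for line in lines:
        match = PAGE_RE.match(line.strip())
        if match:
            folio, body = match.groups()
            pages[folio] = dict(VAR_RE.findall(body))
    return pages


def sheet_id(variables):
    """Quire letter plus bifolio number, e.g. 'A' + '1' -> 'A1'."""
    return variables["Q"] + variables["B"]


def quire_number(letter):
    """Map the quire letter to its number, with A=1 and T=20."""
    return ord(letter.upper()) - ord("A") + 1


# Illustrative input only: the variable values here are placeholders.
sample = [
    "<f1r>      <! $Q=A $P=A $F=a $B=1 $I=H $L=A $H=1 $C=1 $X=V>",
    "<f1r.1,@P0>       aaa.bbb.ccc",
]

pages = parse_page_headers(sample)
print(sheet_id(pages["f1r"]), "hand", pages["f1r"]["H"])
```

Keeping the values as strings avoids guessing which variables are always numeric; convert them only where you need to.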

If your charts are correct, then it is amazing how consistent these guys were. Scribe 1 set a baseline, Scribe 2 deviated from it a bit, Scribe 3 deviated even more but in exactly the same way, and finally we have Scribe 5, who is Scribe 2 on strong steroids but with exactly the same profile.
(29-05-2026, 03:35 PM) ReneZ Wrote: I understand the situation.
If you are using the transliteration files in the IVTFF format, you can get all the information you need from the variables ($I $Q $H etc.) in the page headers.
Among others:
$I gives the illustration type
$Q the quire letter, where A=1 and T=20, and indeed I am skipping 16 and 18.
$H the hand according to Lisa Fagin Davis (1-5)
$B a number for the bifolio/sheet in the quire.
By concatenating the quire letter and the bifolio number, you get a 2-char ID for each sheet.

I started with the original Takahashi years ago, and it's still my default go-to. And I've downloaded all of the transcriptions from your site. The problem I ran into is that having to create multiple parser rules to handle the different tags like <!plant> and <-> and conditional letters meant that every time I wanted to test an idea, the parser had a lot to chew through. So what I did was load every transcription into one big JSON file. I then wrote a Python script that lets me select and export any single transcription as its own JSON file. And because I work with just raw text and don't need tags or splats, I have it strip all of those, plus the "." spaces, so that each line becomes nothing but pure text. I've been working with JSON files for years, and for a flat-file setup that's about the fastest way to get data into and out of a script without creating a full database... which I've also done in MySQL. So I basically use 2 files: the JSON transcription and a scribes/quires JSON, so that no matter which transcription I'm using, I can reference sheets, folios, hands, etc.
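Roughly, the stripping and lookup step can be sketched like this. This is not my actual script, just a minimal illustration: the function names, the JSON field names ("locus", "text", "hand", "sheet"), and the sample strings are all hypothetical. It also assumes that `<->` stands in for a word break, that the first reading of an `[a:b]` alternative is kept, and that "." and "," become a plain space (pass `word_sep=""` to drop word breaks entirely).

```python
import json
import re

TAG_RE = re.compile(r"<[^>]*>")               # <!plant>, <%>, <$> and similar tags
ALT_RE = re.compile(r"\[([^:\]]*)[^\]]*\]")   # [a:b] alternate readings


def clean_line(text, word_sep=" "):
    """Reduce one transliterated line to bare text."""
    text = text.replace("<->", ".")           # assumption: treat as a word break
    text = TAG_RE.sub("", text)               # drop every remaining <...> tag
    text = ALT_RE.sub(r"\1", text)            # assumption: keep the first reading
    text = re.sub(r"[.,]+", word_sep, text)   # "." and "," are word separators
    return text.strip()


def build_records(raw_lines, page_meta):
    """Join cleaned lines with per-folio metadata from the second JSON file.

    raw_lines: {locus: raw text}, e.g. {"f1r.1": "aaa.bbb"}
    page_meta: {folio: {"hand": ..., "sheet": ...}}
    """
    records = []
    for locus, raw in raw_lines.items():
        folio = locus.split(".")[0]
        record = {"locus": locus, "text": clean_line(raw)}
        record.update(page_meta.get(folio, {}))
        records.append(record)
    return records


# Placeholder data, not real transliteration text.
raw_lines = {"f1r.1": "abc<->def.[g:h]ij<!plant>.kl"}
page_meta = {"f1r": {"hand": 1, "sheet": "A1"}}

print(json.dumps(build_records(raw_lines, page_meta), indent=2))
```

Because the metadata lives in its own file keyed by folio, the same lookup works no matter which transcription the text came from.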
(29-05-2026, 03:54 PM) Rafal Wrote: If your charts are correct, then it is amazing how consistent these guys were. Scribe 1 set a baseline, Scribe 2 deviated from it a bit, Scribe 3 deviated even more but in exactly the same way, and finally we have Scribe 5, who is Scribe 2 on strong steroids but with exactly the same profile.

And that's what I'm seeing. But those charts are, I think, only the top 30 n-grams, so that may not be the rule for all of them. And there were a bunch of single-character "chunks" it couldn't classify, which is why I'm not yet trying to build a generator out of that data. Most of that was basically noise compared to the big chunks, though.