I'm not sure how many members of this forum are comfortable working with Python scripts, but I've just uploaded a version of the code I've been using in case anyone would like to use or adapt it for further experiments (or just look it over, for that matter).
There are two scripts in the zip folder: one that I used for pre-processing the ZL transcription, and another that generates images based on it. I think the only dependency that wouldn't come by default with something like Anaconda is OpenCV.
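If you don't already have OpenCV, its Python bindings can be installed with "pip install opencv-python" (the module itself imports as cv2); Anaconda users should also be able to get it from their distribution's own opencv package.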
I'm afraid I haven't gone to the trouble of creating any kind of GUI or command-line argument parser. Instead, all the various parameters are laid out in lines 4-56 and need to be set there by editing the script itself. I'll admit that's not ideal, and it can be hard to keep track of all the variables (in fact, I just discovered to my chagrin that I'd inadvertently left on a "switch" limiting analysis to recto pages when I generated the images to go with my forum posts of December 8th and 14th -- oops). I hope my explanations of what the individual "switches" do will be clear, but here are configurations for a few common scenarios in case the overarching logic isn't, followed by a short illustrative excerpt of the parameter block:
To track distributions of discrete vords:
ignore_spacing=0; by_vord=1; vord_position=0
To track distributions of glyphs, bigrams, etc., irrespective of spacing:
ignore_spacing=1; by_vord=0
To track distributions of exact strings including spaces, such as [or.d]:
ignore_spacing=0; by_vord=0
To track distributions of prefixes (disregarding cases where the prefix stands alone as a self-standing vord):
ignore_spacing=0; by_vord=1; vord_position=1; exclude_total_match=1
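To give a sense of what editing these looks like in practice, the switches are just plain assignments near the top of the script, along these lines (values shown are for the first scenario; the comments are shorthand glosses of the behaviors described above, not the script's own):

ignore_spacing = 0       # 1 = strip spaces before matching; 0 = keep them
by_vord = 1              # 1 = treat whole vords as the units of analysis
vord_position = 0        # 0 = count whole vords; 1 = match at the start of a vord (prefix)
exclude_total_match = 0  # 1 = skip cases where the target string is the entire vord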
I've tried to comment the script enough for someone else to follow what it does and make changes if they want. I'm sure it's less efficient and streamlined than it could be, and I'm a bit self-conscious about releasing it as a result -- but at least it seems to work, which is probably the important thing.
I agree that separate rightwardness and downwardness graphs as shown on the right in Obelus's displays would be a nice addition.
I'm doubtful about lines corresponding to grammatical sentences for the reasons Marco listed, as well as because of differences in line length that seem attributable to foldout page width (longer) or illustrations (shorter). Why should the sentences on pages where a plant illustration happens to fill up the whole right-hand side of the page be consistently shorter than usual?
But even so, looking for sentence-like patterns in lines might still be just as productive as looking for word-like patterns in vords. I don't think anyone would say the latter approach hasn't been illuminating, regardless of whether vords actually correspond to words.
As an alternative scenario, it's not hard to imagine ways of encoding lines that would show forms of left-to-right variation scalable to any length and independent of content. Take the following, which is "THIS IS THE HIDDEN MESSAGE" written row by row into a grid five cells wide (with each space kept in the same cell as the letter that precedes it) and then copied out in a clockwise spiral, with an asterisk inserted at each turn and empty cells ignored:
THIS I*HN A*EG*MIS *THE *ES*SE*D*D
Regardless of how long a line is, the average profile of its "words" will change steadily from left to right: the spiral's straight runs grow shorter as it winds inward, so [*] becomes increasingly prevalent while all other glyphs become less so.
THIS IS A*HEUER *ENO *R T SIN*OTHER *MBHE*GNOL *A SD*DEN *E T*AR*A*G
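For anyone who'd like to play with this, here's a minimal Python sketch of the toy scheme (my own after-the-fact illustration, not one of the scripts in the zip, and the function name is made up for this post):

def spiral_encode(text, width):
    # Pack the text into cells: each cell holds one letter plus any
    # space that immediately follows it.
    cells = []
    for ch in text:
        if ch == " " and cells:
            cells[-1] += " "
        else:
            cells.append(ch)
    # Lay the cells out row by row in a grid `width` columns wide;
    # leftover positions in the last row stay empty (None).
    rows = -(-len(cells) // width)  # ceiling division
    grid = [[None] * width for _ in range(rows)]
    for i, cell in enumerate(cells):
        grid[i // width][i % width] = cell
    # Walk the grid in a clockwise spiral, collecting one straight run
    # of cells between each pair of turns.
    top, bottom, left, right = 0, rows - 1, 0, width - 1
    runs = []
    while top <= bottom and left <= right:
        runs.append([grid[top][c] for c in range(left, right + 1)])
        runs.append([grid[r][right] for r in range(top + 1, bottom + 1)])
        if top < bottom:
            runs.append([grid[bottom][c] for c in range(right - 1, left - 1, -1)])
        if left < right:
            runs.append([grid[r][left] for r in range(bottom - 1, top, -1)])
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    # Drop empty cells, then mark each turn with an asterisk.
    segments = ["".join(c for c in run if c) for run in runs]
    return "*".join(s for s in segments if s)

print(spiral_encode("THIS IS THE HIDDEN MESSAGE", width=5))
# prints: THIS I*HN A*EG*MIS *THE *ES*SE*D*D

Varying the width and the text length gives lines of any size with the same left-to-right drift toward [*].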
If there are multiple possible paths, the first line of a paragraph might need to include additional signposts to initiate the right one, leading to the presence there of glyphs that are rare (though not forbidden) anywhere else.
→THIS IS ↓RE RH←PARGA↑OST→HE FI↓NA←PA F↑ T →LI
I don't at all mean to suggest that this is actually how Voynichese works; I'm offering it only as one way paragraph and line patterns could conceivably arise as a byproduct of a simple kind of cipher rather than as a consequence of grammatical or narrative structure. Which is to say, I don't think anything's off the table.